# Online Hybrid Neural Network for Stock Price Prediction: A Case Study of High-Frequency Stock Trading in the Chinese Market


## Abstract


## 1. Introduction

## 2. Data and Materials

#### 2.1. High-Frequency Limit Order Book Data

#### 2.2. LOBs in the Chinese Market

## 3. Methods

#### 3.1. Problem Statement

#### 3.2. Recurrent Neural Network (RNN)

#### 3.3. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)

#### 3.4. Transformer

#### 3.5. Online LGT (O-LGT)

#### 3.5.1. Sweep Operator
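The sweep operator of Beaton (1964) and Goodnight (1979) underlies the online updating of the final linear-regression layer. As a hedged illustration only (not the paper's implementation), the sketch below applies Goodnight's row-reduction convention to the augmented cross-product matrix of a toy regression; the function name `sweep` and the toy data are ours. After sweeping the $X^{\prime}X$ pivots of $\begin{bmatrix} X^{\prime}X & X^{\prime}y \\ y^{\prime}X & y^{\prime}y \end{bmatrix}$, the $X^{\prime}y$ column holds the least-squares coefficients and the $y^{\prime}y$ cell holds the residual sum of squares.

```python
def sweep(a, k):
    """Sweep the symmetric matrix `a` (list of lists) on pivot k, in place,
    following the convention in Goodnight (1979)."""
    n = len(a)
    d = a[k][k]
    if abs(d) < 1e-12:
        raise ZeroDivisionError("pivot too small to sweep")
    # Divide the pivot row by the pivot element.
    a[k] = [v / d for v in a[k]]
    # Eliminate the pivot column from every other row.
    for i in range(n):
        if i == k:
            continue
        b = a[i][k]
        a[i] = [a[i][j] - b * a[k][j] for j in range(n)]
        a[i][k] = -b / d
    a[k][k] = 1.0 / d
    return a

# Toy regression y = 2x with no intercept: X = [1, 2, 3], y = [2, 4, 6],
# so X'X = 14, X'y = 28, y'y = 56.
m = [[14.0, 28.0], [28.0, 56.0]]
sweep(m, 0)          # sweep the single X'X pivot
beta = m[0][1]       # least-squares slope -> 2.0
rss = m[1][1]        # residual sum of squares -> 0.0
```

Sweeping is reversible and incremental, which is what makes it attractive for online settings: adding an observation only updates the cross-product matrix before re-sweeping.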

#### 3.5.2. O-LGT Framework

- Pre-process the data, ensuring there are no `None` values in the input, and convert the input data into tensors using `torch.as_tensor()`;
- Create a class inheriting from `torch.nn.Module`, initialize it with the GRU, LSTM, and transformer layers, and write the forward function according to Figure 5;
- Initialize the PyTorch optimizer and scheduler according to Table 2, choose the loss function `nn.MSELoss()`, and finalize the training.
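The steps above can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's code: the layer sizes (`in_dim`, `q_l`, `q_g`, `q_t`), the attention-head count, and the dummy data are placeholder assumptions.

```python
import torch
import torch.nn as nn

class OLGT(nn.Module):
    """Minimal sketch of the LGT stack: LSTM -> GRU -> transformer -> linear.
    All dimensions are illustrative, not the values used in the paper."""

    def __init__(self, in_dim=24, q_l=32, q_g=16, q_t=16):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, q_l, batch_first=True)
        self.gru = nn.GRU(q_l, q_g, batch_first=True)
        self.attn = nn.TransformerEncoderLayer(
            d_model=q_g, nhead=4, dim_feedforward=q_t, batch_first=True)
        self.head = nn.Linear(q_g + 1, 1)  # +1 for the time-length feature

    def forward(self, x):
        u, _ = self.lstm(x)                 # (batch, T, q_l)
        v, _ = self.gru(u)                  # (batch, T, q_g)
        w = self.attn(v)                    # (batch, T, q_g)
        # Concatenate the last transformer state with the window length.
        t_len = torch.full((x.shape[0], 1), float(x.shape[1]))
        z = torch.cat([w[:, -1, :], t_len], dim=1)
        return self.head(z)                 # (batch, 1)

# Optimizer, scheduler, and loss as listed in Table 2.
model = OLGT()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)
loss_fn = nn.MSELoss()

# One dummy training step: 8 windows of 100 time steps, 24 features each.
x = torch.randn(8, 100, 24)
y = torch.randn(8, 1)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
sched.step()
```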

In the initial step (left panel of Figure 5), the forward pass proceeds as follows:

- Input layer: $\mathbf{X}_{T}$ (matrix of observed LOB features for the previous $T$ moments);
- LSTM layer: output $\mathbf{U}_{T}$ (captures the important information in the input data);
- GRU layer: output $\mathbf{V}_{T}$ (condenses the representation to prevent overfitting);
- Transformer layer: output ${\mathbf{w}}_{T}$ (incorporates interaction effects between different time steps);
- Concatenation: combine the transformer output with the time-length information;
- Linear regression layer: make the final prediction ${y}_{T}$.

In each subsequent acceleration step (right panel of Figure 5):

- Input layer: ${\mathbf{x}}_{t}$ (vector of observed LOB features at time $t$) and ${\mathbf{u}}_{t-1}$ (previous LSTM layer output);
- LSTM layer: output ${\mathbf{u}}_{t}$ (updated LSTM output);
- GRU layer: output ${\mathbf{v}}_{t}$ (updated GRU output), computed from ${\mathbf{u}}_{t}$ and the previous GRU layer output ${\mathbf{v}}_{t-1}$;
- Transformer layer: output ${\mathbf{w}}_{t}$ (updated transformer output), computed from ${\mathbf{v}}_{t}$ and the previous transformer layer output ${\mathbf{w}}_{t-1}$;
- Concatenation: combine the updated transformer output with the time-length information;
- Linear regression layer: make the latest prediction ${y}_{t}$.
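The speedup in the acceleration steps comes from caching the recurrent states so that each new observation requires a single recurrent update rather than reprocessing the whole window. The sketch below demonstrates this for the LSTM and GRU layers only (the transformer layer needs its own caching scheme, per Figure 5); the layer sizes and random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(24, 32, batch_first=True)
gru = nn.GRU(32, 16, batch_first=True)

# Initial step: run the full window of T = 100 observations once and
# cache the recurrent states.
window = torch.randn(1, 100, 24)
u, lstm_state = lstm(window)       # lstm_state = (h, c)
v, gru_state = gru(u)

# Acceleration step: a new observation x_t arrives; update the cached
# states with a single recurrent step instead of rerunning the window.
x_t = torch.randn(1, 1, 24)
u_t, lstm_state = lstm(x_t, lstm_state)
v_t, gru_state = gru(u_t, gru_state)

# The incremental update matches rerunning the full extended window,
# because the recurrences are exact step-by-step computations.
full, _ = gru(lstm(torch.cat([window, x_t], dim=1))[0])
assert torch.allclose(v_t[:, -1], full[:, -1], atol=1e-5)
```

Passing the cached `(h, c)` tuple back into `nn.LSTM` (and the hidden state into `nn.GRU`) is the standard PyTorch idiom for stateful, online inference.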

#### 3.5.3. Experimental Design for Implementation

#### 3.5.4. Data Standardization and Transformation

## 4. Experiment

#### 4.1. Data Pre-Processing for CSI-500

#### 4.2. Setting and Specification

#### 4.3. Prediction Error Evaluation

- Mean Squared Error (MSE): $${L}_{mse}=\frac{1}{N}\sum _{i=1}^{N}\sum _{t=1}^{n}{({y}_{it}-{\widehat{y}}_{it})}^{2}$$
- Mean Absolute Error (MAE): $${L}_{mae}=\frac{1}{N}\sum _{i=1}^{N}\sum _{t=1}^{n}|{y}_{it}-{\widehat{y}}_{it}|$$
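Both criteria sum the errors over all $N$ stocks and $n$ time points and divide by the number of stocks $N$, as in the formulas above. A minimal sketch with made-up numbers:

```python
def mse(y, y_hat):
    """L_mse: squared errors summed over stocks i and times t, divided by N."""
    n_stocks = len(y)
    return sum(
        (y[i][t] - y_hat[i][t]) ** 2
        for i in range(n_stocks)
        for t in range(len(y[i]))
    ) / n_stocks

def mae(y, y_hat):
    """L_mae: absolute errors summed over stocks i and times t, divided by N."""
    n_stocks = len(y)
    return sum(
        abs(y[i][t] - y_hat[i][t])
        for i in range(n_stocks)
        for t in range(len(y[i]))
    ) / n_stocks

# Two stocks, three time points each.
y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_pred = [[1.0, 2.5, 3.0], [2.0, 4.0, 5.0]]
# mse -> (0.25 + 1.0) / 2 = 0.625 ;  mae -> (0.5 + 1.0) / 2 = 0.75
```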

#### 4.4. Implementation Details

#### 4.5. Experiment Results

- The moving window for model training had a fixed back-section size $s=99$, and model testing was performed on all 4610 input segments;
- The moving window for model training had a fixed back-section size $s=99$, while model testing was performed on a stratified random sample of 461 input segments;
- The moving window for model training had its back-section size $s$ drawn at random from $[89,109]$, while model testing was performed on a stratified random sample of 461 input segments.
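The three designs above can be sketched as follows. This is our own illustration: the stratification scheme (one draw from each of 461 equal contiguous strata) and the total observation count are assumptions chosen so that $s=99$ and $h=100$ yield exactly 4610 segments.

```python
import random

def make_segments(n_obs, s=99, h=100):
    """Enumerate moving-window segments: features from time t - s through t
    predict the price-change response at time t + h."""
    return [(t - s, t, t + h) for t in range(s, n_obs - h)]

def stratified_sample(segments, k, seed=0):
    """Draw one segment at random from each of k equal contiguous strata,
    so the test sample covers the whole trading session evenly."""
    rng = random.Random(seed)
    block = len(segments) // k
    return [segments[j * block + rng.randrange(block)] for j in range(k)]

segments = make_segments(4809)              # 4610 segments when s=99, h=100
test_set = stratified_sample(segments, 461)  # designs (ii)/(iii): 461 segments

# Design (iii) additionally draws the back-section size per training window:
s_t = random.Random(0).randint(89, 109)
```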

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473.
- Baron, Matthew, Jonathan Brogaard, Björn Hagströmer, and Andrei Kirilenko. 2019. Risk and return in high-frequency trading. Journal of Financial and Quantitative Analysis 54: 993–1024.
- Beaton, Albert E. 1964. The Use of Special Matrix Operators in Statistical Calculus. Educational Testing Service Research Bulletin RB-64-51. Available online: https://www.ets.org/research/policy_research_reports/publications/report/1964/hpec.html (accessed on 5 January 2023).
- Catania, Leopoldo, Roberto Di Mari, and Paolo Santucci de Magistris. 2022. Dynamic discrete mixtures for high-frequency prices. Journal of Business & Economic Statistics 40: 559–77.
- Cenesizoglu, Tolga, Georges Dionne, and Xiaozhou Zhou. 2016. Asymmetric Effects of the Limit Order Book on Price Dynamics. CIRRELT, Centre interuniversitaire de recherche sur les réseaux d’entreprise…. Available online: https://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjLrO3Ps-_-AhVA7jgGHRgaCDgQFnoECAkQAQ&url=https%3A%2F%2Fchairegestiondesrisques.hec.ca%2Fwp-content%2Fuploads%2F2015%2F02%2F16-05.pdf&usg=AOvVaw15rprmLd-KdoMiSIKKMXks (accessed on 15 November 2022).
- Chen, Shile, and Changjun Zhou. 2020. Stock prediction based on genetic algorithm feature selection and long short-term memory neural network. IEEE Access 9: 9066–72.
- Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv:1406.1078.
- Ding, Qianggang, Sifan Wu, Hao Sun, Jiadong Guo, and Jian Guo. 2020. Hierarchical multi-scale Gaussian Transformer for stock movement prediction. Paper presented at the IJCAI, Yokohama, Japan, July 11–17; pp. 4640–46.
- Goodnight, James H. 1979. A tutorial on the sweep operator. The American Statistician 33: 149–58.
- Gould, Martin D., Mason A. Porter, Stacy Williams, Mark McDonald, Daniel J. Fenn, and Sam D. Howison. 2013. Limit order books. Quantitative Finance 13: 1709–42.
- Henrique, Bruno Miranda, Vinicius Amorim Sobreiro, and Herbert Kimura. 2019. Literature review: Machine learning techniques applied to financial market prediction. Expert Systems with Applications 124: 226–51.
- Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9: 1735–80.
- Kercheval, Alec N., and Yuan Zhang. 2015. Modelling high-frequency limit order book dynamics with support vector machines. Quantitative Finance 15: 1315–29.
- Mondal, Prapanna, Labani Shit, and Saptarsi Goswami. 2014. Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices. International Journal of Computer Science, Engineering and Applications 4: 13.
- Tran, Dat Thanh, Martin Magris, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2017. Tensor representation in high-frequency financial data for price change prediction. Paper presented at the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, November 27–December 1; pp. 1–7.
- Tsantekidis, Avraam, Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2017. Forecasting stock prices from the limit order book using convolutional neural networks. Paper presented at the 2017 IEEE 19th Conference on Business Informatics (CBI), Thessaloniki, Greece, July 24–27; vol. 1, pp. 7–12.
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, December 4–9.
- Yun, Kyung Keun, Sang Won Yoon, and Daehan Won. 2021. Prediction of stock price direction using a hybrid GA-XGBoost algorithm with a three-stage feature engineering process. Expert Systems with Applications 186: 115716.
- Zhang, Zihao, and Stefan Zohren. 2021. Multi-horizon forecasting for limit order books: Novel deep learning approaches and hardware acceleration using intelligent processing units. arXiv:2105.10430.
- Zhang, Zihao, Stefan Zohren, and Stephen Roberts. 2019. DeepLOB: Deep convolutional neural networks for limit order books. IEEE Transactions on Signal Processing 67: 3001–12.

**Figure 1.** An illustration of LOBs in trading at a given time stamp, where the bid and ask orders are sorted by price. When a bid order is priced higher than an ask order, the two are automatically matched and put into execution.

**Figure 2.** The structure of a recurrent neural network. ${\mathbf{x}}_{t}$ is the input at time $t$; ${\mathbf{h}}_{t}$ denotes the output of the hidden layer at time $t$; ${\mathbf{y}}_{t}$ is the final output at time $t$; $U$, $W$, and $V$ represent the input-layer, hidden-layer, and output-layer weight matrices, respectively; $f(\cdot)$ is the activation function.

**Figure 3.** The **left** panel shows the structure of long short-term memory, and the **right** panel shows the structure of the gated recurrent unit.

**Figure 4.** Illustration of a transformer. The dimension of ${\mathbf{v}}_{i}$ is the input dimension of the transformer and the dimension of ${\mathbf{w}}_{i}$ is the hidden layer dimension of the transformer.

**Figure 5.** Schematic of the O-LGT model architecture. The left panel shows the details of the prediction steps for ${y}_{T}$ in the initial step and the right panel shows the details of the prediction steps for ${y}_{t}$ in the acceleration steps. $T$ represents the length of time for the input in the initial step; $J$ represents the input dimension; ${q}_{L},{q}_{G},{q}_{T}$ represent the hidden layer dimensions of the LSTM, GRU, and transformer, respectively. The values of $[\cdot,\cdot]$ represent input dimensions and output dimensions. The values of $(\cdot\times \cdot)$ represent the dimensions of a matrix and a vector.

**Figure 6.** The adjusted moving window method process. Here, $s=99$, $b=19$, and $h=100$. Observations of LOB data are updated every 3 s. The operation in the dashed box is updated every 20 time units (1 min). The dashed box indicates an acceleration process, where ${\mathbf{x}}_{t}$ indicates the feature information at time $t$ and ${y}_{T+h}$ represents the predicted stock price percentage change at time $T+h$. The first line in the first dashed box indicates that the stock price prediction ${y}_{T+h}$ at time $T+h$ is obtained using the feature information from time $T-s$ to time $T$. The last line in the first dashed box indicates that the stock price prediction ${y}_{T+h+19}$ at time $T+h+19$ is obtained using the feature information from time $T-s$ to time $T+19$.

**Figure 7.** Schematic of the O-LGT structure for analyzing the Chinese LOB data. The **left** panel shows the details of the prediction steps for ${y}_{T}$ in the initial step and the **right** panel shows the details of the prediction steps for ${y}_{t}$ in the acceleration steps.

**Table 1.** LOB features used as model input.

Features | Description |
---|---|
${p}_{bid}^{(1)}(t),\cdots ,{p}_{bid}^{(5)}(t)$ | Highest five bid prices at time $t$ |
${v}_{bid}^{(1)}(t),\cdots ,{v}_{bid}^{(5)}(t)$ | Corresponding LOB volumes of the highest five bid prices at time $t$ |
${p}_{ask}^{(1)}(t),\cdots ,{p}_{ask}^{(5)}(t)$ | Lowest five ask prices at time $t$ |
${v}_{ask}^{(1)}(t),\cdots ,{v}_{ask}^{(5)}(t)$ | Corresponding LOB volumes of the lowest five ask prices at time $t$ |
${v}_{ask}(t)$ | Total ask volume at time $t$ |
${v}_{bid}(t)$ | Total bid volume at time $t$ |
${p}_{avg}(t)$ | Average transaction price over the last 3 s |
${p}_{ask}(t)$ | Average ask price over the last 3 s |
${p}_{bid}(t)$ | Average bid price over the last 3 s |
${p}_{last}(t)$ | Latest transaction price at time $t$ |
$p(t)$ | Stock price at time $t$ |

**Table 2.** Tuning parameters for model training.

Item | Tuning Parameter |
---|---|
Optimization | Adam |
Initial Learning Rate | 0.001 |
Exponential Linear Decay | 0.95 |
Epoch Number | 100 |

**Table 3.** Software and hardware configuration.

Item | Configuration |
---|---|
Python Version | Python 3.9 |
PyTorch Version | 1.10.2 |
CPU | i7-7500U 2.70 GHz |
RAM | 16 GB |

**Table 4.** Results of the three comparison experiments to verify the validity of the model design. Values in () represent standard errors. ↑ means the higher the value, the better the model performs. ↓ means the lower the value, the better the model performs.

Methods | 100 × RMSE ↓ | 100 × MAE ↓ |
---|---|---|
Training fixed → Test fixed | 0.171 (0.0466) | 0.0943 (0.00454) |
Training fixed → Test random | 0.174 (0.0487) | 0.0955 (0.00471) |
Training random → Test random | 0.171 (0.0486) | 0.0948 (0.00468) |

**Table 5.** Benchmark models comparison in missing value scenarios. Values in () represent standard errors. ↑ means the higher the value, the better the model performs. ↓ means the lower the value, the better the model performs.

Methods | 100 × RMSE ↓ | 100 × MAE ↓ |
---|---|---|
Linear | 0.352 (0.1300) | 0.2290 (0.12900) |
LGT | 0.171 (0.0467) | 0.0944 (0.00454) |
O-LGT | 0.171 (0.0486) | 0.0948 (0.00468) |

**Table 6.** Performance comparison of different models with no missing values. Values in () represent standard errors. ↑ means the higher the value, the better the model performs. ↓ means the lower the value, the better the model performs.

Methods | 100 × RMSE ↓ | 100 × MAE ↓ | Time (ms) |
---|---|---|---|
Linear | 0.330 (0.1160) | 0.2030 (0.01180) | 0.0352 |
XGBoost | 0.259 (0.0880) | 0.1530 (0.00754) | 0.704 |
DeepLOB | 0.173 (0.0468) | 0.0945 (0.00457) | 3.68 |
DeepAcc | 0.178 (0.0466) | 0.0930 (0.00454) | 0.695 |
LGT | 0.171 (0.0465) | 0.0943 (0.00454) | 2.21 |
O-LGT | 0.171 (0.0486) | 0.0948 (0.00468) | 0.0579 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, C.; Shen, L.; Qian, G.
Online Hybrid Neural Network for Stock Price Prediction: A Case Study of High-Frequency Stock Trading in the Chinese Market. *Econometrics* **2023**, *11*, 13.
https://doi.org/10.3390/econometrics11020013
