Article

CEGH: A Hybrid Model Using CEEMD, Entropy, GRU, and History Attention for Intraday Stock Market Forecasting

1 School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan 250014, China
2 Business School, Shandong University of Political Science and Law, Jinan 250014, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(1), 71; https://doi.org/10.3390/e25010071
Submission received: 11 November 2022 / Revised: 26 December 2022 / Accepted: 27 December 2022 / Published: 30 December 2022
(This article belongs to the Special Issue Complexity in Economics and Finance: New Directions and Challenges)

Abstract

Intraday stock time series are noisier and more complex than other financial time series with longer time horizons, which makes them challenging to predict. We propose a hybrid CEGH model for intraday stock market forecasting. The CEGH model contains four stages. First, we use complete ensemble empirical mode decomposition (CEEMD) to decompose the original intraday stock market data into different intrinsic mode functions (IMFs). Then, we calculate the approximate entropy (ApEn) and sample entropy (SampEn) values of each IMF to eliminate noise. After that, we group the retained IMFs into four groups and predict the comprehensive signals of those groups using a feedforward neural network (FNN) or a gated recurrent unit with history attention (GRU-HA). Finally, we obtain the final prediction results by integrating the prediction results of each group. The experiments were conducted on the U.S. and China stock markets to evaluate the proposed model. The results demonstrate that the CEGH model improved forecasting performance considerably. Our major contribution is the collaboration between CEEMD, entropy-based denoising, and GRU-HA. This hybrid model improves the signal-to-noise ratio of stock data and extracts global dependence more comprehensively in intraday stock market forecasting.

1. Introduction

It is generally accepted that stock markets are crucial for modern societies and economies. With the increasing availability of stock time series at an intraday frequency and the development of quantitative trading, intraday stock market forecasting has become a hot issue for economists, investors, and regulators. Intraday stocks are securities that trade on the markets during regular business hours. Typically, 5-min, 30-min, and 60-min charts are used to capture intraday stock price movements. Unfortunately, intraday stock time series are noisier and more complex than other financial time series with longer time horizons. To begin with, the intraday market is more vulnerable to policy uncertainty and investor sentiment. Furthermore, intraday time series contain intraday seasonal cycles. In addition, black swans, such as the COVID-19 pandemic, can have a serious impact on stock markets, which are particularly sensitive to such shocks [1]. Moreover, forecasting models often pick up noise instead of signals, since intraday stock time series have low signal-to-noise ratios. These factors make it challenging to predict the intraday stock market.
Several related methods have been proposed to predict the stock market. Traditional statistical models have a long history in stock forecasting. These models assume that time series are generated by a linear and stationary process and try to model the underlying generation process. However, this assumption is inconsistent with the real stock market. Advances in big data and artificial intelligence have gradually enhanced the availability and analyzability of high-frequency financial data, and stock market forecasting has become an area at the intersection of finance and computer science. Machine learning has also drawn the attention of many researchers. Some classic machine learning models, such as the neural network (NN), support vector regression, and the decision tree, have been applied to predicting stock time series [2]. Gu, Kelly, and Xiu predicted stock returns with various models and found that machine learning models significantly improved accuracy when compared with traditional statistical models [3].
In recent years, deep learning has been widely used for stock prediction [4,5,6]. It extracts high-level features from large amounts of data through hierarchical hidden layers. Specifically, the recurrent neural network (RNN), one of the most popular deep learning models, has an internal memory that enables it to process sequences. Compared with the classic feedforward neural network (FNN), an RNN with memory cells takes information from prior inputs to influence the current input and output. Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are variants of the RNN, and both are widely used to predict financial time series. Fischer and Krauss deployed LSTM for predicting stocks and found that LSTM outperformed memory-free methods [7]. Zhao et al. demonstrated that their GRU-based model performed slightly better than their LSTM-based model for stock price trend prediction [8]. Sako et al. used RNNs to predict the closing price of the stock index and also found that the GRU gave the overall best results [9]. It can be concluded that variants of the RNN have been the most frequently adopted deep learning models in stock prediction. However, RNNs still have some limitations, such as a lack of explanation and inefficiencies in learning sequential dependence.
Many researchers use attention models to narrow such gaps. Recently, the attention model has become an important concept and has been widely used in natural language processing (NLP) [10] and computer vision [11]. There are three neural architectures used in conjunction with attention [12]: (1) the encoder–decoder framework [13,14]; (2) the transformer [15,16]; and (3) memory networks [17]. In the financial field, researchers have applied attention to process text, such as economic news, and to learn sequential representations of time series, such as price. To identify long-term temporal dependencies, Zhang et al. proposed an attention-enhanced LSTM model. The experimental results for stock price demonstrated that this attention-based model improved forecasting performance considerably [18]. Teng et al. proposed a multi-scale local cue and hierarchical attention-based LSTM model to capture stock price trend patterns [19]. Xu et al. predicted stock movements from financial news and historical prices using a GRU based on reinforcement learning with incorporated attention mechanisms [20]. Wang et al. proposed the AlphaStock investment strategy, using LSTM with history attention (LSTM-HA) to extract asset representations from multiple time series [21].
Although deep learning methods and attention mechanisms have improved feature representation, the complexity of stock data often leads to the risk of overfitting. Frequency decomposition methods are useful for reducing the complexity of stock time series. Empirical mode decomposition (EMD) can decompose any complicated data set into a finite and usually small number of intrinsic mode functions (IMFs) [22]. Unlike traditional Fourier spectral analysis, EMD does not require stationarity and linearity, although mode mixing occurs frequently with EMD. To overcome this drawback, ensemble empirical mode decomposition (EEMD), consisting of sifting and an ensemble of added white noise, has been proposed [23]. Complete ensemble empirical mode decomposition with adaptive noise (CEEMD) [24], an improvement on EEMD, can effectively eliminate residual noise in the IMFs. These frequency decomposition methods could play a useful role in financial time series analysis if coupled with machine learning models. Cao et al. found that, compared with LSTM, their proposed CEEMDAN-LSTM model performed better in financial forecasting [25]. Zhang et al. proposed a hybrid CEEMD-PCA-LSTM model for stock prediction which outperformed benchmark models in terms of predictive accuracy and profitability performance [26]. Rezaei et al. predicted stock prices with a CEEMD-CNN-LSTM model, which uses CEEMD to decompose the stock price into different frequency spectra and CNN-LSTM to extract patterns and data dynamics [27]. Lv et al. proposed a hybrid model based on CEEMDAN, ADF, ARMA, and LSTM to predict the stock index [28]. The results of these studies suggest that hybrid models based on decomposition and RNNs perform better than the individual models in stock forecasting. Improving the performance of hybrid models using various methods is a rising trend in stock prediction.
Intraday stock time series are inherently noisy and therefore need denoising before forecasting. Information entropy can measure the complexity of time series, which helps to filter noise. Delgado-Bonal and Marshak showed that approximate entropy (ApEn) and sample entropy (SampEn) are two algorithms for determining the regularity of series based on the existence of patterns [29]. Considering the temporal dimension of uncertainty, Vinte and Ausloos proposed a comprehensive cross-sectional volatility estimator for stock markets based on intrinsic entropy [30]. Olbrys and Majewska used SampEn to capture sequential regularity in stock market time series [31]. Raubitzek and Neubauer found a correlation between ApEn, SampEn, and predictability in stock market data [32]. Moreover, some studies have introduced entropy into EMD models. Chou et al. extracted the entropy features (ApEn and SampEn) of EMD-derived signals to predict the fall risk of elderly people [33]. Shang et al. combined the advantages of CEEMD and ApEn to eliminate the influence of noise in partial discharge signals and demonstrated the hybrid model's effectiveness and superiority [34].
Despite the extensive studies above, using hybrid models combining frequency decomposition methods and machine learning for intraday stock market forecasting still faces several limitations. First, there has been little discussion about the combination of entropy and EMD-based hybrid models in stock forecasting. Specifically, very little attention has been paid to filtering the high-frequency noise effectively after decomposition for stock forecasting. Second, most hybrid models use the same forecasting model for the IMFs indiscriminately, ignoring the characteristics of different signals. Third, most existing RNN models only use RNN layers to learn the sequential dependence in hidden states or directly stack the hidden states as inputs of the next layer. These models cannot effectively learn the global dependence of stock time series.
To overcome such limitations, we propose a CEGH (CEEMD-Entropy-GRU-HA) model for intraday stock market forecasting. This hybrid model is composed of frequency decomposition, entropy, GRU, and attention mechanisms. We first use CEEMD to decompose the original intraday stock market data into IMFs. Then, entropies, which can measure the complexities of signals, are used to denoise the IMFs. On this basis, the retained IMFs are divided into several groups and predicted using the FNN or the GRU with history attention (GRU-HA). Finally, we integrate the prediction results of the groups to obtain the ensemble results.
The main contributions are summarized as follows: (1) To the best of our knowledge, our work is among the first to attempt to create a collaboration between frequency decomposition, entropy, and attention. The proposed model was compared with the individual state-of-the-art models and other decomposition-based models. The experiments on the U.S. and China markets showed that this hybrid model improved accuracy in intraday stock forecasting; (2) we introduced an attention mechanism to IMF forecasting, which enhanced the ability to explore global dependence in stock time series; (3) we used entropies to remove the noise with high complexity, and this could improve the signal-to-noise ratio of the intraday stock data.
The rest of this paper is organized as follows. Section 2 describes the proposed CEGH model. Section 3 describes the experiments on the U.S. stock market and the China stock market. Section 4 evaluates the forecasting performance of the CEGH model and discusses the findings. Section 5 concludes this paper.

2. The CEGH Model

In this section, we first introduce the framework of CEGH in general and then describe the four main components.

2.1. Framework

To improve the signal-to-noise ratio of intraday stock data and extract global dependence more comprehensively, we propose a hybrid CEGH model. This model combines frequency decomposition, entropies, recurrent neural networks, and attention mechanisms. As is shown in Figure 1, the model consists of four stages.
Stage 1: Decomposition. Intraday forecasting with raw stock data is challenging due to the complexity of financial time series, and an excessively complicated machine learning model trained on highly complex data may predict poorly. To better extract high-level features from the original intraday stock data, CEEMD is used to decompose the raw data into several components, the IMFs, each with a well-defined instantaneous frequency.
Stage 2: Entropy-based denoising. Information entropy provides information regarding the complexity of the time series, which helps to filter the noise. In this stage, the approximate entropy and sample entropy of each IMF are calculated, and a certain noise threshold is set to filter the noise.
Stage 3: Grouping and Forecasting. Different IMFs have different complexities and time–frequency characteristics which need different prediction models. In this stage, the retained IMFs are divided into a high-frequency group, a medium-frequency group, a low-frequency group, and a trend group. The comprehensive signals of each group are then predicted using the FNN or the GRU-HA separately.
Stage 4: Ensemble. The final prediction results are obtained by integrating the prediction results for each group in Stage 3.
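Expressed as code, the four stages form a simple pipeline. The outline below is our illustrative sketch; every function name is a hypothetical placeholder for a component described in the following subsections, not an identifier from the paper.

```python
# High-level outline of the four CEGH stages; all functions are hypothetical
# placeholders for the components detailed in Sections 2.2-2.4.
def cegh_forecast(prices):
    imfs = ceemd_decompose(prices)                    # Stage 1: CEEMD
    kept = entropy_denoise(imfs)                      # Stage 2: drop noisy IMFs
    groups = group_by_frequency(kept)                 # Stage 3: four groups
    preds = {
        "high":   gru_ha_predict(groups["high"]),     #   complex groups -> GRU-HA
        "medium": gru_ha_predict(groups["medium"]),
        "low":    fnn_predict(groups["low"]),         #   regular groups -> FNN
        "trend":  fnn_predict(groups["trend"]),
    }
    return sum(preds.values())                        # Stage 4: ensemble
```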

2.2. CEEMD

In Stage 1, CEEMD [24] is used to decompose the original intraday stock data into IMFs. CEEMD is a noise-assisted EMD technique, which can be described with the following steps.
Step 1: Add white noise to the original intraday stock data $x$ as follows:
$$x^i = x + \varepsilon_0 w^i,$$
where $w^i$ ($i = 1, 2, \ldots, I$) are different realizations of white noise and $\varepsilon_0$ is the white noise power parameter.
Step 2: Decompose each $x^i$ using EMD to obtain its first mode $IMF_1^i$. By averaging the $IMF_1^i$, the first component can be obtained as follows:
$$IMF_1 = \frac{1}{I} \sum_{i=1}^{I} IMF_1^i.$$
Step 3: Calculate the first residue as follows:
$$r_1 = x - IMF_1.$$
Step 4: Calculate the second IMF as follows:
$$IMF_2 = \frac{1}{I} \sum_{i=1}^{I} E_1\!\left(r_1 + \varepsilon_1 E_1(w^i)\right),$$
where $E_j(\cdot)$ is the operator that produces the $j$-th mode obtained using EMD.
Step 5: For $k = 2, \ldots, K$, calculate the $k$-th residue as follows:
$$r_k = r_{k-1} - IMF_k.$$
Step 6: Calculate the $(k+1)$-th IMF as follows:
$$IMF_{k+1} = \frac{1}{I} \sum_{i=1}^{I} E_1\!\left(r_k + \varepsilon_k E_k(w^i)\right).$$
Step 7: Repeat Step 5 and Step 6 until the obtained residue can no longer be feasibly decomposed. The final residue can be described as follows:
$$r = x - \sum_{k=1}^{K} IMF_k.$$
Therefore, the given intraday stock data can be expressed as follows:
$$x = \sum_{k=1}^{K} IMF_k + r.$$
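As a concrete illustration, the PyEMD package the authors cite (Section 3) exposes this algorithm directly. The sketch below is ours, with a synthetic series standing in for the intraday data; `trials=200` and `epsilon=0.2` mirror the settings reported in Section 3.2.

```python
# Stage 1 sketch using PyEMD's CEEMDAN implementation; the synthetic random
# walk is a stand-in for the half-hourly index series used in the paper.
import numpy as np
from PyEMD import CEEMDAN

rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.standard_normal(2000))  # stand-in intraday series

ceemdan = CEEMDAN(trials=200, epsilon=0.2)   # ensemble size, white-noise scale
imfs = ceemdan(prices)                       # rows: IMF1 (highest freq) ... IMFk
residue = prices - imfs.sum(axis=0)          # long-term trend left after sifting

print(f"{imfs.shape[0]} IMFs extracted")
for k, imf in enumerate(imfs, start=1):      # std per IMF, as in Table 1
    print(f"IMF{k}: std = {imf.std():.4f}")
```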

2.3. Entropy-Based Denoising

In Stage 2, information entropy is used to remove the IMFs that represent noise components. Because of the low signal-to-noise ratios of intraday stock data, denoising is particularly important. Information entropy quantifies the complexity and regularity of a time series: the larger the entropy value, the higher the probability that the sequence generates a new pattern, and the more complex and irregular the sequence is. Unlike previous studies that used only a single entropy for denoising [34], we use approximate entropy (ApEn) [35] and sample entropy (SampEn) [36] to measure the complexities of the intraday stock IMFs.
In our model, if the entropy value is above a certain threshold, the IMF is regarded as noise and is discarded. Otherwise, the IMF is assumed to contain useful intraday stock market information and is kept. Given a time series $u(i)$, $i = 1, 2, \ldots, N$, we define an embedding dimension $m$ and a tolerance $r$. The calculation of ApEn and SampEn can be defined with the following steps.
Step 1: Embed $u(i)$ into the $m$-dimensional vectors $V_m(i)$ as follows:
$$V_m(i) = [u(i), u(i+1), \ldots, u(i+m-1)],$$
where $i = 1, 2, \ldots, N-m+1$.
Step 2: Calculate the distance between $V_m(i)$ and $V_m(j)$ as follows:
$$D[V_m(i), V_m(j)] = \max_{k=0,1,\ldots,m-1} \left\{ |u(i+k) - u(j+k)| \right\},$$
where $j = 1, 2, \ldots, N-m+1$, $j \neq i$.
Step 3: Calculate the approximate entropy. First, measure the regularity and frequency of patterns within the tolerance $r$ as follows:
$$C_i^m(r) = \frac{\text{number of } j \text{ such that } D[V_m(i), V_m(j)] \le r}{N-m+1}.$$
Then, calculate the mean value of the logarithm of $C_i^m(r)$ as follows:
$$\varphi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r).$$
Finally, the ApEn can be defined as follows:
$$ApEn(m, r) = \varphi^m(r) - \varphi^{m+1}(r).$$
Step 4: Calculate the sample entropy. First, calculate the two coefficients $A_i^m(r)$ and $B_i^m(r)$ as follows:
$$A_i^m(r) = \frac{\text{number of } j \ (j \neq i,\ j \le N-m) \text{ such that } D[V_{m+1}(i), V_{m+1}(j)] < r}{N-m-1},$$
$$B_i^m(r) = \frac{\text{number of } j \ (j \neq i,\ j \le N-m) \text{ such that } D[V_m(i), V_m(j)] < r}{N-m-1}.$$
Average them as follows:
$$A^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} A_i^m(r),$$
$$B^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} B_i^m(r).$$
The SampEn can be defined as follows:
$$SampEn(m, r) = -\ln\!\left[\frac{A^m(r)}{B^m(r)}\right].$$
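For readers who want to reproduce the denoising step, the following is a compact NumPy transcription of the SampEn steps above. It is an O(N²) sketch of our own, not the authors' code; `m=2` and `r = 0.2 * u.std()` mirror the parameter choices in Section 3.2.

```python
# Sample entropy following the steps above: count template matches of length
# m and m+1 under the Chebyshev distance, then take the negative log ratio.
import numpy as np

def sample_entropy(u: np.ndarray, m: int = 2, r: float = None) -> float:
    if r is None:
        r = 0.2 * u.std()
    N = len(u)

    def count_matches(dim):
        # Use the same number of template vectors (N - m) for both lengths.
        V = np.array([u[i:i + dim] for i in range(N - m)])
        D = np.max(np.abs(V[:, None, :] - V[None, :, :]), axis=-1)  # Chebyshev
        np.fill_diagonal(D, np.inf)          # exclude self-matches (j != i)
        return np.sum(D < r)

    B = count_matches(m)                     # matches of length m
    A = count_matches(m + 1)                 # matches of length m + 1
    return -np.log(A / B)

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)            # irregular series -> high SampEn
wave = np.sin(np.linspace(0, 40, 1000))      # regular series -> low SampEn
print(sample_entropy(noise), sample_entropy(wave))
```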

2.4. GRU-HA

After decomposing and grouping, we use different models to predict different group signals. The classic FNN is used for the relatively regular groups and the GRU-HA is used for the more complex groups. The GRU-HA combines the GRU and an attention mechanism as shown in Figure 2. First, the GRU layers learn the hidden states of the input stock data, and then the history attention layers exploit the global dependency in the hidden states. The history attention representation and the last hidden state are concatenated into an input vector of a fully connected layer. Finally, we obtain the output of the dense layer as the predicted value. The main components of the GRU-HA are described in detail as follows.

2.4.1. GRU

With the GRU-HA, we first use the GRU to learn the history states of the input data as follows:
$$h_t = \mathrm{GRU}(h_{t-1}, x_t), \quad t \in [1, T],$$
where $x_t$ denotes the total signal of each group at time step $t$ and $h_t$ denotes the hidden state, which contains the sequential dependence learned by the GRU. The structure of the GRU is shown in Figure 3.
At time step $t$, the activation $h_t^j$ of the $j$-th GRU unit is a linear interpolation between the previous activation $h_{t-1}^j$ and the candidate activation $\tilde{h}_t^j$:
$$h_t^j = (1 - z_t^j)\, h_{t-1}^j + z_t^j\, \tilde{h}_t^j,$$
where the update gate $z_t^j$ controls how much the unit updates its information. The update gate is defined as follows:
$$z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j.$$
The candidate activation $\tilde{h}_t^j$ is calculated as follows:
$$\tilde{h}_t^j = \tanh\!\left[W x_t + U(r_t \odot h_{t-1})\right]^j,$$
where $r_t$ is the reset gate, which is defined as follows:
$$r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j.$$
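The gate equations translate directly into a few lines of NumPy. The step function below is a didactic sketch with illustrative weight shapes; in the experiments the GRU layers come from TensorFlow rather than hand-rolled code.

```python
# One GRU step in NumPy, transcribing the gate equations above.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate z_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate r_t
    h_cand = np.tanh(W @ x_t + U @ (r * h_prev))   # candidate activation
    return (1.0 - z) * h_prev + z * h_cand         # linear interpolation

d, n = 1, 8                                        # univariate input, 8 units
rng = np.random.default_rng(0)
params = [rng.standard_normal(s) * 0.1
          for s in [(n, d), (n, n), (n, d), (n, n), (n, d), (n, n)]]
h = np.zeros(n)
for x in rng.standard_normal((40, d)):             # 40 time steps, as in Table 2
    h = gru_step(x, h, *params)
```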

2.4.2. History Attention

The hidden state h t learned by the GRU can represent the sequential dependencies of x t , but cannot effectively learn the global dependencies. To enhance the efficiency of dependency learning, we combine the GRU with history attention.
We define a query vector $q_T$ for the last hidden state $h_T$:
$$q_T = W_q h_T.$$
We then define a key vector $k_t$ and a value vector $v_t$ for every hidden state $h_t$:
$$k_t = W_k h_t,$$
$$v_t = W_v h_t,$$
where $W_q$, $W_k$, and $W_v$ are parameters to learn. Then, following Luong's multiplicative style, the attention score is computed as follows:
$$\alpha_t = q_T^{\top} k_t.$$
Next, the attention scores are normalized into attention weights:
$$ATT(q_T, k_t) = \frac{\exp(\alpha_t)}{\sum_{i=1}^{T} \exp(\alpha_i)}.$$
Finally, the history attention representation can be calculated as follows:
$$r = \sum_{t=1}^{T} ATT(q_T, k_t)\, v_t.$$
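A minimal Keras rendering of this architecture could look as follows. This is our sketch under stated assumptions (univariate input, 40 time steps as in Table 2, placeholder layer sizes), not the authors' published implementation.

```python
# GRU-HA sketch: GRU hidden states, Luong attention queried by the last
# state, then concatenation with that state before the output layer.
import tensorflow as tf
from tensorflow.keras import layers

T, n_units = 40, 16                                 # placeholders

inputs = layers.Input(shape=(T, 1))                 # one group signal per sample
h = layers.GRU(n_units, return_sequences=True)(inputs)   # states h_1..h_T

h_last = h[:, -1:, :]                               # h_T as a length-1 query
query = layers.Dense(n_units, use_bias=False)(h_last)    # q_T = W_q h_T
key = layers.Dense(n_units, use_bias=False)(h)            # k_t = W_k h_t
value = layers.Dense(n_units, use_bias=False)(h)          # v_t = W_v h_t

# Dot-product attention with softmax over the T history states.
r = layers.Attention()([query, value, key])         # shape (batch, 1, n_units)

# Concatenate the attention representation with the last hidden state.
merged = layers.Concatenate()([layers.Flatten()(r), layers.Flatten()(h_last)])
outputs = layers.Dense(1)(merged)                   # next-step prediction

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```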

3. Experiments

In this section, we empirically evaluate the proposed model on the U.S. stock market and the China stock market. All the experiments were performed in Python 3.6.12. Specifically, the experiments involving neural network models and attention models were implemented using the end-to-end deep learning platform TensorFlow. The decomposition models were implemented using PyEMD and pyhht.

3.1. Data

The U.S. stock market data were collected from the Trade and Quote database (TAQ) of Wharton Research Data Services (WRDS). The Standard and Poor’s 500 (SP500) was chosen to represent the U.S. stock market. We used the price of SPY, the actively traded SP500 ETF, to represent the SP500. The U.S. stock market opens at 9:30 and closes at 16:00 Eastern Time. Every trading day has 13 half-hour intervals. The China stock market data were obtained from the Wind database. The China Securities Index 300 (CSI300) typically represents the overall China stock market. Chinese stock exchanges operate from 9:30 to 11:30 and from 13:00 to 15:00. Every trading day has eight half-hour intervals. The sample period spans from January 2015 through December 2020, a period which covers the well-known market event, the COVID-19 pandemic. The data are collected every half-hour during stock market trading hours.
We selected the last 20% of the sample for testing (out-of-sample data) and split the remaining data (in-sample data) into a training set and a validation set at a ratio of 9:1.
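A minimal sketch of this chronological split, assuming the data for one index are loaded as a 1-D array (the function and variable names are ours, not the paper's):

```python
# Chronological split: last 20% -> test; in-sample remainder split 9:1.
import numpy as np

def chronological_split(series: np.ndarray):
    n = len(series)
    test_start = int(n * 0.8)                  # last 20% is out-of-sample
    in_sample, test = series[:test_start], series[test_start:]
    val_start = int(len(in_sample) * 0.9)      # 9:1 train/validation split
    return in_sample[:val_start], in_sample[val_start:], test

train, val, test = chronological_split(np.arange(1000.0))
print(len(train), len(val), len(test))         # 720 80 200
```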

3.2. Decomposing and Denoising

As mentioned before, we used CEEMD to decompose the raw intraday stock market signals into IMFs. The number of trials was set to 200, and the white noise standard deviation was set to 0.2. The decomposition results are shown in Figure 4. The arrangement of IMFs is from high to low frequency, and the residual occurs at the end.
To improve the signal-to-noise ratio of the intraday stock data, we use entropy-based denoising to remove the irregular noise. Following previous research [34], the parameters of the entropy algorithms are set to $m = 2$ and $r = 0.2\,E_{SD}$, where $E_{SD}$ is the standard deviation of the IMF. The ApEn and SampEn values of the IMFs are shown in Table 1. Different IMFs possess different information entropy values, which means that different degrees of complexity exist at the diverse decomposition levels. The ApEn and SampEn values gradually decrease for both the SP500 and the CSI300, so the irregularity of the IMFs gradually decreases. We set the noise thresholds as $\lambda_{ApEn} = 1$ and $\lambda_{SampEn} = 0.6$ for the two entropies. A decomposed sub-signal is retained only if its ApEn value is less than 1 and its SampEn value is less than 0.6. Accordingly, IMF1 and IMF2 are discarded as noise, and IMF3–IMF11 are kept as useful information for training.
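In code, the thresholding rule reduces to a simple filter. The function below is an illustrative sketch of ours that takes precomputed entropy values, such as those in Table 1 (e.g., from the `sample_entropy` sketch in Section 2.3):

```python
# Stage 2 filter sketch: keep an IMF only if ApEn < 1 and SampEn < 0.6.
def denoise_imfs(imfs, apen_vals, sampen_vals, lam_apen=1.0, lam_sampen=0.6):
    return [imf for imf, a, s in zip(imfs, apen_vals, sampen_vals)
            if a < lam_apen and s < lam_sampen]
```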

3.3. Grouping and Forecasting

After denoising, we divide the retained IMFs into several groups and use different models to predict them.

3.3.1. Grouping

The IMFs are grouped based on their characteristics. The standard deviation values are shown in Table 1. For the U.S. stock market, the standard deviations of IMF3 and IMF4 are relatively small, both less than 0.5. This reflects the fluctuations caused by short-term news. Therefore, IMF3 and IMF4 are used as the high-frequency group. The amplitudes of IMF5–IMF8 increase gradually with the standard deviation concentrated in the 0.7–2.5 range, and the period of IMF5–IMF8 becomes longer, reflecting the fluctuations caused by medium-term market factors. Therefore, IMF5–IMF8 are regarded as the medium-frequency group. The standard deviations of IMF9 and IMF10 are close to each other, around 5.7, and the periods become longer. This represents fluctuations caused by special events, such as the COVID-19 pandemic. Therefore, IMF9 and IMF10 are regarded as the low-frequency group. IMF11, the residual term after decomposition, represents the long-term economic situation of the U.S. stock market and is regarded as the trend group. As with the U.S. stock market, the retained IMFs of the China stock market are divided into four groups. IMF3, IMF4–IMF7, IMF8–IMF10, and IMF11 are regarded as the high-frequency group, the medium-frequency group, the low-frequency group, and the trend group, respectively.
The different groups reflect the volatility of the stock market in different financial cycles. For the U.S. stock market, the fourth part of Figure 5a shows that there is an overall growth trend during 2015–2019, but that the growth rate slows down significantly in 2018 and 2019. As can be seen in the third part, there is a relatively stable trend from 2015 to 2017. The market fluctuates greatly in 2018 and 2019. In 2020, the market resembles a rollercoaster, influenced by the COVID-19 pandemic. The first and second parts of Figure 5a show that there were extreme short-term fluctuations in 2020.
For the China stock market, as shown in the fourth part of Figure 5b, the market was on an upward trend from 2015 to 2019. Compared with the U.S. stock market, the China stock market displays a steeper curve, reflecting the rapid growth of the Chinese capital market. The fluctuations of the low-frequency group reflect the influence of long-term market factors. As illustrated in the third part of Figure 5b, the market in 2015–2016 experienced rapid growth followed by a deep recession, corresponding to the stock market crash in 2015. Moreover, the first and second parts of Figure 5b show that this recession was accompanied by intense fluctuations: the government's bailout policy amplified the volatility but did not reverse the bear market trend, and the unwinding of a large amount of leveraged funds further strengthened both the fluctuations and the downtrend.

3.3.2. Forecasting

Before training, the input data must be rescaled to the range of 0 to 1. We use the min–max scaler to normalize the grouped IMFs as follows:
$$X_{in} = \frac{X - \min(X)}{\max(X) - \min(X)}.$$
After predicting, the output values are converted back to their original scale by inverting this transformation.
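scikit-learn's MinMaxScaler implements exactly this formula, including the inverse transform used to recover the original scale. The sketch below is illustrative; fitting only on the training portion is our assumption rather than a detail stated in the paper.

```python
# Min-max normalization and its inverse via scikit-learn; the arrays are
# stand-ins for a grouped IMF signal.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.linspace(100.0, 120.0, 720).reshape(-1, 1)   # stand-in train split
test = np.linspace(118.0, 125.0, 200).reshape(-1, 1)    # stand-in test split

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)   # learn min/max on the train set
test_scaled = scaler.transform(test)

preds = scaler.inverse_transform(test_scaled)  # back to the original scale
```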
Table 2 shows the hyperparameters of the forecasting models. As mentioned before, we select different models for the groups according to their characteristics, since predicting a complex group is much more difficult than predicting a relatively regular one. As is shown in Figure 1, the high-frequency group and the medium-frequency group are predicted using the GRU-HA, and the low-frequency group and the trend group are predicted using the FNN. The FNN is composed of dense layers, i.e., regular densely connected neural network layers. The GRU-HA consists of GRU layers, attention layers, and dense layers. The grid search method is employed to determine the appropriate network structure, including the number of hidden layers and the number of units. For the hidden layers, we apply the rectified linear unit (ReLU) activation function, which returns the maximum of 0 and its input:
$$\mathrm{ReLU}(x) = \max(0, x).$$
For training, we use Adam [37], a stochastic gradient descent method based on adaptive moment estimation, for optimization, with the learning rate set to its default value of 0.001. As the loss function, we use the mean squared error (MSE), the mean of the squared errors between the labels and the predicted values:
$$MSE = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2.$$
The number of epochs is the number of complete passes through the entire training set. Since too many epochs could lead to overfitting, we use early stopping as regularization. A training loop will check at the end of every epoch whether the loss is no longer decreasing. The maximum number of epochs is set to 500.
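Wired together in Keras, the training configuration of Table 2 looks roughly as follows; the tiny model and random arrays are stand-ins for the actual grouped IMF data, not the paper's configuration.

```python
# Training sketch: Adam at the default 0.001 learning rate, MSE loss, and
# early stopping with the epoch count capped at 500.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x_train, y_train = rng.standard_normal((256, 40, 1)), rng.standard_normal((256, 1))
x_val, y_val = rng.standard_normal((64, 40, 1)), rng.standard_normal((64, 1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(40, 1)),
    tf.keras.layers.GRU(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=500,                    # upper bound; early stopping ends sooner
          callbacks=[early_stop])
```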

4. Results and Discussion

To test the forecasting performance of our model on the intraday stock market, we conducted experiments using the proposed CEGH model and the baseline methods. In this section, we first define the evaluation metrics for the forecasting models. The baseline methods are then described. Last, we report the results and discuss the findings in two phases: group results and ensemble results.

4.1. Evaluation Metric

The evaluation metrics quantify the difference between the predicted values and the actual values. In our study, the mean absolute error (MAE) and root mean squared error (RMSE) are used to evaluate the forecasting models. They are defined as follows:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|,$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2},$$
where $\hat{y}_i$ is the predicted value and $y_i$ is the actual value. The lower the two error values, the better the forecasting performance.
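Both metrics are one-liners in NumPy; the arrays below are illustrative stand-ins for predicted and actual prices.

```python
# Direct transcriptions of the two evaluation metrics.
import numpy as np

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))

y = np.array([100.0, 101.5, 102.0])
y_hat = np.array([100.2, 101.0, 102.5])
print(mae(y_hat, y), rmse(y_hat, y))   # 0.4, ~0.4243
```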

4.2. Baseline Methods

The proposed CEGH model is compared with some baseline models, including:
Feedforward neural network (FNN): the classical artificial neural network in which the connections of nodes do not form a loop.
Long short-term memory (LSTM) [38]: a widely used extension of the recurrent neural network (RNN) with three logic gates.
Gated recurrent unit (GRU) [39]: a variant of LSTM with a simpler unit structure.
GRU-HA: a hybrid model that combines a GRU and history state attention.
CEEMD-GRU-HA: a hybrid model using CEEMD, a GRU, and HA. Compared with the CEGH, the CEEMD-GRU-HA does not have an entropy-based denoising stage.
The FNN, LSTM, GRU, and GRU-HA models are undecomposed models. The CEEMD-GRU-HA model is a decomposition-based model. Specifically, the GRU-HA and CEEMD-GRU-HA models are proposed in this paper.

4.3. Group Results

As mentioned in Section 3.3, we divided the IMFs into a high-frequency group, a medium-frequency group, a low-frequency group, and a trend group. After that, the total signal of each group was predicted. Using the method described above, we obtained the group results.
Table 3 shows the performance comparison for the groups in Stage 3. The control group was composed of IMF1, IMF2, and IMF3. The total signals of the different groups have different characteristics and degrees of complexity; training the models on the groups with low complexity is easier than training them on the IMFs with high complexity. For the control group, the high-frequency group, and the medium-frequency group, we used the FNN, LSTM, and GRU models as baseline methods. The trend group and the low-frequency group were predicted directly using the FNN due to the relatively low complexity of their patterns. Some interesting aspects of this table are highlighted below.
(1) The recurrent neural network models performed better than the FNN, a finding consistent with previous studies [8]. For the high-frequency U.S. stock market group, the MAE values of the FNN, LSTM, and GRU models were 0.0774, 0.0312, and 0.0308, respectively. For the high-frequency China stock market group, the MAE values of the FNN, LSTM, and GRU models were 1.1830, 0.6829, and 0.4744, respectively. These results indicate that, of the two RNN variants, the GRU model performed better than the LSTM model, a finding consistent with a previous study [9]. It may be reasonable to suppose that the GRU is a better choice when the size of the training sample is limited.
(2) The GRU-HA performed better than the GRU. For the high-frequency U.S. stock market group, the MAE value of the GRU-HA was 18% lower than that of the GRU; for the medium-frequency group, it was as much as 80% lower. For the high-frequency China stock market group, the MAE value of the GRU-HA was 7% lower than that of the GRU; for the medium-frequency group, it was 35% lower. A similar descending trend held for the RMSE values. These results imply that adding attention to the GRU enhances the prediction performance. Moreover, the performance improvement for the medium-frequency group was much larger than that for the high-frequency group. The high-frequency group retains only IMF3 after denoising, and the GRU is sufficient for identifying this sequential pattern of relatively low complexity, whereas the medium-frequency group, composed of multiple IMFs, is more complex. It may be inferred that the GRU-HA has more significant advantages for complex time series prediction.

4.4. Ensemble Results

In our study, the ensemble results were obtained by integrating the prediction results of the high-frequency group, the medium-frequency group, the low-frequency group, and the trend group, as explained in the previous subsection. The baseline method, CEEMD-GRU-HA, integrated the results of the above groups. The FNN, LSTM, GRU, and GRU-HA models were undecomposed models, as the original intraday stock data were predicted by these models directly.
Table 4 shows the performance comparison of the CEGH and baseline methods for the intraday stock market. In general, the proposed model had the lowest forecasting error, which verified its effectiveness. Some interesting observations are highlighted below.
(1) The decomposition-based models outperformed the undecomposed models, which is consistent with results obtained in previous studies [27,40,41]. For the U.S. stock market, the MAE values of the decomposition-based models were about 20% lower than that of the best-performing undecomposed model, the GRU-HA. For the China stock market, the MAE values of the decomposition-based models were about 40% lower than that of the GRU-HA. The RMSE values exhibited a similar descending trend. These results indicate that such hybrid methods can extract the sequential representation more efficiently. A possible explanation is that the decomposition method transforms the original intraday stock time series into several relatively regular components, and time series with low complexity are much easier to predict.
(2) According to the results for the undecomposed models in Table 4, the GRU-HA had the lowest error values, which is consistent with the group results in Section 4.3. The results indicate that the RNN models performed better than the FNN. They also imply that the idea of introducing history attention to the GRU for intraday stock market forecasting is effective. Compared with the existing models that only use RNNs to learn the sequential dependence in stock time series, the GRU-HA extracted global dependence more comprehensively.
(3) According to the results for the decomposition-based models in Table 4, the CEGH outperformed the CEEMD-GRU-HA. As can be seen, the MAE errors of the CEGH were approximately 7% lower than those of the CEEMD-GRU-HA for both the U.S. and China stock markets. Moreover, the comparison of the control group and the high-frequency group in Table 3 shows that the errors of the high-frequency group were much lower than those of the control group. This can be attributed to the entropy-based denoising stage of the CEGH: ApEn and SampEn measure the complexity of the stock time series and thereby recognize the noisy components among the IMFs. This is an effective way to improve forecasting performance because it increases the signal-to-noise ratio before the model is trained.
Furthermore, Figure 6 shows brief graphs of the ensemble results for a better comparison. In these figures, the blue lines represent the actual values, the red lines indicate the predicted values, and the green lines express the errors between the actual and predicted values. As can be seen, the proposed CEGH model performed poorly during the COVID-19 outbreak. Figure 6a indicates that the absolute values of the errors became significantly larger from February to March 2020, when the U.S. stock market fell sharply due to the impact of the COVID-19 pandemic. Figure 6b shows that the absolute values of the errors were extremely large in early February. For the China stock market, the errors were highly volatile when the market fell again around March 2020. It is noteworthy that the market was closed for the Chinese New Year holiday at the end of January. This may imply that the proposed model is not good at learning from the limited historical data of intraday stock markets under the influence of black swan events.

5. Conclusions

We proposed a hybrid CEGH model to better analyze and predict intraday stock markets. This model combined frequency decomposition, entropy, GRU, and history attention. Based on the results and discussion of experiments on the U.S. and China stock markets, the following conclusions can be drawn: (1) The decomposition-based models outperformed the undecomposed models, demonstrating the advantages of CEEMD. This could be due to the fact that CEEMD reduced the complexity of the intraday stock time series. (2) The GRU-HA had a better performance than other individual models. This implies that introducing history attention to the GRU for intraday stock market forecasting is effective, as the history attention could allow it to extract global dependence more comprehensively. (3) The prediction error of the CEGH is smaller than that of the CEEMD-GRU-HA. This indicates that the entropy-based denoising section of the CEGH improved the forecasting performance. An explanation could be that the ApEn and SampEn measured the complexity of the stock time series to recognize the noisy components in IMFs, which allowed us to increase the signal-to-noise ratio before training the model. (4) The CEGH performed poorly at the beginning of the COVID-19 pandemic since it is difficult to learn patterns from limited historical training data during black swan events. This tells us that machine learning forecasting models should not be blindly relied on when extreme events occur in the stock market.
Overall, our results provide compelling evidence for the effectiveness of the CEGH in intraday stock market forecasting. However, some limitations are worth noting: (1) the stock forecasting model was only evaluated using error values; (2) the model did not take into consideration the fact that the stock market changes its behavior with the economic situation; (3) the experiments were only conducted on the U.S. and China stock markets. Future work could investigate the following: (1) incorporating trading strategies and profit evaluation; (2) combining adaptive learning methods and trading rules based on financial concepts; (3) extending the experiments to European and other markets.

Author Contributions

Conceptualization, Y.L. and X.L.; formal analysis, Y.L.; investigation, Y.L.; methodology, Y.L.; software, Y.L.; supervision, X.L.; validation, Y.L. and Y.Z.; visualization, Y.L.; writing—original draft, Y.L.; writing—review and editing, Y.L., Y.Z. and S.L.; funding acquisition, X.L. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong Province Key Research and Development Program (Soft Science Project), Grant No. 2021RKY03052.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

U.S. stock market data are available in WRDS at https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/nyse-trade-and-quote-taq (accessed on 10 November 2022). China stock market data are available in Wind at https://www.wind.com.cn/portal/en/WFT/index.html (accessed on 10 November 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Valaskova, K.; Kliestik, T.; Gajdosikova, D. Distinctive Determinants of Financial Indebtedness: Evidence from Slovak and Czech Enterprises. Equilibrium. Q. J. Econ. Econ. Policy 2021, 16, 639–659.
2. Cavalcante, R.C.; Brasileiro, R.C.; Souza, V.L.F.; Nobrega, J.P.; Oliveira, A.L.I. Computational Intelligence and Financial Markets: A Survey and Future Directions. Expert Syst. Appl. 2016, 55, 194–211.
3. Gu, S.; Kelly, B.; Xiu, D. Empirical Asset Pricing via Machine Learning. Rev. Financ. Stud. 2020, 33, 2223–2273.
4. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181.
5. Hu, Z.; Zhao, Y.; Khushi, M. A Survey of Forex and Stock Price Prediction Using Deep Learning. ASI 2021, 4, 9.
6. Thakkar, A.; Chaudhari, K. A Comprehensive Survey on Deep Neural Networks for Stock Market: The Need, Challenges, and Future Directions. Expert Syst. Appl. 2021, 177, 114800.
7. Fischer, T.; Krauss, C. Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions. Eur. J. Oper. Res. 2018, 270, 654–669.
8. Zhao, J.; Zeng, D.; Liang, S.; Kang, H.; Liu, Q. Prediction Model for Stock Price Trend Based on Recurrent Neural Network. J. Ambient Intell. Hum. Comput. 2021, 12, 745–753.
9. Sako, K.; Mpinda, B.N.; Rodrigues, P.C. Neural Networks for Financial Time Series Forecasting. Entropy 2022, 24, 657.
10. Galassi, A.; Lippi, M.; Torroni, P. Attention in Natural Language Processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 1–18.
11. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention Mechanisms in Computer Vision: A Survey. Comp. Vis. Media 2022, 8, 331–368.
12. Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An Attentive Survey of Attention Models. arXiv 2020, arXiv:1904.02874.
13. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473.
14. Luong, M.-T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-Based Neural Machine Translation. arXiv 2015, arXiv:1508.04025.
15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008.
16. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv 2021, arXiv:2012.07436.
17. Sukhbaatar, S.; Szlam, A.; Weston, J.; Fergus, R. End-To-End Memory Networks. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28.
18. Zhang, Q.; Yang, L.; Zhou, F. Attention Enhanced Long Short-Term Memory Network with Multi-Source Heterogeneous Information Fusion: An Application to BGI Genomics. Inf. Sci. 2021, 553, 305–330.
19. Teng, X.; Zhang, X.; Luo, Z. Multi-Scale Local Cues and Hierarchical Attention-Based LSTM for Stock Price Trend Prediction. Neurocomputing 2022, 505, 92–100.
20. Xu, H.; Chai, L.; Luo, Z.; Li, S. Stock Movement Prediction via Gated Recurrent Unit Network Based on Reinforcement Learning with Incorporated Attention Mechanisms. Neurocomputing 2022, 467, 214–228.
21. Wang, J.; Zhang, Y.; Tang, K.; Wu, J.; Xiong, Z. AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy Using Interpretable Deep Reinforcement Attention Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1900–1908.
22. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995.
23. Wu, Z.; Huang, N.E. Ensemble Empirical Mode Decomposition: A Noise-Assisted Data Analysis Method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
24. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147.
25. Cao, J.; Li, Z.; Li, J. Financial Time Series Forecasting Model Based on CEEMDAN and LSTM. Phys. A Stat. Mech. Its Appl. 2019, 519, 127–139.
26. Zhang, Y.; Yan, B.; Aasma, M. A Novel Deep Learning Framework: Prediction and Analysis of Financial Time Series Using CEEMD and LSTM. Expert Syst. Appl. 2020, 159, 113609.
27. Rezaei, H.; Faaljou, H.; Mansourfar, G. Stock Price Prediction Using Deep Learning and Frequency Decomposition. Expert Syst. Appl. 2021, 169, 114332.
28. Lv, P.; Wu, Q.; Xu, J.; Shu, Y. Stock Index Prediction Based on Time Series Decomposition and Hybrid Model. Entropy 2022, 24, 146.
29. Delgado-Bonal, A.; Marshak, A. Approximate Entropy and Sample Entropy: A Comprehensive Tutorial. Entropy 2019, 21, 541.
30. Vințe, C.; Ausloos, M. The Cross-Sectional Intrinsic Entropy—A Comprehensive Stock Market Volatility Estimator. Entropy 2022, 24, 623.
31. Olbryś, J.; Majewska, E. Regularity in Stock Market Indices within Turbulence Periods: The Sample Entropy Approach. Entropy 2022, 24, 921.
32. Raubitzek, S.; Neubauer, T. An Exploratory Study on the Complexity and Machine Learning Predictability of Stock Market Data. Entropy 2022, 24, 332.
33. Chou, L.-W.; Chang, K.-M.; Wei, Y.-C.; Lu, M.-K. Empirical Mode Decomposition-Derived Entropy Features Are Beneficial to Distinguish Elderly People with a Falling History on a Force Plate Signal. Entropy 2021, 23, 472.
34. Shang, H.; Li, Y.; Xu, J.; Qi, B.; Yin, J. A Novel Hybrid Approach for Partial Discharge Signal Detection Based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and Approximate Entropy. Entropy 2020, 22, 1039.
35. Pincus, S.M. Approximate Entropy as a Measure of System Complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
36. Richman, J.S.; Moorman, J.R. Physiological Time-Series Analysis Using Approximate Entropy and Sample Entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
37. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
38. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
39. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
40. Lin, Y.; Lin, Z.; Liao, Y.; Li, Y.; Xu, J.; Yan, Y. Forecasting the Realized Volatility of Stock Price Index: A Hybrid Model Integrating CEEMDAN and LSTM. Expert Syst. Appl. 2022, 206, 117736.
41. Lv, P.; Shu, Y.; Xu, J.; Wu, Q. Modal Decomposition-Based Hybrid Model for Stock Index Prediction. Expert Syst. Appl. 2022, 202, 117252.
Figure 1. The framework of CEGH.
Figure 2. The structure of the gated recurrent unit with history attention (GRU-HA).
Figure 3. The structure of the GRU.
Figure 4. The CEEMD decomposition results. (a) The IMFs of the U.S. stock market; (b) the IMFs of the China stock market.
Figure 5. The total signals of the groups. (a) U.S. stock market; (b) China stock market.
Figure 6. The ensemble results and prediction errors. (a) U.S. stock market; (b) China stock market.
Table 1. ApEn, SampEn, and standard deviations of the IMFs.

IMF      SP500                          CSI300
         ApEn     SampEn   Std          ApEn     SampEn   Std
IMF1     1.5097   1.0883   0.2128       1.3405   1.0185   9.5485
IMF2     1.2393   0.9526   0.2224       0.9256   0.6293   9.9564
IMF3     0.6819   0.5577   0.3087       0.6645   0.5572   13.3986
IMF4     0.5909   0.4734   0.4841       0.5105   0.3777   25.4280
IMF5     0.4468   0.3271   0.7496       0.3611   0.2668   32.2427
IMF6     0.2071   0.1546   1.0234       0.1132   0.0863   56.3356
IMF7     0.0758   0.0646   1.6871       0.0419   0.0370   91.7791
IMF8     0.0316   0.0293   2.5416       0.0120   0.0069   206.8343
IMF9     0.0089   0.0076   5.7662       0.0036   0.0032   358.5965
IMF10    0.0029   0.0028   5.6095       0.0010   0.0009   207.4260
IMF11    0.0001   0.0002   32.6616      0.0001   0.0003   175.7148
Table 2. Hyperparameter settings of the forecasting models.

Parameters                 Values
Types of layers            Dense, GRU, Attention
Time steps                 40
Number of hidden layers    1, 2, 3, 4, 5
Number of units            2, 4, 8, 16, 32
Activation function        ReLU
Optimizer                  Adam
Loss function              MSE
Regularization             Early stopping
Epochs                     500
Table 3. Performance comparison of the groups.

Group           Method    SP500                CSI300
                          MAE      RMSE        MAE       RMSE
Control group   FNN       0.6110   1.1336      10.4914   15.6528
                LSTM      0.6069   1.1224      10.3923   15.4842
                GRU       0.6057   1.1220      10.3889   15.4876
                GRU-HA    0.6060   1.1209      10.3866   15.4379
High group      FNN       0.0774   0.2698      1.1830    1.9700
                LSTM      0.0312   0.0551      0.6829    1.0783
                GRU       0.0308   0.0533      0.4744    0.8126
                GRU-HA    0.0252   0.0412      0.4416    0.7841
Med group       FNN       0.1271   0.3566      3.8023    6.4337
                LSTM      0.1105   0.3324      2.4664    4.0836
                GRU       0.0671   0.2591      2.2344    3.4608
                GRU-HA    0.0121   0.1102      1.4439    2.3653
Low group       FNN       0.4972   0.6638      7.3824    9.2645
Trend group     FNN       0.1031   0.1061      0.1035    0.1192
Table 4. Performance comparison of ensemble results.

Methods                                      SP500               CSI300
                                             MAE      RMSE       MAE       RMSE
Undecomposed model          FNN              1.3696   1.9580     41.2979   51.2116
                            LSTM             2.1711   2.8756     73.9993   83.5591
                            GRU              1.9814   2.5009     30.3358   42.6427
                            GRU-HA           1.0864   1.6937     21.1347   30.0292
Decomposition-based model   CEEMD-GRU-HA     0.9373   1.3438     13.0379   17.9178
                            CEGH             0.8761   1.2528     12.1849   16.8103