1. Introduction
Stock price prediction has long been an active area of research. Many studies have predicted stock prices using the auto-regressive integrated moving average (ARIMA) model, which is a time series data prediction method [1,2], as well as other methods. Pai and Lin [3] applied a hybrid methodology combining the ARIMA model and a support vector machine (SVM) for stock price prediction. Wang and Leu [4] proposed a model that predicts the price trend of Taiwan’s stock market by combining the ARIMA model and a neural network. Adebiyi et al. [5] compared the performance of the ARIMA model and the artificial neural network (ANN) model using stock data from the New York Stock Exchange (NYSE). Oh and Kim [6] proposed a piecewise nonlinear model using ANN to predict the stock market, which uses backpropagation neural networks to find points of continuous change in time series data. Similarly, there has been much research on applying neural network methodology to stock price prediction. Yoon and Swales [7] showed that neural networks are effective for solving complex problems such as stock price prediction, and Kohara et al. [8] showed that neural networks and prior knowledge were effective in predicting stock prices. Tsai and Wang [9] showed that a stock price prediction model combining ANN and a decision tree achieved higher accuracy than a single model. Hadavandi et al. [10] proposed a stock price prediction expert system by combining ANN and a genetic fuzzy system. Chen et al. [11] showed that a fuzzy time series model based on the Fibonacci sequence is effective in predicting Taiwan Semiconductor Manufacturing Company (TSMC) stock price data and Taiwan Capitalization Weighted Stock Index (TAIEX) data. Cheng et al. [12] proposed a hybrid model for stock price prediction based on genetic algorithms and rough sets theory.
In addition to predicting stock prices of various markets, investors are also interested in stock trading volume. Trading volume is an important indicator for investors deciding whether to buy or sell certain stocks. A number of studies confirmed that trading volume is positively correlated with price volatility [13,14], and various studies used trading volume to predict price volatility based on this positive correlation [15,16]. Tsang and Chong [17] presented a strategy to obtain investment returns using volume-based on-balance volume (OBV) indicators, and Nedunchezian [18] conducted a study to predict the price movement of Multi Commodity Exchange (MCX) energy using OBV indicators. However, research that predicts the actual intraday trading volume, rather than just using the trading volume as an indicator for prediction, has not been actively conducted [19]. In the absence of sophisticated research on intraday trading volume, large institutions still consume liquidity based on the simple average of the trading volume over a past period, or at the beginning or end of the trading day when the trading volume is high.
It is widely known that the extensive trading volume of large institutions has a significant impact on market liquidity, and large institutions therefore use the volume weighted average price (VWAP) model. Based on the distribution of intraday trading volume in a stock market, the VWAP model allocates trades in a way that reduces the impact of large institutions’ trading on stock market liquidity. Thus, it is critical to predict the trading volume accurately for VWAP trading. In this study, we propose optimal models to predict stock trading volume using dynamic time warping (DTW) and a genetic algorithm (GA). DTW and GA have been widely used for developing various investment strategies. Previous studies proposed pattern matching trading systems using DTW to predict exchange rates and stock prices [20,21,22,23]. Additionally, GA has been used to predict stock indices, real estate auction prices, and appraisals, and to optimize IPO investment strategies and trading strategies that hedge options [24,25,26,27,28,29].
In the empirical study, we applied DTW and GA to the trading volume data of the Korean stock price index 200 (KOSPI 200) futures index from December 2006 to September 2020. We used four methods to predict the trading volume and compared their performance: a simple moving average of trading volume over the past 20 days, DTW trading volume, DTW trading volume based on grouping, and GA trading volume. We employed a GA in three ways to forecast trading volume: a fixed 20-day GA weighted average method, an optimal GA weighted dynamic period method, and a simple average of the optimal GA period method. Our empirical results show that predicting the trading volume using the simple average of the optimal GA period method achieves the best performance.
This paper is organized as follows: Section 2 presents the literature review, Section 3 presents the data and methodology used in this paper, Section 4 shows the empirical study, and Section 5 presents the conclusions of our study.
4. Empirical Study
For the empirical study on the various methods presented in Section 3, the test period was set from January 2017 to September 2020. First, we calculated a simple average trading volume over a commonly used 20-day period, as introduced in Section 3.2.1. The average of the minute trading volume from day t−20 to day t−1 was calculated to predict the trading volume on day t. Then, we compared the trading volume predicted by the simple average with the actual trading volume on day t using the MAPE.
Figure 5 shows the daily MAPE in Equation (9) between the volume predicted using a 20-day simple average and the actual volume. As shown in Table 1, the average MAPE of the predicted trading volume over the test period is 0.3845, and the variance of the MAPE is 0.01692.
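The baseline above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: it assumes each day is stored as a list of per-minute volumes of equal length, and it assumes the MAPE of Equation (9), which is not reproduced in this excerpt, is the usual mean of |actual − predicted| / actual expressed as a fraction. The function names and the toy numbers are hypothetical.

```python
def sma_forecast(history, window=20):
    """Average the last `window` daily profiles minute by minute.

    history: list of daily profiles, oldest first; each profile is a
    list of per-minute volumes (assumed layout).
    """
    recent = history[-window:]
    return [sum(values) / len(values) for values in zip(*recent)]


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumed form of Eq. 9).

    Minutes with zero actual volume are skipped to avoid division by zero.
    """
    errors = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    return sum(errors) / len(errors)


# Toy data: three past days with four "minutes" each (hypothetical numbers).
history = [[100, 50, 40, 90], [120, 60, 30, 110], [110, 40, 50, 100]]
forecast = sma_forecast(history, window=3)
actual = [100, 50, 40, 100]
print(forecast)
print(mape(actual, forecast))
```

With real data the window would be 20 and each profile would hold one value per trading minute.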
Next, for the DTW empirical study proposed in Section 3.2.2, the reference data period was set from December 2006 to the day before the test day. The DTW values between the day being matched and all reference data were calculated and then sorted in ascending order. The trading volume of the day following the top-ranked reference day (DTW Top 1) was used as the forecast of the trading volume on the test day. In a similar manner, the trading volume of the days following the reference days ranked in the top 1% (DTW 1%), the top 5% (DTW 5%), and the top 20 (DTW Top 20) was used as a forecast of the trading volume on the test day. In addition, we calculated the MAPE between the predicted trading volume and the actual trading volume on the test day.
Figure 6 shows the daily MAPE between the predicted volume using DTW and the actual volume.
As displayed in
Figure 6, DTW Top 1 generally shows a relatively higher MAPE value than other DTW methods. As shown in
Table 2, the average MAPE for the test period was 0.5166 for DTW Top 1, 0.4226 for DTW 1%, 0.4414 for DTW 5%, and 0.4226 for DTW Top 20. The variance of MAPE for the test period was 0.03392 for DTW Top 1, 0.02768 for DTW 1%, 0.03252 for DTW 5%, and 0.02740 for DTW Top 20. Among the plain DTW variants, DTW Top 20 performed best: it matched DTW 1% on average MAPE and had the lowest variance. However, the DTW Top 20 method still underperformed the 20-day simple average trading volume method, whose average MAPE was 0.3845.
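The DTW procedure can be illustrated with a small sketch. It uses the textbook dynamic-programming DTW with an absolute-difference cost, which may differ from the exact formulation in Section 3.2.2 (not reproduced in this excerpt). It also assumes that, when more than one reference day is selected, the next-day profiles are averaged; the text does not state how the top-ranked days are combined. All names and numbers are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook DTW distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]


def dtw_forecast(reference, query, top_k=1):
    """Forecast from the days that follow the top_k closest reference days.

    reference: chronological list of daily profiles. The last day has no
    successor in the list, so it is excluded from matching.
    Assumption: next-day profiles of the top_k matches are averaged.
    """
    candidates = range(len(reference) - 1)
    ranked = sorted(candidates, key=lambda d: dtw_distance(reference[d], query))
    next_days = [reference[d + 1] for d in ranked[:top_k]]
    return [sum(values) / len(values) for values in zip(*next_days)]


# Toy reference set of four days with three "minutes" each.
reference = [[1, 2, 3], [10, 10, 10], [1, 2, 4], [20, 20, 20]]
query = [1, 2, 3]
print(dtw_forecast(reference, query, top_k=1))
print(dtw_forecast(reference, query, top_k=2))
```

A percentage cut-off such as DTW 1% corresponds to setting `top_k` to 1% of the number of candidate days.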
Next, we conducted an experiment that adds the concept of grouping to the DTW method as proposed in
Section 3.2.3. The grouping-based DTW experiment differs from the plain DTW experiment only in that a grouping filter rule is added. Group A was set as shown in Figure 3. First, to conduct the DTW experiment based on group A, the group A reference data were formed by collecting the dates in the reference data that belong to the same group as day t. Then, the DTW experiment was carried out on the group A reference data in the same way as before.
Figure 7 shows the daily MAPE between the predicted volume using DTW based on group A and the actual volume. As illustrated in
Figure 7 and
Table 3, the DTW experiment based on group A also showed the worst performance in the case of DTW Top 1. As shown in
Table 3, the average MAPE for the test period was 0.5193 for DTW (group A) Top 1, 0.4501 for DTW (group A) 1%, 0.4447 for DTW (group A) 5%, and 0.4470 for DTW (group A) Top 20. The variance of MAPE for the test period was 0.03896 for DTW (group A) Top 1, 0.03078 for DTW (group A) 1%, 0.03469 for DTW (group A) 5%, and 0.03491 for DTW (group A) Top 20. DTW based on group A showed overall lower performance than the simple DTW result.
In the case of DTW based on group B, the experimental process was the same as DTW based on group A, but group B was set as shown in
Figure 4.
Figure 8 displays the daily MAPE between the predicted volume using DTW based on group B and the actual volume. As shown in
Table 4, the result of DTW (group B) Top 1 was also the worst. The average MAPE for the test period was 0.5154 for DTW (group B) Top 1, 0.4493 for DTW (group B) 1%, 0.4389 for DTW (group B) 5%, and 0.4425 for DTW (group B) Top 20. The variance of MAPE for the test period was 0.03461 for DTW (group B) Top 1, 0.03054 for DTW (group B) 1%, 0.03239 for DTW (group B) 5%, and 0.03398 for DTW (group B) Top 20. Overall, all of the DTW-related experiments exhibited lower performance than the 20-day simple average trading volume method.
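The grouping filter used in the group A and group B experiments amounts to restricting the DTW candidates to reference days that share a market-state label with the day being matched. The sketch below assumes each day carries a label made of two buckets; the actual group boundaries are defined in Figures 3 and 4, which are not reproduced here, so the labels are purely hypothetical.

```python
def same_group_days(labels, query_label):
    """Return indices of reference days whose group label matches the query day."""
    return [i for i, label in enumerate(labels) if label == query_label]


# Hypothetical labels: (volatility bucket, volume bucket) for each reference day.
labels = [("low", "high"), ("high", "high"), ("low", "high"), ("low", "low")]
candidates = same_group_days(labels, ("low", "high"))
print(candidates)
```

The DTW ranking is then run only over the returned indices instead of over the full reference set.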
Finally, an experiment was conducted using the genetic algorithm as proposed in Section 3.2.4. First, we optimized the GA weights so that a weighted average of the trading volume over the preceding 20 days predicted the trading volume on day t. Then, the volume obtained with the optimized GA weights was used as the predicted volume for the following day.
Figure 9 shows the daily MAPE between the predicted volume using GA and the actual volume. As shown in Table 5, the average MAPE of the fixed 20-day GA weighted average volume for the test period was 0.3892, which was higher than that of the 20-day simple average volume (0.3845) but lower than those of the DTW experiments.
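A fixed-window GA weighted average can be sketched as follows. This is a deliberately small genetic algorithm and not the configuration of Section 3.2.4, which is not reproduced in this excerpt: the population size, one-point crossover, Gaussian mutation, and elitist selection are all assumptions, as are the function names and toy data. Weights are kept non-negative and normalized to sum to one, and the equal-weight vector is seeded into the population so the result is never worse in-sample than the simple average.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumes non-zero actuals)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def weighted_forecast(window, weights):
    """Weighted average of daily profiles, minute by minute."""
    return [sum(w * day[k] for w, day in zip(weights, window))
            for k in range(len(window[0]))]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def ga_weights(window, target, pop_size=30, generations=40, seed=0):
    """Search for day weights that minimize MAPE against `target` (needs >= 2 days)."""
    rng = random.Random(seed)
    n = len(window)

    def fitness(w):
        return mape(target, weighted_forecast(window, w))

    # Seed the population with equal weights plus random weight vectors.
    pop = [normalize([1.0] * n)]
    pop += [normalize([rng.random() + 1e-9 for _ in range(n)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]           # elitist selection
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(n)               # mutate a single weight
            child[i] = abs(child[i] + rng.gauss(0, 0.05))
            children.append(normalize(child))
        pop = elite + children
    return min(pop, key=fitness)


# Toy data: five past days with three "minutes" each, and the day to fit.
window = [[100, 60, 90], [120, 50, 100], [90, 70, 110],
          [110, 55, 95], [105, 65, 105]]
target = [108, 60, 100]
best = ga_weights(window, target)
equal = [1 / len(window)] * len(window)
print(mape(target, weighted_forecast(window, equal)))
print(mape(target, weighted_forecast(window, best)))
```

Because the weights are fitted on day t and then reused for the next day, a lower in-sample MAPE does not guarantee a lower out-of-sample MAPE, which is consistent with the fixed 20-day GA weighted average not beating the simple average in Table 5.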
Given the better performance of the GA weighted average volume relative to the DTW methods, we tried to optimize the weight and period simultaneously. After applying the GA weights to each candidate period based on day t, the period and GA weights generating the lowest MAPE were selected and used to predict the volume on the following day. The count of windows per optimal GA period is shown in Table 6. The average MAPE of the optimal GA weighted dynamic period volume for the test period was 0.3938, which was also higher than that of the 20-day simple average volume (0.3845) but lower than those of the DTW experiments.
Therefore, we used only the optimal GA period on day t, as summarized in Table 6. The volume over the optimal GA period on day t was used to predict the trading volume on the following day, and in this case, a simple average was used instead of the GA weighted average. As a result, the simple average of the optimal GA period volume showed better performance than the 20-day simple average and the fixed 20-day GA weighted volume. The MAPE of the simple average of the GA dynamic period volume for the test period was 0.3815.
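The idea of keeping only the period length and averaging with equal weights can be illustrated as below. Note the substitution: in the study the period is chosen by the GA together with the weights, whereas this sketch picks the period by exhaustive search over a short list of candidate lengths using equal weights. The candidate list, function names, and data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumes non-zero actuals)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def simple_average(history, period):
    """Minute-by-minute average of the last `period` daily profiles."""
    recent = history[-period:]
    return [sum(values) / len(values) for values in zip(*recent)]


def best_period(history, periods):
    """Pick the period whose average of earlier days best fits the latest day."""
    *past, today = history
    return min(periods, key=lambda p: mape(today, simple_average(past, p)))


def forecast_next_day(history, periods):
    """Select the period on the latest day, then average over it for the next day."""
    period = best_period(history, periods)
    return period, simple_average(history, period)


# Toy data: five days with two "minutes" each; candidate periods of 1, 2, or 4 days.
history = [[100, 100], [100, 100], [60, 60], [50, 50], [55, 55]]
period, forecast = forecast_next_day(history, periods=[1, 2, 4])
print(period, forecast)
```

Every candidate period must be no longer than the number of earlier days available, otherwise the average silently uses fewer days than requested.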
Last, we compared the average and the variance of MAPE in the test period for all experiments. As shown in
Figure 10, the simple average of the optimal GA period volume for the test period achieved the best performance. In addition, the GA method was found to outperform the DTW method for predicting trading volume.
Figure 11 illustrates that the model of the simple average of the optimal GA period volume showed the lowest variance over the test period. However, it was not found to be significantly lower than that of the 20-day simple average model.
For a more formal evaluation, we performed a paired t-test on the MAPE to compare the predictive power of the GA models with the baseline model of the 20-day simple average. Test results in Table 7 indicate that the MAPE of the simple average of the optimal GA period model is significantly lower than that of the baseline model.
Additionally, we performed an F-test to compare the variation in performance of the GA model and the baseline model. As shown in Table 8, the variance of the simple average of the optimal GA period model was not found to be significantly lower than that of the baseline model.
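The two test statistics can be computed directly from the daily MAPE series of a model and of the baseline. The sketch below uses only the Python standard library and stops at the statistics themselves: obtaining p-values requires the t and F distributions (for example from a statistics package or a table of critical values). The input series are hypothetical and are not the values behind Tables 7 and 8.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t_statistic(x, y):
    """t statistic of the paired differences x - y, with len(x) - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def variance_ratio(x, y):
    """F statistic: ratio of the sample variances of x and y."""
    return variance(x) / variance(y)


# Hypothetical daily MAPE values for a model and for the baseline.
model_mape = [0.1, 0.2, 0.3, 0.4]
baseline_mape = [0.2, 0.2, 0.5, 0.5]
t_stat = paired_t_statistic(model_mape, baseline_mape)
f_stat = variance_ratio(model_mape, baseline_mape)
print(t_stat, f_stat)
```

A negative t statistic indicates a lower average MAPE for the model than for the baseline, and an F statistic below one indicates a smaller variance.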
5. Discussion and Concluding Remarks
In this paper, we presented four methods for predicting the trading volume in the stock market: the simple moving average method, the DTW method, the DTW based on the grouping method, and the GA method. Our empirical study shows that predicting the trading volume using GA, specifically the simple average of the optimal GA period, achieves the best performance.
The DTW method is known to be effective in finding trading volume similar to current trading volume. However, it has been found to have limitations in predicting future trading volume using historical data of trading volume. According to the results of the three DTW methods, classifying markets based on Group A and B did not improve predictive power. Nevertheless, when comparing DTW based on market grouping, the results of DTW with Group B combining the VKOSPI group and KOSPI 200 futures index group were found to be more effective than DTW with Group A combining the VKOSPI group and KOSPI 200 futures volume group. This result suggests that market grouping based on index value is more appropriate than that based on volume when classifying the market to predict trading volume.
We also suggest three ways to use GA to forecast trading volume: a fixed 20-day GA weighted average method, an optimal GA weighted dynamic period method, and a simple average of the optimal GA period method. The simple average of the optimal GA period method performed better than the conventional 20-day simple average method. It was expected that the GA weights would play a key role in predicting trading volume with a particular pattern, but our empirical results did not show improvement of predictive power by the GA weight method. It was found that the simple average of the GA optimal period outperformed the GA weight methods. This result implies that when predicting the trading volume, the number of periods used has a stronger impact on the predictive power than the weight of the days for prediction. Compared with the DTW methods, the GA method generates optimal weights and periods using the latest market data, which results in higher predictive power than the DTW methods.
In the literature, research that predicts the actual intraday trading volume in a stock market, rather than just using the trading volume as an indicator for prediction of stock price or market volatility, has not been actively conducted. It is widely known that large institutions are taking a strategy to consume liquidity using the simple average of the trading volume in the past period or to consume liquidity in the market at the beginning or end of the market when the trading volume is high. Thus, in this study, we propose to use sophisticated tools for predicting trading volume in a stock market and our empirical results show slightly better predictive power of the simple average of the optimal GA period method than the baseline method for predicting trading volume in a stock market. Rather than using the 20-day simple average volume for prediction, large institutions are expected to appropriately consume liquidity and bring market vitality by applying a proposed model that optimizes the number of periods by GA. As a result of this study, we expect that large institutions will perform more appropriate VWAP trading in a sustainable manner, leading the stock market to be revitalized by enhanced liquidity. In this sense, the model proposed in this paper contributes to creating efficient stock markets and helps to achieve sustainable stock markets.
This study has potential limitations. As shown in the empirical results, the predictive power of the GA models was not improved significantly compared to the baseline method. In addition, the GA models proposed in this paper were based on the trading volume data of the KOSPI 200 futures index in Korean markets. Therefore, the empirical results are limited to Korean market data. Based on the idea of our GA models, future research can be enriched by developing a trading volume prediction model that combines various artificial intelligence methodologies and GA. In addition, various ranges of parameters can be applied to the GA models. In this study, a prediction was made by setting a day as one window. However, in the future, a more precise prediction study could be conducted by setting a minute as one window.