Traffic Flow Prediction Method Based on Seasonal Characteristics and SARIMA-NAR Model

Wang, You; Jia, Ruxue; Dai, Fang; Ye, Yunxia

doi:10.3390/app12042190


School of Civil Engineering, Central South University, Changsha 410075, China


Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(4), 2190; https://doi.org/10.3390/app12042190

Submission received: 8 January 2022 / Revised: 17 February 2022 / Accepted: 18 February 2022 / Published: 19 February 2022

(This article belongs to the Special Issue Smart Computing and Big Data Analysis: Latest Advances and Applications)


Abstract

Traffic flow is an essential indicator of road network performance and a pivotal basis for road classification. However, combined traffic flow prediction models that account for seasonal characteristics have received little attention. The seasonal autoregressive integrated moving average (SARIMA) model has strong linear fitting characteristics and is often used to process seasonal time series, whereas the nonlinear autoregressive (NAR) dynamic neural network has a memory function and nonlinear interpretation capabilities. The two are therefore suitable for constructing combined forecasting models. This paper takes the traffic flow time series of a highway in southwest China as the research object. Combining the SARIMA (0,1,2) (0,1,2)₁₂ model and the NAR model with 15 hidden layer neurons and fourth-order delay, two combined models are constructed: SARIMA-NAR combined model 1 uses the linear and nonlinear component combination method, and SARIMA-NAR combined model 2 uses the MSE weight combination method. The prediction accuracy of SARIMA-NAR combined model 1 is 0.92, and that of SARIMA-NAR combined model 2 is 0.90. In addition, traffic flow forecasting under the influence of the epidemic is discussed. A comprehensive comparison of multiple indicators shows that SARIMA-NAR combined model 1 has better fitting and prediction performance for road traffic flow and is suitable for the greater volatility of traffic flow during the epidemic. This model improves the effectiveness and reliability of traffic flow forecasting, and the forecasting process is more convenient and efficient.

Keywords: intelligent transportation; traffic volume prediction; SARIMA-NAR model; time series; dynamic neural network; the seasonal difference; transportation

1. Introduction

With the rapid development of urban roads and the popularization of vehicles, traffic jams have become severe in recent years. As an important indicator measuring the service status of the road network and an essential basis for road classification, traffic flow not only reflects the economic level and urbanization degree, but also provides a reference for road planning and design, traffic control, and policy adjustments.

In addition, due to the impact of the COVID-19 epidemic, urban traffic control has been tightened, and people’s travel has been restricted. When a city moves from a state of congestion into epidemic prevention and control, it needs a timely updated traffic control strategy to reduce the impact on society and the economy [1,2]. Predicting and monitoring the traffic flow during the epidemic can effectively improve the control level of the urban transportation network and help examine the impact of the traffic situation on the spread of the epidemic. However, the existing traffic flow prediction methods rarely explore the model’s prediction performance during the epidemic.

At present, time-series-based traffic prediction methods mainly include the Markov model [3,4], genetic algorithms [5,6], neural networks [7,8,9], wavelet decomposition and reconstruction [10], grey system theory [11,12], the autoregressive integrated moving average (ARIMA) model [13,14], and support vector regression [15,16]. Among these methods, ARIMA is widely used because it performs well in linear fitting and makes full use of the historical data of the series [17]. In 1979, Ahmed et al. used the ARIMA model to predict highway traffic flow and occupancy [18]. Neural networks have a strong ability to deal with nonlinear problems. In 1999, Yasdi proposed using an artificial neural network to predict road conditions [19]. However, a single prediction model cannot accurately capture the complex information in traffic flow time series.

Based on this, many scholars have carried out research on combined prediction models. According to the combination principle, the main combination methods can be divided into the following types: combining linear and nonlinear components, assigning weights to component models, stacking after decomposition, and space–time combination. The specific methods are shown in Table 1. The combination model can extract the information in the traffic flow time series more effectively. Although some scholars have tried to consider seasonal effects in traffic flow forecasting, no one has adopted a combined model of the seasonal autoregressive integrated moving average (SARIMA) model and the nonlinear autoregressive (NAR) dynamic neural network. SARIMA can not only deal with non-stationary data, but also accurately capture the characteristics of seasonal data with trends and cycles, which helps make up for the lack of research on the change laws and characteristics of meso-level traffic flow [20,21]. At the same time, traffic flow data fluctuate randomly and significantly because of sudden and major events, and NAR can effectively extract the nonlinear fluctuation components in the sequence. Static neural networks, such as the BP neural network, which are common in traffic flow prediction, have no memory function and rely only on the current input, so their prediction ability is far lower than that of dynamic neural networks [22].

This paper takes a highway in southwest China as the research object. First, the traffic flow time series is decomposed seasonally, and the basic SARIMA model with the higher goodness of fit is selected through seasonal and non-seasonal differencing. Then, two different combined prediction models are proposed based on the SARIMA model and the NAR model. Finally, the mean absolute percentage error (MAPE), the mean absolute error (MAE), the root mean square error (RMSE), and other indicators of these two models are comprehensively compared with those of traditional prediction models; the performance of the combined prediction models on traffic flow during the epidemic period is also investigated.

The remainder of this paper is organized as follows: Section 2 describes in detail the construction of the SARIMA model, the NAR model, and the two SARIMA-NAR combined models; Section 3 obtains the traffic flow prediction results through a specific case analysis; Section 4 compares the prediction results of the different models and discusses the adaptability of the combined model under the influence of the epidemic; Section 5 draws some conclusions.

2. Methods

2.1. SARIMA Model

The SARIMA model is an ARIMA model that accounts for seasonal influence and can therefore predict data with seasonal characteristics [20,35].

The ARIMA model was proposed by Box and Jenkins in the 1970s [36]. It describes an integrated autoregressive moving average process: a random process containing d unit roots is assumed, which can be transformed into a stationary autoregressive moving average process after differencing d times [37]. In other words, the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model all assume a stationary time series, whereas in many cases time series behave as d-order unit root processes. Therefore, the data must first be differenced to convert them into a stationary time series before the model is established. The ARIMA (p, d, q) model is as follows:

(1 - \sum_{i = 1}^{p} α_{i} L^{i}) {(1 - L)}^{d} y_{t} = α_{0} + (1 + \sum_{i = 1}^{q} β_{i} L^{i}) ε_{t}

(1)

where p is the non-seasonal autoregressive order, d is the non-seasonal difference order, q is the non-seasonal moving average order, α_{i} is the autoregressive term coefficient, β_{i} is the moving average term coefficient, L is the lag operator, ε_{t} is the white noise series, α_{0} is the constant term coefficient, i is the coefficient number, and y_{t} is the time series value at time t.

Considering the seasonal factors, the SARIMA (p, d, q) (P, D, Q)_m model is as follows:

(1 - \sum_{i = 1}^{p} ϕ_{i} L^{i}) (1 - \sum_{i = 1}^{P} Φ_{i} L^{m i}) {(1 - L)}^{d} {(1 - L^{m})}^{D} y_{t} = α_{0} + (1 + \sum_{i = 1}^{q} θ_{i} L^{i}) (1 + \sum_{i = 1}^{Q} Θ_{i} L^{m i}) ε_{t}

(2)

where P is the seasonal autoregressive order, D is the seasonal difference order, Q is the seasonal moving average order, m is the number of periods (monthly data m = 12, quarterly data m = 4), ϕ_{i} is the non-seasonal autoregressive coefficient, Φ_{i} is the seasonal autoregressive coefficient, θ_{i} is the non-seasonal moving average coefficient, and Θ_{i} is the seasonal moving average coefficient.
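As a minimal illustration of the differencing operators in Equation (2), the following Python sketch applies (1 - L)^d (1 - L^m)^D to a synthetic monthly series. The function names and the synthetic data are assumptions made for illustration only and are not taken from the study.

```python
def difference(series, lag=1):
    """Apply (1 - L^lag) once: y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=1, D=1, m=12):
    """Apply (1 - L)^d (1 - L^m)^D to a series."""
    out = list(series)
    for _ in range(D):          # seasonal differences
        out = difference(out, lag=m)
    for _ in range(d):          # non-seasonal differences
        out = difference(out, lag=1)
    return out


# Synthetic data (illustrative only): linear trend plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -1, 2, 6, 4, 1, -3, -5]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]

stationary = sarima_difference(series, d=1, D=1, m=12)
print(len(stationary), stationary[:5])
```

For this synthetic series, the seasonal difference removes the repeating pattern and leaves a constant, and the non-seasonal difference then removes that constant, so the differenced series contains only zeros. Each difference of lag k shortens the series by k observations.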

2.2. NAR Dynamic Neural Network

The NAR model is a kind of dynamic neural network, which specializes in analyzing and processing time series. Compared with the static neural networks, such as BP and RBF neural networks, the current output content in the dynamic neural network can be assigned as the input of the neural network layer during the next period, thus providing a basis and reference for the next period [38], realizing a constant update of parameters and more reliable prediction of the time series.

At the same time, unlike the NARX model, the NAR model has no exogenous input and uses only its own past outputs, so it only needs to analyze the time series of a single indicator [39]. The NAR model places fewer requirements on data information and has fewer restriction factors, so the prediction can be completed with only one set of data; this makes the prediction process simpler and more convenient, even when the reference data are insufficient [40].

The mathematical equation of NAR dynamic neural network can be constructed as follows:

y (t) = f [y (t - 1), y (t - 2), \dots, y (t - d)]

(3)

where d is the delay order, and y(t) is the current predicted value.

The specific structure of the model is shown in Figure 1, where “4” is the delay order, “15” is the number of hidden layer neurons, W is the connection weight, and b is the threshold.

The output H_{j} of each neuron in the NAR neural network is calculated according to Equation (4).

H_{j} = f (\sum_{i = 1}^{n} w_{i j} x_{i} + a_{j}), j = 1, 2, \dots, l

(4)

where i is the index of the input data, n is the number of inputs, l is the number of neurons in the hidden layer, f is the activation function of the hidden layer, x_{i} is the i-th input data, w_{i j} is the connection weight between the i-th output delay signal and the j-th neuron in the hidden layer, and a_{j} is the threshold of the j-th hidden layer neuron.

The output layer performs linear calculation according to the output H_{j} of the hidden layer and calculates the output O of the neural network.

O = f (\sum_{j = 1}^{l} H_{j} w_{j} + b)

(5)

where w_{j} is the connection weight between the j-th neuron in the hidden layer and the output layer neuron, and b is the output layer neuron threshold.
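Equations (3) to (5) can be sketched in Python as a single forward pass. This is a minimal sketch, not the implementation used in the study: the tanh hidden activation, the identity output function, the random untrained weights, and all function and variable names are assumptions introduced for illustration.

```python
import math
import random


def make_lagged(series, d):
    """Return (inputs, target) pairs with inputs [y(t-1), ..., y(t-d)] and target y(t)."""
    return [([series[t - k] for k in range(1, d + 1)], series[t])
            for t in range(d, len(series))]


def nar_forward(x, w_hidden, a, w_out, b):
    """Hidden layer as in Equation (4) with tanh; output layer as a linear sum."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + a_j)
              for w_row, a_j in zip(w_hidden, a)]
    return sum(h * w for h, w in zip(hidden, w_out)) + b


d, l = 4, 15  # delay order and number of hidden neurons, as in Figure 1
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(l)]  # w_hidden[j][i] = w_ij
a = [rng.uniform(-1, 1) for _ in range(l)]      # hidden thresholds a_j
w_out = [rng.uniform(-1, 1) for _ in range(l)]  # output weights w_j
b = 0.0                                         # output threshold

series = [0.2, 0.4, 0.1, 0.5, 0.3, 0.6]  # hypothetical normalized values
pairs = make_lagged(series, d)
prediction = nar_forward(pairs[0][0], w_hidden, a, w_out, b)  # untrained, so not meaningful
print(pairs[0], prediction)
```

Training (for example with the Levenberg–Marquardt algorithm mentioned later in the paper) would adjust `w_hidden`, `a`, `w_out`, and `b` so that the output matches each target in `pairs`; that step is omitted here.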

2.3. SARIMA-NAR Model

Considering the advantages of the SARIMA model and NAR dynamic neural network, two combination models were established for traffic flow prediction. Their prediction process is shown in Figure 2.

Method of combining linear and nonlinear components

In general, the time series of traffic flow can be simplified to have both linear and nonlinear components simultaneously. The ARIMA model is widely used to solve linear problems; on this basis, the SARIMA model can also extract the seasonal fluctuation part of the time series, and the NAR dynamic neural network has an excellent ability to predict the nonlinear part [41,42,43]. Thus, a method of combining linear and nonlinear components is constructed, that is, the combination of SARIMA model and NAR dynamic neural network; its mathematical expression is as follows:

{\hat{y}}_{t} = {\hat{L}}_{t} + {\hat{N}}_{t}

(6)

where {\hat{y}}_{t}, {\hat{L}}_{t}, and {\hat{N}}_{t} are the prediction results of the SARIMA–NAR combination model, SARIMA model, and NAR model, respectively. {\hat{L}}_{t} represents the linear component and {\hat{N}}_{t} represents the nonlinear component.

As shown in Figure 2, the first step is to use the SARIMA model to perform linear prediction on the traffic flow time series, which gives {\hat{L}}_{t}; the second step is to calculate the residual N_{t} according to Formula (7) and extract the residual series; the third step is to use the extracted residual series to construct the NAR model and perform nonlinear prediction to obtain the nonlinear result {\hat{N}}_{t}; the fourth step is to superimpose these two results to obtain the final traffic flow prediction result {\hat{y}}_{t} [21,44].

N_{t} = y_{t} - {\hat{L}}_{t}

(7)
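The four steps of the linear and nonlinear component combination (Equations (6) and (7)) can be sketched as follows. To keep the example self-contained, a least-squares straight line stands in for the SARIMA model and a lag-1 persistence rule stands in for the NAR network; both stand-ins, the function names, and the data are assumptions for illustration and are not the models used in the study.

```python
def linear_trend_fit(y):
    """Stand-in for SARIMA: in-sample least-squares straight line."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return [y_mean + sxy / sxx * (t - t_mean) for t in range(n)]


def persistence(residuals):
    """Stand-in for NAR: predict each residual by the previous one (0 for the first)."""
    return [0.0] + list(residuals[:-1])


def hybrid_fit(y, linear_model, residual_model):
    """Combine a linear fit with a prediction of its residuals."""
    l_hat = linear_model(y)                      # step 1: linear component L_hat_t
    n = [v - l for v, l in zip(y, l_hat)]        # step 2: residuals N_t = y_t - L_hat_t
    n_hat = residual_model(n)                    # step 3: predicted residuals N_hat_t
    return [l + r for l, r in zip(l_hat, n_hat)] # step 4: y_hat_t = L_hat_t + N_hat_t


y = [10.0, 12.0, 15.0, 13.0, 16.0, 18.0, 21.0, 19.0]  # hypothetical data
y_hat = hybrid_fit(y, linear_trend_fit, persistence)
print([round(v, 2) for v in y_hat])
```

Because `hybrid_fit` only needs two callables, any linear model and any residual model with the same interface could be substituted for the stand-ins.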

The method of MSE weight combination

As shown in Figure 2, while constructing this combination method, a higher weight is assigned to the prediction model with the better fitting performance, and a smaller weight is assigned to the prediction model with poor fitting performance [45]. The mathematical expression of the combination model is as follows:

\hat{y} (t) = ω_{1} {\hat{y}}_{1} (t) + ω_{2} {\hat{y}}_{2} (t) + \dots + ω_{k} {\hat{y}}_{k} (t) = \sum_{i = 1}^{k} ω_{i} {\hat{y}}_{i} (t)

(8)

where ω_{i} is the weight coefficient of model i, and {\hat{y}}_{i} (t) is the predicted value of model i at time t.

In this paper, k = 2, and the combination model \hat{y} (t) is shown in Formula (9):

\hat{y} (t) = ω_{1} {\hat{y}}_{1} (t) + ω_{2} {\hat{y}}_{2} (t)

(9)

In the combination model, the choice of weight is very important. In this paper, the fitting mean square error (MSE) of the historical traffic flow series was selected to evaluate the fitting result, to determine the weight. The calculation formula of MSE is Formula (10):

e_{M S E} = \frac{1}{n} \sum_{t = 1}^{n} {(y_{t} - {\hat{y}}_{t})}^{2}

(10)

Denote the MSE of the historical traffic flow series of model i as s_{i}; then the weight coefficient of model i is as follows:

ω_{i} = \frac{1 / s_{i}}{\sum_{j = 1}^{k} 1 / s_{j}}

(11)
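The MSE weight combination of Equations (8) to (11) reduces to a few lines of Python. The sketch below is illustrative only; the function names and the numerical values are hypothetical and do not come from the study.

```python
def mse(actual, predicted):
    """Mean square error, Equation (10)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def inverse_mse_weights(mses):
    """Weights proportional to 1 / s_i and normalized to sum to 1, Equation (11)."""
    inverse = [1.0 / s for s in mses]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_combination(predictions, weights):
    """Weighted sum of the model predictions at each time step, Equation (8)."""
    return [sum(w * p for w, p in zip(weights, step)) for step in zip(*predictions)]


# Hypothetical fitting errors of two models: the second model fits better.
weights = inverse_mse_weights([4.0, 1.0])
combined = weighted_combination([[10.0, 20.0], [20.0, 30.0]], weights)
print(weights, combined)
```

With fitting MSEs of 4.0 and 1.0, the inverse values are 0.25 and 1.0, so the better-fitting model receives the weight 1.0 / 1.25 = 0.8 and the other model receives 0.2.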

3. Case Analysis

In order to verify the validity and reliability of two established combination models, the traffic flow time series of a highway in southwest China was taken as an example. The traffic flow data from January 2014 to December 2018 were chosen as the historical fitting data; the data from January 2019 to December 2019 were used as the forecast comparison sample.

3.1. The Prediction of the SARIMA Model

3.1.1. The Stationarization of Series

The time series of traffic flow from January 2014 to December 2018 and its decomposition are shown in Figure 3, and a noticeable trend and seasonality can be seen. Since the traffic flow is collected monthly, the number of periods is m = 12. Differencing is needed to obtain a stationary time series for model construction, so the augmented Dickey–Fuller (ADF) unit root test was applied to the original series, the first-order non-seasonal difference series, the first-order seasonal difference series, and the series with both a first-order non-seasonal and a first-order seasonal difference. The test results are shown in Table 2.

It can be seen that the ADF test results are relatively similar for the series with first-order non-seasonal and first-order seasonal differences (d = 1, D = 1) and the series with only a first-order seasonal difference (d = 0, D = 1); thus, both the SARIMA (p, 1, q) (P, 1, Q)₁₂ and SARIMA (p, 0, q) (P, 1, Q)₁₂ models can be constructed.

3.1.2. The Determination of Model Parameters

The autocorrelation function (ACF) and partial autocorrelation function (PACF) plots are shown in Figure 4. It can be seen from Figure 4b that, when d = 0 and D = 1, the autocorrelation coefficient exceeds the confidence range for four periods, and the coefficient of the sixth period is still larger than the upper confidence limit, showing a tailing-off pattern; the sequence is therefore still non-stationary. When d = 1 and D = 1, only one coefficient in the autocorrelation and partial autocorrelation plots exceeds the confidence limit, showing a cut-off pattern, which suggests that the sequence is stationary. Following the empirical rule that the values of P and Q are generally less than or equal to 2, the SARIMA models shown in Table 3 were selected and their Bayesian information criterion (BIC) values were calculated; the smaller the BIC value, the better the fitting effect of the model. The SARIMA (0,1,2) (0,1,2)₁₂ model had a BIC value of 22.119, the smallest among the candidates, so it was selected as the optimal model.
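The check of autocorrelation coefficients against a confidence limit can be sketched in Python as follows. The bound ±1.96/√n is the common approximate 95% limit for white noise and is an assumption here, since the paper does not state how its confidence limits were computed; the function names and the data are likewise illustrative.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    return [sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag)) if k > 0 and abs(r) > bound]


# Hypothetical differenced series of 60 monthly values (illustrative only).
pattern = [5, 3, 0, -2, -4, -1, 2, 6, 4, 1, -3, -5]
series = [float(pattern[t % 12]) for t in range(60)]
print(significant_lags(series, max_lag=24))
```

A series for which many lags remain outside the bound would correspond to the tailing-off case described above, whereas a series with only one or two such lags would correspond to the cut-off case.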

3.2. The Prediction of the NAR Model

3.2.1. Series Normalization

To eliminate differences in the order of magnitude of the data, the data need to be normalized in advance so that the traffic flow prediction is more reliable [46]. In this paper, the min-max normalization method, which is suitable for neural networks and is also called dispersion standardization, was chosen. It applies a linear transformation to the original series and maps the series data to the range [0, 1]. The normalization formula is as follows:

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(12)

where x^{*} is the normalized series value, x is the original series, and x_{\max} and x_{\min} are the maximum and minimum values of the original series, respectively.
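Equation (12) and its inverse, which is needed to map network outputs back to traffic flow units, can be sketched in Python as follows. The function names and sample values are hypothetical.

```python
def min_max_normalize(x):
    """Map a series to [0, 1] with Equation (12); also return x_min and x_max."""
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - x_min) / (x_max - x_min) for v in x], x_min, x_max


def denormalize(x_star, x_min, x_max):
    """Invert Equation (12): x = x* (x_max - x_min) + x_min."""
    return [v * (x_max - x_min) + x_min for v in x_star]


values = [2.0, 4.0, 6.0, 10.0]  # hypothetical series
normalized, lo, hi = min_max_normalize(values)
restored = denormalize(normalized, lo, hi)
print(normalized, restored)
```

Keeping `x_min` and `x_max` from the fitting data matters because the same two constants must be reused when converting predictions back to the original scale.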

We normalized the original traffic flow time series and the residual sequence of the SARIMA model. Taking the residual sequence of the SARIMA model as an example, the normalized result is shown in Figure 5.

3.2.2. The Establishment of the NAR Model

Because the traffic flow time series is a small data sample with only 60 observations, the proportions of the training set, validation set, and testing set were set to 70%, 15%, and 15%, respectively. In addition, the Levenberg–Marquardt training algorithm was used to train the NAR dynamic neural network. By adjusting the number of neurons in the hidden layer and the delay order, the model’s mean square error (MSE) and stability were compared to determine the optimal model parameters; the number of hidden layer neurons was determined using the empirical Formula (13).

n_{1} = \sqrt{n + m} + a

(13)

where n_{1} is the number of neurons in the hidden layer, n is the number of input samples, m is the number of output samples, and a is a constant (usually a = 1~10).
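As a rough illustration of how Formula (13) yields a range of candidate sizes, the sketch below evaluates it for a = 1 to 10. Rounding the square root to the nearest integer is an assumption, and the values n = 80 and m = 1 are hypothetical, chosen only so that the square root is an integer; they are not the values used in the study.

```python
import math


def hidden_neuron_candidates(n, m, a_values=range(1, 11)):
    """Candidate hidden layer sizes n1 = sqrt(n + m) + a for each constant a."""
    base = round(math.sqrt(n + m))  # rounding is an assumption of this sketch
    return [base + a for a in a_values]


candidates = hidden_neuron_candidates(n=80, m=1)  # hypothetical n and m
print(candidates)
```

Each candidate size would then be trained and compared by MSE and stability, as described above, to select the final number of hidden layer neurons.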

According to the empirical formula, the number of hidden layer neurons for the original traffic flow time series and the residual series of the SARIMA model is 10~19 and 9~18, respectively. After multiple iterations, the optimal number of hidden layer neurons for the original traffic flow time series was found to be 15, with a delay order of 3; for the residual series of the SARIMA model, the optimal number of hidden layer neurons was 15, with a delay order of 4. The autocorrelation coefficient graphs and residual graphs of these two series are shown in Figure 6 and Figure 7.

It can be clearly seen that the autocorrelation coefficients of these two series both fall within the confidence interval. In addition, the residuals obtained for the residual series of the SARIMA model were much smaller than those obtained for the original traffic flow time series.

3.3. Analysis of Forecast Results

The prediction curves of the SARIMA–NAR combination model 1 constructed according to the linear and nonlinear component combination method and the SARIMA–NAR combination model 2 constructed according to the MSE weight combination method are shown in Figure 8. It can be seen that, compared to the SARIMA–NAR combined model 2, the resulting curve of SARIMA–NAR combined model 1 is closer to the actual value curve, and the effect of fitting and prediction is better.

In addition, Figure 9 uses the monthly absolute error (AE) to evaluate the basic errors of the SARIMA model, the NAR model, and the two combined models. The absolute error of the NAR model was the largest and that of the SARIMA–NAR combined model 1 was the smallest; the absolute errors of the two combined models were much smaller than those of the single models. The absolute error is calculated with Formula (14).

e_{A E} = | y_{t} - {\hat{y}}_{t} |

(14)

4. Discussion

4.1. Comparison of Multiple Models

To make the comparison among the constructed models more convenient, the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean square error (RMSE) were introduced to evaluate the prediction error [47,48]. In addition, the accuracy ratio (A) and correlation coefficient (R) were introduced to measure the correctness and reliability of the model. The indicators are calculated as follows:

e_{M A P E} = \frac{1}{n} \sum_{t = 1}^{n} | \frac{y_{t} - {\hat{y}}_{t}}{y_{t}} | \times 100 %

(15)

e_{M A E} = \frac{1}{n} \sum_{t = 1}^{n} | y_{t} - {\hat{y}}_{t} |

(16)

e_{R M S E} = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(y_{t} - {\hat{y}}_{t})}^{2}}

(17)

A = 1 - \frac{\sqrt{\sum_{t = 1}^{n} {(y_{t} - {\hat{y}}_{t})}^{2}}}{\sqrt{\sum_{t = 1}^{n} {(y_{t})}^{2}}}

(18)

R = \frac{C o v (y_{t}, {\hat{y}}_{t})}{\sqrt{V a r [y_{t}] \cdot V a r [{\hat{y}}_{t}]}}

(19)
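The five indicators in Equations (15) to (19) can be computed with the following Python sketch. The function names and the sample values are hypothetical and serve only to show how the formulas are evaluated.

```python
import math


def mape(y, y_hat):
    """Mean absolute percentage error in percent, Equation (15)."""
    return 100.0 / len(y) * sum(abs((a - p) / a) for a, p in zip(y, y_hat))


def mae(y, y_hat):
    """Mean absolute error, Equation (16)."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error, Equation (17)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def accuracy(y, y_hat):
    """Accuracy ratio A, Equation (18)."""
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)))
    return 1.0 - num / math.sqrt(sum(a ** 2 for a in y))


def correlation(y, y_hat):
    """Correlation coefficient R, Equation (19)."""
    my, mp = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - my) * (p - mp) for a, p in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((p - mp) ** 2 for p in y_hat)
    return cov / math.sqrt(var_y * var_p)


y = [100.0, 200.0, 300.0, 400.0]       # hypothetical actual values
y_hat = [110.0, 190.0, 310.0, 390.0]   # hypothetical predictions
print(mape(y, y_hat), mae(y, y_hat), rmse(y, y_hat), accuracy(y, y_hat), correlation(y, y_hat))
```

The common 1/n factors of the covariance and variances cancel in R, so `correlation` works with plain sums. Note that MAPE is undefined when any actual value is zero.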

The calculated error indicators are shown in Table 4. Figure 10 and Figure 11 show that the SARIMA–NAR combination model 1 had the highest accuracy. The MAPE of the fitted and predicted series were as low as 5.77% and 5.89%, respectively, and the accuracy values were as high as 0.93 and 0.92, respectively. In addition, most of the indicators showed that the fitting effect was better than the prediction effect.

4.2. Performance of Prediction Model under the Influence of the Epidemic

To further examine the adaptability of the SARIMA–NAR combination model to large fluctuations of traffic flow during special periods and to verify its performance, the traffic flow in 2020 under the influence of the epidemic was predicted. The prediction process is the same as above; the prediction result is shown in Figure 12, and the error comparison of the models is shown in Table 5.

The average absolute error of the SARIMA–NAR combination model 1 prediction was 8.07%, and its fitted MAE was 8.28%, indicating a good prediction effect. For the sudden decrease in traffic flow during the epidemic, the SARIMA–NAR combination model yields a relatively accurate prediction result. In particular, SARIMA–NAR combination model 1 performed better and could quickly adapt to intense changes in traffic flow. The fitting and prediction accuracy of the two combined models were both greater than 0.85, indicating that the combined model was reliable and effective.

In addition, when constructing the SARIMA model, attention should be paid to the selection of the non-seasonal difference order d and the seasonal difference order D. When the ADF test results of the sequences after different differencing schemes are similar, the autocorrelation plot, the sequence plot, and the BIC value can be used to verify whether the differencing effectively makes the traffic flow sequence stationary. Subsequent calculations cannot be performed if the series is non-stationary after differencing.

In future research, automated methods for image recognition and data integration decision making for SARIMA parameter selection can be explored. Further, the spatiotemporal correlation of traffic flow can also be considered in the future to realize the fusion prediction of spatial mode and time series. Coupling applications between different forecasting dimensions is a general trend towards more in-depth research on complex problems.

5. Conclusions

In this paper, the time series of traffic flow of an expressway in southwest China was taken as the research object to establish two combination models using the linear and nonlinear component combination method and the MSE weight combination method. Through comparative analysis, the results show the following:

The SARIMA–NAR combination model 1 is superior to the SARIMA model, the NAR model, and commonly used non-combination models, such as grey prediction and exponential smoothing models, and has smaller errors than the SARIMA–NAR combination model 2. For the SARIMA–NAR combination model 2 using the MSE weight combination method, the predicted MAPE is smaller than that of other common non-combination models; however, its performance on the other indicators is mediocre.
The time series of traffic flow has obvious seasonal characteristics. The SARIMA–NAR combination model has low requirements for data information and fewer restrictions. In addition, the prediction results are highly effective and accurate, and the prediction process is more convenient and efficient.
The SARIMA–NAR combination model can adapt to the fluctuations in traffic flow under the influence of the epidemic, and the model prediction has a high accuracy rate, especially the SARIMA–NAR combination model 1.
The SARIMA–NAR combination model 1 and combination model 2 have different prediction characteristics. Further, based on the improved weight combination method, the advantages of these two combination models can be combined to realize the fusion prediction of the linear and nonlinear component combination method and the MSE weight combination method.

Author Contributions

Conceptualization, Y.W.; data curation, R.J.; formal analysis, R.J. and F.D.; Funding acquisition, Y.W.; Investigation, F.D. and Y.Y.; Methodology, R.J.; Writing—original draft, R.J.; Writing—review & editing, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the National Natural Science Foundation of China (Grant No. 51778633, 51308552).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

1. Macioszek, E.; Kurek, A. Extracting Road Traffic Volume in the City before and during covid-19 through Video Remote Sensing. Remote Sens. 2021, 13, 2329.
2. de Souza, D.G.B.; dos Santos, E.A.; Alves Junior, F.T.; Nascimento, M.C.V. On Comparing Cross-Validated Forecasting Models with a Novel Fuzzy-TOPSIS Metric: A COVID-19 Case Study. Sustainability 2021, 13, 13599.
3. Ko, E.; Ahn, J.; Kim, E.Y. 3D Markov Process for Traffic Flow Prediction in Real-Time. Sensors 2016, 16, 147.
4. Yao, R.; Zhang, W.; Zhang, D. Period Division-Based Markov Models for Short-Term Traffic Flow Prediction. IEEE Access 2020, 8, 178345–178359.
5. Ma, C.; Tan, L.; Xu, X. Short-term traffic flow prediction based on genetic artificial neural network and exponential smoothing. Promet-Traffic Transp. 2020, 32, 747–760.
6. Tang, J.; Zeng, J.; Wang, Y.; Yuan, H.; Liu, F.; Huang, H. Traffic flow prediction on urban road network based on License Plate Recognition data: Combining attention-LSTM with Genetic Algorithm. Transp. A Transp. Sci. 2021, 17, 1217–1243.
7. Hou, Q.; Leng, J.; Ma, G.; Liu, W.; Cheng, Y. An adaptive hybrid model for short-term urban traffic flow prediction. Phys. A-Stat. Mech. Its Appl. 2019, 527.
8. Zhang, W.; Yu, Y.; Qi, Y.; Shu, F.; Wang, Y. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning. Transp. A-Transp. Sci. 2019, 15, 1688–1711.
9. Peng, H.; Wang, H.; Du, B.; Bhuiyan, M.Z.A.; Ma, H.; Liu, J.; Wang, L.; Yang, Z.; Du, L.; Wang, S.; et al. Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting. Inf. Sci. 2020, 521, 277–290.
10. Wang, X.; Xu, L. Wavelet-based short-term forecasting with improved threshold recognition for urban expressway traffic conditions. IET Intell. Transp. Syst. 2018, 12, 463–473.
11. Mao, S.; Xiao, X. Short-Term Traffic Flow Grey Forecasting Model GM(1,1|tan(k-τ)p, sin(k-τ)p) of Single-Cross-Section and Its Particle Swarm Optimization. J. Grey Syst. 2010, 22, 383–394.
12. Zhan, W.; Lu, Q.; Shang, Y. Prediction of Traffic Volume in Highway Tunnel Group Region Based on Grey Markov Model. In Proceedings of the 4th International Conference on Manufacturing Science and Engineering (ICMSE 2013), Dalian, China, 30–31 March 2013.
13. Wang, C.; Ye, Z. Traffic flow forecasting based on a hybrid model. J. Intell. Transp. Syst. 2016, 20, 428–437.
14. Shahriari, S.; Ghasri, M.; Sisson, S.A.; Rashidi, T. Ensemble of ARIMA: Combining parametric and bootstrapping technique for traffic flow prediction. Transp. A-Transp. Sci. 2020, 16, 1552–1573.
15. Lippi, M.; Bertini, M.; Frasconi, P. Short-Term Traffic Flow Forecasting: An Experimental Comparison of Time-Series Analysis and Supervised Learning. IEEE Trans. Intell. Transp. Syst. 2013, 14, 871–882.
16. Zhang, L.; Alharbe, N.R.; Luo, G.; Yao, Z.; Li, Y. A Hybrid Forecasting Framework Based on Support Vector Regression with a Modified Genetic Algorithm and a Random Forest for Traffic Flow Prediction. Tsinghua Sci. Technol. 2018, 23, 479–492.
17. Zhang, Y.-G.; Tang, J.; He, Z.-Y.; Tan, J.; Li, C. A novel displacement prediction method using gated recurrent unit model with time series analysis in the Erdaohe landslide. Nat. Hazards 2021, 105, 783–813.
18. Yao, R.; Zhang, W.; Zha, L. Hybrid Methods for Short-Term Traffic Flow Prediction Based on ARIMA-GARCH Model and Wavelet Neural Network. J. Transp. Eng. Part A-Syst. 2020, 146, 04020086.
19. Yasdi, R. Prediction of road traffic using a neural network approach. Neural Comput. Appl. 1999, 8, 135–142.
20. Yang, Y.; Zheng, H.; Zhang, R. Prediction and Analysis of Aircraft Failure Rate Based on SARIMA Model. In Proceedings of the 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA), Beijing, China, 8–11 September 2017; pp. 567–571.
21. Song, Z.; Guo, Y.; Wu, Y.; Ma, J. Short-term traffic speed prediction under different data collection time intervals using a SARIMA-SDGM hybrid prediction model. PLoS ONE 2019, 14, e0218626.
22. Zhang, Y.; Xie, Y.; Zhang, Y.; Qiu, J.; Wu, S. The adoption of deep neural network (DNN) to the prediction of soil liquefaction based on shear wave velocity. Bull. Eng. Geol. Environ. 2021, 80, 5053–5060.
23. Chen, X.; Cai, X.; Liang, J.; Liu, Q. Ensemble Learning Multiple LSSVR With Improved Harmony Search Algorithm for Short-Term Traffic Flow Forecasting. IEEE Access 2018, 6, 9347–9357.
24. Xu, X.; Jin, X.; Xiao, D.; Ma, C.; Wong, S.C. A hybrid autoregressive fractionally integrated moving average and nonlinear autoregressive neural network model for short-term traffic flow prediction. J. Intell. Transp. Syst. 2021, 1–18.
25. Sharifi, J.; Saeednia, N. Neuro-Fuzzy Modeling of Data Singular Spectrum Decomposition and Traffic Flow Prediction. Iran. J. Sci. Technol. -Trans. Electr. Eng. 2020, 44, 519–535.
26. Duan, H.; Liu, Y.; Wang, D.; He, L.; Xiao, X. Prediction of a multi-mode coupling model based on traffic flow tensor data. J. Intell. Fuzzy Syst. 2019, 36, 1691–1703.
27. Xia, D.; Zhang, M.; Yan, X.; Bai, Y.; Zheng, Y.; Li, Y.; Li, H. A distributed WND-LSTM model on MapReduce for short-term traffic flow prediction. Neural Comput. Appl. 2021, 33, 2393–2410.
28. Zhao, F.; Zeng, G.-Q.; Lu, K.-D. EnLSTM-WPEO: Short-Term Traffic Flow Prediction by Ensemble LSTM, NNCT Weight Integration, and Population Extremal Optimization. IEEE Trans. Veh. Technol. 2020, 69, 101–113.
29. Wang, Y.; Zhao, L.; Li, S.; Wen, X.; Xiong, Y. Short Term Traffic Flow Prediction of Urban Road Using Time Varying Filtering Based Empirical Mode Decomposition. Appl. Sci. 2020, 10, 2038.
30. Lu, W.; Rui, Y.; Yi, Z.; Ran, B.; Gu, Y. A Hybrid Model for Lane-Level Traffic Flow Forecasting Based on Complete Ensemble Empirical Mode Decomposition and Extreme Gradient Boosting. IEEE Access 2020, 8, 42042–42054.
31. Li, Y.; Chai, S.; Ma, Z.; Wang, G. A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction. IEEE Access 2021, 9, 11264–11271.
32. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
33. Vijayalakshmi, B.; Ramar, K.; Jhanjhi, N.Z.; Verma, S.; Kaliappan, M.; Vijayalakshmi, K.; Vimal, S.; Kavita; Ghosh, U. An attention-based deep learning model for traffic flow prediction using spatiotemporal features towards sustainable smart city. Int. J. Commun. Syst. 2021, 34, e4609.
Li, R.; Hu, Y.; Liang, Q. T2F-LSTM Method for Long-Term Traffic Volume Prediction. IEEE Trans. Fuzzy Syst. 2020, 28, 3256–3264. [Google Scholar] [CrossRef]
Li, Z.; Zheng, Z.; Washington, S. Short-Term Traffic Flow Forecasting: A Component-Wise Gradient Boosting Approach With Hierarchical Reconciliation. IEEE Trans. Intell. Transp. Syst. 2020, 21, 5060–5072. [Google Scholar] [CrossRef]
Mohammad, K.F.; Rajiv, G. ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India. J. Saf. Sci. Resil. 2020, 1, 12–18. [Google Scholar]
Wu, W.; An, S.; Guo, J.; Guan, P.; Ren, Y.; Xia, L.; Zhou, B. Application of nonlinear autoregressive neural network in predicting incidence tendency of hemorrhagic fever with renal syndrome. Zhonghua Liu Xing Bing Xue Za Zhi = Zhonghua Liuxingbingxue Zazhi 2015, 36, 1394–1396. [Google Scholar]
Rege, A.; Obradovic, Z.; Asadi, N.; Parker, E.; Pandit, R.; Masceri, N.; Singer, B. Predicting Adversarial Cyber-Intrusion Stages Using Autoregressive Neural Networks. IEEE Intell. Syst. 2018, 33, 29–39. [Google Scholar] [CrossRef]
Li, Z.; Hayashibe, M.; Zhang, Q.; Guiraud, D.; Japan, I.R.S.o. FES-Induced Muscular Torque Prediction with Evoked EMG Synthesized by NARX-Type Recurrent Neural Network. In Proceedings of the 2012 IEEE/Rsj International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 2198–2203. [Google Scholar]
Xie, H.; Tang, H.; Liao, Y.-H. Time series prediction based on NARX neural networks: An advanced approach. In Proceedings of the International Conference on Machine Learning and Cybernetics 2009, Dalian, China, 13–16 August 2006; pp. 1275–1279. [Google Scholar]
Chenxia, X.; Zilong, W.; Ng, J.C.Y. Hybrid Model Based on Wavelet Decomposition for Electricity Consumption Prediction. J. Donghua Univ. 2019, 36, 77–87. [Google Scholar] [CrossRef]
Ordonez, C.; Sanchez Lasheras, F.; Roca-Pardinas, J.; de Cos Juez, F.J. A hybrid ARIMA-SVM model for the study of the remaining useful life of aircraft engines. J. Comput. Appl. Math. 2019, 346, 184–191. [Google Scholar] [CrossRef]
Wang, X.; Xu, L.; Chen, K. Data-Driven Short-Term Forecasting for Urban Road Network Traffic Based on Data Processing and LSTM-RNN. Arab. J. Sci. Eng. 2019, 44, 3043–3060. [Google Scholar] [CrossRef]
Wang, K.W.; Deng, C.; Li, J.P.; Zhang, Y.Y.; Li, X.Y.; Wu, M.C. Hybrid methodology for tuberculosis incidence time-series forecasting based on ARIMA and a NAR neural network. Epidemiol. Infect. 2017, 145, 1118–1129. [Google Scholar] [CrossRef] [Green Version]
Valle dos Santos, R.d.O.; Vellasco, M.M.B.R. Neural Expert Weighting: A NEW framework for dynamic forecast combination. Expert Syst. Appl. 2015, 42, 8625–8636. [Google Scholar] [CrossRef]
Luor, D.-C. A comparative assessment of data standardization on support vector machine for classification problems. Intell. Data Anal. 2015, 19, 529–546. [Google Scholar] [CrossRef]
Zhao, L.; Wang, Q.; Jin, B.; Ye, C. Short-Term Traffic Flow Intensity Prediction Based on CHS-LSTM. Arab. J. Sci. Eng. 2020, 45, 10845–10857. [Google Scholar] [CrossRef]
Zhang, Y.G.; Tang, J.; Cheng, Y.; Huang, L.; Guo, F.; Yin, X.; Li, N. Prediction of landslide displacement with dynamic features using intelligent approaches. Int. J. Min. Sci. Technol. 2022, 12, 368–382. [Google Scholar] [CrossRef]

Figure 1. Structure of NAR model.

Figure 2. SARIMA–NAR combined model forecast flow chart.

Figure 3. The time series of traffic flow and its breakdown from January 2014 to December 2018. (a) Original, (b) Trend, (c) Seasonal, (d) Residual.

Figure 4. Autocorrelation function and partial autocorrelation function. (a) ACF (d = 1, D = 1), (b) ACF (d = 0, D = 1), (c) PACF (d = 1, D = 1), (d) PACF (d = 0, D = 1).

Figure 5. The normalized results of the residual sequence of the SARIMA model.

Figure 6. Autocorrelation coefficient graph after NAR modelling. (a) Original traffic volume time series. (b) SARIMA model residual series.

Figure 7. Residuals after NAR modelling. (a) Original traffic volume time series. (b) SARIMA model residual series.

Figure 8. Prediction results of the combined model.

Figure 9. Absolute error area chart.

Figure 10. Error statistics of fitting and prediction. (a) MAPE, (b) MAE, (c) RMSE.

Figure 11. Accuracy index statistics of fitting and prediction. (a) Accuracy (A), (b) Correlation coefficient (R).

Figure 12. Forecast results of traffic volume under the influence of the epidemic.

Table 1. Previous literature on combining models.

Combination Method Category	Features	Methodology	Literature Examples
Method of combining linear and nonlinear components	Build predictive models considering linear and nonlinear characteristics of traffic flow	LSSVR with the Gaussian kernel function and the linear kernel function	[23]
		ARFIMA-NAR	[24]
		SSA-LLNF	[25]
		ARIMA-GARCH	[18]
Method of assigning weight combinations	Optimize the model by assigning the best weights	ARIMA-WNN	[7]
		GM(1,1), SGM with wavelet analysis	[26]
		WND-LSTM	[27]
		EnLSTM-WPEO	[28]
Method of stacking combination after decomposition	Capture different characteristics of traffic flow data through decomposition	TVF-EMD-LSSVM	[29]
		CEEMDAN-XGBoost	[30]
		W-CNN-LSTM	[31]
Space–time combination method	Predict traffic flow based on temporal and spatial features	T-GCN	[32]
		CNN-LSTM	[33]
		T2F-LSTM	[34]

Table 2. ADF test results.

Differencing Order	t	p	Critical Value (1%)	Critical Value (5%)	Critical Value (10%)
d = 0, D = 0	0.862	0.993	−3.616	−2.941	−2.609
d = 1, D = 0	−3.826	0.003	−3.621	−2.944	−2.610
d = 0, D = 1	−4.802	≤0.001	−3.629	−2.945	−2.611
d = 1, D = 1	−4.874	≤0.001	−3.633	−2.949	−2.613
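
The decision rule behind Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it takes the tabulated t statistics and critical values as given and applies the standard left-tailed ADF rule (reject the unit-root null when the statistic is below the critical value). The names `ADF_ROWS`, `rejected_levels`, and `is_stationary` are hypothetical helpers introduced here, and the 5% default level is an assumption.

```python
# Rows of Table 2: (differencing, t statistic, critical values at 1%, 5%, 10%).
ADF_ROWS = [
    ("d = 0, D = 0", 0.862, (-3.616, -2.941, -2.609)),
    ("d = 1, D = 0", -3.826, (-3.621, -2.944, -2.610)),
    ("d = 0, D = 1", -4.802, (-3.629, -2.945, -2.611)),
    ("d = 1, D = 1", -4.874, (-3.633, -2.949, -2.613)),
]
LEVELS = ("1%", "5%", "10%")


def rejected_levels(t_stat, critical_values):
    """Return the significance levels at which the unit-root null is rejected.

    The ADF test is left-tailed: the null hypothesis of a unit root is
    rejected when the statistic is more negative than the critical value.
    """
    return [lvl for lvl, cv in zip(LEVELS, critical_values) if t_stat < cv]


def is_stationary(t_stat, critical_values, level="5%"):
    """Treat the series as stationary if the null is rejected at `level`."""
    return level in rejected_levels(t_stat, critical_values)


# Map each differencing scheme to its stationarity decision at the 5% level.
decisions = {name: is_stationary(t, cvs) for name, t, cvs in ADF_ROWS}

for name, t, cvs in ADF_ROWS:
    print(f"{name}: t = {t}, rejected at {rejected_levels(t, cvs)}")
```

Under this rule the undifferenced series (d = 0, D = 0) fails the test at every level, while each of the three differenced series passes at all three levels, which matches the p values reported in the table.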

Table 3. Statistics of BIC values.

SARIMA (p, 1, q) (P, 1, Q)₁₂		SARIMA (p, 0, q) (P, 1, Q)₁₂
Model	BIC	Model	BIC
SARIMA (0,1,2) (0,1,2)₁₂	22.119	SARIMA (1,0,0) (0,1,0)₁₂	22.132
SARIMA (2,1,2) (2,1,1)₁₂	22.122	SARIMA (1,0,0) (1,1,0)₁₂	22.141
SARIMA (1,1,1) (0,1,2)₁₂	22.124	SARIMA (1,0,1) (2,1,0)₁₂	22.143
SARIMA (1,1,1) (2,1,0)₁₂	22.136	SARIMA (2,0,0) (2,1,0)₁₂	22.143
SARIMA (0,1,0) (2,1,1)₁₂	22.140	SARIMA (0,0,2) (2,1,0)₁₂	22.162
…	…	…	…
SARIMA (1,1,1) (1,1,2)₁₂	22.401	SARIMA (2,0,0) (1,1,2)₁₂	22.542
SARIMA (1,1,2) (1,1,2)₁₂	22.408	SARIMA (0,0,2) (1,1,2)₁₂	22.568
SARIMA (0,1,0) (1,1,2)₁₂	22.413	SARIMA (2,0,2) (1,1,1)₁₂	22.581
SARIMA (2,1,2) (1,1,2)₁₂	22.421	SARIMA (2,0,1) (1,1,2)₁₂	22.650
SARIMA (1,1,0) (1,1,2)₁₂	22.487	SARIMA (2,0,2) (1,1,2)₁₂	22.717

Table 4. Statistics of traffic forecast results.

Model	$e_{\mathrm{MAPE}}$ (%)		$e_{\mathrm{MAE}}$ ($10^{4}$)		$e_{\mathrm{RMSE}}$ ($10^{4}$)		A		R
	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting
SARIMA	7.94	9.06	4.35	3.41	5.31	4.90	0.90	0.87	0.82	0.82
NAR	11.37	16.54	6.56	5.99	8.44	8.33	0.85	0.77	0.53	0.59
Grey prediction	14.29	12.01	8.66	4.32	11.82	5.93	0.79	0.83	−0.29	0.69
Exponential smoothing	8.77	6.90	5.25	2.52	7.11	4.04	0.87	0.89	0.86	0.87
SARIMA–NAR combination model 1	5.89	5.77	3.32	2.03	4.62	2.77	0.92	0.93	0.88	0.95
SARIMA–NAR combination model 2	7.88	10.05	4.36	3.74	5.70	5.17	0.90	0.86	0.81	0.80
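
For readers who want to reproduce indicators of the kind reported in Table 4, the following sketch computes MAPE, MAE, RMSE, and a correlation coefficient using their textbook definitions. It assumes that R denotes the Pearson correlation between observed and predicted values; the paper's own definitions, including that of the accuracy index A (which is not computed here), should be taken from its methods section. The sample values are hypothetical toy numbers, not the paper's traffic data.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient (assumed here to correspond to R)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical toy values for illustration only, not the paper's data.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]

metrics = {
    "MAPE (%)": mape(actual, predicted),
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "R": pearson_r(actual, predicted),
}
print(metrics)
```

MAPE is scale-free, whereas MAE and RMSE carry the units of the series, which is why the table reports the latter two in units of 10^4.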

Table 5. Model prediction and fitting situation under the influence of the epidemic.

Model	$e_{\mathrm{MAPE}}$ (%)		$e_{\mathrm{MAE}}$ ($10^{4}$)		$e_{\mathrm{RMSE}}$ ($10^{4}$)		A		R
	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting	Prediction	Fitting
SARIMA	12.25	9.63	6.18	4.29	7.67	6.62	0.87	0.84	0.76	0.82
NAR	13.24	12.17	7.47	4.79	9.44	6.64	0.83	0.83	0.54	0.81
SARIMA–NAR combination model 1	8.07	8.28	4.29	3.48	5.66	5.57	0.90	0.87	0.86	0.87
SARIMA–NAR combination model 2	10.30	9.98	5.46	4.31	7.13	6.38	0.87	0.85	0.73	0.83

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

