Article

Improvement on Forecasting of Propagation of the COVID-19 Pandemic through Combining Oscillations in ARIMA Models

Department of Applied Statistics, Gachon University, Seongnam-si 13120, Republic of Korea
Forecasting 2024, 6(1), 18-35; https://doi.org/10.3390/forecast6010002
Submission received: 13 October 2023 / Revised: 22 December 2023 / Accepted: 22 December 2023 / Published: 26 December 2023
(This article belongs to the Section Environmental Forecasting)

Abstract

Daily data on COVID-19 infections and deaths tend to exhibit weekly oscillations. The purpose of this work is to forecast COVID-19 data with partially cyclical fluctuations. A partially periodic oscillating ARIMA model is proposed to enhance predictive performance. The model, optimized for improved prediction, characterizes and forecasts COVID-19 time series marked by weekly oscillations. Parameter estimation and out-of-sample forecasting are carried out with data on daily COVID-19 infections and deaths between January 2021 and October 2022 in the USA, Germany, and Brazil, whose COVID-19 data exhibit the strongest weekly cycles. Prediction accuracy measures, namely RMSE, MAE, and HMAE, are evaluated, and 95% prediction intervals are constructed. Compared to the existing models, predictions of daily COVID-19 data are improved considerably: by up to 55–65% in RMSE, 58–70% in MAE, and 46–60% in HMAE. This study provides a useful predictive model for the COVID-19 pandemic, and it can help institutions manage their healthcare systems with more accurate statistical information.

1. Introduction

The prevalence of COVID-19 has been a worldwide concern for more than three years, and the virus continues to threaten human health. The trends in COVID-19 cases differ across countries. In some countries, daily cases are decreasing owing to effective policies, such as booster vaccination campaigns, while others have experienced surges in infections due to local problems. Moreover, cyclical fluctuations or waves, either long-term or short-term, are observed in some countries. Given these dynamic time series patterns, numerous studies on modeling and forecasting COVID-19 have been conducted since the outbreak began in 2019–2020; see, for instance, [1,2,3,4,5,6,7,8,9,10] for notable forecasting analyses of COVID-19 based on ARIMA models and machine learning. Refs. [11,12] proposed exponential decay models that proved effective for short-term forecasts of COVID-19. Developing accurate predictive models for such dynamic data is a significant challenge, and modeling and forecasting such random phenomena is of academic importance. Moreover, reliable statistical analysis can play a crucial role in improving social policies aimed at protecting human health. In academia and health institutions, efforts to prevent the transmission of respiratory diseases should continue until the spread of the virus is completely over.
Many infectious diseases, including malaria, dengue, influenza, and COVID-19, are not maintained in a state of equilibrium but exhibit significant fluctuations in prevalence over time [13], and mathematical models of these fluctuations have steadily improved over the past years. For instance, we refer to [14,15,16] for the seasonality of malaria, dengue, and influenza. Refs. [17,18,19] focused on seasonal trends in COVID-19 cases; beyond seasonality, however, they also observed high-frequency oscillations with a periodicity of approximately one week. In other words, one distinctive characteristic of COVID-19 pandemic patterns is the presence of periodic oscillations with weekly cycles. This aspect was also discussed by [20], who investigated high-frequency (i.e., weekly) oscillatory patterns in COVID-19 infections and deaths.
Moreover, Ref. [20] urged the scientific community to explore in depth the periodicity in COVID-19 cases, which might lead to a better understanding and forecasting of COVID-19 transmission. Refs. [21,22,23,24,25,26] discussed the weekly cycles and periodic recurrent waves of COVID-19 data. In particular, Refs. [22,24] used the cyclical fluctuations to infer or predict the spread rate and incidence rate of the coronavirus, while [26] modeled the drivers of oscillations in COVID-19 data on college campuses, emphasizing that the oscillations arise when human behavior is incorporated into the system.
Refs. [20,22,23] pointed out that periodic oscillations are associated with a testing bias. As global COVID-19 cases rose, the overwhelming burden of managing the virus led to a testing bias, resulting in varied patterns in COVID-19 data. This bias stems from more frequent testing on certain days of the week and less frequent testing on others, which contributes to the weekly cycles in the number of COVID-19 cases. For example, in some of the most affected countries, such as the USA, Germany, and Brazil, recent COVID-19 time series exhibit pronounced, partially periodic oscillations with weekly cycles, and these oscillations are stronger when the case counts are larger.
Meanwhile, Refs. [27,28] analyzed 7-day smoothed COVID-19 data. Their modeling and forecasting work is significant in itself, as social policies against COVID-19, such as lockdowns and travel restrictions, typically span periods longer than 7 days. Nevertheless, as argued by [20,22,23], periodic oscillation phenomena should be explored in depth in the evolutionary history of the COVID-19 pandemic. Identifying the cyclical behavior of COVID-19 time series is important both for understanding the data fully and for improving prediction.
The oscillations observed in COVID-19 time series do not fit well into existing models, which calls for a new model with improved predictive performance. This study focuses on modeling and forecasting the partially periodic oscillatory patterns of COVID-19 data. We use an autoregressive integrated moving average (ARIMA) model and add a partially periodic oscillating (PPO) component to capture the weekly cyclical fluctuations; we refer to this model as the PPO-ARIMA model. Unlike a seasonal ARIMA (SARIMA) model with a 7-day cycle, the proposed model has oscillation amplitudes that increase with the magnitude of the ARIMA part: larger values of the ARIMA part produce stronger oscillations, and smaller values produce weaker ones. To create this feature, the PPO part is constructed from indicator variables and weights that depend on the values of the ARIMA part, and the oscillations arise from applying periodic weights to those values. This additional oscillation part is the main difference from the traditional ARIMA model.
This study aims to improve forecasting of the spread of the COVID-19 pandemic by adding the PPO part to existing ARIMA models. We conduct estimation and out-of-sample forecasting through an empirical analysis of real data from three countries, the USA, Germany, and Brazil, whose COVID-19 infection and death cases show the strongest oscillations. The estimation methods are simple and easy to implement, relying only on averaging and linear regression. As forecasting performance measures, the root mean square error (RMSE), mean absolute error (MAE), and heterogeneous MAE (HMAE) are computed and compared with those of existing models. We also discuss the advantages of the proposed model, including an evaluation of its efficiency based on forecasting accuracy. Finally, prediction intervals are constructed.
The rest of the paper is organized as follows: In Section 2, the model and estimation are described. In Section 3, the empirical analysis results with estimations and out-of-sample forecasting are presented. The conclusion and discussion are presented in Section 4.

2. Method

In this section, we first describe the datasets and then introduce the PPO-ARIMA model used for the forecasting analysis of COVID-19 data.

2.1. Data

In the empirical experiments, the daily numbers of confirmed COVID-19 cases and related deaths are considered for three countries: the USA, Germany, and Brazil. These countries show stronger partial periodic oscillations than others. COVID-19 time series from 1 January 2021 to 13 October 2022 (651 days) were obtained from the WHO website: https://covid19.who.int/data (accessed on 12 October 2023). A summary of the statistics is given in Table 1. For estimation and forecasting, the PPO-ARIMA model is applied to the standardized data, obtained by subtracting the sample mean and dividing by the sample standard deviation. In other words, $\{Y_t = (Y_t^o - \hat{\mu})/\hat{\sigma},\ t = 1, 2, \ldots, n\}$ with $n = 651$ is used in the proposed model, where $Y_t^o$ is the (original) number of daily COVID-19 confirmed (or death) cases at time $t$, and $\hat{\mu} = \hat{\mu}_n$ and $\hat{\sigma} = \hat{\sigma}_n$ are its sample mean and sample standard deviation, given in Table 1. The transformed data $\{Y_t\}$ form a triangular array with $Y_t \equiv Y_{t,n} = (Y_t^o - \hat{\mu}_n)/\hat{\sigma}_n$. Once estimation and prediction have been conducted using $\{Y_t\}$, the results are transformed back to the original scale for the visualizations presented in the following section. Thus, the estimation results discussed below are derived from the standardized data, while the one-step ahead predictions and their prediction intervals are displayed on the original scale.
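As a minimal sketch of the standardization and back-transformation described above, the Python functions below (hypothetical names `standardize` and `destandardize`) map the original counts $Y_t^o$ to $Y_t$ and back. We assume the sample standard deviation uses the $n - 1$ denominator (`ddof=1`); the text does not state which convention Table 1 uses.

```python
import numpy as np


def standardize(y_orig):
    """Return Y_t = (Y_t^o - mu_hat) / sigma_hat together with mu_hat and sigma_hat."""
    y_orig = np.asarray(y_orig, dtype=float)
    mu_hat = y_orig.mean()
    sigma_hat = y_orig.std(ddof=1)  # assumption: n - 1 denominator
    return (y_orig - mu_hat) / sigma_hat, mu_hat, sigma_hat


def destandardize(y_std, mu_hat, sigma_hat):
    """Map standardized values (data, forecasts, interval bounds) back to the original scale."""
    return np.asarray(y_std, dtype=float) * sigma_hat + mu_hat


# Toy daily counts, for illustration only
counts = np.array([120.0, 80.0, 95.0, 150.0, 130.0, 60.0, 70.0])
y, mu, sigma = standardize(counts)
restored = destandardize(y, mu, sigma)
```

The same `destandardize` call applies to forecasts and interval bounds, because the transformation is affine.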
Oscillation modeling is needed to forecast the propagation of COVID-19 more precisely. The oscillation is due to daily differences in testing for the virus and in death reporting, as mentioned by [21]. In other words, it is caused by a testing bias: testing for the virus is performed more often on certain days of the week and less often on others, as mentioned by [22,23]. To represent the oscillation more precisely, we propose combining periodic oscillations with ARIMA models.

2.2. ARIMA Model with Partial Periodic Oscillation

In this work, we consider an ARIMA model with partial periodic oscillation, $\{Y_t, t = 0, 1, \ldots\}$, given by
$$Y_t = X_t + \Delta_t + \varepsilon_t$$
where $\{X_t\}$ is an ARIMA process, $\{\Delta_t\}$ is an oscillation component, and $\{\varepsilon_t\}$ is an i.i.d. noise process.
First, we briefly describe the ARIMA model $\{X_t\}$ of order $(p, d, q)$. Using the backshift operator $B$, let $D = 1 - B$ be the difference operator, so that $D_t \equiv D(X_t) = (1 - B)X_t = X_t - X_{t-1}$. Defining $D_t^d = (1 - B)^d X_t$, the $d$-th order differenced series of $X_t$, the ARIMA$(p, d, q)$ model $\{X_t\}$ satisfies
$$D_t^d = \phi_1 D_{t-1}^d + \cdots + \phi_p D_{t-p}^d + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$$
for coefficients $\phi_i$ and $\theta_j$ ($i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, q$) and a white noise process $\{\epsilon_t\}$. The characteristic polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$ is assumed to have all its roots outside the unit circle, so that the $d$-th order differenced series $\{D_t^d\}$ is stationary. The ARIMA model is very popular in time series analysis and has been used by many researchers; see, for example, [29,30].
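The root condition on $\phi(z)$ can be checked numerically. The sketch below (a hypothetical helper, not part of the original analysis) computes the roots of $1 - \phi_1 z - \cdots - \phi_p z^p$ with `numpy.roots`, which expects coefficients ordered from the highest power down, and reports whether all of them lie outside the unit circle.

```python
import numpy as np


def ar_is_stationary(phi):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    if phi.size == 0:
        return True  # no AR part: trivially stationary
    # Coefficients from z^p down to z^0: [-phi_p, ..., -phi_1, 1]
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))


checks = {
    "AR(1), phi = 0.5": ar_is_stationary([0.5]),       # root z = 2
    "AR(1), phi = 1.0": ar_is_stationary([1.0]),       # unit root z = 1
    "AR(2), (0.5, 0.3)": ar_is_stationary([0.5, 0.3]),
}
```

The unit-root case $\phi_1 = 1$ fails the check, which is why such a series must be differenced ($d \ge 1$) before the AR condition can hold.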
Second, we describe the partial periodic oscillation component $\{\Delta_t\}$ as follows:
$$\Delta_t = \omega_\ell\, |X_t - x_0|^{\delta}\, I\{X_t > x_0\} \tag{1}$$
where $I(\cdot)$ is the indicator function, $x_0$ is the threshold, $\delta$ is the exponent, and $\omega_\ell$ ($\ell = 0, 1, \ldots, \tau - 1$) are weights selected periodically through $\ell = t \bmod \tau$, where $\tau$ is the periodicity; in other words, $\ell$ is the remainder of $t$ divided by $\tau$. To generate oscillations, we use $\tau$ different weights, $\omega_0, \omega_1, \ldots, \omega_{\tau-1}$. If $t_1$ and $t_2$ have the same remainder $\ell \in \{0, 1, \ldots, \tau - 1\}$ when divided by $\tau$, then $\Delta_{t_1}$ and $\Delta_{t_2}$ share the same weight $\omega_\ell$. Since the sum $\sum_{\ell=0}^{\tau-1} \omega_\ell\, I\{t \equiv \ell \bmod \tau\}$ equals $\omega_\ell$ with $\ell = t \bmod \tau$, (1) can be expressed as
$$\Delta_t = |X_t - x_0|^{\delta}\, I\{X_t > x_0\} \sum_{\ell=0}^{\tau-1} \omega_\ell\, I\{t \equiv \ell \bmod \tau\}. \tag{2}$$
This expression covers all $\tau$ cases for a general time index $t$ and is therefore more convenient for the mathematical analysis below.
We focus on the partial periodic oscillation (PPO) part $\{\Delta_t\}$, which is constructed from indicator variables and weights that depend on the values of the ARIMA part. From (1) or (2), the PPO part $\Delta_t$ is governed by three parameters: the threshold $x_0$, the exponent $\delta$, and the weights $\omega_\ell$. Moreover, it is the product of three terms: $|X_t - x_0|^{\delta}$, $I\{X_t > x_0\}$, and $\omega_\ell$. The indicator $I\{X_t > x_0\}$ determines whether the PPO part is present: if $X_t \le x_0$, then $\Delta_t = 0$; that is, when the ARIMA part does not exceed the threshold, there is no oscillation. Thus, the threshold $x_0$ controls the proportion of the series that oscillates. When the ARIMA part exceeds $x_0$, the amplitude of the PPO part grows with it: a nonzero $\Delta_t$ is proportional to $|X_t - x_0|^{\delta}$. The amplitude also depends on the weight, whose index is the remainder of the time index divided by $\tau$, so that $\tau$-period oscillations occur in the series. Thus, the weights $\omega_\ell$ control the pure oscillation pattern through their increasing and decreasing values. The exponent $\delta$ controls how strongly the oscillation magnitudes depend on the values of the ARIMA part, and it is needed to recover the pure oscillation magnitudes: for $|X_t - x_0| > 1$, a larger $\delta$ gives PPO values of larger magnitude (for $|X_t - x_0| < 1$, the reverse holds). Finally, dividing $\Delta_t$ by $|X_t - x_0|^{\delta}$ recovers the pure oscillation weight $\omega_\ell$.
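To make (1) and (2) concrete, the sketch below computes $\Delta_t$ for a given ARIMA path $X_t$ and simulates $Y_t = X_t + \Delta_t + \varepsilon_t$. The function name `ppo_component`, the start index `t0` of the time axis, the stand-in path for $X_t$, the parameter values, and the Gaussian noise are illustrative assumptions. The weight index is $\ell = t \bmod \tau$ as in (1), with $\tau$ taken as the number of weights.

```python
import numpy as np


def ppo_component(x, x0, delta, weights, t0=1):
    """Delta_t = omega_l * |X_t - x0|^delta * I{X_t > x0}, with l = t mod tau."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    tau = len(weights)
    t = np.arange(t0, t0 + len(x))   # time indices t0, t0 + 1, ...
    ell = t % tau                    # periodic weight index
    active = x > x0                  # indicator I{X_t > x0}
    return np.where(active, weights[ell] * np.abs(x - x0) ** delta, 0.0)


# Illustration with hypothetical parameters (tau = 7)
rng = np.random.default_rng(0)
t = np.arange(1, 71)
x = 1.0 + np.sin(t / 10.0)           # stand-in for an ARIMA path
w = np.array([0.5, 0.3, 0.0, -0.2, -0.4, -0.3, 0.1])
delta_t = ppo_component(x, x0=0.2, delta=1.2, weights=w)
y = x + delta_t + rng.normal(scale=0.05, size=t.size)
```

Days with $X_t \le x_0$ receive no oscillation, and larger values of $X_t - x_0$ give larger swings. This amplitude behavior is what distinguishes the PPO part from a fixed seasonal pattern.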
The goal of this work is to model and forecast COVID-19 case data, focusing on weekly oscillatory patterns, that is, partial periodic oscillations with periodicity $\tau = 7$. As seen in the figures of COVID-19 confirmed and death cases for the three countries, extreme values (local maxima and minima) recur every 7 days. Therefore, $\tau = 7$ days, rather than another interval such as 28 days, is adopted as the oscillation period. In the following, we first propose a parameter estimation method and then perform an out-of-sample forecasting analysis to present our main results.

2.3. Estimation

We now describe the estimation of the parameters in (1) before presenting the empirical results. Suppose that a sample $\{Y_1, \ldots, Y_n\}$ is observed with periodicity $\tau$. For the COVID-19 data analysis, we use $\tau = 7$ and $n = 7k$ for a positive integer $k$; that is, $n$ is a multiple of $\tau$. Since $\tau = 7$ is small compared to the total sample size $n$, if $n$ is not a multiple of $\tau$, the first $n \bmod \tau$ observations (fewer than $\tau$) may be deleted without affecting the analysis. To estimate the parameters $\delta$ and $\omega_\ell$ ($\ell = 0, 1, \ldots, \tau - 1$) from the sample $\{Y_1, \ldots, Y_n\}$, we follow three steps: first, decompose the series $\{Y_t\}$ into the ARIMA part $\{X_t\}$ and the PPO part $\{\Delta_t\}$; second, estimate the exponent $\delta$; and finally, estimate the weights $\omega_\ell$ by averaging.
First, to decompose the series into two parts, we compute the $\tau$-day smoothed (centered) moving average series $\{X_t, t = 1, \ldots, n\}$ given by
$$X_t \equiv X_{t,\tau} = \frac{1}{\tau} \sum_{j = t - \lfloor \tau/2 \rfloor}^{t + \lfloor \tau/2 \rfloor} Y_j \quad \text{if } j \in \{1, 2, \ldots, n\}$$
where $\lfloor a \rfloor$ is the integer part of a real number $a$. If $j \notin \{1, 2, \ldots, n\}$, $Y_j$ is regarded as 0, and $X_t$ is evaluated as the average of the in-sample observations in the window instead of dividing by $\tau$. An ARIMA model is then fitted to the smoothed series $\{X_t\}$, and model (1) is fitted to the PPO part $\{\Delta_t\}$, obtained as $\Delta_t = Y_t - X_t$.
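The decomposition step can be sketched as follows. The centered $\tau$-day average is taken over the in-sample part of each window, which is our reading of the edge rule above: out-of-range $Y_j$ are dropped and the average is taken over the remaining observations. The function names are hypothetical.

```python
import numpy as np


def centered_moving_average(y, tau=7):
    """X_t: mean of Y_j for j in [t - floor(tau/2), t + floor(tau/2)] within the sample."""
    y = np.asarray(y, dtype=float)
    n, h = len(y), tau // 2
    x = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - h), min(n, t + h + 1)  # clip the window at the sample edges
        x[t] = y[lo:hi].mean()                     # average of the available observations
    return x


def decompose(y, tau=7):
    """Split Y_t into the smoothed part X_t and the residual PPO part Delta_t = Y_t - X_t."""
    x = centered_moving_average(y, tau)
    return x, np.asarray(y, dtype=float) - x


y = np.arange(1.0, 11.0)  # toy series 1, 2, ..., 10
x, d = decompose(y)
```

With pandas, the same smoother is `pd.Series(y).rolling(tau, center=True, min_periods=1).mean()`. The ARIMA part is then fitted to `x` and model (1) to `d`.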
Second, to estimate $\delta$ in (1), $\Delta_t = \omega_\ell |X_t - x_0|^{\delta} I\{X_t > x_0\}$, where $\ell \in \{0, 1, \ldots, \tau - 1\}$ is the index such that $t \equiv \ell \bmod \tau$, with some chosen threshold $x_0$ (whose selection is discussed in the next section), we split the time period into disjoint subperiods, the $i$-th being $[(i-1)\tau + 1, i\tau]$ for $i \in \{1, 2, \ldots, n/\tau\}$. For each $i$, let $M_i(\delta)$ and $m_i(\delta)$ denote, respectively, the maximum and minimum of $\Delta_t / |X_t - x_0|^{\delta}$ over the $i$-th period, provided $X_t > x_0 + \epsilon_0$ for $t \in [(i-1)\tau + 1, i\tau]$, for some small constant $\epsilon_0 > 0$. That is,
$$M_i(\delta) = \max\left\{ \frac{\Delta_t}{|X_t - x_0|^{\delta}}\, I\{X_t > x_0 + \epsilon_0\} : (i-1)\tau + 1 \le t \le i\tau \right\},$$
$$m_i(\delta) = \min\left\{ \frac{\Delta_t}{|X_t - x_0|^{\delta}}\, I\{X_t > x_0 + \epsilon_0\} : (i-1)\tau + 1 \le t \le i\tau \right\}$$
where the constant $\epsilon_0 > 0$ is added to avoid a too-small value of $|X_t - x_0|$ in the denominator, so that the ratio $\Delta_t / |X_t - x_0|^{\delta}$ stays within a bounded range. The estimation is not very sensitive to the choice of $\epsilon_0$, since the maximum $M_i(\delta)$ and minimum $m_i(\delta)$ are largely unaffected by it; $\epsilon_0$ only controls how many observations are treated as non-oscillating (zero) rather than oscillating (nonzero).
Let $\delta_0$ be the true (unknown) value of the exponent $\delta$ in the model. For $\delta = \delta_0$, $M_i(\delta_0)$ and $m_i(\delta_0)$ are constants, equal to the highest and the lowest weight, respectively, and thus independent of $i$. To see why, let $\Delta_t(\delta_0) = |X_t - x_0|^{\delta_0} I\{X_t > x_0 + \epsilon_0\} \sum_{\ell=0}^{\tau-1} \omega_\ell I\{t \equiv \ell \bmod \tau\}$. Also, let $\bar{\ell} \in \{0, 1, \ldots, \tau - 1\}$ be the index of the highest weight $\omega_{\bar{\ell}}$; that is, $\omega_{\bar{\ell}} \ge \omega_\ell$ for all $\ell \ne \bar{\ell}$. For each $i \in \{1, 2, \ldots, n/\tau\}$, let $t_i \in [(i-1)\tau + 1, i\tau]$ be the (unique) time point in the $i$-th period with $t_i \equiv \bar{\ell} \bmod \tau$. Then $\Delta_{t_i}(\delta_0) = |X_{t_i} - x_0|^{\delta_0} I\{X_{t_i} > x_0 + \epsilon_0\}\, \omega_{\bar{\ell}}$; equivalently, if $X_{t_i} > x_0 + \epsilon_0$, then $\omega_{\bar{\ell}} = \Delta_{t_i}(\delta_0) / |X_{t_i} - x_0|^{\delta_0}$, and thus $M_i(\delta_0) = \omega_{\bar{\ell}}$ for all $i$. Hence, $|M_i(\delta_0) - M_j(\delta_0)| = 0$ for all $i, j$ at the true exponent $\delta_0$. Similarly, let $\underline{\ell}$ be the index of the lowest weight $\omega_{\underline{\ell}}$. Then $m_i(\delta_0) = \omega_{\underline{\ell}}$ and $|m_i(\delta_0) - m_j(\delta_0)| = 0$ for all $i, j$. Hence, $M_i(\delta_0)$ and $m_i(\delta_0)$ are constants independent of $i$.
Therefore, we choose $\delta$ such that
$$\sup_{i \ne j} |M_i(\delta) - M_j(\delta)| < \epsilon, \qquad \sup_{i \ne j} |m_i(\delta) - m_j(\delta)| < \epsilon \tag{3}$$
for small $\epsilon > 0$. To do this, we fit two linear regression models to the pairs $\{(i, M_i(\delta)) : i = 1, 2, \ldots, n/\tau\}$ and $\{(i, m_i(\delta)) : i = 1, 2, \ldots, n/\tau\}$, with coefficients $\alpha_1, \beta_1$ and $\alpha_2, \beta_2$, respectively:
$$M_i(\delta) = \alpha_1 + \beta_1 i + \epsilon_{1,i}, \qquad m_i(\delta) = \alpha_2 + \beta_2 i + \epsilon_{2,i}$$
where $\epsilon_{1,i}$ and $\epsilon_{2,i}$ are error terms. From the two regressions, the estimates $\hat{\beta}_1, \hat{\beta}_2$ of the slopes $\beta_1, \beta_2$ are computed; note that $\beta_1 = \beta_2 = 0$ when $\delta = \delta_0$.
Under (3), $M_i(\delta)$ and $m_i(\delta)$ show no trend in $i$, so estimated slopes $\hat{\beta}_1$ and $\hat{\beta}_2$ close to zero are consistent with (3). Thus, we may choose $\hat{\delta}$ to minimize $\hat{\beta}_1^2 + \hat{\beta}_2^2$:
$$\hat{\delta} = \arg\min_{\delta \in \Theta} \left( \hat{\beta}_1^2 + \hat{\beta}_2^2 \right)$$
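for a compact set $\Theta$. We claim that $\hat{\delta}$ converges in probability to the true exponent $\delta_0$ as $n \to \infty$. For a given compact set $\Theta$ of $\delta$, suppose that $\delta_0 \in \Theta$. Let
$$B(\delta) = \beta_1(\delta)^2 + \beta_2(\delta)^2 \quad \text{and} \quad \hat{B}(\delta) = \hat{\beta}_1(\delta)^2 + \hat{\beta}_2(\delta)^2,$$
which are continuous functions of $\delta \in \Theta$, since $M_i(\delta)$ and $m_i(\delta)$ are continuous in $\delta \in \Theta$. Note that $\delta_0$ is the minimizer of $B(\delta)$, whereas $\hat{\delta}$ is the minimizer of $\hat{B}(\delta)$. Moreover, $\hat{B}(\delta) \xrightarrow{p} B(\delta)$ for all $\delta \in \Theta$. Thus, provided that $\delta_0$ is the unique minimizer of $B$ and this convergence holds uniformly over $\Theta$ (the standard conditions for the consistency of such argmin estimators), we may write
$$\hat{\delta} = \arg\min_{\delta \in \Theta} \hat{B}(\delta) \xrightarrow{p} \arg\min_{\delta \in \Theta} B(\delta) = \delta_0$$
as $n \to \infty$. Hence, the desired convergence in probability holds.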
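The exponent estimator can be sketched as a grid search over a compact set $\Theta$. The helper `block_extremes` computes $M_i(\delta)$ and $m_i(\delta)$ over the disjoint $\tau$-day blocks, keeping only days with $X_t > x_0 + \epsilon_0$ (our reading of the "provided" condition above). `estimate_delta` returns the grid point that minimizes $\hat{\beta}_1^2 + \hat{\beta}_2^2$, with the slopes obtained by least squares via `numpy.polyfit`. The function names, the grid, $\epsilon_0$, and the noise-free test data are assumptions for illustration.

```python
import numpy as np


def block_extremes(delta_part, x, x0, delta, tau=7, eps0=1e-3):
    """Return block indices i and the block-wise max M_i(delta) and min m_i(delta)."""
    delta_part = np.asarray(delta_part, dtype=float)
    x = np.asarray(x, dtype=float)
    idx, big, small = [], [], []
    for i in range(len(x) // tau):
        xb = x[i * tau:(i + 1) * tau]
        db = delta_part[i * tau:(i + 1) * tau]
        ok = xb > x0 + eps0                    # keep days with X_t > x0 + eps0
        if not ok.any():
            continue                           # skip blocks without usable days
        ratio = db[ok] / np.abs(xb[ok] - x0) ** delta
        idx.append(i + 1)
        big.append(ratio.max())
        small.append(ratio.min())
    return np.array(idx), np.array(big), np.array(small)


def estimate_delta(delta_part, x, x0, grid, tau=7, eps0=1e-3):
    """delta_hat = argmin over the grid of beta1_hat^2 + beta2_hat^2."""
    scores = []
    for d in grid:
        i, big, small = block_extremes(delta_part, x, x0, d, tau, eps0)
        b1 = np.polyfit(i, big, 1)[0]    # slope of M_i(delta) on i
        b2 = np.polyfit(i, small, 1)[0]  # slope of m_i(delta) on i
        scores.append(b1 ** 2 + b2 ** 2)
    return float(grid[int(np.argmin(scores))])


# Noise-free check: trending X_t and a known exponent delta_0 = 1.5
tau = 7
t = np.arange(1, 7 * 40 + 1)
x = 1.0 + 0.02 * t
w = np.array([0.6, 0.2, -0.1, -0.3, -0.4, -0.2, 0.2])
delta_part = w[t % tau] * np.abs(x - 0.0) ** 1.5
delta_hat = estimate_delta(delta_part, x, x0=0.0, grid=np.linspace(0.5, 2.5, 41))
```

In practice, `delta_part` and `x` would be the $\Delta_t$ and $X_t$ from the decomposition step. With noisy data, a finer grid over $\Theta$ may be needed. A trend in $X_t$ matters here, because without it the block extremes barely change with $\delta$.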
Finally, using $\hat{\delta}$, for each $\ell \in \{0, 1, \ldots, \tau - 1\}$, we compute the estimate of $\omega_\ell$ given by
$$\hat{\omega}_\ell = \frac{1}{\#(A_\ell)} \sum_{t \in A_\ell} \frac{\Delta_t}{|X_t - x_0|^{\hat{\delta}}} \tag{4}$$
where $A_\ell = \{t : t \equiv \ell \bmod \tau\} \cap \{t : X_t > x_0 + \epsilon_0\}$. Note that if $\delta = \delta_0$, then $\Delta_t / |X_t - x_0|^{\delta_0} = \omega_\ell$ for $t \in A_\ell$; since $\hat{\delta} \xrightarrow{p} \delta_0$ as $n \to \infty$, each element of $\{\Delta_t / |X_t - x_0|^{\hat{\delta}}, t \in A_\ell\}$ converges to $\omega_\ell$, and so does their average. Alternatively, the median of $\{\Delta_t / |X_t - x_0|^{\hat{\delta}}, t \in A_\ell\}$ can be used instead of the average; it is preferable in the presence of outliers. In this work, however, we use the average in (4), based on the basic result that the sample mean converges in probability to the population mean.
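Given $\hat{\delta}$, the weight estimator (4) is an average within each residue class $A_\ell$. The sketch below also exposes the median alternative mentioned above. The function name, the convention that $t$ starts at 1, and the value of $\epsilon_0$ are assumptions.

```python
import numpy as np


def estimate_weights(delta_part, x, x0, delta_hat, tau=7, eps0=1e-3, use_median=False):
    """omega_hat_l = average (or median) of Delta_t / |X_t - x0|^delta_hat over A_l."""
    delta_part = np.asarray(delta_part, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    ok = x > x0 + eps0                               # {t : X_t > x0 + eps0}
    agg = np.median if use_median else np.mean
    omega_hat = np.full(tau, np.nan)                 # stays nan if A_l is empty
    for ell in range(tau):
        a_ell = ok & (t % tau == ell)                # A_l
        if a_ell.any():
            ratio = delta_part[a_ell] / np.abs(x[a_ell] - x0) ** delta_hat
            omega_hat[ell] = agg(ratio)
    return omega_hat


# Noise-free check with known weights and exponent
t = np.arange(1, 7 * 20 + 1)
x = 1.0 + 0.02 * t
w = np.array([0.6, 0.2, -0.1, -0.3, -0.4, -0.2, 0.2])
delta_part = w[t % 7] * x ** 1.5
omega_hat = estimate_weights(delta_part, x, x0=0.0, delta_hat=1.5)
```

Passing `use_median=True` gives the outlier-robust variant. Without noise, both variants return the true weights.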
The estimation idea is simple and easy to implement, because only basic statistical methods, namely regression analysis and averaging, are used to estimate the parameters of the PPO part. The statistical analysis was performed in Python 3.8 using numpy, scipy, statsmodels.tsa.arima.model, statsmodels.tsa.stattools, and other packages.

3. Results

This section presents an empirical analysis of confirmed and death cases of COVID-19 in the USA, Germany, and Brazil. A primary objective of this work is to model and forecast pandemic data characterized by partial periodic oscillations. The dataset $\{Y_t, t = 1, 2, \ldots, n\}$ (or $\{Y_t^o\}$) is fitted to a PPO-ARIMA model. From this sample, the ARIMA part $\{X_t, t = 1, 2, \ldots, n\}$ and the PPO part $\{\Delta_t, t = 1, 2, \ldots, n\}$ are decomposed as detailed in Section 2.3. For the three countries, Figure 1, Figure 2 and Figure 3 depict, respectively, the original (unstandardized) series $Y_t^o$, its ARIMA part $X_t$ and PPO part $\Delta_t$, as well as the sample autocorrelation function (SACF) of the original data, which shows the weekly cycles. The SACF plots show how strong the 7-day oscillations are in each dataset. The oscillation patterns are strongest in the confirmed cases of Germany and weakest in the confirmed cases of the USA.

3.1. Estimation Results

The parameters of the model are estimated from the standardized data, as mentioned before. For the PPO-ARIMA model $Y_t = X_t + \Delta_t + \varepsilon_t$, the 7-day smoothed moving average series $X_t$ is fitted to an ARIMA model. To do this, we test $X_t$ for unit-root nonstationarity using the augmented Dickey–Fuller (ADF) test. Table 2 reports the results of the ADF test on $\{X_t\}$ along with the p-values. For the death cases in the USA and the confirmed cases in Brazil, the ADF p-values are 0.0019 and 0.0016, respectively. Since these values are less than 0.01, we reject the unit-root null hypothesis at the 1% level, so these series have order $d = 0$ in the fitted ARIMA$(p, d, q)$ models. The other orders are selected by criteria such as AIC and RMSE. Table 2 also presents the orders of the ARIMA$(p, d, q)$ models for $\{X_t\}$, as well as the coefficient estimates and their standard errors (s.e.).
Table 3 presents the parameter estimates of the PPO part $\{\Delta_t\}$. The threshold $x_0$ is selected as the minimum of $\{X_t\}$, because all observations appear to oscillate, even though small magnitudes yield only slight fluctuations, as seen in Figure 1, Figure 2 and Figure 3. If not all observations oscillate, however, one way to choose $x_0$ is to minimize the mean square error, that is, $\hat{x}_0 = \arg\min_{x_0} MSE(x_0)$, where $MSE(x_0) = \frac{1}{n}\sum_{t=1}^{n} \hat{\epsilon}_t^2$ with residuals $\hat{\epsilon}_t = Y_t - \hat{X}_t - \hat{\Delta}_t$. Here $\hat{X}_t$ is the fitted value derived from the coefficient estimates of the ARIMA part, and $\hat{\Delta}_t$ is the fitted value derived from the estimates $\hat{\omega}_\ell$ and $\hat{\delta}$. In this work, we use the minimum of $\{X_t\}$ for $x_0$, because all plots in the second rows of Figure 1, Figure 2 and Figure 3 show oscillations.
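When not all observations oscillate, the MSE criterion for $x_0$ can be sketched as a search over candidate thresholds. Here `fit_ppo` is a hypothetical user-supplied callable that re-estimates $\hat{\delta}$ and $\hat{\omega}_\ell$ for a given $x_0$ and returns the fitted PPO part $\hat{\Delta}_t$. `x_fit` holds the fitted ARIMA values $\hat{X}_t$. The toy data below are constructed for illustration only.

```python
import numpy as np


def select_threshold(y, x_fit, candidates, fit_ppo):
    """x0_hat = argmin_x0 MSE(x0), with MSE(x0) = mean((Y_t - X_hat_t - Delta_hat_t)^2)."""
    y = np.asarray(y, dtype=float)
    x_fit = np.asarray(x_fit, dtype=float)
    mse = np.array([np.mean((y - x_fit - fit_ppo(x0)) ** 2) for x0 in candidates])
    return candidates[int(np.argmin(mse))], mse


# Toy check: the data contain a PPO-like term switched on above x0 = 0.5
x_fit = np.linspace(0.0, 2.0, 50)
y = x_fit + np.maximum(x_fit - 0.5, 0.0)        # delta = 1, omega = 1 for illustration


def toy_fit_ppo(x0):
    """Hypothetical stand-in for re-estimating the PPO part at threshold x0."""
    return np.maximum(x_fit - x0, 0.0)


x0_hat, mse = select_threshold(y, x_fit, [0.0, 0.25, 0.5, 0.75], toy_fit_ppo)
```

In a real application, `fit_ppo` would wrap the $\hat{\delta}$ and $\hat{\omega}_\ell$ estimators for each candidate threshold, so every candidate is scored with its own refitted PPO part.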
The exponent $\delta$ and the weights $\omega_\ell$ are estimated by the procedure described in Section 2.3. The estimated weights in Table 3 indicate oscillations. In particular, stronger oscillations occur in the confirmed cases of Germany, as can be seen in the plot of the PPO part $\{\Delta_t\}$ in the second row and second column of Figure 2. To highlight the oscillations, Figure 4 depicts the periodicity of the estimated weights $\hat{\omega}_\ell$, $\ell \in \{0, 1, \ldots, 6\}$, which are plotted repeatedly so that the 7-day periodicity can be seen. Note that 0 on the horizontal axis indicates Friday. In the USA and Brazil, there are more confirmed and death cases on Fridays than on other days, whereas in Germany, Wednesdays have the highest numbers.

3.2. Prediction Results

To assess forecasting performance, we conduct an out-of-sample forecasting analysis. We first compute $k$-step ahead predicted values ($k = 1, 2, \ldots$) along with their accuracy measures, and then construct 95% prediction intervals for the one-step ahead forecasts. For the out-of-sample forecasting, the total sample of size $T = 651$ is split into an initial in-sample of size $n = 231$ and an out-of-sample of size $m = 420$. A rolling window technique is used to compute the $k$-step ahead forecasts and their errors. At time $t$, the $k$-step ahead forecast $\hat{Y}_t(k)$ of $Y_{t+k}$ is given by
$$\hat{Y}_t(k) = \hat{X}_{t+k} + \hat{\omega}_{\ell_k} \left| \hat{X}_{t+k} - \hat{x}_0 \right|^{\hat{\delta}} I\{\hat{X}_{t+k} > \hat{x}_0\}$$
where $\hat{X}_{t+k}$ is the $k$-step ahead ARIMA forecast of $X_{t+k}$ made at time $t$, and $\ell_k = (t + k) \bmod \tau$.
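The combination step in $\hat{Y}_t(k)$ can be sketched as follows. We assume the ARIMA forecasts $\hat{X}_{t+1}, \ldots, \hat{X}_{t+K}$ are already available, for example from the `forecast(steps=K)` method of a fitted statsmodels ARIMA result, and that $t$ follows the same time indexing as in estimation. The function name is hypothetical.

```python
import numpy as np


def ppo_arima_forecast(x_hat_future, t, x0_hat, delta_hat, omega_hat):
    """Y_hat_t(k) = X_hat_{t+k} + omega_hat_{l_k} |X_hat_{t+k} - x0_hat|^delta_hat I{X_hat_{t+k} > x0_hat}."""
    x_hat_future = np.asarray(x_hat_future, dtype=float)
    omega_hat = np.asarray(omega_hat, dtype=float)
    tau = len(omega_hat)
    k = np.arange(1, len(x_hat_future) + 1)      # horizons 1, ..., K
    ell_k = (t + k) % tau                        # l_k = (t + k) mod tau
    ppo = np.where(x_hat_future > x0_hat,
                   omega_hat[ell_k] * np.abs(x_hat_future - x0_hat) ** delta_hat,
                   0.0)
    return x_hat_future + ppo


y_hat = ppo_arima_forecast([1.0, 2.0, 0.2], t=6, x0_hat=0.5,
                           delta_hat=1.0, omega_hat=0.1 * np.arange(7))
```

In a rolling-window evaluation, this function would be called once per forecast origin $t_i$, after refitting the ARIMA part on the current window.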
From these, the root mean square error (RMSE), the mean absolute error (MAE), and heterogeneous MAE (HMAE) of the k-step ahead forecasts are evaluated as follows:
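$$\mathrm{RMSE}_k = \left( \frac{1}{m} \sum_{i=1}^{m} \left( Y_{t_i+k} - \hat{Y}_{t_i}(k) \right)^2 \right)^{1/2}$$
$$\mathrm{MAE}_k = \frac{1}{m} \sum_{i=1}^{m} \left| Y_{t_i+k} - \hat{Y}_{t_i}(k) \right|$$
$$\mathrm{HMAE}_k = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{Y^o_{t_i+k} - \hat{Y}^o_{t_i}(k)}{Y^o_{t_i+k}} \right|$$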
where $Y_t$ and $Y_t^o$ are the standardized and unstandardized data, respectively. Because $\{Y_t\}$ and $\{\hat{Y}_t\}$ are standardized data and their forecasts, RMSE and MAE are appropriate metrics for comparing results across all confirmed and death case series and with those of existing models such as ARIMA and SARIMA. In HMAE, by contrast, the unstandardized data are used, since what matters is the ratio of the forecast errors to the positive original data; with standardized data, the denominators could be close to zero, which would inflate the HMAE values and render them meaningless. Since all six series (confirmed and death cases in three countries) are compared with each other and with the results of the other two models, using $Y_t$ in RMSE and MAE and $Y_t^o$ in HMAE is appropriate.
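For a fixed horizon $k$, the three metrics can be computed from aligned arrays of realized values and forecasts. The sketch below (hypothetical function name) takes both scales, since RMSE and MAE use $Y_t$ while HMAE uses $Y_t^o$. The original-scale values must be strictly positive for the division in HMAE.

```python
import numpy as np


def forecast_errors(y_std, y_std_hat, y_orig, y_orig_hat):
    """RMSE_k and MAE_k on the standardized scale, HMAE_k on the original scale."""
    e = np.asarray(y_std, dtype=float) - np.asarray(y_std_hat, dtype=float)
    rmse = np.sqrt(np.mean(e ** 2))
    mae = np.mean(np.abs(e))
    y_orig = np.asarray(y_orig, dtype=float)
    hmae = np.mean(np.abs((y_orig - np.asarray(y_orig_hat, dtype=float)) / y_orig))
    return rmse, mae, hmae


# Two-point toy example: errors (0, 1) standardized, relative errors (0, 0.5) original
rmse, mae, hmae = forecast_errors([1.0, 2.0], [1.0, 1.0], [10.0, 20.0], [10.0, 10.0])
```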
The three accuracy measures in Table 4 are computed from the formulas above with $m = 420$ and $k \in \{1, 2, 3\}$. Table 4 also reports comparisons with the existing ARIMA and SARIMA models. The ARIMA$(p, d, q)$ orders are selected by the best AIC value via the Python auto_arima function over the ranges $p, q \in \{0, 1, \ldots, 7\}$ and $d \in \{0, 1, 2\}$. For the SARIMA models, the seasonal period $s = 7$ is used, and the orders are also chosen by AIC. The best values are shown in bold in Table 4. The PPO-ARIMA models have the smallest errors in most cases; the exceptions are the HMAE of the one-step ahead forecasts for the USA and of the two- and three-step ahead forecasts for Germany. For Germany's COVID-19 data, the ARIMA model gives the best HMAE values for the two- and three-step ahead forecasts, possibly because of the relatively large values of the real data at the end of the sample, as seen in Figure 2.
Figure 5 depicts the one-step ahead forecasts of the PPO-ARIMA models for the last 420 days, together with their errors, for the USA, Germany, and Brazil, respectively. The one-step ahead forecasts fit the actual data well, despite some errors. Moreover, the periodic oscillations of the one-step ahead forecasts appear as strong as those of the actual data.
To assess how well the PPO-ARIMA model performs in terms of prediction errors compared to other models, we provide two results: illustrations of real values versus forecasts, and efficiency evaluations. First, Figure 6 shows the linear relationship between real values and forecasts, along with the slopes and $R^2$ values of the linear regressions for the three models. For the PPO-ARIMA models, the slopes are closer to one and the $R^2$ values are higher than for the other two models. Second, the efficiency of the PPO-ARIMA predictions is evaluated from the error values in Table 4. For an error function $f \in \{RMSE, MAE, HMAE\}$, the efficiencies $\mathrm{Effi}_A$ and $\mathrm{Effi}_S$ relative to the two benchmarks, the ARIMA and SARIMA models, are defined by
$$\mathrm{Effi}_A = 100 \times \frac{f_A - f_{PPO}}{f_{PPO}}, \qquad \mathrm{Effi}_S = 100 \times \frac{f_S - f_{PPO}}{f_{PPO}}$$
where the subscripts $A$, $S$, and $PPO$ denote the ARIMA, SARIMA, and PPO-ARIMA models, respectively. The efficiencies $\mathrm{Effi}_A$ and $\mathrm{Effi}_S$ of the PPO-ARIMA predictions are reported in Table 5. Because the SARIMA model imposes full periodic oscillation, it underperforms the ARIMA model, and the efficiencies relative to SARIMA are therefore larger than those relative to ARIMA. Note that the ARIMA models use order $p = 7$ in their AR parts, chosen by the best AIC values. Since the data do not have full periodic oscillation, the comparison with the SARIMA model might be somewhat unfair. To address this, a mechanism such as Markov regime switching might be incorporated into the SARIMA model. However, this would require extensive theoretical and empirical analysis and is therefore left for future study.
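As a worked example of the efficiency definition, the helper below (hypothetical name) evaluates $100 \times (f_{\text{bench}} - f_{PPO}) / f_{PPO}$. The denominator is the PPO-ARIMA error, so a benchmark error 1.5 times that of PPO-ARIMA corresponds to an efficiency of 50. The numbers used are made up, not values from Table 4.

```python
def efficiency(f_benchmark, f_ppo):
    """Effi = 100 * (f_benchmark - f_PPO) / f_PPO for one error metric."""
    return 100.0 * (f_benchmark - f_ppo) / f_ppo


effi_a = efficiency(0.30, 0.20)  # hypothetical ARIMA vs PPO-ARIMA RMSE values
```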
From the efficiency results in Table 5, we conclude that the proposed PPO-ARIMA model reduces the forecast errors (RMSE, MAE, and HMAE) for one-, two-, and three-step ahead forecasts. The superiority of the proposed model is reflected in the large efficiency values in Table 5: efficiencies of up to 46–58% relative to the ARIMA model and 65–70% relative to the SARIMA model are observed across RMSE, MAE, and HMAE. In addition, compared to the existing models, the PPO-ARIMA model achieves maximum improvements of 55–65% in RMSE, 58–70% in MAE, and 46–60% in HMAE for the one-step forecasts.
Finally, 95% prediction intervals for the one-step forecasts are constructed using a normal approximation. For the empirical analysis, the prediction intervals are drawn for the last 70 days of the 420 one-step forecasts shown in Figure 5 and are computed as
$$\left[ \hat{Y}_t(1) - z_{0.975}\, \hat{\sigma}_1,\ \hat{Y}_t(1) + z_{0.975}\, \hat{\sigma}_1 \right] \tag{5}$$
where $z_{0.975} = 1.96$ and $\hat{\sigma}_1^2$ is the one-step prediction variance given by $\hat{\sigma}_1^2 = \frac{1}{70}\sum_{i=1}^{70} \left(Y_{t_i+1} - \hat{Y}_{t_i}(1)\right)^2 - \left(\frac{1}{70}\sum_{i=1}^{70} \left(Y_{t_i+1} - \hat{Y}_{t_i}(1)\right)\right)^2$. The 95% prediction intervals for the last 70 days are illustrated in Figure 7. Most of the actual data fall within the prediction intervals; indeed, the 95% prediction intervals contain 94.28–98.57% of the actual data, close to the nominal coverage of 95%. The deviation between the nominal and empirical coverage is attributable to the small sample size of 70 used to construct the intervals and estimate the prediction variance; the empirical coverage is expected to approach the nominal level as this size increases. Figure 7 also shows that the prediction intervals oscillate with a periodicity of 7: the intervals in (5) all have the same length, $2 z_{0.975} \hat{\sigma}_1$, and they follow the oscillations of the one-step predicted values. For Germany, the last ten days contain somewhat large extreme actual values in both confirmed and death cases (see Figure 7); because of these extremes, the proposed model does not give the best HMAE values for Germany for the two- and three-step ahead forecasts in Table 4. Nevertheless, the 95% prediction intervals may need to be improved, since the residuals might not follow the normal distribution. Regarding such improvements, Ref. [28] discussed bootstrap improvements of prediction intervals for COVID-19 data, along with an approach based on the Laplace distribution. For the PPO-ARIMA model and its predictions, the improvement of prediction intervals is deferred to further study.
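The interval construction and its empirical coverage can be sketched as follows, assuming aligned arrays of the last 70 realized values and their one-step forecasts. `numpy.std` with its default `ddof=0` equals $\hat{\sigma}_1$ as defined above (the mean of squared errors minus the squared mean error). The function name is hypothetical.

```python
import numpy as np


def one_step_intervals(y, y_hat, z=1.96):
    """Normal-approximation intervals Y_hat_t(1) -/+ z * sigma_hat_1 and their coverage in %."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sigma1 = np.std(y - y_hat)          # ddof=0, matches the sigma_hat_1^2 formula
    lower, upper = y_hat - z * sigma1, y_hat + z * sigma1
    coverage = 100.0 * np.mean((y >= lower) & (y <= upper))
    return lower, upper, coverage


# Toy check: four forecasts with errors of +/- 0.1
y_true = np.array([1.0, 2.0, 3.0, 4.0])
errors = np.array([0.1, -0.1, 0.1, -0.1])
lower, upper, coverage = one_step_intervals(y_true, y_true - errors)
```

With 70 forecasts, each covered day contributes 1/70, or about 1.43 percentage points, to the coverage. The reported range of 94.28–98.57% therefore corresponds to roughly 66 to 69 of the 70 days.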
As discussed in Section 2, the choices of the threshold $x_0$ and the exponent $\delta$ are important because they can lead to overfitting or underfitting. Although $x_0$ is chosen as the minimum of the standardized data in this work, other real-world data may call for a different criterion, for instance, one based on the MSE, as discussed in Section 3.1. The plots of the weights $\omega_\ell$, $\ell \in \{0, 1, \ldots, \tau - 1\}$, in Figure 4, which show the 7-day periodicity of the COVID-19 data, mostly exhibit distinct periodic oscillations with large amplitudes. For datasets whose oscillations have smaller amplitudes, however, further analysis is needed, such as computing the standard errors of the estimates, which are not given in this work. Beyond the consistency of the estimators, their asymptotic distributions would have to be established to obtain these standard errors. This question of generalizability will be addressed in a future study.

4. Discussion and Conclusions

The scientific community should continue its efforts to predict and mitigate the COVID-19 pandemic with reliable scientific methods as long as the virus continues to spread globally. In particular, as discussed by [20], the high-frequency oscillatory patterns in COVID-19 infections and deaths should be incorporated into prediction analyses for a more comprehensive understanding and improved forecasting. A remarkable feature of these data, attributed to testing bias or human behavior in health systems, is the periodic oscillation observed in the most affected countries and continents, such as North America and Europe. As [22,23] noted, identifying such cyclical oscillations in COVID-19 time series data is a significant issue. Reliable forecasting of these oscillations would mark a notable advance in the modeling of the COVID-19 pandemic.
This study focused on forecasting COVID-19 data with 7-day cyclical fluctuations by combining the ARIMA model with a partial periodic oscillation model. Using the proposed predictive model, which relies on a straightforward mathematical approach, we predicted confirmed and death cases of COVID-19. The USA, Germany, and Brazil were selected for the empirical analysis because of the strong oscillatory patterns in their COVID-19 data. The model was estimated on new daily confirmed and death cases in these three countries, and out-of-sample forecasting experiments were conducted to evaluate prediction accuracy and to construct 95% prediction intervals.
To assess forecasting performance, three prediction accuracy measures were evaluated: the root mean square error (RMSE), the mean absolute error (MAE), and the heterogeneous MAE (HMAE). The RMSE, MAE, and HMAE of the one-, two-, and three-step ahead forecasts of COVID-19 confirmed/death cases were computed and compared with those of existing models, namely ARIMA models (with AR order p = 7) and SARIMA models (with 7-day periodicity), whose orders were selected by the AIC. The efficiencies of the PPO-ARIMA model relative to each of the two benchmarks were then evaluated. The results showed that our model improved on the ARIMA model by up to 58% and on the SARIMA model by up to 70%. More specifically, the PPO-ARIMA model improved predictions of daily COVID-19 cases by a maximum of 55–65% in RMSE, 58–70% in MAE, and 46–60% in HMAE, compared to the existing models.
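The efficiency measure reported in Table 5 is a simple relative difference of error measures. The sketch below, with hypothetical helper names, computes RMSE and MAE from paired actual and forecast values and the efficiency $100 \times (f_{\text{bench}} - f_{PPO})/f_{PPO}$. HMAE is left out because its definition is not restated in this section. Plugging in the one-step RMSE values for Brazil's confirmed cases from Table 4 reproduces the 55.93% entry of Table 5.

```python
import math


def rmse(actual, forecast):
    """Root mean square error of paired actual/forecast values."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error of paired actual/forecast values."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def efficiency(f_benchmark, f_ppo):
    """Effi = 100 * (f_benchmark - f_PPO) / f_PPO, as defined for Table 5.
    Positive values mean the PPO-ARIMA error is smaller than the benchmark's."""
    return 100.0 * (f_benchmark - f_ppo) / f_ppo


# One-step RMSE for Brazil confirmed cases (Table 4): ARIMA 0.4915, PPO-ARIMA 0.3152
print(round(efficiency(0.4915, 0.3152), 2))  # 55.93
```

A negative value, such as the one-step RMSE entry for USA confirmed cases against SARIMA, simply means the benchmark had the smaller error.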
Moreover, 95% prediction intervals of the one-step ahead forecasts were constructed for the six cases; the plots showed that the intervals contain 94.28–98.57% of the actual out-of-sample data and that the intervals themselves exhibit oscillation patterns.
The PPO-ARIMA model can be a practical tool for predicting the spread of the global COVID-19 pandemic. By providing more accurate statistical information, the results of this study can assist health institutions in allocating medical resources and developing emergency strategies. Hence, a contribution of this study is the identification and improved forecasting of partially weekly oscillating COVID-19 cases using the proposed model, coupled with a new mathematical approach. The PPO-ARIMA model is well-suited for data exhibiting partial oscillation, for which the SARIMA model may not be appropriate. Our model can also deliver robust results for fully oscillating data, for which the SARIMA model is suitable, because the values of the PPO part are proportional to the values of the ARIMA part. Therefore, the PPO-ARIMA model can offer strong performance on data with periodicity and seasonality, whether the data exhibit partial or full oscillation.
A limitation of this study concerns the residual analysis on which the prediction intervals are based. Because this work focuses on the partial periodicity of COVID-19 data, residual analysis is not the main focus of the paper. A natural complement would be a more refined construction of prediction intervals through estimation of the residual distribution; this topic will be addressed in future work. Another limitation is that only three countries, those with the strongest oscillations in the world, were analyzed. The PPO-ARIMA model could also be applied to data from other countries with weaker oscillations, and experiments on more general datasets are needed to confirm the robustness of the model.
A recent study of an exponential decay model by [11] showed its effectiveness for short-term forecasting. Our model also performs well in short-term forecasting by reflecting the 7-day periodicity. However, the two approaches differ: the PPO-ARIMA model emphasizes oscillation, which is a critical aspect of our study. An explicit comparison would be interesting but requires extensive experiments; therefore, it remains a topic for future study.
Three directions for further study related to the partial periodic oscillations of COVID-19 are suggested. First, in terms of time series modeling, other models such as the heterogeneous autoregression (HAR) model or nonparametric models could be adopted instead of the ARIMA model. As discussed by [28], the HAR model with lagged average regressors is suitable for smoothed COVID-19 data; thus, a model combining the HAR model with partial periodic oscillations might offer enhanced predictive ability. Second, exogenous variables can be added as significant regressors in the model, as in [30]. For example, the booster vaccination rate, which influences the spread of COVID-19, could be added as an explanatory variable. Third, from the perspective of the forecast error distribution, errors could be reduced through distributional inference. This work assumed a normal approximation for the residuals to construct prediction intervals; to obtain more accurate prediction intervals, the residual distribution could instead be inferred by a bootstrap procedure or a kernel method. A comparative analysis of various prediction intervals, evaluating their average length, empirical coverage probability, and mean interval score, could identify the best-performing intervals for the oscillatory patterns of COVID-19; this remains an area for future research. Overall, a variety of statistical extensions can be pursued for COVID-19 prediction. In this way, statistics can contribute to fostering a healthy society by providing insights into disease transmission through modeling and forecasting with reduced errors.
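For the comparison of prediction intervals suggested above, a minimal Python sketch of the three criteria follows: average length, empirical coverage, and the mean interval score of Gneiting and Raftery (2007). As one simple alternative to the normal approximation, the sketch also forms intervals from empirical quantiles of past one-step errors. This quantile rule is only an illustrative stand-in, not the bootstrap procedure of [28] or a kernel method, and all function names are hypothetical.

```python
import math


def interval_score(lower, upper, y, alpha=0.05):
    """Interval score (Gneiting and Raftery, 2007) of a central (1 - alpha) interval:
    its width plus a 2/alpha penalty per unit by which y falls outside it."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def evaluate_intervals(intervals, actual, alpha=0.05):
    """Return (average length, empirical coverage, mean interval score)."""
    n = len(actual)
    avg_length = sum(hi - lo for lo, hi in intervals) / n
    coverage = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actual)) / n
    mis = sum(interval_score(lo, hi, y, alpha) for (lo, hi), y in zip(intervals, actual)) / n
    return avg_length, coverage, mis


def empirical_quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    i = math.floor(pos)
    j = min(i + 1, len(sorted_values) - 1)
    return sorted_values[i] + (pos - i) * (sorted_values[j] - sorted_values[i])


def quantile_intervals(forecast, past_errors, alpha=0.05):
    """Replace the normal -/+ z*sigma band with empirical quantiles of past errors."""
    errs = sorted(past_errors)
    q_lo = empirical_quantile(errs, alpha / 2)
    q_hi = empirical_quantile(errs, 1 - alpha / 2)
    return [(f + q_lo, f + q_hi) for f in forecast]
```

Unlike the normal band, the quantile band need not be symmetric around the forecast, which is one way skewed or heavy-tailed residuals could show up in the interval. Evaluating both bands with `evaluate_intervals` on the same hold-out period gives the three criteria side by side.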

Funding

This work was supported partially by the Research Fund of Gachon University (GCU-202206300001) and by the National Research Foundation of Korea (NRF-2023R1A2C1005395).

Data Availability Statement

All datasets used in this study are available on the WHO website: http://covid19.who.int/data (accessed on 12 October 2023).

Acknowledgments

The author thanks the editor and four anonymous referees for their valuable comments.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Ribeiro, M.H.D.M.; Silva, R.G.D.; Mariani, V.C.; Coelho, L.D.S. Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil. Chaos Solitons Fractals 2020, 135, 109853.
2. Maleki, M.; Mahmoudi, M.; Wraith, D.; Pho, K. Time series modelling to forecast the confirmed and recovered cases of COVID-19. Travel. Med. Infect. Dis. 2020, 37, 101742.
3. Maleki, M.; Mahmoudi, M.R.; Heydari, M.H. Modeling and forecasting the spread and death rate of coronavirus (COVID-19) in the world using time series models. Chaos Solitons Fractals 2020, 140, 110151.
4. Sarkar, K.; Khajanchi, S.; Nieto, J.J. Modeling and forecasting the COVID-19 pandemic in India. Chaos Solitons Fractals 2020, 139, 110049.
5. Balli, S. Data analysis of COVID-19 pandemic and short-term cumulative case forecasting using machine learning time series methods. Chaos Solitons Fractals 2021, 142, 110512.
6. Ala’raj, M.; Majdalawieh, M.; Nizamuddin, N. Modeling and forecasting of COVID-19 using a hybrid dynamic model based on SEIRD with ARIMA corrections. Infect. Dis. Model. 2021, 6, 98–111.
7. Kumar, Y.; Koul, A.; Kaur, S.; Hu, Y.C. Machine learning and deep learning based time series prediction and forecasting of ten nations’ COVID-19 pandemic. SN Comput. Sci. 2022, 4, 91.
8. Fang, L.; Wang, D.; Pan, G. Analysis and estimation of COVID-19 spreading in Russia based on ARIMA model. SN Compr. Clin. Med. 2020, 2, 2521–2527.
9. Ilie, O.D.; Cojocariu, R.O.; Ciobica, A.; Timofte, S.I.; Mavroudis, I.; Doroftei, B. Forecasting the spreading of COVID-19 across Nine countries from Europe, Asia, and the American continents using the ARIMA Models. Microorganisms 2020, 8, 1158.
10. Toğa, G.; Atalay, B.; Toksari, M.D. COVID-19 prevalence forecasting using Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Networks (ANN): Case of Turkey. J. Infect. Public Health 2021, 14, 811–816.
11. Bartolomeo, N.; Trerotoli, P.; Serio, G. Short-term forecast in the early stage of the COVID-19 outbreak in Italy. Application of a weighted and cumulative average daily growth rate to an exponential decay model. Infect. Dis. Model. 2021, 6, 212–221.
12. Petropoulos, F.; Makridakis, S.; Stylianou, N. COVID-19: Forecasting confirmed cases and deaths with a simple time series model. Int. J. Forecast. 2022, 38, 439–452.
13. Lourenco, J.; Recker, M. Natural, persistent oscillations in a spatial multi-strain disease system with application to dengue. PLoS Comput. Biol. 2013, 9, e1003308.
14. Selvaraj, P.; Wenger, E.A.; Gerardin, J. Seasonality and heterogeneity of malaria transmission determine success of interventions in high-endemic settings: A modeling study. BMC Infect. Dis. 2018, 18, 413.
15. Polwiang, S. The time series seasonal patterns of dengue fever and associated weather variables in Bangkok (2003–2017). BMC Infect. Dis. 2020, 20, 208.
16. Yuan, H.; Kramer, S.C.; Lau, E.H.Y.; Cowling, B.J.; Yang, W. Modeling influenza seasonality in the tropics and subtropics. PLoS Comput. Biol. 2021, 17, e1009050.
17. Li, Z.; Zhang, T. Analysis of a COVID-19 epidemic model with seasonality. Bull. Math. Biol. 2022, 84, 146.
18. Ndlovu, M.; Moyo, R.; Mpofu, M. Modelling COVID-19 infection with seasonality in Zimbabwe. Phys. Chem. Earth Parts A/B/C 2022, 127, 103167.
19. Wiemken, T.L.; Khan, F.; Puzniak, L.; Yang, W.; Simmering, J.; Polgreen, P.; Nguyen, J.L.; Jodar, L.; McLaughlin, J.M. Seasonal trends in COVID-19 cases, hospitalizations, and mortality in the United States and Europe. Sci. Rep. 2023, 13, 3886.
20. Bukhari, Q.; Jameel, Y.; Massaro, J.M.; D’Agostino, R.B.; Khan, S. Periodic oscillations in daily reported infections and deaths for coronavirus disease 2019. JAMA Netw. Open 2020, 3, e2017521.
21. Bergman, A.; Sella, Y.; Agre, P.; Casadevall, A. Oscillations in U.S. COVID-19 incidence and mortality data reflect diagnostic and reporting factors. mSystems 2020, 5, e00544-20.
22. Dehning, J.; Zierenberg, J.; Spitzner, F.P.; Wibral, M.; Neto, J.P.; Wilczek, M.; Priesmann, V. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science 2020, 369, 160.
23. Huang, J.; Liu, X.; Zhang, L.; Zhao, Y.; Wang, D.; Gao, J.; Lian, X.; Liu, C. The oscillation-outbreak characteristic of the COVID-19 pandemic. Natl. Sci. Rev. 2021, 8, nwab100.
24. Soukhovolsky, V.; Kovalev, A.; Pitt, A.; Shulman, K.; Tarasova, O.; Kessel, B. The cyclicity of coronavirus cases: “Waves” and the “weekend effect”. Chaos Solitons Fractals 2021, 114, 110718.
25. Campi, G.; Bianconi, A. Periodic recurrent waves of Covid-19 epidemics and vaccination campaign. Chaos Solitons Fractals 2022, 160, 112216.
26. Simeonov, O.; Eaton, C.D. Modeling the drivers of oscillations in COVID-19 data on college campuses. Ann. Epidemiol. 2023, 82, 40–44.
27. Ekinci, A. Modeling and forecasting of growth rate of new COVID-19 cases in top nine affected countries: Considering conditional variance and asymmetric effect. Chaos Solitons Fractals 2021, 151, 111227.
28. Hwang, E. Prediction intervals of the COVID-19 cases by HAR models with growth rates and vaccination rates in top eight affected countries: Bootstrap improvement. Chaos Solitons Fractals 2022, 155, 111789.
29. Ceylan, Z. Estimation of COVID-19 prevalence in Italy, Spain and France. Sci. Total Environ. 2020, 729, 138817.
30. Selinger, C.; Choist, M.; Alison, S. Predicting COVID-19 incidence in French hospitals using human contact network analytics. Int. J. Infect. Dis. 2021, 111, 100–107.
Figure 1. USA: COVID-19 daily confirmed/death cases $Y_t$ with their 7-day smoothed data $X_t$ and PPO part $\Delta_t = Y_t - X_t$ of size $n = 651$ between 1 January 2021 and 13 October 2022, and the sample autocorrelation functions.
Figure 2. Germany: COVID-19 daily confirmed/death cases $Y_t$ with their 7-day smoothed data $X_t$ and PPO part $\Delta_t = Y_t - X_t$ of size $n = 651$ between 1 January 2021 and 13 October 2022, and the sample autocorrelation functions.
Figure 3. Brazil: COVID-19 daily confirmed/death cases $Y_t$ with their 7-day smoothed data $X_t$ and PPO part $\Delta_t = Y_t - X_t$ of size $n = 651$ between 1 January 2021 and 13 October 2022, and the sample autocorrelation functions.
Figure 4. The 7-day periodicity of the confirmed and death cases in the USA, Germany, and Brazil (repetition of $\{\omega_0, \omega_1, \ldots, \omega_6\}$; 0 = Friday on the horizontal axis).
Figure 5. One-step ahead forecasts of COVID-19 confirmed/death cases and one-step forecast errors for the last 420 days in the USA, Germany, and Brazil.
Figure 6. Real values vs. forecasted values by the ARIMA, SARIMA, and PPO-ARIMA models for the confirmed/death cases in the USA, Germany, and Brazil. Each plot gives the slopes and $R^2$ values of the linear regressions.
Figure 7. The 95% prediction intervals of the USA, Germany, and Brazil confirmed/death cases for the last 70 days.
Table 1. Statistics of daily confirmed (C) and death (D) cases with n = 651 days between 1 January 2021 and 13 October 2022; SD = standard deviation.
USAGermanyBrazil
CDCDCD
Mean116,581.671079.8150,280.45161.8541,745.69759.27
SD148,459.10969.7364,236.28165.3239,735.55862.98
Min827549208000
Median76,415.0703.020,841.0113.030,671.0361.0
Max1,265,5205061307,9351045298,4084249
Skewness3.851.31.832.132.161.63
Kurtosis17.741.012.886.337.032.25
Table 2. Results of the ADF test, orders of the ARIMA$(p, d, q)$ model, coefficient estimates $\hat{\phi}_1$, $\hat{\theta}_1$, $\hat{\theta}_2$, and standard errors (s.e.) of the ARIMA part $X_t$ in the PPO-ARIMA model $Y_t = X_t + \Delta_t + \varepsilon_t$, where $Y_t$ denotes the (standardized) COVID-19 confirmed (C)/death (D) case data from the USA, Germany, and Brazil, with n = 651 days between 1 January 2021 and 13 October 2022.
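| | USA C | USA D | Germany C | Germany D | Brazil C | Brazil D |
|---|---|---|---|---|---|---|
| ADF test statistic | −2.7255 | −3.9143 * | −1.6715 | −2.5841 | −3.967 * | −1.8058 |
| p-value | 0.0697 | 0.0019 * | 0.4458 | 0.0963 | 0.0016 * | 0.3776 |
| Orders $(p, d, q)$ | (1,1,2) | (1,0,1) | (1,1,1) | (1,1,2) | (1,0,2) | (1,1,2) |
| $\hat{\phi}_1$ (s.e.) | 0.9562 (0.005) | 0.9986 (0.004) | 0.4115 (0.028) | 0.9621 (0.005) | 0.9913 (0.003) | 0.9499 (0.017) |
| $\hat{\theta}_1$ (s.e.) | −0.7722 (0.014) | 0.2703 (0.025) | 0.4279 (0.025) | −0.4833 (0.016) | 0.1981 (0.019) | −0.6314 (0.029) |
| $\hat{\theta}_2$ (s.e.) | 0.2012 (0.021) | - | - | −0.1788 (0.016) | 0.2308 (0.027) | −0.1951 (0.022) |

Values marked with * indicate that the ADF test rejects unit-root non-stationarity at the 1% level.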
Table 3. Estimation results of parameters for the partial-periodic part $\Delta_t$ in the PPO-ARIMA model $Y_t = X_t + \Delta_t + \varepsilon_t$, where $Y_t$ is the (standardized) COVID-19 confirmed (C)/death (D) case data from the USA, Germany, and Brazil, with n = 651 days between 1 January 2021 and 13 October 2022.
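| | USA C | USA D | Germany C | Germany D | Brazil C | Brazil D |
|---|---|---|---|---|---|---|
| $x_0$ | −0.7295 | −1.063 | −0.7795 | −0.9788 | −1.0506 | −0.8798 |
| $\delta$ | 1.02 | 1.358 | 1.13 | 0.956 | 0.90 | 0.794 |
| $\omega_0$ | 0.2486 | 0.3020 | 0.2609 | 0.1287 | 0.2661 | 0.2453 |
| $\omega_1$ | 0.2113 | 0.1514 | 0.1188 | 0.0492 | 0.2660 | 0.2314 |
| $\omega_2$ | 0.2399 | 0.3126 | −0.3643 | −0.3501 | 0.1948 | 0.1355 |
| $\omega_3$ | −0.3816 | −0.3443 | −0.7460 | −0.5727 | −0.0373 | −0.0437 |
| $\omega_4$ | −0.5119 | −0.4915 | −0.1698 | 0.1421 | −0.4992 | −0.4823 |
| $\omega_5$ | −0.0525 | −0.2193 | 0.4784 | 0.3045 | −0.4243 | −0.4141 |
| $\omega_6$ | 0.1104 | 0.3235 | 0.4415 | 0.2448 | 0.1542 | 0.2450 |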
Table 4. Out-of-sample forecasting results and comparison: RMSE, MAE, and HMAE of k-step ahead forecasts ($k = 1, 2, 3$) for the last 420 days from the PPO-ARIMA models for the COVID-19 confirmed (C)/death (D) case data from the USA, Germany, and Brazil, compared with those of the ARIMA and SARIMA models.
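| Measure | Model | k | USA C | USA D | Germany C | Germany D | Brazil C | Brazil D |
|---|---|---|---|---|---|---|---|---|
| RMSE | ARIMA | 1 | 0.3335 | 0.3901 | 0.2457 | 0.2185 | 0.4915 | 0.1436 |
| RMSE | ARIMA | 2 | 0.6597 | 0.6155 | 0.6118 | 0.4143 | 0.6432 | 0.2039 |
| RMSE | ARIMA | 3 | 0.7821 | 0.8225 | 0.9098 | 0.5764 | 0.8162 | 0.2624 |
| RMSE | SARIMA | 1 | **0.3086** | 0.4115 | 0.2520 | 0.2155 | 0.4985 | 0.1635 |
| RMSE | SARIMA | 2 | 0.6947 | 0.7014 | 0.6262 | 0.4331 | 0.6388 | 0.2259 |
| RMSE | SARIMA | 3 | 0.8171 | 0.9384 | 0.9251 | 0.5977 | 0.8281 | 0.2832 |
| RMSE | PPO-ARIMA | 1 | 0.3153 | **0.2920** | **0.2027** | **0.1669** | **0.3152** | **0.0989** |
| RMSE | PPO-ARIMA | 2 | **0.4901** | **0.5348** | **0.4648** | **0.3419** | **0.5001** | **0.1865** |
| RMSE | PPO-ARIMA | 3 | **0.5306** | **0.7867** | **0.8325** | **0.5358** | **0.5857** | **0.1926** |
| MAE | ARIMA | 1 | 0.1729 | 0.2307 | 0.1446 | 0.1413 | 0.2609 | 0.0940 |
| MAE | ARIMA | 2 | 0.3397 | 0.4184 | 0.4093 | 0.2919 | 0.3772 | 0.1376 |
| MAE | ARIMA | 3 | 0.4578 | 0.6014 | 0.6365 | 0.4340 | 0.4888 | 0.1834 |
| MAE | SARIMA | 1 | **0.1571** | 0.2361 | 0.1459 | 0.1441 | 0.2813 | 0.1031 |
| MAE | SARIMA | 2 | 0.3521 | 0.4722 | 0.4164 | 0.3048 | 0.3837 | 0.1527 |
| MAE | SARIMA | 3 | 0.4817 | 0.6895 | 0.6485 | 0.4453 | 0.5107 | 0.1976 |
| MAE | PPO-ARIMA | 1 | 0.1615 | **0.1994** | **0.1287** | **0.1077** | **0.1650** | **0.0614** |
| MAE | PPO-ARIMA | 2 | **0.2672** | **0.3691** | **0.3008** | **0.2339** | **0.2670** | **0.1119** |
| MAE | PPO-ARIMA | 3 | **0.3133** | **0.5509** | **0.5599** | **0.3931** | **0.3207** | **0.1355** |
| HMAE | ARIMA | 1 | 0.2327 | **0.3421** | 0.2038 | 0.2526 | 0.4161 | 0.4318 |
| HMAE | ARIMA | 2 | 0.5237 | 0.7645 | **0.5549** | **0.4798** | 0.6921 | 0.7285 |
| HMAE | ARIMA | 3 | 0.8004 | 1.2584 | **0.9726** | **0.7974** | 0.9736 | 0.9928 |
| HMAE | SARIMA | 1 | **0.2081** | 0.3805 | 0.2063 | 0.2817 | 0.5245 | 0.4542 |
| HMAE | SARIMA | 2 | 0.5375 | 0.9115 | 0.5753 | 0.5315 | 0.7613 | 0.8382 |
| HMAE | SARIMA | 3 | 0.8292 | 1.5192 | 1.0082 | 0.8297 | 1.0043 | 1.0679 |
| HMAE | PPO-ARIMA | 1 | 0.2330 | 0.3998 | **0.2008** | **0.2157** | **0.3463** | **0.2946** |
| HMAE | PPO-ARIMA | 2 | **0.4450** | **0.7436** | 0.6071 | 0.5271 | **0.4756** | **0.5286** |
| HMAE | PPO-ARIMA | 3 | **0.5961** | **1.2010** | 1.1289 | 0.9019 | **0.6949** | **0.8046** |

Bold indicates the best (lowest) value for each measure, horizon k, and series.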
Table 5. Efficiency (%) of prediction by the PPO-ARIMA model relative to the ARIMA and SARIMA models, respectively, defined as $\mathrm{Effi}_A = 100 \times (f_A - f_{PPO})/f_{PPO}$ and $\mathrm{Effi}_S = 100 \times (f_S - f_{PPO})/f_{PPO}$, where $f \in \{\mathrm{RMSE}, \mathrm{MAE}, \mathrm{HMAE}\}$; A = ARIMA, S = SARIMA, and PPO = PPO-ARIMA model.
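| $f$ | Efficiency | k | USA C | USA D | Germany C | Germany D | Brazil C | Brazil D |
|---|---|---|---|---|---|---|---|---|
| RMSE | $\mathrm{Effi}_A$ | 1 | 5.80 | 33.59 | 21.21 | 30.92 | 55.93 | 45.19 |
| RMSE | $\mathrm{Effi}_A$ | 2 | 34.61 | 15.09 | 31.63 | 21.18 | 28.61 | 9.33 |
| RMSE | $\mathrm{Effi}_A$ | 3 | 47.40 | 4.55 | 9.29 | 7.57 | 39.35 | 36.24 |
| RMSE | $\mathrm{Effi}_S$ | 1 | −2.12 | 40.92 | 24.32 | 29.12 | 58.15 | 65.32 |
| RMSE | $\mathrm{Effi}_S$ | 2 | 41.75 | 32.15 | 34.72 | 26.67 | 27.72 | 21.17 |
| RMSE | $\mathrm{Effi}_S$ | 3 | 53.99 | 19.18 | 42.65 | 11.55 | 41.37 | 47.04 |
| MAE | $\mathrm{Effi}_A$ | 1 | 7.06 | 15.69 | 12.35 | 31.20 | 58.12 | 53.09 |
| MAE | $\mathrm{Effi}_A$ | 2 | 27.13 | 13.36 | 36.07 | 23.79 | 41.27 | 22.96 |
| MAE | $\mathrm{Effi}_A$ | 3 | 46.12 | 9.17 | 13.68 | 10.40 | 52.41 | 35.35 |
| MAE | $\mathrm{Effi}_S$ | 1 | −2.72 | 18.41 | 13.36 | 33.79 | 70.48 | 67.92 |
| MAE | $\mathrm{Effi}_S$ | 2 | 31.77 | 27.93 | 38.43 | 30.31 | 43.71 | 36.46 |
| MAE | $\mathrm{Effi}_S$ | 3 | 53.75 | 25.16 | 15.82 | 13.28 | 59.25 | 45.83 |
| HMAE | $\mathrm{Effi}_A$ | 1 | −0.13 | −14.32 | 1.49 | 17.11 | 20.16 | 46.57 |
| HMAE | $\mathrm{Effi}_A$ | 2 | 17.68 | 2.81 | −8.59 | −8.97 | 45.52 | 37.82 |
| HMAE | $\mathrm{Effi}_A$ | 3 | 34.27 | 4.78 | −13.84 | −11.59 | 40.11 | 23.39 |
| HMAE | $\mathrm{Effi}_S$ | 1 | −10.68 | −4.82 | 2.73 | 30.59 | 51.45 | 54.18 |
| HMAE | $\mathrm{Effi}_S$ | 2 | 20.78 | 22.58 | −5.23 | 0.84 | 60.09 | 58.56 |
| HMAE | $\mathrm{Effi}_S$ | 3 | 30.10 | 26.49 | −10.69 | −8.01 | 44.52 | 32.72 |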
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hwang, E. Improvement on Forecasting of Propagation of the COVID-19 Pandemic through Combining Oscillations in ARIMA Models. Forecasting 2024, 6, 18-35. https://doi.org/10.3390/forecast6010002

AMA Style

Hwang E. Improvement on Forecasting of Propagation of the COVID-19 Pandemic through Combining Oscillations in ARIMA Models. Forecasting. 2024; 6(1):18-35. https://doi.org/10.3390/forecast6010002

Chicago/Turabian Style

Hwang, Eunju. 2024. "Improvement on Forecasting of Propagation of the COVID-19 Pandemic through Combining Oscillations in ARIMA Models" Forecasting 6, no. 1: 18-35. https://doi.org/10.3390/forecast6010002
