# Improvement of Time Forecasting Models Using Machine Learning for Future Pandemic Applications Based on COVID-19 Data 2020–2022

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. ARIMA Modelling

#### 2.2. Support Vector Machines Modelling

#### 2.3. Least-Square Support Vector Machines Modelling

#### 2.4. Proposed Hybrid Model

#### 2.5. Proposed Algorithm

**Step 1**: Three time series of COVID-19 cases (1 October 2020–4 November 2022), namely daily new positive cases, daily new death cases, and daily new recovered cases, are prepared in the R programming language.

**Step 2**: The three datasets are denoted ${X}_{1}=\left\{{x}_{11},{x}_{12},{x}_{13},\dots ,{x}_{1n}\right\}$, ${X}_{2}=\left\{{x}_{21},{x}_{22},{x}_{23},\dots ,{x}_{2n}\right\}$, and ${X}_{3}=\left\{{x}_{31},{x}_{32},{x}_{33},\dots ,{x}_{3n}\right\}$ for daily new positive cases, daily new death cases, and daily new recovered cases, respectively. The best ARIMA (p, d, q) model for each series is then selected after checking the autocorrelation function (ACF) plot of its residuals. The best-fitting models are ARIMA (2, 1, 2) for daily new positive cases, ARIMA (1, 1, 2) for daily new death cases, and ARIMA (0, 1, 1) for daily new recovered cases.

**Step 3**: Obtain the lagged fitted values ${\mathcal{Y}}_{t-i}=\left({\mathcal{Y}}_{t-1},{\mathcal{Y}}_{t-2},\dots ,{\mathcal{Y}}_{t-m}\right)$ and the lagged residuals ${\epsilon}_{t-i}=\left({\epsilon}_{t-1},{\epsilon}_{t-2},\dots ,{\epsilon}_{t-n}\right)$ from the fitted ARIMA model.

**Step 4**: Combine the values from Step 3 into a set of input variables used to obtain the output ${\mathcal{Y}}_{t}$.

**Step 5**: The order q of the selected ARIMA (p, d, q) model is recorded for use in Step 9. Using the inputs from Step 4, support vector machine models are applied to the residuals to obtain the output ${\ell}_{t}$ in R.

**Step 6**: Fitted values of the hybrid ARIMA–support vector machine model are obtained for all sample data. The residuals ${\epsilon}_{t}$ are then generated to obtain the forecasting result ${\widehat{\mathcal{N}}}_{t}$.

**Step 7**: The framed data are split randomly into training and testing sets for further support vector machine modelling, and the procedure is run using the "e1071" and "liquidSVM" packages in R.

**Step 8**: The two tunable parameters of the LSSVM technique, the regularization parameter γ and the kernel width σ, are determined by minimizing an objective function, here the mean square error (MSE). A grid search varies the parameters exponentially over a specified range using predetermined equidistant steps.

**Step 9**: Using the split data as the processing data and the order q from Step 5, the combined forecast is given by Equation (16): ${\widehat{\mathcal{Y}}}_{t}={\widehat{\ell}}_{t}+{\widehat{\mathcal{N}}}_{t}$.

**Step 10**: Evaluate model performance using four statistical measures: MSE, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
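
The input construction in Steps 3 and 4 and the combination in Equation (16) can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation. It assumes the ARIMA fitted values are already available, takes the residuals as observed minus fitted values, and assumes that m lags of the fitted values and q lags of the residuals (q being the order recorded in Step 5) form each input row. The function names `make_lagged_inputs` and `combine_forecast` are hypothetical, and fitting the support vector machine models is left out.

```python
from typing import List, Sequence, Tuple


def make_lagged_inputs(
    observed: Sequence[float], fitted: Sequence[float], m: int, q: int
) -> Tuple[List[List[float]], List[float]]:
    """Build input rows (Steps 3-4) from ARIMA fitted values and residuals.

    Row for time t: [fitted[t-1], ..., fitted[t-m], resid[t-1], ..., resid[t-q]],
    with target observed[t]. Residuals are taken as observed - fitted (assumption).
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    residuals = [y - f for y, f in zip(observed, fitted)]
    start = max(m, q)  # first index with a complete set of lags
    inputs, targets = [], []
    for t in range(start, len(observed)):
        lagged_fit = [fitted[t - i] for i in range(1, m + 1)]
        lagged_res = [residuals[t - i] for i in range(1, q + 1)]
        inputs.append(lagged_fit + lagged_res)
        targets.append(observed[t])
    return inputs, targets


def combine_forecast(linear: Sequence[float], nonlinear: Sequence[float]) -> List[float]:
    """Equation (16): combined forecast = linear part + nonlinear part, pointwise."""
    if len(linear) != len(nonlinear):
        raise ValueError("component forecasts must have the same length")
    return [lin + nonlin for lin, nonlin in zip(linear, nonlinear)]
```

In a full pipeline, the rows returned by `make_lagged_inputs` would be passed to the SVM or LSSVM learner, and its forecasts added to the ARIMA forecasts with `combine_forecast`.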

#### 2.6. Forecasting Evaluation Criteria

## 3. Results and Discussion

#### 3.1. Application of the Hybrid Model of COVID-19 in Malaysia

#### 3.1.1. New Positive Cases Data Forecasts

#### 3.1.2. New Deaths Cases Data Forecasts

#### 3.1.3. New Recovered Cases Data Forecasts

## 4. Conclusions

## 5. Limitations and Future Recommendation

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

**Table 1.** Descriptive statistics of COVID-19 daily new cases, deaths, and recovered cases in Malaysia.

Statistic | New Case | New Death | New Recovered |
---|---|---|---|
Min | $2.60000 \times 10^{2}$ | 0 | 1.8 |
1st Qu. | $1.9220 \times 10^{3}$ | 4 | $1.8430 \times 10^{3}$ |
Median | $3.4710 \times 10^{3}$ | 11 | $3.4470 \times 10^{3}$ |
Mean | $6.4155 \times 10^{3}$ | 47.5098 | $6.3227 \times 10^{3}$ |
3rd Qu. | $6.8240 \times 10^{3}$ | 58 | $6.7750 \times 10^{3}$ |
Max | $3.3406 \times 10^{4}$ | 592 | $3.3872 \times 10^{4}$ |
SD | $7.0978 \times 10^{3}$ | 81.1215 | $7.0583 \times 10^{3}$ |
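
Summary statistics in the layout of Table 1 can be computed along the following lines. The "1st Qu"/"3rd Qu" labels suggest R's `summary()`, so this sketch assumes R's default conventions: type-7 quartiles and the sample standard deviation of `sd()`. Python's `statistics.quantiles(..., method="inclusive")` and `statistics.stdev` follow these conventions. The function name `describe` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def describe(x: Sequence[float]) -> Dict[str, float]:
    """Table 1-style summary with R-like (type 7) quartiles and sample SD."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    # method="inclusive" interpolates between order statistics like R's default type 7
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "Min": min(x),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(x),
        "3rd Qu.": q3,
        "Max": max(x),
        "SD": statistics.stdev(x),  # sample standard deviation (divisor n - 1), as R's sd()
    }
```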

COVID-19 Daily Cases | ARIMA (p, d, q) | AIC | BIC |
---|---|---|---|
Daily New Positive Cases | (2, 1, 2) | 12,564.54 | 12,587.73 |
Daily New Death Cases | (1, 1, 2) | 6930.12 | 6948.63 |
Daily New Recovered Cases | (0, 1, 1) | 13,044.74 | 13,054.01 |
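
In Step 2, the ARIMA orders are chosen partly by inspecting the ACF plot of the residuals. The check can be approximated numerically with the sketch below. It computes sample autocorrelations with the same normalisation as R's `acf()` and flags lags outside the approximate 95% white-noise band ±1.96/√n, which R draws on ACF plots by default. The function names and the default `max_lag` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (lag-k covariance over lag-0 covariance)."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(x) - 1")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def exceeding_lags(residuals: Sequence[float], max_lag: int = 20, z: float = 1.96) -> List[int]:
    """Lags whose |r_k| lies outside the approximate white-noise band +/- z/sqrt(n)."""
    bound = z / math.sqrt(len(residuals))
    return [k for k, r in enumerate(acf(residuals, max_lag), start=1) if abs(r) > bound]
```

An empty result is consistent with white-noise residuals. Even for true white noise, about 5% of lags are expected to fall outside the band by chance, so isolated exceedances at high lags are usually not decisive.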

Model Parameters | Estimate | Z-Stat | p-Value |
---|---|---|---|
New Case ARIMA (2, 1, 2) | | | |
${\theta}_{1}$ | 1.2408 | 120.085 | $<2.2 \times 10^{-16}$ |
${\theta}_{2}$ | −0.9715 | −98.320 | $<2.2 \times 10^{-16}$ |
${\phi}_{1}$ | −1.2628 | −42.225 | $<2.2 \times 10^{-16}$ |
${\phi}_{2}$ | 0.8738 | 48.102 | $<2.2 \times 10^{-16}$ |
Recovered Case ARIMA (0, 1, 1) | | | |
${\phi}_{1}$ | −0.3473 | −9.953 | $<2.2 \times 10^{-16}$ |
Death Case ARIMA (1, 1, 2) | | | |
${\theta}_{1}$ | 0.8595 | 19.852 | $<2.2 \times 10^{-16}$ |
${\phi}_{1}$ | −1.6196 | −35.651 | $<2.2 \times 10^{-16}$ |
${\phi}_{2}$ | 0.7039 | 20.432 | $<2.2 \times 10^{-16}$ |

COVID-19 Daily Cases | LSSVM Parameters | MSE |
---|---|---|
Daily New Positive Cases | γ = 11, σ = 0.008 | 11,432,512 |
Daily New Positive Cases | γ = 38, σ = 0.008 | 10,235,488 |
Daily New Positive Cases | γ = 74, σ = 0.008 | 9,025,413 |
Daily New Positive Cases | γ = 110, σ = 0.008 | 8,014,123 |
Daily New Positive Cases | γ = 264, σ = 0.008 | 6,661,412 |
Daily New Death Cases | γ = 25, σ = 0.006 | 1678.364 |
Daily New Death Cases | γ = 56, σ = 0.006 | 1233.481 |
Daily New Death Cases | γ = 277, σ = 0.006 | 965.143 |
Daily New Death Cases | γ = 436, σ = 0.006 | 554.368 |
Daily New Death Cases | γ = 877, σ = 0.006 | 250.887 |
Daily New Recovered Cases | γ = 54, σ = 0.008 | 28,412,113 |
Daily New Recovered Cases | γ = 89, σ = 0.008 | 27,140,039 |
Daily New Recovered Cases | γ = 125, σ = 0.008 | 26,412,142 |
Daily New Recovered Cases | γ = 275, σ = 0.008 | 23,032,256 |
Daily New Recovered Cases | γ = 334, σ = 0.008 | 21,114,252 |
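
Step 8 tunes the LSSVM parameters γ and σ by grid search on MSE, with results like those in the table above. The pure-Python sketch below illustrates the idea. It fits an LSSVM regressor by solving the standard LSSVM linear system with an RBF kernel, scans an exponentially spaced grid (equidistant steps on a log₂ scale), and keeps the pair with the lowest validation MSE. The kernel form exp(−‖x−z‖²/(2σ²)), the grid ranges, and all function names are assumptions rather than the authors' R settings. The dense O(n³) solver is only practical for small samples.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rbf(x: Vector, z: Vector, sigma: float) -> float:
    """Gaussian (RBF) kernel, assumed form exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for small systems)."""
    n = len(A)
    M = [row[:] + [value] for row, value in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X: Sequence[Vector], y: Vector, gamma: float, sigma: float) -> Callable[[Vector], float]:
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] and return the predictor."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the kernel diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


def mse(actual: Vector, predicted: Vector) -> float:
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def grid_search(X_tr, y_tr, X_val, y_val, gammas, sigmas) -> Tuple[float, float, float]:
    """Return (validation MSE, gamma, sigma) for the best grid point."""
    best = None
    for gamma, sigma in product(gammas, sigmas):
        model = lssvm_fit(X_tr, y_tr, gamma, sigma)
        err = mse(y_val, [model(x) for x in X_val])
        if best is None or err < best[0]:
            best = (err, gamma, sigma)
    return best


# Exponentially spaced grids: equidistant steps on a log2 scale (illustrative ranges).
GAMMAS = [2.0 ** k for k in range(0, 9)]
SIGMAS = [2.0 ** k for k in range(-3, 2)]
```

Refining the grid around the best point, as the γ values in the table suggest, is a common second pass once a coarse exponential scan has located the promising region.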

Models | Train MSE | Train MAE | Test MSE | Test MAPE | Test RMSE | Test MAE |
---|---|---|---|---|---|---|
ARIMA | 929,843.169 | 611.0274 | 298,988.28 | 0.15167 | 546.7982 | 397.57 |
SVM | 8,355,184.483 | 2001.644 | 274,588.16 | 0.15421 | 524.0116 | 390.3848 |
LSSVM | 1084.1527 | 739.5387 | 83,026.550 | 0.07580 | 288.1432 | 205.6450 |
ARIMA–SVM | 42,552.7137 | 90.34845 | 61,223.474 | 0.05633 | 247.4337 | 146.9841 |
ARIMA–LSSVM | 10,634.1142 | 46.54471 | 25,478.114 | 0.01547 | 159.6182 | 75.6987 |
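
The four criteria listed in Step 10 can be written as small functions. This sketch returns MAPE as a fraction rather than a percentage, because the MAPE values in the performance tables (e.g., 0.15167 for ARIMA) appear to be on that scale. That reading is an assumption. MAPE is undefined when an actual value is zero, which can happen in the daily death series (Table 1 reports a minimum of 0). The sketch therefore raises an error in that case rather than silently dropping points.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, the square root of MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

As a consistency check, each test RMSE in the table above equals the square root of the corresponding test MSE; for ARIMA, √298,988.28 ≈ 546.798.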

**Table 6.** Percentage improvement (%) of the proposed ARIMA–LSSVM model over other forecasting models (daily new positive COVID-19 cases).

Model | MAE | MAPE | MSE | RMSE |
---|---|---|---|---|
ARIMA | 80.9596549 | 89.80022417 | 91.47855762 | 70.80857252 |
SVM | 80.60920917 | 89.96822515 | 90.72133554 | 69.53918577 |
LSSVM | 63.18962289 | 79.59102902 | 69.31329316 | 44.60455773 |
ARIMA–SVM | 48.49871517 | 72.5368365 | 58.38505669 | 35.49051726 |
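
The entries of Table 6 are consistent with the relative error reduction $\mathrm{PI}=\left(E_{\mathrm{ref}}-E_{\mathrm{ARIMA–LSSVM}}\right)/E_{\mathrm{ref}}\times 100$, where $E$ is a test-set error from the performance table for daily new positive cases. The formula is inferred from the numbers rather than stated in the text. The sketch below recomputes the ARIMA row from those test errors. The function and variable names are hypothetical.

```python
from typing import Dict


def percentage_improvement(reference: float, proposed: float) -> float:
    """Relative reduction (%) of an error metric; positive means `proposed` is better."""
    if reference == 0:
        raise ValueError("reference error must be non-zero")
    return (reference - proposed) / reference * 100.0


# Test-set errors for daily new positive cases (ARIMA and ARIMA-LSSVM rows).
ARIMA_TEST: Dict[str, float] = {"MAE": 397.57, "MAPE": 0.15167, "MSE": 298988.28, "RMSE": 546.7982}
ARIMA_LSSVM_TEST: Dict[str, float] = {"MAE": 75.6987, "MAPE": 0.01547, "MSE": 25478.114, "RMSE": 159.6182}

# One percentage improvement per metric, as in the ARIMA row of Table 6.
PI_ARIMA = {k: percentage_improvement(ARIMA_TEST[k], ARIMA_LSSVM_TEST[k]) for k in ARIMA_TEST}
```

Each value in `PI_ARIMA` agrees with the ARIMA row of Table 6 to at least three decimal places. The same check identifies the last rows of the death and recovered improvement tables as comparisons against ARIMA–SVM.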

Models | Train MSE | Train MAE | Test MSE | Test MAPE | Test RMSE | Test MAE |
---|---|---|---|---|---|---|
ARIMA | 697.999 | 11.8083 | 6.06741 | 0.56838 | 2.46321 | 1.92791 |
SVM | 1409.19 | 21.8006 | 5.38920 | 0.53687 | 2.32146 | 1.85605 |
LSSVM | 505.181 | 11.4309 | 5.38920 | 0.53687 | 2.32146 | 1.85605 |
ARIMA–SVM | 49.4459 | 3.53812 | 0.92630 | 0.19088 | 0.96303 | 0.76230 |
ARIMA–LSSVM | 19.6422 | 1.03218 | 0.89114 | 0.18741 | 0.94400 | 0.72364 |

**Table 8.** Percentage improvement (%) of the proposed ARIMA–LSSVM model over other forecasting models (daily new COVID-19 death cases).

Model | MAE | MAPE | MSE | RMSE |
---|---|---|---|---|
ARIMA | 62.46505283 | 67.02734086 | 85.31267872 | 61.67602437 |
SVM | 61.60592539 | 66.73588924 | 83.92334934 | 59.90434808 |
LSSVM | 61.01182619 | 65.09210796 | 83.46433608 | 59.33593514 |
ARIMA–SVM | 5.071494162 | 1.81789606 | 3.795746518 | 1.976054744 |

**Table 9.** Performance measures of the forecasting models for the daily new recovered COVID-19 cases dataset.

Models | Train MSE | Train MAE | Test MSE | Test MAPE | Test RMSE | Test MAE |
---|---|---|---|---|---|---|
ARIMA | 1,802,678.36 | 804.4378 | 271,462.22 | 0.1560 | 521.0203 | 387.2768 |
SVM | 7,636,804.13 | 1890.917 | 239,672.00 | 0.1504 | 489.5630 | 371.6573 |
LSSVM | 1,206,113.52 | 723.9413 | 149,871.53 | 0.1127 | 387.1324 | 285.9190 |
ARIMA–SVM | 99,205.699 | 136.8519 | 26,108.02 | 0.0396 | 161.5797 | 104.1002 |
ARIMA–LSSVM | 47,602.551 | 80.2214 | 13,004.11 | 0.0125 | 114.0351 | 54.14471 |

**Table 10.** Percentage improvement (%) of the proposed ARIMA–LSSVM model over other forecasting models (daily new recovered COVID-19 cases).

Model | MAE | MAPE | MSE | RMSE |
---|---|---|---|---|
ARIMA | 86.01911863 | 91.98717949 | 95.20960596 | 78.11311767 |
SVM | 85.43154944 | 91.68882979 | 94.57420558 | 76.70675684 |
LSSVM | 81.06291992 | 88.90860692 | 91.32316191 | 70.54364347 |
ARIMA–SVM | 47.98789051 | 68.43434343 | 50.19112901 | 29.42485968 |

