1. Introduction
The COVID-19 pandemic is the most significant global crisis since the Second World War. The disease is spreading across the globe at a surprisingly fast rate, affecting more than 213 countries, infecting more than 545,900,772 people and leading to 6,343,950 deaths worldwide as of 22 June 2022, according to the World Health Organization (WHO) [
1].
COVID-19 is an infectious disease caused by the emergence of the new coronavirus in Wuhan, China, in December 2019. Four to five days after a person contracts the virus, symptoms typically appear. However, in some cases, the onset of symptoms can take up to two weeks. Some individuals never even exhibit any symptoms. The most common symptoms of COVID-19 are fever, cough, shortness of breath, fatigue, shaking chills, muscle pains, headaches, sore throats, runny or stuffy noses, and issues with taste or smell (
Figure 1). If a patient has some of the symptoms presented in
Figure 1, they are asked to get tested immediately. Saudi Arabia adopted the two tests approved by the U.S. Food and Drug Administration (FDA) for diagnosing COVID-19, namely the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test and the antigen test. The RT-PCR test, also called a molecular test, detects the genetic material of the virus using a laboratory technique called reverse transcription polymerase chain reaction. A medical expert inserts a nasal swab into the patient’s nostril and takes a fluid sample from the back of the nose. When properly conducted by a medical specialist, RT-PCR tests are quite accurate; however, the rapid test may miss some cases. If the patient is infected with the virus at the time of the test, the result will reveal its presence. The test may still detect remnants of the virus even when the patient is no longer sick. The antigen test detects certain proteins of the virus. It is fast but less accurate than RT-PCR: there is a higher likelihood of false-negative results, meaning that a person can be infected and still test negative. Depending on the circumstances, the medical professional might advise performing an RT-PCR test to confirm a negative antigen test result.
The disease symptoms encountered in Saudi Arabia were found to be common and similar to those encountered worldwide. As illustrated in
Figure 1, such symptoms subside in a few days to weeks. However, a small percentage of infected persons develop severe illness and lose the ability to breathe on their own. In extreme circumstances, their organs fail, which can be fatal. The effect of the disease depends on the age and health of the infected individual. In particular, people who have cardiac illness, chronic obstructive pulmonary disease, asthma, high blood pressure, or weakened immune systems may experience severe complications. In addition to these health consequences, the spread of the disease has also had severe effects on social and economic systems [
2,
3,
4]. Because many economic activities were halted, many people have lost their jobs and many companies have closed. The dynamics of the disease’s spread can be explained by several factors, including the demographic distribution of the population, the efficiency of the public healthcare system, the mitigation countermeasures taken by local health authorities and the availability of vaccines. The COVID-19 pandemic’s evolution, like that of earlier pandemics, is not entirely random. The path of the disease resembles a life cycle, with the outbreak followed by an acceleration phase, an inflection point, a deceleration phase, and finally a stop or termination. The occurrence of novel variants of the virus such as Omicron or Delta, as well as other factors such as the organization of vaccination campaigns, may all be linked to the spread of the disease. Due to preventative measures, such as lockdowns and social distancing, the disease life cycle may differ from one country to another, and various countries may be in different phases at a given moment. Within the same country, the disease may behave differently from one region to another, most likely as a result of sociological and climatic factors. Research is currently being undertaken using various mathematical models to forecast the progression of the pandemic and to characterize its dynamics in order to aid in understanding its trajectory through time. The Susceptible-Infectious-Removed (SIR) class of compartmental modelling techniques, developed by Kermack and McKendrick [
5] almost a century ago, is one of the most popular of these models. SIR models have played, and continue to play, a key role in the study of infectious diseases. A given population is divided into “Susceptible”, “Infectious”, and “Removed” compartments, which are related by dynamic interactions represented by non-linear ordinary differential equations (ODEs). Authors in [
6,
7,
8] used SIR models to forecast the COVID-19 outbreak in India, Algeria and Saudi Arabia, respectively. The simulation results showed that interventions are necessary for flattening the disease propagation curve, delaying the peak, and lowering the fatality rate.
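To make the compartmental structure concrete, the following is a minimal sketch of the classical SIR equations, dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I and dR/dt = gamma*I, integrated with a simple forward Euler scheme. The function names, the population size, the step size and the values of beta and gamma are illustrative assumptions; they are not fitted to any data and are not taken from the cited studies. The hypothetical intervention is represented only as a lower contact rate beta.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + i0 + r0                      # total population, constant over time
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I
            new_removals = gamma * i * dt            # flow I -> R
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


def peak_day(history):
    """Day on which the number of infectious individuals is largest."""
    return max(range(len(history)), key=lambda day: history[day][1])


# Assumed, illustrative parameters (not fitted to any real data).
baseline = simulate_sir(beta=0.5, gamma=0.2, s0=999_990, i0=10, r0=0, days=300)
# A hypothetical intervention is modeled as a lower contact rate beta.
mitigated = simulate_sir(beta=0.3, gamma=0.2, s0=999_990, i0=10, r0=0, days=300)

print("baseline peak day:", peak_day(baseline))
print("mitigated peak day:", peak_day(mitigated))
```

Under these assumed parameters, the run with the lower contact rate reaches a smaller peak of infectious individuals at a later date, which is the "flattening" and "delaying" effect described above.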
The family of “econometric models” has also been found to be effective for predicting the spread of COVID-19. The time series Auto-Regressive Integrated Moving Average (ARIMA) model is the best-known member of this family. The prevalence and incidence of COVID-19 were predicted by the authors in [
9] using the ARIMA model and the Johns Hopkins epidemiological data. Authors in [
10,
11,
12,
13,
14] forecast the spread of the COVID-19 pandemic using the ARIMA model under current public health interventions in Saudi Arabia, Kuwait, Egypt, Korea and Morocco, respectively. The prediction accuracy of the ARIMA model was found to be adequate. Authors in [
15] concluded that the ARIMA model, when compared to the autoregressive (AR) model, provides the best fit for predicting new cases in India. The authors of [
16] sought to identify the best models for forecasting daily new cases. The daily new cases in India and the US were fitted using ARIMA and a hybrid ARIMA model. In India, the ARIMA model’s predictions were the most accurate; however, in the US, the hybrid ARIMA model performed better.
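As an illustration of what an ARIMA(p, d, q) model does, the sketch below implements only the simplest non-trivial case, ARIMA(1, 1, 0), in plain Python: the series is differenced once (the "integrated" part, d = 1), an AR(1) model with intercept is fitted to the differences by ordinary least squares (p = 1), and no moving-average term is used (q = 0). The function names and the synthetic series are assumptions made for this example; the studies cited above rely on full ARIMA implementations with properly selected orders.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0): an AR(1) with intercept on the first differences (OLS)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]          # regress each difference on the previous one
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((v - mean_x) ** 2 for v in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x                  # AR(1) coefficient
    const = mean_y - phi * mean_x         # intercept
    return const, phi


def forecast_arima_110(series, const, phi, steps):
    """Forecast future levels by predicting differences and summing them back."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = const + phi * last_diff   # predicted next difference
        level += last_diff                    # undo the differencing
        forecasts.append(level)
    return forecasts


# Synthetic cumulative series whose differences follow d_t = 2 + 0.5 * d_(t-1).
diffs = [10.0]
for _ in range(9):
    diffs.append(2.0 + 0.5 * diffs[-1])
series = [100.0]
for d in diffs:
    series.append(series[-1] + d)

const, phi = fit_arima_110(series)
forecasts = forecast_arima_110(series, const, phi, steps=3)
print(const, phi, forecasts)
```

Because the synthetic differences obey the AR(1) recursion exactly, the least-squares fit recovers the generating intercept and coefficient up to floating-point error; real case counts are noisy and would not be reproduced this closely.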
As an alternative to epidemiological and time series models, machine learning models showed potential in predicting COVID-19, as they did for modeling other outbreaks. References [
17,
18,
19,
20] provide overviews of research using mathematical, machine learning and deep learning models to detect, diagnose, or forecast COVID-19. Authors in [
21,
22,
23,
24] used Support Vector Machines, variants of Recurrent Neural Networks (RNN) and variants of Long Short-Term Memory (LSTM) networks for predicting COVID-19. These algorithms were found to be effective because they can capture non-linear patterns in the datasets.
References [
25,
26,
27] trained the Facebook Prophet model in order to examine and predict the number of COVID-19 cases and fatalities based on the previously available data.
The above literature review shows that a plethora of techniques from the fields of statistics, data science, machine learning and artificial intelligence [
28] have been used for COVID-19 prediction. To the best of the authors’ knowledge, the accuracy of time-series and machine learning models is sensitive to the case study as well as to the time window covered by the datasets used. Saudi Arabia is a country with rapid economic growth; it is visited annually by millions of Muslims performing Hajj and Umrah, and it hosts immigrants from different countries, cultures and religions. Saudi Arabia also has a large territory exhibiting various climatic conditions. It therefore combines several characteristics that make it a suitable case study, representative of Arab, Muslim, developing and oil-producing countries. In addition, Saudi Arabia has had a relatively successful experience in mitigating the disease.
For the reasons cited above, this study’s main aim is to investigate statistical and machine learning-inspired time series approaches for modeling and analyzing the spread of COVID-19 in Saudi Arabia in terms of the numbers of confirmed, recovered and death cases. More specifically, ARIMA and Facebook’s Prophet approaches are developed and then compared. In addition, healthcare implications of the Saudi experience in facing the disease are analyzed along the timeline of the pandemic, particularly with regard to the countermeasures taken. Saudi Arabia announced on Monday, 13 June 2022, that it was relaxing the restrictions on the use of face masks in enclosed spaces, with the exception of the Grand Mosque in Mecca and the Prophet’s Mosque in Medina, as well as medical facilities, public gatherings, sporting events, flights, and public transportation. This study aims to evaluate the short-term impacts of this decision.
The rest of this paper is structured as follows. In
Section 2, a data description and the proposed methods are introduced. Detailed experiments are outlined in
Section 3.
4. Discussion
To date, over 615 million confirmed cases have been reported, and over 6 million people have died worldwide. COVID-19 seriously affects people’s physical and mental health, as well as everyday life and the world economy. Given that COVID-19 is currently a serious global crisis, it is essential to fully comprehend the pandemic curve and forecast its future trajectory. An accurate COVID-19 forecast may have various advantages: it can help to preserve lives, minimize financial losses, manage medical and human resources in healthcare, and revive the world economy. The total number of cases of COVID-19 (or any other disease) is a common example of time series data, and a variety of analysis techniques are available to help identify patterns or forecast trends in such data. Time series analysis has proven successful with statistical methods (AR, MA, ARIMA, etc.), machine and deep learning methods (ANN, LSTM, etc.), and many other methods. Some of these techniques have already provided significant insights into the dynamics of infectious disease transmission and its surveillance. Records of new infections, recoveries and deaths are typically released daily, so the available series remain relatively short, and data-demanding techniques such as deep learning (DL) tend to perform poorly when predicting the future trend of the disease. However, statistical techniques such as ARIMA and straightforward machine learning (ML) techniques such as Facebook’s Prophet continue to show good performance. To the best of the authors’ knowledge, this is the first study to apply ARIMA and Facebook’s Prophet to more than two years of COVID-19 pandemic data in Saudi Arabia. This relatively long period was marked by several events, such as the vaccination campaigns organized in many countries, the emergence of new variants of the virus, and the relaxation of lockdown measures and mask-wearing requirements.
For every time series forecasting task, establishing a baseline is crucial. Before starting any forecasting exercise, a baseline model should first be investigated, since it offers a basis for comparison for any additional forecasting approach. A newly developed approach should be modified or replaced if its performance is at the same level as or below that of the baseline model. The persistence or “Zero Rule” algorithm, also called the naive model, is the most commonly used baseline. It uses the value at the current time step (t) to predict the expected outcome at the next time step (t + 1) [
33]. In this study, Facebook’s Prophet and the ARIMA models are compared to the naive model.
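The persistence baseline and the error metrics used to compare models against it can be sketched in a few lines of plain Python. The example below uses RMSE, MAE and the coefficient of determination (R²), the latter computed as 1 minus the ratio of the residual to the total sum of squares. The function names and the short series of daily counts are made up for illustration and are not data from this study.

```python
import math


def persistence_forecast(series):
    """Naive model: the prediction for time t + 1 is the observed value at time t."""
    return series[:-1]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up daily counts, for illustration only.
daily_cases = [100, 120, 150, 190, 220, 260]
actual = daily_cases[1:]                       # values to be predicted
predicted = persistence_forecast(daily_cases)  # previous day's value

baseline_rmse = rmse(actual, predicted)
baseline_mae = mae(actual, predicted)
baseline_r2 = r_squared(actual, predicted)
print(baseline_rmse, baseline_mae, baseline_r2)
```

Any candidate model would be evaluated on the same `actual` values with the same three functions, and retained only if it improves on these baseline scores.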
Table 10 includes the RMSE, MAE, and R² metrics for each model used in this study. The table clearly shows that the ARIMA model outperforms both Facebook’s Prophet and the baseline model at predicting confirmed and recovered cases of COVID-19, since it has the highest R² and the lowest MAE. However, the Prophet model performs better at predicting death cases. Overall, the performances of the ARIMA and Prophet models are comparable. Both models predict values that are very close to the actual values, which means they may be very useful in anticipating the future number of confirmed, recovered and death cases and in providing helpful recommendations on how to improve COVID-19 mitigation measures. The best performance that ARIMA can achieve depends on the range of tuning parameters explored. With a wide parameter range, the search may identify better parameters for each of the cases under consideration; however, the time and computational cost required may increase significantly. Under these conditions, data scientists may sacrifice some accuracy for a quicker and simpler implementation of the model by using a restricted range of parameters. A reduced parameter range for a faster grid search (or similar approaches) can thus be used to speed up computation in circumstances where the model accuracy does not need to be exceptionally high. The Prophet model, for its part, has been found to present clear benefits. It requires less design and computing effort than ARIMA (for confirmed, recovered, and death cases). Furthermore, in this research, it has been found to be robust to missing data in a time series.
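The cost side of this trade-off can be made explicit by counting the candidate ARIMA orders (p, d, q) that a grid search has to fit. The sketch below is only an illustration: the helper name and the upper bounds chosen for the wide and restricted grids are assumptions, not the ranges used in this study.

```python
from itertools import product


def candidate_orders(max_p, max_d, max_q):
    """All (p, d, q) triples with 0 <= p <= max_p, 0 <= d <= max_d, 0 <= q <= max_q."""
    return list(product(range(max_p + 1), range(max_d + 1), range(max_q + 1)))


# Hypothetical bounds, chosen only to illustrate how the grid size grows.
wide_grid = candidate_orders(max_p=5, max_d=2, max_q=5)
restricted_grid = candidate_orders(max_p=2, max_d=1, max_q=2)

print(len(wide_grid), len(restricted_grid))  # 108 18
```

Since each candidate order requires one model fit, the restricted grid in this example needs six times fewer fits than the wide one, at the risk of excluding the order that would have given the best accuracy.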
Although a fair comparison should be conducted on the same data set, covering the same period and the same location, results obtained in this study have been compared to those of three previous works in Saudi Arabia. The work in reference [
10] used the ARIMA model but covered only the beginning of the pandemic (from 2 March 2020 to 20 April 2020). The disease spread has since shown many fluctuations. Our work is more informative than the work in [
10] since it covers a period of more than two years during which many new factors and virus variants appeared and strongly affected the disease dynamics. Reference [
40] developed several growth models for the case study of Saudi Arabia. It obtained a coefficient of determination comparable to that of the present study. However, those models failed to predict the probable end date of the disease. The work in [
41] reported worse performance than that obtained in this study, although it used sophisticated deep learning techniques, which are known to be data demanding.
In this study, the Prophet and ARIMA models produced reliable findings. However, given the complexity of the COVID-19 situation, which is shaped by virus mutation, population density, international travel, human behavior, etc., the Prophet and ARIMA models are not always well suited to handling such data trends. They occasionally perform worse than the baseline model in terms of RMSE and MAE (see
Table 10). To overcome this issue, either multivariate time series forecasting or hybrid techniques may be implemented. Multivariate forecasting can make the experiment more thorough and yield better results, since additional data sources can improve accuracy. Recently, hybrid models ([
42,
43,
44]) have been used in time series analysis. For improved outcomes, these models typically combine machine learning techniques such as Artificial Neural Networks (ANN) with statistical models such as ARIMA. Future research can examine these models to improve COVID-19 forecasting.