Article

Oil Price Forecasting Using FRED Data: A Comparison between Some Alternative Models

by Abdullah Sultan Al Shammre 1 and Benaissa Chidmi 2,*

1 Economic Department, College of Business Administration, King Faisal University, Alahsa 31982, Saudi Arabia
2 Department of Agricultural & Applied Economics, Texas Tech University, Lubbock, TX 79424, USA
* Author to whom correspondence should be addressed.
Energies 2023, 16(11), 4451; https://doi.org/10.3390/en16114451
Submission received: 5 May 2023 / Revised: 26 May 2023 / Accepted: 28 May 2023 / Published: 31 May 2023

Abstract

This paper investigates the forecasting accuracy of alternative time series models when augmented with partial least-squares (PLS) components extracted from economic data, namely the Federal Reserve Economic Data Monthly Database (FRED-MD). Our results indicate that PLS components extracted from FRED-MD data reduce the forecasting error of linear models, such as ARIMA and SARIMA, but these augmented models still produce poor forecasts during high-volatility periods. In contrast, conditional variance models, such as ARCH and GARCH, produce more accurate forecasts regardless of whether or not the PLS components extracted from FRED-MD data are used.

1. Introduction

Energy consumption is required to power the worldwide economy, including most human activities [1]. Accordingly, nearly all worldwide industrial production activities, transportation systems, economic development projects, and machinery and equipment rely upon the energy production and distribution of oil, natural gas, coal, electricity, biofuels, waste, and other sources of supply and their derivatives to enable all of these integrated components of the worldwide economy to function [2]. This also means that energy prices that fuel all parts of the supply chain distribution system are a vital fraction of the price of every commercial product that requires shipping and electrical products to function [3].
Because of this dependence, unexpected steep increases in energy prices can suppress economic growth. They could generate inflation in countries that are net importers of oil, depending upon circumstances, duration, the type of oil, existing inventory, the size of the nation, energy substitution capability, and the number of industries affected [4,5]. Conversely, substantial energy price drops create serious budgetary problems for oil-exporting countries [6]. At present, oil, which is by far the most highly consumed and concentrated energy source, provides somewhere between 30–35% of all of the energy needed to run the global economy, which is 5–10% more than coal, 10–15% more than natural gas, and 25–30% more than renewable and nuclear energy [7].
Despite the importance of energy in human activities, there is little consensus on the roles of energy consumption and energy pricing mechanisms [8]. The oil price shocks of 1973/74, 1979/80, 1986, 1997/2000, and 2003/2008 are examples of complex events that have both stimulating and depressing effects on national economies. There is also a lack of consensus about the predictive accuracy of energy commodity pricing models, e.g., [9,10]. The limited understanding of how oil prices are formed has resulted in unexpected episodes of high and low oil price volatility worldwide, price spikes, and supply and demand shocks. Price volatility has a significant impact on global financial markets, economic development activity, and political stability, as well as on automobile fuel prices, airline transportation prices, and shipping rates. It also financially impacts national economies and consumer goods and services.
To alleviate the oil price modeling challenges, researchers have used a panoply of models spanning from Stevenson and Bear’s [11] most trivial random walk to highly sophisticated nonlinear models, such as dynamic model averaging, not to mention the recent advances in commodity forecasting using machine learning models. Most of the literature on commodity price forecasting can be grouped into three categories: univariate time series, multivariate time series, and volatility forecasting. In turn, under each category, researchers have used linear and nonlinear models with and without exogenous variables.
The autoregressive moving average (ARMA) model of Box and Jenkins [12] and its variants are widely used in time series forecasting. ARMA models stipulate that a stationary time series can be modeled as a weighted average of its past observations (AR process) and the past observations of a white noise process (MA process). A time series is stationary if its mean, variance, and covariance do not vary with time. If the series is not stationary, it can be made stationary by taking successive differences, yielding an autoregressive integrated moving average (ARIMA) process. The literature abounds with studies that use ARIMA as either a forecasting tool or a benchmark in predicting commodity prices, for example, [13,14,15,16] for oil prices, [17] for electricity prices, and [18] for gold.
There are other variations of the ARIMA model, such as the seasonal autoregressive integrated moving average or SARIMA, the controlled autoregressive integrated segmented moving average or CARISMA [19], the fuzzy autoregressive integrated moving average or FARIMA [20], and the fuzzy seasonal autoregressive integrated moving average (FSARIMA) that combines the seasonal SARIMA with the fuzzy regression model [21]. Poisson, Markov, autoregressive, moving average, ARMA, and ARIMA processes are all limited to short-range dependence. In contrast, the autoregressive fractionally integrated moving average model (ARFIMA) is a fractional-order signal processing technique that generalizes the ARIMA and ARMA models [22]. Because of this generalization, the authors of [22] found the ARFIMA to offer broader applications than all of the previously mentioned time series methodologies, since it accommodates both short-term and long-term dependence. In addition, the authors found that the ARFIMA forecasting model gives a superior fit compared with established integer-order models when working with spatial or time series data with long-range dependence (LRD), that is, long memory or long-range persistence.
To overcome the limitations of the ARMA and its variant models, some researchers investigate whether oil price time series portray long-memory properties in returns or volatility (see, for example, [23,24,25,26,27]). However, long-memory properties, together with evidence of nonlinear dependence and predictable returns and volatility, contradict the validity of weak-form oil market efficiency. Although some studies have empirically tested the modeling and forecasting of long-memory fluctuations in crude oil markets using generalized autoregressive conditional heteroskedasticity (GARCH)-type models (e.g., [28,29,30,31]), these studies still treat the long memory that appears in returns and in volatility as unrelated. Nevertheless, because market shocks are well known to influence returns and volatility simultaneously, the two can exhibit dual long-memory properties. On this basis, researchers such as [32,33,34,35] use the joint ARFIMA-FIGARCH model to study the relationship between returns and volatility in economic and financial time series. This model is well suited to analyzing a process that demonstrates dual long-memory properties.
The recent availability of economic data, such as the Federal Reserve Economic Data (FRED; for a detailed description, see [36]) and the Economic Policy Uncertainty data (EPU; see [37]), has opened new research perspectives on using exogenous variables to forecast oil prices. For instance, the authors of [38] use FRED and EPU datasets and other financial variables to predict crude oil prices using various machine learning models. However, these data exhibit high multicollinearity among their variables. For instance, Appendix Table A1 indicates that approximately 50% of the variables in FRED data show a correlation coefficient greater than 0.50 in absolute value, with more than 26% exhibiting a correlation coefficient greater than 0.75.
The high correlation and dimensionality of large datasets such as FRED make data reduction techniques attractive: projecting the high-dimensional data onto a few orthogonal components eliminates multicollinearity and reduces the computational cost of using the full set (for instance, the dynamic model averaging introduced by [39] does not allow more than 30 variables in the DMA R package). Principal component analysis (PCA) appears to be a popular choice among researchers in the area of economics and finance (see, for example, [40,41,42,43]). The partial least squares (PLS) method is another data reduction technique that avoids the multicollinearity problem while taking advantage of the data pattern. While the PCA technique uses only the explanatory variables to extract the principal components without considering how each variable relates to the dependent variable, the partial least squares method captures the relationship between the explanatory variables and the dependent variable when extracting the components.
This paper assesses the out-of-sample forecasting accuracy of several time series models in predicting the West Texas Intermediate (WTI) crude oil prices. We consider a basic ARIMA model, a seasonal ARIMA (SARIMA), the partial least squares using FRED economic data, augmented ARIMA and SARIMA models with exogenous variables (the PLS components, COVID-19, and the 2008 financial crisis dummy variables), an autoregressive conditional heteroskedasticity model (ARCH), and a generalized autoregressive conditional heteroskedasticity model (GARCH). We also ask and answer the following question: Can economic data, such as FRED, improve oil forecasting accuracy? We compare these alternative models using the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root-mean-squared error (RMSE) of the out-of-sample predictions.
This paper adds to the existing literature on oil price forecasting in at least two aspects. First, we combine mean and variance models to produce more accurate oil price forecasts. The results show that considering the volatility equation alongside the mean equation improves the forecasting accuracy by 70%. Second, we use the FRED data after reducing its dimension using partial least squares. This is crucial as the original FRED data are characterized by a high correlation among most variables, potentially inducing overfitting.
The rest of the paper is organized as follows. First, in Section 2, we describe the alternative models used for prediction. Then, Section 3 describes the data and estimation issues. Finally, in Section 4 and Section 5, we present and discuss the results; in Section 6, we conclude and suggest future research avenues.

2. Methods

We start this section by presenting the partial least squares method to extract the most relevant components from FRED economic data, to explain the variation in crude oil price WTI. Then, we present ARIMA and SARIMA models and their variants and close the section by describing the ARCH-GARCH models.

2.1. Partial Least Squares

The starting point is the goal of explaining the variation of the dependent variable $y$ by the variation of the explanatory variables $X$. However, using the linear regression $y = X\beta + \epsilon$ poses the issue of collinearity among the variables of the matrix $X$. To overcome this issue, in the case of principal component analysis, the matrix $X$ of explanatory variables is decomposed into a smaller set of uncorrelated components. While PCA extracts the features without considering the dependent variable $y$, the partial least squares method extracts the components by maximizing the covariance between $y$ and $X$. We project the explanatory variable space into components $z$, such that $z = v'X$ with $v'v = I$, where $I$ is the identity matrix. Similarly, we project the dependent variable space into components $r$, such that $r = w'y$, where $w'w = I$ ($y$ can represent multivariate dependent variables; in this study, $y$ represents the oil price). The loading factors $v$ and $w$ are then found by maximizing the covariance between the scores $z$ and $r$, that is,
$$\max E[z\,r] = \max E[v'X\,w'y] \quad \text{subject to} \quad v'v = I \ \text{and} \ w'w = I$$
With $v$ at hand, we can predict the dependent variable as $y = z\gamma = v'X\gamma$. Another alternative is to use a set of components explaining a given level of variance in $y$ as regressors instead of $X$.
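To make the covariance-maximization idea concrete, the following is a minimal pure-Python sketch for the special case of a single dependent variable. In that case the unit-norm weight vector that maximizes the covariance between the score and $y$ is proportional to $X'y$ computed on centered data. The function name `first_pls_component` and the toy data are illustrative assumptions; this is not the authors' implementation, and it extracts only the first component.

```python
import math

def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS component for one response.

    X is a list of observation rows, y a list of responses.
    Both are mean-centered before the weights are computed.
    """
    n, k = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(k)] for row in X]
    yc = [value - y_mean for value in y]
    # Direction X'y: for a single response, the unit-norm weight vector that
    # maximizes the covariance between the score Xv and y is proportional to X'y.
    direction = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(c * c for c in direction))
    weights = [c / norm for c in direction]
    scores = [sum(Xc[i][j] * weights[j] for j in range(k)) for i in range(n)]
    return weights, scores

# Toy data (hypothetical): the first column co-moves with y more strongly.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_demo = [1.0, 2.0, 3.0, 4.0]
weights_demo, scores_demo = first_pls_component(X_demo, y_demo)
```

In the toy data the first column receives the larger weight because its centered cross-product with $y$ is larger. Further components would require deflating $X$ and repeating the step, which this sketch omits.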

2.2. ARIMA-SARIMA Models

In these models, the dependent variable $y_t$ is a weighted average of its own past values and of the current and past values of a white noise error term $\epsilon_t$, that is,
$$y_t = \delta + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_p y_{t-p} + \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \dots + \alpha_q \epsilon_{t-q},$$
where $p$ is the number of autoregressive (AR) lags and $q$ is the number of moving average (MA) lags. The dependent variable needs to be stationary; otherwise, it should be differenced as many times as necessary to make it stationary, yielding the integration order, $d$. So, an $ARIMA(p, d, q)$ is a process that has been differenced $d$ times and modeled with $p$ AR and $q$ MA terms.
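The differencing step that defines the integration order can be sketched in a few lines. The helper name `difference` is an assumption made for illustration; in practice the order $d$ would be chosen with unit root tests rather than fixed by hand.

```python
def difference(series, d=1):
    """Apply first differencing d times and return the shortened series."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by its consecutive changes.
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# A quadratic trend becomes linear after one difference and constant after two.
quadratic = [1, 4, 9, 16]
once = difference(quadratic, d=1)
twice = difference(quadratic, d=2)
```

Each pass shortens the series by one observation, which is why a model with $d = 1$ is estimated on one fewer data point than the original sample.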
In addition, some time series exhibit seasonality in the AR, the $d$, and the MA processes. Therefore, the ARIMA model is modified accordingly, yielding the seasonal ARIMA model or SARIMA. The following equation summarizes the SARIMA model:
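$$y_t = \delta + \sum_{j=1}^{p} \theta_j y_{t-j} + \sum_{l=1}^{P} \phi_l y_{t-l \times s} - \sum_{j=1}^{p} \sum_{l=1}^{P} \theta_j \phi_l y_{t-l \times s-j} + \sum_{m=1}^{q} \alpha_m \epsilon_{t-m} + \sum_{k=1}^{Q} \lambda_k \epsilon_{t-k \times s} + \sum_{m=1}^{q} \sum_{k=1}^{Q} \alpha_m \lambda_k \epsilon_{t-k \times s-m} + \epsilon_t,$$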
where $P$ indicates the number of seasonal autoregressive terms, $Q$ represents the number of seasonal moving average terms, and $s$ is the number of seasonal periods (for example, $s = 12$ for monthly data, and $s = 4$ for quarterly data). For instance, $SARIMA(p = 1, d = 0, q = 1)(P = 1, D = 0, Q = 1)_{s = 12} = SARIMA(1, 0, 1)(1, 0, 1)_{12}$ is given by:
$$y_t = \delta + \theta_1 y_{t-1} + \phi_1 y_{t-12} - \theta_1 \phi_1 y_{t-13} + \alpha_1 \epsilon_{t-1} + \lambda_1 \epsilon_{t-12} + \alpha_1 \lambda_1 \epsilon_{t-13} + \epsilon_t$$
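As a worked illustration of the $SARIMA(1, 0, 1)(1, 0, 1)_{12}$ example, the sketch below evaluates its one-step-ahead conditional mean, setting the unknown current shock $\epsilon_t$ to its expected value of zero. The function name, argument names, and parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sarima_101_101_12_mean(y_hist, eps_hist, delta, theta1, phi1, alpha1, lambda1):
    """One-step-ahead conditional mean of a SARIMA(1,0,1)(1,0,1)_12 process.

    y_hist and eps_hist hold past observations and past shocks, ordered from
    oldest to newest, so index -1 is lag 1, -12 is lag 12, and -13 is lag 13.
    """
    if len(y_hist) < 13 or len(eps_hist) < 13:
        raise ValueError("need at least 13 past observations and shocks")
    return (
        delta
        + theta1 * y_hist[-1]            # regular AR term
        + phi1 * y_hist[-12]             # seasonal AR term
        - theta1 * phi1 * y_hist[-13]    # AR interaction term
        + alpha1 * eps_hist[-1]          # regular MA term
        + lambda1 * eps_hist[-12]        # seasonal MA term
        + alpha1 * lambda1 * eps_hist[-13]  # MA interaction term
    )

# Hypothetical history: y = 1, 2, ..., 13 and all past shocks equal to 1.
y_demo_hist = [float(v) for v in range(1, 14)]
eps_demo_hist = [1.0] * 13
forecast_demo = sarima_101_101_12_mean(
    y_demo_hist, eps_demo_hist,
    delta=1.0, theta1=0.5, phi1=0.2, alpha1=0.3, lambda1=0.1,
)
```

With these inputs the AR part contributes 1 + 6.5 + 0.4 − 0.1 = 7.8 and the MA part adds 0.3 + 0.1 + 0.03 = 0.43, giving 8.23.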
In addition, the ARIMA and SARIMA models can be extended to include exogenous variables. This study will add partial least squares components to ARIMA and SARIMA to create ARIMAX and SARIMAX models.

2.3. ARCH-GARCH Models

The ARIMA, ARIMAX, SARIMA, and SARIMAX models assume the variance is constant over time and therefore do not capture the time-varying volatility that commodity prices exhibit. One of the stylized facts of financial time series is volatility clustering: periods of high volatility tend to follow periods of high volatility, and periods of low volatility tend to follow periods of low volatility [44]. To account for volatility clustering, the mean equation is augmented by a variance equation. There are two main approaches to modeling volatility: the autoregressive conditionally heteroscedastic (ARCH) model and the generalized autoregressive conditionally heteroscedastic (GARCH) model. The ARCH model, introduced by [45], simultaneously models the mean of the time series $y_t$ and the variance of the shocks $\sigma_t^2$ as follows:
$$y_t = \mu + \epsilon_t, \qquad \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \dots + \alpha_r \epsilon_{t-r}^2 + \eta_t,$$
where $\eta_t$ is a white noise process. Here, $\mu$ can follow any process described in the previous section, such as ARIMA or SARIMA. We obtain the GARCH model by adding lagged shock variances as suggested by [46]. Hence, the model in Equation (5) becomes:
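$$y_t = \mu + \epsilon_t, \qquad \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \dots + \alpha_r \epsilon_{t-r}^2 + \beta_1 \sigma_{t-1}^2 + \beta_2 \sigma_{t-2}^2 + \dots + \beta_o \sigma_{t-o}^2 + \eta_t.$$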
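The variance recursion can be illustrated with a short sketch for the first-order case. It computes the deterministic part of the GARCH(1,1) variance equation and, as a simplifying assumption, leaves out the noise term $\eta_t$; setting `beta` to zero reduces it to ARCH(1). The function name, starting variance, and parameter values are hypothetical.

```python
def garch11_variance_path(shocks, omega, alpha, beta, sigma2_start):
    """Conditional variance path of a GARCH(1,1) model.

    shocks[t-1] is the shock at time t-1; the returned list starts with
    sigma2_start and then holds omega + alpha * eps^2 + beta * sigma2 for
    each subsequent period. The noise term eta_t is omitted (assumption).
    """
    sigma2 = [sigma2_start]
    for eps in shocks:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# A large shock (2.0) raises next-period variance far more than a small one (1.0).
garch_path = garch11_variance_path([1.0, 2.0], omega=0.1, alpha=0.5, beta=0.4, sigma2_start=1.0)
arch_path = garch11_variance_path([1.0, 2.0], omega=0.1, alpha=0.5, beta=0.0, sigma2_start=1.0)
```

The GARCH path stays higher than the ARCH path after the large shock because the lagged variance term carries part of the previous period's volatility forward, which is the mechanism behind volatility clustering.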
Using this range of models allows us to exploit their respective advantages and limit their disadvantages. For example, the linear ARIMA and SARIMA models are parsimonious and simple to estimate. However, most financial and economic time series are highly nonlinear, so linear models often fail to provide accurate forecasts. To address this disadvantage, volatility models (ARCH and GARCH) are useful, especially if the time series shows periods of volatility clustering, as in the case of crude oil prices. Nevertheless, when volatility responds asymmetrically to good news and bad news, standard GARCH models are inadequate.

3. Data and Estimation Strategy

3.1. Data

The data used in this study consist of monthly West Texas Intermediate crude oil prices obtained from the U.S. Energy Information Administration (EIA) and the Federal Reserve Economic Data-Monthly Database (FRED-MD) from February 1992 to October 2022. The FRED-MD data consist of 128 economic variables (FRED-MD also has an oil price variable, but we excluded it and used WTI from the US EIA), classified into eight groups: (1) output and income, (2) labor market, (3) consumption and orders, (4) orders and inventories, (5) money and credit, (6) interest rate and exchange rates, (7) prices, and (8) the stock market. (Given the large number of variables in the FRED-MD data, we do not provide descriptive statistics for these data. Interested readers are referred to [36] for a thorough description of the FRED-MD data.)
Table 1 provides descriptive statistics of the oil prices and the percentage change in the oil price (price return), defined as $r_t = 100 \, \frac{p_t - p_{t-1}}{p_{t-1}}$, during the period of study. The WTI prices averaged $51.27 a barrel and had a standard deviation of $29.50 a barrel, indicating the high volatility of the oil prices during this period. The oil prices oscillated between a minimum of $11.28 a barrel and a maximum of $133.93 a barrel with relatively low skewness (0.53) and kurtosis (−0.81).
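The price return defined above can be computed as in the following sketch. The function name `pct_returns` and the sample prices are illustrative assumptions, not values from the paper's data.

```python
def pct_returns(prices):
    """Percentage change between consecutive prices: 100 * (p_t - p_{t-1}) / p_{t-1}."""
    return [100.0 * (current - previous) / previous
            for previous, current in zip(prices, prices[1:])]

# Hypothetical prices: a 10% rise followed by a 10% fall.
returns_demo = pct_returns([100.0, 110.0, 99.0])
```

Because each return is measured relative to the previous price, a 10% rise followed by a 10% fall leaves the price below its starting level, which is worth keeping in mind when reading percentage swings in a volatile series.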
Figure 1 shows the monthly evolution of WTI oil prices (panel Figure 1a) and their return (panel Figure 1b) from February 1992 to October 2022. Oil prices showed relatively stable behavior from 1992 to 2000. Later, prices showed rapid growth after January 2002. The peak was in July 2008, when the price hit $133.93 a barrel, an all-time high. The price rise can be attributed to increasing global demand and a shortage of global oil production. Subsequently, during the Global Financial Crisis, prices dropped sharply within six months to $39.16 a barrel in February 2009.
Oil prices increased again until July 2014 but fell sharply as United States shale oil production expanded. From 2016 to 2019, oil prices exhibited relatively stable behavior but dropped dramatically during the COVID-19 pandemic. In recent months, oil prices exhibited an upward trend in response to the war between Russia and Ukraine. Moreover, oil prices showed high levels of volatility from 2006 to 2008, increasing by roughly 120%, from $60 to approximately $134 a barrel. Within six months, they decreased by roughly 71%, from approximately $134 to $39 a barrel by February 2009. Since then, oil prices have substantially oscillated, indicating structural breaks in the industry.

3.2. Estimation Strategies

Before using the price series in time series analysis, it is routine to run diagnostic checks, such as stationarity and autoregressive conditional heteroscedasticity tests. This study applies several traditional tests of the stationarity of oil prices, starting with the augmented Dickey–Fuller test ([47], ADF hereafter). For completeness, we also run the Phillips and Perron (PP) test [48] and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test proposed by [49]. In addition, we perform Engle’s autoregressive conditional heteroscedasticity test [45] to test for the presence of volatility clustering or time-varying volatility.
For the specification of the ARIMA and SARIMA models, we follow the Box–Jenkins strategy [12], which consists of identification, estimation, and diagnostic checking stages. In the identification stage, we determine the autoregressive and moving average orders using the autocorrelation functions (ACF) and the partial autocorrelation functions (PACF) and their corresponding plots (see Figure A1 in Appendix A.2). For the WTI crude oil price, the figure indicates that the autocorrelations do not die out quickly for the price variable, suggesting nonstationarity. However, the ACF dies out quickly for the percentage change in the price or price return, implying stationarity, as established by the stationarity tests in Table 2. For a moving average process $MA(q)$, the autocorrelations are zero for lags $j > q$, and the partial autocorrelations taper off. The cutoff point for the autocorrelation function is found by comparing the sample autocorrelations to $\pm 2/\sqrt{T}$, where $T$ is the number of observations. For an $AR(p)$ process, the partial autocorrelations satisfy $\theta_{ii} = 0$ for $i > p$, and the autocorrelations die out quickly. We also compare the partial autocorrelation function to $\pm 2/\sqrt{T}$ to decide on the $AR(p)$ cutoff. If neither the autocorrelations nor the partial autocorrelations have a cutoff point, then the ARMA model may be appropriate [50].
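The identification step can be sketched as follows: compute the sample autocorrelations and compare them with the approximate two-standard-error band of $\pm 2/\sqrt{T}$. The function names are assumptions for illustration, the alternating test series is artificial, and the partial autocorrelation function is not covered here.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares used for normalization
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def significance_bound(n_obs):
    """Approximate two-standard-error bound, 2 / sqrt(T)."""
    return 2.0 / math.sqrt(n_obs)

def significant_lags(series, max_lag):
    """Lags (excluding lag 0) whose autocorrelation lies outside the band."""
    bound = significance_bound(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]

# Artificial series that flips sign every period: strong negative lag-1 correlation.
alternating = [1.0, -1.0] * 10
acf_demo = sample_acf(alternating, max_lag=2)
lags_demo = significant_lags(alternating, max_lag=2)
```

For this artificial series every lag falls outside the band, so no cutoff appears. In a real application the first lag after which the autocorrelations stay inside the band suggests the MA order.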
In the empirical part, we split the data into a training set (75%) and a test set (25%). We also augment the FRED-MD data with a dummy representing the 2007–2009 recession (According to the U.S. National Bureau of Economic Research (NBER) website, the recession started in December 2007 and ended in June 2009. See https://www.nber.org/research/data/us-business-cycle-expansions-and-contractions, accessed on 22 April 2023), and a dummy representing the COVID-19 pandemic. We then estimate and use the following models to forecast the WTI oil price:
  • Partial least squares model (PLS). We use this model to forecast the WTI oil price and extract a reduced set of components that we use as explanatory variables in subsequent models.
  • Ordinary least-squares (OLS) with the most important PLS components as explanatory variables, in addition to recession and COVID-19 dummy variables.
  • Autoregressive integrated moving average (ARIMA) with p, d, and q determined by minimizing the Akaike information criterion.
  • Autoregressive integrated moving average with the most important PLS components as exogenous variables (ARIMAX1) and the PLS components, in addition to recession and COVID-19 dummy variables as exogenous variables (ARIMAX2).
  • Seasonal autoregressive integrated moving average (SARIMA) with p, d, q, P, and Q determined by minimizing the Akaike information criterion.
  • Seasonal autoregressive integrated moving average with the most important PLS components as exogenous variables (SARIMAX1) and the PLS components, in addition to recession and COVID-19 dummy variables as exogenous variables (SARIMAX2).
  • Autoregressive conditional heteroscedasticity (ARCH) model. Given its wide use for financial data and parsimony, we used ARCH(1).
  • The autoregressive conditional heteroscedasticity model with the most important PLS components as exogenous variables (ARCHX(1)).
  • The generalized autoregressive conditional heteroscedasticity (GARCH) model. Given its wide use for financial data and parsimony, we used GARCH(1,1).
  • The generalized autoregressive conditional heteroscedasticity model with the most important PLS components as exogenous variables (GARCHX(1,1)).
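The 75%/25% split used for all of the models above can be sketched as below. The sketch assumes the split is chronological, with the earliest observations used for training, which is the usual choice for time series; the text does not state this explicitly, and the function name is hypothetical.

```python
def chronological_split(series, train_share=0.75):
    """Split a time-ordered series into (train, test) without shuffling."""
    cut = int(len(series) * train_share)
    return series[:cut], series[cut:]

# Hypothetical series of 100 time-ordered observations.
train_demo, test_demo = chronological_split(list(range(100)))
```

Keeping the order intact ensures that every test observation lies after every training observation, so the evaluation mimics genuine forecasting.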
To compare the forecasting performance of each model, we use the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE) on the out-of-sample observations. Equation (7) gives the expression of each criterion used in model comparison.
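$$\mathrm{MAPE} = \frac{1}{T_{os}} \sum_{t=1}^{T_{os}} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad \mathrm{MAE} = \frac{\sum_{t=1}^{T_{os}} \left| y_t - \hat{y}_t \right|}{T_{os}}, \qquad \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{T_{os}} \left( y_t - \hat{y}_t \right)^2}{T_{os}}},$$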
where $T_{os}$ is the out-of-sample total number of observations, $y_t$ is the observed WTI oil price, and $\hat{y}_t$ is the forecast WTI oil price.
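The three criteria translate directly into code. The sketch below follows the expressions as written, so MAPE is returned as a fraction rather than multiplied by 100; the sample values are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction of the observed value."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical observed and forecast prices.
actual_demo = [100.0, 200.0]
forecast_demo_prices = [110.0, 180.0]
mae_demo = mae(actual_demo, forecast_demo_prices)
mape_demo = mape(actual_demo, forecast_demo_prices)
rmse_demo = rmse(actual_demo, forecast_demo_prices)
```

Here the errors are 10 and 20, so MAE is 15, MAPE is 0.10, and RMSE is the square root of 250. RMSE exceeds MAE because squaring penalizes the larger error more heavily.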

4. Results

4.1. Time Series Specification and Diagnostic Results

Table 2 shows the p-values of the unit root tests (ADF, PP, and KPSS) for the WTI oil price and its return. For the ADF and PP tests, the null hypothesis is the existence of a unit root (the time series is not stationary), while for the KPSS test, the null hypothesis is that the time series is stationary. The p-values indicate that the ADF and PP tests fail to reject the null hypothesis for the oil price. The KPSS test confirms this result, rejecting the null hypothesis of stationarity of the oil prices. In contrast, the ADF and PP tests reject the null hypothesis for the price return variable, indicating that the oil price is integrated of order one. A similar conclusion is reached using the KPSS test. To avoid spurious regression results, we estimate all of the models using the percentage change in the price (price return) but recover the forecast of the original price series.
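Recovering price forecasts from return forecasts amounts to inverting the return definition, $\hat{p}_t = p_{t-1}(1 + \hat{r}_t / 100)$. The sketch below chains this forward from the last observed price. Whether the forecasts in the paper are chained in this way or rebuilt from each observed lagged price is not stated, so the chaining is an assumption, as are the function name and the numbers.

```python
def price_path(last_price, return_forecasts_pct):
    """Convert percentage-return forecasts into a path of price forecasts."""
    path = []
    price = last_price
    for r in return_forecasts_pct:
        # Invert r_t = 100 * (p_t - p_{t-1}) / p_{t-1}.
        price = price * (1.0 + r / 100.0)
        path.append(price)
    return path

# Hypothetical forecasts: +10% then -10% from a last observed price of 100.
path_demo = price_path(100.0, [10.0, -10.0])
```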
Moreover, Table 2 provides the results of Engle’s test, testing the null hypothesis of the absence of conditional heteroscedasticity. The p-values of Engle’s test indicate the presence of autoregressive conditional heteroscedasticity for the WTI oil price and its return.
In selecting the appropriate ARIMA model, we consider the parsimony concept; the authors of [12] argue that parsimonious models produce better forecasts than over-parameterized models. We achieve parsimony by using the Akaike information criterion (AIC) in the auto_arima function from the Python package pmdarima [51]. The algorithm returned an ARIMA(3,1,1) as the best model with $AIC = 1574.575$. For the seasonal model, the algorithm returned a SARIMA(3,1,1)(3,0,1)$_{12}$.
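The selection rule behind an AIC-based search can be sketched without any fitting library. The AIC is $2k - 2\ln L$, where $k$ is the number of estimated parameters and $L$ the maximized likelihood, and the candidate with the smallest value is kept. The candidate orders, log-likelihoods, and parameter counts below are made-up numbers for illustration only; they are not results from the paper, and this is not how pmdarima is implemented internally.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(candidates):
    """Return the key of the candidate with the lowest AIC.

    candidates maps a model label to a (log_likelihood, n_params) pair.
    """
    return min(candidates, key=lambda label: aic(*candidates[label]))

# Hypothetical fitted candidates: (log-likelihood, number of parameters).
candidates_demo = {
    (1, 1, 0): (-800.0, 2),
    (2, 1, 1): (-780.0, 4),
    (4, 1, 3): (-779.0, 8),
}
best_order_demo = select_by_aic(candidates_demo)
```

The largest model has the highest likelihood but loses on AIC because its extra parameters cost more than the small gain in fit, which is the parsimony trade-off described above.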
Moreover, the partial least-squares method reduces the number of explanatory variables and removes the multicollinearity issues inherent in the FRED-MD data, as evidenced in Appendix Table A1. Table 3 gives the cumulative variance and the reduction in the mean squared error of the PLS components, while Figure 2 provides a graphical representation of Table 3. The results indicate that ten PLS components explain more than 98% of the oil price variation. As the number of components increases further, the explained variance increases only marginally. Consequently, we can use the first ten components as explanatory variables of the oil price variation, that is,
$$p_t = \beta_0 + \sum_{j=1}^{10} \beta_j \mathrm{Comp}_{jt} + \epsilon_t$$
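The choice of how many components to retain can be expressed as a simple threshold rule on cumulative explained variance. The sketch below is one plausible way to formalize it; the per-component shares are invented for illustration and do not reproduce Table 3, and the function name is hypothetical.

```python
def components_needed(explained_shares, target=0.98):
    """Smallest number of leading components whose cumulative share reaches target.

    Returns None if the target is never reached.
    """
    total = 0.0
    for count, share in enumerate(explained_shares, start=1):
        total += share
        if total >= target:
            return count
    return None

# Hypothetical shares of variance in the dependent variable, by component.
shares_demo = [0.60, 0.20, 0.10, 0.05, 0.04, 0.005]
n_components_demo = components_needed(shares_demo, target=0.98)
```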

4.2. PLS and OLS Results

4.2.1. PLS Results

The partial least-squares model produced the WTI oil price forecasts summarized in Figure 3. The figure shows that the PLS model with ten components successfully fits the in-sample WTI oil prices. The out-of-sample forecasts tell a different story, however. The out-of-sample predictions mimic the observed oil price movements, but the predicted values are off, especially during the COVID-19 pandemic. During the pandemic, the PLS model with ten components predicted negative oil prices. Nevertheless, the results indicate that the FRED-MD data, summarized in PLS components, are a good predictor, but not during extreme events, such as the COVID-19 pandemic.

4.2.2. OLS Results

To overcome this issue, we augment the ten PLS components with the 2007–2009 financial crisis and COVID-19 pandemic dummy variables and use them as predictors for the WTI oil price. We summarize the results of the least-squares estimation in Table 4. The OLS results show that all ten PLS components are statistically significant in explaining the variation in oil prices. In addition, the 2007–2009 financial crisis had a positive and statistically significant effect on WTI crude oil prices. However, the COVID-19 dummy variable is not statistically significant in explaining the variation in crude oil prices.
Figure 4 displays the in- and out-of-sample forecasts using the OLS model with the PLS components, the COVID-19, and the financial crisis dummy variables. As with PLS, the OLS model successfully fitted the in-sample data. However, the out-of-sample forecasts are inaccurate, especially during high-volatility periods. The OLS model overshoots the price decrease during the COVID-19 pandemic in 2020 and the price increase during the start of the Russia–Ukraine war in 2022.

4.3. ARIMA and SARIMA Results

4.3.1. ARIMA Results

As indicated before, the Box–Jenkins [12] identification step yielded the following ARIMA(3,1,1) as the best model:
$$\Delta p_t = \delta + \theta_1 \Delta p_{t-1} + \theta_2 \Delta p_{t-2} + \theta_3 \Delta p_{t-3} + \alpha_1 \epsilon_{t-1} + \epsilon_t.$$
The maximum likelihood estimation of this model yields the results summarized in Table 5. All of the parameter estimates are statistically significant. The model fitted the in-sample data well, as shown in Figure 5. However, ARIMA(3,1,1) performed poorly in producing accurate out-of-sample forecasts (see Figure 5). The model does not even mimic the price movements as the PLS and OLS models did.
Compared to the PLS and OLS models, the ARIMA model’s poor out-of-sample performance may suggest that the FRED-MD data’s explanatory power, summarized in PLS components, could improve the forecasting accuracy. To explore this possible explanation, we augment the ARIMA(3,1,1) model with ten PLS components and estimate the following equation:
$$\Delta p_t = \delta + \theta_1 \Delta p_{t-1} + \theta_2 \Delta p_{t-2} + \theta_3 \Delta p_{t-3} + \alpha_1 \epsilon_{t-1} + \beta_0 + \sum_{j=1}^{10} \beta_j \mathrm{Comp}_{jt} + \epsilon_t.$$
The results of this estimation are summarized in Table 6. The findings indicate that all of the variables included are statistically significant except for the second lag of the price difference.
In addition, the augmented ARIMA model produces better out-of-sample forecasts than the plain ARIMA model, as indicated by Figure 6. This could indicate the importance of including economic data, such as FRED data, when forecasting commodity prices. However, this model also fails to reproduce the out-of-sample oil prices during high-volatility periods.
To account for the 2007–2009 financial crisis and the COVID-19 pandemic, we added these two events as dummy variables and estimated the following model:
$$\Delta p_t = \delta + \theta_1 \Delta p_{t-1} + \theta_2 \Delta p_{t-2} + \theta_3 \Delta p_{t-3} + \alpha_1 \epsilon_{t-1} + \beta_0 + \sum_{j=1}^{10} \beta_j \mathrm{Comp}_{jt} + \gamma_1 \, fin\_crisis_t + \gamma_2 \, COVID19_t + \epsilon_t.$$
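An event dummy of this kind takes the value one during the event window and zero otherwise. The sketch below builds the recession dummy over the monthly sample from February 1992 to October 2022, using the NBER dates cited in the estimation strategy (December 2007 to June 2009). The helper names are hypothetical, and the COVID-19 window is not specified in the text, so it is left out; it would be built the same way once its dates are chosen.

```python
def month_range(start, end):
    """List (year, month) tuples from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def event_dummy(months, start, end):
    """1 for months inside [start, end], 0 otherwise."""
    return [1 if start <= m <= end else 0 for m in months]

sample_months = month_range((1992, 2), (2022, 10))
fin_crisis_dummy = event_dummy(sample_months, (2007, 12), (2009, 6))
```

The sample contains 369 months, of which 19 fall inside the recession window.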
The estimation results, summarized in Table 7, show that all PLS components and ARIMA terms are statistically significant at the 1% level, except for the second lag of the price difference, which is significant at the 10% level. The COVID-19 dummy variable has a negative but not statistically significant effect on the oil price. In contrast, the 2007–2009 financial crisis had a positive and statistically significant effect on the oil price. Nevertheless, adding these two event variables did not improve the forecasting accuracy of the model, as indicated by the negative price forecast during the COVID-19 pandemic period (see Figure 7).

4.3.2. SARIMA Results

To investigate the seasonality of the WTI crude oil prices, we estimate the following seasonal ARIMA model (the orders of SARIMA were chosen using the AIC in the auto_arima function of the pmdarima Python package):
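$$\Delta p_t = \delta + \theta_1 \Delta p_{t-1} + \theta_2 \Delta p_{t-2} + \theta_3 \Delta p_{t-3} + \theta_{12} \Delta p_{t-12} + \theta_{24} \Delta p_{t-24} + \theta_{36} \Delta p_{t-36} + \alpha_1 \epsilon_{t-1} + \alpha_{12} \epsilon_{t-12} + \epsilon_t.$$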
Though the seasonal lags are statistically significant, as indicated by the results in Table 8, the out-of-sample SARIMA forecasts in Figure 8 perform poorly: the model fails to mimic the price movements, let alone forecast the price levels accurately.
However, SARIMA forecasts improve when we add the PLS components alone (Figure 9a) and when we augment them with the 2007–2009 financial crisis and the COVID-19 dummy variables (Figure 9b), at least in terms of mimicking the price movements. In terms of the results of the regressions, most PLS components and seasonal terms are statistically significant (see Table 9). In contrast, the results in Table 10 show the COVID-19 dummy variable is not statistically significant in explaining the variation in oil prices for the sample at hand.

4.4. ARCH and GARCH Results

4.4.1. ARCH Results

As seen in Figure 1b, the WTI crude oil price exhibits volatility clustering, with periods of high volatility alternating with periods of low volatility. To account for this phenomenon, we augment the ARIMA models with a conditional variance equation. For parsimony, we estimated an ARIMA(1,1,0) for the mean and an autoregressive conditional heteroscedasticity model, ARCH(1), for the variance, that is,
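$$\begin{aligned} \Delta p_t &= \delta + \theta_1 \Delta p_{t-1} + \epsilon_t \\ \sigma_t^2 &= \omega + \alpha_1 \epsilon_{t-1}^2 + \eta_t . \end{aligned} \tag{14}$$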
The maximum likelihood estimation of Equation (14) produces the results summarized in Table 11. The parameter estimates of the mean and volatility equations are statistically significant at conventional levels. More importantly, the out-of-sample forecasts of the model accurately mimic the observed oil prices, even during periods of high instability. Figure 10 shows how the ARIMA(1,1,0)-ARCH(1) model predicts future prices with high accuracy. Including the PLS components does not provide a good statistical fit, as none of the variables included in the mean and volatility equations is statistically significant (see Table 12). Nonetheless, the ARIMA(1,1,0)-ARCH(1) model with exogenous variables still produces accurate forecasts, as evidenced in Figure 11.
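To make the two equations concrete, the sketch below computes a one-step-ahead forecast from the point estimates in Table 11. It is an illustration under stated assumptions, not the authors' code: the error term of the variance equation is replaced by its assumed zero mean, and the last observed price, price change, and residual are hypothetical inputs.

```python
# Point estimates copied from Table 11 (ARIMA(1,1,0) with ARCH(1)).
DELTA, THETA1 = 1.0222, 0.1613      # mean equation
OMEGA, ALPHA1 = 45.6749, 0.2107     # volatility equation

def arch1_one_step(last_price, last_diff, last_resid):
    """Return (price forecast, conditional variance) for period t+1."""
    diff_forecast = DELTA + THETA1 * last_diff      # E[dp_{t+1} | t]
    cond_var = OMEGA + ALPHA1 * last_resid ** 2     # sigma^2_{t+1}, eta term set to 0
    return last_price + diff_forecast, cond_var

def arch1_unconditional_variance():
    """Long-run variance implied by ARCH(1); requires ALPHA1 < 1."""
    return OMEGA / (1.0 - ALPHA1)

# HYPOTHETICAL last observations, for illustration only.
price_hat, var_hat = arch1_one_step(last_price=80.0, last_diff=2.0, last_resid=-5.0)
print(f"forecast price: {price_hat:.4f}, conditional variance: {var_hat:.4f}")
print(f"unconditional variance: {arch1_unconditional_variance():.2f}")
```

A larger squared residual in the previous period raises the conditional variance, which is how the specification reproduces volatility clustering.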

4.4.2. GARCH Results

We generalized the ARCH model by including past volatility values and estimated the following ARIMA(1,1,0)-GARCH(1,1):
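$$\begin{aligned} \Delta p_t &= \delta + \theta_1 \Delta p_{t-1} + \epsilon_t \\ \sigma_t^2 &= \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + \eta_t . \end{aligned}$$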
Table 13 presents the maximum likelihood estimation results. The findings show that ARIMA(1,1,0)-GARCH(1,1) provides an appropriate statistical fit for the data at hand. The lagged price difference in the mean equation and the ARCH and GARCH effects are statistically significant at the 5% level.
Moreover, the model produces accurate out-of-sample forecasts. Figure 12 shows that ARIMA(1,1,0)-GARCH(1,1) accurately predicts the oil price movements and values, even during periods of high volatility, such as the COVID-19 pandemic and the Russia–Ukraine war. This narrative does not change when we add the PLS components to the mean equation. Unlike in the case of ARCH, adding PLS components to the GARCH specification (mean equation) does not impact the statistical significance of ARCH and GARCH effects (see Table 14), nor does it significantly improve the forecasting accuracy of the model (see Figure 13).
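The GARCH(1,1) estimates also imply how quickly a volatility shock dies out. The sketch below uses the volatility-equation estimates from Table 13 and the standard result that the expected variance reverts to ω/(1 − α₁ − β₁) at rate α₁ + β₁. The starting variance is a hypothetical value, and the error term of the variance equation is assumed to have zero mean.

```python
import math

# Volatility-equation estimates copied from Table 13.
OMEGA, ALPHA1, BETA1 = 6.0325, 0.1375, 0.7623

def garch11_variance_path(next_var, horizon):
    """Iterate E[sigma^2_{t+h}] = omega + (alpha1 + beta1) * E[sigma^2_{t+h-1}]."""
    path = [next_var]
    for _ in range(horizon - 1):
        path.append(OMEGA + (ALPHA1 + BETA1) * path[-1])
    return path

persistence = ALPHA1 + BETA1
long_run_var = OMEGA / (1.0 - persistence)
half_life = math.log(0.5) / math.log(persistence)

# HYPOTHETICAL starting point: a one-step-ahead variance of 150 after a large shock.
path = garch11_variance_path(next_var=150.0, horizon=24)
print(f"persistence = {persistence:.4f}, long-run variance = {long_run_var:.2f}")
print(f"half-life of a variance shock = {half_life:.1f} months")
print([round(v, 1) for v in path[:6]])
```

Because the persistence is below one, the forecast variance decays geometrically toward its long-run level instead of staying at the post-shock value.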

5. Discussion

In this study, we provided alternative models to forecast WTI crude oil prices. We also emphasized the forecasting power of the FRED-MD economic data by estimating models with and without the data. In Table 15, we used the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean squared error (RMSE) to compare alternative models. Our results indicate that the ARIMA(1,1,0)-GARCH(1,1) model outperforms all of the other models in the three criteria. In addition, the models with conditional volatility outperform all of the other models using the three criteria.
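The three comparison criteria can be computed as follows. This is a minimal sketch of the standard definitions, with MAPE expressed as a fraction, as in Table 15; the function name `forecast_errors` and the sample numbers are hypothetical.

```python
import math

def forecast_errors(actual, predicted):
    """Return MAPE (as a fraction), MAE, and RMSE for two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse}

# HYPOTHETICAL prices and forecasts, for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(forecast_errors(actual, predicted))
```

RMSE penalizes large misses more heavily than MAE, so a model with occasional extreme errors ranks worse on RMSE than on MAE.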
Table 15 shows a substantial difference between the ARIMA family models and the volatility models, whether we use FRED data or not. Even the worst volatility model in terms of RMSE, namely, ARIMA(1,1,0) with ARCH(1), reduces the forecasting error of ARIMAX1(3,1,1) by more than 70% (from 19.33 to 5.69). This highlights the importance of considering nonlinear time series when studying variables with extreme events.
On the other hand, using FRED-MD data reduces the forecasting error of the corresponding model, except for GARCH. For instance, ARIMA(3,1,1)’s mean absolute percentage error drops from 32.62% to 27.80% when we add the PLS components extracted from FRED-MD data. Moreover, the forecasting error also drops when the PLS components are added to the mean equation of the ARCH(1) model, from 5.69 to 5.34 in terms of root mean squared error.
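The relative changes quoted in this discussion follow directly from Table 15. The short sketch below recomputes them; the values are copied from the table, and `relative_reduction` is a hypothetical helper name.

```python
# Values copied from Table 15.
rmse = {
    "ARIMAX1(3,1,1)": 19.3317,
    "ARIMA(1,1,0)-ARCH(1)": 5.6916,
    "ARIMAX1(1,1,0)-ARCH(1)": 5.3432,
    "ARIMA(1,1,0)-GARCH(1,1)": 4.6759,
    "ARIMAX1(1,1,0)-GARCH(1,1)": 5.0577,
}
mape = {"ARIMA(3,1,1)": 0.3262, "ARIMAX1(3,1,1)": 0.2780}

def relative_reduction(baseline, candidate):
    """Share of the baseline error removed by the candidate (negative = worse)."""
    return 1.0 - candidate / baseline

comparisons = {
    "ARCH(1) vs ARIMAX1 (RMSE)": relative_reduction(
        rmse["ARIMAX1(3,1,1)"], rmse["ARIMA(1,1,0)-ARCH(1)"]),
    "PLS added to ARIMA (MAPE)": relative_reduction(
        mape["ARIMA(3,1,1)"], mape["ARIMAX1(3,1,1)"]),
    "PLS added to ARCH(1) (RMSE)": relative_reduction(
        rmse["ARIMA(1,1,0)-ARCH(1)"], rmse["ARIMAX1(1,1,0)-ARCH(1)"]),
    "PLS added to GARCH(1,1) (RMSE)": relative_reduction(
        rmse["ARIMA(1,1,0)-GARCH(1,1)"], rmse["ARIMAX1(1,1,0)-GARCH(1,1)"]),
}

for label, value in comparisons.items():
    print(f"{label}: {value:+.1%}")
```

A negative value for the GARCH(1,1) comparison reflects the exception noted in the text: adding the PLS components raises, rather than lowers, the forecasting error of that model.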
Regarding previous studies, the authors of [14,38] offer two interesting studies we can use to compare our findings as their goals were to forecast crude oil prices using somewhat similar series. The authors of [14] use ARIMA and SARIMA to forecast crude oil prices in the United States and Europe from January 2017 to September 2021. Their ARIMA and SARIMA models yielded an out-of-sample MAPE of 0.05 and 0.09, respectively. However, their training set included the COVID-19 pandemic period. In contrast, our training set did not include that period, but our conditional variance models (ARCH and GARCH) were successful in producing highly accurate forecasts.
The study by [38] is close to our study regarding the variables and the training set used. The authors used several machine learning models to forecast the WTI crude oil prices, using data from March 1993 to December 2021 (the data we used were from February 1992 to October 2022) with a test set including the COVID-19 pandemic. In addition, their study also used FRED data alongside the economic policy uncertainty data and other financial data. Their change point-adaptive recursive neural network (CP-ADARNN) yields a larger reduction in forecast errors than our best model. However, one has to be cautious regarding the use of economic data without orthogonalization (partial least-squares or principal components) due to multicollinearity and the consequent risk of overfitting. In contrast, the GARCH model does not present the overfitting risk and is available in most traditional statistical software.

6. Conclusions

In this study, we propose alternative time series models to forecast WTI crude oil prices. We also assess the forecasting power of economic data, namely, FRED data. Our results indicate that when linear models are used (ARIMA and SARIMA), the inclusion of the partial least-squares components extracted from FRED reduces the forecasting error without providing an accurate forecast during high-volatility periods. In contrast, including PLS components from FRED data does not improve the forecasting power of volatility models, especially the GARCH model.
Moreover, this paper offers empirical evidence against ignoring conditional volatility in forecasting commodity prices. Even when augmented with economic data spanning all economic activities, linear models, such as ARIMA, provide poor forecasts, especially during extreme events. Our finding highlights the importance of considering nonlinear time series when studying variables with extreme events. In addition, this study’s outcome has practical implications for commodity traders as the GARCH estimation only requires the commodity price.
The findings of this study confirm the use of the generalized autoregressive conditional heteroskedasticity (GARCH) models as a statistical tool that provides high-quality forecasts, not only for the volatility but also for the time series mean. The GARCH model is important because it makes it possible to model financial and economic time series data more precisely. The model effectively captures the key characteristics of the data by taking volatility clustering into consideration. This is critical in crude oil pricing since forecasting depends on precise modeling of volatility. As indicated by Figure 1, the WTI oil prices exhibited volatility clusters during several periods in the 1992–2022 monthly data. Unlike studies, such as [14], our study properly accounts for volatility clustering to capture the important feature of WTI oil prices and produce accurate forecasts.
This study can be improved in several respects. First, future studies may extend the factor models to include global demand, international economic, and geopolitical indicators instead of limiting the analysis to the United States’ economic indicators. In fact, the FRED database only includes variables related to U.S. economic activities. Exploring other variables or indices, such as the ones provided by the Economic Policy Uncertainty (EPU) database, may take into consideration global economic activities.
Second, to palliate linear models’ shortcomings, one of the promising nonlinear modeling techniques is the dynamic model averaging developed by [39]. To allow the forecasting model’s parameters and forecasting accuracy to vary over time, the model integrates the state space approach with the Markov chain process. This modeling approach allows the parameters and the set of explanatory variables to vary over time. To select the model with the highest probability at any given time, the model is frequently accompanied by dynamic model selection (see, for instance, [52]).
Finally, future studies should also consider asymmetric volatility models. Empirical evidence shows that negative shocks affect volatility differently from positive ones. Asymmetric GARCH models, such as the exponential generalized autoregressive conditional heteroskedasticity (EGARCH) model, developed by [53], and the Glosten–Jagannathan–Runkle GARCH (GJR-GARCH) model, developed by [54], are examples of specifications that account for asymmetric volatility.
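As an illustration of what asymmetry means here, the sketch below implements one step of the GJR-GARCH(1,1) variance recursion in its usual textbook form, in which a negative residual receives the extra weight γ. The parameter values are hypothetical and were not estimated from the oil price data.

```python
def gjr_garch_next_variance(resid, prev_var, omega, alpha, gamma, beta):
    """One step of the GJR-GARCH(1,1) recursion.

    sigma2_t = omega + (alpha + gamma * I[resid < 0]) * resid**2 + beta * prev_var
    """
    leverage = gamma if resid < 0 else 0.0
    return omega + (alpha + leverage) * resid ** 2 + beta * prev_var

# HYPOTHETICAL parameter values, chosen only to illustrate the asymmetry.
params = dict(omega=5.0, alpha=0.05, gamma=0.15, beta=0.75)

after_negative = gjr_garch_next_variance(resid=-10.0, prev_var=60.0, **params)
after_positive = gjr_garch_next_variance(resid=10.0, prev_var=60.0, **params)
print(after_negative, after_positive)
```

With γ = 0 the recursion collapses to the symmetric GARCH(1,1) used in this study, so the symmetric model is a nested special case.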

Author Contributions

Conceptualization, B.C. and A.S.A.S.; methodology, B.C. and A.S.A.S.; software, B.C. and A.S.A.S.; validation, B.C. and A.S.A.S.; formal analysis, B.C. and A.S.A.S.; data curation, A.S.A.S.; writing—original draft preparation, B.C. and A.S.A.S.; and writing—review and editing, B.C. and A.S.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study will be made available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1

Table A1. Correlation among FRED data variables.

| Correlation | Percentage of Variables |
|---|---|
| More than 0.95 | 5.55% |
| More than 0.90 | 11.06% |
| More than 0.75 | 26.07% |
| More than 0.50 | 49.46% |

Appendix A.2

Figure A1. Autocorrelation and partial autocorrelation functions for the WTI oil price and its return. (a) Autocorrelation function for the WTI oil price. (b) Partial autocorrelation function for the WTI oil price. (c) Autocorrelation function for the WTI oil price return. (d) Partial autocorrelation function for the WTI oil price return.

References

1. Noreng, O. Crude Power: Politics and the Oil Market; IB Tauris: London, UK, 2006; Volume 21.
2. Naser, H. Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach. Energy Econ. 2016, 56, 75–87.
3. Hummels, D. Transportation costs and international trade in the second era of globalization. J. Econ. Perspect. 2007, 21, 131–154.
4. Kilian, L.; Lewis, L.T. Does the Fed respond to oil price shocks? Econ. J. 2011, 121, 1047–1072.
5. Kilian, L.; Vigfusson, R.J. Are the responses of the US economy asymmetric in energy price increases and decreases? Quant. Econ. 2011, 2, 419–453.
6. Abosedra, S.; Baghestani, H. On the predictive accuracy of crude oil futures prices. Energy Policy 2004, 32, 1389–1393.
7. Dudley, B. BP statistical review of world energy 2016. In British Petroleum Statistical Review of World Energy; Pureprint Group Limited: East Sussex, UK, 2019.
8. Belke, A.; Dobnik, F.; Dreger, C. Energy consumption and economic growth: New insights into the cointegration relationship. Energy Econ. 2011, 33, 782–789.
9. Kilian, L. Oil Price Volatility: Origins and Effects; Technical Report, WTO Staff Working Paper; World Trade Publications: Geneva, Switzerland, 2010.
10. Alquist, R.; Kilian, L. What do we learn from the price of crude oil futures? J. Appl. Econom. 2010, 25, 539–573.
11. Stevenson, R.A.; Bear, R.M. Commodity futures: Trends or random walks? J. Financ. 1970, 25, 65–81.
12. Box, G.E.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden Day: San Francisco, CA, USA, 1976.
13. Tularam, G.A.; Saeed, T. Oil-price forecasting based on various univariate time-series models. Am. J. Oper. Res. 2016, 6, 226–235.
14. Lee, J.Y.; Nguyen, T.T.; Nguyen, H.G.; Lee, J.Y. Towards Predictive Crude Oil Purchase: A Case Study in the USA and Europe. Energies 2022, 15, 4003.
15. Mostafa, M.M.; El-Masry, A.A. Oil price forecasting using gene expression programming and artificial neural networks. Econ. Model. 2016, 54, 40–53.
16. Álvarez-Díaz, M. Is it possible to accurately forecast the evolution of Brent crude oil prices? An answer based on parametric and nonparametric forecasting methods. Empir. Econ. 2020, 59, 1285–1305.
17. Karabiber, O.A.; Xydis, G. Electricity price forecasting in the Danish day-ahead market using the TBATS, ANN and ARIMA methods. Energies 2019, 12, 928.
18. Madziwa, L.; Pillalamarry, M.; Chatterjee, S. Gold price forecasting using multivariate stochastic model. Resour. Policy 2022, 76, 102544.
19. Purkayastha, D.D. An exposition of the decomposition in a Controlled Autoregressive Integrated Segmented Moving Average (CARISMA) model. Econ. Lett. 1995, 48, 1–7.
20. Tseng, F.M.; Tzeng, G.H.; Yu, H.C.; Yuan, B.J. Fuzzy ARIMA model for forecasting the foreign exchange market. Fuzzy Sets Syst. 2001, 118, 9–19.
21. Tseng, F.M.; Tzeng, G.H. A fuzzy seasonal ARIMA model for forecasting. Fuzzy Sets Syst. 2002, 126, 367–376.
22. Liu, K.; Chen, Y.; Zhang, X. An evaluation of ARFIMA (autoregressive fractional integral moving average) programs. Axioms 2017, 6, 16.
23. Brunetti, C.; Gilbert, C.L. Bivariate FIGARCH and fractional cointegration. J. Empir. Financ. 2000, 7, 509–530.
24. Serletis, A.; Andreadis, I. Random fractal structures in North American energy markets. Energy Econ. 2004, 26, 389–399.
25. Elder, J.; Serletis, A. Long memory in energy futures prices. Rev. Financ. Econ. 2008, 17, 146–155.
26. Tabak, B.M.; Cajueiro, D.O. Are the crude oil markets becoming weakly efficient over time? A test for time-varying long-range dependence in prices and volatility. Energy Econ. 2007, 29, 28–36.
27. Aloui, C.; Mabrouk, S. Value-at-risk estimations of energy commodities via long-memory, asymmetry and fat-tailed GARCH models. Energy Policy 2010, 38, 2326–2339.
28. Agnolucci, P. Volatility in crude oil futures: A comparison of the predictive ability of GARCH and implied volatility models. Energy Econ. 2009, 31, 316–321.
29. Kang, S.H.; Kang, S.M.; Yoon, S.M. Forecasting volatility of crude oil markets. Energy Econ. 2009, 31, 119–125.
30. Mohammadi, H.; Su, L. International evidence on crude oil price dynamics: Applications of ARIMA-GARCH models. Energy Econ. 2010, 32, 1001–1008.
31. Sadorsky, P. Modeling and forecasting petroleum futures volatility. Energy Econ. 2006, 28, 467–488.
32. Conrad, C.; Karanasos, M. Dual long memory in inflation dynamics across countries of the Euro area and the link between inflation uncertainty and macroeconomic performance. Stud. Nonlinear Dyn. Econom. 2005, 9.
33. Conrad, C.; Karanasos, M. On the inflation-uncertainty hypothesis in the USA, Japan and the UK: A dual long memory approach. Jpn. World Econ. 2005, 17, 327–343.
34. Kang, S.H.; Yoon, S.M. Long memory properties in return and volatility: Evidence from the Korean stock market. Phys. A Stat. Mech. Its Appl. 2007, 385, 591–600.
35. Kasman, A.; Kasman, S.; Torun, E. Dual long memory property in returns and volatility: Evidence from the CEE countries’ stock markets. Emerg. Mark. Rev. 2009, 10, 122–139.
36. McCracken, M.W.; Ng, S. FRED-MD: A monthly database for macroeconomic research. J. Bus. Econ. Stat. 2016, 34, 574–589.
37. Baker, S.R.; Bloom, N.; Davis, S.J. Measuring economic policy uncertainty. Q. J. Econ. 2016, 131, 1593–1636.
38. Boubaker, S.; Liu, Z.; Zhang, Y. Forecasting oil commodity spot price in a data-rich environment. In Annals of Operations Research; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–18.
39. Raftery, A.E.; Kárný, M.; Ettler, P. Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. Technometrics 2010, 52, 52–66.
40. Stock, J.H.; Watson, M.W. Forecasting using principal components from a large number of predictors. J. Am. Stat. Assoc. 2002, 97, 1167–1179.
41. Shabri, A.; Samsudin, R. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis. Sci. World J. 2014, 2014, 854520.
42. Sopipan, N.; Kanjanavajee, W.; Sattayatham, P. Forecasting SET50 index with multiple regression based on principal component analysis. J. Appl. Financ. Bank. 2012, 2, 271.
43. Binder, K.E.; Pourahmadi, M.; Mjelde, J.W. The role of temporal dependence in factor selection and forecasting oil prices. Empir. Econ. 2020, 58, 1185–1223.
44. Franses, P.H.; Van Dijk, D. Non-Linear Time Series Models in Empirical Finance; Cambridge University Press: Cambridge, UK, 2000.
45. Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. In Econometrica: Journal of the Econometric Society; The Econometric Society: Cleveland, OH, USA, 1982; pp. 987–1007.
46. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
47. Dickey, D.A.; Fuller, W.A. Likelihood ratio statistics for autoregressive time series with a unit root. In Econometrica: Journal of the Econometric Society; The Econometric Society: Cleveland, OH, USA, 1981; pp. 1057–1072.
48. Phillips, P.C.; Perron, P. Testing for a unit root in time series regression. Biometrika 1988, 75, 335–346.
49. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178.
50. Griffiths, W.; Hill, R.; Judge, G. Learning and Practicing Econometrics; Wiley: Hoboken, NJ, USA, 1993.
51. Smith, T. ARIMA Estimators for Python. Available online: http://www.alkaline-ml.com/pmdarima (accessed on 23 April 2017).
52. Bork, L.; Møller, S.V. Forecasting house prices in the 50 states using dynamic model averaging and dynamic model selection. Int. J. Forecast. 2015, 31, 63–78.
53. Nelson, D.B. Conditional heteroskedasticity in asset returns: A new approach. In Econometrica: Journal of the Econometric Society; The Econometric Society: Cleveland, OH, USA, 1991; pp. 347–370.
54. Glosten, L.R.; Jagannathan, R.; Runkle, D.E. On the relation between the expected value and the volatility of the nominal excess return on stocks. J. Financ. 1993, 48, 1779–1801.
Figure 1. Evolution of the WTI oil price and its return. (a) Evolution of the WTI oil prices from February 1992 to October 2022. (b) Evolution of the WTI oil price return from February 1992 to October 2022.
Figure 2. Explained cumulative variance and reduction in the mean squared error and the number of PLS components.
Figure 3. Partial least-squares forecasts with ten components.
Figure 4. OLS forecasts with ten components, COVID-19, and 2007–2009 financial crisis.
Figure 5. ARIMA(3,1,1) forecasts.
Figure 6. ARIMA(3,1,1) and PLS components forecasts.
Figure 7. ARIMA(3,1,1) with 10 PLS components, COVID-19, and financial crisis forecasts.
Figure 8. SARIMA(3,1,1)(3,1,1) forecasts.
Figure 9. SARIMA with exogenous variables forecasts. (a) SARIMA(3,1,1)(3,1,1) with PLS components forecasts. (b) SARIMA(3,1,1)(3,1,1) with PLS components, COVID-19, and financial crisis forecasts.
Figure 10. ARIMA(1,1,0) with ARCH(1) forecasts.
Figure 11. ARIMA(1,1,0) with PLS components and ARCH(1) forecasts.
Figure 12. ARIMA(1,1,0) with GARCH(1,1) forecasts.
Figure 13. ARIMA(1,1,0) with PLS components and GARCH(1,1) forecasts.
Table 1. Summary statistics of oil prices.

| Variable | Observations | Mean | Std. Dev. | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Oil price | 368 | 51.27 | 29.50 | 11.28 | 133.93 | 0.53 | −0.81 |
| Price return | 367 | 0.79 | 8.51 | −42.20 | 34.14 | −0.44 | 2.33 |

Table 2. Probability values for stationarity and ARCH tests for oil price and its return.

| Variable | ADF | PP | KPSS | ARCH |
|---|---|---|---|---|
| Oil price | 0.3815 | 0.4000 | 0.0100 | 0.0000 |
| Price return | 0.0000 | 0.0000 | 0.1000 | 0.0000 |

Table 3. Cumulative variance explained and the mean squared error reduction by the PLS components.

| Components | Cumulative Variance (%) | Mean Squared Error |
|---|---|---|
| One component | 85.11 | 151.32 |
| Ten components | 98.11 | 19.31 |
| Fifteen components | 98.35 | 16.70 |
| Twenty components | 98.47 | 15.36 |
| Thirty components | 98.55 | 14.53 |
| Sixty components | 98.55 | 14.33 |
| One hundred components | 98.45 | 14.33 |

Table 4. Least-squares regression results with 10 PLS components, the 2007–2009 financial crisis, and COVID-19 dummy variables.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| Intercept | 50.9247 | 0.309 | 164.803 | 0.000 |
| PLS component 1 | 3.8916 | 0.045 | 85.602 | 0.000 |
| PLS component 2 | 6.1480 | 0.142 | 43.183 | 0.000 |
| PLS component 3 | 4.6319 | 0.137 | 33.795 | 0.000 |
| PLS component 4 | 5.1632 | 0.158 | 32.689 | 0.000 |
| PLS component 5 | −6.4630 | 0.260 | −24.846 | 0.000 |
| PLS component 6 | −4.1494 | 0.240 | −17.292 | 0.000 |
| PLS component 7 | 3.9854 | 0.301 | 13.251 | 0.000 |
| PLS component 8 | 2.6077 | 0.467 | 5.580 | 0.000 |
| PLS component 9 | 2.9643 | 0.420 | 7.053 | 0.000 |
| PLS component 10 | 1.5800 | 0.307 | 5.147 | 0.000 |
| COVID-19 | 1.6898 | 1.564 | 1.080 | 0.281 |
| 2007–2009 financial crisis | 4.2152 | 2.012 | 2.095 | 0.037 |

R-squared: 0.969; Adjusted R-squared: 0.968; F-statistic: 937.3; Prob (F-statistic): 0.000; AIC: 2277; BIC: 2328

Table 5. ARIMA(3,1,1) results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| $\Delta p_{t-1}$ | 1.2564 | 0.065 | 19.287 | 0.000 |
| $\Delta p_{t-2}$ | −0.1999 | 0.088 | −2.271 | 0.023 |
| $\Delta p_{t-3}$ | −0.1842 | 0.047 | −3.924 | 0.000 |
| $\epsilon_{t-1}$ | −0.9080 | 0.047 | −19.517 | 0.000 |

Log likelihood: −782.169; AIC: 1574; BIC: 1592

Table 6. ARIMA(3,1,1) with 10 PLS components regression results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| PLS component 1 | 3.9135 | 0.074 | 53.155 | 0.000 |
| PLS component 2 | 5.5368 | 0.280 | 19.780 | 0.000 |
| PLS component 3 | 3.8889 | 0.232 | 16.749 | 0.000 |
| PLS component 4 | 4.3267 | 0.287 | 15.066 | 0.000 |
| PLS component 5 | −6.0491 | 0.575 | −10.522 | 0.000 |
| PLS component 6 | −3.4404 | 0.463 | −7.426 | 0.000 |
| PLS component 7 | 3.6362 | 0.494 | 7.353 | 0.000 |
| PLS component 8 | 4.1290 | 0.684 | 6.039 | 0.000 |
| PLS component 9 | 3.8184 | 0.478 | 7.996 | 0.000 |
| PLS component 10 | 1.6372 | 0.442 | 3.702 | 0.000 |
| $\Delta p_{t-1}$ | 0.7831 | 0.067 | 11.771 | 0.000 |
| $\Delta p_{t-2}$ | 0.1193 | 0.080 | 1.484 | 0.138 |
| $\Delta p_{t-3}$ | −0.1929 | 0.060 | −3.231 | 0.001 |
| $\epsilon_{t-1}$ | −0.9961 | 0.071 | −13.934 | 0.000 |

Log likelihood: −693.196; AIC: 1416.392; BIC: 1470.644

Table 7. ARIMA(3,1,1) with 10 PLS components, COVID-19, and financial crisis regression results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| PLS component 1 | 3.8652 | 0.110 | 35.296 | 0.000 |
| PLS component 2 | 5.5910 | 0.362 | 15.449 | 0.000 |
| PLS component 3 | 3.9148 | 0.254 | 15.425 | 0.000 |
| PLS component 4 | 4.4232 | 0.282 | 15.671 | 0.000 |
| PLS component 5 | −5.0474 | 0.510 | −9.905 | 0.000 |
| PLS component 6 | −3.6222 | 0.438 | −8.261 | 0.000 |
| PLS component 7 | 3.6609 | 0.416 | 8.794 | 0.000 |
| PLS component 8 | 3.8124 | 0.759 | 5.022 | 0.000 |
| PLS component 9 | 4.7275 | 0.515 | 9.179 | 0.000 |
| PLS component 10 | 2.1798 | 0.407 | 5.353 | 0.000 |
| COVID-19 | −0.1028 | 3.281 | −0.031 | 0.975 |
| 2007–2009 financial crisis | 2.6813 | 1.476 | 1.816 | 0.069 |
| $\Delta p_{t-1}$ | 0.8145 | 0.052 | 15.771 | 0.000 |
| $\Delta p_{t-2}$ | 0.1294 | 0.070 | 1.845 | 0.065 |
| $\Delta p_{t-3}$ | −0.1672 | 0.051 | −3.282 | 0.001 |
| $\epsilon_{t-1}$ | −0.9966 | 0.040 | −24.725 | 0.000 |

Log likelihood: −961.427; AIC: 1956.855; BIC: 2023.246

Table 8. SARIMA(3,1,1)(3,1,1) regression results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| $\Delta p_{t-1}$ | 0.2343 | 1.717 | 0.136 | 0.891 |
| $\Delta p_{t-2}$ | 0.1933 | 0.672 | 0.288 | 0.774 |
| $\Delta p_{t-3}$ | −0.0358 | 0.232 | −0.154 | 0.877 |
| $\epsilon_{t-1}$ | 0.1625 | 1.706 | 0.095 | 0.924 |
| $\Delta p_{t-12}$ | −0.9886 | 0.301 | −3.283 | 0.001 |
| $\Delta p_{t-24}$ | −0.7137 | 0.188 | −3.793 | 0.000 |
| $\Delta p_{t-36}$ | −0.3692 | 0.132 | −2.793 | 0.005 |
| $\epsilon_{t-12}$ | 0.3575 | 0.294 | 1.218 | 0.223 |

Log likelihood: −802.728; AIC: 1623.456; BIC: 1636.376

Table 9. SARIMA(3,1,1)(3,1,1) with PLS components regression results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| PLS component 1 | 2.3591 | 2.111 | 1.118 | 0.264 |
| PLS component 2 | 4.6279 | 1.370 | 3.378 | 0.001 |
| PLS component 3 | 3.4884 | 1.159 | 3.011 | 0.003 |
| PLS component 4 | 3.3385 | 0.810 | 4.122 | 0.000 |
| PLS component 5 | −5.3662 | 1.851 | −2.898 | 0.004 |
| PLS component 6 | −2.3827 | 1.296 | −1.838 | 0.066 |
| PLS component 7 | 3.2259 | 0.961 | 3.358 | 0.001 |
| PLS component 8 | 4.4685 | 1.703 | 2.623 | 0.009 |
| PLS component 9 | 4.6664 | 0.936 | 4.987 | 0.000 |
| PLS component 10 | 1.1826 | 1.041 | 1.136 | 0.256 |
| $\Delta p_{t-1}$ | −0.0398 | 2.117 | −0.019 | 0.985 |
| $\Delta p_{t-2}$ | 0.0860 | 0.184 | 0.467 | 0.641 |
| $\Delta p_{t-3}$ | −0.0425 | 0.199 | −0.214 | 0.830 |
| $\epsilon_{t-1}$ | −0.0299 | 2.135 | −0.014 | 0.989 |
| $\Delta p_{t-12}$ | −0.6485 | 0.317 | −2.045 | 0.041 |
| $\Delta p_{t-24}$ | −0.5452 | 0.193 | −2.828 | 0.005 |
| $\Delta p_{t-36}$ | −0.3037 | 0.151 | −2.013 | 0.044 |
| $\epsilon_{t-12}$ | −0.0322 | 0.329 | −0.098 | 0.922 |

Log likelihood: −716.896; AIC: 1471.793; BIC: 1499.068

Table 10. SARIMA(3,1,1)(3,1,1) with PLS components, COVID-19, and financial crisis regression results.

| Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|
| PLS component 1 | 2.4825 | 1.744 | 1.423 | 0.155 |
| PLS component 2 | 5.3222 | 1.191 | 4.469 | 0.000 |
| PLS component 3 | 3.3497 | 0.847 | 3.953 | 0.000 |
| PLS component 4 | 3.8001 | 0.738 | 5.150 | 0.000 |
| PLS component 5 | −3.8326 | 1.475 | −2.598 | 0.009 |
| PLS component 6 | −2.2386 | 1.200 | −1.866 | 0.062 |
| PLS component 7 | 2.6910 | 0.915 | 2.941 | 0.003 |
| PLS component 8 | 3.7137 | 1.576 | 2.357 | 0.018 |
| PLS component 9 | 5.2219 | 0.798 | 6.543 | 0.000 |
| PLS component 10 | 1.3146 | 0.905 | 1.452 | 0.146 |
| COVID-19 | −3.1711 | 4.709 | −0.673 | 0.501 |
| 2007–2009 financial crisis | 4.8300 | 2.481 | 1.946 | 0.052 |
| $\Delta p_{t-1}$ | −0.0343 | 0.824 | −0.042 | 0.967 |
| $\Delta p_{t-2}$ | 0.1506 | 0.091 | 1.655 | 0.098 |
| $\Delta p_{t-3}$ | −0.0922 | 0.140 | −0.659 | 0.510 |
| $\epsilon_{t-1}$ | −0.0112 | 0.839 | −0.013 | 0.989 |
| $\Delta p_{t-12}$ | −0.7217 | 0.346 | −2.085 | 0.037 |
| $\Delta p_{t-24}$ | −0.5367 | 0.226 | −2.374 | 0.018 |
| $\Delta p_{t-36}$ | −0.2585 | 0.154 | −1.679 | 0.093 |
| $\epsilon_{t-12}$ | 0.0422 | 0.361 | 0.117 | 0.907 |

Log likelihood: −995.132; AIC: 2032.263; BIC: 2113.578

Table 11. ARIMA(1,1,0) with ARCH(1) regression results.

| Equation | Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|---|
| Mean | Intercept | 1.0222 | 0.485 | 2.109 | 0.035 |
| Mean | $\Delta p_{t-1}$ | 0.1613 | 0.064 | 2.510 | 0.012 |
| Volatility | Intercept | 45.6749 | 5.229 | 8.735 | 0.000 |
| Volatility | $\epsilon_{t-1}^2$ | 0.2107 | 0.088 | 2.400 | 0.016 |

Log likelihood: −935.877; AIC: 1879.75; BIC: 1894.19

Table 12. ARIMA(1,1,0) with PLS components and ARCH(1) regression results.

| Equation | Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|---|
| Mean | Intercept | −0.2333 | 1.633 | −0.143 | 0.886 |
| Mean | $\Delta p_{t-1}$ | 0.1050 | 0.072 | 1.451 | 0.147 |
| Mean | PLS component 1 | −0.0754 | 0.121 | −0.621 | 0.534 |
| Mean | PLS component 2 | 0.1796 | 0.409 | 0.439 | 0.661 |
| Mean | PLS component 3 | 0.5069 | 0.455 | 1.115 | 0.265 |
| Mean | PLS component 4 | 0.5770 | 0.583 | 0.990 | 0.322 |
| Mean | PLS component 5 | 0.0908 | 0.908 | 0.100 | 0.920 |
| Mean | PLS component 6 | −0.2591 | 0.683 | −0.379 | 0.704 |
| Mean | PLS component 7 | 0.0867 | 0.687 | 0.126 | 0.900 |
| Mean | PLS component 8 | 0.2325 | 1.263 | 0.184 | 0.854 |
| Mean | PLS component 9 | 0.6985 | 0.775 | 0.901 | 0.368 |
| Mean | PLS component 10 | 0.6087 | 0.636 | 0.958 | 0.338 |
| Volatility | Intercept | 42.7200 | 7.611 | 5.613 | 0.000 |
| Volatility | $\epsilon_{t-1}^2$ | 0.2442 | 0.181 | 1.348 | 0.178 |

Log likelihood: −930.915; AIC: 1889.83; BIC: 1940.36

Table 13. ARIMA(1,1,0) with GARCH(1,1) regression results.

| Equation | Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|---|
| Mean | Intercept | 0.5750 | 0.462 | 1.245 | 0.213 |
| Mean | $\Delta p_{t-1}$ | 0.1875 | 0.065 | 2.885 | 0.004 |
| Volatility | Intercept | 6.0325 | 5.145 | 1.172 | 0.241 |
| Volatility | $\epsilon_{t-1}^2$ | 0.1375 | 0.062 | 2.211 | 0.027 |
| Volatility | $\sigma_{t-1}^2$ | 0.7623 | 0.135 | 5.665 | 0.000 |

Log likelihood: −929.995; AIC: 1869.99; BIC: 1888.04

Table 14. ARIMA(1,1,0) with PLS components and GARCH(1,1) regression results.

| Equation | Variable | Coefficient | Std. Error | t-Value | p-Value |
|---|---|---|---|---|---|
| Mean | Intercept | −1.8532 | 1.462 | −1.267 | 0.205 |
| Mean | $\Delta p_{t-1}$ | 0.1218 | 0.068 | 1.788 | 0.074 |
| Mean | PLS component 1 | −0.1412 | 0.104 | −1.355 | 0.175 |
| Mean | PLS component 2 | 0.6235 | 0.537 | 1.161 | 0.246 |
| Mean | PLS component 3 | 0.8788 | 0.584 | 1.506 | 0.132 |
| Mean | PLS component 4 | 0.9793 | 0.712 | 1.376 | 0.169 |
| Mean | PLS component 5 | −0.8822 | 1.089 | −0.810 | 0.418 |
| Mean | PLS component 6 | −0.3440 | 0.782 | −0.440 | 0.660 |
| Mean | PLS component 7 | 0.3936 | 0.755 | 0.521 | 0.602 |
| Mean | PLS component 8 | −0.1077 | 1.048 | −0.103 | 0.918 |
| Mean | PLS component 9 | 1.3271 | 0.921 | 1.441 | 0.150 |
| Mean | PLS component 10 | 0.5749 | 0.590 | 0.975 | 0.330 |
| Volatility | Intercept | 6.0498 | 4.129 | 1.465 | 0.143 |
| Volatility | $\epsilon_{t-1}^2$ | 0.1292 | 0.059 | 2.195 | 0.028 |
| Volatility | $\sigma_{t-1}^2$ | 0.7623 | 0.114 | 6.684 | 0.000 |

Log likelihood: −923.863; AIC: 1877.73; BIC: 1931.87

Table 15. Forecasting power of alternative models.

| Model | MAPE | MAE | RMSE |
|---|---|---|---|
| PLS | 0.2585 | 15.1039 | 20.2560 |
| OLS | 0.3868 | 21.2806 | 25.5831 |
| ARIMA(3,1,1) | 0.3262 | 19.0561 | 22.3438 |
| ARIMAX1(3,1,1) | 0.2780 | 16.2411 | 19.3317 |
| ARIMAX2(3,1,1) | 0.2803 | 16.3745 | 19.4875 |
| SARIMA(3,1,1)(3,1,1) | 1.8440 | 107.734 | 128.3105 |
| SARIMAX1(3,1,1)(3,1,1) | 0.5544 | 32.3907 | 37.1448 |
| SARIMAX2(3,1,1)(3,1,1) | 0.4968 | 29.0251 | 33.4761 |
| ARIMA(1,1,0) with ARCH(1) | 0.0732 | 4.2891 | 5.6916 |
| ARIMAX1(1,1,0) with ARCH(1) | 0.0689 | 4.0384 | 5.3432 |
| ARIMA(1,1,0) with GARCH(1,1) | 0.0605 | 3.5383 | 4.6759 |
| ARIMAX1(1,1,0) with GARCH(1,1) | 0.0692 | 4.0589 | 5.0577 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Al Shammre, A.S.; Chidmi, B. Oil Price Forecasting Using FRED Data: A Comparison between Some Alternative Models. Energies 2023, 16, 4451. https://doi.org/10.3390/en16114451
