# Energy Markets Forecasting. From Inferential Statistics to Machine Learning: The German Case


## Abstract


## 1. Introduction

## 2. General Overview

#### 2.1. Type of Forecasts

#### 2.1.1. Point Forecast

#### 2.1.2. Probabilistic Forecast

#### 2.1.3. Ensemble Forecast

#### 2.2. Modelling Approaches

#### 2.2.1. Statistical Approaches

- similar-day methods, such as the naive method, which sets ${\widehat{P}}_{d,h}={P}_{d-7,h}$ for Monday, Saturday, or Sunday, and ${\widehat{P}}_{d,h}={P}_{d-1,h}$ otherwise,
- the generalized autoregressive conditional heteroskedasticity (GARCH) models, typically in connection with volatility forecasting, and
- shrinkage techniques, such as the least absolute shrinkage and selection operator (LASSO), but also Ridge regression and elastic nets.
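The naive similar-day rule above can be written down directly. A minimal Python sketch, where the price container and the `weekday0` convention are our own illustration, not from the paper:

```python
def naive_forecast(prices, d, h, weekday0=0):
    """Similar-day naive forecast P_hat[d, h].

    prices: dict mapping (day, hour) -> observed price.
    weekday0: weekday of day 0 (0 = Monday, ..., 6 = Sunday).
    For Monday, Saturday, and Sunday the forecast is the price of the
    same weekday one week earlier; otherwise it is yesterday's price.
    """
    weekday = (weekday0 + d) % 7
    lag = 7 if weekday in (0, 5, 6) else 1  # Mon/Sat/Sun -> one week back
    return prices[(d - lag, h)]
```

Despite its simplicity, this rule is a standard benchmark in electricity price forecasting studies.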

#### 2.2.2. Computational Intelligence Methods (CI)

#### 2.3. The German Electricity Market

## 3. The Probabilistic Approach

#### 3.1. The Problem

#### 3.2. Construction of Probabilistic Forecasts

#### 3.2.1. Historical Simulation

#### 3.2.2. Distribution-Based Probabilistic Predictions

#### 3.2.3. Bootstrapped PIs

1. Estimate the parameters $\widehat{\theta}$ characterizing the model, obtaining a fit with a corresponding set of residuals ${\epsilon}_{t}$.
2. Generate a set of simulated (hence not real-world) data whose distribution is steered by the parameter set $\widehat{\theta}$ and the normalized residuals ${\epsilon}_{t}^{*}$. For a general autoregressive model of order r, starting from the values ${P}_{1}^{*}={P}_{1},\dots ,{P}_{r}^{*}={P}_{r}$, we recursively define:$${P}_{t}^{*}={\widehat{\beta}}_{1}{P}_{t-1}^{*}+\cdots +{\widehat{\beta}}_{r}{P}_{t-r}^{*}+\widehat{f}\left({X}_{t}\right)+{\epsilon}_{t}^{*},\quad \mathrm{for}\ \mathrm{all}\quad t\in \{r+1,\dots ,T\}.$$
3. Re-estimate the model on the simulated data and compute the bootstrap one-step-ahead forecast for the new time step $t=T+1$.
4. Repeat steps 2 and 3 N times to obtain the (bootstrapped) price predictions ${\left\{{\widehat{P}}_{T+1}^{i}\right\}}_{i=1}^{N}$.
5. Compute the relevant quantiles of the predictions from step 4 to provide the requested PIs.
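The five steps above can be sketched as follows, assuming, purely for illustration, an AR(1) model without exogenous regressors, estimated by OLS; all helper names are ours:

```python
import random

def fit_ar1(series):
    """Step 1: OLS estimate of beta in P_t = beta * P_{t-1} + eps_t."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(p * p for p in series[:-1])
    beta = num / den
    resid = [series[t] - beta * series[t - 1] for t in range(1, len(series))]
    mu = sum(resid) / len(resid)
    resid = [e - mu for e in resid]       # center the residuals for resampling
    return beta, resid

def bootstrap_pi(series, n_boot=500, alpha=0.1, rng=random.Random(0)):
    """Bootstrapped (1 - alpha) prediction interval for P_{T+1} (steps 2-5)."""
    beta, resid = fit_ar1(series)
    preds = []
    for _ in range(n_boot):                               # step 4: repeat N times
        sim = [series[0]]                                 # step 2: simulated path
        for _ in range(1, len(series)):
            sim.append(beta * sim[-1] + rng.choice(resid))
        beta_b, resid_b = fit_ar1(sim)                    # step 3: re-estimate,
        preds.append(beta_b * series[-1] + rng.choice(resid_b))  # 1-step forecast
    preds.sort()                                          # step 5: take quantiles
    lo = preds[int(n_boot * alpha / 2)]
    hi = preds[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

The same scheme extends to order r and to exogenous regressors by replacing `fit_ar1` with the corresponding estimator.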

#### 3.2.4. Quantile Regression Averaging

#### 3.3. Validity

## 4. Models

#### 4.1. (S)ARIMA Models

#### Time Series Analysis in the Simple Case of an ARMA Model

1. Preliminary analysis: it is necessary to know the main features of the series, e.g., its stationarity. To this aim, so-called "unit root tests", such as the Dickey–Fuller test, are often used, see [22].
2. Order selection or identification: it is important to select appropriate values for the orders p and q. One can start by analyzing the total Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) and make some initial guesses based on their characteristics, see [21]. More reliable results are then obtained using various "Information Criteria" (IC), such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). ICs are indices that try to balance the goodness of fit against the number of parameters: a "penalty factor" is included to discourage the fitting of models with too many parameters. Hence, the preferred model is the one that minimizes the specific IC considered.
3. Estimation of the coefficients: once p and q are known, the coefficients of the polynomials can be estimated, e.g., by a least squares regression or with a maximum likelihood method. In most cases, this problem is solved with numerical methods.
4. Diagnostic check: verify whether the residuals follow a random process. If the fitted model is suitable, the rescaled residuals should have properties similar to a "white noise" $WN(0,1)$ sequence. One can observe the sample ACF of the residuals and perform tests for randomness, e.g., the Ljung–Box test, see, e.g., [21] (Chapter 1).

#### 4.2. Seasonality and Its Decomposition

#### The Wavelet Decomposition for the Long-Term Seasonal Component

#### 4.3. Expert Models

#### 4.3.1. The ARX Model

#### 4.3.2. The mARX Model

#### 4.4. Artificial Neural Networks (ANNs)

#### Recurrent and Long Short-Term Memory Networks

- We denote the input at time t by ${X}_{t}$ and the hidden state from the previous time step by ${S}_{t-1}$; both are fed into the LSTM block, which has to compute the new hidden state ${S}_{t}$.
- First, one decides which information from the cell state should be discarded. This is decided by the following "forget gate":$${f}_{t}=\sigma ({X}_{t}\,{U}^{f}+{S}_{t-1}\,{W}^{f}+{b}_{f}).$$
- Next, one determines which new information should be stored in the cell state. This part consists of two steps: firstly, the "input gate" layer ${i}_{t}$ determines which values are updated; secondly, a $\tanh$ layer creates a vector of new candidate values ${\tilde{C}}_{t}$. These two can be written as$${i}_{t}=\sigma ({X}_{t}\,{U}^{i}+{S}_{t-1}\,{W}^{i}+{b}_{i}),\qquad {\tilde{C}}_{t}=\tanh ({X}_{t}\,{U}^{c}+{S}_{t-1}\,{W}^{c}+{b}_{c}).$$
- We obtain the new cell state ${C}_{t}$ by updating the old cell state ${C}_{t-1}$ as:$${C}_{t}={C}_{t-1}\otimes {f}_{t}\oplus {i}_{t}\otimes {\tilde{C}}_{t}.$$
- Finally, the output is a filtered version of the cell state. The output gate ${o}_{t}$ determines which parts of the cell state are emitted; the cell state is passed through a $\tanh$ layer and multiplied by the output gate:$${o}_{t}=\sigma ({X}_{t}\,{U}^{o}+{S}_{t-1}\,{W}^{o}+{b}_{o}),\qquad {S}_{t}={o}_{t}\otimes \tanh \left({C}_{t}\right).$$
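The four gate equations above translate one-to-one into code. A scalar sketch (hidden size 1, one time step) for illustration only; the weight dictionary `p` with keys U*, W*, b* is our own placeholder:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, s_prev, c_prev, p):
    """One LSTM time step in the scalar case.

    f_t, i_t, o_t are the forget, input, and output gates, c_tilde the
    candidate cell state; p holds the weights U*, W* and biases b*.
    """
    f_t = sigmoid(x_t * p["Uf"] + s_prev * p["Wf"] + p["bf"])  # forget gate
    i_t = sigmoid(x_t * p["Ui"] + s_prev * p["Wi"] + p["bi"])  # input gate
    c_tilde = math.tanh(x_t * p["Uc"] + s_prev * p["Wc"] + p["bc"])
    c_t = c_prev * f_t + i_t * c_tilde                         # new cell state
    o_t = sigmoid(x_t * p["Uo"] + s_prev * p["Wo"] + p["bo"])  # output gate
    s_t = o_t * math.tanh(c_t)                                 # new hidden state
    return s_t, c_t
```

In the vector case the products become elementwise operations ($\otimes$, $\oplus$) on gate vectors, exactly as in the equations above.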

## 5. Data Analysis

#### 5.1. From the Probability Density Function to a Possible Realization of the Stochastic Process

#### The Rejection Method

1. Generate $U\sim U(a,b)$ via $U=a+(b-a){U}_{1}$ with ${U}_{1}\sim U(0,1)$.
2. Generate $Y\sim U(0,c)$ via $Y=c{U}_{2}$ with ${U}_{2}\sim U(0,1)$.
3. If $Y\le f\left(U\right)$, accept $X=U$; otherwise reject U and return to step 1.
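A direct transcription of the three steps, assuming a PDF f supported on $[a,b]$ and bounded by c:

```python
import random

def rejection_sample(f, a, b, c, rng=random.Random(0)):
    """Rejection method for a PDF f on [a, b] with f(x) <= c."""
    while True:
        u = a + (b - a) * rng.random()   # step 1: U ~ U(a, b)
        y = c * rng.random()             # step 2: Y ~ U(0, c)
        if y <= f(u):                    # step 3: accept, or retry
            return u
```

For example, `rejection_sample(lambda x: 2.0 * x, 0.0, 1.0, 2.0)` draws from the triangular density $f(x)=2x$ on $[0,1]$, whose mean is $2/3$.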

#### 5.1.1. Marsaglia’s Ziggurat Method

The Ziggurat method underlies, e.g., the Matlab function `randn`. It is based on an underlying source of uniformly distributed random numbers; it can be applied to symmetric unimodal distributions, such as the normal one, by generating an additional random sign ± for the value determined by the Ziggurat method applied to the distribution restricted to $[0,\infty )$. Instead of covering the graph of a given PDF f with a single box, the idea now is to use a "ziggurat" of rectangles, a cap, and a tail, which all have the same area. These sections are selected so that it is easy to choose uniform points and to determine whether to accept or reject them.

1. Choose $i\in \{0,\dots ,M\}$ uniformly at random and generate $U\sim U(0,1)$.
2. If $i\ge 1$:
   - Let $X=U{x}_{i}$.
   - If $X<{x}_{i-1}$, return X; else generate $Y\sim U(0,1)$ independent of U. If $f\left({x}_{i}\right)+Y\left(f\left({x}_{i-1}\right)-f\left({x}_{i}\right)\right)<f\left(X\right)$, accept X; otherwise reject.
3. If $i=0$:
   - Set $X=\left(\nu U\right)/f\left({x}_{M}\right)$.
   - If $X<{x}_{M}$, accept X; else generate a random value $X\in [{x}_{M},\infty )$ from the tail.
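The steps above can be sketched for the half-normal density $f(x)=e^{-{x}^{2}/2}$. The table of edges ${x}_{0}<\dots <{x}_{M}$ is built by bisection so that the base (rectangle plus tail) and all layers share the same area $\nu$; the tail is drawn with Marsaglia's exponential-rejection method, which is not detailed above. The number of layers and the setup routine are our own illustrative choices:

```python
import math, random

F = lambda x: math.exp(-0.5 * x * x)            # unnormalized half-normal PDF
FINV = lambda y: math.sqrt(-2.0 * math.log(y))  # inverse of F on [0, inf)

def tail_area(r):
    return math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))

def build_ziggurat(m=8):
    """Find edges 0 = x_0 < ... < x_m so that the base (rectangle + tail)
    and the m layers all have the same area v; r = x_m found by bisection."""
    def layers(r):
        v = r * F(r) + tail_area(r)
        xs = [0.0] * (m + 1)
        xs[m] = r
        for i in range(m, 1, -1):
            y = F(xs[i]) + v / xs[i]
            if y >= 1.0:                         # overshot the mode: r too small
                return None, v, float("inf")
            xs[i - 1] = FINV(y)
        return xs, v, F(xs[1]) + v / xs[1]       # closure value, should be 1
    lo, hi = 0.1, 10.0
    for _ in range(100):
        r = 0.5 * (lo + hi)
        _, _, top = layers(r)
        if top > 1.0:
            lo = r                               # v too large -> increase r
        else:
            hi = r                               # v too small -> decrease r
    xs, v, _ = layers(hi)
    return xs, v

def ziggurat_sample(xs, v, rng):
    """One half-normal draw following steps 1-3 of the algorithm above."""
    m = len(xs) - 1
    while True:
        i = rng.randrange(m + 1)                 # step 1: pick a section
        u = rng.random()
        if i >= 1:                               # step 2: rectangle layer
            x = u * xs[i]
            if x < xs[i - 1]:
                return x
            y = rng.random()
            if F(xs[i]) + y * (F(xs[i - 1]) - F(xs[i])) < F(x):
                return x
        else:                                    # step 3: base strip + tail
            x = v * u / F(xs[m])
            if x < xs[m]:
                return x
            while True:                          # Marsaglia's normal-tail method
                t = -math.log(rng.random()) / xs[m]
                w = -math.log(rng.random())
                if 2.0 * w > t * t:
                    return xs[m] + t
```

A random sign applied to each draw then yields standard normal variates, as described above.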

#### 5.1.2. The Function `randn` and Its Possible Application

Matlab implements the normal generator `randn` with the Ziggurat algorithm. With the option $X=\mathrm{randn}(\_{,}^{\prime}\mathrm{like}{}^{\prime},p)$ it returns an array of random numbers like p; X will be our possible realization for summer 2018. For example, if we generate 1000 paths and evaluate the $RMSE$, we obtain the histogram in Figure 9, where we can check that the values lie mainly in the range [1.35, 1.46]. As an example, one realization yields $RMSE=1.33$:

#### 5.2. Example with SARIMA Model

#### 5.3. ARIMA Model

- We deseasonalize the series using the Matlab function `deseasonalize`, see Section 4.2. It considers both a short-term and a long-term seasonal component.
- We use an Augmented Dickey–Fuller test to check whether our deseasonalized series is stationary; it tests the null hypothesis that a unit root is present in the time series sample. In Matlab, the function `adftest` returns a logical value with the rejection decision for a unit root in a univariate time series: for example, the result $adf=0$ indicates that the test does not reject the null hypothesis of a unit root against the trend-stationary alternative. In our case the series is not stationary, so we difference it once to make it stationary, which corresponds to $d=1$.
- Our goal now is to find a suitable ARIMA$(p,d,q)$ model for estimating the series. To guess a plausible order of the parameters, we consider the autocorrelation and partial autocorrelation functions, as proposed in the procedure of Box and Jenkins.
- We estimate different model specifications and choose the one minimizing the AIC and BIC information criteria. In particular, we end up choosing an ARIMA(1,1,2) model:$$(1-{\varphi}_{1}B)(1-B)\,{x}_{t}=(1+{\theta}_{1}B+{\theta}_{2}{B}^{2})\,{\epsilon}_{t}.$$
- Subsequently, we calibrate the parameters so as to minimize the ${L}^{2}$ error between the Probability Density Function (PDF) of the data and the PDF of the forecast computed from the estimated model. The PDF is estimated by the Matlab function `ksdensity`, which uses a non-parametric method called kernel density estimation: given a sample $({x}_{1},\dots ,{x}_{n})$, the kernel density estimator of f is$${\widehat{f}}_{h}\left(x\right)=\frac{1}{nh}\sum _{i=1}^{n}K\left(\frac{x-{x}_{i}}{h}\right).$$
- The optimal prediction results from the solution of the problem$$\underset{{\varphi}_{1},\dots ,{\varphi}_{p},{\theta}_{1},\dots ,{\theta}_{q}}{min}{\parallel PDF{\left(\mathrm{forecast}\right)}_{\{{\varphi}_{1},\dots ,{\varphi}_{p},{\theta}_{1},\dots ,{\theta}_{q}\}}-PDF\left(\mathrm{data}\right)\parallel}_{2},$$which we solve with the Matlab function `fminsearch`; it determines the minimum of a multi-variable function using a derivative-free method, and we use the Matlab function `arma_forecast` for the point forecasts, see [35]. In particular, `fminsearch` uses the Nelder–Mead simplex algorithm, a direct search method that does not use numerical or analytical gradients; the result therefore depends on the given initial values, for which we take the parameters from the first estimate of the ARIMA model. Let us note that this approach differs from the one used by Ziel and Steinert, see [9], where the optimal parameters are obtained by minimizing the BIC criterion using the LASSO technique. The general idea is that, for a linear model $Y={\beta}^{\prime}\mathbf{X}+\epsilon $, the LASSO estimator is given by$$\widehat{\beta}=\underset{\beta}{min}{\parallel Y-{\beta}^{\prime}\mathbf{X}\parallel}_{2}^{2}+\lambda {\parallel \beta \parallel}_{1}.$$
- We plot the density functions; the results can be seen in Figure 13.
- In Table 3, one can compare the error before and after the optimization.
- Finally, we implement the idea described in Section 5.1: based on the PDF of the point estimates, we can guess a possible realization of the future period, for example the first three weeks of January 2018. A plausible result can be observed in Figure 14.
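The kernel density estimator and the discrete ${L}^{2}$ objective used in the calibration can be sketched in Python (Gaussian kernel, which is also Matlab's `ksdensity` default; the helper names and the grid-based norm are our own illustration):

```python
import math

def kde(sample, h):
    """Kernel density estimate f_h(x) = (1/nh) * sum_i K((x - x_i) / h)
    with a Gaussian kernel K; returns the estimated density as a function."""
    n = len(sample)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return lambda x: sum(k((x - xi) / h) for xi in sample) / (n * h)

def l2_distance(f, g, grid):
    """Discrete approximation of the L2 norm of f - g over a uniform grid."""
    dx = grid[1] - grid[0]
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in grid) * dx)
```

Minimizing `l2_distance(kde(forecast, h), kde(data, h), grid)` over the model parameters with a derivative-free optimizer mirrors the `fminsearch` calibration described above.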

#### 5.4. Neural Networks

The network is trained with the Matlab function `trainNetwork`, specifying the training set, the `lstmLayers`, and appropriate options for the training. With the function `trainingOptions`, which is part of the Matlab Deep Learning Toolbox, we can set all of the parameters that we need, see Table 5. It is worth mentioning that there is no specific procedure for selecting these values: contrary to what we saw for the (S)ARIMA example, we do not have any criterion or index that we can exploit to justify a choice, so the decision is made after several attempts. Two main problems deserve attention. The first is the so-called over-fitting, which makes it fundamental to prescribe an appropriate number of epochs for the training phase; for example, we can stop once we see that the loss function is stable. The second is computational cost: a very high number of hidden units can be complex to handle from a computational point of view.

#### 5.5. Hybrid Approach

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| ACF | Autocorrelation Function |
| AIC | Akaike Information Criterion |
| BIC | Bayesian Information Criterion |
| CDF | cumulative distribution function |
| CRPS | continuous ranked probability score |
| DM | Diebold–Mariano |
| EPEX | European Power Exchange |
| EPF | electricity price forecasting |
| FNN | feedforward neural network |
| GARCH | generalized autoregressive conditional heteroskedasticity |
| i.i.d. | independently identically distributed |
| LASSO | least absolute shrinkage and selection operator |
| LSTM | Long Short-Term Memory |
| LTSC | Long-Term Seasonal Component |
| OLS | Ordinary Least Squares |
| PACF | Partial Autocorrelation Function |
| PDF | Probability Density Function |
| PI | prediction interval |
| PIT | probability integral transform |
| PL | pinball loss |
| QRA | Quantile Regression Averaging |
| RMSE | Root Mean Squared Error |
| RNN | Recurrent Neural Network |
| SARIMA | Seasonal Autoregressive Integrated Moving Average |
| SGD | Stochastic Gradient Descent |
| STSC | Short-Term Seasonal Component |
| VAR | Value-at-Risk |

## References

1. Ding, Y. Data Science for Wind Energy; CRC Press: Boca Raton, FL, USA, 2019.
2. Singh, S.; Yassine, A. Big Data Mining of Energy Time Series for Behavioral Analytics and Energy Consumption Forecasting. Energies 2018, 11, 452.
3. Ziel, F.; Weron, R. Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks. Energy Econ. 2018, 70, 396–420.
4. Khairalla, M.A.; Ning, X.; Al-Jallad, N.T.; El-Faroug, M.O. Short-Term Forecasting for Energy Consumption through Stacking Heterogeneous Ensemble Learning Model. Energies 2018, 11, 1605.
5. Arora, S.; Taylor, J.W. Rule-based autoregressive moving average models for forecasting load on special days: A case study for France. Eur. J. Oper. Res. 2018, 266, 259–268.
6. Sapsis, T.P. New perspectives for the prediction and statistical quantification of extreme events in high-dimensional dynamical systems. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2018, 376, 20170133.
7. Virginia, E.; Ginting, J.; Elfaki, F.A. Application of GARCH model to forecast data and volatility of share price of energy (Study on Adaro Energy Tbk, LQ45). Int. J. Energy Econ. Policy 2018, 8, 20170133.
8. Weron, R. Electricity price forecasting: A review of the state-of-the-art with a look into the future. Int. J. Forecast. 2014, 30, 1030–1081.
9. Ziel, F.; Steinert, R. Probabilistic mid- and long-term electricity price forecasting. Renew. Sustain. Energy Rev. 2018, 94, 251–266.
10. Nowotarski, J.; Weron, R. Computing electricity spot price prediction intervals using quantile regression and forecast averaging. Comput. Stat. 2015, 30, 791–803.
11. Marcjasz, G.; Serafin, T.; Weron, R. Selection of calibration windows for day-ahead electricity price forecasting. Energies 2018, 11, 2364.
12. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
13. Weron, R. Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach; Wiley: Hoboken, NJ, USA, 2007; Volume 403.
14. Bundesnetzagentur SMARD Strommarktdaten. Electricity Generation in August and September 2019. 2020. Available online: https://www.smard.de/en/topic-article/5870/14626 (accessed on 13 November 2020).
15. Westgaard, S.; Paraschiv, F.; Ekern, L.L.; Naustdal, I.; Roland, M. Forecasting price distributions in the German electricity market. Int. Financ. Mark. 2019, 1, 11.
16. Nowotarski, J.; Weron, R. Recent advances in electricity price forecasting: A review of probabilistic forecasting. Renew. Sustain. Energy Rev. 2018, 81, 1548–1568.
17. Gneiting, T.; Katzfuss, M. Probabilistic forecasting. Annu. Rev. Stat. Its Appl. 2014, 1, 125–151.
18. Nowotarski, J.; Weron, R. On the importance of the long-term seasonal component in day-ahead electricity price forecasting. Energy Econ. 2016, 57, 228–235.
19. Aziz Ezzat, A.; Jun, M.; Ding, Y. Spatio-temporal short-term wind forecast: A calibrated regime-switching method. Ann. Appl. Stat. 2019, 13, 1484–1510.
20. Di Persio, L.; Frigo, M. Gibbs sampling approach to regime switching analysis of financial time series. J. Comput. Appl. Math. 2016, 300, 43–55.
21. Brockwell, P.J.; Davis, R.A. Introduction to Time Series and Forecasting, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2016.
22. Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
23. Bloomfield, P. Fourier Analysis of Time Series: An Introduction; Wiley: Hoboken, NJ, USA, 2004.
24. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, UK, 2000; Volume 4.
25. Haykin, S. Neural Networks, A Comprehensive Foundation; McMaster University: Hamilton, ON, Canada, 1999.
26. Graves, A. Supervised sequence labelling. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 5–13.
27. Werbos, P.J. Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1988, 1, 339–356.
28. Hammer, B. On the approximation capability of recurrent neural networks. Neurocomputing 2000, 31, 107–123.
29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
30. Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2019, 323, 203–213.
31. Hagfors, L.I.; Kamperud, H.H.; Paraschiv, F.; Prokopczuk, M.; Sator, A.; Westgaard, S. Prediction of extreme price occurrences in the German day-ahead electricity market. Quant. Financ. 2016, 16, 1929–1948.
32. Gentle, J.E. Random Number Generation and Monte Carlo Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
33. Marsaglia, G.; Tsang, W.W. The Ziggurat method for generating random variables. J. Stat. Softw. 2000, 5, 1–7.
34. Ljung, G.M.; Box, G.E. On a measure of lack of fit in time series models. Biometrika 1978, 65, 297–303.
35. Sheppard, K. MFE MATLAB Function Reference Financial Econometrics; Oxford University: Oxford, UK, 2009. Available online: http://www.kevinsheppard.com/images/9/95/MFE_Toolbox_Documentation.pdf (accessed on 15 November 2020).
36. Maciejowska, K.; Nowotarski, J. A hybrid model for GEFCom2014 probabilistic electricity price forecasting. Int. J. Forecast. 2016, 32, 1051–1056.
37. Brownlee, J. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python; Machine Learning Mastery: Victoria, Australia, 2018.

**Figure 2.** Electricity generation and consumption in August and September 2019. The plot shows the total electricity generation and consumption for every day of the period; see [14] (CC BY 4.0 license).

**Figure 5.** Example of an LSTM memory block, where ${f}_{t}$, ${i}_{t}$, ${o}_{t}$ denote the forget, input, and output gates, respectively; $\tau $ and $\sigma $ represent the tanh and sigmoid activation functions, cf. [30].

**Figure 12.**Comparison between target and forecast values for two weeks with the given training data.

**Figure 17.** Summary of the results of the long short-term memory (LSTM) network trained on one year of data for a two-week forecast.

**Figure 21.** Summary of the results of the probabilistic hybrid approach for each training period, for a time horizon of three weeks.

| Training Period | SARIMA $(p,d,q)\times (P,D,Q)_{s}$ |
|---|---|
| One year | $(2,0,2)\times {(1,0,1)}_{7}$ |
| Spring/Summer | $(2,1,2)\times {(1,1,1)}_{7}$ |
| Autumn/Winter | $(3,1,3)\times {(1,1,1)}_{7}$ |

| Train Period | One Year | Spring/Summer | Autumn/Winter |
|---|---|---|---|
| RMSE | 21.5418 | 12.5163 | 8.1396 |

| Forecasted Days | Error before | Error after |
|---|---|---|
| 14 | $1.83\cdot {10}^{3}$ | 0.1098 |

| Period | ARIMA | Forecasted Days | Error before | Error after |
|---|---|---|---|---|
| Autumn/Winter | (2,0,0) | 14 days in January | 1.7390 | 0.1223 |
| Spring/Summer | (1,1,4) | 14 days in June | 11.1979 | 0.1375 |

| Training Option | Value |
|---|---|
| Number of Epochs | 250 |
| Hidden Layers | 200 |
| Initial Learn Rate | 0.005 |
| Elapsed Time | 1 min 8 s |

| | Spring/Summer | Autumn/Winter |
|---|---|---|
| Epochs | 280 | 250 |
| Hidden Layers | 190 | 190 |
| Initial Learn Rate | 0.009 | 0.007 |
| Elapsed Time | 56 s | 43 s |

| | One Year | Autumn/Winter | Spring/Summer |
|---|---|---|---|
| SARIMA | 10% | 30% | 50% |
| NN | 90% | 70% | 50% |

| | One Year | Autumn/Winter | Spring/Summer |
|---|---|---|---|
| SARIMA | 21.54 | 8.13 | 12.53 |
| NN | 6.05 | 7.65 | 9.13 |
| Hybrid | 4.15 | 7.08 | 9.38 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Viviani, E.; Di Persio, L.; Ehrhardt, M.
Energy Markets Forecasting. From Inferential Statistics to Machine Learning: The German Case. *Energies* **2021**, *14*, 364.
https://doi.org/10.3390/en14020364
