# Mutual Information-Based Inputs Selection for Electric Load Time Series Forecasting


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Review of Mutual Information

The estimator from [17] is based on the distance from each sample ${z}^{i}=\left({x}^{i},{y}^{i}\right)$ to its $k$ nearest neighbors, averaged over all ${z}^{i}$. Let us denote by ${z}^{k\left(i\right)}=\left({x}^{k\left(i\right)},{y}^{k\left(i\right)}\right)$ the $k$-th nearest neighbor of ${z}^{i}$. It should be noted that ${x}^{k\left(i\right)}$ and ${y}^{k\left(i\right)}$ are the input and output parts of ${z}^{k\left(i\right)}$, respectively, and thus not necessarily the $k$-th nearest neighbors of ${x}^{i}$ and ${y}^{i}$. Let us define ${d}_{X}^{i}=\parallel {x}^{i}-{x}^{k\left(i\right)}\parallel$, ${d}_{Y}^{i}=\parallel {y}^{i}-{y}^{k\left(i\right)}\parallel$ and ${d}_{Z}^{i}=\parallel {z}^{i}-{z}^{k\left(i\right)}\parallel$. Evidently, with the maximum norm, ${d}^{i}=\mathrm{max}({d}_{X}^{i},{d}_{Y}^{i})$. Subsequently, the number ${n}_{X}^{i}$ of points ${x}^{j}$ whose distance from ${x}^{i}$ is strictly less than ${d}^{i}$ is counted, and similarly the number ${n}_{Y}^{i}$ of points ${y}^{j}$ whose distance from ${y}^{i}$ is strictly less than ${d}^{i}$. Then, $I(X,Y)$ can be estimated as presented in:

$$I(X,Y)=\psi (k)+\psi (N)-\frac{1}{N}\sum _{i=1}^{N}\left[\psi ({n}_{X}^{i}+1)+\psi ({n}_{Y}^{i}+1)\right] \quad (3)$$

where $\psi$ is the digamma function and $N$ is the number of samples. The computational complexity of the estimator is $O({N}^{2})$. This paper implements this type of estimator, which is one of the two proposed in [17]. This type of MI estimator depends on the value chosen for $k$, which controls the bias–variance tradeoff. As recommended in [18], a mid-range value of $k=6$ will be used.
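As a concrete illustration, the counting procedure above can be sketched in Python. This is a minimal $O(N^2)$ sketch of the first Kraskov et al. estimator for one-dimensional $X$ and $Y$ with the maximum norm, written for this article; it is not the authors' implementation, and the function name is illustrative:

```python
import numpy as np
from scipy.special import digamma


def ksg_mi(x, y, k=6):
    """Estimate I(X, Y) with the first kNN estimator of Kraskov et al.

    x, y : 1-D arrays of N paired samples.
    Distances use the max-norm, so d^i = max(d_X^i, d_Y^i) as in the text.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)

    # Pairwise max-norm distances in the joint space Z = (X, Y)
    dx = np.abs(x - x.T)
    dy = np.abs(y - y.T)
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)            # exclude the point itself

    # d^i: distance from z^i to its k-th nearest neighbor
    d = np.sort(dz, axis=1)[:, k - 1]

    # n_X^i, n_Y^i: marginal neighbors strictly closer than d^i
    np.fill_diagonal(dx, np.inf)
    np.fill_diagonal(dy, np.inf)
    nx = (dx < d[:, None]).sum(axis=1)
    ny = (dy < d[:, None]).sum(axis=1)

    # Estimator (3): psi(k) + psi(N) - <psi(n_X + 1) + psi(n_Y + 1)>
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For strongly dependent samples the estimate is large and positive, while for independent samples it fluctuates around zero, which is what makes it usable as a ranking criterion in Section 4.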

#### 2.2. A Review of Least Squares Support Vector Machine

where ${e}_{k}$ represents the error variables and $\gamma$ is a regularization parameter that gives relative weight to the errors and should be optimized by the user. In order to solve the optimization problem defined with (7) and (8), it is necessary to construct a dual problem using the Lagrange function. Once the mathematical calculations described in detail in [10] have been carried out, the following linear system, presented in (9), is obtained:

$$\left[\begin{array}{cc}0& {1}_{v}^{T}\\ {1}_{v}& \Omega +{\gamma }^{-1}I\end{array}\right]\left[\begin{array}{c}b\\ \alpha \end{array}\right]=\left[\begin{array}{c}0\\ y\end{array}\right] \quad (9)$$

where ${\Omega }_{kl}=K({x}_{k},{x}_{l})$ is the kernel matrix. Solving this system directly has complexity $O({N}^{3})$, where $N$ is the number of training examples. The RBF kernel $K({x}_{k},{x}_{l})=\mathrm{exp}\left(-{\parallel {x}_{k}-{x}_{l}\parallel }^{2}/{\sigma }^{2}\right)$ is used, where ${\sigma }^{2}$ denotes the square of the variance of the Gaussian function, which should be optimized by the user.

## 3. Data Analysis

the hourly loads, $i=1,\dots,24$, and $s=2$ non-time-series features: the hour of the day ${H}_{i}\in \left\{1,2,\dots,24\right\}$ and the day of the week ${D}_{i}\in \left\{1,2,\dots,7\right\}$, where 1 corresponds to Monday, 2 to Tuesday, and so on.
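A forecasting instance therefore combines lagged loads with the two calendar features. The sketch below shows one way such a vector could be assembled; the lag depth of 24 hours and the assumption that index 0 falls on a Monday are illustrative choices, not details taken from the paper:

```python
import numpy as np


def build_instance(loads, t):
    """Form one input vector for hour index t.

    loads : 1-D array of hourly loads, indexed from hour 0.
    The vector holds the previous 24 hourly loads plus the two
    non-time-series features H_t in {1..24} and D_t in {1..7}.
    """
    lags = loads[t - 24:t]               # previous 24 hourly loads
    hour = t % 24 + 1                    # H_t: hour of the day, 1..24
    day = (t // 24) % 7 + 1              # D_t: day of week, 1 = Monday
    return np.concatenate([lags, [hour, day]])
```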

## 4. The Proposed Input Selection Algorithm and Forecasting Model

${x}_{k}$ is the $k$-th training instance and ${x}_{t}$ is the current forecasting instance, so the MI between ${x}_{k}$ and ${x}_{t}$ is one criterion for measuring the dependence between them. In this way, by choosing instances that share a "greater" amount of MI with ${x}_{t}$, greater prediction accuracy can be achieved by the model committed to input ${x}_{t}$. The proposed approach for instance selection according to the MI criterion is presented in Algorithm 1 and Figure 4.

First, the MI between each instance in the initial training set $X$ and the current forecasting instance ${x}_{t}$ is calculated. Accordingly, the vector of MI values is established, which defines the significance of the inputs in $X$ versus ${x}_{t}$. Two options are available for the selection of instances: the "MI threshold" or the "number of instances".

The "MI threshold" $\alpha$ defines the lowest allowed MI between a training instance and ${x}_{t}$, and needs to be set manually. All input vectors that have a greater amount of MI with ${x}_{t}$ than $\alpha$ will be added to the new training set $({X}^{\prime},{Y}^{\prime})$.

1. Initialization of the algorithm: from the available data, form the initial training and testing sets, and choose one forecasting instance ${x}_{t}$ from the testing set.
2. Calculate the MI between every vector ${x}_{k}$, $k=1,\dots,N$, from the initial training set and the current forecasting instance ${x}_{t}$, based on (3), and save these values in vector $V(k)$.
3. Reorder the initial training set in descending order according to the values in $V$. Then, based on $V$ and the reordered initial training set, either:
    a) define the total number of vectors that will remain in the training set, denoted $r$, or,
    b) define the lower allowed limit for the MI between an initial training set vector and the current forecasting instance ${x}_{t}$, denoted $\alpha$.
4. If the choice criterion is determined by $\alpha$, based on vector $V$ choose the instances from the initial training set for which $V(k)>\alpha$, $k=1,\dots,N$, holds and put them into a reduced training set $({X}^{\prime},{Y}^{\prime})$.
5. If the choice criterion is determined by $r$, form the reduced training set $({X}^{\u2033},{Y}^{\u2033})$ from the first $r$ instances of the reordered initial training set.
6. Train the LS-SVM model on the reduced training set obtained in step 4 or 5, apply it to the current testing instance ${x}_{t}$, obtain the prediction, and then update ${x}_{t}$ for the next prediction step. Go to step 2 until predictions for all steps (hours of the current day) have been obtained.
7. Choose another instance ${x}_{t}$ from the testing set and go to step 2 until predictions for all instances (daily loads by hours) in the testing set have been obtained.
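Steps 2–5 of the algorithm above can be sketched as follows. Here `mi_fn` stands for any pairwise MI estimator (e.g. the kNN estimator of Section 2.1), and the function name is illustrative rather than taken from the paper:

```python
import numpy as np


def select_training_set(X, Y, x_t, mi_fn, r=None, alpha=None):
    """Instance selection, steps 2-5 of Algorithm 1.

    X, Y  : initial training inputs and targets (arrays of N rows).
    x_t   : current forecasting instance.
    mi_fn : mi_fn(x_k, x_t) -> estimated MI between two instances.
    Exactly one of r ("number of instances") or alpha ("MI threshold")
    should be given.
    """
    # Step 2: MI between every training vector and x_t, stored in V
    V = np.array([mi_fn(x_k, x_t) for x_k in X])
    # Step 3: reorder in descending order of V
    order = np.argsort(V)[::-1]
    if alpha is not None:
        # Step 4: keep instances with V(k) > alpha
        keep = order[V[order] > alpha]
    else:
        # Step 5: keep the first r instances of the reordered set
        keep = order[:r]
    return X[keep], Y[keep]
```

The reduced set returned here is what the LS-SVM model is retrained on in step 6, once per forecasting instance.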

## 5. Experimental Results

1. M0—a model trained with the initial training set, which contains 2160 vectors,
2. M1—a group of models trained with sets determined by the kNN MI criterion,
3. M2—a group of models trained with sets determined by the kernel MI criterion,
4. M3—a group of models trained with sets determined by the Pearson correlation coefficient criterion,
5. M4—an average-fit model, whose predictions are the average of the past 3 years of data matched by day of the week and hour of the day,
6. M5—the direct model, a recursive forecasting model with direct implementation (using true values instead of forecasted values in future steps).

at the $i$-th hour, and $n$ is the number of hours. As seen from Table 1, the average, maximum and minimum daily MAPEs over the entire test set are given for each model. Models M1–M3 are based on the input selection algorithm with the "MI threshold" or "number of inputs" selection option. For these models the results are obtained using both selection options, with "MI threshold" values 0.5 and 0.6 for M1, 0.8 and 0.9 for M2, and 0.98 and 0.99 for M3, denoted TH_{1} and TH_{2}; likewise, with "number of inputs" values 50 and 100 for models M1–M3, denoted NI_{1} and NI_{2}. The results in Table 1 indicate that model M1 has the best average MAPE over the entire test set in the NI_{1}, TH_{1} and TH_{2} selection scenarios, while model M3 shows the best results in the NI_{2} scenario. To be precise, this holds if we disregard model M5, which achieved the best results overall. However, it should be borne in mind that this model is not realistic, because it uses true values in each prediction step, which are not known in real situations.

**Table 1.** Average, maximum and minimum daily MAPE (%) over the test set for each model.

| Model | M0 | M1 TH_{1} | M1 TH_{2} | M1 NI_{1} | M1 NI_{2} | M2 TH_{1} | M2 TH_{2} | M2 NI_{1} | M2 NI_{2} | M3 TH_{1} | M3 TH_{2} | M3 NI_{1} | M3 NI_{2} | M4 | M5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avr. | 2.83 | 2.03 | 2.14 | 1.91 | 2.11 | 2.30 | 2.21 | 2.13 | 2.27 | 2.31 | 2.14 | 2.29 | 1.96 | 5.24 | 1.42 |
| Max | 4.64 | 4.09 | 4.90 | 3.65 | 4.08 | 5.65 | 4.96 | 5.55 | 5.53 | 3.31 | 5.27 | 5.00 | 4.28 | 7.95 | 2.72 |
| Min | 1.42 | 0.93 | 0.96 | 1.00 | 1.01 | 1.16 | 1.22 | 1.10 | 0.93 | 1.05 | 0.92 | 1.21 | 0.82 | 3.51 | 0.89 |
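The daily MAPE values reported in Table 1 follow the standard definition restated above; a direct transcription is:

```python
import numpy as np


def daily_mape(actual, forecast):
    """Daily MAPE in percent: (100 / n) * sum(|actual_i - forecast_i| / actual_i),
    where n is the number of hours (n = 24 for a full day)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs(actual - forecast) / actual)
```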

The overall best results are achieved by model M1 with NI_{1}, i.e., a model with input selection using the kNN MI estimator and the "number of inputs" selection option with 50 input vectors for the training set. The other models with input selection (M2 and M3) are close to model M1, and all models with input selection outperform the initial model.

**Figure 5.** Daily MAPEs for all of the generated models: (**a**) "MI threshold" selection option with 0.5 for M1, 0.8 for M2 and 0.98 for M3 (TH_{1}); (**b**) "MI threshold" selection option with 0.6 for M1, 0.9 for M2 and 0.99 for M3 (TH_{2}); (**c**) "number of inputs" selection option with 50 vectors for M1–M3 (NI_{1}); (**d**) "number of inputs" selection option with 100 vectors for M1–M3 (NI_{2}); (**e**) models without input selection, with M1 for comparison.

**Figure 6.** Vector number distribution in the initial training set for the first hour of each day: (**a**) by kNN MI; (**b**) by kernel MI; (**c**) by correlation coefficient.

**Figure 7.** Real and predicted loads of models M0 and M1: (**a**) period from September 17 to 23; (**b**) period from September 24 to 30.

| Time / Model | M1 | M2 | M3 | M0 |
|---|---|---|---|---|
| One-step input selection time | 3.72 | 1.34 | 1.04 | - |
| One-step training time | 1.13 | 0.78 | 0.91 | 24.41 |
| Total time | 116 | 50.8 | 47 | 584 |

## 6. Conclusions

## Acknowledgments

## References

1. Irisarri, G.D.; Widergren, S.E.; Yehsakul, P.D. On-line load forecasting for energy control center application. IEEE Power Eng. Rev. **1982**, PAS-101, 71–78.
2. Mori, H.; Kobayashi, H. Optimal fuzzy inference for short-term load forecasting. IEEE Trans. Power Syst. **1996**, 11, 390–396.
3. Rahman, S.; Bhatnagar, R. An expert system based algorithm for short term load forecast. IEEE Trans. Power Syst. **1988**, 3, 392–399.
4. Hippert, H.S.; Pedreira, C.E.; Souza, R.C. Neural networks for short-term load forecasting: A review and evaluation. IEEE Trans. Power Syst. **2001**, 16, 44–55.
5. Chen, B.-J.; Chang, M.-W.; Lin, C.-J. Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Trans. Power Syst. **2004**, 19, 1821–1830.
6. Fan, S.; Chen, L. Short-term load forecasting based on an adaptive hybrid method. IEEE Trans. Power Syst. **2006**, 21, 392–401.
7. Amjady, N.; Keynia, F. Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm. Energy **2009**, 34, 46–57.
8. Wang, J.; Zhu, S.; Zhang, W.; Lu, H. Combined modeling for electric load forecasting with adaptive particle swarm optimization. Energy **2010**, 35, 1671–1678.
9. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. **1995**, 20, 273–297.
10. Suykens, J.A.K.; Van Gestel, T.; De Brabanter, J.; De Moor, B.; Vandewalle, J. Least Squares Support Vector Machines; World Scientific: Singapore, 2002.
11. Mandal, P.; Senjyu, T.; Uezato, K.; Funabashi, T. Several-hours-ahead electricity price and load forecasting using neural networks. In Proceedings of the IEEE Power Engineering Society General Meeting, San Francisco, CA, USA, 12–16 June 2005; pp. 2146–2153.
12. Niu, D.; Wang, Y.; Wu, D.D. Power load forecasting using support vector machine and ant colony optimization. Expert Syst. Appl. **2010**, 37, 2531–2539.
13. Ying, C.; Luh, P.B.; Che, G.; Yige, Z.; Michel, L.D.; Coolbeth, M.A.; Friedland, P.B.; Rourke, S.J. Short-term load forecasting: Similar day-based wavelet neural networks. IEEE Trans. Power Syst. **2010**, 25, 322–330.
14. Guillen, A.; Herrera, L.J.; Rubio, G.; Pomares, H.; Lendasse, A.; Rojas, I. New method for instance or prototype selection using mutual information in time series prediction. Neurocomputing **2010**, 73, 2030–2038.
15. Moddemeijer, R. A statistic to estimate the variance of the histogram-based mutual information estimator based on dependent pairs of observations. Signal Process. **1999**, 75, 51–63.
16. Moon, Y.; Rajagopalan, B.; Lall, U. Estimation of mutual information using kernel density estimators. Phys. Rev. E **1995**, 52, 2318–2321.
17. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E **2004**, 69, 066138.
18. Stögbauer, H.; Kraskov, A.; Astakhov, S.A.; Grassberger, P. Least-dependent-component analysis based on mutual information. Phys. Rev. E **2004**, 70, 066123.
19. Kandil, N.; Wamkeue, R.; Saad, M.; Georges, S. An efficient approach for short term load forecasting using artificial neural networks. Int. J. Electr. Power Energy Syst. **2006**, 28, 525–530.
20. Soares, L.J.; Medeiros, M.C. Modeling and forecasting short-term electricity load: A comparison of methods with an application to Brazilian data. Int. J. Forecast. **2008**, 24, 630–644.
21. Elia historical load data. Available online: http://www.elia.be/en/grid-data/data-download (accessed on 20 December 2012).

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Božić, M.; Stojanović, M.; Stajić, Z.; Floranović, N.
Mutual Information-Based Inputs Selection for Electric Load Time Series Forecasting. *Entropy* **2013**, *15*, 926-942.
https://doi.org/10.3390/e15030926
