Article

Multi-Step Ahead Short-Term Electricity Load Forecasting Using VMD-TCN and Error Correction Strategy

School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Energies 2022, 15(15), 5375; https://doi.org/10.3390/en15155375
Submission received: 28 June 2022 / Revised: 22 July 2022 / Accepted: 22 July 2022 / Published: 25 July 2022
(This article belongs to the Section F1: Electrical Power System)

Abstract

Electricity load forecasting plays a pivotal role in the operation of power utility companies, and precise forecasting is crucial to mitigating the challenges of balancing supply and demand in the smart grid. Recently, hybrid models combining signal decomposition and artificial neural networks have gained popularity for their ability to reduce the difficulty of prediction. However, the commonly used decomposition algorithms and recurrent neural network-based models still face dilemmas such as boundary effects and high time consumption. Therefore, a hybrid prediction model combining variational mode decomposition (VMD), a temporal convolutional network (TCN), and an error correction strategy is proposed. To address the difficulty of determining the decomposition number and penalty factor for VMD, the idea of weighted permutation entropy is introduced: the decomposition hyperparameters are optimized using a comprehensive indicator that accounts for both the complexity and the amplitude of the subsequences. A temporal convolutional network is then adopted to carry out feature extraction and load prediction for each subsequence, with the primary forecasting results obtained by combining the predictions of the TCN models. To further improve the prediction accuracy of the model, an error correction strategy is applied based on the prediction error on the training set. The Global Energy Forecasting Competition 2014 dataset is employed to demonstrate the effectiveness and practicality of the proposed hybrid model. The experimental results show that the proposed hybrid model outperforms the contrast models, achieving a mean absolute percentage error of 0.274%, 0.326%, and 0.405% for 6-step, 12-step, and 24-step ahead forecasting, respectively.

1. Introduction

Electric load forecasting (ELF) is essential for the formulation of economical, reliable, and secure operating strategies for the whole power system [1]. Given the difficulty of storing electric energy in large quantities, precise ELF is considered crucial for the management of generation capacity, scheduling, peak reduction, market evaluation, and so on [2]. Depending on the prediction horizon, electric load forecasting methods are classified into short-term (a few minutes to 24 h), intermediate-term (a few days to several months), and long-term (1 to 10 years) load forecasting [3]. Although weather-based models are a common practice for load prediction, they are of less importance for short horizons, because weather-related variables change constantly over a short period of time. Therefore, it is necessary to develop modeling methods that require only historical load data, termed univariate methods; when weather data and forecasts are unavailable or unaffordable, a univariate method becomes a necessity [4].
In general, short-term load forecasting (STLF) methods can be classified into three categories: statistic-based methods, machine learning (ML)-based methods, and hybrid approaches, i.e., the statistic-ML methods.
  • Statistic-based STLF methods assume that the current observation is a linear combination of its historical values. Typical statistic-based methods include the Holt–Winters (HW) method and the autoregressive moving average (ARMA) method and its variants. The HW method relies on exponential smoothing to encode historical values that are then used to forecast load for the present and future. For example, in 2011, Taylor adopted five exponentially weighted methods to forecast intraday load, using them in combination to produce the best possible result [5]. In 2018, Mi et al. proposed a hybrid model that applied the HW method to smooth the original load series [6]. ARMA aims to describe the autocorrelation in time series data; in [7,8,9], ARMA, ARIMA, and seasonal ARIMA were used, respectively, to forecast hourly load. However, the dynamic nature of load series hinders statistic-based models from representing their dynamics and capturing the inner non-linearity [10].
  • ML-based STLF methods operate by learning and mining historical load features and establishing a prediction model for load forecasting. The commonly adopted approaches include support vector regression (SVR) and artificial neural networks (ANNs). For example, Chen et al. put forward an SVR model to predict the demand response baseline for office buildings [11]. ANNs perform well in feature extraction and nonlinear mapping [12], which is effective in addressing the weaknesses of time series models. The primary neural networks applied for short-term forecasting include the multi-layer perceptron (MLP) [13] and the feedforward deep neural network (DNN) [14]. With the advancement of machine learning, the recurrent neural network (RNN) has been demonstrated to be one of the most effective models for processing time series data. For example, in [15,16], electric load forecasting models based on LSTM were proposed, with the results showing that LSTM produces better predictions and achieves higher accuracy than primary ANNs. As a variant of LSTM, the gated recurrent unit (GRU) features a simpler network structure, and was introduced for short-term load forecasting in [17,18]. However, the feature extraction performance of these models still needs enhancement due to the lack of convolution. To address this problem, convolutional neural networks (CNN) have been adopted for their excellent feature extraction capability. In [19,20], a CNN-LSTM model and a CNN-GRU model were proposed, with CNN layers used for feature extraction from the input data and RNN layers used for sequence learning; the research results suggest that CNN hybrid methods outperform RNN-based models in accuracy. More recently, a specialized CNN architecture known as the temporal convolutional network (TCN) has gained popularity for its applicability to time series data [21], with results showing that TCN outperforms LSTM in forecasting accuracy. Distinct from one-dimensional CNN [22], TCN relies on stacked CNN layers and dilated causal convolution, which enables the extraction of time series dependencies across long intervals of historical data [23].
  • In the hybrid (statistic-ML-based) approach, statistical theory is applied to decompose the time series, ML methods are used to analyze and predict each decomposed component, and finally the prediction results of the components are aggregated [24]. For example, Malik et al. proposed a hybrid model combining empirical mode decomposition (EMD) and neural networks for multi-step ahead load forecasting [25]. Li et al. integrated ensemble empirical mode decomposition (EEMD) with LSTM [26]. Zhang et al. put forward a load prediction method based on complete ensemble empirical mode decomposition (CEEMD) and LSTM [27], with the results suggesting that hybrid approaches are conducive to improving forecasting accuracy. However, there are two technical problems with EMD-like decomposition algorithms. On the one hand, the number of decompositions is inconsistent when a complex time series is decomposed, making it difficult to fix the number of predictive models when the program is compiled, which can lead to interruptions. On the other hand, EMD-like algorithms usually suffer from problems such as boundary effects, mode overlap, and sensitivity to noise, which may affect the final forecasting outcome. As a novel multiresolution technique for signal processing, variational mode decomposition (VMD) [28] is entirely non-recursive and has a solid theoretical foundation. According to the experimental results in [29], VMD outperforms CEEMD in terms of decomposition. By decomposing the original series into linear and nonlinear parts, VMD reduces the difficulty of forecasting and ensures a high level of prediction accuracy [30]. However, the application of VMD remains problematic in practice, especially when it comes to determining the hyperparameters [31]. To solve this problem, different evaluation methods have been proposed to select the number of decomposition modes. For example, Liu et al. used the ratio of residual energy to determine the optimal decomposition number [32]; however, this method produces too many decomposed subsequences, which makes it extremely time consuming. In [33], permutation entropy (PE) was applied to minimize the complexity of the subsequences; however, PE is not a comprehensive indicator, as it ignores the amplitude of the subsequences. To a large extent, validating the VMD model for decomposing data hinges on the purpose of the experiment. It is therefore essential to adopt an appropriate evaluation method for the VMD model.
As discussed above, considering the unique advantage of VMD in non-linear and irregular signal decomposition, along with the excellent predictive performance of TCN, a hybrid model based on VMD and TCN is proposed in this paper for electric load forecasting. The original load series is decomposed into several subsequences by VMD, and a TCN model is trained on each subsequence. The predictions of the TCN models are then combined to obtain the primary prediction results. Analysis of the primary results shows that the error series exhibits a certain pattern and approximately conforms to a Gaussian distribution; an error correction strategy is therefore adopted to further improve forecasting accuracy.
Herein, the publicly available GEFCom dataset [34] is used to validate the proposed method. The main contributions of this paper are detailed as follows:
  • In the process of data pre-processing, a novel approach was proposed to determine the most suitable VMD decomposition number and penalty factor based on weighted permutation entropy (WPE). As a more comprehensive indicator, WPE was used to evaluate the time complexity and amplitude of the decomposed subsequences.
  • A VMD-TCN model framework was constructed. VMD, as a competitive signal decomposition method, was applied to reveal the irregular characteristics of the original series. The TCN network, as a deep learning method, was adopted to explore the hidden high-level interdependence of the electrical load data, leading to highly effective prediction.
  • An error correction strategy was applied. For the error series between the primary forecasting results and the actual load, an extra TCN model was established for learning and forecasting. The final forecasting results were obtained by summing the error correction values and the primary prediction results.
The rest of this paper is organized as follows. In Section 2, the basic methodology of VMD, WPE, and TCN is described. Section 3 introduces the whole process of our proposed forecasting framework. In Section 4, a case study is conducted to validate the method proposed in this paper. Finally, the conclusion is drawn in Section 5.

2. Methodology

2.1. Variational Mode Decomposition

As a novel adaptive decomposition method for non-stationary signals proposed by Konstantin Dragomiretskiy in 2014, variational mode decomposition (VMD) is capable of decomposing an original complex non-linear signal into multiple simpler intrinsic mode functions (IMFs), thus reducing the difficulty of prediction and improving its accuracy. Unlike EMD-like algorithms, in which a recursive procedure is applied to extract the modes, VMD processes the original signal non-recursively, which is effective in suppressing mode overlap and improving reliability [35].
VMD assumes that each "mode" has finite bandwidth around a unique center frequency. The whole process of decomposing the original signal $X$ into finite-bandwidth IMFs $u_k(t)$ $(k = 1, 2, \dots, K)$ can be described as follows:
  • For each mode, the Hilbert transform is applied to compute its analytic signal, and thus obtain its unilateral frequency spectrum:
    $$\left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t)$$ (1)
    where $\delta(t)$ is the Dirac distribution, $*$ is the convolution operator, and $j$ is the imaginary unit, satisfying $j^2 = -1$.
  • The frequency spectrum of each mode is shifted to baseband by mixing with an exponential $e^{-j\omega_k t}$ tuned to its estimated center frequency:
    $$\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t}$$ (2)
  • The bandwidth of each modal signal is estimated through the Gaussian smoothness of the demodulated signal, and the constrained variational problem can be described as
    $$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \mathrm{s.t.} \quad \sum_k u_k = f$$ (3)
    where $K$ is the decomposition number, $u = \{u_1, \dots, u_K\}$ is the set of modal functions, and $\omega = \{\omega_1, \dots, \omega_K\}$ is the set of corresponding center frequencies.
  • To render the above constrained variational problem unconstrained, a quadratic penalty factor $\alpha$ and a Lagrangian multiplier $\lambda(t)$ are introduced, where $\alpha$ maintains the reconstruction precision of the signal and $\lambda(t)$ enforces the strictness of the constraint. The augmented Lagrangian is shown in (4):
    $$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle$$ (4)
  • The alternating direction method of multipliers (ADMM) is adopted to solve the resulting minimization problem: $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ are updated alternately to find the "saddle point" of the augmented Lagrangian, where the update of $u_k^{n+1}$ is carried out in the frequency domain using the Fourier isometry:
    $$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha\left(\omega - \omega_k\right)^2}$$ (5)
    $$\omega_k^{n+1} = \frac{\int_0^\infty \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^\infty \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}$$ (6)
    where $n$ is the iteration number and $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$, and $\hat{u}_k^{n+1}(\omega)$ denote the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$, and $u_k^{n+1}(t)$, respectively.
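For reference, the decomposition step can be sketched as follows, assuming the open-source vmdpy package (a third-party implementation of the ADMM updates above, not the authors' own code); the hyperparameter values shown are those selected in Section 2.2, and the data file name is a placeholder.

```python
import numpy as np
from vmdpy import VMD  # third-party VMD implementation (assumed dependency)

f = np.loadtxt("load.csv")  # hypothetical file holding the hourly load series

K = 6          # number of modes (selected via WPE, see Section 2.2)
alpha = 3000   # quadratic penalty factor
tau = 0.3      # time step of the dual ascent (the paper's "tolerance")
DC = 0         # do not impose a DC mode
init = 1       # initialize center frequencies uniformly
tol = 1e-6     # convergence tolerance (the paper's epsilon)

# u: (K, len(f)) array of IMFs; omega: center-frequency iterates per mode
u, u_hat, omega = VMD(f, alpha, tau, K, DC, init, tol)
```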
Selecting appropriate hyperparameters is crucial for the VMD model, especially the number of decomposed modes $K$ and the quadratic penalty factor $\alpha$. Different values of $K$ produce significantly different results: too many modes lead to mode aliasing and additional noise, while too few components cause under-decomposition of the original signal. The penalty factor also affects the outcome of decomposition: the smaller the penalty factor, the larger the bandwidth of each IMF component, and vice versa.
The effect of different values of $K$ and $\alpha$ on the VMD results is analyzed using the GEFCom dataset. To begin with, $\alpha$ is set to 100, and $K$ is set to 2, 3, 5, 7, and 9 in turn. The center frequency trajectory of each IMF for each $K$ is shown in Figure 1. The figure shows that the value of $K$ has a significant impact on the outcome of decomposition. When $K$ is set to 2, the bandwidth of IMF2 is relatively large, indicating under-decomposition of the original signal. As $K$ increases, the center frequency bandwidth of each IMF gradually decreases. When $K = 9$, multiple IMFs show frequency aliasing at the beginning of the iterations, which suggests over-decomposition of the original signal and the emergence of superfluous components.
Secondly, $K$ is set to 7, and $\alpha$ is set to 50, 200, 500, 1000, and 2000 in turn. The center frequency trajectory of each IMF for each $\alpha$ is shown in Figure 2. The figure shows that the value of $\alpha$ also has a considerable impact on the decomposition result. When $\alpha$ is set to 50, the IMFs suffer severe frequency aliasing at the beginning of the iterations, with the frequencies of IMF3 and IMF7 being highly comparable. When $\alpha$ increases to 200, the frequency aliasing between the IMFs disappears. However, with a further increase to 500, the frequency of IMF7 fluctuates, indicating that the decomposition fails to converge. Increasing $\alpha$ further to 2000, the fluctuation of the high-frequency component stops, and the decomposition converges again. Evidently, the two parameters $K$ and $\alpha$ make a significant difference to the outcome of decomposition, and they are difficult to determine. To solve this problem, a novel approach to optimizing $K$ and $\alpha$ is proposed in this paper by introducing weighted permutation entropy.

2.2. Weighted Permutation Entropy

Permutation entropy (PE) is a measure of the complexity of a time series signal [36]. For each subsequence obtained from VMD, the complexity can be measured by the magnitude of the PE: the higher the PE, the more complex the subsequence; the lower the PE, the more regular the subsequence. For a time series of $N$ observations $X(i)$, $i = 1, 2, \dots, N$, with given embedding dimension $m$ and delay time $\tau$, the embedding vectors form the matrix
$$\begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(j) & x(j+\tau) & \cdots & x(j+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(I) & x(I+\tau) & \cdots & x(I+(m-1)\tau) \end{bmatrix}$$ (7)
Each row of the matrix is an embedding subvector, where $I$ is the number of subvectors. To capture the ordinal structure of the data, each subvector is rearranged in ascending order:
$$x\left(i + (j_1 - 1)\tau\right) \le x\left(i + (j_2 - 1)\tau\right) \le \cdots \le x\left(i + (j_m - 1)\tau\right)$$ (8)
where $j_i$ indicates the position of each element before rearrangement. Each subvector is thus mapped onto $(j_1, j_2, \dots, j_m)$, one of the $m!$ permutations of the $m$ distinct symbols $\{1, 2, \dots, m\}$.
A probability distribution $P_1, P_2, \dots, P_L$ is introduced to represent the frequency of occurrence of each ordinal pattern, where $L \le m!$. The PE of the time series $X(i)$, $i = 1, 2, \dots, N$ is then obtained as the Shannon entropy of the $L$ distinct symbols:
$$H_{PE}(m) = -\sum_{i=1}^{L} P_i \ln P_i$$ (9)
Normalizing it, we obtain
$$0 \le \bar{H}_{PE} = H_{PE} / \ln(m!) \le 1$$ (10)
The PE of each IMF can be calculated according to Equation (10), denoted as $PE(1), PE(2), \dots, PE(K)$. Decomposition-based forecasting methods improve prediction accuracy because modal decomposition reduces volatility and filters out unrelated noise, making each component easier to predict. However, the amplitudes of the decomposed subsequences vary considerably, and the modes with larger amplitude exert a more significant effect on the summed prediction results. It is therefore desirable for the decomposed subsequences to have minimal time complexity while also taking the effect of subsequence amplitude into account. Following the weighted permutation entropy (WPE) algorithm [37], the ratio of the average amplitude of each IMF to that of the original load series is used as the weight.
Let $A$ denote the average amplitude of the original load series and $A_1, A_2, \dots, A_K$ the average amplitudes of the subsequences. The average amplitude is calculated as
$$A = \frac{1}{n} \sum_{i=1}^{n} x_i$$ (11)
Let $w_i = A_i / A$; the WPE value can then be expressed as
$$P_{EW} = w_1 PE_1 + w_2 PE_2 + \cdots + w_K PE_K$$ (12)
$P_{EW}$ takes into account both the time complexity and the amplitude of the decomposition results. For the forecasting problem, $P_{EW}$ should be minimized. Table 1 lists the calculated $P_{EW}$ for different values of $K$ and $\alpha$.
As shown in Table 1, when $\alpha$ is fixed, $P_{EW}$ decreases significantly as $K$ increases, which confirms the capability of decomposition to reduce signal complexity. However, for larger values of $\alpha$, raising $K$ further increases the complexity of the decomposition results. Table 1 also shows that $P_{EW}$ reaches its minimum of 0.5104 when $K$ is set to 6 and $\alpha$ to 3000, meaning that the weighted time complexity of the subsequences is lowest there. Therefore, the VMD hyperparameters are set to $K = 6$ and $\alpha = 3000$, with the remaining parameters set following [32]. In addition, the tolerance is set to 0.3 and $\varepsilon$ to $10^{-6}$.
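A minimal sketch of this selection procedure, implementing Equations (9)–(12) in NumPy, is shown below. The embedding dimension m = 3 is an illustrative assumption (the paper does not report it), absolute values are taken when averaging the amplitudes of the zero-mean IMFs, and decompose reuses the vmdpy call from the Section 2.1 sketch.

```python
import numpy as np
from math import factorial
from vmdpy import VMD  # as in the Section 2.1 sketch (assumed dependency)

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy, Eqs. (9)-(10)."""
    n = len(x) - (m - 1) * tau
    # Ordinal pattern (argsort ranks) of each embedding subvector
    patterns = np.array([np.argsort(x[i:i + m * tau:tau]) for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(factorial(m))

def weighted_permutation_entropy(modes, original):
    """WPE of Eq. (12): PE of each IMF weighted by its mean-amplitude ratio."""
    A = np.mean(np.abs(original))  # absolute values, since IMFs oscillate around zero
    return sum((np.mean(np.abs(u)) / A) * permutation_entropy(u) for u in modes)

def decompose(f, K, alpha):
    u, _, _ = VMD(f, alpha, 0.3, K, 0, 1, 1e-6)
    return u

# Grid search for the (K, alpha) pair minimizing P_EW (f: original load series)
f = np.loadtxt("load.csv")  # hypothetical data file
grid = [(K, a) for K in range(2, 9) for a in (100, 500, 1000, 2000, 3000, 4000, 5000)]
K_opt, alpha_opt = min(grid, key=lambda ka: weighted_permutation_entropy(decompose(f, *ka), f))
```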

2.3. Temporal Convolutional Network

The temporal convolutional network (TCN) is a variant of the convolutional neural network. Instead of using pooling layers as traditional convolutional networks do, it gradually enlarges the receptive field through multiple dilated causal convolution layers, so that the output carries more information, making the TCN well suited to time series prediction [38]. The structure of the dilated causal convolution layers is shown in Figure 3. Causal convolution obtains the output at time t by convolving only elements at time t and earlier in the previous layer; it prevents information leakage from the future, thus meeting the requirements of load forecasting. For a convolutional network, the receptive field can be expanded in three ways: increasing the kernel size, increasing the number of layers, or applying a pooling layer before the convolution layer. However, the first two approaches increase the number of network parameters, while the third causes a loss of information. Dilated convolution expands the receptive field of the output without increasing the number of network parameters. The receptive field of a TCN with dilation base b, kernel size k, and n layers is:
$$r = 1 + \sum_{i=0}^{n-1} (k - 1) b^i = 1 + (k - 1) \frac{b^n - 1}{b - 1}$$ (13)
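For example, with kernel size $k = 4$, $n = 3$ layers, and an assumed dilation base of $b = 2$ (the paper does not state the base), the receptive field is $r = 1 + 3 \times (1 + 2 + 4) = 22$ samples.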
With dilated causal convolution, the overall characteristics of a long series can be captured effectively by adjusting hyperparameters such as the kernel size, the number of convolution layers, and the dilation factor d. In addition, to preserve the full sequence information, the output of each layer is zero-padded to match the length of the input sequence [39]. For a one-dimensional time series, the mathematical model of dilated causal convolution is expressed as follows:
$$y(s) = \left( x *_d f \right)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \times i}$$ (14)
where $x$ is the input, $y$ the output, $d$ the dilation factor, and $*_d$ the dilated convolution operator.
In general, the feature extraction ability of a neural network improves with network depth. However, greater depth can also cause problems such as vanishing and exploding gradients. The degradation of deep neural networks can be addressed through residual connections (skip connections); for networks with the same number of layers, a network with residual blocks also converges faster, which allows deeper designs. Figure 4 shows the structure of the residual connection. To prevent overfitting, regularization is introduced via dropout after every convolutional layer, and the rectified linear unit (ReLU) is used as the activation function. To normalize the input of hidden layers, which counteracts the exploding gradient problem among others, weight normalization is applied to every convolutional layer. The original input is processed by a 1 × 1 convolution and added to the block output, as expressed in Equation (15). Furthermore, TCN is more memory efficient than recurrent neural networks, since its shared convolution architecture processes long sequences in parallel [40].
$$o = \mathrm{Activation}\left( x + \mathcal{F}(x) \right)$$ (15)
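To make the structure concrete, the following PyTorch sketch shows one residual block of Figure 4: two weight-normalized dilated causal convolutions with ReLU and dropout, plus the 1 × 1 convolution of Equation (15) on the skip path. The channel widths and dropout rate are illustrative assumptions, not the exact configuration reported in Section 4.3.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class Chomp1d(nn.Module):
    """Trim the trailing padding so the convolution remains causal."""
    def __init__(self, chomp_size):
        super().__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        return x[:, :, :-self.chomp_size].contiguous()

class TemporalBlock(nn.Module):
    """One TCN residual block: o = ReLU(x + F(x)), Eq. (15)."""
    def __init__(self, c_in, c_out, kernel_size, dilation, dropout=0.2):
        super().__init__()
        pad = (kernel_size - 1) * dilation  # left-pad amount that keeps causality
        self.net = nn.Sequential(
            weight_norm(nn.Conv1d(c_in, c_out, kernel_size, padding=pad, dilation=dilation)),
            Chomp1d(pad), nn.ReLU(), nn.Dropout(dropout),
            weight_norm(nn.Conv1d(c_out, c_out, kernel_size, padding=pad, dilation=dilation)),
            Chomp1d(pad), nn.ReLU(), nn.Dropout(dropout),
        )
        # 1x1 convolution so the residual matches the output channel width
        self.downsample = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        return self.relu(self.net(x) + self.downsample(x))
```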

3. The Proposed Forecasting Model

The framework of the short-term load forecasting method proposed in this paper consists of VMD, TCN, and an error correction strategy, as shown in Figure 5. It can be summarized in the following steps:
  • Determine the optimal decomposition number $K$ and penalty factor $\alpha$. Based on permutation entropy, the hyperparameters are chosen to minimize the time complexity of the decomposed subsequences. Since the amplitudes of the subsequences vary, and high-amplitude modes have a relatively larger impact on the final prediction results, the comprehensive indicator $P_{EW}$ is introduced to evaluate different hyperparameter values, and the combination with the minimum $P_{EW}$ is treated as optimal.
  • VMD decomposition. The original load sequence is denoted as $X = \{x_1, x_2, \dots, x_n\}$, where $n$ is the number of observations. $K$ intrinsic mode functions are obtained by VMD, denoted as $u^{(1)}, \dots, u^{(K)}$. Each subsequence is then normalized to (0, 1).
  • Split the training and test sets. A moving window scheme is used to create the input–output pairs fed into the neural network; Figure 6 illustrates the process of applying the moving window over the complete time series (see also the sliding-window sketch after this list). The input of each sample can be represented as $I = \{x_1, x_2, \dots, x_L\}$, where $L$ is the sequence length, and the label as $P = \{x_{L+1}, x_{L+2}, \dots, x_{L+S}\}$, where $S$ is the number of forecasting steps. By choosing different values of $S$, loads $S$ steps into the future can be predicted.
  • Each subsequence is trained and predicted using a TCN model. For neural networks, weight updating is a non-convex optimization problem, and the initial values of the weights affect the convergence efficiency and generalization ability of the network [41]. For the convolutional layers in TCN, weights are randomly initialized from a Gaussian distribution with a mean of 0 and a variance of 0.01. The primary forecasting results $X_{pred}$ are obtained by aggregating the predictions of the TCN models:
    $$X_{pred} = \sum_{i=1}^{K} \hat{u}^{(i)}$$ (16)
    where $\hat{u}^{(i)}$ denotes the prediction of the i-th subsequence.
  • Apply the error correction strategy. According to the central limit theorem, a certain pattern can be expected in the errors between the actual historical data and the primary results; indeed, the distribution of the error series approximately conforms to a Gaussian distribution, so an error correction strategy is applied. The trained forecasting model is used to predict the training set, yielding forecast values denoted as $\hat{X}_{train}$. The error series $e_{train}$ for the training set can be expressed as
    $$e_{train} = \hat{X}_{train} - X_{train}$$ (17)
    An extra TCN model is adopted to capture the intrinsic characteristics of the error series. The primary results and the correction values $X_e$ are then summed to obtain the final prediction results:
    $$\hat{X} = X_{pred} + X_e$$ (18)
  • By comparing the final prediction results $\hat{X}$ with the actual load on the test set, performance indicators are calculated to evaluate the model.
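The moving window step can be sketched as follows; the synthetic subsequence is a placeholder, and the chronological 90/10 split matches the setup described in Section 4.1.

```python
import numpy as np

def make_windows(series, L=24, S=6):
    """Moving window of Figure 6: inputs of length L, labels of the next S points."""
    X, Y = [], []
    for t in range(len(series) - L - S + 1):
        X.append(series[t:t + L])
        Y.append(series[t + L:t + L + S])
    return np.asarray(X, np.float32), np.asarray(Y, np.float32)

# Example on a synthetic stand-in for one normalized subsequence
u = 0.5 + 0.5 * np.sin(np.linspace(0.0, 60.0, 8760))
X, Y = make_windows(u, L=24, S=6)
n_train = int(0.9 * len(X))                      # chronological 90/10 split
X_train, Y_train = X[:n_train], Y[:n_train]
X_test, Y_test = X[n_train:], Y[n_train:]
```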

4. Experiment and Discussion

The simulation and validation of the proposed model were performed on a computer (CPU: Intel i7 3.0 GHz; RAM: 16 GB; GPU: Nvidia GeForce GTX 1050 Ti; operating system: Windows 10 64-bit). Training and analysis were conducted in the Python 3.8 environment, with PyTorch (CUDA 11.3 build) as the framework for constructing the deep learning models.

4.1. Experimental Dataset

The GEFCom 2014 dataset consists of three years of hourly load from the New England ISO. Given the strong seasonality of the power load time series, the data recorded in 2014 were used as the experimental dataset, comprising 8760 observations; the first ninety percent (7884 observations) serve as the training set and the last ten percent (876 observations) as the test set. The original load series is shown in Figure 7.
As shown in Figure 7, the original load is a complicated series with distinct non-linear and non-stationary characteristics. According to the decomposition results in Figure 8, the average amplitude of IMF1 is relatively large and varies slowly, capturing the trend component of the original sequence. Subsequences IMF2 and IMF3 show clear regularity and relatively evident periodicity, so they can be treated as periodic components. The average amplitudes of IMF4, IMF5, and IMF6 decline in order, showing poor regularity but significant fluctuation. To eliminate the effect of amplitude and improve processing efficiency, each subsequence is normalized separately to (0, 1). The normalization equation is as follows.
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$ (19)
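Because each subsequence is scaled independently, its predictions must be mapped back to the original amplitude before the modes are summed; a minimal sketch follows (the inversion step is implied by the method rather than stated explicitly).

```python
import numpy as np

def minmax_scale(u):
    """Scale one subsequence to (0, 1) per Eq. (19); keep params for inversion."""
    u_min, u_max = u.min(), u.max()
    return (u - u_min) / (u_max - u_min), (u_min, u_max)

def minmax_invert(u_scaled, params):
    """Map a prediction back to its original amplitude before summing the modes."""
    u_min, u_max = params
    return u_scaled * (u_max - u_min) + u_min
```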

4.2. Performance Indicators

Herein, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and R-square (R2) are adopted to evaluate the forecasting result produced by the proposed model. RMSE is a commonly used measure of the differences between the actual and predicted values. MAPE is applicable to intuitively interpret the relative error. R-square is used to indicate the predictive fit of the model. These metrics are detailed as follows:
  • RMSE is defined as follows:
    $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{x}_i - x_i \right)^2}$$ (20)
  • MAPE is defined as follows:
    $$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{x}_i - x_i}{x_i} \right|$$ (21)
  • R2 is defined as follows:
    $$R^2 = 1 - \frac{\sum_i \left( \hat{x}_i - x_i \right)^2}{\sum_i \left( x_i - \bar{x} \right)^2}$$ (22)
    where $x_i$ and $\hat{x}_i$ are the actual and forecast load values at time $i$, and $\bar{x}$ is the mean of the actual load.
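These indicators are straightforward to compute; a NumPy sketch:

```python
import numpy as np

def evaluate(actual, predicted):
    """Return RMSE, MAPE (%), and R^2 as defined in Eqs. (20)-(22)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - actual) ** 2))
    mape = 100.0 * np.mean(np.abs((predicted - actual) / actual))
    r2 = 1.0 - np.sum((predicted - actual) ** 2) / np.sum((actual - actual.mean()) ** 2)
    return rmse, mape, r2
```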

4.3. Training Model

When each mode subsequence is trained and predicted, the forecast steps are defined by the problem, but the input size must be decided. Historical data far in the past contribute relatively little to the prediction, so overly long inputs introduce redundancy; conversely, selecting too short a history as the input feature can cause information loss and degrade prediction accuracy. Since the sampling period is one hour and the load series has a 24-h periodicity, the input size is set to 24, i.e., the past 24 h of data are used to predict the future load.
As different subsequences of the VMD decomposition tend to have different characteristics, each subsequence is predicted using a TCN model separately. The TCN structure contains three convolutional layers, a fully connected layer with 64 neurons, and an output layer with S neurons, where S represents the forecasting steps. For convolutional layers, the kernel size is set to 4, the number of filters is set to 10, and the stride is set to 1. The Adam optimizer is employed to update these networks, with the learning rate set to 0.001. The number of training epochs is set to 50, and the batch size is set to 32. The primary prediction results are obtained by summing the predictions of each TCN model.
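A sketch of one per-subsequence forecaster under these settings, reusing the TemporalBlock of Section 2.3, is given below; the dilation factors 1, 2, 4 and the synthetic tensors standing in for the windowed training data are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SubsequenceTCN(nn.Module):
    """Three dilated causal conv layers, an FC layer of 64 neurons, S outputs."""
    def __init__(self, L=24, S=6, filters=10, kernel_size=4):
        super().__init__()
        self.tcn = nn.Sequential(  # TemporalBlock from the Section 2.3 sketch
            TemporalBlock(1, filters, kernel_size, dilation=1),
            TemporalBlock(filters, filters, kernel_size, dilation=2),
            TemporalBlock(filters, filters, kernel_size, dilation=4),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(filters * L, 64), nn.ReLU(), nn.Linear(64, S))

    def forward(self, x):  # x: (batch, 1, L)
        return self.head(self.tcn(x))

# Synthetic stand-ins for the windowed samples of one normalized IMF
X_train, Y_train = torch.rand(1024, 1, 24), torch.rand(1024, 6)
loader = DataLoader(TensorDataset(X_train, Y_train), batch_size=32, shuffle=True)

# (The paper initializes convolution weights from a Gaussian with mean 0 and
#  variance 0.01; PyTorch's default initialization is used here for brevity.)
model = SubsequenceTCN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```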
Figure 9a,b show the error series between the actual load and the primary predicted results on the training set and its distribution, respectively. As can be seen from the figures, the prediction error follows a certain pattern, and its probability density approximately conforms to a Gaussian distribution with a mean of −20. The prediction results on the test set can therefore be compensated using the error correction strategy: an extra TCN model is adopted to capture the hidden interdependence of the error series and improve the primary predictions for the test data. The actual and forecasted error series are shown in Figure 10, with a MAPE of 3.49% and an RMSE of 16.29 kW. The TCN model maintains a high level of forecasting accuracy for the error series, which ensures that the error correction strategy is effective in improving the final prediction accuracy.

4.4. Experimental Result

Prediction models with 6, 12, and 24 steps ahead are constructed, respectively. To analyze the prediction performance of the proposed model, the TCN and the VMD-TCN are treated as baseline models. Figure 11 compares the prediction results for 6-step ahead forecasting on the test set. All three models capture the periodicity and fluctuations of the load series; however, the fitting performance of the TCN is relatively poor at peak hours and inflection points. Figure 12 shows the mean absolute error of the proposed and contrast methods: the purple line, which indicates the mean absolute error of the proposed method, stays closest to zero.
Table 2 lists the three error indicators, RMSE, MAPE, and R2, for the proposed and contrast methods. The proposed method clearly outperforms the TCN and the VMD-TCN, with the smallest RMSE and MAPE among the methods. The prediction accuracy improves significantly with signal decomposition, verifying that the interdependence of the time series can be fully captured by decomposing the non-stationary load series into relatively stable trend and periodic components. In addition, the error correction strategy fully exploits the information in the training-set error series and further improves the prediction accuracy. For 6-step ahead forecasting, the MAPE and RMSE of the VMD-TCN are reduced by 53.98% and 55.50% compared to the TCN, respectively; relative to the VMD-TCN, the MAPE and RMSE of the proposed method are reduced by 61.46% and 57.99%, respectively. The proposed model also performs better in 12-step and 24-step ahead forecasting. As the number of forecasting steps increases, the prediction accuracy of all models gradually declines due to cumulative errors, although the overall accuracy of the proposed model remains high.

4.5. Comparison

To verify the superiority of the TCN in time series prediction, we also evaluated LSTM and GRU, which are widely regarded as state-of-the-art forecasting methods. LSTM introduces a cell state into the RNN to achieve long-term memory, which helps overcome the exploding and vanishing gradient problems of the RNN. Compared to LSTM, GRU involves fewer parameters since it lacks an output gate. The LSTM and GRU networks contain two hidden layers with 64 neurons and output layers with S neurons, according to the forecasting steps. The training hyperparameters are kept the same for comparability with the TCN model. Table 3 lists the forecasting results obtained on the same dataset.
As shown in Table 3, the three hybrid models outperform the single ML-based models. This is because VMD decomposes the original series into multiple simpler subsequences, reducing the prediction error caused by series volatility. Comparing the prediction models, the prediction accuracy decreases as the number of forecasting steps increases. The prediction accuracies of LSTM and GRU are comparable; although LSTM slightly outperforms GRU for 6-step and 24-step forecasting, both remain inferior to the TCN model. Boxplots of the absolute forecasting errors are shown in Figure 13, where the box size indicates the spread of prediction errors, + marks the outliers, and the orange line indicates the median absolute error. Across the different forecasting steps, Figure 13 shows that the prediction error range of the TCN is the narrowest, while those of LSTM and GRU are wider; the outliers and the median absolute error of the TCN are also smaller than those of the contrast models. This comparative experiment further demonstrates the significant advantage of the TCN model in power load prediction. Table 4 shows the training time required by the different models: LSTM takes the longest, while GRU takes slightly less. With a 6-step forecasting horizon, the TCN requires far less training time, only 14.07% of that of LSTM and 16.87% of that of GRU. Since decomposition multiplies the number of forecasting tasks, VMD-LSTM and VMD-GRU become time consuming to execute; for 6-step forecasting, the time consumption of VMD-TCN is only 10.47% of that of VMD-LSTM and 13.03% of that of VMD-GRU.

5. Conclusions

TCN neural networks, built on deep learning techniques, have excellent learning capacity for modeling series data. Herein, a novel hybrid model was proposed that combines a TCN with VMD and an error correction strategy for multi-step electricity load forecasting. To determine the hyperparameters of the VMD algorithm, weighted permutation entropy was introduced to evaluate the decomposition results; as a more comprehensive indicator, WPE takes into account both the complexity and the amplitude of the decomposed subsequences. By decomposing the original load series, with its non-linearity and irregularity, into several simpler subsequences, VMD reduces the difficulty of prediction. TCN models were established separately to extract the different characteristics of the subsequences, with the primary results obtained by summing the predictions of the TCN models. Given the discernible pattern in the error series between the primary results and the actual values, an extra TCN model was applied to capture the interdependence of the error series, and the error correction strategy was adopted to improve forecasting accuracy. In terms of both forecasting accuracy and time efficiency, the hybrid model was validated through comparison against the contrast models. Future work will focus on incorporating more complex environmental factors, such as temperature and humidity, to further improve modeling accuracy.

Author Contributions

Conceptualization, H.Z.; methodology, F.Z.; software, F.Z.; validation, F.Z., Z.L. and K.Z.; writing—original draft preparation, F.Z.; writing—review and editing, F.Z.; visualization, F.Z.; supervision, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2017YFB0903403.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270.
  2. Malik, H.; Fatema, N.; Iqbal, A. Intelligent Data-Analytics for Condition Monitoring: Smart Grid Applications; Elsevier: Amsterdam, The Netherlands, 2021.
  3. Soliman, S.A.; Al-Kandari, A.M. Dynamic Electric Load Forecasting: Modeling and Model Construction; Elsevier: Amsterdam, The Netherlands, 2010.
  4. Petropoulos, F.; Spiliotis, E. The wisdom of the data: Getting the most out of univariate time series forecasting. Forecasting 2021, 3, 29.
  5. Taylor, J.W. Short-term load forecasting with exponentially weighted methods. IEEE Trans. Power Syst. 2011, 27, 458–464.
  6. Mi, J.; Fan, L.; Duan, X.; Qiu, Y. Short-term power load forecasting method based on improved exponential smoothing grey model. Math. Probl. Eng. 2018, 2018, 3894723.
  7. Pappas, S.S.; Ekonomou, L.; Karampelas, P.; Karamousantas, D.C.; Katsikas, S.K.; Chatzarakis, G.E.; Skafidas, P.D. Electricity demand load forecasting of the Hellenic power system using an ARMA model. Electr. Power Syst. Res. 2010, 80, 256–264.
  8. Wu, F.; Cattani, C.; Song, W.; Zio, E. Fractional ARIMA with an improved cuckoo search optimization for the efficient short-term power load forecasting. Alex. Eng. J. 2020, 59, 3111–3118.
  9. Tarsitano, A.; Amerise, I.L. Short-term load forecasting using a two-stage sarimax model. Energy 2017, 133, 108–114.
  10. Lara-Benítez, P.; Carranza-García, M.; Riquelme, J.C. An experimental review on deep learning architectures for time series forecasting. Int. J. Neural Syst. 2021, 31, 2130001.
  11. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670.
  12. Wang, R.; Li, C.; Fu, W.; Tang, G. Deep learning method based on gated recurrent unit and variational mode decomposition for short-term wind power interval prediction. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3814–3827.
  13. Cerjan, M.; Matijaš, M.; Delimar, M. Dynamic hybrid model for short-term electricity price forecasting. Energies 2014, 7, 3304–3318.
  14. Marcjasz, G.; Lago, J.; Weron, R. Neural networks in day-ahead electricity price forecasting: Single vs. multiple outputs. arXiv 2020, arXiv:2008.08006.
  15. Muzaffar, S.; Afshari, A. Short-term load forecasts using LSTM networks. Energy Procedia 2019, 158, 2922–2927.
  16. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851.
  17. Xiuyun, G.; Ying, W.; Yang, G.; Chengzhi, S.; Wen, X.; Yimiao, Y. Short-term load forecasting model of GRU network based on deep learning framework. In Proceedings of the 2018 IEEE Conference on Energy Internet and Energy System Integration, Beijing, China, 20–22 October 2018.
  18. Zheng, J.; Chen, X.; Yu, K.; Gan, L.; Wang, Y.; Wang, K. Short-term power load forecasting of residential community based on GRU neural network. In Proceedings of the 2018 International Conference on Power System Technology, Guangzhou, China, 6–9 November 2018.
  19. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM model for short-term individual household load forecasting. IEEE Access 2020, 8, 180544–180557.
  20. Sajjad, M.; Khan, Z.A.; Ullah, A.; Hussain, T.; Ullah, W.; Lee, M.Y.; Baik, S.W. A novel CNN-GRU-based hybrid approach for short-term residential load forecasting. IEEE Access 2020, 8, 143759–143768.
  21. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
  22. Zhang, J.; Liu, C.; Ge, L. Short-Term Load Forecasting Model of Electric Vehicle Charging Load Based on MCCNN-TCN. Energies 2022, 15, 2633.
  23. Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal convolutional networks applied to energy-related time series forecasting. Appl. Sci. 2020, 10, 2322.
  24. Yan, J.; Mu, L.; Wang, L.; Ranjan, R.; Zomaya, A.Y. Temporal convolutional networks for the advance prediction of ENSO. Sci. Rep. 2020, 10, 8055.
  25. Malik, H.; Alotaibi, M.A.; Almutairi, A. A new hybrid model combining EMD and neural network for multi-step ahead load forecasting. J. Intell. Fuzzy Syst. 2022, 42, 1099–1114.
  26. Li, J.; Deng, D.; Zhao, J.; Cai, D.; Hu, W.; Zhang, M.; Huang, Q. A novel hybrid short-term load forecasting method of smart grid using MLR and LSTM neural network. IEEE Trans. Ind. Inform. 2020, 17, 2443–2452.
  27. Zhang, W.; Huang, W. Multivariate load prediction method for integrated energy system based on CEEMD-LSTM. In Proceedings of the International Symposium on Geographic Information, Energy and Environmental Sustainable Development, Tianjin, China, 26–27 December 2021.
  28. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
  29. Yan, X.; Jia, M. Application of CSA-VMD and optimal scale morphological slice bispectrum in enhancing outer race fault detection of rolling element bearings. Mech. Syst. Signal Process. 2019, 122, 56–86.
  30. Zhang, Y.; Zhao, Y.; Kong, C.; Chen, B. A new prediction method based on VMD-PRBF-ARMA-E model considering wind speed characteristic. Energy Convers. Manag. 2020, 203, 112254.
  31. Zhang, Z.; Hong, W.C. Application of variational mode decomposition and chaotic grey wolf optimizer with support vector regression for forecasting electric loads. Knowl. Based Syst. 2021, 228, 107297.
  32. Liu, Y.; Yang, C.; Huang, K.; Gui, W. Non-ferrous metals price forecasting based on variational mode decomposition and LSTM network. Knowl. Based Syst. 2020, 188, 105006.
  33. Zhang, X.; Tuo, W.; Song, C. Application of MEEMD-ARIMA combining model for annual runoff prediction in the Lower Yellow River. J. Water Clim. Chang. 2020, 11, 865–876.
  34. Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; Hyndman, R.J. Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond. Int. J. Forecast. 2016, 32, 896–913.
  35. Liu, H.; Mi, X.; Li, Y. Smart multi-step deep learning model for wind speed forecasting based on variational mode decomposition, singular spectrum analysis, LSTM network and ELM. Energy Convers. Manag. 2018, 159, 54–64.
  36. Keller, K.; Mangold, T.; Stolz, I.; Werner, J. Permutation entropy: New ideas and challenges. Entropy 2017, 19, 134.
  37. Fadlallah, B.; Chen, B.; Keil, A.; Príncipe, J. Weighted-permutation entropy: A complexity measure for time series incorporating amplitude information. Phys. Rev. E 2013, 87, 022911.
  38. Hewage, P.; Behera, A.; Trovati, M.; Pereira, E.; Ghahremani, M.; Palmieri, F.; Liu, Y. Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station. Soft Comput. 2020, 24, 16453–16482.
  39. Gan, Z.; Li, C.; Zhou, J.; Tang, G. Temporal convolutional networks interval prediction model for wind speed forecasting. Electr. Power Syst. Res. 2021, 191, 106865.
  40. Chen, Y.; Kang, Y.; Chen, Y.; Wang, Z. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing 2020, 399, 491–501.
  41. Qiu, X.P. Neural Networks and Deep Learning; China Machine Press: Beijing, China, 2020.
Figure 1. Sample center frequency iteration curves in different modes.
Figure 2. Sample center frequency iteration curves in different penalty factors.
Figure 3. Dilated causal convolution.
Figure 4. Residual block.
Figure 5. The whole process of the proposed forecasting model.
Figure 6. Moving window procedure that obtains the input–output instances.
Figure 7. Electricity load in New England during 2014.
Figure 8. The decomposition results using the VMD algorithm. (a) IMF1; (b) IMF2; (c) IMF3; (d) IMF4; (e) IMF5; (f) IMF6.
Figure 9. The error series and its distribution on the train set. (a) The error series on the train set. (b) The error series distribution on the train set.
Figure 10. The actual and forecasted error on the train set.
Figure 11. Experimental comparison of different models on the test set.
Figure 12. Mean absolute error comparison of different models on the test set.
Figure 13. Boxplot with different forecasting steps. (a) 6 steps; (b) 12 steps; (c) 24 steps.
Table 1. WPE values for different K and α for the VMD decomposition of the GEFCom dataset.

K       α = 100   α = 500   α = 1000   α = 2000   α = 3000   α = 4000   α = 5000
2       0.9675    0.9680    0.9686     0.9631     0.9522     0.9355     0.9978
3       0.9960    0.9970    0.9969     0.9962     0.9939     0.9894     0.9851
4       0.7349    0.7354    0.7258     0.6921     0.9026     0.6808     0.8088
5       0.6375    0.6007    0.5921     0.6327     0.5773     0.6236     0.5776
6       0.5708    0.5738    0.5772     0.5390     0.5104     0.5642     0.5634
7       0.5696    0.5699    0.5699     0.5659     0.5671     0.5222     0.5360
8       0.5410    0.5687    0.5667     0.5665     0.5670     0.5644     0.5411
Table 2. Performance evaluations of different models for the GEFCom dataset.

Model            | 6 Steps                   | 12 Steps                  | 24 Steps
                 | MAPE/%  RMSE/kW  R2       | MAPE/%  RMSE/kW  R2       | MAPE/%  RMSE/kW  R2
TCN              | 1.545   66.617   0.9826   | 2.553   105.911  0.9561   | 3.452   138.850  0.9246
VMD + TCN        | 0.711   29.645   0.9965   | 0.861   31.873   0.9963   | 1.117   43.439   0.9924
VMD + TCN + EC   | 0.274   12.452   0.9994   | 0.326   16.296   0.9986   | 0.405   17.810   0.9977
Table 3. Performance evaluations of contrast models for the GEFCom dataset.

Model      | 6 Steps                   | 12 Steps                  | 24 Steps
           | MAPE/%  RMSE/kW  R2       | MAPE/%  RMSE/kW  R2       | MAPE/%  RMSE/kW  R2
LSTM       | 1.875   82.325   0.9737   | 3.316   135.770  0.9279   | 3.828   170.556  0.9146
GRU        | 1.881   84.123   0.9723   | 3.244   129.238  0.9347   | 3.924   156.930  0.9037
TCN        | 1.545   66.617   0.9826   | 2.553   105.911  0.9561   | 3.452   138.850  0.9246
VMD-LSTM   | 1.351   63.965   0.9839   | 1.856   79.211   0.9755   | 2.801   111.853  0.9511
VMD-GRU    | 1.373   64.035   0.9836   | 1.735   75.846   0.9775   | 2.949   120.529  0.9432
VMD-TCN    | 0.711   29.645   0.9965   | 0.861   31.873   0.9963   | 1.117   43.439   0.9924
Table 4. Execution time of contrast models for the GEFCom dataset.

Model      | 6 Steps     | 12 Steps    | 24 Steps
LSTM       | 448.74 s    | 424.59 s    | 436.17 s
GRU        | 374.29 s    | 361.78 s    | 368.92 s
TCN        | 63.17 s     | 67.28 s     | 65.56 s
VMD-LSTM   | 2629.76 s   | 2686.79 s   | 2647.61 s
VMD-GRU    | 2113.85 s   | 2241.80 s   | 2286.35 s
VMD-TCN    | 275.36 s    | 280.62 s    | 280.09 s

