Article

A Stacking Learning Model Based on Multiple Similar Days for Short-Term Load Forecasting

Qi Jiang, Yuxin Cheng, Haozhe Le, Chunquan Li and Peter X. Liu
1 School of Information Engineering, Nanchang University, Nanchang 330031, China
2 Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B7, Canada
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(14), 2446; https://doi.org/10.3390/math10142446
Submission received: 30 May 2022 / Revised: 30 June 2022 / Accepted: 5 July 2022 / Published: 13 July 2022

Abstract:
It is challenging to obtain accurate and efficient predictions in short-term load forecasting (STLF) systems due to the complexity and nonlinearity of the electric load signals. To address these problems, we propose a hybrid predictive model that includes a sliding-window algorithm, a stacking ensemble neural network, and a similar-days predictive method. First, we leverage a sliding-window algorithm to process the time-series electric load data with high nonlinearity and non-stationarity. Second, we propose an ensemble learning scheme of stacking neural networks to improve forecasting performance. Specifically, the stacking neural networks contain two types of networks: the base-layer and the meta-layer networks. During the pre-training process, the base-layer network integrates a radial basis function (RBF), random vector functional link (RVFL), and backpropagation neural network (BPNN) to provide a robust predictive model. The meta-layer network utilizes a deep belief network (DBN) and the improved broad learning system (BLS) to enhance predictive accuracy. Finally, the similar-days prediction method is developed to extract the relationship of electric load data in different time dimensions, further enhancing the robustness and accuracy of the model. To demonstrate the effectiveness of our model, it is evaluated using real data from five regions of the United States in three consecutive years. We compare our method with several state-of-the-art and conventional neural-network-based models. Our proposed algorithm improves the prediction accuracy by 16.08%, 16.83%, and 22.64% compared to DWT-EMD-RVFL, SWT-LSTM, and EMD-BLS, respectively. Empirical results demonstrate that our model achieves better accuracy and robustness compared with the baselines.

1. Introduction

Power load forecasting is essential to power system planning [1]. Since it is challenging to store electric energy, an accurate load forecasting algorithm is critical for efficient power consumption and the security of the power grid [2]. Load forecasting tasks can be classified as long-term, medium-term, or short-term based on the forecasting horizon. Medium-term and long-term load forecasting are mainly used to develop long-term power generation plans. Due to its short interval, STLF can be used to adjust the operation mode of the power grid and promote the stable operation of the power system.
In past decades, various forecasting methods have been proposed to tackle STLF. They can be classified into two categories: one includes statistical models such as the autoregressive moving average (ARMA) [3,4] and linear regression (LR) [5]; the other includes machine learning methods such as support-vector regression (SVR) [6], backpropagation neural networks (BPNNs), deep belief networks (DBNs) [7], broad learning systems (BLSs) [8], random vector functional link (RVFL) networks [9], and long short-term memory (LSTM) [10]. Due to the nonlinear and non-stationary characteristics of short-term power load data, statistical methods cannot effectively process such information. Therefore, machine learning has gradually become the mainstream STLF approach, as it can effectively extract features from nonlinear time series and provide an effective mapping between input and output.
Recently, machine learning has achieved remarkable results in load forecasting [11]. Artificial neural networks (ANNs) are among the most popular methods [12]; they mimic the behavior of the human brain and learn the relationship between input and output through training. Deep learning methods such as LSTM and the DBN have powerful nonlinear data processing capabilities and are also very popular [7,10]. However, deep neural networks incur high computational costs, since they depend on large amounts of training data. To reduce these costs, the BLS, a single-layer incremental neural network, has gradually emerged.
More recently, various hybrid models have been developed to improve the predictive accuracy of STLF, because a hybrid model can integrate the advantages of its constituent models and offset their individual limitations through weighted combination. Chen et al. [13] presented a new combination model to enhance power load forecasting. In [14,15], multiple artificial neural network models are integrated to improve predictive performance. In [16], artificial neural networks are combined with a new evolutionary method to improve the accuracy of STLF. In [17], support-vector machines are combined with ant colony optimization to improve the performance of power load forecasting.
To tackle the complexity and nonlinearity of electric signals, various hybrid predictive frameworks have been developed by combining decomposition methods with neural networks. Nengling et al. [18] proposed dividing the load data into different resolutions by wavelet transform and applying different combination forecasting methods based on statistical models to each scale. Ghayekhloo et al. [19] and Ghofrani et al. [20] both used wavelet transform to convert the load data into multiple frequency components. Subsequently, they trained multiple artificial neural networks on the data by linking the weighted outputs of the trained networks in the STLF task. Qiu et al. [7] introduced integrated deep learning based on empirical mode decomposition for load-demand time-series forecasting. Laouafi et al. [21] combined traditional methods and intelligent methods for STLF.
The above hybrid predictive models achieve promising prediction results, as in [22,23]; however, they still cannot solve the following problems:
  • The existing popular empirical mode decomposition [24] often suffers from mode aliasing, and the difficulty of wavelet decomposition [25] lies in effectively selecting the wavelet basis and decomposition scale. In addition, decomposition methods may introduce redundant decomposition information into the predictive models, increasing the computational cost of prediction.
  • Each machine learning method—such as LSTM, DBN, and BLS—has its own specific limitations, which may influence its predictive performance in STLF.
  • The selection of the dataset is also a challenging problem. Generally, continuous time series are used, and are divided into training and test sets. This approach can lead to ineffective extraction of correlations between continuous time series and, therefore, may result in lower accuracy of model predictions.
To address the above problems, we propose an improved hybrid predictive model, which includes a sliding-window algorithm, a stacking ensemble neural network model, and a similar-days predictive method. Specifically, a sliding-window algorithm [26] is first introduced to directly process the nonlinearity and non-stationarity of the time-series electric load data. This method effectively mines spatiotemporal features of the time series. Furthermore, a stacking ensemble neural network model is proposed to improve the forecasting performance. The stacking neural networks contain two types of networks: the base-layer network and the meta-layer network. During the pre-training process, the base-layer networks integrate radial basis function (RBF), random vector functional link (RVFL), and backpropagation neural network (BPNN) to provide a robust predictive model; the meta-layer networks utilize a deep belief network (DBN) and the improved broad learning system (BLS) to improve predictive accuracy; the predictive results of RBF, RVFL, BPNN, DBN, and BLS are rationally weighted to obtain the final prediction result. Finally, the similar-days prediction method is developed to extract the relationship of electric load data in different time dimensions, further enhancing the robustness and accuracy of the model. This paper selects the load data of five regions in the United States for three consecutive years to conduct a large number of experiments, proving that the framework has high prediction accuracy and strong robustness.
The main contributions of this paper are as follows:
  • The sliding-window algorithm is an effective method for extracting the spatiotemporal characteristics of load data, which are used to reduce computational costs and improve prediction accuracy.
  • The stacking neural network is proposed to greatly improve the prediction accuracy.
  • The similar-days predictive method is developed to extract the relationship of electric load data in different time dimensions, further enhancing the robustness and accuracy of the model.
The rest of this paper is organized as follows: Section 2 introduces the framework of the proposed model and the theoretical knowledge of interest; Section 3 shows the data analysis; Section 4 introduces the details of the case analysis; Section 5 concludes the paper.

2. Methodology

2.1. Model Framework

We show the overall framework of our proposed method in Figure 1. It contains four parts: (A) data preprocessing, (B) base learners, (C) data processing, and (D) meta-learners. The detailed pseudo-code of the stacking algorithm is given in Algorithm 1. The proposed model can be described as follows:
Part A: The collected power load data are divided into three parts: (1) the training set, which is used as training data; (2) the validation set, which is used for weight adjustment between meta-learners; and (3) the test set, which is used for evaluation and error analysis. The time series in each of the three parts are input into the sliding-window algorithm to obtain the reconstructed multidimensional matrix H as the data input matrix for the base learner (refer to Section 2.2 for more details).
Part B: The base-layer learning system consists of RBF, BPNN, and RVFL networks for preliminary training (refer to Section 2.3 for more details).
Part C: During data processing, the prediction data of the base-layer learner are recombined as the input of the meta-layer learner (refer to Section 2.3 for more details).
Part D: The meta-layer learning system involves two neural networks: DBN and BLS–BP. BLS–BP is an improved network that is applied to load forecasting here for the first time (refer to Section 2.3 for more details).
Algorithm 1: Stacking based on the sliding window
Input: training data $D = \{(x_i, y_i)\}_{i=1}^{m}$, with $x_i \in \mathbb{R}^n$ and $y_i \in Y$
Step 1: Analysis with the sliding window
  Reconstruct $D$ into the matrix $H$ (Section 2.2)
Step 2: Learn the base-layer learners
  for n = 1 to N do
    Learn a base learner $s_n$ based on $H$
  end for
Step 3: Construct a new dataset from $H$
  for i = 1 to m do
    Add to $H_s = \{(x_i^{1}, y_i)\}$ the sample $x_i^{1} = \{s_1(x_i), s_2(x_i), \ldots, s_N(x_i)\}$
  end for
Step 4: Learn the meta-layer learners
  for t = 1 to T do
    Learn a meta-learner $S_t$ based on $H_s$
  end for
Return $S(x)$, which applies the (weighted) meta-learners to $\{s_1(x), s_2(x), \ldots, s_N(x)\}$
Output: ensemble learner $S$

2.2. Sliding-Window Algorithm

The principle of the sliding-window algorithm is to reconstruct the original power load data into a multidimensional matrix H by sliding the window. When training and validating the model, the reconstruction matrix H includes both the input and output data (also known as training data and label data, respectively). In the evaluation step, only training data are in the reconstruction matrix H. Figure 2 and Figure 3 illustrate the reconstruction process. In each window slot, three components are included, namely, input data X, output data Y, and delay time T. The window is slid to remove the data at the beginning of the previous window, and then the same amount of new data is added at the end of the window to ensure that the window size is constant. The sliding window will go through the entire dataset until all of the data are covered.
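To make the reconstruction concrete, the following is a minimal NumPy sketch of the window reconstruction described above; it is illustrative rather than the authors' MATLAB implementation. The window length, forecast horizon, and stride are assumptions (the paper's final window size is 96 samples; see Section 3.3), and the delay time T of Figure 2 is absorbed into the stride.

import numpy as np

def sliding_window(series, window=96, horizon=48, stride=1):
    """Reconstruct a 1-D load series into matrices of (input X, output Y) pairs.

    Each row contains `window` past samples as inputs and the next `horizon`
    samples as outputs (labels); at evaluation time only the input part is used.
    """
    X, Y = [], []
    last = len(series) - window - horizon
    for start in range(0, last + 1, stride):
        X.append(series[start:start + window])
        Y.append(series[start + window:start + window + horizon])
    return np.asarray(X), np.asarray(Y)

# Example: two weeks of half-hourly load (48 samples per day)
load = np.random.rand(14 * 48)
X, Y = sliding_window(load, window=96, horizon=48)
print(X.shape, Y.shape)   # (529, 96) (529, 48)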

2.3. Stacking Algorithm

2.3.1. Algorithm Structure

Figure 4 and Figure 5 show the framework structure of the stacking algorithm; the specific process is as follows (a minimal code sketch is given after this list):
  • First, we leverage the sliding-window method to perform data preprocessing on the original data and obtain a new training set, validation set, and test set. The training set is divided into n parts: {Train(i) | i = 1, 2, …, n}, where n is the number of folds in cross-validation (see Section 3.3 for details).
  • Model training: We choose RBF, BPNN, and RVFL as the base-layer learners. After each model is pre-trained, we train the base learners with {Train(i) | i = 1, 2, …, n} in turn, using n-fold cross-validation, as shown in Figure 5. We generate an intermediate dataset A by vertically merging the n predictions of the base learners. Specifically, we name the datasets generated by RBF, BPNN, and RVFL as A(1), A(2), and A(3), respectively, and merge these three datasets horizontally to obtain A(x). We use DBN and BLS–BP as our meta-learners and train them on A(x). We repeat the above process for the validation set and the test set using the well-trained base learners to generate two intermediate datasets, B(x) and C(x), where x ∈ {1, 2, 3}.
  • Weight adjustment between meta-learners: We obtain a new dataset B(x) from the predictions of the base learners on the validation set, and utilize B(x) to adjust the weights between the meta-learners. The weights between the two meta-learners are updated according to the error between the predictions and the labels.
  • Model Evaluation: We forward the test set to the well-trained base learners to obtain C(x), and forward C(x) to the well-trained meta-learners to make predictions. Finally, the optimal weights are used to obtain a weighted average of the predicted values of the two meta-learners to obtain the final prediction results. The model is evaluated by the error between the final prediction result and the actual value.
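The listing below is a minimal Python sketch of this cross-validated stacking procedure, not the authors' MATLAB implementation. The scikit-learn regressors stand in for the paper's RBF/BPNN/RVFL base learners and DBN/BLS–BP meta-learners, the targets are simplified to one-step-ahead values, and the inverse-error weighting of the meta-learners is an assumption (the paper only states that the weights are updated according to the validation error).

import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge

# Stand-ins (assumption): RBF ~ KernelRidge(rbf), BPNN ~ MLPRegressor, RVFL ~ Ridge;
# meta-learners DBN / BLS-BP ~ MLPRegressor / Ridge.
base_learners = [KernelRidge(kernel="rbf"), MLPRegressor(max_iter=500), Ridge()]
meta_learners = [MLPRegressor(max_iter=500), Ridge()]

def fit_stacking(X_tr, y_tr, X_val, y_val, n_folds=12):
    cv = KFold(n_splits=n_folds, shuffle=False)
    # Step 1: out-of-fold predictions of each base learner form the dataset A(x)
    A = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=cv) for m in base_learners])
    for m in base_learners:                      # refit on the full training set
        m.fit(X_tr, y_tr)
    # Step 2: train the meta-learners on A(x)
    for m in meta_learners:
        m.fit(A, y_tr)
    # Step 3: tune the weights between meta-learners on the validation set B(x)
    B = np.column_stack([m.predict(X_val) for m in base_learners])
    val_preds = np.column_stack([m.predict(B) for m in meta_learners])
    err = np.mean(np.abs(val_preds - y_val[:, None]), axis=0)
    w = (1.0 / err) / np.sum(1.0 / err)          # inverse-error weighting (assumption)
    return w

def predict_stacking(w, X_te):
    C = np.column_stack([m.predict(X_te) for m in base_learners])
    preds = np.column_stack([m.predict(C) for m in meta_learners])
    return preds @ w                             # weighted average of the meta-learners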

2.3.2. Base-Layer Network

The base-layer network consists of a radial basis function (RBF), random vector functional link (RVFL), and backpropagation neural network (BPNN), to provide a robust predictive model for STLF.
The RBF [27] is composed of three layers: The first layer is the input layer, which takes the signal source as input. The second layer is the hidden layer, whose transformation function is the radial basis function, a non-negative, nonlinear function that is radially symmetric and attenuates away from its center. The third layer is the output layer, which responds to the input mode. The output layer uses a linear optimization strategy to fine-tune the linear weights between the hidden layer and the output layer.
RVFL [9] is a neural network based on the randomized learning paradigm and is more efficient than conventional iteratively trained neural networks. This feedforward structure can be regarded as a linear combination of a fixed number of nonlinear expansions of the original inputs. RVFL contains three layers: the input layer, the enhancement-node layer, and the output layer. The principle of RVFL is to use the nonlinearly expanded raw data learned at the hidden layer to improve generalization ability. The network also has a direct connection from the input layer to the output layer, which helps to map the relationship between input and output. These properties make it well suited as a base learner in our framework.
The BPNN [28] is the most basic supervised learning neural network. Signals are propagated forward to produce the output, and errors are propagated backward through the network. The BPNN contains three layers: the input layer, the hidden layer, and the output layer. Specifically, the input of the hidden layer is the output of the input layer; the hidden layer applies an activation function to the hidden features, and its output is forwarded to the output layer to generate the results. Gradient descent on the partial derivatives of the cost function is used to minimize the error between the expected and actual outputs.
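As an illustration of the base-layer idea, the following is a minimal NumPy sketch of an RVFL regressor with random enhancement nodes, a direct input-output link, and closed-form output weights. The node count, activation, and ridge regularization are illustrative assumptions, not the paper's settings.

import numpy as np

class RVFL:
    """Random vector functional link: fixed random enhancement nodes plus a direct
    link from the input to the output; only the output weights are learned."""
    def __init__(self, n_enhance=100, reg=1e-3, seed=0):
        self.n_enhance, self.reg, self.rng = n_enhance, reg, np.random.default_rng(seed)

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)          # random nonlinear expansion
        return np.hstack([X, H])                  # direct link: concatenate raw inputs

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_enhance))
        self.b = self.rng.standard_normal(self.n_enhance)
        D = self._features(X)
        # Closed-form ridge solution for the output weights
        self.beta = np.linalg.solve(D.T @ D + self.reg * np.eye(D.shape[1]), D.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta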

2.3.3. Meta-Layer Network

The meta-layer network applies a deep belief network (DBN) and the improved broad learning system (BLS) to improve predictive accuracy.
The DBN [7] is a deep neural network composed of stacked RBMs (restricted Boltzmann machines) and a BP output layer, and it is a mainstream neural network at present. The structure is shown in Figure 6. The training process of the DBN includes two steps:
Step 1: Pre-training. The pre-training process trains each RBM layer in an unsupervised manner, with the aim of retaining sufficient feature information when the features are mapped to different feature spaces. The overall process includes three steps: (1) train the first RBM until convergence; (2) freeze the weights and biases of the trained RBM and take the state of its hidden layer as the input of the second RBM; and (3) stack the two RBMs after the second RBM has converged. We repeat these steps until the whole network has converged.
Step 2: Fine-tuning. In the fine-tuning step, we attach a supervised network to the last layer of the DBN. This layer takes the output of the top RBM as input and learns the input-output relationship in a supervised manner. The backpropagation process then propagates the error information back to each RBM and fine-tunes the parameters of the whole DBN.
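The following NumPy sketch illustrates the greedy layer-wise pre-training loop described above, using one-step contrastive divergence (CD-1) for each RBM. The layer sizes, learning rate, and number of epochs are assumptions, and the supervised fine-tuning stage is only indicated by the features returned for the BP output layer.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=10, rng=np.random.default_rng(0)):
    """One RBM trained with CD-1; returns weights, hidden bias, and hidden activations."""
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
    for _ in range(epochs):
        h_prob = sigmoid(V @ W + b_h)                       # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob) * 1.0
        v_recon = sigmoid(h_sample @ W.T + b_v)             # reconstruction
        h_recon = sigmoid(v_recon @ W + b_h)                # negative phase
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
        b_v += lr * (V - v_recon).mean(axis=0)
    return W, b_h, sigmoid(V @ W + b_h)

def pretrain_dbn(X, layer_sizes=(64, 32)):
    """Greedy layer-wise pre-training: each RBM's hidden output feeds the next RBM."""
    params, data = [], X
    for n_hidden in layer_sizes:
        W, b_h, data = train_rbm(data, n_hidden)
        params.append((W, b_h))
    return params, data   # `data` feeds the supervised BP output layer for fine-tuning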
As shown in Figure 7, the BLS [8] consists of four parts: the input, feature nodes, enhancement nodes, and output. In practice, the performance of a BLS trained only with its standard two training steps can be insufficient. We therefore establish links between the output and input of the network and fine-tune it by backpropagation. Based on this idea, this paper designs an improved BLS variant, namely BLS–BP.
After the load data are trained by the BLS, the error at the output layer is propagated back to the input layer for fine-tuning. We calculate the gradient from this error and use it to update the weights and biases. Training stops once certain conditions are met: we can set a maximum number of iterations, or compute the prediction accuracy on the training set and stop once it reaches a certain threshold. The BLS training step can be viewed as the weight initialization of a BP network, which helps the network escape local optima and shortens the training time. A detailed pseudo-code for BLS–BP is given in Algorithm 2.
Algorithm 2: Broad learning with a backpropagation increment (BLS–BP)
Input: training samples $X$ with desired outputs $d$
Output: the output weight matrix $W$
Parameters: $Z^n$ (the feature-mapping node groups); $H^m$ (the enhancement node groups); $W$ (the output weights of the BLS); $E$ (the threshold for stopping the iteration); $\eta$ (the learning rate)
1: for $i = 1$ to $n$ do
2:   Randomly generate $W_{e_i}$, $\beta_{e_i}$
3:   Calculate $Z_i = \phi(X W_{e_i} + \beta_{e_i})$
4: end for
5: Set $Z^n = [Z_1, \ldots, Z_n]$
6: for $j = 1$ to $m$ do
7:   Randomly generate $W_{h_j}$, $\beta_{h_j}$
8:   Calculate $H_j = \xi_j(Z^n W_{h_j} + \beta_{h_j})$
9: end for
10: Set $H^m = [H_1, H_2, \ldots, H_m]$ and $A = [Z^n \,|\, H^m]$
11: Compute the initial output weights $W = A^{+} d$ and the network output $Y = A W$
12: Calculate the error term $E_p = f'(Y_p) \cdot (d_p - Y_p)$
13: while $E_p > E$ do
14:   Update $W^{(n)} = W + \eta \cdot E_p \cdot Y_p$
15:   Update $Y^{(n)} = [Z^n \,|\, H^m] \, W^{(n)}$
16:   Set $W = W^{(n)}$ and $n = n + 1$
17:   Recalculate $E_p$ as in step 12
18: end while
19: Export $W$
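A compact NumPy sketch of the BLS–BP idea in Algorithm 2 is given below: random feature-mapping and enhancement nodes, output weights first obtained by the pseudo-inverse as in the standard BLS, and then refined by a few gradient (backpropagation-style) steps on the squared error. The node counts, learning rate, and stopping rule are assumptions, and the sketch is not the authors' implementation.

import numpy as np

def bls_bp(X, y, n_feature=24, n_enhance=15, eta=1e-3, max_iter=200, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    We = rng.standard_normal((X.shape[1], n_feature)); be = rng.standard_normal(n_feature)
    Wh = rng.standard_normal((n_feature, n_enhance)); bh = rng.standard_normal(n_enhance)

    def expand(Xq):                              # build [Z | H] for any input
        Z = np.tanh(Xq @ We + be)                # feature-mapping nodes
        H = np.tanh(Z @ Wh + bh)                 # enhancement nodes
        return np.hstack([Z, H])

    A = expand(X)
    W = np.linalg.pinv(A) @ y                    # standard BLS output weights (pseudo-inverse)
    for _ in range(max_iter):                    # backpropagation-style fine-tuning
        err = y - A @ W
        if np.mean(err ** 2) < tol:
            break
        W += eta * A.T @ err / len(y)            # gradient step on 0.5 * ||y - A W||^2
    return lambda Xq: expand(Xq) @ W             # predictor closing over the learned weights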

3. Numerical Analysis

3.1. Datasets

To demonstrate the effectiveness and robustness of our model, we conducted experiments on five datasets collected in the United States between 2017 and 2019. The datasets came from five US regions, named CAPITL, CENTRL, DUNWOD, GENESE, and HUDVL. The sampling interval was 30 min, meaning that one day covers 48 load samples. Figure 8 shows the load data for March to illustrate the similar-days [29] training scheme used in this paper. We divided the training, validation, and test sets using data from the same similar-days period of different years. Before starting the experiments, we normalized the sample data to the range [0, 1] to eliminate the dominant effect of data with large values.
The normalization formula was as follows:
$\hat{y}_m = \dfrac{y_{max} - y_m}{y_{max} - y_{min}}$
where $\hat{y}_m$ is the normalized value, $y_m$ represents the actual load data, $y_{max}$ is the maximum value of the load data, and $y_{min}$ is the minimum value of the load data.
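A small sketch of this scaling follows (the load values are hypothetical, in MW). Note that the formula maps the maximum load to 0 and the minimum load to 1.

import numpy as np

def normalize(y):
    """Min-max scaling as in the paper: (y_max - y) / (y_max - y_min)."""
    y = np.asarray(y, dtype=float)
    return (y.max() - y) / (y.max() - y.min())

def denormalize(y_hat, y_max, y_min):
    """Invert the scaling to recover load values in the original units."""
    return y_max - y_hat * (y_max - y_min)

load = np.array([812.0, 950.0, 1103.0, 876.0])
print(normalize(load))   # [1.0, 0.526..., 0.0, 0.780...]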

3.2. Evaluation Criterion

To effectively evaluate the predictive performance of our proposed model in STLF, we used the following two evaluation criteria: the root-mean-square error (RMSE) [30] and the mean absolute percentage error (MAPE) [31]. They are defined as follows:
$\mathrm{RMSE} = \sqrt{\dfrac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - y_m)^2}$
$\mathrm{MAPE} = \dfrac{100\%}{M} \sum_{m=1}^{M} \left| \dfrac{\hat{y}_m - y_m}{y_m} \right|$
where y ^ m represents the prediction data, y m represents the actual load data, and M is the size of the dataset. For both of the evaluation criteria, a smaller value indicates better performance of the models.
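The two criteria transcribe directly into code, assuming predictions and actual values are in the original (denormalized) units so that MAPE is meaningful; the example values are hypothetical.

import numpy as np

def rmse(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 380.0])
print(rmse(y_pred, y_true))   # ~14.14
print(mape(y_pred, y_true))   # ~6.67 (%)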

3.3. Parameter Settings

We performed hyperparameter exploration before starting the formal experiments. In our data pre-processing, the window size of the sliding-window algorithm was critical to the final performance. To determine the optimal window size, we applied the controlled-variable method and adopted RMSE and MAPE as the evaluation metrics. According to the test results in Figure 9, our model has the smallest RMSE and MAPE when the sliding window takes the value of 96, indicating the best prediction performance. Since there are 48 load samples per day and the window should span an integer number of days so that it corresponds to a meaningful time interval, a window size of 96 corresponds to exactly two days. Therefore, we set the optimal sliding-window size to 96.
In the cross-validation (CV) module [32], an error experiment was carried out to find the number of CV folds with the smallest error. We used MAPE as the evaluation metric and plotted the error curves for the spring power load of CAPITL. As shown in Figure 10, when the number of folds is too small, the empirical error is large and the model is not robust; when it is too large, the error does not decrease further, although the computational effort increases greatly. We therefore set the number of CV folds to 12 as a compromise between computational effort and prediction performance. Table 1 shows the optimal parameter settings of the compared machine learning methods, obtained after many experiments.
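The selection procedure described above can be sketched as follows. Here train_and_score is a hypothetical placeholder for one full run of the forecasting pipeline (sliding window, stacking, validation MAPE); the candidate grids and the fixed fold count used during the window search are assumptions.

def select_hyperparameters(train_and_score,
                           window_sizes=(48, 96, 144, 192),
                           fold_counts=(4, 6, 8, 10, 12, 14, 16)):
    """Controlled-variable search: first the window size (folds fixed), then the fold count.
    `train_and_score(window=..., folds=...)` is a hypothetical callable returning validation MAPE."""
    best_window = min(window_sizes, key=lambda w: train_and_score(window=w, folds=12))
    best_folds = min(fold_counts, key=lambda k: train_and_score(window=best_window, folds=k))
    return best_window, best_folds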

4. Case Study

To demonstrate the effectiveness and robustness of our model, we extensively compared the performance of the proposed model with several baselines. Among the baselines, five approaches are single components of our stacking model (RBFNN, BPNN, RVFL, DBN, and BLS–BP), and four models are state-of-the-art in STLF: DWT-EMD-RVFL [9], SWT-LSTM [33], EMD-BLS [8], and EMD-EDBN [7]. We conducted five sets of experiments to show the superiority of our model against the baselines. Our experiments were implemented in MATLAB R2021b (MathWorks, Inc., Natick, MA, USA) on a laptop equipped with an Intel Core i7-9750H CPU @ 2.60 GHz.
  • Ablation study: In this experiment, we compared our stacking model with the five components of our model, and verified that the proposed model can outperform all the baselines (see Section 4.1).
  • Comparison with other ensemble models: In this experiment, we demonstrated the effectiveness of the stacked meta-learners in our model against baseline models with a single meta-learner. We adopted the same base-layer learners in our model and in the baseline models (see Section 4.2).
  • Comparison with state-of-the-art models: We compared our model with other state-of-the-art models and demonstrated that our model outperforms these baseline models (see Section 4.3).
  • Comparison of computation times between models: We compared the computation times required for each case based on the spring load data in HUDVL (see Section 4.4).
  • Heavy load test: In this experiment, we repeated the above three experiments on the data collected on a special holiday. The high demand for electricity on the holiday leads to a heavy power load, and increases the uncertainty of the power load (see Section 4.5).

4.1. Ablation Experiment between Single Models and Hybrid Models

In this experiment, we took five single machine learning methods as baselines, and the results are shown in Table 2; the rows labeled OURS report the prediction results of our model. We can observe that our model outperforms the baselines on each sub-dataset. Specifically, our model achieves outstanding performance even when the power load time series is highly nonlinear and non-stationary (e.g., CAPITL and CENTRL). Although the DBN already makes good predictions, the improved BLS proposed in this article has an even lower prediction error than the DBN in many cases. The results demonstrate that the regression-based BLS has an effective predictive ability and that our model significantly outperforms the other baseline models. In Figure 11, we can also observe that the proposed method has the lowest MAPE, demonstrating its effectiveness and robustness; it achieves the best performance on all datasets and forecasting steps. The forecast curves of the various methods on Christmas Day are shown in Section 4.5.

4.2. Comparison with Other Ensemble Models

Based on the same base learners, we used the DBN and the BLS (with the backpropagation refinement) individually as the meta-learner, and denote the resulting ensembles as S-DBN and S-BLS, respectively. The error results are shown in Table 3. From the table, we can observe that the accuracy and stability of a stacked model outperform those of a single model. However, their performance still falls short of the algorithm proposed in this article, which shows the benefit of combining both meta-learners. The forecast curves of the various methods on Christmas Day are shown in Section 4.5.

4.3. Comparison with Other Hybrid Models

To date, a variety of hybrid models have been proposed for short-term load forecasting. We took four models as baselines: DWT-EMD-RVFL [9], SWT-LSTM [33], EMD-BLS [8], and EMD-EDBN [7]. The empirical results show a trend similar to the previous experiments: the proposed model outperforms the hybrid models for each dataset in all forecasting horizons. The corresponding errors are shown in Table 4. We can observe that the EMD-EDBN model has the worst results for all datasets in all forecasting horizons. Figure 12 shows that our method has the smallest errors on each sub-dataset, demonstrating the effectiveness and robustness of our model. The forecast curves of the various methods on Christmas Day are shown in Section 4.5.

4.4. Comparison of Computation Times between Models

Since computational effort should also be considered when evaluating prediction performance, the computation time required for each case is discussed in this section. We report the computation time of each model when predicting the spring load data of HUDVL, as shown in Table 5. The proposed model has a computation time of 83.928 s. Although this is longer than that of most of the individual comparative models, the proposed model significantly outperforms them in prediction performance, and the time cost is within acceptable limits. In addition, the experimental results show that the proposed model has a shorter computation time and better prediction performance than the single, more complex SWT-LSTM model.

4.5. Model Performance Analysis on a Heavy Load Test

This section selects a special day in the United States, Christmas, to analyze the performance of the model. Figure 13a–c show the prediction curves of the three groups of comparative experiments on Christmas Day. The figures show that although the other models can predict the load effectively in regions where the original electric load rises or falls steeply, large errors remain where the original load curve fluctuates widely at the peaks and valleys. In contrast, the load prediction curve of the hybrid network proposed in this paper fits the original power load curve well, even in regions with large fluctuations and at peaks and troughs. This shows that the proposed method is highly robust.

4.6. Discussion

The empirical results demonstrate that our model achieves promising performance and is more robust than the baseline models on the STLF task. The high-level intuition is that the sliding-window algorithm can smooth the nonlinearity and non-stationarity of the power load series, while the proposed stacking method effectively combines multiple neural networks to improve prediction performance. The forecasting accuracy is further improved by the improved BLS. In addition, the similar-days prediction method extracts the relationship of the electric load data in different time dimensions, further improving the robustness of the model.

5. Conclusions

This paper proposes a novel ensemble learning framework for short-term load forecasting. The proposed framework employs the sliding-window technique to process the time-series electric load data, organizes the data according to the similar-days prediction method, and trains the stacking ensemble on them. Finally, the proposed model is compared with individual neural network models, other ensemble models, and hybrid models, with error analysis based on the MAPE and RMSE evaluation criteria.
In conclusion, the proposed model has advantages in robustness and effectiveness. (i) Robustness: The regression-based broad learning system achieves outstanding performance when tackling noise and outliers. The proposed model stacks sub-models to obtain a stacking ensemble model, and our empirical results show that the stacking method outperforms its sub-components, demonstrating the robustness of our model. (ii) Effectiveness: The proposed model outperforms the four existing hybrid baseline models as well as its sub-component models. The experimental results show that our model achieves significantly better results than the baselines because of its rational framework and design.
However, there are still some limitations to the proposed work. For example, our proposed stacking model has a higher computational cost compared to other individual prediction models. In the future, we plan to explore better model selection for the base- and meta-learners in the stacking approach to tackle this issue, and we intend to use optimization methods to tune the hyperparameters of the prediction models to obtain better prediction accuracy. In addition, we are also interested in investigating electricity load forecasting for individual household customers with higher volatility to validate the robustness and accuracy of our proposed model.

Author Contributions

Conceptualization, Q.J., Y.C. and C.L.; methodology, Y.C.; software, Y.C. and H.L.; validation, Q.J., Y.C. and H.L.; formal analysis, H.L.; investigation, Y.C.; resources, C.L. and P.X.L.; data curation, Y.C.; writing—original draft preparation, Q.J. and Y.C.; writing—review and editing, Q.J. and C.L.; visualization, Q.J. and Y.C.; supervision, C.L. and P.X.L.; project administration, C.L. and P.X.L.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grants 61863028, 62173176, 81660299, and 61503177, and in part by the Science and Technology Department of Jiangxi Province of China under grants 20204ABC03A39, 20161ACB21007, 20171BBE50071, and 20171BAB202033.

Institutional Review Board Statement

This study did not involve humans or animals.

Informed Consent Statement

This study did not involve humans.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ANN: Artificial neural network
ARMA: Autoregressive moving average
BLS: Broad learning system
BLS–BP: Broad learning system–backpropagation
BPNN: Backpropagation neural network
CV: Cross-validation
DBN: Deep belief network
DWT: Discrete wavelet transform
EMD: Empirical mode decomposition
EDBN: Ensemble DBN
LR: Linear regression
LSTM: Long short-term memory
MAPE: Mean absolute percentage error
RBF: Radial basis function
RBM: Restricted Boltzmann machine
RVFL: Random vector functional link
RMSE: Root-mean-square error
S-BLS: Selected BLS
S-DBN: Selected DBN
STLF: Short-term load forecasting
SVR: Support-vector regression
SWT: Stationary wavelet transform

References

  1. Simoglou, C.K.; Biskas, P.N. Assessment of the impact of the National Energy and Climate Plan on the Greek power system resource adequacy and operation. Electr. Power Syst. Res. 2021, 194, 107113. [Google Scholar] [CrossRef]
  2. Li, X.; Jiang, T.; Liu, G.; Bai, L.; Cui, H.; Li, F. Bootstrap-based confidence interval estimation for thermal security region of bulk power grid. Int. J. Electr. Power Energy Syst. 2020, 115, 105498. [Google Scholar] [CrossRef]
  3. Moon, J.; Hossain, M.B.; Chon, K.H. AR and ARMA model order selection for time-series modeling with ImageNet classification. Signal Process. 2021, 183, 108026. [Google Scholar] [CrossRef]
  4. Dosiek, L. The Effects of Forced Oscillation Frequency Estimation Error on the LS-ARMA+S Mode Meter. IEEE Trans. Power Syst. 2020, 35, 1650–1652. [Google Scholar] [CrossRef]
  5. Prion, S.K.; Haerling, K.A. Making Sense of Methods and Measurements: Simple Linear Regression. Clin. Simul. Nurs. 2020, 48, 94–95. [Google Scholar] [CrossRef]
  6. Alamdar, F.; Mohammadi, F.S.; Amiri, A. Twin Bounded Weighted Relaxed Support Vector Machines. IEEE Access 2019, 7, 22260–22275. [Google Scholar] [CrossRef]
  7. Qiu, X.; Ren, Y.; Suganthan, P.N.; Amaratunga, G.A.J. Empirical Mode Decomposition based ensemble deep learning for load demand time series forecasting. Appl. Soft Comput. 2017, 54, 246–255. [Google Scholar] [CrossRef]
  8. Zhu, L.; Lian, C. Wind Speed Forecasting Based on a Hybrid EMD-BLS Method. In Proceedings of the Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2191–2195. [Google Scholar]
  9. Qiu, X.; Suganthan, P.N.; Amaratunga, A.J.G. Ensemble Incremental Random Vector Functional Link Network for Short-term Crude Oil Price Forecasting. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1758–1763. [Google Scholar]
  10. Shahid, F.; Zameer, A.; Muneeb, M. A novel genetic LSTM model for wind power forecast. Energy 2021, 223, 120069. [Google Scholar] [CrossRef]
  11. Potočnik, P.; Škerl, P.; Govekar, E. Machine-learning-based multi-step heat demand forecasting in a district heating system. Energy Build. 2021, 233, 110673. [Google Scholar] [CrossRef]
  12. Xu, C.; Gordan, B.; Koopialipoor, M.; Armaghani, D.J.; Tahir, M.M.; Zhang, X. Improving Performance of Retaining Walls under Dynamic Conditions Developing an Optimized ANN Based on Ant Colony Optimization Technique. IEEE Access 2019, 7, 94692–94700. [Google Scholar] [CrossRef]
  13. Chen, G.J.; Li, K.K.; Chung, T.S.; Sun, H.B.; Tang, G.Q. Application of an innovative combined forecasting method in power system load forecasting. Electr. Power Syst. Res. 2001, 59, 131–137. [Google Scholar] [CrossRef]
  14. Chen, Y.; Luh, P.B.; Guan, C.; Zhao, Y.; Michel, L.D.; Coolbeth, M.A.; Friedland, P.B.; Rourke, S.J. Short-Term Load Forecasting: Similar Day-Based Wavelet Neural Networks. IEEE Trans. Power Syst. 2010, 25, 322–330. [Google Scholar] [CrossRef]
  15. Matrenin, P.V.; Manusov, V.Z.; Khalyasmaa, A.I.; Antonenkov, D.V.; Eroshenko, S.A.; Butusov, D.N. Improving Accuracy and Generalization Performance of Small-Size Recurrent Neural Networks Applied to Short-Term Load Forecasting. Mathematics 2020, 8, 2169. [Google Scholar] [CrossRef]
  16. Singh, P.; Dwivedi, P. Integration of new evolutionary approach with artificial neural network for solving short term load forecast problem. Appl. Energy 2018, 217, 537–549. [Google Scholar] [CrossRef]
  17. Niu, D.; Wang, Y.; Wu, D.D. Power load forecasting using support vector machine and ant colony optimization. Expert Syst. Appl. 2010, 37, 2531–2539. [Google Scholar] [CrossRef]
  18. Nengling, T.; Stenzel, J.; Hongxiao, W. Techniques of applying wavelet transform into combined model for short-term load forecasting. Electr. Power Syst. Res. 2006, 76, 525–533. [Google Scholar] [CrossRef]
  19. Ghayekhloo, M.; Menhaj, M.B.; Ghofrani, M. A hybrid short-term load forecasting with a new data preprocessing framework. Electr. Power Syst. Res. 2015, 119, 138–148. [Google Scholar] [CrossRef]
  20. Ghofrani, M.; Ghayekhloo, M.; Arabali, A.; Ghayekhloo, A. A hybrid short-term load forecasting with a new input selection framework. Energy 2015, 81, 777–786. [Google Scholar] [CrossRef]
  21. Laouafi, A.; Mordjaoui, M.; Haddad, S.; Boukelia, T.E.; Ganouche, A. Online electricity demand forecasting based on an effective forecast combination methodology. Electr. Power Syst. Res. 2017, 148, 35–47. [Google Scholar] [CrossRef]
  22. Wang, G.; Wang, X.; Wang, Z.; Ma, C.; Song, Z. A VMD–CISSA–LSSVM Based Electricity Load Forecasting Model. Mathematics 2022, 10, 28. [Google Scholar] [CrossRef]
  23. Che, J.; Wang, J. Short-term load forecasting using a kernel-based support vector regression combination model. Appl. Energy 2014, 132, 602–609. [Google Scholar] [CrossRef]
  24. Zhang, T.; Zhang, Y.; Sun, H.; Shan, H. Parkinson disease detection using energy direction features based on EMD from voice signal. Biocybern. Biomed. Eng. 2021, 41, 127–141. [Google Scholar] [CrossRef]
  25. Hu, M.; Wang, G.; Ma, K.; Cao, Z.; Yang, S. Bearing performance degradation assessment based on optimized EWT and CNN. Measurement 2021, 172, 108868. [Google Scholar] [CrossRef]
  26. Cheng, Y.; Le, H.; Li, C. A Decomposition-Based Improved Broad Learning System Model for Short-Term Load Forecasting. J. Electr. Eng. Technol. 2022. [Google Scholar] [CrossRef]
  27. Liang, L.; Guo, W.; Zhang, Y.; Zhang, W.; Li, L.; Xing, X. Radial Basis Function Neural Network for prediction of medium-frequency sound absorption coefficient of composite structure open-cell aluminum foam. Appl. Acoust. 2020, 170, 107505. [Google Scholar] [CrossRef]
  28. Hu, Y.; Li, J.; Hong, M.; Ren, J.; Lin, R.; Liu, Y.; Liu, M.; Man, Y. Short term electric load forecasting model and its verification for process industrial enterprises based on hybrid GA-PSO-BPNN algorithm—A case study of papermaking process. Energy 2019, 170, 1215–1227. [Google Scholar] [CrossRef]
  29. Sun, G.; Jiang, C.; Cheng, P.; Liu, Y.; Wang, X.; Fu, Y.; He, Y. Short-term wind power forecasts by a synthetical similar time series data mining method. Renew. Energy 2018, 115, 575–584. [Google Scholar] [CrossRef]
  30. Mentaschi, L.; Besio, G.; Cassola, F.; Mazzino, A. Problems in RMSE-based wave model validations. Ocean. Model. 2013, 72, 53–58. [Google Scholar] [CrossRef]
  31. Jahan, S.; Riley, I.; Walter, C.; Gamble, R.F.; Pasco, M.; McKinley, P.K.; Cheng, B.H.C. MAPE-K/MAPE-SAC: An interaction framework for adaptive systems with security assurance cases. Future Gener. Comput. Syst. 2020, 109, 197–209. [Google Scholar] [CrossRef]
  32. Hirano, K.; Wright, J.H. Analyzing cross-validation for forecasting with structural instability. J. Econ. 2021, 226, 139–154. [Google Scholar] [CrossRef]
  33. Yan, K.; Li, W.; Ji, Z.; Qi, M.; Du, Y. A Hybrid LSTM Neural Network for Energy Consumption Forecasting of Individual Households. IEEE Access 2019, 7, 157633–157642. [Google Scholar] [CrossRef]
Figure 1. Framework of the proposed model.
Figure 2. Flowchart of the sliding window.
Figure 3. The reconstructed matrix.
Figure 4. Algorithm structure.
Figure 5. Cross-validation in the training process of the base learners.
Figure 6. Frame structure of the DBN.
Figure 7. Frame structure of the BLS.
Figure 8. Dataset for March.
Figure 9. The corresponding error of window size.
Figure 10. Corresponding error of CV folding number.
Figure 11. Stacked evaluation indicators of each model.
Figure 12. MAPEs of the hybrid models.
Figure 13. (a) The prediction curve of the comparison with separate neural network models; (b) the prediction curve of the comparison with other ensemble models; (c) the prediction curve of the comparison with other hybrid models.
Table 1. Comparison of method parameter settings.

Model           Optimal Parameters
BPNN            n_h = 200, m_i = 10, a_f = Sigmoid
DBN             n_h = 10, eta = 0.001, a_f = Sigmoid, r_b = 1, v_m = 0.01, m_i = 20
RBFNN           f_RBF = Gaussian, s_RBF = 50
RVFL            n_e = 10000, a_f = Sigmoid, D_L = true, r_m = Gaussian
EMD-BLS         n_f = 24, n_e = 15
SWT-LSTM        n_h = 200, eta = 0.01
DWT-EMD-RVFL    n_e = 10000, a_f = Sigmoid, D_L = true, r_m = Gaussian
EMD-EDBN        n_h = [100, 100], eta = 0.001, a_f = Sigmoid, r_b = 2, v_m = 0.01, m_i = 500

n_h: the number of hidden nodes; m_i: the maximum number of iterations; a_f: activation function; eta: learning rate; r_b: the random batch size of each time; v_m: momentum value; f_RBF: radial basis function type; s_RBF: the spread of the radial basis function; n_e: the number of enhancement nodes; D_L: whether there is a direct link between the input layer and the output layer; r_m: randomization method; n_f: the number of feature nodes.
Table 2. The error of comparison with separate neural network models.

Area      Model      Spring          Summer          Autumn          Winter
                     RMSE    MAPE    RMSE    MAPE    RMSE    MAPE    RMSE    MAPE
CAPITL    OURS       21.97   1.47    27.73   1.24    24.72   1.54    25.70   1.47
          RBFNN      60.05   3.93    55.50   2.55    70.27   4.80    38.59   2.34
          BPNN       39.75   2.52    35.70   1.71    37.66   2.59    35.04   2.09
          RVFL       31.13   2.00    38.74   1.77    33.95   2.17    31.29   1.82
          DBN        31.74   2.09    39.39   1.82    33.99   2.17    29.68   1.73
          BLS–BP     37.71   2.38    35.57   1.61    41.29   2.55    26.77   1.48
CENTRL    OURS       41.60   1.95    41.04   1.61    36.20   1.75    38.02   1.74
          RBFNN      79.29   3.69    51.33   1.93    61.56   3.06    66.51   3.03
          BPNN       63.32   2.99    60.29   2.42    46.15   2.28    58.08   2.52
          RVFL       60.14   2.96    44.13   1.74    38.99   1.80    41.17   1.83
          DBN        46.39   2.23    56.13   2.26    37.88   1.78    40.00   1.74
          BLS–BP     50.04   2.37    46.96   1.97    46.45   2.10    42.54   1.88
DUNWOD    OURS       14.20   1.86    22.82   1.82    17.87   2.28    16.42   2.09
          RBFNN      28.05   3.68    34.47   2.92    28.59   3.82    25.46   3.28
          BPNN       25.80   3.40    24.32   2.13    32.09   4.52    18.22   2.43
          RVFL       21.17   2.89    27.93   2.08    19.21   2.38    18.09   2.32
          DBN        16.86   2.17    32.22   2.76    18.61   2.48    17.12   2.16
          BLS–BP     18.29   2.37    24.20   1.89    18.59   2.38    16.29   2.10
GENESE    OURS       18.48   1.44    18.88   1.16    15.55   1.16    16.59   1.18
          RBFNN      29.90   2.38    22.10   1.34    23.71   1.84    24.29   1.78
          BPNN       28.51   2.38    38.84   2.61    16.30   1.19    20.93   1.57
          RVFL       32.36   2.67    28.89   1.69    21.28   1.62    23.06   1.66
          DBN        23.23   1.91    35.34   2.04    21.95   1.71    24.21   1.88
          BLS–BP     29.20   2.23    26.79   1.60    21.69   1.68    24.15   1.69
HUDVL     OURS       24.05   1.99    28.19   2.11    47.49   4.08    28.08   2.07
          RBFNN      48.70   4.12    36.44   1.92    64.84   5.31    41.37   3.10
          BPNN       36.69   3.08    47.62   2.78    50.98   4.17    31.40   2.25
          RVFL       37.67   3.11    39.81   2.10    59.71   5.19    31.51   2.40
          DBN        24.69   1.99    44.47   2.43    56.60   4.83    31.25   2.41
          BLS–BP     32.54   2.73    34.97   1.91    51.98   4.27    32.47   2.46
Table 3. The error of comparison with other ensemble models.

Area      Model      Spring          Summer          Autumn          Winter
                     RMSE    MAPE    RMSE    MAPE    RMSE    MAPE    RMSE    MAPE
CAPITL    OURS       21.97   1.47    27.73   1.24    24.72   1.54    25.70   1.47
          S-DBN      34.52   2.39    52.53   2.40    34.35   2.26    27.23   1.57
          S-BLS      30.33   1.93    35.28   1.57    31.60   2.01    26.96   1.61
CENTRL    OURS       41.60   1.95    41.04   1.61    36.20   1.75    38.02   1.74
          S-DBN      55.57   2.71    49.04   1.95    36.67   1.79    40.88   1.80
          S-BLS      51.69   2.40    54.28   2.09    58.70   2.77    40.51   1.78
DUNWOD    OURS       14.20   1.86    22.82   1.82    17.87   2.28    16.42   2.09
          S-DBN      15.43   2.06    27.70   2.26    18.00   2.36    16.68   2.14
          S-BLS      15.37   2.05    24.58   1.92    18.98   2.47    20.02   2.52
GENESE    OURS       18.48   1.44    18.88   1.16    15.55   1.16    16.59   1.18
          S-DBN      19.79   1.58    28.20   1.50    17.96   1.34    16.65   1.22
          S-BLS      23.86   1.91    20.77   1.20    17.16   1.26    18.48   1.40
HUDVL     OURS       24.05   1.99    28.19   2.11    47.49   4.08    28.08   2.07
          S-DBN      29.77   2.51    37.57   2.01    60.14   5.39    29.12   2.18
          S-BLS      24.59   2.02    30.12   1.63    48.67   4.08    29.47   2.24
Table 4. The error of comparison with other hybrid models.

Area      Model            Spring          Summer          Autumn          Winter
                           RMSE    MAPE    RMSE    MAPE    RMSE    MAPE    RMSE    MAPE
CAPITL    OURS             21.97   1.47    27.73   1.24    24.72   1.54    25.70   1.47
          DWT-EMD-RVFL     29.38   1.95    43.22   1.93    27.12   1.75    29.54   1.66
          SWT-LSTM         29.56   1.90    42.37   1.97    25.71   1.60    43.56   2.70
          EMD-BLS          23.28   1.51    30.83   1.43    39.42   3.05    30.93   1.61
          EMD-EDBN         36.00   2.26    74.20   3.64    47.37   3.20    30.93   1.61
CENTRL    OURS             41.60   1.95    41.04   1.61    36.20   1.75    38.02   1.74
          DWT-EMD-RVFL     43.72   2.08    48.46   1.93    39.66   1.86    39.75   1.77
          SWT-LSTM         35.52   1.65    44.14   1.85    37.58   1.79    38.83   1.73
          EMD-BLS          52.07   2.55    47.50   1.92    39.86   1.83    45.50   2.00
          EMD-EDBN         42.34   1.98    88.28   3.64    50.78   2.51    79.92   3.76
DUNWOD    OURS             14.20   1.86    22.82   1.82    17.87   2.28    16.42   2.09
          DWT-EMD-RVFL     16.14   2.20    28.01   2.28    17.94   2.38    17.49   2.16
          SWT-LSTM         15.64   2.06    26.41   1.98    18.70   2.17    18.96   2.45
          EMD-BLS          19.76   2.69    34.56   3.45    30.28   4.71    23.21   3.25
          EMD-EDBN         29.76   4.00    95.57   7.39    45.02   5.93    30.02   4.37
GENESE    OURS             18.48   1.44    18.88   1.16    15.55   1.16    16.59   1.18
          DWT-EMD-RVFL     22.53   1.77    28.59   1.65    19.49   1.42    20.94   1.43
          SWT-LSTM         23.75   1.93    27.46   1.70    23.73   1.87    23.01   1.75
          EMD-BLS          26.34   2.08    26.23   1.42    42.30   3.04    30.54   2.22
          EMD-EDBN         46.70   3.89    153.69  9.01    39.91   3.02    77.17   6.00
HUDVL     OURS             24.05   1.99    28.19   2.11    47.49   4.08    28.08   2.07
          DWT-EMD-RVFL     28.93   2.35    40.06   2.09    48.32   4.14    32.67   2.44
          SWT-LSTM         34.30   2.73    30.61   1.70    59.02   5.21    28.20   2.04
          EMD-BLS          34.64   2.93    40.73   2.41    47.95   4.29    33.41   2.46
          EMD-EDBN         64.31   5.85    82.02   4.61    97.17   8.74    50.54   3.60
Table 5. The computation times required for each case (HUDVL, spring load data).

Model            Time (s)
OURS             83.928
RBFNN            13.214
BPNN             15.893
RVFL             12.039
DBN              12.901
BLS–BP           11.574
S-DBN            81.259
S-BLS            66.374
DWT-EMD-RVFL     13.333
SWT-LSTM         107.791
EMD-BLS          12.618
EMD-EDBN         28.211
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
