Article

Two-Stage Short-Term Power Load Forecasting Based on RFECV Feature Selection Algorithm and a TCN–ECA–LSTM Neural Network

Hui Liang, Jiahui Wu, Hua Zhang and Jian Yang
1 School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
2 Engineering Research Center of Renewable Energy Power Generation and Grid Connection Control, Ministry of Education, Urumqi 830017, China
3 CGN New Energy Investment (Shenzhen) Co., Ltd., Xinjiang Branch, Urumqi 830011, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(4), 1925; https://doi.org/10.3390/en16041925
Submission received: 31 December 2022 / Revised: 7 February 2023 / Accepted: 13 February 2023 / Published: 15 February 2023

Abstract
To solve the problems of feature selection and error correction after mode decomposition and to improve the ability of power load forecasting models to capture complex time series information, a two-stage short-term power load forecasting method based on the recursive feature elimination with cross-validation (RFECV) algorithm and a temporal convolutional network–efficient channel attention–long short-term memory network (TCN–ECA–LSTM) is presented. First, the load sequence is decomposed into a relatively stable set of modal components using variational mode decomposition. Then, the RFECV-based method filters the feature set of each modal component to construct the best feature set. Finally, a two-stage prediction model based on TCN–ECA–LSTM is established: the first stage predicts each modal component, and the second stage reconstructs the load forecast based on the predicted values of the previous stage. Taking actual data from New South Wales, Australia, as an example, the results show that the proposed method can build the feature set reliably and efficiently and has higher accuracy than conventional prediction models.

1. Introduction

Electricity load forecasting is an important research area for power systems. The power load is mainly influenced by the operating characteristics of the power system, national policies, natural conditions, and people’s living habits. Therefore, load forecasting needs to consider the relationship between the load and its various influencing factors and forecast future load changes based on historical load change patterns. Load forecasting methods can be divided into three categories according to their prediction horizons and application ranges: long-term, medium-term, and short-term forecasting. Among these, long-term forecasting of electrical loads predicts load changes over the next few years. It is mainly used for development planning, expansion, and alteration of the power system, and provides the necessary basis for determining annual maintenance plans and system operation modes [1]. Medium-term power load forecasting predicts load changes over the next few months or even up to a year. It is mainly applied to determine monthly maintenance plans and operation modes, and to develop reservoir scheduling plans for hydropower plants and coal scheduling plans for thermal power plants. Short-term power load forecasting is the most commonly used, with a forecast horizon of one day or one week. It mainly helps maintain the balance between the supply and demand of electricity; provides reliable data for power plants and for market clearing prices set by electricity marketing companies; and provides the necessary data support for the reconfiguration of the distribution network, the maintenance of equipment, and the economic distribution of load [2,3].
In recent years, with the continuous development of new energy sources and the large-scale grid connection of distributed energy sources, the electric load has shown higher uncertainty and greater volatility than in the past, which has increased the difficulty of short-term forecasting [4,5]. The value of short-term load forecasting depends mainly on its accuracy, so improving forecasting accuracy is the focus of current research on short-term load forecasting theory and methods.
In data processing, to reduce the impact of the non-linearity and instability of load series on the prediction model, scholars usually use a signal decomposition algorithm to denoise the load series and then model and predict the decomposed subsequences. Common signal decomposition algorithms include wavelet decomposition, empirical mode decomposition, and variational mode decomposition. Among them, the performance of wavelet decomposition depends on the selection of the mother wavelet and the number of decomposition layers, so it is not adaptive [6]. Empirical mode decomposition is sensitive to noise and is prone to mode overlap [7]. Variational mode decomposition (VMD) is robust to noise and can effectively alleviate mode overlap [8,9]. Work in the literature [6] used the VMD algorithm to process electricity load sequences and constructed the input feature sets of each sub-series separately based on mutual information coefficients, proving that the VMD processing method can improve the prediction to some extent. However, converting the task of load prediction into multiple sub-series predictions has the following problems. On the one hand, continuing manual feature selection and constructing input feature sets based on metrics such as Pearson coefficients [10] and mutual information coefficients [11] multiplies the feature engineering effort; on the other hand, each sub-series prediction model carries an inherent error due to the loss introduced by the signal decomposition algorithm, so directly reconstructing the load prediction from the sub-series predictions accumulates these errors.
In terms of prediction models, according to the basic modelling methods, they can be broadly classified into traditional forecasting models based on mathematical statistical methods, intelligent forecasting models based on artificial intelligence techniques, and others. Traditional forecasting methods commonly include the time series method, autoregressive analysis, etc. They mainly analyse and process historical load data and construct mathematical models based on the law of load variation so as to derive the forecast load value [12,13]. The advantages of traditional prediction models are their small computational burden, high computational efficiency, and strong interpretability of prediction results. However, in the face of today’s large and complex non-linear data, their processing power is weak and their performance is often mediocre. Work in the literature [14] proposed an improved two-season exponential smoothing model with better prediction results than the periodic ARIMA model based on principal component analysis. However, because it does not take into account the influence of external features such as electricity prices and weather, the model is more limited, has a weaker learning ability, and tends to predict poorly on load series with high randomness and fluctuation.
With the development of computer technology, the intelligent forecasting models represented by support vector machine (SVM), convolutional neural networks (CNN) [15,16], and long short-term memory (LSTM) [17,18] have been widely used for power load forecasting. Work in the literature [19] used recurrent neural networks (RNN) for load forecasting and used an ant colony algorithm to optimise the parameters of the network and improve the accuracy of the forecasts. The paper [20] proposed a load forecasting model based on LSTM neural networks considering real-time electricity prices, which solved the shortcoming of RNNs in not being able to remember key information for a long time and provided better forecasting performance compared to BP neural networks and SVM.
In order to avoid the inherent defects of a single model, the combination of multiple heterogeneous models can effectively learn from each other and improve the generalization ability and prediction accuracy of the model. Therefore, the research on prediction models has gradually developed from a single model to a combination of multiple models. Work in the literature [21] used a combined CNN–LSTM model for load prediction and realized input feature fusion and information extraction through CNN, which has a higher forecasting accuracy than the single LSTM model. The paper [22] pioneered the temporal convolutional network (TCN) with a unique dilation causal convolutional structure that can extract long intervals of discontinuous temporal feature information and outperform CNNs in temporal prediction problems. The paper [23] improved the CNN–LSTM model by embedding the efficient channel attention (ECA) mechanism. With the addition of a small number of hyperparameters, ECA can effectively improve the model’s ability to capture channel information and enable the combined model to have higher prediction accuracy. Although the complex combination model can effectively improve the prediction effect, the inherent error in the prediction has not been effectively mined and utilized.
In summary, for the problems of feature selection and error correction after load sequence decomposition, this paper proposes a feature selection algorithm based on recursive feature elimination with cross-validation (RFECV) and a two-stage short-term electricity load model based on a TCN–ECA–LSTM neural network (TEL). First, the load sequence is decomposed into a set of relatively smooth modal components using VMD. Then, the best feature set is constructed by filtering the feature set of each modal component with RFECV. Finally, a two-stage prediction model based on TEL, which improves TCN–LSTM with ECA, is developed: the first stage predicts each modal component, and the second stage reconstructs the load forecast based on the predicted values and the historical errors from the previous stage. Simulations with real data from New South Wales, Australia, demonstrate that the proposed method can reliably and efficiently construct feature sets and achieves higher accuracy than conventional prediction models.
The contributions of this paper can be summarised as follows:
(1)
For the feature selection problem after decomposition of load sequences, this paper proposes a solution based on VMD and RFECV, aiming to effectively construct the optimal set of input features for each load subsequence.
(2)
In terms of prediction models, this paper proposes an improved TCN–ECA–LSTM combined model (TEL) based on ECA and compares it with common prediction models such as CNN and LSTM, proving that the proposed TEL combined model in this paper has a higher prediction accuracy.
(3)
In order to fully exploit the useful information in the prediction errors, a two-stage prediction method based on the combined TEL model is proposed in this paper. Compared with the conventional prediction process, a second stage of prediction is added to correct and reconstruct the prediction results of the previous stage based on the historical prediction errors, and the experiments prove that the prediction method proposed in this paper can effectively improve the prediction accuracy.

2. Data Processing Methods

2.1. VMD

Electricity demand is influenced by climatic, geographical, social, political, daily-life, and market factors. As a result, electricity load series exhibit high uncertainty and great volatility and contain a large number of superimposed signal components. To reduce the pressure of noise on the prediction model, this paper uses the VMD method to pre-process the electricity load series.
VMD is a non-recursive signal decomposition method that adaptively decomposes a non-smooth complex time-series signal into a set of simple modal components of finite bandwidth, transforming the task of predicting electrical load sequences into the prediction of the modal components [24]. The VMD decomposition of the load sequence proceeds as follows:
(1)
The decomposition of the load sequence $f(t)$ is transformed into a variational problem with constraints, whose objective function is shown in Equation (1):
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}, \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $K$ is the number of pre-determined decomposed modes, $\delta(t)$ is the Dirac distribution function, $\{u_k\}$ represents the set of modal components, $\{\omega_k\}$ represents the set of central frequencies, and $f(t)$ represents the original load sequence.
(2)
The above variational problem with constraints is transformed into an unconstrained problem by means of a Lagrange multiplier, $\lambda(t)$, and a penalty factor, $\alpha$, where $\alpha$ affects the accuracy of the signal sequence reconstruction, as shown in Equation (2):
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle \tag{2}$$
(3)
The alternating direction multiplier method is used to solve the unconstrained variational problem to achieve an effective separation of the load sequence frequencies. One of the iterative update formulas for the modal components and central frequencies is shown in Equation (3).
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}, \qquad \omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 \mathrm{d}\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 \mathrm{d}\omega} \tag{3}$$
where $\hat{u}_k^{n+1}(\omega)$ is the $k$th modal component at the $(n+1)$th iteration, $\omega_k^{n+1}$ is the centre frequency of the $k$th modal component at the $(n+1)$th iteration, and $\hat{f}(\omega)$ is the Fourier transform of the original load sequence.

2.2. RFECV

The K modal components obtained by VMD decomposition have different amplitudes and central frequencies and are highly quasi-orthogonal to each other. Therefore, it is necessary to consider the interaction between these K modal components and each characteristic variable and construct appropriate feature sets for each modal component separately to improve the prediction accuracy. In the field of machine learning, support vector machines are often used as feature filters for efficient feature selection based on the RFECV method in order to improve the efficiency of feature selection [25,26,27]. The RFECV method consists of two parts: recursive feature elimination and cross-validation.
First, in recursive feature elimination, the filter learns from each feature of the complete feature set S, calculates the importance of each feature, removes the feature with the lowest weight, and updates the feature set. The filter then performs a new round of training and importance calculation on the updated feature set, recursing until all features have been ranked by importance.
Then, according to the feature importance ratings from recursive feature elimination, feature subsets of different sizes are constructed in turn from the complete feature set S; the filter cross-validates each feature subset separately to obtain its average score; the best feature combination is determined by the feature subset with the highest average score; and the importance ranking of each feature and the optimal number of features are output to complete the feature selection.
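As an illustration, the RFECV procedure described above can be run with scikit-learn, as in the minimal sketch below. The SVR estimator, the time-series splitter, and the scoring choice are assumptions of the sketch; X and y stand for one modal component’s candidate feature matrix and target sequence.

```python
# Minimal RFECV sketch with scikit-learn; estimator and CV choices are illustrative.
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

# X: (n_samples, n_features) candidate features for one modal component
# y: (n_samples,) target values of that modal component (dummy data here)
rng = np.random.default_rng(0)
X, y = rng.random((500, 13)), rng.random(500)

selector = RFECV(
    estimator=SVR(kernel="linear"),    # linear kernel exposes feature weights
    step=1,                            # eliminate one feature per round
    cv=TimeSeriesSplit(n_splits=5),    # time-ordered splits suit load data
    scoring="neg_mean_absolute_error",
)
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("feature ranking (1 = selected):", selector.ranking_)
X_best = selector.transform(X)         # feature matrix reduced to the best subset
```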

3. Load Forecasting Modelling Methods

3.1. TCN

TCN combines the advantages of dilated convolution and causal convolution in a unique dilated causal convolution structure, supplemented by residual connections to avoid gradient vanishing, and is a special convolutional network dedicated to time series prediction.
The key parameters of the TCN’s dilated causal convolution are the convolution kernel size, $k_{\mathrm{TCN}}$, and the dilation factor, $d$, which control the structure of the network through the number and spacing, respectively, of the input elements taken from the layer below. Figure 1 shows the structure of the TCN’s dilated causal convolution with three network layers, a convolutional kernel size $k_{\mathrm{TCN}}$ of 3, and dilation factors $d$ of 1, 2, and 4. The dilation factor $d$ increases exponentially with the number of network layers by a factor of two, allowing the convolutional kernel to expand the range of extracted information and obtain a large receptive field with a relatively small number of layers. For an input one-dimensional vector $x$, the operation of the dilated causal convolution is shown in the following equation:
$$F(x) = \sum_{i=0}^{k_{\mathrm{TCN}}-1} f(i) \cdot x_{s - d \cdot i}$$
where $k_{\mathrm{TCN}}$ is the size of the convolution kernel, $d$ is the dilation factor, and $x_{s - d \cdot i}$ denotes the historical data of input $x$.
The temporal convolutional network consists of several stacked TCN residual blocks, as shown in Figure 2. A TCN residual block contains two one-dimensional dilated causal convolutional layers; each is followed by a weight normalisation layer to mitigate the gradient explosion problem, a rectified linear unit (ReLU) to increase the non-linearity of the network, and a dropout layer that randomly deactivates a certain number of neurons to avoid overfitting.
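A minimal Keras sketch of one such residual block is given below, assuming TensorFlow 2.x. LayerNormalization stands in for weight normalisation, which core Keras does not provide; the filter count, kernel size, and dropout rate are illustrative, not the paper’s exact settings.

```python
# Sketch of a TCN residual block: two dilated causal convolutions with
# normalisation, ReLU, dropout, and a skip connection.
from tensorflow.keras import layers

def tcn_residual_block(x, filters=64, kernel_size=2, dilation_rate=1, dropout=0.2):
    shortcut = x
    for _ in range(2):
        x = layers.Conv1D(filters, kernel_size,
                          padding="causal",            # no leakage from future steps
                          dilation_rate=dilation_rate)(x)
        x = layers.LayerNormalization()(x)             # stand-in for weight norm
        x = layers.ReLU()(x)                           # non-linearity
        x = layers.Dropout(dropout)(x)                 # regularisation
    if shortcut.shape[-1] != filters:                  # match channels for the add
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    return layers.Add()([shortcut, x])
```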

3.2. LSTM

LSTM is a variant of recurrent neural networks that selects information for storage or forgetting by controlling forgetting gates, input gates, and output gates in the hidden layer units, alleviating the problems of long-term dependence, gradient explosion, and gradient disappearance of traditional recurrent neural networks.
The state update of each gate structure of the LSTM memory cell is shown in the following equation:
$$I_t = \sigma\left( X_t W_{xi} + H_{t-1} W_{hi} + b_i \right)$$
$$F_t = \sigma\left( X_t W_{xf} + H_{t-1} W_{hf} + b_f \right)$$
$$O_t = \sigma\left( X_t W_{xo} + H_{t-1} W_{ho} + b_o \right)$$
$$\tilde{C}_t = \tanh\left( X_t W_{xc} + H_{t-1} W_{hc} + b_c \right)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
$$H_t = O_t \odot \tanh\left( C_t \right)$$
where $I_t$, $F_t$, and $O_t$ are the outputs of the input, forgetting, and output gates, respectively, at moment $t$; $\tilde{C}_t$ and $C_t$ are the outputs of the candidate and memory cells, respectively, at moment $t$; $W_{xi}$, $W_{xf}$, $W_{xo}$, $W_{xc}$, $W_{hi}$, $W_{hf}$, $W_{ho}$, and $W_{hc}$ are the weight matrices obtained by learning; $b_i$, $b_f$, $b_o$, and $b_c$ are the biases obtained by learning; $X_t$ is the input at moment $t$; $H_t$ is the hidden state at moment $t$; $\odot$ is the Hadamard product; and $\sigma$ is the Sigmoid function. The structure of the LSTM memory cell is shown in Figure 3.

3.3. ECA

The attention mechanism in deep learning emulates human visual attention; it is essentially a resource allocation mechanism and has been widely combined with neural networks in various fields in recent years. Among channel attention designs, the Squeeze-and-Excitation Network (SENet) proposed an available channel attention mechanism for the first time and proved that channel attention can bring significant performance improvements to deep convolutional networks [28]. However, in order to control model complexity, SENet performs channel compression on the input feature maps, and this compression and dimensionality reduction are not conducive to the model learning the dependencies between channels.
ECA mainly improves SENet’s attention mechanism by adaptively selecting the size of the one-dimensional convolutional kernel to achieve a local cross-channel interaction strategy without downscaling, bringing significant performance gains with little added complexity to the model.
Figure 4 shows the working schematic of ECA. First, the ECA module performs a global average pooling (GAP) operation on the input raw feature sequence to obtain all features without dimensionality reduction; then, according to the size of the channel number C, it adaptively calculates the local cross-channel interaction coverage, $k_{\mathrm{ECA}}$. The mapping relationship between $C$ and $k_{\mathrm{ECA}}$ is shown in the following equation:
$$k_{\mathrm{ECA}} = \phi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$
where $C$ denotes the current total number of channels, $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number, and $\gamma$ and $b$ take the fixed values of 2 and 1 according to the literature [29]. After determining the value of $k_{\mathrm{ECA}}$, a one-dimensional convolution of size $k_{\mathrm{ECA}}$ is generated to capture the local cross-channel interaction information, and the learned attention weight parameters are shared between channels, with the weight of the $i$th channel expressed as follows:
$$\omega_i = \sigma\left( \sum_{j=1}^{k_{\mathrm{ECA}}} \alpha^j y_i^j \right), \quad y_i^j \in \Omega_i^{k_{\mathrm{ECA}}}$$
where $\alpha^j$ denotes the weight parameter shared by each channel, $\sigma$ is the Sigmoid function, and $\Omega_i^{k_{\mathrm{ECA}}}$ is the set of $k_{\mathrm{ECA}}$ channels adjacent to $y_i$. Finally, the original input features are combined with the channel weights, $\omega$, to obtain high-dimensional feature information carrying the channel attention.
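A minimal Keras sketch of the ECA module follows, assuming TensorFlow 2.x and an input of shape (batch, steps, channels); the adaptive kernel-size rule implements the equation above with γ = 2 and b = 1.

```python
# Sketch of the ECA module: GAP, adaptive-size 1-D convolution across channels,
# sigmoid gating, and channel-wise reweighting without dimensionality reduction.
import math
from tensorflow.keras import layers

def eca_block(x, gamma=2, b=1):
    channels = x.shape[-1]
    t = int(abs((math.log2(channels) + b) / gamma))
    k = t if t % 2 else t + 1                      # nearest odd kernel size

    attn = layers.GlobalAveragePooling1D()(x)      # (batch, channels)
    attn = layers.Reshape((channels, 1))(attn)     # channels treated as a sequence
    attn = layers.Conv1D(1, k, padding="same", use_bias=False)(attn)
    attn = layers.Activation("sigmoid")(attn)      # per-channel weights in (0, 1)
    attn = layers.Reshape((1, channels))(attn)
    return layers.Multiply()([x, attn])            # reweight the input channels
```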

3.4. A Two-Stage Forecasting Model Based on TCN–ECA–LSTM

The TCN is capable of parallel computation and can flexibly adjust its receptive field size in various ways to control memory consumption in deep networks, and the unique design of dilated causal convolution allows it to perform well in temporal feature extraction problems. Therefore, this paper uses the TCN for deep mining of input features and combines it with LSTM for prediction to improve the generalization ability of the model and avoid the inherent limitations of a single model. ECA is embedded between the TCN and the LSTM to relieve the pressure on the LSTM of processing the high-dimensional features output by the TCN and to improve the overall performance of the prediction model. In the actual forecasting process, there is always an error in the load forecasting results due to the inherent biases of the load decomposition algorithm and the forecasting model. However, the errors often contain useful information that has not been fully exploited. Therefore, this paper proposes a two-stage load forecasting method that corrects the forecasting results using historical forecasting errors and effectively improves forecasting performance. The structure of the two-stage forecasting model is shown in Figure 5.
After the load sequence is decomposed by VMD, a set of modal components is obtained; each modal component is given its own best feature set by RFECV, and a first-stage TEL forecasting model is constructed for each modal component and its corresponding feature set. First, the TCN mines the input feature set for high-dimensional abstract features to extract higher-order temporal information. Then, ECA performs local cross-channel information capture on the output of the TCN, fuses the extracted attention weights with the output features of the TCN, and passes them to the LSTM. The LSTM analyses and predicts the abstract temporal features of the input and outputs the predicted values of the corresponding modal components. Finally, the predicted values of each modal component are fused and corrected by the second-stage TEL model to obtain the final load prediction.
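For concreteness, the sketch below assembles a first-stage TEL predictor from the tcn_residual_block and eca_block helpers sketched in Sections 3.1 and 3.3, assuming TensorFlow/Keras. The window width and dilation factors follow the experimental settings reported in Section 4.4; everything else is an illustrative assumption, not the authors’ exact implementation.

```python
# Sketch of a first-stage TEL (TCN-ECA-LSTM) model for one modal component.
from tensorflow.keras import layers, models

def build_tel_model(time_steps=64, n_features=8):
    inputs = layers.Input(shape=(time_steps, n_features))
    x = inputs
    for d in (1, 2, 4, 8, 16, 32, 64):     # stacked dilations widen the receptive field
        x = tcn_residual_block(x, filters=64, kernel_size=2, dilation_rate=d)
    x = eca_block(x)                        # channel attention on the TCN output
    x = layers.LSTM(32)(x)                  # temporal prediction head
    outputs = layers.Dense(1)(x)            # one-step-ahead component value
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mae")
    return model

# Stage 1 trains one such model per modal component; stage 2 trains a second TEL
# on the stage-1 predictions and historical errors to reconstruct the load value.
```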

4. Comparative Validation and Analysis of Results

4.1. Data Sources and Evaluation Indicators

To validate the feasibility of the model proposed in this paper, a publicly available dataset from New South Wales, Australia, was selected for validation of the algorithm. As shown in Table 1, the feature vector of the dataset has eight dimensions, comprising the week, time of day, dew point temperature, dry bulb temperature, wet bulb temperature, humidity, electricity load, and electricity price; the data were collected at a sampling interval of 30 min, giving 48 points per day. To verify the performance of the prediction model on training sets of different sizes, experiments were conducted for two different cases.
Case 1 uses a 21-day dataset to analyse the performance of the models on a small data sample and to validate the predictive performance of each model. In Case 1, we used the data from 6 May to 26 May 2008 as training samples for supervised learning and training of the model, and the data from 27 May 2008 as the test sample to validate the load predictions for that day.
Considering the existence of four different seasons in a year, there are differences in people’s electricity consumption behaviour under the influence of different seasons, which may have an impact on the prediction results. Therefore, to further validate the generalisation performance of the model in this paper, we chose to use a larger dataset in Case 2. In Case 2, we took the data from 1 January 2009 to 31 December 2009 as the training set for training each prediction model. At the same time, we used the data from 2010 as the validation set and randomly selected 1 week from each of February, May, August, and November for prediction validation.
In both Case 1 and Case 2, the feature types are the same, although the datasets differ in size. The main difference between the two cases is the amount of data used for training: the model in Case 1 uses 20 days of data for training, and the model in Case 2 uses 1 year of data.
The input feature set and output labels were constructed for the entire dataset by the sliding window method. The width of the time window was set to 64, which means that the historical data at moments t − 1 to t − 64 are used as input features, and correspondingly the load value at moment t is used as the label for the predicted output. The input features and output labels are in one-to-one correspondence. In this way, the input feature set and the corresponding output label set can be constructed using the original dataset and the decomposition results of the VMD. Subsequently, the input feature set and the output label set are segmented according to the range of the training set and the result of feature selection, and then the input feature set and output label set of the training set are obtained for supervised learning and training of the prediction model. In the same way, the validation set’s input feature set and label set can also be obtained. Inputting the validation feature set to the well-trained model will give the load prediction values. By comparing the predicted values with the true labels of the validation set, various error metrics of the prediction can be obtained.
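As a concrete illustration, a minimal sliding-window constructor consistent with this description might look as follows; the array layout and the load column index are assumptions of the sketch.

```python
# Sketch of sliding-window construction: inputs are the rows at t-64 ... t-1,
# the label is the load value at t.
import numpy as np

def make_windows(data, load_col=0, width=64):
    X, y = [], []
    for t in range(width, len(data)):
        X.append(data[t - width:t])      # (width, n_features) history window
        y.append(data[t, load_col])      # load value to predict at time t
    return np.array(X), np.array(y)

data = np.random.rand(1000, 8)           # dummy stand-in for the merged dataset
X_all, y_all = make_windows(data)
print(X_all.shape, y_all.shape)          # (936, 64, 8) (936,)
```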
The mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R2) were chosen as the evaluation metrics for the model. Among them, the RMSE can accurately reflect the dispersion of errors and R2 indicates the linear correlation between the predicted and true values, which tends to 1 when the predicted and true values of the sample converge. The expressions for each error indicator are shown in the following equations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
where $y_i$ is the true value of the load, $\hat{y}_i$ is the predicted value of the load, and $n$ is the number of prediction points.
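The sketch below computes the four metrics directly from the equations above, assuming NumPy arrays of true and predicted loads.

```python
# Sketch of the MAE, MAPE, RMSE, and R2 evaluation metrics.
import numpy as np

def evaluate(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                          # MW
    mape = np.mean(np.abs(err / y_true)) * 100          # percent
    rmse = np.sqrt(np.mean(err ** 2))                   # MW
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mape, rmse, r2
```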

4.2. VMD Decomposition of Load Sequences

In order to obtain good prediction results, a VMD decomposition of the load series is performed to reduce the effects of volatility and non-linearity in the load series. The quality of the VMD decomposition depends on two key parameters, namely the number of decompositions, $K$, and the penalty factor, $\alpha$. The first step is to determine the value of $K$: too large a value results in modal aliasing, while too small a value results in under-decomposition of the signal. In general, $K$ ranges from three to eight, and $\alpha$ ranges from $10^2$ to $10^5$. In this paper, the centre frequency observation method is used to determine the optimal $K$ value by observing the centre frequency of each mode at different $K$ values. Once the $K$ value is determined, $\alpha$ is adjusted appropriately based on experience so that the loss in the signal decomposition is minimised.
According to the needs of the predictive model in this paper, we decompose all the load data for Case 1 and Case 2, respectively, using the VMD algorithm. In Case 1, the load data are decomposed for a total of 21 days from 6 May to 27 May 2008. In Case 2, the load data are decomposed for a total of 2 years from 2009 to 2010. The decomposition results of the VMD will be merged with the original dataset, the feature set and label set will be constructed using the sliding window method, and the segmentation will be performed according to the range of the training and validation sets.
As shown in Table 2, when the number of decompositions K is 6, the centre frequencies $\omega_5$ and $\omega_6$ are close to each other (the ratio of the two values is less than 1.5), so K is taken as 5. $\alpha$ is empirically taken as 1937; the MAE of the decomposition is 88.9629 MW, the MAPE is 0.9457%, and the RMSE is 116.0451 MW. The original load sequence and the results of the VMD decomposition are shown in Figure 6 and Figure 7.
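As an illustration, the decomposition step could be carried out with the third-party vmdpy package, as sketched below. The file name is hypothetical, and tau, DC, init, and tol are common defaults rather than values reported in the paper; K and alpha take the Case 1 values above.

```python
# Sketch of VMD decomposition of the load series with the vmdpy package.
import numpy as np
from vmdpy import VMD

load = np.loadtxt("load_nsw.csv")        # hypothetical half-hourly load file

K, alpha = 5, 1937                       # Case 1 values chosen above
tau, DC, init, tol = 0.0, 0, 1, 1e-7     # noise tolerance, DC mode, init, stop tol

u, u_hat, omega = VMD(load, alpha, tau, K, DC, init, tol)
# u:     (K, len(load)) modal components IMF1 ... IMF5
# omega: centre frequencies of each mode per iteration
```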
The process of Case 2 is similar to that of Case 1, and we determine the decomposition number K of the VMD to be 6 using the centre frequency observation method. The decomposition results are shown in Figure 8.

4.3. Results of Feature Selection

In Case 1, the load sequence is decomposed by VMD to obtain the modal components IMF1, IMF2, IMF3, IMF4, and IMF5. The five modal components were included as features to construct the original feature set with thirteen feature dimensions; the feature vector comprises electrical load, type of week, time of day, dry bulb temperature, dew point temperature, wet bulb temperature, humidity, electricity price, IMF1, IMF2, IMF3, IMF4, and IMF5, corresponding to Features 1 to 13, respectively.
RFECV feature selection was performed for each modal component separately, and the corresponding feature importance rankings are shown in Table 3, where a ranking of 1 indicates the most important (selected) features.
The average scores of the different feature subsets under cross-validation in RFECV are shown in Figure 9. Combined with the analysis in Table 3, we can see that IMF1, IMF2, IMF3, IMF4, and IMF5 score highest when choosing their corresponding feature subsets, with 3, 7, 3, 1, and 1 features, respectively. The best feature set for IMF1 comprises Feature 2, Feature 6, and Feature 9; the best feature set for IMF2 comprises Feature 1, Feature 3, Feature 4, Feature 7, Feature 8, Feature 10, and Feature 11; the best feature set for IMF3 comprises Feature 1, Feature 10, and Feature 11; the best feature set for IMF4 is Feature 12; and the best feature set for IMF5 is Feature 13.
In Case 2, the feature selection process is similar to that of Case 1. In Case 2, based on the RFECV results, the input features for IMF1 are historical load, type of week, wet bulb temperature, electricity price, IMF1 historical value; the input features for IMF2 are historical load, time of day, dry bulb temperature, humidity, electricity price, and IMF2 historical value; the input features for IMF3 are historical load, time of day, wet bulb temperature, and IMF3 historical value; IMF4 has the input characteristics of historical load, time of day, and IMF4 historical value; IMF5 has the input characteristic of IMF5 historical value; and IMF6 has the input characteristics of IMF6 historical value.

4.4. Analysis of Predicted Results

All models were built in Python, using Tensorflow-gpu and Keras as the deep learning framework. The experimental platform has an Intel Core i5-12500H processor, an NVIDIA GeForce RTX 3060 6 GB GPU, 16 GB of RAM, and the Windows 11 operating system.
In each comparison model, the LSTM has one hidden layer with 32 neurons; the CNN has 64 convolutional kernels of size 2; the TCN has one residual block, in which the dilated causal convolutions have 64 kernels of size 2 and the dilation factors d are 1, 2, 4, 8, 16, 32, and 64; and the SVR uses an RBF kernel with a penalty factor C of 10 and a gamma of 0.5. The time step for all models was set to 64, the batch size to 32, the number of epochs to 80, and the initial learning rate to 0.001; the optimizer was Adam. Each model used Features 1 to 8 as the input feature set.

4.4.1. Case 1

The error metrics and prediction results for each forecasting model are shown in Table 4 and Figure 10. As can be seen from Table 4, the TCN prediction model performs better on time series prediction than CNN, with the MAE, MAPE, and RMSE decreasing by 13.1647 MW, 0.1121%, and 35.4986 MW, respectively, and R2 improving by 0.0094; its evaluation indicators are close to those of LSTM. The evaluation indicators of TCN–LSTM and CNN–LSTM are all better than those of single models such as CNN, TCN, and LSTM, proving that a proper combination of multiple models can improve the performance and generalization ability of a model. The MAE and RMSE of TCN–LSTM decreased by 11.8593 MW and 9.4122 MW, respectively, compared with CNN–LSTM, indicating that TCN is more suitable than CNN for mining the temporal feature information of sequences for the LSTM.
Combining Table 4 and Figure 11, it can be seen that the combined TEL model with the ECA channel attention mechanism reduces the MAE, MAPE, and RMSE by 2.7108 MW, 0.0332%, and 5.663 MW, respectively, compared to the combined TCN–LSTM model, and the coefficient of determination, R2, is closer to 1. The prediction results fit the true values better, indicating that the ECA attention mechanism can effectively improve the prediction model’s ability to capture channel information and the model achieves better prediction results.
In order to demonstrate the effectiveness of the RFECV feature screening method, the impact of RFECV feature screening on the TEL model was compared and analysed based on two scenarios: VMD decomposition of the load series and no decomposition. The evaluation metrics of the model are shown in Table 5.
As can be seen from Table 5, the RFECV–TEL model has better prediction results than the TEL model: the RFECV feature selection method, which rapidly screens input features based on feature importance, effectively constructs a feature set that benefits the prediction model, resulting in a 13.2214 MW reduction in MAE, a 0.1169% reduction in MAPE, and a 14.7682 MW reduction in RMSE for the TEL model, together with a 0.23% improvement in the R2 value. Similarly, RFECV feature selection for each modal component of the VMD reduced the MAE, MAPE, and RMSE of the TEL model by 11.17 MW, 0.1191%, and 23.469 MW, respectively, with a 0.51% improvement in the R2 value. The prediction results of the above models are shown in Figure 12.
Additionally, combining Table 5 and Figure 13, it can be found that when VMD is used, directly summing the predicted values of each modal component to reconstruct the load leads to an error accumulation problem, owing to the decomposition loss of VMD and the inherent limitations of the multiple TEL models, making the overall prediction worse. VMD–RFECV–TEL–FC can improve the prediction to some extent by using a fully connected (FC) layer to reconstruct the load prediction values. However, the VMD–RFECV–TEL–TEL model proposed in this paper, building on the modal component predictions of VMD–RFECV–TEL, fuses the modal components through a second-stage TEL model to reconstruct the load forecast values. Compared to the RFECV–TEL and VMD–RFECV–TEL–FC models, the MAE is reduced by 6.8247 MW and 20.531 MW, the MAPE by 0.127% and 0.2236%, and the RMSE by 7.6144 MW and 29.6973 MW, and the R2 value is improved by 0.11% and 0.48%, a significant improvement in forecast accuracy.
The residuals for each model are shown in Figure 14, and their distributions are shown in Figure 15. The residuals of the VMD–RFECV–TEL–TEL model in this paper are concentrated between −75 MW and 25 MW, with the median around −30 MW. Compared with the other forecasting models, the VMD–RFECV–TEL–TEL model has the smallest and most stable forecasting error, with outstanding performance in the distribution of residuals.

4.4.2. Case 2

As the data in Table 6 show, the TEL model in this paper predicts better than any other model. Taking the forecast for the 1st week of February as an example, TEL has the best indicators, with an MAE, MAPE, RMSE, and R2 of 151.6003 MW, 1.6605%, 200.6554 MW, and 0.9820, respectively. Compared to LSTM, the MAE, MAPE, and RMSE of TEL are lower by 108.2261 MW, 1.1578%, and 129.0791 MW, respectively. Compared to TCN, they are reduced by 35.9945 MW, 0.4329%, and 41.1878 MW, respectively. Compared to TCN–LSTM, they are reduced by 14.4806 MW, 0.1638%, and 12.5104 MW, respectively.
To further improve the accuracy of the forecasts, we chose to combine the TEL forecasting model with algorithms such as VMD and RFECV and use a two-stage forecasting scheme that allows the TEL forecasting model to fully exploit the unused valid information from the historical forecasting errors to improve the accuracy of the forecasts. The prediction errors for the various TEL improvement models are shown in Table 7.
As shown in Table 6 and Table 7, the forecasting results of the two-stage VMD–RFECV–TEL–TEL method proposed in this paper are better than those of any other model. Taking the forecast for the 1st week of February as an example, VMD–RFECV–TEL–TEL has the best indicators, with an MAE, MAPE, RMSE, and R2 of 51.4589 MW, 0.5719%, 67.6896 MW, and 0.9979, respectively. Compared to VMD–RFECV–TEL (stage 1), the stage-2 forecast of VMD–RFECV–TEL–TEL reduced the MAE, MAPE, and RMSE by 63.9659 MW, 0.7132%, and 76.6979 MW, respectively. Compared to TEL, the MAE, MAPE, and RMSE of VMD–RFECV–TEL–TEL were reduced by 100.1414 MW, 1.0886%, and 132.9658 MW, respectively. Compared to TCN–LSTM, they were reduced by 114.622 MW, 1.2524%, and 145.4762 MW, respectively.
From Table 6 and Table 7, it can be seen that the prediction models do not have the same accuracy over different seasonal time periods. This is because people’s electricity consumption behaviour differs between seasons, and these implicit features are not accurately identified and learned by the models. However, compared with the other models, the VMD–RFECV–TEL–TEL model in this paper achieved good prediction results in all time periods, and the fluctuations in its errors were relatively smooth, indicating good generalisation ability.
The forecast results for each day from February 1 to February 7 in 2010 are shown in Figure 16. The MAE and MAPE for each model’s daily forecast results are shown in Figure 17 and Figure 18. By comparison, we can find that the VMD–RFECV–TEL–TEL model proposed in this paper performs better than other models, and the prediction error is more stable every day. The MAE is maintained at about 50 MW and the MAPE is maintained at about 0.5%.
The residuals for each model are shown in Figure 19, and their distributions are shown in Figure 20. The residuals of the VMD–RFECV–TEL–TEL model in this paper are concentrated between −150 MW and 150 MW, with the median around 30 MW. Compared with the other forecasting models, the VMD–RFECV–TEL–TEL model has the smallest and most stable forecasting error, with outstanding performance in the distribution of residuals.

5. Conclusions

In order to improve the ability of the electricity load forecasting model to capture complex time-series information and to improve forecasting accuracy, this paper proposes a two-stage short-term load forecasting method based on RFECV feature selection and a TCN–ECA–LSTM neural network and validates it through numerical simulations with actual data from Australia. The following conclusions are drawn:
(1)
The ECA attention mechanism can effectively improve the prediction model’s ability to capture channel information, so the TEL combined model outperforms SVM, CNN, LSTM, TCN–LSTM, and other models in all evaluation metrics. In Case 1, the MAE, MAPE, and RMSE of the TEL combined model, compared to the TCN–LSTM combined model, were reduced by 2.7108 MW, 0.0332%, and 5.663 MW, respectively. In Case 2, taking the first week of February as an example, compared to TCN–LSTM, the MAE, MAPE, and RMSE of TEL were reduced by 14.4806 MW, 0.1638%, and 12.5104 MW, respectively.
(2)
The combination of the VMD and RFECV algorithms can effectively determine the feature combination for each load subsequence, which is conducive to improving the prediction accuracy of the TEL model. In Case 1, because the dataset is small, the VMD–RFECV–TEL model has evaluation indices similar to those of the TEL model. In Case 2, because the dataset is large enough, all the indicators of VMD–RFECV–TEL are better than those of TEL; taking the first week of February as an example, the MAE, MAPE, and RMSE decreased by 36.1755 MW, 0.3754%, and 56.2679 MW, respectively.
(3)
The VMD–RFECV–TEL–TEL two-stage prediction method proposed in this paper can effectively compensate for the error accumulation caused by VMD decomposition and modal component prediction, and its prediction effect is better than that of the other methods. In Case 1, the VMD–RFECV–TEL–TEL two-stage prediction model showed a reduction in MAE of 32.5213 MW, in MAPE of 0.3557%, and in RMSE of 36.7363 MW compared to the VMD–RFECV–TEL model. In Case 2, taking the first week of February as an example, the MAE, MAPE, and RMSE of VMD–RFECV–TEL–TEL were reduced by 63.9659 MW, 0.7132%, and 76.6979 MW, respectively, compared with VMD–RFECV–TEL.
(4)
In Case 2, we can see that due to the influence of seasons and people’s habits, the accuracy of the prediction models varies across seasons. However, compared with the other models, the VMD–RFECV–TEL–TEL model in this paper achieves good prediction results in all time periods, with relatively smooth error fluctuations and good generalisation ability. Meanwhile, through analysis of Case 1 and Case 2, we find that because the training set of Case 2 is larger than that of Case 1, the VMD–RFECV–TEL–TEL model can learn the implicit features (including seasonal characteristics) from a larger amount of data, so the prediction accuracy in Case 2 is higher than that in Case 1.

Author Contributions

H.L.: conceptualization, methodology, and original writing. J.W.: conceptualization, methodology, and supervision. H.Z.: formal analysis. J.Y.: validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52167016, and was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, grant number 2020D01C068.

Data Availability Statement

The raw data supporting this paper are available and will be provided without reservation by the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

1. Mahdavi, M.; Sabillón, C.; Ajalli, M.; Monsef, H.; Romero, R. A real test system for power system planning, operation, and reliability. J. Control Autom. Electr. Syst. 2018, 29, 192–208.
2. Li, G.Q.; Liu, Z.; Jin, G.B.; Quan, R. Ultra short-term power load forecasting based on randomly distributive embedded framework and BP neural network. Power Syst. Technol. 2020, 44, 437–445.
3. Hosseini, H.; Jalilzadeh, S.; Nabaei, V.; Zareie Govar, G.R.; Mahdavi, M. Enhancing deregulated distribution network reliability for minimizing penalty cost based on reconfiguration using BPSO. In Proceedings of the IEEE 2nd International Power and Energy Conference, Johor Bahru, Malaysia, 1–3 December 2008.
4. Liao, N.H.; Hu, Z.H.; Ma, Y.Y.; Lu, W.Y. Review of the short-term load forecasting methods of electric power system. Power Syst. Prot. Control 2011, 39, 147–152.
5. Zhao, J.; Su, P.W.; An, Q.S.; Wang, D.; Mu, Y.F.; Deng, S. Progress in researches on load forecasting technology for optimization of distributed energy system. Proc. CSU-EPSA 2018, 30, 12–18.
6. Li, B.; Zhang, J.; He, Y.; Wang, Y. Short-term load-forecasting method based on wavelet decomposition with second-order gray neural network model combined with ADF test. IEEE Access 2017, 5, 16324–16331.
7. Zhang, S.Q.; Su, X.S.; Chen, R.F.; Liu, W.; Zuo, Y.G.; Zhang, Y. Short-term load forecasting based on the VMD and FABP. Chin. J. Sci. Instrum. 2018, 39, 67–73.
8. Hu, W.; Zhang, X.Y.; Li, Z.E.; Li, Q.; Wang, H. Short-term load forecasting based on an optimized VMD-mRMR-LSTM model. Power Syst. Prot. Control 2022, 50, 88–97.
9. He, F.F.; Zhou, J.Z.; Feng, Z.K.; Liu, G.B.; Yang, Y.Q. A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm. Appl. Energy 2019, 237, 103–116.
10. Zhang, J.J.; Wang, T.; Wu, J.Y.; Zhu, H.N.; Lan, D.; Li, F.S. Short-term load forecasting method based on artificial intelligence highway neural network. In Proceedings of the IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021.
11. Zhu, Q.Z.; Dong, Z.; Ma, N. Forecasting of short-term power based on just-in-time learning. Power Syst. Prot. Control 2020, 48, 92–98.
12. Zhu, T.; Li, Y.; Zhang, Y.; Zhang, X.; He, C. A new algorithm of advancing weather adaptability based on ARIMA model for day-ahead power load forecasting. Proc. CSEE 2006, 26, 14–19.
13. Song, K.B.; Baek, Y.S.; Hong, D.H.; Jang, G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans. Power Syst. 2005, 20, 96–101.
14. Taylor, J.W.; McSharry, P.E. Short-term load forecasting methods: An evaluation based on European data. IEEE Trans. Power Syst. 2007, 22, 2213–2219.
15. Yao, C.W.; Yang, P.; Liu, Z.J. Load forecasting method based on CNN-GRU hybrid neural network. Power Syst. Technol. 2020, 44, 3416–3424.
16. Ren, J.J.; Wei, H.H.; Zou, Z.L.; Hou, T.T.; Yuan, Y.L.; Shen, J.Q.; Wang, X.M. Ultra-short-term power load forecasting based on CNN-BiLSTM-Attention. Power Syst. Prot. Control 2022, 50, 108–116.
17. Kong, W.C.; Dong, Z.Y.; Jia, Y.W.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2019, 10, 841–851.
18. Li, C.J.; Dong, Z.Y.; Ding, L.; Petersen, H.; Qiu, Z.H.; Chen, G.; Prasad, D. Interpretable memristive LSTM network design for probabilistic residential load forecasting. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 2297–2310.
19. Zou, Z.; Sun, Y.; Zhang, Z. Short-term load forecasting based on recurrent neural network using ant colony optimization algorithm. Power Syst. Technol. 2005, 29, 59–63.
20. Li, P.; He, S.; Han, P.; Zheng, M.; Huang, M.; Sun, J. Short-term load forecasting of smart grid based on long-short-term memory recurrent neural networks in condition of real-time electricity price. Power Syst. Technol. 2018, 42, 4045–4052.
21. Lu, J.X.; Zhang, Q.P.; Yang, Z.H.; Tu, M.F.; Lu, J.J.; Peng, H. Short-term load forecasting method based on CNN-LSTM hybrid neural network model. Autom. Electr. Power Syst. 2019, 43, 131–137.
22. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
23. Xiong, J.; Chen, L.; Wang, S.Q. Life prediction of rolling bearing based on multi-resolution singular value decomposition and ECNN-LSTM. J. Mech. Strength 2021, 43, 523–530.
24. Gu, Z.W.; Li, P.; Lang, X.; Yu, Y.X.; Shen, X.; Cao, M. A controllable clustering model of the electrical load curve based on variational mode decomposition and fast search of the density peak. Power Syst. Prot. Control 2021, 49, 118–127.
25. Ye, X.Q.; Wu, Y.F. Cancer gene selection algorithm based on support vector machine recursive feature elimination and feature clustering. J. Xiamen Univ. Nat. Sci. 2018, 57, 702–707.
26. Liu, T.G.; Wang, C.H. Predicting apoptosis protein subcellular location based on SVM-RFE algorithm. Comput. Eng. Appl. 2017, 53, 155–159.
27. Wang, B.R.; Wang, W.B.; Zhou, C.; Fang, Y.; Zheng, Y.K. Feature selection and classification of heart sound based on EMD adaptive reconstruction. Space Med. Med. Eng. 2020, 33, 533–541.
28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
29. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
Figure 1. Dilated causal convolutional structure of TCN.
Figure 2. TCN residual block structure.
Figure 3. Structure of the LSTM memory unit.
Figure 4. Working principle of the efficient channel attention module.
Figure 5. Structure of the two-stage forecasting model.
Figure 6. Original load sequence in Case 1.
Figure 7. Results of VMD decomposition in Case 1.
Figure 8. Results of VMD decomposition in Case 2.
Figure 9. Average score for each feature in Case 1.
Figure 10. Forecast results of each forecasting model.
Figure 11. Forecast results after adding ECA.
Figure 12. Forecast results after adding RFECV.
Figure 13. Forecast results considering error correction.
Figure 14. Prediction residuals for each model in Case 1.
Figure 15. Distribution of predicted residuals for each model in Case 1.
Figure 16. Forecast results for 1 February to 7 February in Case 2.
Figure 17. MAE comparison chart for each model from 1 February to 7 February.
Figure 18. MAPE comparison chart for each model from 1 February to 7 February.
Figure 19. Prediction residuals from 1 February to 7 February.
Figure 20. Distribution of predicted residuals from 1 February to 7 February.
Table 1. Original dataset.

| Feature Category | Characteristic Parameter | Description |
|---|---|---|
| Date | Week | Monday to Sunday, taking values 1 to 7 |
| Date | Time | Sampling interval 30 min, taking values from 0 to 23.5 |
| Meteorology | Dew Point Temperature | Air temperature at which water vapour reaches saturation at a given atmospheric pressure |
| Meteorology | Dry Bulb Temperature | The true thermodynamic temperature of air |
| Meteorology | Wet Bulb Temperature | Air temperature at which water vapour reaches saturation at the same enthalpy air state |
| Meteorology | Humidity | Air dryness and humidity |
| Electricity | Electricity Load | Electricity consumption per unit time, in MW |
| Economy | Real-time Electricity Price | Real-time price per kWh |
Table 2. Centre frequencies corresponding to different K values in Case 1.

| K | ω1 | ω2 | ω3 | ω4 | ω5 | ω6 |
|---|---|---|---|---|---|---|
| 3 | 0.019 | 143.823 | 299.854 | – | – | – |
| 4 | 0.019 | 143.817 | 299.780 | 2132.397 | – | – |
| 5 | 0.018 | 143.274 | 293.038 | 538.876 | 2154.162 | – |
| 6 | 0.018 | 143.274 | 293.036 | 538.942 | 2150.829 | 2934.950 |
Table 3. Ranking the importance of each feature in Case 1 (F1–F13 denote Features 1 to 13).

| Modal Component | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMF1 | 2 | 1 | 8 | 6 | 7 | 1 | 11 | 10 | 1 | 4 | 3 | 5 | 9 |
| IMF2 | 1 | 7 | 1 | 1 | 6 | 5 | 1 | 1 | 2 | 1 | 1 | 3 | 4 |
| IMF3 | 1 | 9 | 11 | 3 | 8 | 7 | 10 | 4 | 2 | 1 | 1 | 5 | 6 |
| IMF4 | 2 | 10 | 3 | 6 | 13 | 11 | 8 | 5 | 9 | 7 | 4 | 1 | 12 |
| IMF5 | 3 | 13 | 7 | 8 | 10 | 12 | 6 | 9 | 11 | 2 | 4 | 5 | 1 |
Table 4. Error indices of each forecasting model.

| Model | MAE (MW) | MAPE (%) | RMSE (MW) | R² |
|---|---|---|---|---|
| CNN | 130.0418 | 1.3964 | 174.0873 | 0.9746 |
| TCN | 116.8771 | 1.2843 | 138.5887 | 0.9840 |
| LSTM | 113.1700 | 1.2137 | 141.3876 | 0.9833 |
| CNN–LSTM | 99.5585 | 1.0591 | 119.1095 | 0.9882 |
| TCN–LSTM | 87.6992 | 0.9638 | 109.6973 | 0.9899 |
| TCN–ECA–LSTM (TEL) | 84.9884 | 0.9306 | 104.0343 | 0.9910 |
Table 5. Error indices of different forecasting methods.

| Model | MAE (MW) | MAPE (%) | RMSE (MW) | R² |
|---|---|---|---|---|
| TCN–ECA–LSTM (TEL) | 84.9884 | 0.9306 | 104.0343 | 0.9910 |
| RFECV–TEL | 71.7670 | 0.8137 | 89.2661 | 0.9933 |
| VMD–TEL | 108.6336 | 1.1616 | 141.8570 | 0.9832 |
| VMD–RFECV–TEL | 97.4636 | 1.0425 | 118.3880 | 0.9883 |
| VMD–RFECV–TEL–FC (two-stage) | 85.4733 | 0.9103 | 111.3490 | 0.9896 |
| VMD–RFECV–TEL–TEL (two-stage) | 64.9423 | 0.6867 | 81.6517 | 0.9944 |
Table 6. Error indices of each forecasting model in Case 2.

| Model | Day | MAE (MW) | MAPE (%) | RMSE (MW) | R² |
|---|---|---|---|---|---|
| CNN | 2.1–2.7 | 329.5188 | 3.5744 | 424.9867 | 0.9192 |
| | 5.10–5.16 | 218.6985 | 2.5118 | 287.2508 | 0.9247 |
| | 8.16–8.22 | 372.2290 | 4.1383 | 459.5174 | 0.8404 |
| | 11.22–11.28 | 262.9436 | 3.0470 | 331.6370 | 0.9308 |
| SVM | 2.1–2.7 | 313.1980 | 3.7031 | 391.6685 | 0.9314 |
| | 5.10–5.16 | 184.3125 | 1.6691 | 183.4649 | 0.9753 |
| | 8.16–8.22 | 231.0575 | 2.5861 | 287.1470 | 0.9236 |
| | 11.22–11.28 | 173.5208 | 2.1324 | 209.7444 | 0.9723 |
| LSTM | 2.1–2.7 | 259.8264 | 2.8183 | 329.7345 | 0.9514 |
| | 5.10–5.16 | 208.0646 | 2.3854 | 268.6048 | 0.9341 |
| | 8.16–8.22 | 342.7110 | 3.7833 | 425.8358 | 0.8629 |
| | 11.22–11.28 | 193.6278 | 2.2396 | 245.6042 | 0.9621 |
| TCN | 2.1–2.7 | 187.5948 | 2.0934 | 241.8432 | 0.9738 |
| | 5.10–5.16 | 198.7105 | 2.2851 | 261.5502 | 0.9376 |
| | 8.16–8.22 | 214.4873 | 2.3845 | 275.5537 | 0.9426 |
| | 11.22–11.28 | 158.9730 | 1.9623 | 202.2736 | 0.9743 |
| TCN–LSTM | 2.1–2.7 | 166.0809 | 1.8243 | 213.1658 | 0.9797 |
| | 5.10–5.16 | 202.4517 | 2.3189 | 260.1847 | 0.9382 |
| | 8.16–8.22 | 256.7777 | 2.8139 | 348.9863 | 0.9079 |
| | 11.22–11.28 | 123.1609 | 1.4647 | 160.1591 | 0.9839 |
| TEL | 2.1–2.7 | 151.6003 | 1.6605 | 200.6554 | 0.9820 |
| | 5.10–5.16 | 119.1053 | 1.3779 | 160.7618 | 0.9764 |
| | 8.16–8.22 | 131.7038 | 1.4528 | 168.0049 | 0.9787 |
| | 11.22–11.28 | 98.4385 | 1.1690 | 124.2732 | 0.9903 |
Table 7. Error indices of each TEL improvement model in Case 2.

| Model | Day | MAE (MW) | MAPE (%) | RMSE (MW) | R² |
|---|---|---|---|---|---|
| VMD–TEL | 2.1–2.7 | 108.7605 | 1.2487 | 132.8882 | 0.9921 |
| | 5.10–5.16 | 98.7686 | 1.1273 | 128.4358 | 0.9849 |
| | 8.16–8.22 | 119.2566 | 1.3052 | 152.2628 | 0.9825 |
| | 11.22–11.28 | 126.3595 | 1.5392 | 147.4287 | 0.9863 |
| RFECV–TEL | 2.1–2.7 | 130.7053 | 1.4773 | 152.6838 | 0.9896 |
| | 5.10–5.16 | 102.8529 | 1.1959 | 123.6642 | 0.9860 |
| | 8.16–8.22 | 137.1132 | 1.5378 | 165.0091 | 0.9794 |
| | 11.22–11.28 | 98.4897 | 1.2190 | 118.5506 | 0.9912 |
| VMD–RFECV–TEL | 2.1–2.7 | 115.4248 | 1.2851 | 144.3875 | 0.9907 |
| | 5.10–5.16 | 82.0895 | 0.9513 | 107.9646 | 0.9894 |
| | 8.16–8.22 | 101.8768 | 1.1496 | 130.7970 | 0.9871 |
| | 11.22–11.28 | 97.1150 | 1.1715 | 121.1319 | 0.9908 |
| VMD–RFECV–TEL–FC (two-stage) | 2.1–2.7 | 104.0924 | 1.1597 | 127.6341 | 0.9927 |
| | 5.10–5.16 | 76.5576 | 0.9102 | 96.4386 | 0.9915 |
| | 8.16–8.22 | 80.9743 | 0.8940 | 103.6740 | 0.9918 |
| | 11.22–11.28 | 64.1035 | 0.7619 | 78.9967 | 0.9961 |
| VMD–RFECV–TEL–TEL (two-stage) | 2.1–2.7 | 51.4589 | 0.5719 | 67.6896 | 0.9979 |
| | 5.10–5.16 | 48.4028 | 0.5615 | 63.1640 | 0.9964 |
| | 8.16–8.22 | 59.3789 | 0.6690 | 73.7071 | 0.9958 |
| | 11.22–11.28 | 49.5922 | 0.5979 | 61.6459 | 0.9976 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
