Article

Ultra-Short-Term Load Dynamic Forecasting Method Considering Abnormal Data Reconstruction Based on Model Incremental Training

Guangyu Chen, Yijie Wu, Li Yang, Ke Xu, Gang Lin, Yangfei Zhang and Yuzhuo Zhang
1 School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167, China
2 State Grid Fujian Electric Power Company Limited, Fuzhou 350001, China
3 State Grid Fujian Electric Power Company Quanzhou Power Supply Company, Quanzhou 362000, China
* Author to whom correspondence should be addressed.
Energies 2022, 15(19), 7353; https://doi.org/10.3390/en15197353
Submission received: 30 August 2022 / Revised: 19 September 2022 / Accepted: 2 October 2022 / Published: 6 October 2022
(This article belongs to the Special Issue Power System Analysis, Operation and Control)

Abstract

In order to reduce the influence of abnormal data on load forecasting and further improve training efficiency when new samples are added to the historical data set, an ultra-short-term load dynamic forecasting method considering abnormal data reconstruction based on model incremental training is proposed in this paper. Firstly, aiming at the abnormal data in ultra-short-term load forecasting, a load abnormal data processing method based on an isolation forest and a conditional generative adversarial network (IF-CGAN) is proposed. The isolation forest algorithm is used to accurately eliminate abnormal data points, and a CGAN is constructed to interpolate the abnormal points. The load-influencing factors are taken as the condition constraints of the CGAN, and a weighted loss function is introduced to improve the reconstruction accuracy of abnormal data. Secondly, aiming at the low model training efficiency caused by new samples in the historical data set, a model incremental training method based on a bidirectional long short-term memory network (Bi-LSTM) is proposed. The historical data are used to train the Bi-LSTM, and transfer learning is introduced to process the incremental data set, realizing adaptive and rapid adjustment of the model weights and improving training efficiency. Finally, real power grid load data from a region in eastern China are used for simulation analysis. The results show that the proposed method reconstructs abnormal data more accurately and improves both the accuracy and efficiency of ultra-short-term load forecasting.

1. Introduction

Accurate load forecasting provides a basis for power system construction planning, dispatching decision making, and the production planning of power generation enterprises [1]. In recent years, with the continuous growth of grid-connected renewable generation and the increasing popularity of electric vehicles, the power load exhibits volatility, nonlinearity, and randomness, and power grid regulation becomes more difficult, which places higher requirements on the accuracy of load forecasting.
At present, load forecasting methods mainly comprise statistical analysis methods and artificial intelligence methods [2,3]. Statistical analysis methods mainly include multiple linear regression, autoregressive integrated moving average, and exponential smoothing [4,5]. These methods are mainly suited to linear and stationary load data; they ignore the influence of climate, date type, and other factors on the load, so their prediction accuracy is poor.
With the rise of artificial intelligence, machine learning and deep learning have been widely used in power grid fault identification and load forecasting because of their strong nonlinear fitting ability [6,7]. In [8], an enhanced decision tree classifier (DTC) was designed to obtain better prediction accuracy, but machine learning algorithms often ignore time-series dependence, and their prediction of long time series is not as good as that of deep learning algorithms. Deep learning has unique advantages in load forecasting owing to its strong time-series learning ability. The LSTM and GRU networks, both improvements on the recurrent neural network (RNN), can effectively deal with long, high-dimensional time series [9]. Reference [10] proposed a fusion model of a convolutional neural network (CNN) and a long short-term memory (LSTM) network, which significantly improved the prediction efficiency and accuracy of individual household electric load by using the powerful feature-extraction ability of the CNN. Reference [11] constructed a load forecasting model based on a GRU network that effectively improved forecasting accuracy compared with an LSTM network. Reference [12] proposed an integrated load forecasting model based on CNN and LSTM that significantly improved the forecasting efficiency and accuracy of multidimensional characteristic loads. To improve prediction accuracy over different time ranges, [13] proposed a fusion model of a long short-term memory network and Neural Prophet (LSTM-NP); simulation results show that, compared with traditional methods, LSTM-NP improves the prediction accuracy for three different types of load forecasting. The studies above achieve good prediction accuracy on high-dimensional long time series, but most assume accurate historical load data, and the impact of abnormal data on prediction accuracy is not fully considered. Abnormal data destroy the original distribution of the data set and reduce data redundancy, which hinders the improvement of load forecasting accuracy. Therefore, it is necessary to process the abnormal data and restore the initial distribution of the historical data set.
Abnormal data processing methods mainly include deletion and filling [14,15]. The deletion method directly deletes the abnormal data and their associated data, which is simple and easy to perform; however, when data are abnormal over a large area, it may lead to the loss of important information [16]. Filling methods can be divided into two categories: statistical methods and machine learning methods [17,18,19]. Statistical methods mainly include mean filling, nearest-distance filling, and regression filling. The filling results obtained by these methods are relatively stable, but they are easily affected by other types of data, and the filling accuracy is poor. Machine learning methods mainly include K-nearest neighbor filling, missing forest, and K-means clustering filling. Reference [20] introduced linear interpolation, matrix combination, and matrix transfer to improve the random forest; simulation results show that the improved random forest algorithm fills missing power data with high accuracy, but machine learning methods easily ignore timing information, which limits their filling accuracy on time series.
The generative adversarial network (GAN) is an unsupervised generative learning model that has been widely used in areas such as image generation and data filling [21]. Building on GAN, reference [22] proposed self-attention-based time-series imputation networks, which improved the filling accuracy of missing data. To improve the reliability evaluation of transmission gears with insufficient and imbalanced data, [23] proposed a conditional generative adversarial network with mean-covariance balancing labeling (CGAN-MBL): on the basis of the CGAN model, MBL was introduced to improve the authenticity of the CGAN-generated data. These methods can effectively reconstruct missing data, but they do not fully consider the correlations between other features and the missing values. Moreover, JS divergence and Wasserstein distance are mostly used as loss functions, and vanishing gradients readily occur during network training.
Most traditional modeling methods are based on a fixed load data set. When new samples are added to the historical data set, it is often necessary to remodel and retrain on the new data set. As the amount of training data increases, retraining not only consumes a great deal of time but also leads to vanishing gradients and underfitting, which affects the accuracy and efficiency of load forecasting. Therefore, in order to improve the accuracy and efficiency of ultra-short-term load forecasting under abnormal data, the following two problems need to be solved: (1) how to discover the hidden nonlinear relationship between abnormal data and other characteristic data, improve the authenticity of data reconstruction, and restore the integrity of the load series; (2) when new samples are added to the historical load data set, how to realize incremental training of the model and improve the efficiency of ultra-short-term load forecasting.
Based on an in-depth analysis of the above literature, this paper proposes an ultra-short-term load dynamic forecasting method based on model incremental training that considers abnormal data reconstruction. Firstly, aiming at the abnormal data in ultra-short-term load forecasting, a load abnormal data processing method based on IF-CGAN is proposed. The isolation forest algorithm is used to accurately eliminate abnormal data points, and a conditional generative adversarial network (CGAN) is constructed to interpolate the abnormal points. The load-influencing factors are taken as the condition constraints of the CGAN, and a weighted loss function is introduced to improve the reconstruction accuracy of abnormal data. Secondly, aiming at the low model training efficiency caused by new samples in the historical data set, a model incremental training method based on Bi-LSTM is proposed. The historical data are used to train the Bi-LSTM, and transfer learning is introduced to process the incremental data set, realizing adaptive and rapid adjustment of the model weights and improving training efficiency. Finally, simulation analysis is carried out with real power grid load data from a certain area. The results show that the proposed method reconstructs abnormal data more accurately and improves both the accuracy and efficiency of ultra-short-term load forecasting.

2. Abnormal Data Reconstruction Method Based on IF-CGAN

Abnormal data in the load data set destroy the integrity of the time series and affect the prediction accuracy of the model. In this section, outliers are eliminated through an isolation forest model, and a CGAN is then constructed to interpolate the missing points.

2.1. Isolation Forest

The isolation forest algorithm is widely used for outlier detection in massive data [24]. Its principle is to randomly select a plane to divide the sample data space into two subspaces and then divide each subspace in the same way until every subspace contains only one data point. This segmentation is similar to the construction of a binary tree: the sample data space is the root of the tree, and the data points are its branches and leaves. Therefore, the path L(y) from branch-and-leaf point y to the root can be calculated to judge whether data point y is an outlier.
An isolation forest is a collection of multiple isolation trees. Suppose that the data space contains m data points, and the anomaly index of data point y can be expressed as:
$$S(y,m) = 2^{-\frac{E(L(y))}{C(m)}} \tag{1}$$
$$C(m) = 2\left[\ln(m-1) + \xi\right] - \frac{2(m-1)}{m} \tag{2}$$
where S(y,m) is the abnormal score of data point y, E(L(y)) is the expected value of the path length L(y) of y over multiple trees, C(m) is the average path length of the isolation tree, and ξ is the Euler–Mascheroni constant, with a value of about 0.5772.
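As an illustration, the sketch below screens a load series with scikit-learn's IsolationForest; the 100 trees and the contamination setting follow the choices reported in Section 5.2, while the function name and the 5% default outlier ratio are assumptions for demonstration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def screen_outliers(load: np.ndarray, outlier_ratio: float = 0.05) -> np.ndarray:
    """Return a boolean mask that is True where a load point is flagged as abnormal."""
    model = IsolationForest(n_estimators=100,          # 100 isolation trees (Section 5.2)
                            contamination=outlier_ratio,
                            random_state=0)
    labels = model.fit_predict(load.reshape(-1, 1))    # -1 = outlier, 1 = normal
    return labels == -1

# Flagged points are deleted and later re-filled by the CGAN generator:
# mask = screen_outliers(load_series)
# load_series_with_gaps = np.where(mask, np.nan, load_series)
```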

2.2. Conditional Generative Adversarial Network

A GAN consists of a generator and a discriminator, the essence of which is to learn the distribution of the real data. The generator's task is to approximate the real data as closely as possible, while the discriminator's task is to distinguish generated data from real data. When the game between the generator and the discriminator reaches equilibrium, the output of the generator is infinitely close to the real value. The CGAN improves on the GAN by applying supervised learning: it retains the adversarial structure of the GAN and adds condition values to the inputs of the generator and discriminator to speed up network convergence. Its basic structure is shown in Figure 1, where circles represent neurons and colored lines represent connections between neurons.
The random noise z and condition c are combined and input into the generator, which outputs the generated sample G(z|c). The inputs of the discriminator are the combination of the real load value t with the condition c and the combination of the generated sample G(z|c) with the condition c. The discriminator must judge whether the distribution of the generated sample is similar to that of the real sample and whether the generated sample meets condition c. The generator and discriminator update their parameters according to the discrimination results. The loss functions of the generator and discriminator in the CGAN are:
$$L_G = -E_{(z,c)}\left[D\left(G(z|c)\,|\,c\right)\right] \tag{3}$$
$$L_D = -E_{(t,c)}\left[D(t|c)\right] + E_{(z,c)}\left[D\left(G(z|c)\,|\,c\right)\right] \tag{4}$$
where E represents the expected value of the corresponding distribution, and G(z|c) and D(t|c) represent the outputs of the generator and discriminator, respectively. The generator improves the authenticity of the generated sample G(z|c) through continuous iteration, while the discriminator seeks to lower the score of the generated data and improve its accuracy in distinguishing real samples. The CGAN therefore gradually reaches a balance in the game between the two, and the objective function can be defined as:
$$\min_G \max_D L_{CGAN} = E_{(t,c)}\left[D(t|c)\right] - E_{(z,c)}\left[D\left(G(z|c)\,|\,c\right)\right] \tag{5}$$
Gradients easily vanish when training the original GAN. This is because, when the generator performs too well, the loss function is equivalent to the JS divergence, and when the generated data distribution does not overlap with the real data distribution, the JS divergence is constant. Therefore, at the initial stage of model training and when the generator is too strong, the loss function is constant, which makes network training difficult and convergence slow. Reference [25] proposed the WGAN-GP model, which selects the Wasserstein distance as the discriminator loss function and introduces a gradient penalty mechanism to enforce the Lipschitz constraint on the regions where real and fake samples concentrate and on their transition region. The objective function of the CGAN then becomes:
$$\min_G \max_D L_{CGAN} = E_{(t,c)}\left[D(t|c)\right] - E_{(z,c)}\left[D\left(G(z|c)\,|\,c\right)\right] + \lambda E_{\tilde{x}}\left[\left(\left\|\nabla_{\tilde{x}} D(\tilde{x})\right\|_2 - 1\right)^2\right] \tag{6}$$
where λ is the gradient penalty coefficient and x̃ denotes samples interpolated between real and generated samples.
The smooth L1 loss function has strong robustness and stability in the solution process. It is widely used in neural network training, effectively avoiding gradient explosion and improving the convergence speed of the network. It can be defined as:
$$L_{SL1} = \begin{cases} 0.5\left(t - G(z|c)\right)^2, & \left|t - G(z|c)\right| < 1 \\ \left|t - G(z|c)\right| - 0.5, & \left|t - G(z|c)\right| \ge 1 \end{cases} \tag{7}$$
Cosine similarity measures the difference between two vectors by calculating the cosine of the angle between them. It discriminates vector similarity with high accuracy and can identify the trend of a vector's trajectory [26]. In this paper, a cosine similarity loss function is introduced to improve the authenticity of the generated samples. It is defined as:
$$L_{\cos} = 1 - \frac{\sum_{i=1}^{n} t_i G_i}{\sqrt{\sum_{i=1}^{n} t_i^2}\sqrt{\sum_{i=1}^{n} G_i^2}} \tag{8}$$
Therefore, the overall objective function of the CGAN is as follows:
$$L = \min_G \max_D L_{CGAN} + \lambda_1 L_{SL1} + \lambda_2 L_{\cos} \tag{9}$$
where λ1 and λ2 are the weighting coefficients of the corresponding loss functions.
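To make the weighted objective in Equation (9) concrete, here is a minimal PyTorch sketch of the generator-side loss, combining the Wasserstein term with the smooth L1 and cosine similarity penalties; the weights λ1 = 10 and λ2 = 0.6 follow Section 5.2, while the tensor shapes and the function name are illustrative assumptions. The discriminator keeps the WGAN-GP loss of Equation (6), with the gradient penalty computed on samples interpolated between real and generated batches.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake: torch.Tensor, real: torch.Tensor, fake: torch.Tensor,
                   lam1: float = 10.0, lam2: float = 0.6) -> torch.Tensor:
    """Weighted generator objective of Equation (9), given D's scores on fakes."""
    wgan_term = -d_fake.mean()                          # -E[D(G(z|c)|c)]
    sl1_term = F.smooth_l1_loss(fake, real)             # Equation (7)
    cos_term = 1.0 - F.cosine_similarity(               # Equation (8)
        fake.flatten(1), real.flatten(1)).mean()
    return wgan_term + lam1 * sl1_term + lam2 * cos_term
```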

2.3. Abnormal Data Reconstruction Strategy Based on CGAN

In practical training, when a GAN processes high-dimensional data, convergence is slow, and problems such as underfitting and vanishing gradients are likely. Compared with a fully connected network, a convolutional neural network (CNN) has a simple structure with strong feature-extraction ability and can process high-dimensional data sets efficiently. Therefore, CNNs are used to construct the discriminator and generator in this paper.
In order to ensure that the convolution kernel can effectively extract the features of the data, the structure design of CGAN is shown in Figure 2. The blue cuboid represents the output of the hidden layer.
The input of the generator is a 13th-order matrix combining random noise and condition values. A three-layer CNN is constructed to extract the features of the input matrix, and the samples are then generated by the output of the fully connected layer. So that the generator can efficiently learn the nonlinear relationship between the generated samples and the condition values, the ReLU activation function is used in the convolution layers, and regularization is carried out between them. The specific network parameters are shown in Table 1.
The condition value, the real sample, and the generated sample are combined into the two input matrices of the discriminator. The discriminator must extract and identify the features of the input matrix, and its structure is basically consistent with that of the generator. Unlike the generator, because the WGAN-GP model introduces a gradient penalty mechanism, there is no regularization between the convolution layers of the discriminator. The LeakyReLU function is selected as the activation function, and the fully connected layer outputs the discrimination result for the input matrix. The specific network parameters are shown in Table 2.
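A minimal PyTorch sketch of these two networks, following the layer settings in Tables 1 and 2, is shown below; the padding values, the single-channel inputs, and the reading of the "regularization" entries as batch normalization are assumptions made so that the layers compose cleanly on a 13 × 13 input.

```python
import torch.nn as nn

# Generator (Table 1): three conv layers with ReLU and batch normalization,
# then a fully connected layer producing a 13 x 1 generated sample.
generator = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(), nn.BatchNorm2d(32),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(), nn.BatchNorm2d(64),
    nn.Conv2d(64, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(), nn.BatchNorm2d(16),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 13),   # a 13 x 13 input shrinks to 7 x 7 after the stride-2 layer
)

# Discriminator (Table 2): the same convolutional layout with LeakyReLU and no
# batch normalization, since the WGAN-GP gradient penalty replaces it.
discriminator = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 16, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 1),
)
```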

2.4. Abnormal Data Reconstruction Based on IF-CGAN

Figure 3 shows the abnormal data reconstruction process based on IF-CGAN.

3. Dynamic Forecasting of Ultra-Short-Term Load Based on Incremental Model Training

When the load data set changes, it is often necessary to remodel and retrain on the new data set in order to obtain the latest load information, and this modeling approach is inefficient. After establishing the Bi-LSTM model, this section introduces transfer learning to process incremental data sets, realizing rapid adjustment of the model weights and improving modeling efficiency.

3.1. Transfer Learning

Transfer learning [27] applies the knowledge learned from an old task to a different but related new task, avoiding learning the new task from scratch and shortening the learning time. Transfer learning has two basic concepts: domain and task. The domain with sufficient historical sample data is called the source domain, and the domain with a limited data sample size is called the target domain. The mathematical model can be expressed as:
$$d_s = \{x_s, y_s\} \tag{10}$$
$$d_t = \{x_t, y_t\} \tag{11}$$
where ds and dt represent the source domain and target domain, respectively; xs and ys represent the source domain samples and their corresponding labels; and xt and yt represent the target domain samples and their corresponding labels. The task mathematical models of the source domain and target domain are expressed as:
$$t_s = \{x_s, f_s(\cdot)\} \tag{12}$$
$$t_t = \{x_t, f_t(\cdot)\} \tag{13}$$
where ts and tt represent the tasks of the source domain and target domain, respectively, and f(·) represents the mapping relationship between domain data x and target value y. Transfer learning migrates the source domain mapping fs(·) to the target domain. When the data distributions of the source and target domains are similar, the target domain model can be fine-tuned to obtain the target domain mapping ft(·).

3.2. Bi-LSTM

When dealing with long time series, the RNN suffers from problems such as gradient explosion and vanishing. To solve these problems, Hochreiter proposed the LSTM network, which controls the preservation and loss of time-series information through a forgetting gate, an input gate, and an output gate, effectively avoiding the vanishing gradients caused by long series. The structure of the LSTM unit is shown in Figure 4.
The forgetting gate controls the information flow from the previous time step to ensure the backward transmission of effective information. The input gate updates the current data into the memory cell. The output gate transmits the information of the memory cell to the next time step. The calculation process is as follows:
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{14}$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{15}$$
$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{16}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{17}$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{18}$$
$$h_t = o_t \odot \tanh\left(c_t\right) \tag{19}$$
where Wf, Uf, bf; Wi, Ui, bi; and Wo, Uo, bo are the parameters of the forgetting, input, and output gates, respectively; Wc, Uc, bc are the weight parameters of the cell state; xt and ht are the input of the LSTM unit and the hidden-layer output at time t; ⊙ denotes element-wise multiplication; σ is the sigmoid function; and tanh is the hyperbolic tangent function.
The Bi-LSTM model is composed of two LSTM networks with opposite directions and shared weights; its principle is shown in Figure 5, where the blue circles represent the forward LSTM outputs and the green circles the backward LSTM outputs. Bi-LSTM can learn load time-series information in both the forward and backward directions, integrating past and future load information to update the weight parameters and improving the regression accuracy on long time series.
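For concreteness, a hedged PyTorch sketch of such a Bi-LSTM forecaster follows; the hidden size and layer count are illustrative assumptions, while the (batch, time step, feature) input layout matches the three-dimensional matrix described in Section 4.

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    """Bidirectional LSTM regressor: reads a window of load features and
    predicts the load at the next time point."""
    def __init__(self, n_features: int, hidden: int = 64, layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # forward + backward hidden states

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                  # (batch, steps, 2 * hidden)
        return self.head(out[:, -1])           # regression from the last step
```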

3.3. Incremental Model Training Method Based on Bi-LSTM

Based on the established load forecasting model, this paper introduces transfer learning to realize a rapid update of the model weights when the load data change. The principle is shown in Figure 6.
The maximum mean discrepancy (MMD) is mainly used to measure the distance between two different sample sets [28]. Therefore, MMD is usually used in transfer learning to evaluate the correlation between the data distributions of the source and target domains. When the MMD is small, the data distributions of the source and target domains are very similar, and adjusting the model structure would reduce the fitting accuracy of the model. The source domain data studied in this paper are the collected historical load samples, and the target domain data are the new load samples. The difference between the source and target data distributions is small, so there is no need to adjust the source domain model structure.
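As a reference point, the following numpy sketch computes a biased estimate of the squared MMD with an RBF kernel between source and target samples; the kernel choice and bandwidth are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def mmd2_rbf(xs: np.ndarray, xt: np.ndarray, gamma: float = 1.0) -> float:
    """Biased squared-MMD estimate between samples xs (n, d) and xt (m, d)."""
    def k(a, b):  # RBF kernel matrix between two sample sets
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_dist)
    return k(xs, xs).mean() + k(xt, xt).mean() - 2.0 * k(xs, xt).mean()
```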
The source domain model can be divided into three parts according to functionality: the shallow network extracts detailed features of the source domain data, the deep network extracts the overall data information, and the output layer outputs the prediction results. By taking the source domain model parameters as the initial parameters of the target domain model, fixing different Bi-LSTM network layers, and using the target domain data to update the parameters of the remaining layers, the network parameters can be updated efficiently.
The model incremental training process based on Bi-LSTM is shown in Figure 7, and the specific steps are as follows.
Step 1: Build the source domain model. The method proposed in Section 3.2 is used to establish the source domain model on the historical load data; the three parameters of network layers, time step, and iteration count are tuned, and the source domain model with the highest prediction accuracy is saved.
Step 2: Train the target domain model. Import the source domain model, migrate its parameters to the target domain, fix the different Bi-LSTM network layers, train the other network layers with the target domain data, and save the target domain model with the highest prediction accuracy.
Step 3: Output the prediction results. Load the target domain model, normalize the load-influencing factors and reshape them into a three-dimensional matrix, input them into the target domain model, and then inverse-normalize the output to obtain the load forecasting value.
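A hedged sketch of Step 2 follows, reusing the BiLSTMForecaster sketch from Section 3.2: the source model's weights are loaded, the first Bi-LSTM layer is frozen (the BiLSTM-TL1 setting of Section 5.5), and only the remaining layers are fine-tuned on the target domain data. The checkpoint path is hypothetical.

```python
import torch

model = BiLSTMForecaster(n_features=13)
model.load_state_dict(torch.load("source_model.pt"))   # hypothetical checkpoint path

# Freeze layer 0 of the stacked Bi-LSTM; "_l0" matches both directions
# (e.g. weight_ih_l0 and weight_ih_l0_reverse).
for name, param in model.lstm.named_parameters():
    if "_l0" in name:
        param.requires_grad = False

# Fine-tune the remaining layers on the target domain data
# (Adam with lr = 0.001, per Section 4).
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001)
```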

4. Ultra-Short-Term Load Dynamic Forecasting Method Considering Abnormal Data Reconstruction Based on Model Incremental Training

A high-quality load data set is the premise of establishing the prediction model. In this paper, an isolation forest and a conditional generative adversarial network are used to deal with outliers and improve the quality of the data set. A prediction model based on the Bi-LSTM network is then constructed, and transfer learning is introduced to realize incremental training of the prediction model. The basic framework is shown in Figure 8.
(1) Abnormal data reconstruction based on IF-CGAN.
Firstly, the historical load data are normalized by Equation (20), the isolation forest model is constructed, and the load outliers are screened and deleted. The condition values of the CGAN are the load-influencing factors, mainly including time factors (time, rest day), climate factors (wind speed, air temperature, dew point temperature, air pressure, cloud amount), and load factors (the load at the same time on the two preceding days and the load at the same time on the two following days). The random noise and the condition values are spliced horizontally and input into the generator, where convolution kernels extract the features and output the generated samples. The generated sample, real load, and condition values are spliced horizontally and input into the discriminator for true-or-false discrimination. During the game between the generator and discriminator, the nonlinear relationship between the load and its influencing factors is excavated. After the game reaches balance, the generator is saved and the missing load data are filled:
$$a_n = \frac{a - a_{\min}}{a_{\max} - a_{\min}} \tag{20}$$
where an is the normalized value, amax and amin are the maximum and minimum values, respectively, and a is the initial data value. A small code sketch of this normalization is given after this list.
(2) Construction of the load forecasting model based on the Bi-LSTM network.
After the complete historical load data are obtained, the data are normalized and reshaped into a three-dimensional matrix of (sequence length, time step, number of sequence features). The load forecasting model is built on a Bi-LSTM network; Adam is selected as the optimizer, the learning rate is set to 0.001, the loss function is MSE, and the optimal model is saved according to the prediction accuracy evaluation index.
(3) Network incremental training based on transfer learning.
After obtaining the complete target domain data, load the source domain model, fix different network parameters, input the target domain data, train and update the remaining network layer weights, save the optimal target domain model, and output the target domain prediction results.
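As referenced in item (1), here is Equation (20) and its inverse as a small Python sketch; keeping amin and amax from the training data so that predictions can be mapped back to physical units is an assumption about the implementation.

```python
import numpy as np

def normalize(a: np.ndarray, a_min: float, a_max: float) -> np.ndarray:
    """Min-max normalization of Equation (20)."""
    return (a - a_min) / (a_max - a_min)

def denormalize(a_n: np.ndarray, a_min: float, a_max: float) -> np.ndarray:
    """Inverse normalization, applied to the model output in step (3)."""
    return a_n * (a_max - a_min) + a_min
```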

5. Example Analysis

To verify the effectiveness of the proposed method, load data from November 2012 to August 2013 in a region of eastern China are used as an example. The sampling interval of the load data is 15 min, the time span is 10 months, and a total of 28,277 load data points were collected. The eight months of load information from November 2012 to June 2013 are selected as the source domain data, the load information of July 2013 is selected as the target domain data, and the load information of August 2013 is used as the test set.

5.1. Evaluation Index Construction

The reconstruction accuracy and R-square are selected as the evaluation indexes of data reconstruction; the closer they are to 1, the higher the authenticity of the model's reconstruction. Root mean square error (RMSE) and mean absolute error (MAE) are used as the evaluation indexes of the prediction model; the smaller the RMSE and MAE, the closer the predicted values are to the real values [13]. The evaluation indexes are calculated as follows:
$$E_{R^2} = 1 - \frac{\sum_{i=1}^{n}\left(z_g - z_t\right)^2}{\sum_{i=1}^{n}\left(z_m - z_t\right)^2} \tag{21}$$
$$E_{acc} = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{\left|z_t - z_g\right|}{z_t}\right) \times 100\% \tag{22}$$
$$E_{rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_t - z_p\right)^2} \tag{23}$$
$$E_{mae} = \frac{1}{n}\sum_{i=1}^{n}\left|z_t - z_p\right| \tag{24}$$
where zg, zt, and zp are the generated, real, and predicted load values, respectively, and zm represents the average value of the missing data.
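Equations (21)-(24) translate directly into the numpy sketch below; treating zm as the mean of the true values at the reconstructed positions is an assumption based on the definition above.

```python
import numpy as np

def r_square(z_g: np.ndarray, z_t: np.ndarray) -> float:
    z_m = z_t.mean()                                    # average of the missing data
    return 1 - np.sum((z_g - z_t) ** 2) / np.sum((z_m - z_t) ** 2)

def accuracy(z_g: np.ndarray, z_t: np.ndarray) -> float:
    return np.mean(1 - np.abs(z_t - z_g) / z_t) * 100   # percent

def rmse(z_p: np.ndarray, z_t: np.ndarray) -> float:
    return float(np.sqrt(np.mean((z_t - z_p) ** 2)))

def mae(z_p: np.ndarray, z_t: np.ndarray) -> float:
    return float(np.mean(np.abs(z_t - z_p)))
```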

5.2. Comparative Analysis of Abnormal Data Reconstruction Results

In power grid data acquisition, phenomena such as runaway meter readings and communication faults occur from time to time, so the collected data sets easily contain missing and abnormal data. Missing values are randomly generated in the original load data set to simulate missing data caused by communication failures. Abnormal data ranging from 1.5 to 1.8 times the real value are injected to simulate the data anomalies caused by runaway meters, and labels are set to verify the identification of the abnormal data. The number of isolation trees is set to 100, and the contamination parameter is set according to the proportion of outliers. An isolation forest model is established to detect outliers in the data set, and the detection results are compared with the real labels.
The detection results are shown in Figure 9. The blue and black circles represent the normal and abnormal load data, respectively, while the red circles represent the detection results of the isolation forest. When a red circle coincides with a black circle, the isolation forest has identified the outlier correctly. As can be seen from Figure 9, the isolation forest accurately identifies the abnormal data caused by runaway meters.
The original load data were processed by the isolation forest to form missing data sets with missing rates of 10%, 20%, 40%, and 60%. The gradient penalty coefficient λ and the objective-function weights λ1 and λ2 are set to 40, 10, and 0.6, respectively. The complete data under the different missing rates are fed into the CGAN model for training; after the game between the generator and the discriminator reaches balance, the generated samples are output. The filling accuracy and R-square are selected as the evaluation indexes of the generated samples, which are compared with mean interpolation, KNN, and random forest. The KNN weight is set to "distance" with K = 8; the random forest uses 50 decision trees with a maximum depth of 20.
Figure 10 shows the data-filling effect of the different models at a missing rate of 40%. It can be seen that the CGAN fully excavates the nonlinear relationship between the load and its influencing factors and reconstructs the abnormal data more accurately; its generated load samples are closest to the real values. The filling effect of random forest and KNN is poor when dealing with long time series.
From the quantitative filling results in Table 3 and Table 4, it can be seen that the R-square and accuracy of the data reconstructed by the CGAN model are higher than those of the other three models at every missing rate, which verifies the effectiveness of the CGAN reconstruction. Across the different missing rates, the indexes of the mean interpolation method are the least ideal, the CGAN reconstruction is relatively stable, and the reconstruction accuracies of KNN and RF are similar, fluctuating greatly especially at high missing rates. Under the different missing rates, the R-square and reconstruction accuracy of the CGAN model constructed in this paper are improved by up to 8.1% and 3.5%, respectively, compared with the other three models.

5.3. Analysis on the Influence of Data Processing Methods on Prediction Results

The data generated by the models in Section 5.2 are used to fill the abnormal load data set, and the Bi-LSTM network is constructed with the parameter settings of Section 5.4. The complete data sets, together with the data set obtained by the deletion method, are then used to train the Bi-LSTM network to verify the effectiveness of the time-series reconstruction of the proposed model.
After the data set with a 40% missing rate is processed by the above five methods, the prediction results of the Bi-LSTM network are shown in Figure 11. The data set reconstructed by the CGAN model achieves the best fitting effect, which verifies the effectiveness of the CGAN reconstruction of the time series. The deletion method destroys the integrity of the data sequence, and the resulting data set gives the worst prediction. Mean interpolation, KNN, and random forest restore the integrity of the sequence to a certain extent, and their overall prediction effect is better than the deletion method, but a gap with the CGAN model remains. The quantitative prediction errors on the complete data sets obtained under different missing rates are shown in Table 5 and Table 6.
It can be seen from Table 5 and Table 6 that, for the load data set repaired by the CGAN, the prediction error of the Bi-LSTM network is the smallest, and the prediction accuracy is higher than with the other interpolation models. When the missing rate is 10%, the data set obtained by directly deleting the abnormal data is closest to the CGAN; as the missing rate increases, the prediction accuracy on the data sets obtained by the deletion method and mean interpolation becomes clearly worse. On the data set reconstructed by the CGAN, the RMSE and MAE of the Bi-LSTM network are reduced by at least 3.1% and 4.8%, respectively, compared with the other four processing methods.

5.4. Comparative Analysis of Prediction Results of Source Domain Model

The Bi-LSTM network can effectively avoid vanishing gradients when dealing with long time series. The Bi-LSTM network has many parameters, and different settings have a great impact on prediction performance. In this paper, three parameters with a great impact on the network are selected for study. The ultra-short-term load forecasting experiment is carried out on the data set processed by the CGAN model. The source domain data are divided in a ratio of 4:1: 80% of the data are used as the training set, and the rest as the test set. The prediction errors on the test set under different parameter combinations are shown in Table 7.
As can be seen from Table 7, when the number of iterations, time step, and batch size of the Bi-LSTM network are set to 200, 15, and 32, respectively, the load information learned by the network is the richest and the prediction error is the smallest. If the number of iterations is increased or reduced, the network overfits or underfits, reducing the prediction accuracy.
The Bi-LSTM network is constructed with the above parameters and compared with a BP network, an RNN, and SVR. The number of neurons in the hidden layer of the BP network is set to 32, and the RNN parameter settings are consistent with those of the Bi-LSTM network. The kernel function of the SVR model is the Gaussian kernel, and the penalty coefficient is 1.5. The load forecasting results for one day of the test set are shown in Figure 12.
As can be seen from Figure 12a, compared with the other two models, the recurrent neural networks fit the real values better; in the details of the fitting and error curves, the prediction of the Bi-LSTM network is closer to the real values than that of the RNN. Figure 12b shows the prediction error distribution curves of the different models: the errors of the Bi-LSTM network are concentrated around 0, and their fluctuation is smaller than that of the other models.
Table 8 compares the source domain prediction errors of the different models. The prediction accuracy of the Bi-LSTM model is better than that of the other models in terms of both RMSE and MAE. Compared with the RNN under the same parameter settings, RMSE and MAE decreased by 3.67% and 7.43%, respectively; compared with SVR and the BP network, RMSE decreased by 46.52% and 22.21%, and MAE decreased by 47.82% and 18.87%, respectively.

5.5. Comparative Analysis of Prediction Results of Target Domain Model

When the load data set changes, the load information of the latest data must be extracted to maintain prediction accuracy. However, extracting this information usually requires retraining the model on the new data set, which is not only time-consuming but also, as the amount of data increases, prone to vanishing gradients, underfitting, and other problems. In contrast, transfer learning can quickly update the network weights and realize efficient incremental training of the prediction model.
The source domain model saved in Section 5.4 is loaded, the target domain model parameters are updated with the target domain data, and the prediction results of four models are compared: one with no fixed network layers (BiLSTM-TL0) and three with layers 1 to 3 fixed (BiLSTM-TL1 to BiLSTM-TL3).
Figure 13a shows the test set prediction results obtained by fixing different network layers. All the transfer learning models fit the load curve well, but in the details of the curve, the predictions of the BiLSTM-TL1 model, obtained by fixing the first-layer network parameters, are closer to the true values; the load information extracted by this model is the richest. Because the data distribution of the target domain is similar to that of the source domain and the amount of target domain data is small, the BiLSTM-TL0 model is prone to overfitting, while the BiLSTM-TL2 and BiLSTM-TL3 models fix the deep network parameters and cannot effectively extract the overall characteristics of the target domain data.
Figure 13b shows the prediction error distribution curves of the different transfer learning models. The errors of BiLSTM-TL1 are concentrated around 0, the frequency of near-zero deviations is higher, and the error fluctuation is smaller than for the other models. Therefore, the prediction accuracy of the BiLSTM-TL1 model is better than that of the other transfer learning models.
Table 9 compares the test set prediction errors of the different transfer learning models. The BiLSTM-TL1 predictions are the best in terms of both RMSE and MAE; compared with the other transfer learning models, RMSE and MAE are reduced by 2.4% and 4.8%, respectively.
The BiLSTM-TL1 model is compared with the Bi-LSTM network (taking the source domain and target domain data together as the training set), the BiLSTM-S model (the source domain model saved in Section 5.4), a GRU network, a BP network, and SVR. The parameter settings of the Bi-LSTM network, BP network, and SVR are consistent with Section 5.4.
Figure 14a shows the test set prediction results of the different models. The prediction models all fit the load change trend well; however, the detail enlargement shows that the predicted values of the Bi-LSTM network are closer to the real values. Compared with the BiLSTM-S model, the prediction of the BiLSTM-TL1 model is improved and is similar to the Bi-LSTM network, indicating that no negative transfer occurs in the BiLSTM-TL1 model. Figure 14b shows the prediction error distribution curves of the different models in the target domain. The error distribution curves of the BiLSTM-TL1 model and the Bi-LSTM network are very similar, concentrated near 0, with small error fluctuation compared with the other models.
From the test set prediction results of the different models in Table 10, it can be seen that the comprehensive performance of the BiLSTM-TL1 model is the best across the three evaluation indexes of RMSE, MAE, and training time. Compared with the Bi-LSTM network without transfer learning, at the cost of a 3.9% reduction in prediction accuracy, the training time of the BiLSTM-TL1 model is shortened by 91.1%, improving the training efficiency of the model. Since the BiLSTM-TL1 model uses the latest sample data to update the network weights, its RMSE and MAE are reduced by 8.8% and 7.3%, respectively, compared with the BiLSTM-S model. The load prediction accuracy of BiLSTM-TL1 is similar to that of the GRU network, but BiLSTM-TL1 saves 87.6% of the training time. Compared with the BP network, the training time of the BiLSTM-TL1 model is similar, while its RMSE and MAE are reduced by 16.7% and 15.5%, respectively. Based on the above analysis, the BiLSTM-TL1 model, which introduces transfer learning and fixes the first-layer network parameters, can realize incremental network training, ensuring prediction accuracy while saving a great deal of model training time.

6. Conclusions

This paper proposes an ultra-short-term load dynamic forecasting method based on model incremental training and considering abnormal data reconstruction. The main conclusions are:
(1) An abnormal data processing method based on IF-CGAN is proposed. This method accurately eliminates abnormal data, fills in the abnormal points, completes the load sequence, and effectively reduces the impact of abnormal data on the prediction results.
(2) An ultra-short-term load forecasting model based on a Bi-LSTM network is proposed that realizes accurate prediction of high-dimensional, long-sequence ultra-short-term loads.
(3) A model incremental training method based on Bi-LSTM is proposed. By introducing transfer learning, the model weights can be adjusted quickly when the load data change. Compared with the Bi-LSTM network without transfer learning, the proposed method shortens the model training time by 91.1% at the cost of only a 3.9% reduction in prediction accuracy, effectively improving model training efficiency.
This paper mainly studies the accuracy and efficiency of ultra-short-term load forecasting under abnormal data. However, with the growing scale of data collection and the resulting massive, high-dimensional data, how to further improve the accuracy of abnormal load data reconstruction and realize online load forecasting needs further study.

Author Contributions

Conceptualization, G.C. and Y.Z. (Yangfei Zhang); methodology, G.C. and Y.W.; software, G.C. and Y.W.; validation, G.C., Y.W., and L.Y.; formal analysis, G.C. and Y.W.; investigation, G.C. and Y.W.; resources, G.C., G.L., and K.X.; data curation, Y.W., L.Y., and K.X.; writing—original draft preparation, Y.W.; writing—review and editing, G.C. and Y.W.; visualization, Y.W. and G.L.; supervision, G.C. and Y.Z. (Yuzhuo Zhang); project administration, G.C.; funding acquisition, G.C., L.Y., and K.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 52107098.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, B.; Zhang, J.; He, Y.; Wang, Y. Short-Term Load-Forecasting Method Based on Wavelet Decomposition With Second-Order Gray Neural Network Model Combined With ADF Test. IEEE Access 2017, 5, 16324–16331. [Google Scholar] [CrossRef]
  2. Azeem, A.; Ismail, I.; Jameel, S.M.; Harindran, V.R. Electrical Load Forecasting Models for Different Generation Modalities: A Review. IEEE Access 2021, 9, 142239–142263. [Google Scholar] [CrossRef]
  3. Shao, N.; Chen, Y. Abnormal Data Detection and Identification Method of Distribution Internet of Things Monitoring Terminal Based on Spatiotemporal Correlation. Energies 2022, 15, 2151. [Google Scholar] [CrossRef]
  4. Zhu, X.; Shen, M. Based on the ARIMA model with grey theory for short term load forecasting model. In Proceedings of the 2012 International Conference on Systems and Informatics, Yantai, China, 19–20 May 2012. [Google Scholar] [CrossRef]
  5. Ji, P.; Xiong, D.; Wang, P.; Chen, J. A study on exponential smoothing model for load forecasting. In Proceedings of the 2012 Asia-Pacific Power and Energy Engineering Conference, Shanghai, China, 27–29 March 2012. [Google Scholar] [CrossRef]
  6. Jawad, M.; Nadeem, M.S.A.; Shim, S.O.; Khan, I.R.; Shaheen, A.; Habib, N.; Hussain, L.; Aziz, W. Machine Learning Based Cost Effective Electricity Load Forecasting Model Using Correlated Meteorological Parameters. IEEE Access 2020, 8, 146847–146864. [Google Scholar] [CrossRef]
  7. Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952. [Google Scholar] [CrossRef] [Green Version]
  8. Alquthami, T.; Zulfiqar, M.; Kamran, M.; Milyani, A.H.; Rasheed, M.B. A Performance Comparison of Machine Learning Algorithms for Load Forecasting in Smart Grid. IEEE Access 2022, 10, 48419–48433. [Google Scholar] [CrossRef]
  9. Khan, A.H.; Li, S.; Cao, X.W. Tracking control of redundant manipulator under active remote center-of-motion constraints: An RNN-based metaheuristic approach. Sci. China 2021, 64, 149–166. [Google Scholar] [CrossRef]
  10. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting. IEEE Access 2020, 8, 180544–180557. [Google Scholar] [CrossRef]
  11. Gao, X.Y.; Wang, Y.; Gao, Y.; Sun, C.Z.; Xiang, W.; Yue, Y.M. Short-term Load Forecasting Model of GRU Network Based on Deep Learning Framework. In Proceedings of the 2018 2nd IEEE Conference on Energy Internet and Energy System Integration, Beijing, China, 20–22 October 2018. [Google Scholar] [CrossRef]
  12. Rafi, S.H.; Masood, N.A.; Deeba, S.R.; Hossain, E. A Short-Term Load Forecasting Method Using Integrated CNN and LSTM Network. IEEE Access 2021, 9, 32436–32448. [Google Scholar] [CrossRef]
  13. Shohan, M.J.A.; Faruque, M.O.; Foo, S.Y. Forecasting of Electric Load Using a Hybrid LSTM-Neural Prophet Model. Energies 2022, 15, 2158. [Google Scholar] [CrossRef]
  14. Mohammadi, A.; Saraee, M.H. Dealing with missing values in microarray data. In Proceedings of the 2008 4th International Conference on Emerging Technologies, Rawalpindi, Pakistan, 18–19 October 2008. [Google Scholar] [CrossRef]
  15. Silva, L.O.; Zárate, L.E. A brief review of the main approaches for treatment of missing data. Intell. Data Anal. 2014, 18, 1177–1198. [Google Scholar] [CrossRef]
  16. Pessanha, J.; Melo, A.; Caldas, R.; Falcão, D. An Approach for Data Treatment of Solar Photovoltaic Generation. IEEE Lat. Am. Trans. 2020, 18, 1563–1571. [Google Scholar] [CrossRef]
  17. Seaman, S.R.; White, I.R. Review of inverse probability weighting for dealing with missing data. Stat. Methods Med. Res. 2013, 22, 278–295. [Google Scholar] [CrossRef] [PubMed]
  18. Batista, G.E.A.P.A.; Monard, M.C. An analysis of four missing data treatment methods for supervised learning. Appl. Artif. Intell. 2003, 17, 519–533. [Google Scholar] [CrossRef]
  19. Hastie, T.; Mazumder, R.; Lee, J.; Zadeh, R. Matrix completion and low-rank SVD via fast alternating least squares. J. Mach. Learn. Res. 2014, 16, 3367–3402. [Google Scholar] [CrossRef]
  20. Deng, W.; Guo, Y.; Liu, J.; Li, Y.; Liu, D.; Zhu, L. A missing power data filling method based on improved random forest algorithm. Chin. J. Electr. Eng. 2019, 5, 33–39. [Google Scholar] [CrossRef]
  21. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8 December 2014. [Google Scholar] [CrossRef]
  22. Oh, E.; Kim, T.; Ji, Y.; Khyalia, S. STING: Self-attention based Time-series Imputation Networks using GAN. In Proceedings of the 2021 IEEE International Conference on Data Mining, Auckland, New Zealand, 7–10 December 2021. [Google Scholar] [CrossRef]
  23. Li, J.; He, H.; Li, L. CGAN-MBL for Reliability Assessment with Imbalanced Transmission Gear Data. IEEE Trans. Instrum. Meas. 2019, 68, 3173–3183. [Google Scholar] [CrossRef]
  24. Hariri, S.; Kind, M.C.; Brunner, R.J. Extended Isolation Forest. IEEE Trans. Knowl. Data Eng. 2021, 33, 1479–1489. [Google Scholar] [CrossRef] [Green Version]
  25. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of wasserstein GANs. In Proceedings of the Neural Information Processing Systems, Los Angeles, CA, USA, 4–9 December 2017. [Google Scholar]
  26. Wei, Y.; Zhu, C.; Yang, Y.; Liu, Y. A Discrete Cosine Model of Light Field Sampling for Improving Rendering Quality of Views. In Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing, Macau, China, 1–4 December 2020. [Google Scholar] [CrossRef]
  27. Shao, L.; Zhu, F.; Li, X. Transfer Learning for Visual Categorization: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1019–1034. [Google Scholar] [CrossRef] [PubMed]
  28. Dang, W.; Li, H.J.; Ding, Z.M.; Nie, F.P.; Chen, J.Y.; Dong, X.; Wang, Z.H. Rethinking Maximum Mean Discrepancy for Visual Domain Adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2015, 1–14. [Google Scholar] [CrossRef]
Figure 1. Basic structure of CGAN.
Figure 2. CGAN structure design: (a) Generator network structure; (b) Discriminator network structure.
Figure 3. Abnormal data reconstruction process based on IF-CGAN.
Figure 4. LSTM structure.
Figure 5. Bi-LSTM structure.
Figure 6. Transfer learning principle.
Figure 7. Bi-LSTM network incremental training flow chart.
Figure 8. Ultra-short-term load dynamic forecasting framework considering abnormal data reconstruction based on model incremental training.
Figure 9. Abnormal data detection results.
Figure 10. Reconstruction results with a deletion rate of 40%.
Figure 11. Prediction results under different data processing methods.
Figure 12. Comparison of prediction results of source domain: (a) Comparison of prediction results of different models; (b) Prediction error distribution curve of different models.
Figure 13. Comparison of prediction results of transfer learning model: (a) Comparison of prediction results of transfer learning model; (b) Prediction error distribution curve of transfer learning model.
Figure 14. Comparison of prediction results of test set: (a) Comparison of prediction results of different model test sets; (b) Prediction error distribution curves of different model test sets.
Table 1. Generator network parameters.

| Model | Parameter Name | Specification |
|---|---|---|
| CNN 1 (2D convolution) | convolution kernel | 3 × 3 |
| | number of filters | 32 |
| | step | 1 |
| | activation function | ReLU |
| | regularization | 32 |
| CNN 2 (2D convolution) | convolution kernel | 5 × 5 |
| | number of filters | 64 |
| | step | 2 |
| | activation function | ReLU |
| | regularization | 64 |
| CNN 3 (2D convolution) | convolution kernel | 3 × 3 |
| | number of filters | 16 |
| | step | 1 |
| | activation function | ReLU |
| | regularization | 16 |
| FC | output dimension | 13 × 1 |
Table 2. Discriminator network parameters.

| Model | Parameter Name | Specification |
|---|---|---|
| CNN 1 (2D convolution) | convolution kernel | 3 × 3 |
| | number of filters | 32 |
| | step | 1 |
| | activation function | LeakyReLU |
| CNN 2 (2D convolution) | convolution kernel | 5 × 5 |
| | number of filters | 64 |
| | step | 2 |
| | activation function | LeakyReLU |
| CNN 3 (2D convolution) | convolution kernel | 3 × 3 |
| | number of filters | 16 |
| | step | 1 |
| | activation function | LeakyReLU |
| FC | output dimension | 1 × 1 |
Table 3. R-square comparison of reconstruction results of different models.

| Deletion Rate | CGAN | KNN | RF | Mean |
|---|---|---|---|---|
| 10% | 0.9447 | 0.9255 | 0.9166 | 0.8931 |
| 20% | 0.9248 | 0.9116 | 0.8869 | 0.8745 |
| 40% | 0.8938 | 0.8525 | 0.8674 | 0.8695 |
| 60% | 0.8785 | 0.8193 | 0.8123 | 0.8227 |
Table 4. Comparison of accuracy (%) of reconstruction results of different models.

| Deletion Rate | CGAN | KNN | RF | Mean |
|---|---|---|---|---|
| 10% | 97.00 | 95.84 | 94.96 | 94.53 |
| 20% | 96.35 | 95.20 | 93.09 | 93.86 |
| 40% | 96.02 | 94.76 | 93.47 | 93.57 |
| 60% | 95.53 | 92.82 | 92.31 | 92.58 |
Table 5. Influence of different processing methods on the root mean square error (RMSE/kW) of prediction.

| Deletion Rate | CGAN | KNN | RF | Mean | Deletion |
|---|---|---|---|---|---|
| 10% | 43.48 | 48.35 | 64.41 | 46.11 | 44.87 |
| 20% | 48.87 | 57.81 | 62.59 | 51.45 | 54.66 |
| 40% | 58.31 | 64.58 | 71.92 | 65.27 | 68.31 |
| 60% | 70.09 | 80.53 | 79.61 | 82.69 | 84.26 |
Table 6. Influence of different processing methods on the mean absolute error (MAE/kW) of prediction.

| Deletion Rate | CGAN | KNN | RF | Mean | Deletion |
|---|---|---|---|---|---|
| 10% | 34.26 | 37.71 | 48.67 | 35.94 | 35.69 |
| 20% | 38.33 | 45.78 | 46.44 | 40.26 | 42.79 |
| 40% | 44.18 | 50.41 | 49.27 | 52.03 | 51.38 |
| 60% | 51.31 | 55.08 | 53.91 | 57.65 | 60.81 |
Table 7. Comparison of prediction errors under different parameter combinations.

| Epoch | Time Step | Batch Size | RMSE | MAE |
|---|---|---|---|---|
| 200 | 10 | 16 | 91.87 | 63.20 |
| 200 | 10 | 32 | 67.13 | 49.65 |
| 200 | 10 | 64 | 91.60 | 68.23 |
| 200 | 15 | 32 | 48.87 | 38.33 |
| 200 | 20 | 32 | 216.46 | 159.6 |
| 100 | 15 | 32 | 88.56 | 63.04 |
| 300 | 15 | 32 | 170.6 | 102.4 |
Table 8. Source domain prediction error comparison.

| Model | RMSE | MAE |
|---|---|---|
| Bi-LSTM | 48.87 | 38.33 |
| RNN | 52.71 | 41.41 |
| SVR | 91.33 | 73.46 |
| BP network | 62.78 | 47.24 |
Table 9. Comparison of prediction errors of transfer learning models.

| Model | RMSE | MAE |
|---|---|---|
| BiLSTM-TL0 | 47.06 | 36.82 |
| BiLSTM-TL1 | 46.42 | 35.93 |
| BiLSTM-TL2 | 48.80 | 37.47 |
| BiLSTM-TL3 | 50.02 | 38.23 |
Table 10. Comparison of prediction errors in the target domain.

| Model | RMSE | MAE | Time/s |
|---|---|---|---|
| BiLSTM-TL1 | 46.42 | 35.93 | 64.8 |
| Bi-LSTM | 44.60 | 34.83 | 724.6 |
| BiLSTM-S | 50.92 | 38.78 | 0 |
| GRU | 45.93 | 35.12 | 523.8 |
| BP network | 55.76 | 42.67 | 62.7 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
