Article

A Combined Model Incorporating Improved SSA and LSTM Algorithms for Short-Term Load Forecasting

School of Intelligent Science and Engineering, Hubei Minzu University, Enshi 455000, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(12), 1835; https://doi.org/10.3390/electronics11121835
Submission received: 19 May 2022 / Revised: 1 June 2022 / Accepted: 7 June 2022 / Published: 9 June 2022
(This article belongs to the Section Systems & Control Engineering)

Abstract: To address the current difficulties of short-term load forecasting (STLF), this paper proposes a combined forecasting method based on an improved sparrow search algorithm (ISSA), which fuses Cauchy mutation and opposition-based learning (OBL), to optimize the hyperparameters of a long short-term memory (LSTM) network. For the sparrow search algorithm (SSA), a Sin-chaotic initialization population, whose mapping has an unlimited number of folds, is first used to lay the foundation for the global search. Secondly, the previous generation's global optimal solution is introduced into the discoverer location-update rule to improve the adequacy of the global search, while adaptive weights are added to balance the algorithm's local-exploitation and global-search abilities and to accelerate convergence. Then, fusing the Cauchy mutation operator and the OBL strategy, a perturbation mutation is performed at the optimal-solution position to generate a new solution, which in turn strengthens the algorithm's ability to escape local optima. Finally, the ISSA-LSTM forecasting model is constructed and validated on the power-load data of a region, and experimental comparisons with several algorithms confirm the superiority of the ISSA-LSTM model.

1. Introduction

High stability and economy of the power supply, as well as providing customers with high-quality power, are the main tasks of power systems. However, because electrical energy cannot be stored in large quantities, and because the five links of generation, transmission, distribution, transformation, and consumption must take place simultaneously, the system must follow load changes when producing electrical energy, i.e., it must realize a dynamic balance between electrical-energy production and consumption. Otherwise, the quality and economy of the electrical energy cannot be guaranteed, which may even seriously affect the safe and stable operation of the system [1]. Load forecasting (LF) can generally be divided into four types: long-term, medium-term, short-term, and ultra-short-term, whose related literature surveys account for 16%, 20%, 58%, and 6% of past research efforts, respectively. This shows that STLF is a key focus and hot topic in this field [2].
STLF has been developed through decades of research and can be divided into three main categories of forecasting methods. The first category is traditional statistical methods, mainly comprising linear regression (LR) [3], autoregression (AR) [4], and autoregressive moving-average (ARMA) methods [5]. Statistical methods are simple in structure and easy to model, but the distribution characteristics of the input data have a large impact on their output. The second category is machine-learning approaches, including gray systems, artificial neural networks (ANNs) [6], and support vector machines (SVMs) [7]. The SVM algorithm can be applied to linear and nonlinear problems with a low generalization-error rate and can handle high-dimensional problems that defeat traditional algorithms, but it converges slowly and loses accuracy on large-volume time series. The back-propagation (BP) neural network, an ANN method, has a strong nonlinear mapping capability, can automatically extract input-output features, and adjusts its network weights during training, but its convergence is slow and prone to local minima; moreover, features must be manually specified for time-series data, which destroys the integrity of the time series. The third category is the combined-model prediction method, which generally optimizes the multiple hyperparameters present in a model by coupling it with an optimization algorithm, as in PSO-BP, PSO-LSTM, etc.
Short-term power-load data are usually compound time series containing both the load's own fluctuations and related factors; they are temporal and nonlinear, and statistical methods struggle to model nonlinear time series. Although traditional machine-learning methods can overcome this obstacle, the time-series integrity of the input information is difficult to preserve [8]. In recent years, with the improvement of computing hardware, deep learning, such as the deep neural network (DNN) [9] and the deep belief network (DBN) [10], has developed rapidly and become a hot spot of load-forecasting research [11], with DNN and DBN having been applied to improve prediction accuracy over traditional algorithms. A recurrent neural network (RNN) can, in principle, process time series of arbitrary length by using self-feedback neurons that give the network short-term memory, and it is usually trained with a gradient-descent algorithm, but gradient explosion and vanishing occur when the input sequence is long. LSTM networks, which introduce gating mechanisms to alleviate these problems, have been widely used in time-series processing [12,13,14]. However, the values of the hyperparameters set in the LSTM model, such as the number of iterations, the learning rate, and the number of hidden layers with their internal neurons, are often not optimal, so the model does not achieve its best prediction results. The number of iterations and the learning rate govern the training process and effectiveness of the LSTM model, while the number of hidden layers and their internal neurons affect the fitting capacity of the LSTM [15]. Hyperparameters are usually set by manual experience, which has poor generality and high uncertainty. Therefore, this paper considers a combination of algorithms to build the prediction model.
A comprehensive analysis of solution speed, stability, and convergence accuracy in the literature [16] showed that the sparrow search algorithm is highly competitive with the gray-wolf-optimization (GWO) algorithm, the particle-swarm-optimization (PSO) algorithm, and the gravitational-search algorithm (GSA), but the SSA suffers from reduced population diversity late in the iteration, making it prone to falling into local extremes [17]. In view of this, this paper makes a series of improvements to the SSA, combines it with LSTM to propose an ISSA-LSTM algorithm, and applies it to STLF. Finally, this paper conducts an example validation based on the load data of a region and an experimental comparison with several algorithms, confirming the superiority of the ISSA-LSTM model.

2. Long Short-Term Memory Network

LSTM is a recurrent neural-network model improved on the basis of the recurrent neural-network (RNN) model; the composition of an RNN is shown in Figure 1 below. A traditional neural network can only establish weighted connections between layers, whereas the neurons within each RNN cell also establish weighted connections among themselves, which is the biggest difference between an RNN and a traditional neural network. That is, as the sequence advances, earlier hidden states influence later ones, which makes RNNs better suited than other neural-network models to temporal-sequence problems. However, RNNs cannot handle long-range dependencies and are highly susceptible to gradient vanishing and explosion, owing to their directed loop of information transfer [18]. LSTM resolves these shortcomings of the RNN. First, LSTM can learn long-term dependencies. Second, the core idea of LSTM is to include three gating units within each recurrent unit, so that the information of key nodes is selectively remembered or forgotten, which greatly alleviates the gradient vanishing and explosion that RNN models suffer on long time-series problems [19]. The left half of Figure 2 below shows the Simple Recurrent Network (SRN) unit, and the right half shows the LSTM module used in the hidden layer of the RNN.
Taking $(x_1, x_2, \dots, x_t)$ as the series of inputs to the model and $(h_1, h_2, \dots, h_t)$ as the model's hidden states, Equations (1)–(5) hold at time t.
$i_t = \delta\left(W_{hi} h_{t-1} + W_{xi} x_t + b_i\right)$ (1)
$f_t = \delta\left(W_{hf} h_{t-1} + W_{xf} x_t + b_f\right)$ (2)
$c_t = f_t \odot c_{t-1} + i_t \odot g\left(W_{hc} h_{t-1} + W_{xc} x_t + b_c\right)$ (3)
$o_t = \delta\left(W_{ho} h_{t-1} + W_{xo} x_t + W_{co} c_t + b_o\right)$ (4)
$h_t = o_t \odot g\left(c_t\right)$ (5)
In Equations (1)–(5), $f_t$, $i_t$, $o_t$, and $c_t$ denote the forget gate, input gate, output gate, and cell state, respectively. $W_{hf}, W_{xf}, W_{hi}, W_{xi}, W_{hc}, W_{xc}, W_{ho}, W_{xo}, W_{co}$ denote the weight matrices of the corresponding gates, and $b_f$, $b_i$, $b_c$, $b_o$ denote the bias terms of each gate. $\odot$ denotes the element-wise product, and $\delta(\cdot)$ and $g(\cdot)$ denote the sigmoid and tanh activation functions, respectively.
A linear-regression layer is also added so that the LSTM can be used for prediction; its expression is shown in Equation (6).
$y_t = W_{yo} h_t + b_y$ (6)
In Equation (6), $b_y$ denotes the bias of the linear-regression layer and $y_t$ denotes the result predicted by the model.
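To make Equations (1)–(6) concrete, the following NumPy sketch implements one forward step of an LSTM cell plus the regression layer. The weight names mirror the paper's notation; the shapes, the random initialization, and the helper init_params are illustrative assumptions, not the authors' implementation (the paper's experiments used MATLAB).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_h, rng=np.random.default_rng(0)):
    """Randomly initialized parameters; the sizes and scale are assumptions."""
    p = {}
    for gate in "ifco":
        p[f"W_h{gate}"] = rng.normal(0, 0.1, (n_h, n_h))
        p[f"W_x{gate}"] = rng.normal(0, 0.1, (n_h, n_in))
        p[f"b_{gate}"] = np.zeros(n_h)
    p["w_co"] = rng.normal(0, 0.1, n_h)       # peephole weight of Eq. (4), taken as diagonal
    p["W_yo"] = rng.normal(0, 0.1, (1, n_h))  # regression layer, Eq. (6)
    p["b_y"] = np.zeros(1)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of an LSTM cell followed by the linear-regression layer."""
    i_t = sigmoid(p["W_hi"] @ h_prev + p["W_xi"] @ x_t + p["b_i"])    # input gate, Eq. (1)
    f_t = sigmoid(p["W_hf"] @ h_prev + p["W_xf"] @ x_t + p["b_f"])    # forget gate, Eq. (2)
    c_t = f_t * c_prev + i_t * np.tanh(p["W_hc"] @ h_prev + p["W_xc"] @ x_t + p["b_c"])  # Eq. (3)
    o_t = sigmoid(p["W_ho"] @ h_prev + p["W_xo"] @ x_t + p["w_co"] * c_t + p["b_o"])     # Eq. (4)
    h_t = o_t * np.tanh(c_t)                                          # Eq. (5)
    y_t = p["W_yo"] @ h_t + p["b_y"]                                  # Eq. (6)
    return h_t, c_t, y_t

# Hypothetical usage: 6 input features (load plus 5 influencing factors), 8 hidden units.
p = init_params(n_in=6, n_h=8)
h, c = np.zeros(8), np.zeros(8)
for x_t in np.random.default_rng(1).random((24, 6)):  # one day of hourly inputs
    h, c, y = lstm_step(x_t, h, c, p)
```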

3. Sparrow Search Algorithm

The SSA is a novel intelligent-optimization algorithm recently introduced by Jiankai Xue, inspired by the predatory and anti-predatory behaviors of sparrows in biology. The size of the sparrow population is denoted by N. The sparrow-set matrix is shown in Equations (7) and (8) below, where i = 1, 2, …, N and d is the dimension of the decision variable.
$X = \left[x_1, x_2, \dots, x_N\right]^T$ (7)
$x_i = \left[x_{i,1}, x_{i,2}, \dots, x_{i,d}\right]$ (8)
The matrix of sparrow fitness values is shown in Equations (9) and (10), where N denotes the number of sparrows and each entry of $F_x$ is the fitness value of an individual.
$F_x = \left[f(x_1), f(x_2), \dots, f(x_N)\right]^T$ (9)
$f(x_i) = \left[f(x_{i,1}), f(x_{i,2}), \dots, f(x_{i,d})\right]$ (10)
The sparrows with better fitness values are the first to obtain food and act as discoverers, leading the entire population toward the food location. The discoverer position is updated according to Equation (11).
$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$ (11)
In Equation (11), t denotes the current iteration number, j = 1, 2, ⋯, d, and $X_{i,j}^{t}$ is the position of sparrow i in the j-th dimension. $iter_{max}$ is the maximum number of iterations, α is a random number within (0, 1), and $R_2$ ($R_2 \in [0, 1]$) and ST ($ST \in [0.5, 1]$) denote the alarm value and the safety value, respectively. Q is a random number drawn from the standard normal distribution, and L is a 1 × d matrix of ones. When $R_2 < ST$, the vicinity is safe and the discoverer performs an extensive search; when $R_2 \geq ST$, the discoverer has detected danger and signals the entire population to move to a safer location.
The positions of the followers are calculated according to Equation (12).
$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^2}\right), & i > \dfrac{N}{2} \\ X_P^{t+1} + \left|X_{i,j}^{t} - X_P^{t+1}\right| \cdot A^+ \cdot L, & \text{otherwise} \end{cases}$ (12)
In Equation (12), $X_{worst}$ represents the global worst position and $X_P$ represents the best position currently occupied by the discoverers. A denotes a 1 × d matrix whose elements are randomly assigned the value 1 or −1, and $A^+ = A^T\left(AA^T\right)^{-1}$. When $i > N/2$, the i-th follower, having poor fitness, has not obtained food; its energy is therefore low and it must forage elsewhere to replenish it.
When foraging begins, some sparrows are selected by the population to stand guard. When danger appears nearby, the discoverers and followers abandon the food they have found and flee to other locations. A fraction of 10–20% of the population, denoted SD, is arbitrarily selected for early warning in each generation. Their positions are calculated according to Equation (13).
$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta\left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + k\left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{\left(f_i - f_w\right) + \varepsilon}\right), & f_i = f_g \end{cases}$ (13)
In Equation (13), $X_{best}$ represents the global best position. β is the step-size adjustment factor, a normally distributed random number with mean zero and variance one. k denotes a uniform random number within [−1, 1] that indicates the direction of the sparrow's movement and also acts as a step adjustment factor. $f_i$ is the current sparrow's fitness value, and $f_g$ and $f_w$ are the current global best and worst fitness values, respectively. ε is a small constant that mainly prevents the denominator from being zero. When $f_i > f_g$, the sparrow is at the outer edge of the population and is more exposed to predators; when $f_i = f_g$, a sparrow in the center of the population has sensed the danger and moves closer to the other sparrows.
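For illustration, one iteration of the standard SSA updates (Equations (11)–(13)) can be sketched as follows. This is a minimal reading of the formulas with the parameter defaults of Section 6.1 (PD = 0.7, SD = 0.2, ST = 0.6) assumed, not the authors' code; fitness is treated as a minimization objective, and the fitness values from the start of the iteration are reused within the step as a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssa_step(X, fit, t, iter_max, ST=0.6, PD=0.7, SD=0.2, eps=1e-50):
    """One illustrative SSA iteration over population X (N x d) with fitness fit."""
    N, d = X.shape
    order = np.argsort(fit)                # ascending fitness: better sparrows first
    X, fit = X[order].copy(), fit[order].copy()
    n_disc = max(1, int(PD * N))           # number of discoverers
    R2 = rng.random()                      # alarm value for this iteration
    for i in range(n_disc):                # discoverer update, Eq. (11)
        if R2 < ST:
            alpha = rng.random()
            X[i] = X[i] * np.exp(-(i + 1) / (alpha * iter_max))
        else:
            X[i] = X[i] + rng.normal()     # Q * L
    best, worst = X[0].copy(), X[-1].copy()
    for i in range(n_disc, N):             # follower update, Eq. (12)
        if i + 1 > N / 2:
            X[i] = rng.normal() * np.exp((worst - X[i]) / (i + 1) ** 2)
        else:
            A = rng.choice([-1.0, 1.0], size=d)
            A_plus = A / (A @ A)           # A^+ = A^T (A A^T)^-1 for the 1 x d matrix A
            X[i] = best + (np.abs(X[i] - best) @ A_plus) * np.ones(d)
    for i in rng.choice(N, max(1, int(SD * N)), replace=False):  # vigilantes, Eq. (13)
        if fit[i] > fit[0]:                # f_i > f_g: at the edge of the population
            X[i] = best + rng.normal() * np.abs(X[i] - best)
        else:                              # f_i = f_g: in the center, move toward others
            X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) / (fit[i] - fit[-1] + eps)
    return X
```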

4. Improved Sparrow-Search Algorithm

4.1. Sin-Chaos-Initialization Population

Chaos is frequently applied to address optimization-search problems. The Tent model and the Logistic model are the most commonly used chaotic models, but both of them are limited in the number of mapping folds. Unlike the first two models, the number of mapping folds of the sine-chaotification model is unrestricted. Haidong Yang et al. [20] demonstrated that the Sin model has better chaotic properties than the Logistic model, so this paper uses Sin chaos for the SSA algorithm. The Sin chaos 1-dimensional self-mapping expression is shown in Equation (14).
$x_{n+1} = \sin\left(\dfrac{2}{x_n}\right), \quad n = 0, 1, \dots, N-1, \; -1 \leq x_n \leq 1, \; x_n \neq 0$ (14)
The initial value in Equation (14) cannot be set to zero, because a zero initial value produces immobile points and zeros within [−1, 1]. The relationships between initial-value sensitivity, ergodicity, randomness, and the number of iterations of the Sin-chaotic one-dimensional self-map are shown in Figure 3 [20]. Sub-graphs (a) and (b) of Figure 3 show that different initial values produce different chaotic sequences, and Figure 3c shows that after a certain number of generations the system traverses the whole solution region.
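A minimal sketch of Sin-chaotic population initialization under Equation (14) might look as follows; the nonzero seed value and the affine mapping of the chaotic value into the search bounds are assumptions.

```python
import numpy as np

def sin_chaos_init(N, d, lb, ub, seed=0.7):
    """Initialize an N x d population from the Sin-chaotic map of Eq. (14)."""
    lb = np.broadcast_to(lb, d).astype(float)
    ub = np.broadcast_to(ub, d).astype(float)
    pop = np.empty((N, d))
    x = seed                      # x_0 must not be 0 (the map is undefined there)
    for i in range(N):
        for j in range(d):
            x = np.sin(2.0 / x)   # Eq. (14): x_{n+1} = sin(2 / x_n), values in [-1, 1]
            pop[i, j] = lb[j] + (x + 1.0) / 2.0 * (ub[j] - lb[j])  # map into [lb, ub]
    return pop

pop = sin_chaos_init(N=10, d=4, lb=0.0, ub=1.0)
```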

4.2. Dynamic Self-Adaptation Inertia Weights

For the discoverer, approaching the global optimal solution at the very beginning of the iteration leads to low search accuracy, and the resulting overly small search range makes it easy to become trapped in a local-extremum region. In this paper, the previous generation's global optimal solution is introduced into the discoverer position-update formula, so that the discoverer position is influenced both by the previous generation of discoverer positions and by the previous generation's global optimum, which effectively prevents the best value found by the algorithm from always being a local extremum. Furthermore, drawing on the concept of inertia weights, this paper adds a dynamic inertia-weight parameter w to the discoverer position update [21]. Early in the iterative process, w is large, which favors global exploration; later in the iteration, w decreases adaptively, which favors the local search and also lets the algorithm converge faster. w is calculated as shown in Equation (15), and the improved discoverer location update is shown in Equation (16).
$\omega = \dfrac{e^{2\left(1 - t/iter_{max}\right)} - e^{-2\left(1 - t/iter_{max}\right)}}{e^{2\left(1 - t/iter_{max}\right)} + e^{-2\left(1 - t/iter_{max}\right)}}$ (15)
$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} + \omega\left(f_{j,g}^{t} - X_{i,j}^{t}\right) \cdot rand, & R_2 < ST \\ X_{i,j}^{t} + Q, & R_2 \geq ST \end{cases}$ (16)
In Equation (16), $f_{j,g}^{t}$ denotes the previous generation's global best solution in the j-th dimension, and rand represents a random number between 0 and 1.
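As a sketch, Equations (15) and (16) translate directly into code; note that the weight of Equation (15) is exactly tanh(2(1 − t/iter_max)), which starts near 1 and decays to 0.

```python
import numpy as np

def inertia_weight(t, iter_max):
    """Dynamic adaptive weight w of Eq. (15), a tanh-shaped decay from ~1 to 0."""
    u = 2.0 * (1.0 - t / iter_max)
    return (np.exp(u) - np.exp(-u)) / (np.exp(u) + np.exp(-u))  # equals tanh(u)

def discoverer_update(X_i, f_g, t, iter_max, R2, ST, rng):
    """Improved discoverer update of Eq. (16); f_g is the previous generation's
    global best solution (per dimension). An illustrative sketch."""
    w = inertia_weight(t, iter_max)
    if R2 < ST:
        return X_i + w * (f_g - X_i) * rng.random(X_i.shape)
    return X_i + rng.normal()  # Q: a standard-normal random number
```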

4.3. Improved Scouting-Warning-Sparrow-Update Formula

The formula for calculating the position of the detection-warning sparrow is improved according to Equation (17).
$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta\left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i \neq f_g \\ X_{best}^{t} + \beta\left|X_{worst}^{t} - X_{best}^{t}\right|, & f_i = f_g \end{cases}$ (17)
Equation (17) states that if the sparrow is not at the best location, it flies to a random position between itself and the best location; otherwise, its new location is chosen randomly between the worst and the best locations.
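A short sketch of Equation (17), assuming β is the standard-normal step-size factor of the original SSA:

```python
import numpy as np

def vigilante_update(X_i, X_best, X_worst, f_i, f_g, rng):
    """Improved scouting-warning (vigilante) update of Eq. (17)."""
    beta = rng.normal()          # standard-normal step-size factor
    if f_i != f_g:               # not at the best location: move between self and best
        return X_best + beta * np.abs(X_i - X_best)
    # at the best location: move between the worst and the best positions
    return X_best + beta * np.abs(X_worst - X_best)
```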

4.4. Incorporating Cauchy Variation and Opposition-Based Learning Strategies

OBL is a method proposed by Tizhoosh that derives the corresponding opposite solution from the current solution through an opposition-based learning mechanism and then keeps the better of the two after evaluation and comparison. To enhance the ability of individuals to reach the best solution, this paper incorporates the OBL strategy into the SSA; its mathematical characterization is shown in Equations (18) and (19).
$\overline{X}_{best}^{t} = ub + r \oplus \left(lb - X_{best}^{t}\right)$ (18)
$X_{i,j}^{t+1} = X_{best}^{t} + b_1 \oplus \left(\overline{X}_{best}^{t} - X_{best}^{t}\right)$ (19)
In Equation (18), $\overline{X}_{best}^{t}$ denotes the opposite solution derived from the t-th generation's optimal solution, and r is a 1 × d matrix (d is the spatial dimension) of random numbers obeying the standard uniform distribution on (0, 1). ub and lb denote the upper and lower bounds, respectively. In addition, ⊕ represents the exclusive OR operation, and $b_1$ is the information-exchange control parameter [22], calculated as shown in Equation (20).
$b_1 = \left(iter_{max} - t\right) / iter_{max}$ (20)
The Cauchy variation is derived from the Cauchy distribution, and Equation (21) is the one-dimensional Cauchy distribution probability-density expression.
$f(x) = \dfrac{1}{\pi} \cdot \dfrac{a}{a^2 + x^2}, \quad x \in \left(-\infty, +\infty\right)$ (21)
When a = 1, it is known as the standard Cauchy distribution. Figure 4 compares the probability-density curves of the Gaussian and Cauchy distributions. As can be clearly observed in Figure 4, the two tails of the Cauchy distribution are flat and long and approach zero more gently and slowly than those of the Gaussian distribution, while its peak near the origin is lower. The Cauchy mutation therefore has a stronger perturbation ability than the Gaussian mutation. Hence, applying the Cauchy mutation to the target-position calculation exploits the perturbation ability of the Cauchy operator to upgrade the global-search performance of the SSA.
The location update method is shown in Equation (22), and the cauchy(0, 1) in Equation (22) denotes the standard Cauchy distribution.
$X_{i,j}^{t+1} = X_{best}^{t} + cauchy(0, 1) \oplus X_{best}^{t}$ (22)
In addition, let the generating function of Cauchy-distributed random variables be η, whose expression is shown in Equation (23) below.
$\eta = \tan\left[\left(\xi - 0.5\right)\pi\right]$ (23)
In Equation (23), tan is the tangent function and ξ denotes a random number between 0 and 1.
To further improve the algorithm's optimum-seeking ability, this paper alternates between the OBL strategy and the Cauchy-mutation perturbation strategy with a certain probability, thereby updating the target location dynamically. The former obtains an opposite solution through the opposition-based learning mechanism, enlarging the algorithm's search space. The latter derives a new solution by performing a perturbation-mutation operation at the best solution's location with the Cauchy operator, which mitigates the algorithm's inability to escape local regions. The target position is updated with the selection probability $P_s$ [22] of Equation (24) as the basis for strategy selection.
$P_s = \dfrac{\exp\left(1 - t/iter_{max}\right)}{20} + \theta$ (24)
In Equation (24), θ denotes the adjustment parameter, set to 1/20 in this paper.
If rand < $P_s$, the target position is updated using the OBL strategy of Equations (18)–(20); otherwise, the Cauchy-mutation perturbation strategy of Equation (22) is selected.
Although the perturbation strategy above reduces the chance of the algorithm becoming stuck in a local region, it introduces a new problem: the fitness of the new location produced by the perturbation is not guaranteed to be better than that of the original location. Accordingly, this paper introduces the greedy rule shown in Equation (25), where f(x) denotes the fitness of position x: whether the position information is updated is decided by comparing the fitness values before and after the perturbation.
$X_{best} = \begin{cases} X_{i,j}^{t+1}, & f\left(X_{i,j}^{t+1}\right) < f\left(X_{best}\right) \\ X_{best}, & f\left(X_{i,j}^{t+1}\right) \geq f\left(X_{best}\right) \end{cases}$ (25)
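The whole perturbation stage (Equations (18)–(25)) can be sketched as below. Interpreting the ⊕ operator of Equations (18), (19), and (22) as element-wise multiplication is an assumption, as is clipping the new solution back to the search bounds.

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb_best(X_best, lb, ub, t, iter_max, f, theta=1.0 / 20.0):
    """Perturb the current best solution with OBL (Eqs. (18)-(20)) or Cauchy
    mutation (Eqs. (22)-(23)), chosen with probability Ps (Eq. (24)), then keep
    the better point via the greedy rule (Eq. (25)). Illustrative sketch."""
    Ps = np.exp(1.0 - t / iter_max) / 20.0 + theta              # Eq. (24)
    if rng.random() < Ps:
        # opposition-based learning branch
        X_opp = ub + rng.random(X_best.shape) * (lb - X_best)   # Eq. (18)
        b1 = (iter_max - t) / iter_max                          # Eq. (20)
        X_new = X_best + b1 * (X_opp - X_best)                  # Eq. (19)
    else:
        # Cauchy-mutation branch: eta = tan((xi - 0.5) * pi), Eq. (23)
        eta = np.tan((rng.random(X_best.shape) - 0.5) * np.pi)
        X_new = X_best + eta * X_best                           # Eq. (22)
    X_new = np.clip(X_new, lb, ub)
    return X_new if f(X_new) < f(X_best) else X_best            # greedy rule, Eq. (25)
```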

5. Load-Forecasting Models

5.1. Algorithmic Flow of the Model

The flow of the improved SSA, incorporating the Cauchy mutation and OBL, combined with the LSTM network for prediction is shown in Figure 5. The left part of Figure 5 shows the ISSA algorithm flow, and the right part shows the algorithm flow of the LSTM model.
The following are the specific steps of the SSA optimized by incorporating the Cauchy mutation and the OBL strategy.
(1) Initialize the parameters, such as the population size N, the discoverer proportion PD, the safety threshold ST, the scout proportion SD, and the maximum number of iterations, and initialize the population with the Sin-chaos mapping of Equation (14).
(2) Calculate the fitness of each sparrow according to Equation (9), and record the current worst and best fitness values along with the corresponding position information.
(3) The sparrows with better fitness are chosen as discoverers, and the discoverers update their positions according to Equation (16).
(4) The other sparrows in the population act as followers, and the followers recalculate their positions according to Equation (12).
(5) A randomly chosen fraction of sparrows act as vigilantes, and the vigilantes update their positions according to Equation (17).
(6) Calculate the value of $P_s$ according to Equation (24).
(7) Choose between the two strategies, Cauchy-mutation perturbation and OBL, based on the value of $P_s$, thereby perturbing the current best solution to generate a new solution.
(8) Based on the greedy rule of Equation (25), decide whether the position should be updated.
(9) Determine whether the algorithm has reached the end condition of the maximum number of iterations. If not, return to step (2); if so, proceed to the next step.
(10) The procedure ends and outputs the best hyperparameters found (a code sketch of this overall loop is given below).
The optimal hyperparameters output from the ISSA are used to construct the LSTM prediction model, and the input data are imported into the forecasting model to derive the power-load-forecasting results.
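Reusing the sketches above, the outer ISSA loop of steps (1)–(10) might be organized as follows. The fitness function train_lstm_and_score, which would train an LSTM on a candidate [L1, L2, iter, lr] vector and return its error per Equations (30)–(32), is a hypothetical placeholder, since the paper's MATLAB implementation is not published; for brevity this sketch also keeps the original updates of Equations (11)–(13) inside ssa_step, where a full version would substitute the improved Equations (16) and (17).

```python
import numpy as np

def issa(fitness, lb, ub, N=10, iter_max=50):
    """Outer ISSA loop: chaotic init, SSA updates, then best-solution perturbation."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = sin_chaos_init(N, len(lb), lb, ub)                  # step (1)
    fit = np.array([fitness(x) for x in X])                 # step (2)
    best = X[np.argmin(fit)].copy()
    for t in range(1, iter_max + 1):
        X = ssa_step(X, fit, t, iter_max)                   # steps (3)-(5)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < fitness(best):
            best = X[np.argmin(fit)].copy()
        best = perturb_best(best, lb, ub, t, iter_max, fitness)  # steps (6)-(8)
    return best                                             # step (10)

# Hypothetical usage; the bounds on [L1, L2, iter, lr] are assumptions:
# best = issa(train_lstm_and_score, lb=[1, 1, 1, 0.001], ub=[400, 400, 1000, 0.01])
```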

5.2. Structure of the Model

An ISSA-LSTM-based STLF model is established, and the basic structural framework is illustrated in Figure 6. The left part of Figure 6 illustrates the input of the model, including the load-history values and related influencing factors. The middle part is the constructed load-forecasting model, and the input data are imported into the ISSA-LSTM model to, finally, obtain the forecasting results.

6. Model Experiment

6.1. Performance Comparison of Sparrow Search Algorithm before and after Improvement

(1) Benchmark functions
The performance of the SSA and ISSA algorithms is compared on eight benchmark-test functions. Their specific formulas and corresponding function images are shown in Appendix A. The test functions include single-peak and multi-peak functions, with dimensions d of 10, 30, and 100, respectively. To fairly verify the effectiveness of the ISSA algorithm, all tests were conducted in the same operating environment: the processor was an Intel Core i3-10100F (Intel, Santa Clara, CA, USA), the graphics card a GTX 1060 3 GB, the SSD 500 GB, and the memory 16 GB; the operating system was Microsoft Windows 10, and the simulation software was MATLAB R2020a. With these general conditions held consistent, the population size was set to 10, the number of iterations to 1000, PD = 0.7, SD = 0.2, and ST = 0.6, and each algorithm was run 50 times independently.
The information on the range of values and optimal solutions of the eight benchmark-test functions is given in Table 1.
(2) Performance-comparison analysis of the algorithm before and after improvement
The specific results of the two algorithms, for optimizing the eight benchmark-test functions, are given in Table 2.
According to Table 2 below, for the single-peaked functions F1 to F5, the solution accuracy and convergence speed of the ISSA in all tested dimensions are greatly improved compared with the SSA, and its average value reaches the theoretical optimal solution with the smallest standard deviation. For the multi-peak functions F6 to F8, the improved sparrow algorithm has the highest solution efficiency in 10, 30, and 100 dimensions and can effectively move away from local optima, which indicates that the improvement strategy is feasible and effective. In addition, as the dimensionality increases, the SSA's solution accuracy decreases, while the ISSA's solution accuracy remains essentially unchanged and even improves further on F2, showing excellent stability.
The optimization effects and convergence plots for the single-peak and multi-peak functions are shown in Figure 7. From Figure 7a–h, it can be found that the ISSA shows superior performance in both solution accuracy and convergence speed, and the smaller mean and standard deviation of its repeated optimization runs indicate that the stability and robustness of the ISSA are significantly better than those of the SSA.

6.2. Load-Forecasting-Model Experiment

6.2.1. Input-Data Preprocessing

(1) Abnormal-data handling
Linear interpolation is used to replace anomalous data and fill in missing data. The load values at times t and t + iΔt are known to be $L_t$ and $L_{t+i\Delta t}$; if the load datum at the intermediate time t + jΔt is missing, its value is given by Equation (26).
$L_{t+j\Delta t} = L_t + j \times \dfrac{L_{t+i\Delta t} - L_t}{i}, \quad j < i$ (26)
(2) Data normalization
Since the various meteorological factors and load values have different units and may influence the computation to different degrees, the data must be normalized so that every influencing factor falls within the same range of values before being used as a model input; this avoids distorting the forecasting results through differences in data magnitude and also satisfies the input requirements of the neural network. After normalization, the optimization process becomes noticeably smoother, which makes it easier for the algorithm to converge to the best solution and facilitates subsequent trend analysis of the load data. The data-normalization formula is shown in Equation (27).
$x_R = \dfrac{x - x_{min}}{x_{max} - x_{min}}$ (27)
In Equation (27), xR is the normalized value, x is the original value of the load data, and xmax and xmin represent the maximum and minimum load values of the same day, respectively.
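Both preprocessing steps reduce to a few lines; the following sketch assumes the load series is stored as an array indexed at the Δt sampling granularity.

```python
import numpy as np

def fill_missing(load, t, i, j):
    """Linear interpolation of Eq. (26): estimate the missing value at t + j*dt
    from the known values at t and t + i*dt (with j < i)."""
    return load[t] + j * (load[t + i] - load[t]) / i

def min_max_normalize(x):
    """Min-max normalization of Eq. (27), mapping each value into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```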

6.2.2. Evaluation Indicators

The experimental process in this paper involves two evaluation indexes: the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). Using $y_i$ to denote the actual value of the load data and $\hat{y}_i$ to denote the forecast, the specific formulas for MAPE and RMSE are shown in Equations (28) and (29) below.
(1) MAPE
The MAPE ranges over [0, +∞); the smaller its value, the smaller the difference between $\hat{y}_i$ and $y_i$ and the better the model. A value of 0 indicates a perfect model, while a value above 100% indicates a poor one.
$\mathrm{MAPE} = \dfrac{100\%}{n}\sum_{i=1}^{n}\left|\dfrac{y_i - \hat{y}_i}{y_i}\right|$ (28)
(2) RMSE
The RMSE also ranges over [0, +∞) and equals 0 when $\hat{y}_i$ coincides exactly with $y_i$, i.e., for a perfect model. The larger the error, the larger the RMSE. In addition, the RMSE is sensitive to outliers: the effect of an error on the RMSE is not proportional to its size.
$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$ (29)
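Both metrics can be computed directly from Equations (28) and (29):

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE of Eq. (28), in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """RMSE of Eq. (29)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```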

6.2.3. Predictive-Modeling Experiments

The experimental data are historical load data and load-related influencing factors of a region, with a sampling interval of 1 h and 24 sampling points in a day. The load-related influencing factors include meteorological factors and week types. Meteorological factors include maximum temperature, minimum temperature, average temperature, and ambient relative humidity. Week types were classified into four categories: Monday (0.7), Tuesday to Friday (0.8), Saturday (0.4), and Sunday (0.3).
For the LSTM hyperparameter-selection problem, this paper adopts the improved sparrow algorithm ISSA, which incorporates the Cauchy mutation and OBL, to search for the hyperparameters of the LSTM. The hyperparameters are the number of iterations (iter), the learning rate (lr), and the numbers of neurons in the two hidden layers (L1 and L2). The mean squared error between the network's forecast and the true values is used as the fitness function; the specific formulas are given in Equations (30)–(32). Through the ISSA's optimization search, a set of hyperparameters can be found that gives the trained network the lowest error.
$fit_i = \dfrac{1}{2}\left(\mathrm{MSE}_{train} + \mathrm{MSE}_{test}\right)$ (30)
$\mathrm{MSE}_{train} = \dfrac{1}{n}\sum\left(y_{train} - \hat{y}_{train}\right)^2$ (31)
$\mathrm{MSE}_{test} = \dfrac{1}{n}\sum\left(y_{test} - \hat{y}_{test}\right)^2$ (32)
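The fitness of one candidate hyperparameter set then reduces to the mean of the two MSE terms; the LSTM training that produces the predictions is omitted in the sketch below.

```python
import numpy as np

def fitness_value(y_train, yhat_train, y_test, yhat_test):
    """Fitness of Eqs. (30)-(32) for one candidate hyperparameter set."""
    def mse(y, yhat):
        return np.mean((np.asarray(y, float) - np.asarray(yhat, float)) ** 2)
    return 0.5 * (mse(y_train, yhat_train) + mse(y_test, yhat_test))
```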
(1) LSTM-based prediction model
After data preprocessing, the selected load dataset contains 97 samples. The first 96 samples are used as the training set for the prediction model, and the remaining sample is used as the test set to forecast the load data of the last day. The LSTM model is set to L1 = 200, L2 = 200, iter = 1000, and lr = 0.0050. The predicted and real values are given in Table 3 below.
The comparison of the prediction curves generated by the LSTM-based prediction model, with the true value curves of the load data, is illustrated in Figure 8.
(2) SSA-LSTM-based prediction model
A forecasting model based on SSA-LSTM was established. The results of the hyperparameters of the LSTM model, after parameter search by the SSA, are given in Table 4 below.
The LSTM network was rebuilt using the above hyperparameters to obtain the prediction results. The model-prediction result data and the real value data are given in Table 5 below.
The comparison of the prediction curves generated, based on the SSA-LSTM prediction model, with the true value curves of the load data is shown in Figure 9.
(3) Predictive model based on ISSA-LSTM
An ISSA-LSTM-based STLF model is developed. After the algorithm is run, the adaptation (fitness) curve shown in Figure 10 is obtained. The curve indicates that, as the ISSA searches, it finds sets of hyperparameters that train networks with progressively lower error; hence the curve decreases.
In addition, the change curves of the hyperparameters of the model at the end of the run are illustrated in Figure 11, from which it can be observed that the values of the four hyperparameters of the model eventually converge to a certain value, as the number of iterations increases. The specific values of each hyperparameter are given in Table 6.
The LSTM network was rebuilt using the above hyperparameters to obtain the prediction results. The model-prediction result data and the real value data are given in Table 7.
The comparison of the prediction curves generated, based on the ISSA-LSTM prediction model, with the true value curves of the load data is shown in Figure 12.

6.3. Comparison of Forecasting Models

The prediction results of six models (LSTM, SSA-LSTM, ISSA-LSTM, PSO-LSTM [23], PSO-BP [24], and PSO-LSSVM [25]) were put together for comparison, and the prediction-index values of each model were then calculated. The comparison results are illustrated in Figure 13 below. As can be observed from Figure 13, the ISSA-LSTM model has the fluctuation magnitude most similar to the actual values and achieves good accuracy in peak prediction. Although its prediction at some points is slightly inferior to that of other models, its overall prediction accuracy, stability, and fit are higher.
The calculated prediction indexes and prediction accuracy of each model are given in Table 8. As can be seen from the data in Table 8, the MAPE and RMSE of the models built by combining LSTM with the PSO, SSA, and ISSA are successively reduced, and the prediction accuracy is improved. In addition, among the six models, the forecasting accuracy of the ISSA-LSTM model proposed in this paper is the highest.

7. Conclusions

A series of improvements is made to the SSA, significantly enhancing its search performance and enabling it to escape local optima effectively. Firstly, since sparrow-position initialization is very important for the global search, this paper uses Sin-chaos initialization to initialize the population, thus enriching the diversity of solutions. Secondly, this paper introduces a dynamic adaptive-weight factor, which effectively balances the algorithm's local-exploitation and global-exploration abilities. Finally, this paper integrates the Cauchy mutation and OBL strategies, reducing the probability that the algorithm remains trapped in local extremes and enhancing its global exploration ability.
An ISSA fusing Cauchy mutation and OBL is proposed, a combined forecasting method uses it to optimize the LSTM hyperparameters, and an ISSA-LSTM-based STLF model is developed. ISSA-LSTM reduces the influence of human factors on the LSTM and improves the model's ability to capture the characteristics of power-load data. The experimental results indicate that the proposed model has higher forecasting accuracy than the LSTM, SSA-LSTM, PSO-LSTM, PSO-BP, and PSO-LSSVM models, which provides a new idea for STLF.
Future work may continue to improve the optimization mechanism and algorithm structure of the sparrow algorithm, or integrate the strengths of other intelligent algorithms to propose better-performing algorithms for load forecasting. In addition, we have read much literature from other fields [26,27,28,29,30,31] and may in the future consider how to incorporate those ideas into our research topics.

Author Contributions

Conceptualization, M.H.; methodology, M.H.; software, M.H.; validation, M.H.; formal analysis, M.H.; investigation, M.H. and P.S.; resources, M.H. and A.T.; data curation, J.Z.; writing—original draft preparation, M.H.; writing—review and editing, M.H.; visualization, M.H.; supervision, J.Z. and H.L.; project administration, A.T.; funding acquisition, A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because the power-load data were provided internally by the national grid for this research only and carry a certain degree of confidentiality.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The specific formulas of the eight benchmark-test functions used in this paper are shown in Equations (A1)–(A8), and their function images are shown in Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7 and Figure A8, respectively.
(1) Sphere (F1)
$F_1(x) = \sum_{i=1}^{n} x_i^2$ (A1)
Figure A1. F1 image.
(2) Schwefel's 2.22 (F2)
$F_2(x) = \sum_{i=1}^{n}\left|x_i\right| + \prod_{i=1}^{n}\left|x_i\right|$ (A2)
Figure A2. F2 image.
(3) Schwefel's 1.2 (F3)
$F_3(x) = \sum_{i=1}^{n}\left(\sum_{j=1}^{i} x_j\right)^2$ (A3)
Figure A3. F3 image.
(4) Rosenbrock's (F4)
$F_4(x) = \sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + \left(x_i - 1\right)^2\right]$ (A4)
Figure A4. F4 image.
(5) Quartic (F5)
$F_5(x) = \sum_{i=1}^{n} i x_i^4 + random[0, 1)$ (A5)
Figure A5. F5 image.
(6) Rastrigin (F6)
$F_6(x) = \sum_{i=1}^{n}\left[x_i^2 - 10\cos\left(2\pi x_i\right) + 10\right]$ (A6)
Figure A6. F6 image.
(7) Ackley (F7)
$F_7(x) = -20\exp\left(-0.2\sqrt{\dfrac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\dfrac{1}{n}\sum_{i=1}^{n}\cos 2\pi x_i\right) + 20 + e$ (A7)
Figure A7. F7 image.
(8) Griewank (F8)
$F_8(x) = \dfrac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n}\cos\left(\dfrac{x_i}{\sqrt{i}}\right) + 1$ (A8)
Figure A8. F8 image.

References

  1. Han, M.; Tan, A.; Zhong, J. Application of Particle Swarm Optimization Combined with Long and Short-term Memory Networks for Short-term Load Forecasting. In Proceedings of the 2021 International Conference on Robotics Automation and Intelligent Control (ICRAIC 2021), Wuhan, China, 26–28 November 2021. [Google Scholar]
  2. Boroojeni, K.G.; Amini, M.H.; Bahrami, S.; Iyengar, S.S.; Sarwat, A.I.; Karabasoglu, O. A novel multi-time-scale modeling for electric power demand forecasting: From short-term to medium-term horizon. Electr. Power Syst. Res. 2017, 142, 58–73. [Google Scholar] [CrossRef]
  3. Song, K.B.; Baek, Y.S.; Hong, D.H.; Jiang, G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans. Power Syst. 2005, 20, 96–101. [Google Scholar] [CrossRef]
  4. Xu, W.; Hu, H.; Yang, W. Energy time series forecasting based on empirical mode decomposition and FRBF-AR model. IEEE Access 2019, 7, 36540–36548. [Google Scholar] [CrossRef]
  5. Huang, S.J.; Shih, K.R. Short-term load forecasting via ARMA model identification including non-Gaussian process considerations. IEEE Trans. Power Syst. 2003, 13, 673–679. [Google Scholar] [CrossRef]
  6. Hernández, L.; Baladrón, C.; Aguiar, J.M.; Calavia, L.; Carro, B.; Sanchez-Esguevillas, A.; Perez, F.; Fernandez, A.; Lloret, J. Artificial neural network for short-term load forecasting in distribution systems. Energies 2014, 7, 1576–1598. [Google Scholar] [CrossRef]
  7. Ceperic, E.; Ceperic, V.; Baric, A. A strategy for short-term load forecasting by support vector regression machines. IEEE Trans. Power Syst. 2013, 28, 4356–4364. [Google Scholar] [CrossRef]
  8. Wang, Z.; Zhao, B.; Ji, W.; Gao, X.; Li, X. Short-term Load Forecasting Method Based on GRU-NN Model. Autom. Electr. Power Syst. 2019, 43, 53–58. [Google Scholar]
  9. Zhang, J. Research and Analysis of Short-term Load Forecasting Based on Adaptive K-means and DNN. Electron. Meas. Technol. 2020, 43, 58–61. [Google Scholar]
  10. Kong, X.; Zheng, F.; E, Z.; Cao, J.; Wang, X. Short-term Load Forecasting Based on Deep Belief Network. Autom. Electr. Power Syst. 2018, 42, 133–139. [Google Scholar]
  11. Zhao, B.; Wang, Z.; Ji, W.; Gao, X.; Li, X. A Short-term power load forecasting Method Based on Attention Mechanism of CNN-GRU. Power Syst. Technol. 2019, 43, 4370–4376. [Google Scholar]
  12. Lu, J.; Zhang, Q.; Yang, Z.; Tu, M.; Lu, J.; Peng, H. Short-term Load Forecasting Method Based on CNN-LSTM Hybrid Neural Network Model. Autom. Electr. Power Syst. 2019, 43, 131–137. [Google Scholar]
  13. Wang, R.; Zhang, S.; Xiao, X.; Wang, Y. Short-term Load Forecasting of Multi-layer Long Short-term Memory Neural Network Considering Temperature Fuzziness. Electr. Power Autom. Equip. 2020, 40, 181–186. [Google Scholar]
  14. Tian, H.; Zhang, Z.; Yu, D. Research on Multi-load Short-term Forecasting of Regional Integrated Energy System Based on Improved LSTM. Proc. CSU-EPSA 2021, 33, 130–137. [Google Scholar]
  15. Liu, K.; Ruan, J.; Zhao, X.; Liu, G. Short-term Load Forecasting Method Based on Sparrow Search Optimized Attention-GRU. Proc. CSU-EPSA 2022, 34, 1–9. [Google Scholar]
  16. Xue, J. Research and Application of A Novel Swarm Intelligence Optimization Technique: Sparrow Search Algorithm. Master’s Thesis, Donghua University, Shanghai, China, 2020. [Google Scholar]
  17. Mao, Q.; Zhang, Q. Improved Sparrow Algorithm Combining Cauchy Mutation and Opposition-based Learning. J. Front. Comput. Sci. Technol. 2021, 15, 1155–1164. [Google Scholar]
  18. Zhu, J.; Liu, S.; Fan, N.; Shen, X.; Guo, X. A Short-term power load forecasting Method Based on LSTM Neural Network. China New Telecommun. 2021, 23, 167–168. [Google Scholar]
  19. Hao, H. Research on Short-Term Load Forecasting Method of Power Plant Based on Deep Learning. Master’s Thesis, Tianjin University of Technology, Tianjing, China, 2021. [Google Scholar]
  20. Yang, H.; Zhang, Q. An Adaptive Chaos Immune Optimization Algorithm with Mutative Scale and Its Application. Control. Theory Appl. 2009, 26, 1069–1074. [Google Scholar]
  21. Liu, J.; Yuan, M.; Zuo, F. Global Search-oriented Adaptive Leader Salp Swarm Algorithm. Control. Decis. 2021, 36, 2152–2160. [Google Scholar]
  22. He, Q.; Lin, J.; Xu, H. Hybrid Cauchy Mutation and Uniform Distribution of Grasshopper Optimization Algorithm. Control. Decis. 2021, 36, 1558–1568. [Google Scholar]
  23. Liu, B. Research of Short-term Power load Forecasting Based on PSO-LSTM Algorithm. Master’s Thesis, Jilin University, Changchun, China, 2020. [Google Scholar]
  24. Zeng, D.; Xu, J.; Yang, J.; Lu, W. Short-term power load forecasting Based on PSO-BP with Data Mining. Process. Autom. Instrum. 2020, 41, 93–97. [Google Scholar]
  25. Xiong, Y. Design and Imp Iementation of Short-Term Load Forecasting System Based on Improved PSO-LSSVM. Master’s Thesis, Anhui University, Hefei, China, 2019. [Google Scholar]
  26. Berry, M.; Lewis, Z.; Nye, J. On the Weierstrass-Mandelbrot fractal function. Proc. R. Soc. Lond. 1980, 370, 459–484. [Google Scholar]
  27. Guido, R.; Pedroso, F.; Contreras, R.; Rodrigues, L.; Guariglia, E.; Neto, J. Introducing the Discrete Path Transform (DPT) and its applications in signal analysis, artefact removal, and spoken word recognition. Digit. Signal. Process. 2021, 117, 103158. [Google Scholar] [CrossRef]
  28. Yang, L.; Su, H.; Zhong, C.; Meng, Z.; Luo, H.; Li, X.; Tang, Y.; Lu, Y. Hyperspectral image classification using wavelet transform-based smooth ordering. Int. J. Wavelets Multiresolut. Inf. Process. 2019, 17, 1950050. [Google Scholar] [CrossRef]
  29. Guariglia, E. Harmonic sierpinski gasket and applications. Int. J. Wavelets. Multiresolut. Inf. Process. 2018, 20, 714. [Google Scholar] [CrossRef] [PubMed]
  30. Zheng, X.; Tang, Y.; Zhou, J. A framework of adaptive multiscale wavelet decomposition for signals on undirected graphs. IEEE Trans. Signal. Process. 2019, 67, 1696–1711. [Google Scholar] [CrossRef]
  31. Guariglia, E.; Silvestrov, S. Fractional-wavelet Analysis of positive definite distributions and wavelets on D’(C). In Engineering Mathematics II; Springer: Cham, Switzerland, 2016; pp. 337–353. [Google Scholar]
Figure 1. Ground-truth RNN.
Figure 2. SRN and LSTM modules.
Figure 3. The relationship between Sin's chaotic properties and the number of iterations.
Figure 4. Probability-density curve image.
Figure 5. ISSA-LSTM algorithm flow.
Figure 6. ISSA-LSTM model-framework structure.
Figure 7. Performance comparison of the sparrow algorithm before and after improvement.
Figure 8. LSTM model forecasting curve vs. true value of load.
Figure 9. Comparison of SSA-LSTM model prediction curves and real values of power load.
Figure 10. ISSA adaptation curve.
Figure 11. Variation of model hyperparameters.
Figure 12. Comparison of ISSA-LSTM model-prediction curves and real values of power load.
Figure 13. Comparison of prediction results by models.
Table 1. Benchmark functions.

Function No. | Function Type | Range of Values | Optimal Solution
F1 | Sphere | [−100, 100] | 0
F2 | Schwefel's 2.22 | [−10, 10] | 0
F3 | Schwefel's 1.2 | [−100, 100] | 0
F4 | Rosenbrock's | [−100, 100] | 0
F5 | Quartic | [−1.28, 1.28] | 0
F6 | Rastrigin | [−5.12, 5.12] | 0
F7 | Ackley | [−32, 32] | 0
F8 | Griewank | [−600, 600] | 0
Table 2. Results of the algorithm-optimization benchmark-test functions.

Function | Algorithm | Average (d = 10) | Std. Dev. (d = 10) | Average (d = 30) | Std. Dev. (d = 30) | Average (d = 100) | Std. Dev. (d = 100)
F1 | SSA | 2.933 × 10−75 | 1.581 × 10−74 | 1.306 × 10−62 | 6.536 × 10−62 | 3.011 × 10−53 | 1.702 × 10−52
F1 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F2 | SSA | 7.612 × 10−37 | 4.302 × 10−36 | 2.008 × 10−30 | 8.912 × 10−30 | 2.617 × 10−30 | 9.886 × 10−30
F2 | ISSA | 1.055 × 10−258 | 0 | 1.017 × 10−261 | 0 | 1.005 × 10−273 | 0
F3 | SSA | 1.517 × 10−33 | 7.921 × 10−33 | 1.688 × 10−29 | 7.772 × 10−29 | 6.947 × 10−26 | 3.229 × 10−25
F3 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F4 | SSA | 5.223 × 10−27 | 2.582 × 10−26 | 0 | 0 | 0 | 0
F4 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F5 | SSA | 2.256 × 10−150 | 1.105 × 10−149 | 1.401 × 10−109 | 7.992 × 10−110 | 8.529 × 10−100 | 4.507 × 10−99
F5 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F6 | SSA | 0 | 0 | 0 | 0 | 0 | 0
F6 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F7 | SSA | 9.238 × 10−16 | 0 | 9.238 × 10−16 | 0 | 9.238 × 10−16 | 0
F7 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
F8 | SSA | 0 | 0 | 0 | 0 | 0 | 0
F8 | ISSA | 0 | 0 | 0 | 0 | 0 | 0
Table 3. Comparison of output values of the LSTM model with real values.

Serial No. | True Value (MW) | Predicted Value (MW) | Serial No. | True Value (MW) | Predicted Value (MW)
1 | 59.61 | 67.78 | 13 | 76.44 | 76.55
2 | 57.95 | 60.61 | 14 | 82.43 | 85.52
3 | 55.66 | 58.18 | 15 | 84.79 | 84.27
4 | 53.91 | 55.48 | 16 | 86.20 | 84.82
5 | 52.89 | 55.53 | 17 | 85.94 | 87.40
6 | 55.02 | 55.55 | 18 | 87.40 | 84.55
7 | 60.14 | 58.61 | 19 | 85.19 | 82.21
8 | 67.52 | 64.19 | 20 | 87.81 | 89.72
9 | 80.02 | 80.41 | 21 | 84.95 | 85.01
10 | 86.16 | 84.50 | 22 | 83.27 | 81.54
11 | 90.40 | 87.04 | 23 | 75.93 | 74.92
12 | 89.53 | 86.77 | 24 | 68.55 | 66.86
Table 4. The result of the SSA search for hyperparameters.

Hyperparameter | Value
L1 | 270
L2 | 383
iter | 175
lr | 0.0060
Table 5. Comparison of the output values of the SSA-LSTM model with the real values.

Serial No. | True Value (MW) | Predicted Value (MW) | Serial No. | True Value (MW) | Predicted Value (MW)
1 | 59.61 | 64.62 | 13 | 76.44 | 76.79
2 | 57.95 | 56.33 | 14 | 82.43 | 83.13
3 | 55.66 | 57.40 | 15 | 84.79 | 83.82
4 | 53.91 | 55.59 | 16 | 86.20 | 84.92
5 | 52.89 | 53.27 | 17 | 85.94 | 87.28
6 | 55.02 | 52.68 | 18 | 87.40 | 85.42
7 | 60.14 | 57.90 | 19 | 85.19 | 83.69
8 | 67.52 | 65.31 | 20 | 87.81 | 87.31
9 | 80.02 | 80.39 | 21 | 84.95 | 86.74
10 | 86.16 | 85.96 | 22 | 83.27 | 81.91
11 | 90.40 | 88.76 | 23 | 75.93 | 76.27
12 | 89.53 | 88.87 | 24 | 68.55 | 68.63
Table 6. The result of the ISSA search for hyperparameters.

Hyperparameter | Value
L1 | 87
L2 | 260
iter | 232
lr | 0.0083
Table 7. Comparison of ISSA-LSTM model output values with real values.

Serial No. | True Value (MW) | Predicted Value (MW) | Serial No. | True Value (MW) | Predicted Value (MW)
1 | 59.61 | 62.69 | 13 | 76.44 | 77.05
2 | 57.95 | 56.24 | 14 | 82.43 | 84.90
3 | 55.66 | 55.79 | 15 | 84.79 | 85.03
4 | 53.91 | 53.50 | 16 | 86.20 | 86.05
5 | 52.89 | 53.69 | 17 | 85.94 | 88.07
6 | 55.02 | 53.64 | 18 | 87.40 | 85.19
7 | 60.14 | 59.10 | 19 | 85.19 | 84.65
8 | 67.52 | 66.60 | 20 | 87.81 | 88.97
9 | 80.02 | 79.66 | 21 | 84.95 | 87.13
10 | 86.16 | 85.04 | 22 | 83.27 | 81.70
11 | 90.40 | 88.61 | 23 | 75.93 | 75.38
12 | 89.53 | 88.24 | 24 | 68.55 | 68.38
Table 8. Model-prediction index and accuracy-calculation results.

Predictive Model | MAPE | RMSE | Prediction Accuracy
LSTM | 0.0296 | 2.6337 | 97.04%
SSA-LSTM | 0.0196 | 1.6956 | 98.04%
ISSA-LSTM | 0.0158 | 1.4204 | 98.42%
PSO-LSTM | 0.0273 | 2.8022 | 97.27%
PSO-BP | 0.0322 | 3.0263 | 96.78%
PSO-LSSVM | 0.0331 | 3.1177 | 96.69%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
