Article

Short Term Power Load Forecasting Based on PSVMD-CGA Model

School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(4), 2941; https://doi.org/10.3390/su15042941
Submission received: 11 December 2022 / Revised: 13 January 2023 / Accepted: 24 January 2023 / Published: 6 February 2023

Abstract
Short-term power load forecasting is critical for ensuring power system stability. A new algorithm that combines CNN, GRU, and an attention mechanism with the sparrow algorithm to optimize variational mode decomposition (PSVMD–CGA) is proposed to address the effect of random load fluctuations on the accuracy of short-term load forecasting. To avoid manual selection of VMD parameters, the sparrow algorithm is adopted to optimize VMD by decomposing short-term power load data into multiple subsequences, thus significantly reducing the volatility of load data. Subsequently, the CNN (Convolutional Neural Network) is introduced to address the difficulty the GRU (Gated Recurrent Unit) has in extracting high-dimensional power load features. Finally, the attention mechanism is selected to address the fact that when the data sequence is too long, important information cannot be weighted highly. Compared with the original GRU model, the PSVMD–CGA model proposed in this paper is considerably enhanced: MAE drops by 288.8, MAPE drops by 3.46 percentage points, RMSE drops by 326.1 MW, and R2 rises to 0.99. At the same time, various evaluation indicators show that the PSVMD–CGA model outperforms the SSA–VMD–CGA and GA–VMD–CGA models.

1. Introduction

Power load forecasting plays an important role in the dispatching operation of the power system [1]. Based on the length of the forecast horizon, power load forecasting is divided into short-term, medium-term, and long-term forecasting [2]. Accurate short-term load forecasting not only helps the system run safely and reliably, but also reduces resource waste and improves economic benefits [3]. As machine learning is widely used in various fields, it also performs well in power load forecasting [4,5]. Short-term power load is a non-stationary sequence, affected by natural and social factors, and usually has no obvious regularity in its fluctuation trends, which makes short-term power load forecasting a difficult task.
With the widespread use of deep learning in a variety of sectors, researchers have turned to deep learning to develop algorithm models that fit the needs of power load forecasting. LSTM overcomes the problem that ordinary neural networks cannot model temporal dependencies. GRU simplifies LSTM, shortens training time, and boosts prediction accuracy [6]. However, the GRU model cannot adequately represent the local correlation properties of a time series [7,8]. This problem can be solved by combining GRU and CNN: the CNN–GRU model combines the benefits of both algorithms to increase prediction accuracy [9,10]. When the time series of load data is too long, though, significant information and characteristic values are lost. The attention mechanism can be utilized to tackle this challenge [11]. However, the resulting model structure is complex, operation speed is slow, and performance suffers when dealing with large amounts of power load data. The CGA model is shown in this study to successfully optimize the model structure and increase prediction accuracy. However, the CGA model alone cannot reduce the random fluctuation of the load, and its forecasts cannot meet the accuracy requirements of short-term power load forecasting.
In view of the complex fluctuation of short-term power load, Gao et al. proposed using EMD (empirical mode decomposition) to decompose the power load data and then using the GRU (Gated Recurrent Unit) algorithm to predict each decomposition item [12]. Since mode mixing in EMD affects prediction accuracy, studies [13,14,15,16] used EEMD (ensemble empirical mode decomposition) to decompose the raw load data. While ensemble empirical mode decomposition prevents mode mixing, the white noise added during EEMD decomposition can create end effects and cause distortion. VMD (variational mode decomposition) enables the effective separation of the intrinsic modal components and the division of the frequency domain of the signal, avoiding the distortions caused by the end effect [17,18,19]. However, the number of modes K and the penalty factor α determine the effect of the decomposition: too large a K or α easily leads to over-decomposition and mode mixture, while too small a K or α easily leads to under-decomposition. Study [20] determines the parameters K and α based on experience or comparison, which is subjective and random. In recent years, some studies have proposed determining the optimal VMD parameters with a particle swarm optimization algorithm, a genetic algorithm, and so on, but these methods easily fall into local optima, which makes the prediction inaccurate [21,22,23].
In 2022, Zhang et al. proposed using the sparrow algorithm to optimize VMD; however, their fitness function only considers the envelope entropy [24]. Without taking into account correlations and underlying regularities, this decomposition can be disruptive to the raw power load data. In some studies, IMF values with low correlation were discarded, thus destroying the inherent regularity and integrity of the electrical load [25]. Roohollah et al. proposed clustering after decomposition and predicting the clustering results. This method takes into account the correlation and entropy of the values after decomposition. However, clustering after decomposition not only adds a clustering step but also cannot guarantee that the decomposition process will not destroy the inherent regularity of the power load.
In recent years, the rapid development and wide application of deep learning have made it a hot spot in the field of load forecasting [26,27,28,29,30]. Within this field, the CNN–GRU model is widely used in forecasting due to its prediction accuracy and high efficiency [31,32]. Although the CNN (Convolutional Neural Network) extracts the features of the data and the GRU extracts potential regularities in the CNN–GRU model, the weights assigned to all feature values are unchanged. This study refers to a CNN–GRU model under an attention mechanism (CGA) which assigns different weights to different feature values.
In summary, a new combined power load forecasting model is proposed, coupling Variational Mode Decomposition (VMD) optimized by the Sparrow Search Algorithm (SSA) with a CNN–GRU model under an attention mechanism. First, we address the problems of modal aliasing and end effects in EMD. We propose using the SSA algorithm to optimize the VMD algorithm, with a fitness function that considers the correlation coefficient, permutation entropy, and aggregation algebra. This method avoids the randomness and subjectivity of decomposition results caused by artificially set parameters. Secondly, a CNN–GRU model under the attention mechanism (CGA) is proposed to solve the problem of weight assignment for important features.
The following is the content arrangement of this study. Theoretical knowledge regarding VMD, SSA–VMD, and CGA is presented in Section 2. The results of the load forecasting are introduced in Section 3. Section 4 provides a conclusion.

2. Methods

The principles and mathematical theory of the VMD, PSVMD, and CGA models, the structure of the PSVMD–CGA model, and the model evaluation indicators are all introduced in this section.

2.1. VMD Methods

VMD is an adaptive decomposition method for non-smooth signals, which can determine the number of modal decompositions according to the actual situation of the sequence [33]. The optimal solution is obtained by adaptively matching the frequency bandwidth of each mode to the optimal frequency bandwidth of each class of modes during the solution process. The intrinsic mode function can be regarded as containing several AM–FM components uk(t), which have limited bandwidth and central frequency.
$$u_k(t) = A_k(t)\cos\left(\omega_k(t)\right)$$
where $A_k(t)$ is the instantaneous amplitude of $u_k(t)$ and $\omega_k(t)$ is the instantaneous phase of $u_k(t)$.
Equation (1) is the expression of intrinsic mode function. It can be seen from this equation that the intrinsic mode function is a modulation function. At this time, the Hilbert transform is used for uk(t) to obtain the unilateral spectrum after the intrinsic mode function analysis:
$$\left[\delta(t) + \frac{j}{\pi t}\right] * u_k(t)$$
By multiplying by an exponential term $e^{-j\omega_k t}$ tuned to the estimated center frequency $\omega_k$, the spectrum of each intrinsic mode function from Equation (1) can be modulated to the fundamental frequency band, namely:
$$\left\{\left[\delta(t) + \frac{j}{\pi t}\right] * u_k(t)\right\} e^{-j\omega_k t}$$
In this case, the variational mode decomposition becomes the problem of constructing and solving constraints, that is, decomposing the original signal into several intrinsic mode function (IMF) components. This constraint condition can be expressed as:
$$\min_{\{u_k\},\{\omega_k\}} \left\{\sum_{k=1}^{K} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2\right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$
where $\{u_k\} = \{u_1, u_2, \ldots, u_K\}$ are the decomposed mode components, $\{\omega_k\} = \{\omega_1, \omega_2, \ldots, \omega_K\}$ are the center frequencies of the corresponding components, $f$ is the input signal, and $\delta(t)$ is the unit impulse function.
The constrained problem is transformed into a non-constrained problem by introducing the quadratic penalty term and Lagrange multiplier operator, and the optimal solution of the model is calculated. The results are as follows:
$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$
where α is a quadratic penalty factor and λ(t) is the Lagrange factor.
By using the Alternate Direction Method of Multipliers (ADMM), the values of $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ are updated alternately, and the saddle point of the augmented Lagrangian is sought, decomposing the original signal into $K$ intrinsic mode functions.
The update of $u_k^{n+1}$ is as follows:
$$u_k^{n+1} = \arg\min_{u_k} \left\{ \alpha \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{i} u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\}$$
Using the Parseval/Plancherel Fourier isometry, the equation is transformed from the time domain to the frequency domain. The frequency-domain iterative equation is as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha\left(\omega - \omega_k\right)^2}$$
Similarly, the center frequency $\omega_k^{n+1}$ can be converted to the frequency domain, and the iterative equation is:
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k(\omega)\right|^2 \,\mathrm{d}\omega}{\int_0^{\infty} \left|\hat{u}_k(\omega)\right|^2 \,\mathrm{d}\omega}$$
λ can be updated by Equation (9):
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left[\hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega)\right]$$
The termination condition of parameter iteration is set as the iteration accuracy ε > 0. When the iteration satisfies Equation (10) below, the iteration is terminated, and K IMF components are obtained.
$$\sum_{k} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon$$
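The frequency-domain updates in Equations (7)–(9) and the stopping rule in Equation (10) can be sketched as a short NumPy loop. This is a minimal illustration under simplifying assumptions (no mirror extension of the signal, arbitrary defaults for K, α, and τ), not the implementation used in this paper:

```python
import numpy as np

def vmd(f, K=3, alpha=2000.0, tau=0.1, tol=1e-7, max_iter=500):
    """Minimal VMD sketch: ADMM loop over Eqs. (7)-(10)."""
    T = len(f)
    freqs = np.fft.fftfreq(T)                         # normalized frequency axis
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, T), dtype=complex)           # mode spectra
    omega = np.linspace(0.0, 0.5, K, endpoint=False)  # initial center frequencies
    lam = np.zeros(T, dtype=complex)                  # Lagrange multiplier (freq domain)
    half = slice(0, T // 2)                           # positive half of the spectrum
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq. (7): Wiener-filter-like update of mode k around omega[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (8): center frequency = power-weighted mean frequency
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-14)
        # Eq. (9): dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Eq. (10): stop when the relative change of the modes is small
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-14)
        if diff < tol:
            break
    return np.real(np.fft.ifft(u_hat, axis=1)), omega
```

For a signal made of well-separated tones, the recovered center frequencies should settle near the true tone frequencies and the modes should sum back to approximately the input.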

2.2. PSVMD Principle

According to the principle of the VMD algorithm, it is required to manually pre-set the number of modalities K and the quadratic penalty term α before decomposing the signal. In the traditional VMD algorithm, improper values of K and α will lead to under-decomposition or over-decomposition. This situation will cause inaccurate prediction results of power load data. Therefore, the subjectivity of artificially setting parameters will directly affect whether VMD can be decomposed correctly.
The sparrow search algorithm (SSA) can adaptively adjust the VMD parameters K and α according to the changing characteristics and complexity of the signal. Thus, the problem of relying on subjective judgment to determine VMD parameters is solved.
The SSA was first proposed in 2020 as a new heuristic algorithm. It is mainly inspired by the foraging and anti-predation behavior of sparrows and divides groups into discoverers, followers, and watchers [34,35].
The SSA divides individuals into discoverers and followers: discoverers actively search for food and followers trail the discoverers, while some individuals additionally serve as watchers that guard the group. The discoverer's location is updated as follows:
$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$$
where $X_{i,j}^{t}$ denotes the position of the $j$-th dimension of the $i$-th sparrow at the $t$-th iteration; $\alpha$ is a random number in (0, 1]; $iter_{\max}$ is the maximum number of iterations; $R_2$ and $ST$ are the warning value and safety threshold, respectively ($R_2 \in [0, 1]$, $ST \in [0.5, 1]$); $Q$ is a random number; and $L$ is a $1 \times d$ matrix of ones. When $R_2 \geq ST$, the alarm is triggered because a sparrow has spotted the predator; at this time all sparrows must leave the warning area. When $R_2 < ST$, the environment is safe and discoverers can continue to search over a wide range.
At this point the follower locations are updated as follows:
$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst,j}^{t} - X_{i,j}^{t}}{i^2}\right), & i > n/2 \\ X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$
where $X_{P}^{t+1}$ denotes the best position found by the discoverers in the current iteration and $X_{worst}$ is the current global worst position. $A^{+} = A^{T}\left(A A^{T}\right)^{-1}$, where $A$ is a $1 \times d$ matrix whose elements are randomly assigned a value of 1 or −1.
$X_P$ is the optimal position of the current discoverers. The watchers are randomly selected from individuals in the group, accounting for 10–20% of the total population. Their location is updated as follows:
$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left| X_{i,j}^{t} - X_{best}^{t} \right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left| X_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases}$$
where $X_{best}$ represents the current global optimal position; $\beta$ is the step-size control parameter, drawn from the standard normal distribution; $f_i$ is the fitness value of $X_{i,j}^{t}$; $f_g$ is the current best fitness value; $f_w$ is the current worst fitness value; $K \in [-1, 1]$ is a random number that controls the direction in which the sparrow moves; and $\varepsilon$ is a small constant that avoids division by zero. The initialization parameters of the SSA algorithm are shown in Table 1.
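The three update rules above can be combined into a compact optimizer. The sketch below is a simplified SSA (fixed 20% discoverers and 10% watchers, ST = 0.8, fitness values from the start of each generation reused within it, and elitism added so the best solution ever evaluated is returned); the parameter choices are illustrative, not the configuration of Table 1:

```python
import numpy as np

def sparrow_search(fitness, dim, n=30, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Minimal SSA sketch implementing the discoverer/follower/watcher updates."""
    rng = np.random.default_rng(seed)
    st, n_disc, n_watch = 0.8, max(1, n // 5), max(1, n // 10)
    X = rng.uniform(lb, ub, (n, dim))
    fit = np.array([fitness(x) for x in X])
    gbest_x, gbest_f = X[fit.argmin()].copy(), fit.min()
    for _ in range(iters):
        order = np.argsort(fit)                      # sort by fitness, best first
        X, fit = X[order], fit[order]
        worst = X[-1].copy()
        R2 = rng.random()                            # alarm value
        for i in range(n_disc):                      # discoverers, Eq. (11)
            if R2 < st:
                X[i] = X[i] * np.exp(-i / (rng.random() * iters + 1e-12))
            else:
                X[i] = X[i] + rng.normal()           # Q * L, broadcast over dims
        for i in range(n_disc, n):                   # followers, Eq. (12)
            if i > n // 2:
                X[i] = rng.normal() * np.exp((worst - X[i]) / (i ** 2))
            else:
                A = rng.choice([-1.0, 1.0], dim)
                X[i] = X[0] + np.abs(X[i] - X[0]) * A / dim
        for i in rng.choice(n, n_watch, replace=False):  # watchers, Eq. (13)
            if fit[i] > fit[0]:
                X[i] = X[0] + rng.normal() * np.abs(X[i] - X[0])
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) / ((fit[i] - fit[-1]) + 1e-12)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < gbest_f:                      # elitism (not part of Eqs. 11-13)
            gbest_x, gbest_f = X[fit.argmin()].copy(), fit.min()
    return gbest_x, gbest_f
```

On a simple sphere function, the population contracts toward the origin within a few dozen generations.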
In this study, the SSA treats the VMD parameter combination [K, α] as the position of a sparrow and optimizes it iteratively; the fitness function for the optimization is constructed from three VMD evaluation indicators: permutation entropy, aggregation algebra, and the Pearson correlation coefficient. Permutation entropy measures the complexity of time-series data and is very sensitive to local changes. It is mainly used to detect the randomness of a time series, is suitable for the analysis of non-stationary signals, and has good robustness. The entropy value of a signal reflects its randomness: the larger the entropy of an IMF, the stronger the randomness of the signal [36,37,38]; the smaller the entropy, the more regular and orderly the signal. The CNN–GRU model under the attention mechanism (CGA) produces more effective predictions for regular, ordered sequences, so the permutation entropy value is included in the fitness evaluation. Aggregation algebra is a parameter index in the VMD calculation process, defined as the length of the optimal center-frequency signal after VMD decomposition, and reflects the frequency characteristics of each IMF: the lower the aggregation algebra, the faster the signal aggregates. The Pearson correlation coefficient measures the degree of linear correlation between two signals and can serve as an evaluation index of the correlation between an IMF and the original signal; the higher the coefficient, the more original information the IMF retains. In other studies, IMF values with little correlation were eliminated to improve prediction accuracy. Although this method improves accuracy to a certain extent, it also eliminates regular content in the original data, which is a serious drawback.
Therefore, the fitness function in this study considers the Pearson correlation coefficient. The Pearson correlation coefficient, permutation entropy, and aggregation algebra all take the same weight coefficient. The fitness function is calculated as follows:
$$f = \min\left[\frac{H_p(m)}{P} \times \lg(\text{omega})\right]$$
where $H_p(m)$ is the permutation entropy value, $P$ is the Pearson correlation coefficient, and omega represents the aggregation algebra of the optimal center frequency. The specific steps taken by the SSA algorithm to optimize the VMD parameters are shown in Figure 1.
  1. Randomly generate several [K, α] combinations as the initial positions of the discoverers and followers.
  2. Perform VMD decomposition, calculate the fitness value f, and sort the population.
  3. Update the locations of the discoverers and followers based on the warning value.
  4. Perform VMD decomposition at each discoverer and follower position.
  5. Randomly select the watchers and update their positions.
  6. Calculate the correlation of each IMF with the original signal, omega, and the permutation entropy value.
  7. Randomly select the watchers and update their positions again.
  8. Determine whether the stop condition is met. If yes, continue to step 9; otherwise, go back to step 3.
  9. Finish and obtain the minimum fitness.
  10. Output the best parameter combination [K, α].
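Steps 2 and 6 require the fitness ingredients of Equation (14). Below is a sketch of a normalized permutation entropy and a composite fitness in that spirit; note that the aggregation algebra term is simply passed in as a ready-made number here, since its exact computation is internal to the VMD run, and the equal-weight combination follows the description above:

```python
import numpy as np
from math import factorial

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy of a 1-D series (order m): 0 for a
    monotone series, close to 1 for white noise."""
    n = len(x) - (m - 1) * delay
    # ordinal pattern of each window of m (delayed) samples
    patterns = np.array([np.argsort(x[i:i + m * delay:delay]) for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(factorial(m)))

def fitness(imfs, signal, omega):
    """Composite fitness in the spirit of Eq. (14): favors low entropy, high
    Pearson correlation with the original signal, and low aggregation
    algebra `omega` (supplied by the VMD run)."""
    pe = np.mean([permutation_entropy(u) for u in imfs])
    corr = np.mean([abs(np.corrcoef(u, signal)[0, 1]) for u in imfs])
    return float(pe / (corr + 1e-12) * np.log10(omega))
```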

2.3. Principle of CGA Model

The CNN model adopts local connections and weight sharing to perform higher-level processing on the raw data, which effectively and automatically extracts the internal features in the data [39,40]. Its internal neural network is mainly composed of convolution layers, pooling layers, and fully connected layers. This structure reduces the number of weights and the complexity of the network model.
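As a concrete illustration of the two core operations, a valid 1-D convolution (cross-correlation form, as is conventional in deep learning) with ReLU activation and a non-overlapping max pooling can be written directly in NumPy; the kernel and pool sizes here are illustrative, not the paper's configuration:

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution (no kernel flip) followed by ReLU."""
    n = len(x) - len(w) + 1
    out = np.array([np.dot(x[i:i + len(w)], w) + b for i in range(n)])
    return np.maximum(out, 0.0)              # ReLU clips negative responses

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; any trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)
```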
The structure diagram of the GRU is shown in Figure 2. Chung et al. proposed a simplified version of the LSTM cell known as the Gated Recurrent Unit (GRU); it requires less training time while improving network performance [41]. In terms of operation, GRU and LSTM work similarly, but the GRU cell uses one hidden state and merges the forget gate and the input gate into a single update gate. Moreover, GRU combines the hidden and cell states into one state. The total number of gates in GRU is therefore half that of LSTM, making GRU a popular, streamlined variant of the LSTM cell. The two gates of the GRU are the update gate and the reset gate.
In Figure 2, the direction of the arrows indicates the data flow, $\circ$ denotes element-wise multiplication, $\sigma$ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, and "1−" indicates that the data propagated along that link is $1 - z_t$. The update gate and reset gate are $z_t$ and $r_t$, respectively, $x_t$ is the input, and $h_t$ is the output of the hidden layer. The GRU unit calculates $h_t$ by the following formulas:
$$z_t = \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right)$$
$$r_t = \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right)$$
$$\tilde{h}_t = \tanh\left(r_t \circ U h_{t-1} + W x_t\right)$$
$$h_t = \left(1 - z_t\right) \circ \tilde{h}_t + z_t \circ h_{t-1}$$
$h_t$ is a combination of the candidate state $\tilde{h}_t$ and the past hidden state $h_{t-1}$, weighted by the update gate $z_t$. $U^{(z)}$, $W^{(z)}$, $U^{(r)}$, $W^{(r)}$, $U$, and $W$ are trainable parameter matrices.
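A single GRU step implementing the four formulas above can be written in a few lines of NumPy. The parameter dictionary `P` and its key names are illustrative, not from the paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU step: update gate, reset gate, candidate state, new hidden state."""
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev)            # update gate z_t
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev)            # reset gate r_t
    h_tilde = np.tanh(P["W"] @ x_t + r * (P["U"] @ h_prev))  # candidate state
    return (1.0 - z) * h_tilde + z * h_prev                  # gated combination
```

With all-zero weights, both gates evaluate to 0.5 and the candidate state vanishes, so the new hidden state is simply half of the previous one, which makes the gating easy to sanity-check.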
Attention is a resource allocation mechanism that simulates the attention mechanism of the human brain: it improves the extraction of necessary information by reasonably redistributing attention, ignoring irrelevant information, and amplifying important information [42,43]. Attention allocates weight to key information by means of probability assignment and highlights the influence of important information, thereby improving the accuracy of the model. The attention structure is shown in Figure 3, where $X_1, \ldots, X_n$ represent the inputs of the GRU network; $H_1, \ldots, H_n$ are the corresponding hidden-layer outputs obtained through the GRU; $\alpha_1, \ldots, \alpha_n$ are the attention probability distribution values assigned to the GRU hidden-layer outputs; and $Y$ is the output after the attention mechanism is introduced.
In short-term power load forecasting, the historical sequence is an important piece of information that contains the load forecasting law. With its special structure, the CNN can fully mine the interrelationships between data, extract the more important features, and capture the periodicity in historical load data. The GRU network can model time-series data even when the load data is highly volatile and uncertain, and feeding the features extracted by the CNN into the GRU helps it better learn the periodic changes and regularities in the load data. However, when there are many input features in short-term load forecasting, the GRU network is prone to information loss, and important features cannot occupy a large proportion of the network. The attention mechanism enhances the influence of important information by assigning different weights to the model's input features.
The model structure proposed in this study is shown in Figure 4, which is mainly divided into an input layer, CNN layer, GRU layer, attention layer, and output layer. Each layer in the model is described as follows:
  • Input layer: The input layer takes historical power load data as the input of the prediction model. The load data of length n is normalized and input into the prediction model, which can be represented by $X = \left[x_1, \ldots, x_{t-1}, x_t, \ldots, x_n\right]^T$.
  • CNN layer: The CNN layer mainly performs feature extraction on the input historical sequence. The CNN consists of convolution layers, pooling layers, and fully connected layers; the ReLU activation function is used for activation. To retain more information about data fluctuations, pooling layers 1 and 2 both use maximum pooling. After processing by the convolution and pooling layers, the original data is mapped to the hidden-layer feature space, and a fully connected layer converts and outputs it to extract the feature vector. The fully connected layer uses the sigmoid activation function. The output feature vector HC of the CNN layer is expressed by the following formulas:
    $$C_1 = f\left(X \otimes W_1 + b_1\right) = \mathrm{ReLU}\left(X \otimes W_1 + b_1\right)$$
    $$P_1 = \max\left(C_1\right) + b_2$$
    $$C_2 = f\left(P_1 \otimes W_2 + b_3\right) = \mathrm{ReLU}\left(P_1 \otimes W_2 + b_3\right)$$
    $$P_2 = \max\left(C_2\right) + b_4$$
    $$H_C = f\left(P_2 \times W_3 + b_5\right) = \mathrm{Sigmoid}\left(P_2 \times W_3 + b_5\right)$$
    where: C1 and C2 are the outputs of convolution layers 1 and 2, respectively; P1 and P2 are the outputs of pooling layers 1 and 2; W1, W2, and W3 are weight matrices; b1 through b5 are biases; $\otimes$ denotes the convolution operation; and max is the maximum pooling function.
  • GRU layer: The GRU layer learns the feature vectors extracted by the CNN layer. A single-layer GRU structure is built and the proposed features are fully learned to capture their internal regularity. The output of the GRU layer is recorded as H, and the output in step t is expressed as:
    $$h_t = \mathrm{GRU}\left(H_{C,t-1}, H_{C,t}\right), \quad t \in [1, i]$$
  • Attention layer: The input of the attention mechanism layer is processed by the GRU network layer. The probability corresponding to different feature vectors is calculated according to the weight distribution principle and the optimal weight parameter matrix is continuously updated. The weight coefficient calculation formula of the attention mechanism layer can be expressed as:
    $$e_t = u \tanh\left(w h_t + b\right)$$
    $$\alpha_t = \frac{\exp\left(e_t\right)}{\sum_{j=1}^{i} \exp\left(e_j\right)}$$
    $$s_t = \sum_{t=1}^{i} \alpha_t h_t$$
  • Output layer: The input of the output layer is the output of the attention mechanism layer. The output layer calculates the output $Y = \left[y_1, y_2, \ldots, y_m\right]^T$ through the fully connected layer and denormalizes it to obtain the final output. The prediction formula can be expressed as:
    $$y_t = \mathrm{Sigmoid}\left(w_o s_t + b_o\right)$$
    where: yt represents the predicted output value at time t; wo is the weight matrix; and bo is the deviation vector. A sigmoid function is selected as the activation function of the dense layer in this paper.
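The attention layer's score, weight, and context computations can be sketched as follows; the shapes of the trainable quantities `u`, `w`, and `b` are assumptions for illustration:

```python
import numpy as np

def attention(H, u, w, b):
    """Attention over GRU outputs H (one row per time step): scores
    e_t = u . tanh(w h_t + b), softmax weights alpha_t, weighted context s."""
    e = np.array([u @ np.tanh(w @ h + b) for h in H])   # alignment scores
    a = np.exp(e - e.max())
    a = a / a.sum()                                     # softmax weights
    s = (a[:, None] * H).sum(axis=0)                    # weighted context vector
    return s, a
```

When all hidden states are identical, the softmax assigns them equal weight, which is a convenient sanity check on the normalization.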
Figure 4. Structure of CNN–GRU model based on attention mechanism.

2.4. Principle of PSVMD–CGA

The overall procedure of the proposed PSVMD–CGA model is presented as follows and is illustrated in Figure 5.
  • The power load data is decomposed into IMFs by PSVMD.
  • The CGA model makes individual predictions for each IMF separately.
  • The results of each IMF forecast are added to get the final forecast.
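The three steps above can be expressed generically as "decompose, predict per mode, sum". In this sketch, `fit_predict` is a placeholder for the trained CGA model; any per-series forecaster fits the interface:

```python
import numpy as np

def psvmd_cga_forecast(modes, fit_predict):
    """Steps 2-3 of the procedure: forecast each IMF separately with the
    supplied model, then sum the per-mode forecasts into the final result."""
    per_mode = [np.asarray(fit_predict(u)) for u in modes]  # step 2
    return np.sum(per_mode, axis=0)                         # step 3
```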

2.5. Model Evaluation Indicators

In this study, four evaluation indices were used to evaluate the prediction effect: the coefficient of determination (R2), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE). Here $L_t$ and $\hat{L}_t$ are the actual and forecast values of the load at time t, M is the total number of data points used, and p is the number of features.
$$R^2 = 1 - \frac{\sum_{t=1}^{M}\left(L_t - \hat{L}_t\right)^2}{\sum_{t=1}^{M}\left(L_t - \bar{L}_t\right)^2}$$
$$MAE = \frac{\sum_{t=1}^{M}\left|L_t - \hat{L}_t\right|}{M}$$
$$MAPE = \frac{\sum_{t=1}^{M}\left|\dfrac{L_t - \hat{L}_t}{L_t}\right|}{M} \times 100\%$$
$$RMSE = \sqrt{\frac{\sum_{t=1}^{M}\left(L_t - \hat{L}_t\right)^2}{M}}$$
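The four indicators can be computed in a few lines; this helper simply mirrors the definitions above:

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """R2, MAE, MAPE (in percent), and RMSE for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y_true)) * 100.0,
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }
```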

3. Results

3.1. CGA Model Prediction Results Analysis

The CGA model and several machine learning prediction models are compared on short-term power load forecasts in this section. The forecast horizon is 300 sample points, with a half-hour gap between sampling points; the forecast period runs from 15 January 2006 through 21 January 2006. Random forest, Adaboost, and support vector machine models are the machine learning models used in the comparison.
Figure 6 describes the prediction results of the GRU, RF, SVR, and Adaboost models. The support vector machine (SVR) model has the worst fit between predicted and actual values. There is a sudden spike at the last predicted peak, around the 288th to 295th sample points, where the predicted value approaches 14,000 MW, significantly higher than the true value. The fit of the models built using the random forest (RF) and Adaboost algorithms is considerably better than that of the SVR model. Comparing the predictions of the random forest, support vector machine, Adaboost, and GRU models, the GRU algorithm achieves the best fit and the highest accuracy between predicted and real values.
Figure 7 describes the comparison results of RMSE and MAPE of GRU, random forest, support vector machine, and Adaboost models. Figure 8 describes the comparison results of R2 and MAE of GRU, random forest, support vector machine, and Adaboost models.
Figure 7 and Figure 8 show that the SVR model has an RMSE of 1130.96 MW, the highest MAPE of 10.48%, an MAE of 872.84, and the lowest coefficient of determination, 0.28. Random forest has an RMSE of 632.53 MW, a MAPE of 5.70%, an MAE of 505.2, and an R2 of 0.77. Adaboost's RMSE, MAPE, and MAE errors are 657.83 MW, 5.82%, and 518.54, respectively, and its R2 is 0.75. The prediction errors of the RF and Adaboost models are similar, as are their fitting degrees. The GRU model has the lowest actual prediction error when compared with the SVR, RF, and Adaboost models.
Figure 9 shows the prediction results of the GRU algorithm, CNN–GRU algorithm, and CGA model.
The results show that the GRU model's predictions are generally lower than the true load values, though the trend is consistent with the true values. The CNN–GRU model's predictions essentially coincide with the true values at the wave troughs, and its results at the wave crests are more accurate than those of the GRU model. The CGA model provides the best fit, roughly matching the actual magnitude of the power load; when the true value changes, the predicted value follows closely. The experimental results show that adding the attention mechanism significantly improved the prediction performance of the CNN–GRU model.
The comparison of each model's error indicators is shown in Table 2. The GRU model's RMSE, MAPE, and MAE values are 476.20 MW, 4.76%, and 401.90, respectively, and its R2 is 0.88, making it the least accurate of the three. The CNN–GRU model's RMSE, MAPE, and MAE errors are 342.61 MW, 3.17%, and 260.01, respectively, and its R2 is 0.93. The CGA model's R2 is 0.96, and its RMSE, MAPE, and MAE are 232.27 MW, 2.14%, and 182.20, respectively.
The experimental findings demonstrate that the CGA short-term power load forecasting model has stronger evaluation indicators than the other two models. In terms of RMSE, the CNN–GRU model is 133.59 MW lower than the GRU model, and the CGA model is a further 110.34 MW lower than the CNN–GRU model. In terms of MAPE, the CNN–GRU model is 1.59 percentage points lower than the GRU model, and the CGA model a further 1.03 percentage points lower. The MAE difference between the CNN–GRU and GRU models is 141.89, whereas that between the CNN–GRU and CGA models is 77.81. Compared to the GRU and CNN–GRU models, the R2 of the CGA is 0.08 and 0.03 higher, respectively.

3.2. Analysis of PSVMD–CGA Model Prediction Results

Table 3 shows the results of PSVMD decomposition of power load. When decomposing the original power load data, the penalty parameter α obtained by the PSVMD method is 1357; the number of modal functions is K = 3; and the other parameters are all default values.
The decomposition values IMF1, IMF2, and IMF3 derived from PSVMD decomposition are shown in Figure 10. The fluctuation degree of the three components is lower than that of the original data. Compared with the GA–VMD [21] and SSA–VMD [24] decomposition results, the number of decompositions is the smallest and the decomposition effect is the best. The bandwidths of the three decomposed IMF components are reasonable, neither too broad nor too narrow. The convergence curve results suggest that the K value and penalty factor of the PSVMD-optimized VMD are effective.
Figure 11 illustrates the IMF1 component following PSVMD decomposition of the original power load. The CGA model predicts the IMF1 component to obtain the component prediction results. The figure demonstrates that IMF1 lacks a clear fluctuation rule and has a generally stable trend with just one full peak. In general, the predicted result is smaller than the actual result at the trough and roughly coincides with the real value of IMF1 at the waistline.
Figure 12 shows the fit between the IMF2 prediction results and the original component. IMF2 has an obvious 24-h periodicity. The overall error between the prediction and the original component IMF2 is small. At the waistline, where the original component rises and falls, the two essentially coincide. The prediction error is larger at the wave troughs, where the predicted value exceeds the original component IMF2, and the predicted value at the wave crests is smaller than that of the original component IMF2.
Figure 13 depicts the predicted results of the IMF3 component. It can be seen from the figure that the IMF3 component fluctuates violently, and the difference between peak values is large.
Table 4 compares the prediction errors for each PSVMD–CGA model component. IMF1, IMF2, and IMF3 components’ R2 values are 0.97, 0.99 and 0.92, respectively, as indicated in the table. IMF2 has the best fitting impact, whereas IMF3 has the worst prediction effect. IMF1, IMF2, and IMF3 components’ respective MAEs are 50.0, 86.7 and 81.9. The respective MAPEs for IMF1, IMF2, and IMF3 were 0.58%, 18.1% and 162.3%. IMF1, IMF2, and IMF3 components have RMSEs of 56.3 MW, 105.2 MW and 99.3 MW, respectively. In conclusion, component IMF2’s prediction result is closest to the true value, and component IMF3’s prediction result’s error is the largest.
Figure 14 compares the prediction results of the CGA, GA–VMD–CGA, SSA–VMD–CGA, and PSVMD–CGA models. All four models estimate the short-term power load reasonably accurately, but the PSVMD–CGA model clearly fits best along the rising waistline, where the gap between its predicted and actual values is smallest. When the trend of the real value fluctuates, the PSVMD–CGA prediction changes along with it, anticipating the change in advance.
Table 5 compares the prediction errors of the models. The PSVMD–CGA model has the highest R2, 0.99, which is very close to 1; the CGA model has the lowest, 0.96, while the GA–VMD–CGA and SSA–VMD–CGA models both reach 0.98. The CGA model has the largest RMSE, 232.2 MW, and the PSVMD–CGA model the smallest, 150.1 MW. The CGA model's MAPE is the largest at 2.1%, against 1.3% for the PSVMD–CGA model, and its MAE of 182.2 contrasts with the PSVMD–CGA model's minimum of 113.1. On all four evaluation indices, the other two algorithms perform better than CGA but worse than PSVMD–CGA, and the GA–VMD–CGA model trails the SSA–VMD–CGA model in MAE, MAPE, and RMSE.
4. Compared with the CGA model, the MAE of the GA–VMD–CGA model fell by 12.1, its MAPE by 0.5%, and its RMSE by 57.2, while its R2 increased by 0.02 to 0.98. This outcome demonstrates that the "decomposition integration" strategy effectively increases the algorithm's predictive power and decreases its error.
5. Compared with the GA–VMD–CGA model, the MAE of the SSA–VMD–CGA model decreased by 46.9, its MAPE by 0.1%, and its RMSE by 8.9. The experimental results show that SSA–VMD decomposes the power load better than GA–VMD, confirming the effectiveness and superiority of the sparrow search algorithm over the genetic algorithm for optimizing the VMD parameters.
6. The PSVMD–CGA model has the lowest values on all three error indices, and its predictions fit the actual power load best. Compared with the SSA–VMD–CGA model, its MAE decreased by 10.1, its MAPE by 0.2%, and its RMSE by 16 MW, while its R2 increased by 0.01. This demonstrates that optimizing VMD with the improved sparrow search algorithm (PSVMD), whose fitness function combines permutation entropy, aggregation algebra, and the correlation coefficient, together with the CGA model, provides the best prediction effect and the smallest error.
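A minimal sketch of how a sparrow-search-style optimizer can tune the two VMD parameters (K, α) is given below. The fitness function here is a hypothetical smooth bowl standing in for the paper's permutation-entropy-based fitness, and the update rules are a simplified reading of the producer/scrounger/scout scheme of Xue and Shen [34], not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_fitness(x):
    # Hypothetical stand-in for the PSVMD fitness (permutation entropy,
    # aggregation algebra, correlation coefficient): a smooth bowl whose
    # optimum sits at K = 4, alpha = 2000.
    K, alpha = x
    return (K - 4.0) ** 2 + ((alpha - 2000.0) / 1000.0) ** 2

def sparrow_search(fit, lb, ub, n=20, iters=50, prod_frac=0.2):
    """Simplified sparrow search: producers explore, scroungers follow
    the global best, and a few scouts jump randomly to avoid stagnation."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = lb + rng.random((n, lb.size)) * (ub - lb)
    F = np.array([fit(x) for x in X])
    gbest, gbest_f = X[np.argmin(F)].copy(), F.min()
    n_prod = max(1, int(prod_frac * n))
    step = 0.1 * (ub - lb)
    for _ in range(iters):
        order = np.argsort(F)
        X, F = X[order], F[order]
        # Producers: random walk around their (currently best) positions
        X[:n_prod] += step * rng.standard_normal((n_prod, lb.size))
        # Scroungers: resample around the best position found so far
        X[n_prod:] = gbest + step * rng.standard_normal((n - n_prod, lb.size))
        # Scouts: ~10% of the flock jumps anywhere in the search box
        scouts = rng.random(n) < 0.1
        X[scouts] = lb + rng.random((scouts.sum(), lb.size)) * (ub - lb)
        X = np.clip(X, lb, ub)
        F = np.array([fit(x) for x in X])
        if F.min() < gbest_f:
            gbest, gbest_f = X[np.argmin(F)].copy(), F.min()
    return gbest, gbest_f

# Search ranges mirror Table 1: K in [2, 10], alpha in [400, 3000]
best_x, best_f = sparrow_search(toy_fitness, lb=[2, 400], ub=[10, 3000])
print("best (K, alpha):", best_x, "fitness:", best_f)
```

In the paper's setting, the inner fitness evaluation would run a VMD decomposition with the candidate (K, α) and score the resulting modes; the toy bowl above only exercises the search loop itself.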

3.3. Instance Validation

To assess the applicability and prediction capability of the PSVMD–CGA model, this study forecasts partial load data from Australia in winter and summer, as well as load data from Quanzhou, Fujian, China. Table 6 shows partial load and influencing-factor data for a location in Australia, on which forecasts for parts of the Australian winter and summer of 2010 are based. The winter prediction period runs from 5 July to 16 July 2010; the summer period runs from 4 February to 15 February 2010.
Figure 15 compares the prediction results of the CGA model and the PSVMD–CGA model for the Australian summer, and Figure 16 shows the same comparison for winter. The two figures show that the CGA model's predictions deviate significantly from the actual values at the peaks and troughs and cannot fit them properly, whereas the PSVMD–CGA model's predictions differ only slightly from the real values at the wave peaks in both summer and winter, indicating that its prediction ability is greater than that of the CGA model.
Table 7 compares the winter and summer prediction errors of the PSVMD–CGA and CGA models. In both seasons, the PSVMD–CGA model improves greatly on the original CGA model. The CGA model performs better in winter, whereas the PSVMD–CGA model performs better in summer, where its improvement over the CGA model is also larger. One possible explanation is that the power load is influenced by more factors in summer, so the PSVMD–CGA model's decomposition is more relevant there and its prediction effect is stronger.
Table 8 shows partial load and influencing-factor data for Quanzhou, Fujian Province, China, sampled at 15 min intervals. The load data come from the State Grid of China and the meteorological data from the China Meteorological Administration.
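Datasets like Tables 6 and 8 are typically turned into supervised samples by sliding a window over the load history and appending the exogenous features (temperature, humidity, price, etc.) at the prediction time. The sketch below uses synthetic data and hypothetical shapes (a 96-step lookback, three exogenous features); it is an illustration, not the paper's preprocessing code.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_exog = 1000, 3
# Synthetic 15-min load with a daily cycle, plus random exogenous features
# standing in for temperature / humidity / price columns.
load = 4000 + 400 * np.sin(2 * np.pi * np.arange(n) / 96) + 50 * rng.standard_normal(n)
exog = rng.random((n, n_exog))

def make_windows(load, exog, lookback=96, horizon=1):
    """Build (X, y) pairs: each sample concatenates `lookback` past load
    values with the exogenous features at the prediction time step."""
    X, y = [], []
    for i in range(lookback, len(load) - horizon + 1):
        X.append(np.concatenate([load[i - lookback:i], exog[i]]))
        y.append(load[i + horizon - 1])
    return np.array(X), np.array(y)

X, y = make_windows(load, exog)
print(X.shape, y.shape)  # (904, 99) (904,)
```

Each row of X would then feed the CNN/GRU input layer, with y as the target load value.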
Figure 17 compares the CGA model with the PSVMD–CGA model. The figure shows that the CGA model's predictions fall below the actual values at the wave crests and above them at the troughs, indicating a poor fit, whereas the PSVMD–CGA model's predictions are only slightly below the actual values at the crests and track the true values more closely. This demonstrates that the PSVMD–CGA model has strong prediction capability and is also applicable to Chinese load data.
Table 9 compares the prediction errors of the PSVMD–CGA and CGA models in Quanzhou. Relative to the CGA model, the R2 of the PSVMD–CGA model rose by 0.02, its MAE fell by 63.21, its MAPE by 1.14%, and its RMSE by 77.73 MW. Both models can therefore forecast the Quanzhou, Fujian load with reasonable accuracy, but the PSVMD–CGA model fits better, and its prediction performance is greatly improved compared with the CGA algorithm.

4. Conclusions

A novel load forecasting model, PSVMD–CGA, is proposed in this paper. The power load data is first decomposed by PSVMD into several components, the CGA model then forecasts each component, and finally the component predictions are superposed. The CGA model is introduced to address two problems: the GRU algorithm alone struggles to extract feature laws from historical power load data, and over long time series the essential features are not assigned appropriate weights. The PSVMD–CGA model then overcomes the CGA model's inability to reduce the complexity of the load data and its limited forecast accuracy. Because the VMD algorithm's parameters are otherwise selected manually and arbitrarily, permutation entropy, aggregation algebra, and the correlation coefficient are employed as the fitness function. Compared with GA–VMD and SSA–VMD, PSVMD converges more quickly and is less likely to fall into a local optimum. Compared with the original GRU model, the PSVMD–CGA model reduced MAE by 288.8, MAPE by 3.46%, and RMSE by 326.1 MW, and raised R2 to 0.99. The experimental results fully demonstrate the reliability and effectiveness of the proposed PSVMD–CGA model for load forecasting and show its strong generalization ability and robustness.
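The decompose-predict-superpose workflow summarized above can be sketched end to end. In this illustration a moving-average split stands in for PSVMD and a seasonal-naive predictor stands in for the CGA network; all names, signals, and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(96 * 14)  # two weeks of 15-min load samples (hypothetical)
load = 3000 + 500 * np.sin(2 * np.pi * t / 96) + 80 * rng.standard_normal(t.size)

# --- 1. "Decompose": a moving-average split into trend + residual,
#        standing in for the PSVMD step.
window = 96
trend = np.convolve(load, np.ones(window) / window, mode="same")
residual = load - trend
components = [trend, residual]

# --- 2. Forecast each component separately. A seasonal-naive predictor
#        (repeat the values from 24 h earlier) stands in for the CGA network.
horizon, season = 96, 96

def seasonal_naive(series, horizon, season):
    return series[-season:][:horizon]

component_forecasts = [seasonal_naive(c[:-horizon], horizon, season)
                       for c in components]

# --- 3. Superpose: the final forecast is the sum of component forecasts.
forecast = np.sum(component_forecasts, axis=0)
actual = load[-horizon:]
mae = np.mean(np.abs(actual - forecast))
print(f"MAE of decompose-predict-superpose sketch: {mae:.1f}")
```

With a real decomposition (PSVMD) and a learned component model (CGA), step 2 is where the prediction quality is gained; the superposition in step 3 is unchanged.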

Author Contributions

Conceptualization, Y.H.; methodology, X.H.; software, X.H.; validation, X.H.; formal analysis, X.H.; investigation, X.H.; resources, J.S.; data curation, X.H.; writing—original draft preparation, X.H.; writing—review and editing, X.H.; visualization, X.H.; supervision, X.H.; project administration, X.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program (2019YFC1904304).

Institutional Review Board Statement

Not applicable for studies not involving humans or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The load data were obtained from the State Grid of China and can be accessed at the following link: (https://xs.xauat.edu.cn/info/1208/2122.htm, accessed on 1 February 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

CNN: Convolution Neural Network
GRU: Gated Recurrent Unit
EMD: Empirical mode decomposition
EEMD: Ensemble empirical mode decomposition
VMD: Variational mode decomposition
IMF: Intrinsic mode function
PSVMD: VMD optimized by the new sparrow algorithm
PSVMD–CGA: CGA after VMD optimized by the new sparrow algorithm
GA–VMD–CGA: CGA after VMD optimized by GA
SSA–VMD–CGA: CGA after VMD optimized by the sparrow algorithm
MAE: Mean absolute error
MAPE: Mean absolute percentage error
RMSE: Root mean square error
R2: Determination coefficient
uk(t): Modal function of VMD technology
Ak(t): Instantaneous amplitude of uk(t)
ωk(t): Instantaneous frequency of uk(t)
ωk: Center frequency
ƒ: Input signal
δ(t): Unit pulse function
λ(t): Lagrange factor
Xij: Individual position of sparrow
Xworst: Worst position of the sparrow flock
R2: Early warning value
ST: Safety value
itermax: Maximum iterations
Xbest: Best position of the sparrow flock
δ: Activation function
Tanh: Activation function
Xt: Input of GRU at time t
Ht: Hidden state (output) of GRU at time t
W: Trainable parameter matrices
C: Outputs of convolution layer
P: Output of pooling layer

References

1. Salem, J.I.; Vaclav, S.; Stanislav, M. Intelligent Systems for Power Load Forecasting: A Study Review. Energies 2020, 13, 6105.
2. Fang, J.; Shen, D.; Li, X.; Li, H. An efficient power load forecasting model based on the optimized combination. Mod. Phys. Lett. B 2020, 34, 2050114.
3. Jin, Y.; Guo, H.; Wang, J.; Song, A. A Hybrid System Based on LSTM for Short-Term Power Load Forecasting. Energies 2020, 13, 6241.
4. Jayasudha, M.; Elangovan, M.; Mahdal, M.; Priyadarshini, J. Accurate Estimation of Tensile Strength of 3D Printed Parts Using Machine Learning Algorithms. Processes 2022, 10, 1158.
5. Gupta, K.K.; Kalita, K.; Ghadai, R.K.; Ramachandran, M.; Gao, X.-Z. Machine Learning-Based Predictive Modelling of Biodiesel Production—A Comparative Perspective. Energies 2021, 14, 1122.
6. Umar, J.; Khalid, I.; Muhammad, J.; Ikramullah, K.; Ejaz, A.A.; Khurram, S.Z.; Muhammad, N.R.; Noman, S. A novel short receptive field based dilated causal convolutional network integrated with Bidirectional LSTM for short-term load forecasting. Expert Syst. Appl. 2022, 205, 117689.
7. Meng, X.; Zhu, T.; Li, C. Construction of perfect dispatch learning model based on adaptive GRU. Energy Rep. 2022, 8, 668–677.
8. JunKi, H. Vibration Prediction of Flying IoT Based on LSTM and GRU. Electronics 2022, 11, 1052.
9. Lipeng, J.; Chenqi, F.; Zheng, J.; Yicheng, S.; Shun, W.; Li, T. Short-Term Canyon Wind Speed Prediction Based on CNN—GRU Transfer Learning. Atmosphere 2022, 13, 813.
10. Wu, L.; Kong, C.; Hao, X.; Chen, W. A Short-Term Load Forecasting Method Based on GRU-CNN Hybrid Neural Network Model. Math. Probl. Eng. 2020, 2020, 1428104.
11. Yu, E.; Xu, G.; Han, Y.; Li, Y. An efficient short-term wind speed prediction model based on cross-channel data integration and attention mechanisms. Energy 2022, 256, 124569.
12. Gao, X.; Li, X.; Zhao, B.; Ji, W.; Jing, X.; He, Y. Short-Term Electricity Load Forecasting Model Based on EMD-GRU with Feature Selection. Energies 2019, 12, 1140.
13. Yu, H.-J.; Wei, H.-J.; Li, J.-M.; Zhou, D.-P.; Wei, L.-D.; Liu, H.; Concli, F. Lubrication State Recognition Based on Energy Characteristics of Friction Vibration with EEMD and SVM. Shock Vib. 2021, 2021, 9972119.
14. He, Y.; Wang, Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model. Appl. Soft Comput. J. 2021, 105, 107288.
15. Jia, Y.; Li, G.; Dong, X.; He, K. A novel denoising method for vibration signal of hob spindle based on EEMD and grey theory. Measurement 2021, 169, 108490.
16. Wang, C.; Zhang, H.; Fan, W.; Ma, P. A new chaotic time series hybrid prediction method of wind power based on EEMD-SE and full-parameters continued fraction. Energy 2017, 138, 977–990.
17. Vijaya, K.R.; Mishra, S.P.; Jyotirmayee, N.; Dash, P.K. Adaptive VMD based optimized deep learning mixed kernel ELM autoencoder for single and multistep wind power forecasting. Energy 2022, 244, 122585.
18. Zhong, J.; Gou, X.; Shu, Q.; Liu, X.; Zeng, Q. A FOD Detection Approach on Millimeter-Wave Radar Sensors Based on Optimal VMD and SVDD. Sensors 2021, 21, 997.
19. Zhang, Y.; Pan, G.; Chen, B.; Han, J.; Zhao, Y.; Zhang, C. Short-term wind speed prediction model based on GA-ANN improved by VMD. Renew. Energy 2020, 156, 1373–1388.
20. Zhou, M.; Hu, T.; Bian, K.; Lai, W.; Hu, F.; Hamrani, O.; Zhu, Z. Short-Term Electric Load Forecasting Based on Variational Mode Decomposition and Grey Wolf Optimization. Energies 2021, 14, 4890.
21. Li, Y.; Tang, B.; Jiang, X.; Yi, Y. Bearing Fault Feature Extraction Method Based on GA-VMD and Center Frequency. Math. Probl. Eng. 2022, 2022, 2058258.
22. Zhang, Q.; Chen, S.; Fan, Z.P. Bearing fault diagnosis based on improved particle swarm optimized VMD and SVM models. Adv. Mech. Eng. 2021, 13, 16878140211028451.
23. Zhang, S.; Zhang, Y. Harmonic detection method based on permutation entropy and variational modal decomposition optimized by genetic algorithm. Rev. Sci. Instrum. 2021, 92, 025118.
24. Ren, Y.; Zhang, L.; Chen, J.; Liu, J.; Liu, P.; Qiao, R.; Yao, X.; Hou, S.; Li, X.; Cao, C.; et al. Noise Reduction Study of Pressure Pulsation in Pumped Storage Units Based on Sparrow Optimization VMD Combined with SVD. Energies 2022, 15, 2073.
25. Yang, D.; Guo, J.-E.; Sun, S.; Han, J.; Wang, S. An interval decomposition-ensemble approach with data-characteristic-driven reconstruction for short-term load forecasting. Appl. Energy 2022, 306, 117992.
26. Keshvari, R.; Imani, M.; Parsa Moghaddam, M. A clustering-based short-term load forecasting using independent component analysis and multi-scale decomposition transform. J. Supercomput. 2022, 78, 7908–7935.
27. Shohan, M.J.A.; Faruque, M.O.; Foo, S.Y. Forecasting of Electric Load Using a Hybrid LSTM-Neural Prophet Model. Energies 2022, 15, 2158.
28. Liu, Y.; Chai, T.; Zhang, Z.; Long, G. Towards Electricity Price and Electric Load Forecasting Using Multi-task Deep Learning. J. Phys. Conf. Ser. 2022, 2171, 012048.
29. Machado, E.; Pinto, T.; Guedes, V.; Morais, H. Electrical Load Demand Forecasting Using Feed-Forward Neural Networks. Energies 2021, 14, 7644.
30. Sajjad, M.; Khan, Z.A.; Ullah, A.; Hussain, T.; Ullah, W.; Lee, M.; Baik, S.W. A Novel CNN-GRU based Hybrid Approach for Short-term Residential Load Forecasting. IEEE Access 2020, 8, 143759–143768.
31. Ding, A.; Zhang, Y.; Zhu, L.; Li, H.; Huang, L. Intelligent recognition of rough handling of express parcels based on CNN-GRU with the channel attention mechanism. J. Ambient Intell. Humaniz. Comput. 2021, in press.
32. Yu, J.; Zhang, X.; Xu, L.; Dong, J.; Zhangzhong, L. A hybrid CNN-GRU model for predicting soil moisture in maize root zone. Agric. Water Manag. 2021, 245, 106649.
33. Gendeel, M.; Yuxian, Z.; Aoqi, H. Performance comparison of ANNs model with VMD for short-term wind speed forecasting. IET Renew. Power Gener. 2018, 12, 1424–1430.
34. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
35. Zhang, H.; Peng, Z.; Tang, J.; Dong, M.; Wang, K.; Li, W. A multi-layer extreme learning machine refined by sparrow search algorithm and weighted mean filter for short-term multi-step wind speed forecasting. Sustain. Energy Technol. Assess. 2022, 50, 101698.
36. Little, D.J.; Kane, D.M. Permutation entropy of finite-length white-noise time series. Phys. Rev. E 2016, 94, 022118.
37. Wu, S.-D.; Wu, C.-W.; Humeau-Heurtier, A. Refined scale-dependent permutation entropy to analyze systems complexity. Phys. A Stat. Mech. Its Appl. 2016, 450, 454–461.
38. Amigó, J.M.; Keller, K. Permutation entropy: One concept, two approaches. Eur. Phys. J. Spec. Top. 2013, 222, 263–273.
39. Hsu, T.Y.; Huang, C.W. Onsite Early Prediction of PGA Using CNN With Multi-Scale and Multi-Domain P-Waves as Input. Front. Earth Sci. 2021, 9, 626908.
40. Shin, G.; Lee, S.-H. Implementation of Voice Recognition Via CNN and LSTM. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 1842–1844.
41. Chung, J.; Gülçehre, Ç.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
42. Dong, Y.; Wang, Z.; Du, J.; Fang, W.; Li, L. Attention-based hierarchical denoised deep clustering network. World Wide Web 2022, 26, 441–459.
43. Li, J.; Liu, Y.; Li, Q. Intelligent fault diagnosis of rolling bearings under imbalanced data conditions using attention-based deep learning method. Measurement 2022, 189, 110500.
Figure 1. Flowchart of SSA optimization VMD parameters.
Figure 2. Basic structural unit of GRU.
Figure 3. Attention mechanism structure.
Figure 5. Overall process of the proposed PSVMD–CGA model.
Figure 6. Comparison of machine learning algorithm and GRU prediction results.
Figure 7. Comparison between machine learning algorithms and GRU evaluation indicators (RMSE, MAPE).
Figure 8. Comparison between machine learning algorithms and GRU evaluation indicators (MAE, R2).
Figure 9. Comparison of prediction results by models.
Figure 10. PSVMD decomposition result.
Figure 11. IMF1 prediction results after PSVMD decomposition.
Figure 12. IMF2 prediction results after PSVMD decomposition.
Figure 13. IMF3 prediction results after PSVMD decomposition.
Figure 14. Prediction results comparison of PSVMD–CGA and other models.
Figure 15. Comparison of forecast results of various models in Australia in summer.
Figure 16. Comparison of forecast results of various models in winter in Australia.
Figure 17. Comparison of prediction results of various models in Quanzhou.
Table 1. The parameter initialization of SSA algorithm.

itermax | C | R2 | K | α
20 | 20 | 0.7 | [2, 10] | [400, 3000]
Table 2. Comparison of model error metrics.

Algorithm Model | RMSE/MW | MAPE (%) | MAE | R2
GRU | 476.20 | 4.76 | 401.90 | 0.88
CNN–GRU | 342.61 | 3.17 | 260.01 | 0.93
CGA | 232.27 | 2.14 | 182.20 | 0.96
Table 3. Adaptive parameters during PSVMD decomposition.

K | α | Fitness Function
3 | 1357 | 0.1334
Table 4. Error comparison of PSVMD–CGA components and prediction after superposition.

Decomposition Item | MAE | MAPE (%) | RMSE | R2
IMF1 | 50.0 | 0.58 | 56.3 | 0.97
IMF2 | 86.7 | 18.1 | 105.2 | 0.99
IMF3 | 81.9 | 162.3 | 99.3 | 0.92
Table 5. Comparison of prediction errors of each model.

Algorithm Model | MAE | MAPE (%) | RMSE/MW | R2
CGA | 182.2 | 2.1 | 232.2 | 0.96
GA–VMD–CGA | 170.1 | 1.6 | 175.0 | 0.98
SSA–VMD–CGA | 123.2 | 1.5 | 166.1 | 0.98
PSVMD–CGA | 113.1 | 1.3 | 150.1 | 0.99
Table 6. Partial load data and influencing factors in Australia.

Load (KW) | Dry Bulb Temperature (°C) | Dew Point Temperature (°C) | Wet Bulb Temperature (°C) | Humidity | Electricity Price (AUD/GJ)
7242.72 | 21.3 | 19.2 | 20 | 88 | 19.42
6878.13 | 19.8 | 16.8 | 18 | 83 | 18.08
6674.12 | 19.5 | 16.25 | 17.55 | 81.5 | 16.09
6468.76 | 19.2 | 15.7 | 17.1 | 80 | 11.92
6378.27 | 19.3 | 14.55 | 16.55 | 74 | 11.49
6382.87 | 19.4 | 13.4 | 16 | 68 | 11.96
6398.17 | 19.65 | 12.8 | 15.8 | 64 | 12.17
6404.88 | 19.9 | 12.2 | 15.6 | 61 | 12.22
6681.83 | 19.9 | 12.6 | 15.8 | 62.5 | 13.21
6996.4 | 19.9 | 13 | 16 | 64 | 19.03
7672.73 | 20 | 13.05 | 16.05 | 64 | 22
8226.14 | 20.1 | 13.1 | 16.1 | 64 | 22
8671.97 | 20 | 13.35 | 16.2 | 65.5 | 22.06
9138.32 | 19.9 | 13.6 | 16.3 | 67 | 22
Table 7. Comparison of prediction errors of models in winter and summer.

Algorithm Model | Season | MAE | MAPE (%) | RMSE/MW | R2
CGA | winter | 220.79 | 2.37 | 279.27 | 0.96
PSVMD–CGA | winter | 144.79 | 1.49 | 188.49 | 0.98
CGA | summer | 260.01 | 2.77 | 343.42 | 0.96
PSVMD–CGA | summer | 110.77 | 1.22 | 151.39 | 0.99
Table 8. Partial Load and Influencing Factor Data of Quanzhou, Fujian.

Load (KW) | Maximum Temperature (°C) | Minimum Temperature (°C) | Average Temperature (°C) | Humidity | Precipitation (mm)
3967.25 | 19.5 | 12.1 | 15.8 | 63 | 0
4828.23 | 9.2 | 5.1 | 6.9 | 78 | 2.9
4845.49 | 11 | 6.3 | 8.2 | 92 | 3.3
4628.56 | 16.7 | 11.6 | 14.5 | 68 | 0.4
4546.87 | 14.8 | 11.8 | 13.1 | 72 | 1.8
4451.41 | 16.1 | 11.4 | 13.3 | 88 | 1
4323.60 | 19.2 | 13.9 | 15.8 | 90 | 2.2
4010.57 | 16.5 | 13.2 | 15 | 99 | 14.4
3595.16 | 15.3 | 12 | 13.4 | 90 | 9.5
1946.57 | 13.6 | 7.4 | 9.7 | 93 | 2.5
1908.39 | 8.9 | 5.2 | 7 | 87 | 7.5
1755.60 | 8 | 6 | 6.7 | 90 | 1.9
1859.34 | 7.3 | 4.5 | 6.1 | 77 | 1.3
2036.46 | 7.7 | 4.5 | 6.1 | 86 | 0.8
2202.77 | 20.2 | 10.6 | 13.8 | 90 | 0.1
3827.02 | 10.3 | 7.1 | 8.6 | 76 | 1.7
Table 9. Comparison of forecast errors of various models in Quanzhou.

Algorithm Model | MAE | MAPE (%) | RMSE/MW | R2
CGA | 178.14 | 3.30 | 232.61 | 0.96
PSVMD–CGA | 114.93 | 2.16 | 154.88 | 0.98

Su, J.; Han, X.; Hong, Y. Short Term Power Load Forecasting Based on PSVMD-CGA Model. Sustainability 2023, 15, 2941. https://doi.org/10.3390/su15042941