Article

Partially Linear Component Support Vector Machine for Primary Energy Consumption Forecasting of the Electric Power Sector in the United States

1 School of Mathematics and Physics, Southwest University of Science and Technology, Mianyang 621010, China
2 School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
3 School of Management Science and Real Estate, Chongqing University, Chongqing 400045, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(9), 7086; https://doi.org/10.3390/su15097086
Submission received: 21 February 2023 / Revised: 26 March 2023 / Accepted: 13 April 2023 / Published: 23 April 2023
(This article belongs to the Special Issue Development Trends of Environmental and Energy Economics)

Abstract

Energy forecasting based on univariate time series has long been a challenge in energy engineering and has become one of the most popular tasks in data analytics. In order to take advantage of the characteristics of observed data, a partially linear model is proposed based on principal component analysis and support vector machine methods. The principal linear components of the input with lower dimensions are used as the linear part, while the nonlinear part is expressed by the kernel function. The primal-dual method is used to construct the convex optimization problem for the proposed model, and the sequential minimization optimization algorithm is used to train the model with global convergence. The univariate forecasting scheme is designed to forecast the primary energy consumption of the electric power sector of the United States using real-world data sets ranging from January 1973 to January 2020, and the model is compared with eight commonly used machine learning models as well as the linear auto-regressive model. Comprehensive comparisons with multiple evaluation criteria (including 19 metrics) show that the proposed model outperforms all other models in all scenarios of mid-/long-term forecasting, indicating its high potential in primary energy consumption forecasting.

1. Introduction

Energy forecasting has long been a hot spot in this era of energy revolution. In recent years, energy forecasting has already brought profits to enterprises by helping them make more reasonable financial plans [1]. On the other hand, energy consumption is not only an indicator of economics or finance but also an important factor in environmental issues, especially carbon-related ones [2]. With the increasingly diverse impacts of real-world problems, energy forecasting appeals to many researchers and engineers who wish to make their own contributions. Topics in energy forecasting have also broadened to wider areas, such as energy consumption [3], energy production [4], energy price [5], the relationship between energy, economics, and the environment [6], etc.
Primary energy has a long history of application in industrial production. Accurate forecasts of primary energy consumption remain of great importance for decision-making in energy marketing and management, as well as in policies on pollution emissions. However, our investigation of the existing literature on energy forecasting (presented in Section 2) indicates that there are still issues in existing methods and implies a research gap in the application of partially linear models to primary energy forecasting. In actuality, the time series of primary energy consumption often exhibits very clear patterns of variation, especially for stable economic entities over mid-to-short periods; therefore, it is suitable to use deterministic models to fit such properties. On the other hand, with the development of data-capturing technologies, it is much easier to obtain larger data sets for building forecasting models. Thus, it is natural to consider using machine learning models to further improve forecasting accuracy. Above all, it is more reasonable to combine the merits of both kinds of models for better practice and higher accuracy in real-world applications.
The partially linear model is a typical example of combining deterministic and indeterministic formulations. The earliest work using the partially linear model should be credited to Engle et al., in which a very simple combination of linear regression and a nonlinear function was used [7]. The semi-parametric support vector machine (SVM) presented by Smola and Schölkopf was the first work that used machine learning models to build such a partially linear structure [8] in a uniform way, with the linear kernel representing the linear part. Espinoza et al. presented another kernel-based partially linear model within the framework of least squares support vector machines (LSSVMs) [9]. In contrast, that work used a nonlinear kernel to represent the nonlinear part, and it also presented, for the first time, an analytical way of training the model. Such analytical solutions make it much easier to implement further models, and several models have since been developed for function estimation and system identification [10,11,12]. In the last several years, Ma et al. used a simplified formulation to build kernel-based grey system models by regularizing all linear parameters and parameters in the feature space [13,14,15], which also shares the philosophy of Hammerstein system models. The work by Matías et al. [16] also uses the method of regularizing all parameters, making a partially linear SVM easier to train. Despite their different implementations, all of these works have proven that kernel-based partially linear models are much more efficient in cases where prior knowledge is available, such as a known linear relationship between the input and output.
It can be learned from the previous works that an efficient partially linear model can be developed if the features of the data are properly treated. For instance, Xu et al. [17] pointed out that it is also reasonable to separate the linear and nonlinear functions of the input, and the partially linear LSSVM based on this idea can then outperform the other models. Enlightened by this pattern, a new partially linear SVM using principal linear components extracted by principal component analysis (PCA) is developed, and its related theoretical and computational problems are discussed in detail. A real-world application forecasting the monthly primary energy consumption of the electric power sector in the US is presented, and the proposed model is compared with several other machine learning models that have been very popular in recent research.
The rest of this work is organized as follows: the literature study is presented in Section 2; the specific formulation of the partially linear model, its theoretical basis, and the computational details of the PCA are introduced in Section 3; a complete presentation of the proposed partially linear component support vector machine (PLC-SVM) is given in Section 4, including its formulation in the primal and dual spaces and its computational details for univariate time series forecasting; the case study forecasting the monthly primary energy consumption of the electric power sector in the US, based on a data set of 565 months of real-world data, is presented in Section 5, along with a comprehensive comparison between the models and a detailed discussion; the conclusions are drawn in Section 6.

2. Literature Study

In this section, some recent literature on energy forecasting will be reviewed, and the details of the most commonly used structured and non-structured models for energy forecasting will be briefly summarized. A short discussion on the findings and research gaps will also be presented in the last subsection. For convenience, an overview of the main models for energy forecasting reviewed in this section is presented in Figure 1.

2.1. The Structured Models for Energy Forecasting

In this subsection, the structured models are roughly categorized into empirical models, linear models, and grey system models.
The empirical models are often presented as specific functions (see [18]), which are typically built from engineering experience and validated directly in practice. These models are often easy to use but are not suitable for very complex data sets. Recent works have paid significantly less attention to such models.
The linear regression (LR) and autoregressive integrated moving average (ARIMA) models both share linear structures. While the LR model only simulates a simple linear correlation between the input and output variables [19,20], the ARIMA model mainly considers the auto-correlation of the time series. The linear models are quite popular in energy forecasting and have been used to forecast oil consumption [19], electricity consumption [21,22], demand [20,23], wind generation [24], total energy demand and supply [25], etc. However, the ARIMA model often suffers from "overdifferencing" [26], and both of these linear models are limited in describing nonlinear data sets.
Grey system models are increasingly popular in energy forecasting. There are several techniques used in the recent literature, including designing new structures to fit the data (e.g., nonlinear whitening equations [27], time-delayed terms [28], and periodic terms [29]), using complex accumulation operators (e.g., Hausdorff fractional order accumulation [30] and buffer operators [31]), and combining grey system models with other methods (e.g., Kalman filter [32] and Markov model [33]). Researchers often use intelligence optimizers when new methods contain nonlinear parameters [27,29,30,31]. One advantage of grey system models is their ability to make reliable predictions with limited data. However, for more complex forecasting applications, the proper structure or preprocessing methods still require the experience of researchers.

2.2. The Non-Structured Models for Energy Forecasting

Non-structured models do not have deterministic structures; a complete formulation can only be determined from the data sets. Machine learning models are among the most popular non-structured models, and the recent literature has shown considerable interest in their application. The most popular machine learning models for energy forecasting are neural networks, support vector machines, and regression trees.
Neural networks, particularly multilayer perceptrons, remain popular for energy forecasting, with applications in areas such as electricity [34,35,36] and building energy consumption [37], ocean wave energy and photovoltaic plant generation forecasting [38], etc. Deep learning has led to the development of more complex models, such as LSTM-based networks with fully connected layers [39,40] or convolutional layers [41,42,43]. Other types of networks, such as bagged echo state networks [44], echo state networks [45], and radial belief networks [46], are also used. While these complex networks improve flexibility, they increase computational costs and require expert knowledge for their design. Thus, developing general models for energy forecasting remains challenging.
Kernel-based machine learning models, especially SVMs, remain popular for energy forecasting. Recent studies have focused on combining SVMs with evolutionary algorithms such as particle swarm optimization (PSO) [47], differential evolution (DE) [48], improved chicken swarm optimization (ICSO) [49], covariance matrix adaptation evolutionary strategy (CMAES) [50], improved fruit fly optimization (IFFO) [51], and Harris Hawks optimization [52], to optimize the hyperparameters automatically. These models are less time-consuming and have higher generality. However, partially linear kernel-based models have not been used in recent energy forecasting studies.
Many new models based on basic regression trees have been developed in the past decade and are also widely adopted in energy forecasting, such as for carbon trading volume and price [53], building energy consumption [54], solar radiation [55], hydro-energy [56], etc. One significant merit of regression tree-based models is that those with shallow structures are generally explainable. However, efficient regression trees usually become deeper with larger or more complex data sets, and a large number of hyperparameters may also make the overall forecasting process too complex.
Hybrid models are gaining interest in energy forecasting in both the literature and competitions [57]. The main schemes found in the literature can be categorized into three classes. The first class combines machine learning models with preprocessing methods, such as variational mode decomposition (VMD), autoencoders [58], singular spectrum analysis (SSA) [59], wavelet transform [60], etc. The second class combines different machine learning models using ensemble learning [61,62,63] or multiple combining schemes [64,65], among others. The third class is the integration of the above two schemes; in these works, decomposition methods are often adopted, such as empirical mode decomposition (EMD) [66] and complete ensemble empirical mode decomposition (CEEMD) [67]. Despite being simple and effective, these hybrid models are more complex than other machine learning models and can lead to longer training times, less explainability, and the need for better hardware.

2.3. A Brief Summary of Literature Study

According to the literature study presented above, the research gaps can be briefly summarized in two parts: (1) In terms of methodology, machine learning models are becoming more popular in recent works on energy forecasting. However, the higher performance of more complex models comes with other issues, such as higher computational complexity and the lack of a complete framework for selecting appropriate models in real-world applications. (2) In terms of applications, more complex models often need larger data sets, and many works only present good performance in mid-/short-term predictions. The partially linear SVM (PLSVM) illustrates a new way of combining the linearity and nonlinearity of data sets but, based on our investigation, has not been used in energy forecasting applications.
To fill the above research gaps, this work presents a new machine learning model for energy forecasting in real-world applications, and the main contributions can be summarized as follows:
  • A partially linear component support vector machine is developed, which uses the principal linear features of the input data set obtained by a PCA. This reduces the risk of multicollinearity while keeping the model as simple as possible.
  • A theoretical analysis is also presented, showing that the computational complexity of the main training process of the proposed model is in the same order as the existing SVM model.
  • A complete partially linear auto-regression scheme for out-of-sample time series forecasting is presented in a real-world application with different scenarios on forecasting the primary energy consumption of the electric power sector of the United States, showing that the proposed model outperforms the cutting-edge models, especially in mid-/long-term forecasting.

3. Preliminaries

In this section, the main idea of the partially linear model and key steps of the principal component analysis (PCA) will be briefly summarized.

3.1. Main Idea of the Partially Linear Model

One typical definition of the partially linear model is [68]
$$y = \beta^T x_{\mathrm{lin}} + g(x_{\mathrm{nonl}}), \tag{1}$$
where $x_{\mathrm{lin}}$ consists of the linear dimensions of the input $x$, $x_{\mathrm{nonl}}$ consists of the nonlinear dimensions, and $g(\cdot)$ is an unknown nonlinear function. However, it has been argued that this formulation only separates the linear dimensions of the input vectors, and a more reasonable approach is to separate the linear functions of the input vector [17]. Enlightened by this idea, a simpler formulation is considered
$$y = \beta^T x + g(x), \tag{2}$$
where $\beta^T x$ is the linear function of $x$ and $g(x)$ is an unknown nonlinear function of $x$.
Remark 1. 
It is well known that any differentiable real function can be written as
$$f(x) = f(x_0) + Df(x_0)(x - x_0) + R(x - x_0) \tag{3}$$
according to Taylor's theorem [69], where $D$ is a differential operator (for multivariable functions, the differential operator can be written as $Df = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_d} \right)^T$, and the products between the vectors are inner products). This formulation can be rearranged compactly as
$$f(x) = Df(x_0)\,x + \left[ R(x - x_0) + f(x_0) - Df(x_0)\,x_0 \right]. \tag{4}$$
It is clear that the first term is a linear function of $x$ and the second term is a nonlinear function (with a constant bias), so this formulation is mathematically equivalent to (2).
Based on this idea, the linear function of the input is treated in a more direct way, which makes the model more stable than treating the linear part in a fully nonlinear way. For example, if the real nonlinearity follows a polynomial function such as
$$F(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3, \tag{5}$$
it would be unstable to approximate it with a fully nonlinear function, as the linear term $a_1 x$ would be over-estimated. Above all, the formulation in (2) is used to build the partially linear model in this paper.
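For instance, applying the decomposition (2) to the cubic (5) gives
$$\beta = a_1, \qquad g(x) = a_0 + a_2 x^2 + a_3 x^3,$$
so the linear coefficient is estimated directly instead of being absorbed into the nonlinear approximation.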

3.2. Principal Component Analysis

As described above, the partially linear model (2) contains a linear function of the input. However, in real-world applications, the elements of such a linear input may have high multicollinearity, which can lead to ill-posed problems and higher computational complexity. In this work, principal component analysis (PCA) is used to reduce the dimension of the input.
PCA is one of the most popular classical linear methods; it can efficiently extract the linear features of the input vector and make linear function estimation more stable. For the original input $x = (x_1, x_2, \ldots, x_d)^T$, where $x_i\ (i = 1, 2, \ldots, d)$ represent the elements (features) of the input, the main goal of the PCA is to find a linear transformation $A$ that maps the original input $x$ to a new vector $z$ whose features are linearly independent of each other. For convenience, a set of inputs is denoted by
$$X = \left[ x_1, x_2, \ldots, x_N \right], \tag{6}$$
and the objective of the PCA is to find a linear matrix that satisfies
$$A_{d \times d} \left( X_{d \times N} - U_{d \times N} \right) = Z_{d \times N}, \tag{7}$$
where $U$ is the matrix of mean values of $X$, with elements $u_{ij} = \frac{1}{N} \sum_{k=1}^{N} x_{kj}$ $(i = 1, \ldots, N,\ j = 1, \ldots, d)$. The transformation matrix $A$ can be written as
$$A = \left[ \xi_1, \xi_2, \ldots, \xi_d \right], \tag{8}$$
where the $\xi_i$ are the eigenvectors of the auto-covariance matrix $(X - U)(X - U)^T$, i.e.,
$$(X - U)(X - U)^T \xi_i = \lambda_i \xi_i. \tag{9}$$
The eigenvectors are ordered according to the descending order of the corresponding eigenvalues $\lambda_i$ of the auto-covariance matrix of $X$.
The contribution ratio of the $k$th linear component in the new features $Z$ is calculated by
$$r_k = \frac{\lambda_k}{\sum_{i=1}^{d} \lambda_i}. \tag{10}$$
The total contribution of the first $k$ components is the sum of the first $k$ ratios defined in (10). As the auto-covariance matrix $(X - U)(X - U)^T$ is a positive semi-definite symmetric matrix, all eigenvalues are non-negative; thus, the contribution ratios $r_k$ are all non-negative. Furthermore, the total contribution of the first $k$ components lies in the range $[0, 1]$. Usually, if the total contribution of some components is larger than a threshold $r_p$, they contain almost all of the information of the original samples, and these components are called the principal components.
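To make these steps concrete, a minimal NumPy sketch of the procedure above is given below (an illustration only; the centering convention and the handling of the threshold are implementation choices):

```python
import numpy as np

def principal_components(X, r_p=0.95):
    """Extract the principal linear components of X (d x N, one sample per column).

    Follows the steps above: center the samples, eigendecompose the
    auto-covariance matrix (9), and keep the leading components whose total
    contribution ratio (10) reaches the threshold r_p.
    """
    U = X.mean(axis=1, keepdims=True)      # column of feature means, broadcast over samples
    C = (X - U) @ (X - U).T                # auto-covariance matrix, d x d
    lam, xi = np.linalg.eigh(C)            # eigh returns ascending eigenvalues of a symmetric matrix
    lam, xi = lam[::-1], xi[:, ::-1]       # reorder to descending eigenvalues
    ratios = lam / lam.sum()               # contribution ratios r_k
    p = int(np.searchsorted(np.cumsum(ratios), r_p)) + 1  # smallest p with total contribution >= r_p
    A = xi[:, :p].T                        # transformation matrix, p x d
    Z = A @ (X - U)                        # principal linear components, p x N
    return A, U, Z
```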

4. The Proposed Partially Linear Component Support Vector Machines

The modeling procedures and some key notes on the theoretical basis of the proposed partially linear component support vector machines for regression will now be presented.

4.1. Partially Linear Component Model in the Feature Space

A support vector machine model for regression essentially estimates a nonlinear function in a feature space, which is defined by
$$y = w^T \varphi(x) + b, \tag{11}$$
where
$$\varphi: \mathbb{R}^d \to F \tag{12}$$
is a feature mapping from the space $\mathbb{R}^d$ to a feature space, and $w^T \varphi(x) + b$ is a linear approximation, in the feature space, of a nonlinear function; i.e., $g(x)$ in (2) can be approximated in this way. Based on this idea, it is very natural to rewrite the partially linear function (2) in the following formulation
$$y = \beta^T z + w^T \varphi(x) + b, \tag{13}$$
where $z$ is a vector containing only the principal linear components corresponding to $x$. According to the basic principles of functional analysis, it is very easy to build a new feature space using
$$\tilde{F} = \left\{ \begin{bmatrix} z \\ \varphi(x) \end{bmatrix} \;\middle|\; z \in \mathbb{R}^p,\ \varphi(x) \in F;\ x \in \mathbb{R}^d \right\}, \tag{14}$$
where $p$ is the number of principal linear components and $d$ is the dimension of $x$. Thus, it is very easy to define a new feature mapping $\phi: \mathbb{R}^d \to \tilde{F}$ using
$$\phi(x) = \begin{bmatrix} z \\ \varphi(x) \end{bmatrix}. \tag{15}$$
The linear weights can then be concatenated using
$$\omega = \begin{bmatrix} \beta \\ w \end{bmatrix}. \tag{16}$$
The partially linear model can then be compactly written in the new feature space $\tilde{F}$ as
$$y = \omega^T \phi(x) + b. \tag{17}$$

4.2. Partially Linear Component Support Vector Machines in Primal and Dual Formulations

Within Formula (17), the primal problem of the partially linear component support vector machine for regression (PLC-SVM) can be defined as
$$\min_{\omega, b, \xi, \xi^*} \ \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^* \right) \tag{18}$$
$$\text{s.t.} \quad \begin{cases} y_i - \omega^T \phi(x_i) - b \le \varepsilon + \xi_i, \\ \omega^T \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\ \xi_i^* \ge 0. \end{cases}$$
This formulation shares the same primal problem as support vector regression, which is often known as the $\varepsilon$-insensitive formulation. However, this formulation is not directly usable for computation; thus, its corresponding dual problem should be used, which is defined by Smola et al. [70]
$$\max_{\alpha, \alpha^*} \ J(\alpha, \alpha^*) = -\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \phi^T(x_i)\phi(x_j) - \varepsilon \sum_{i=1}^{N} \left( \alpha_i + \alpha_i^* \right) + \sum_{i=1}^{N} y_i \left( \alpha_i - \alpha_i^* \right) \tag{19}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^* \right) = 0, \quad \alpha_i, \alpha_i^* \in [0, C].$$
It is very important to notice that the linear weight $\omega$ in the feature space can be expressed as a linear combination of the mapping $\phi$, with the Lagrangian multipliers as the weights, i.e.,
$$\omega = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \phi(x_i) = \begin{bmatrix} \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, z_i \\ \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \varphi(x_i) \end{bmatrix}. \tag{20}$$
Then the partially linear function can be written as
$$\omega^T \phi(x_j) = \left[ \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, z_i^T,\ \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \varphi^T(x_i) \right] \begin{bmatrix} z_j \\ \varphi(x_j) \end{bmatrix} = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, z_i^T z_j + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \varphi^T(x_i)\varphi(x_j). \tag{21}$$
Recalling the definition of $\omega$ in (16), it is easy to notice that
$$\beta = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, z_i. \tag{22}$$
Thus the partially linear function can be rewritten as
$$\omega^T \phi(x_j) = \beta^T z_j + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \varphi^T(x_i)\varphi(x_j). \tag{23}$$
According to the kernel trick, the inner product of a feature mapping can be expressed by a kernel function that satisfies Mercer's condition, i.e.,
$$\varphi^T(x_i)\varphi(x_j) = k(x_i, x_j). \tag{24}$$
Noticing that the nonlinear mapping $\phi$ contains a linear and a nonlinear part according to its definition (15), the inner products should be written as
$$\phi^T(x_i)\phi(x_j) = \left[ z_i^T,\ \varphi^T(x_i) \right] \begin{bmatrix} z_j \\ \varphi(x_j) \end{bmatrix} = z_i^T z_j + \varphi^T(x_i)\varphi(x_j) = z_i^T z_j + k(x_i, x_j). \tag{25}$$
Finally, the partially linear model can now be written as
$$y = \omega^T \phi(x) + b = \left( \beta^T, w^T \right) \begin{bmatrix} z \\ \varphi(x) \end{bmatrix} + b = \beta^T z + w^T \varphi(x) + b = \beta^T z + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b. \tag{26}$$
The Gaussian kernel (also known as the radial basis function kernel) is often used:
$$k(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right), \tag{27}$$
where $\gamma$ is the reciprocal of the squared kernel width $\sigma$. The dual problem used for computation can now be expressed with the inner product (25) as
$$\max_{\alpha, \alpha^*} \ J(\alpha, \alpha^*) = -\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \left( Z^T Z + K \right)_{ij} - \varepsilon \sum_{i=1}^{N} \left( \alpha_i + \alpha_i^* \right) + \sum_{i=1}^{N} y_i \left( \alpha_i - \alpha_i^* \right) \tag{28}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^* \right) = 0, \quad \alpha_i, \alpha_i^* \in [0, C],$$
where $K = \left( k(x_i, x_j) \right)_{N \times N}$.
Remark 2. 
The Gram matrix $Z^T Z$ is a positive semi-definite symmetric matrix, and $K$ is also known to be a positive semi-definite symmetric matrix; thus, their sum $Z^T Z + K$ is also positive semi-definite and symmetric. Therefore, the dual problem (28) satisfies the conditions of a typical quadratic program (QP), and it can be solved using sequential minimal optimization (SMO) with global convergence, as proven by Takahashi et al. [71].
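To make the combined inner product (25) concrete, a minimal sketch of building the Gram matrix $Z^T Z + K$ with the Gaussian kernel (27) is given below (an illustration only, reusing principal_components() from Section 3.2):

```python
import numpy as np

def combined_gram(X, Z, gamma):
    """Combined Gram matrix Z^T Z + K used in the dual problem (28).

    X holds the raw inputs (d x N), Z the principal linear components (p x N),
    and the Gaussian kernel (27) supplies the nonlinear part.
    """
    sq = (X ** 2).sum(axis=0)                        # squared norms of the input columns
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    K = np.exp(-gamma * np.maximum(d2, 0.0))         # Gaussian kernel matrix k(x_i, x_j)
    return Z.T @ Z + K                               # linear part plus nonlinear part
```

As noted in Remark 2, both terms are positive semi-definite, so the returned matrix is a valid kernel matrix.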
Within the above procedures and analysis, the overall computational steps of the proposed PLC-SVM are now clear, and a summary is presented as pseudo-code in Algorithm 1. The main computational steps can be roughly divided into four parts: the first part prepares the data set and initializes the key settings; the second part uses the PCA to extract the principal linear components and builds the kernel matrix used in (28); the third part solves the dual problem using the SMO algorithm, with the same implementation as LibSVM [72]; the last part makes predictions using the trained PLC-SVM model.
Algorithm 1: Algorithm of PLC-SVM (training and predicting).
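As an illustration of the four parts above (a sketch under stated assumptions, not the exact implementation used in this work), the training and prediction steps can be written in Python by passing the combined Gram matrix to an off-the-shelf SMO solver, here scikit-learn's SVR with a precomputed kernel; principal_components() and combined_gram() from the earlier sketches are reused, and the default value of eps is an assumption:

```python
import numpy as np
from sklearn.svm import SVR

def train_plc_svm(X_train, y_train, gamma, C, eps=0.01, r_p=0.95):
    """Train PLC-SVM: PCA for the linear part, then SMO on the combined kernel."""
    A, U, Z = principal_components(X_train, r_p)        # principal linear components
    G = combined_gram(X_train, Z, gamma)                # Gram matrix Z^T Z + K
    svr = SVR(kernel="precomputed", C=C, epsilon=eps)   # solves the dual (28) by SMO
    svr.fit(G, y_train)
    return svr, A, U, Z

def predict_plc_svm(model, X_new, X_train, gamma):
    """Predict with a trained PLC-SVM on new inputs X_new (d x M)."""
    svr, A, U, Z = model
    Z_new = A @ (X_new - U)                             # linear components of the new inputs
    sq_tr = (X_train ** 2).sum(axis=0)
    sq_new = (X_new ** 2).sum(axis=0)
    d2 = sq_new[:, None] + sq_tr[None, :] - 2.0 * X_new.T @ X_train
    K_new = np.exp(-gamma * np.maximum(d2, 0.0))        # cross-kernel k(x_new, x_train)
    return svr.predict(Z_new.T @ Z + K_new)             # combined cross-Gram, then (26)
```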
Remark 3. 
The complexity of the proposed PLC-SVM model is mainly determined by the PCA and the cost of solving the dual problem (28). The complexity of the PCA is known to be $O(d^2 \cdot N + d^3)$ in the worst case. The complexity of the SMO in LibSVM is between $O(N^2)$ and $O(N^3)$.
In general, the sample size is much larger than the dimension of the input vector, i.e., $N \gg d$; therefore, the total complexity of the PLC-SVM model is generally only slightly larger than that of the SVM model with the same hyperparameters.

4.3. Forecasting Scheme for Univariate Time Series

The proposed model presented above essentially estimates a static model describing the relationship between the input and the output. For time series forecasting, however, the model should estimate the correlation between the current point of the series and the former points. One typical formulation is the auto-regressive model, which is represented by
$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-\tau}). \tag{29}$$
In other words, the former series with $\tau$ points constructs a vector $x_t = [y_{t-1}, y_{t-2}, \ldots, y_{t-\tau}]^T$, which plays the role of the input of the regression models. When the function $f(\cdot)$ is nonlinear, Equation (29) is known as the nonlinear auto-regressive (NAR) model. In this regard, it is easy to use the PLC-SVM model to build such an auto-regressive model; the main difference is that PLC-SVM considers the principal linear components of the input. Thus, the final model used in this work can be written as
$$y_t = \beta^T z_t + g(y_{t-1}, y_{t-2}, \ldots, y_{t-\tau}), \tag{30}$$
where $z_t$ is the vector whose elements are the linear components transformed by the PCA.
A complete partially linear auto-regression forecasting scheme is presented in Algorithm 2.
When executing the forecasting procedures, the newly predicted value of $y_t$ is added to the input at the next time step; thus, the future points can be estimated recursively. It should be noticed that such a procedure differs from $n$-step-ahead forecasting in that all the future values are forecasted based only on the in-sample data.
Algorithm 2: Algorithm of partially linear auto-regression based on PLC-SVM.
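As an illustration of Algorithm 2 under the same assumptions as the sketch of Algorithm 1, the recurrent scheme can be sketched as follows; each prediction is appended to the history and reused as an input for the next step:

```python
import numpy as np

def recursive_forecast(series, tau, steps, gamma, C):
    """Partially linear auto-regression with PLC-SVM (a sketch of Algorithm 2)."""
    scale = np.max(series)                    # max-scaling, as in Section 5.1
    y = np.asarray(series, dtype=float) / scale
    # Reconstruct lagged inputs x_t = [y_{t-1}, ..., y_{t-tau}]^T from the in-sample series.
    X = np.column_stack([y[t - tau:t][::-1] for t in range(tau, len(y))])
    model = train_plc_svm(X, y[tau:], gamma, C)

    history, preds = list(y), []
    for _ in range(steps):
        x_new = np.array(history[-tau:][::-1]).reshape(-1, 1)  # most recent lag vector
        y_hat = float(predict_plc_svm(model, x_new, X, gamma)[0])
        preds.append(y_hat)
        history.append(y_hat)                 # feed the prediction back as an input
    return np.array(preds) * scale            # undo the scaling
```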

5. Case Study

In this section, a real-world case study of forecasting the monthly primary energy consumption of the electric power sector in the US will be presented with three cases. The background information, evaluation metrics, and models for comparison will be introduced first, and then the results along with a discussion of the results will be presented. The general framework illustrating the overall procedures of this case study is presented in Figure 2.

5.1. Data Collection and Preprocessing

As discussed in Section 1, the primary energy consumption is of great importance for industrial economics. In this section, the real-world case of the primary energy consumption of the electric power sector in the US was considered.
The raw data of the monthly primary energy consumption from January 1973 to January 2020 were collected from the US Energy Information Administration (EIA) website (https://www.eia.gov/totalenergy/data/monthly/ Monthly Energy Review of the US, accessed on 1 March 2020). As shown in Figure 3, the data set contains 565 points of monthly primary energy consumption of the electric power sector in the US (unit: trillion Btu). The time series data were first reconstructed using the steps presented in lines 2 to 5 of Algorithm 2. The first 90% of the points were then used as in-sample data and the remaining 10% as out-of-sample data; furthermore, the first 90% of the in-sample data were used for training the models, and the remaining 10% of the in-sample data were used for validating their performance. In order to make it easier to train the machine learning models, the raw data were divided by the largest value in the in-sample data before training, and the final predicted values were multiplied by the same largest value.
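A minimal sketch of this splitting and scaling procedure, assuming the lagged samples have already been reconstructed as in Algorithm 2:

```python
import numpy as np

def prepare_splits(X, targets):
    """Split lagged samples (d x N) and targets as described above: 90% in-sample
    (itself split 90/10 into training/validation) and 10% out-of-sample,
    with all values scaled by the largest in-sample value."""
    n = X.shape[1]
    n_in = int(0.9 * n)                    # in-sample vs. out-of-sample boundary
    n_tr = int(0.9 * n_in)                 # training vs. validation boundary
    scale = targets[:n_in].max()           # largest in-sample value
    X, targets = X / scale, targets / scale
    return (X[:, :n_tr], targets[:n_tr],           # training set
            X[:, n_tr:n_in], targets[n_tr:n_in],   # validation set
            X[:, n_in:], targets[n_in:], scale)    # out-of-sample test set
```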

5.2. Models for Comparison and Evaluation Metrics

Nine models were selected for comparison with the proposed PLC-SVM; their information is summarized in Table 1 together with descriptions of the corresponding hyperparameters. As described above, the PLC-SVM model is essentially based on the methodology of the SVM model; thus, the most closely related models are chosen for comparison. For convenience, the Gaussian kernel (27) is selected for SVM and LSSVM, and the rational quadratic kernel is selected for GPR, as suggested in [73]. On the other hand, the PLC-SVM model has a partially linear structure, so the linear auto-regressive model is also used as the baseline model for comparison.
  • AR: The linear auto-regressive (AR) model used in this work is formulated as $y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_\tau y_{t-\tau}$, which can be regarded as a simplified version of PLC-SVM (without the kernel-based term and $C$); its parameters are estimated using the ordinary least squares method. Having no hyperparameters, the AR model does not need to be optimized by grid search cross-validation like the other machine learning models.
  • SVM: The ε -insensitive support vector machine (SVM) model for regression is selected in this work, of which the modelling details are described in [70,72]. It shares the most similar regularization formulation to PLC-SVM but has no partially linear part.
  • LSSVM: The least squares support vector machine (LSSVM) model presented by Suykens in 1999 [74] is another version of SVM that uses equality constraints. The regression version of LSSVM is based on the LSSVM model for function estimation described in [75].
  • GPR: The Gaussian process regression (GPR) model also uses the kernel combinations developed from the SVM model as described in [73]; the main difference is that the GPR approach is mainly based on the Bayesian theory.
Decision tree-based models are another kind of cutting-edge method, and they are widely used in energy forecasting fields, such as in carbon energy [53], building energy [54], solar energy [55], and hydro-energy applications [56], among others. These models all use regression trees together with ensemble learning methods, such as boosting and bagging, and often achieve high accuracy and stability in time series forecasting at a very low time cost. Thus, it is very interesting to see whether the proposed PLC-SVM can outperform these emerging models in this case. Information on these models is listed below:
  • RF: The random forest (RF) model is one of the most classical tree-based models, which mainly ensembles the weak regressors using bagging. The general method was first proposed by Ho in 1995 [76], and a complete work was first presented by Breiman in 2001 [77].
  • XGB: The extreme gradient boosting (XGB) model was proposed by Chen in 2015, and the complete work was published in 2016 [78]. It is famous for its high performance in dealing with complex features and its extremely fast speed [79].
  • LGBM: The light gradient boosting model (LGBM) was proposed by Ke in 2017 [80], whose team won a one-million prize from Alibaba Ltd. using this model. The LGBM model uses multiple technologies to improve on the original gradient boosting models, and it can be even more stable and faster than XGB in some tasks.
  • CATB: Gradient boosting with categorical features support (CATB) was proposed by Prokhorenkova et al. in 2018 [81]. It has a very good performance in dealing with categorical features and has very good robustness.
Recurrent neural networks have been widely used in time series forecasting and related works in recent years. In this work, a state-of-the-art gated recurrent unit is used for comparison. Detailed information on this model is as follows:
  • GRU: The gated recurrent unit (GRU) model was introduced by Cho et al. [82] in 2014 as a simplified version of the long short-term memory (LSTM) model by Hochreiter and Schmidhuber [83] in 1997. In time series forecasting, the GRU model is often combined with other layers to capture more complex data patterns or shapes. In this study, a three-layer neural network was used, consisting of a GRU layer directly connected to the input data, an activation layer using a sigmoid function, and an output layer with a linear full connection.
Table 1. Models for comparison and their hyperparameters.

| Model | Abbreviation | References | Hyperparameters |
| --- | --- | --- | --- |
| Auto-Regressive | AR | [21] | None |
| Support Vector Machine | SVM | [70,72] | Kernel parameter, regularization parameter |
| Least Squares Support Vector Machine | LSSVM | [74] | Kernel parameter, regularization parameter |
| Gaussian Process Regression | GPR | [73] | Kernel type |
| Random Forest | RF | [76] | Bootstrap (whether bootstrap samples are used when building trees), maximum tree depth, number of features for the best split, minimum samples at a leaf node, minimum samples for splitting an internal node, number of trees |
| Extreme Gradient Boosting | XGB | [78] | Minimum loss reduction, learning rate, maximum tree depth, minimum weight for new node, L1 regularization parameter |
| Light Gradient Boosting | LGBM | [80] | Maximum tree depth, maximum tree leaves, minimum number of data needed in a child, L1 regularization parameter, L2 regularization parameter |
| Gradient Boosting with Categorical Features Support | CATB | [81] | Maximum number of trees, tree depth, L2 regularization parameter |
| Gated Recurrent Unit | GRU | [82] | Hidden size |
To ensure a fair comparison, all machine learning models were used as nonlinear auto-regressive models, similar to PLC-SVM in Algorithm 2 (any of these models can replace PLC-SVM in line 6 to implement the overall workflow). The models were implemented using Python 3.7, and their forecasting performances were evaluated using the multiple criteria listed in Table 2. The built-in grid search method of the scikit-learn library [84] was used for tuning the hyperparameters of all models except the AR model. Detailed information on the hyperparameters and the original references is summarized in Table 1. In order to keep the grid search tractable, only the most important hyperparameters of each model were chosen, following engineering experience or the suggestions made in the original references. As time series require forward validation to determine a model's performance, it is more reasonable to use 90% of the in-sample data for training and the remaining 10% for validation, as in [85].
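As an illustration of this tuning scheme (a sketch only; the actual grids are not reported here, and SVM is used as the example model), the single forward hold-out split can be expressed with scikit-learn's PredefinedSplit:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVR

def tune_svr(X_in, y_in, param_grid):
    """Grid search with one forward validation split (last 10% of in-sample data)."""
    n = len(y_in)
    fold = np.zeros(n, dtype=int)
    fold[: int(0.9 * n)] = -1                   # first 90%: always kept in the training fold
    split = PredefinedSplit(test_fold=fold)     # last 10%: the single validation fold
    search = GridSearchCV(SVR(), param_grid, cv=split,
                          scoring="neg_mean_absolute_error")  # scoring choice is an assumption
    search.fit(X_in.T, y_in)                    # scikit-learn expects samples as rows
    return search.best_estimator_, search.best_params_

# Hypothetical usage with an illustrative grid:
# best, params = tune_svr(X_in, y_in, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]})
```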

5.3. Results

In order to make a comprehensive comparison between the PLC-SVM model and the other models, three sub-cases based on the same data sets with different lags were carried out.

5.3.1. Case I: τ = 18

In this case, the time lag is set as $\tau = 18$; i.e., every point will be predicted based on the former 18 points in the way presented in Algorithm 2. Four principal linear components are extracted by the PCA ($r_p = 0.95$) from the eighteen input dimensions; they are presented in Equation (A1) in Appendix A. The semi-analytical output function of PLC-SVM can then be written as
$$y = \beta^T z + w^T \varphi(x) + b = 0.1643 z_1 - 0.0156 z_2 - 0.2404 z_3 - 0.0692 z_4 + w^T \varphi(x) + 0.6824. \tag{31}$$
The testing metrics of all models are listed in Table 3. It is clear that the overall performance of the PLC-SVM model is the best, as all of its metrics are the best among all models. It is very interesting to see that the SVM model has the closest performance to PLC-SVM in this case, which is easy to explain, as they share similar methodologies (the kernel method and the $\varepsilon$-insensitive loss function). Among the kernel-based models, the SVM model has the best performance aside from PLC-SVM, while the GPR model has the worst. The RF model performs the best and CATB the worst among the tree-based models. The GRU model only outperforms the worst tree-based models and even performs worse than the linear AR model.
The predicted values of all 10 models, along with the percentage errors (PEs) at each point, are plotted in Figure 4. It is very interesting to see that the values predicted by PLC-SVM and SVM are very close, which is consistent with the metrics described above. It is also very clear that the predicted series of the other models, except for CATB, appear to be larger than the raw data and are less stable than those of PLC-SVM and SVM, whereas the predicted values of CATB tend to be approximately constant in the last steps. Interestingly, the values predicted by GRU in the first few steps are actually acceptable, but most predicted values become smaller than the raw data over longer steps. The predicted values of AR are very close to the average value, which is consistent with its properties.
From another point of view, the PEs of PLC-SVM and SVM are approximately distributed around zero, as shown in Figure 4. However, more PEs of LSSVM, GPR, RF, LGBM, XGB, and AR are larger than zero, indicating that these models overestimated future consumption. In contrast, more PEs of CATB and GRU are smaller than zero, indicating that these models underestimated the future trend of consumption. Overall, the PLC-SVM model has the best performance in forecasting primary energy consumption in this case.

5.3.2. Case II: τ = 24

In this case, the time lag is set as $\tau = 24$; i.e., every point will be predicted based on the former 24 points, as described in Algorithm 2. The PCA ($r_p = 0.95$) transforms the 24 dimensions into 5 principal components, which are presented in Equation (A2) in Appendix A. The output function of the PLC-SVM model can then be written as
$$y = \beta^T z + w^T \varphi(x) + b = 0.074 z_1 - 0.0036 z_2 - 0.2802 z_3 + 0.0157 z_4 - 0.2419 z_5 + w^T \varphi(x) + 0.6149. \tag{32}$$
The testing metrics of all models are listed in Table 4. In this case, the performance of PLC-SVM is also the best among these models, and its errors are smaller than those of the other models by a more significant margin; SVM still has the closest performance to PLC-SVM. RF performs best among the tree-based models, while GPR and CATB perform the worst among the kernel-based and the tree-based models, respectively. In this case, GRU has the worst performance of all the models. As for the AR model, although it outperforms several other models, its metrics are still significantly worse than those of PLC-SVM.
The predicted values and PEs of all 10 models are plotted in Figure 5. The values predicted by PLC-SVM and SVM still appear to be close. But in this case, it is more obvious that LSSVM, GPR, RF, LGBM, XGB, and AR all overestimate the observations, and the overall trends reflected by these models are less stable and appear to be increasing. The values predicted by CATB still appear to decay, with peak values too far from the observations. Only some of the first values predicted by GRU are close to the raw data; most of the following predicted values are larger than the average value of the corresponding raw data.
By analyzing the PEs shown in Figure 5, it is very clear that most PEs of LSSVM, GPR, RF, LGBM, XGB, GRU, and AR are larger than zero. This presents a clearer picture that these models all overestimate the future trend of real consumption. Meanwhile, most PEs of CATB are smaller than zero, and many of them are quite large, indicating that the results of this model are not acceptable at all. The positive and negative PEs of PLC-SVM and SVM appear to be approximately balanced, and their MPE (defined in Table 2) is closest to zero. Overall, the advantage of PLC-SVM over the other models is still significant in this case.

5.3.3. Case III: τ = 30

In this case, the time lag is set as $\tau = 30$; i.e., every point will be predicted based on the former 30 points. Five principal components are extracted by the PCA ($r_p = 0.95$); they are presented in Equation (A3) in Appendix A. The output function of the PLC-SVM model is obtained as
$$y = \beta^T z + w^T \varphi(x) + b = 0.1029 z_1 - 0.1242 z_2 - 0.2137 z_3 - 0.0626 z_4 - 0.104 z_5 + w^T \varphi(x) + 0.9909. \tag{33}$$
The testing metrics of all models are listed in Table 5. PLC-SVM is still the best model in this case, and it is very interesting to see that all of its metrics are generally better than in the previous two cases. GRU has the second-best performance in this case, and its MedAe is even closer to zero than that of PLC-SVM. The performance of SVM is significantly worse than that of PLC-SVM in this case. XGB performs the best among the tree-based models and is the closest of them to PLC-SVM. Meanwhile, GPR and CATB still have the worst performance in this case.
The predicted values of all 10 models are plotted in Figure 6. The values predicted by PLC-SVM appear to be closer to the observations in this case than they were in the previous two cases. Having the closest performance to PLC-SVM, the predicted values of GRU are very close to most peak values, which appear to be closer to the raw data than the tree-based model XGB. The values predicted by CATB still decay with more steps. It is very interesting to see that only the predicted values by PLC-SVM and CATB all fall within the range of the observations, while there are several points by the other models that are larger than the nearby peak values.
By looking at the results of PEs plotted in Figure 6, most values of PEs of CATB are negative, and most PEs of the other models are positive; this indicates that most models over-estimated the raw data in this case. Moreover, it is very clear that the distributions of PEs of PLC-SVM and GRU appear to be more uniform than others. However, it is clear that the PEs of GRU with larger steps become larger than PLC-SVM; this is the reason why the overall metrics for GRU are not the best. Overall, although GRU presents a highly competitive performance, the PLC-SVM model still performs the best in this case.

5.4. Discussion

It is clear that the PLC-SVM model has the best performance in all cases. One significant finding is that the PLC-SVM model indeed improved the accuracy of the SVM model. Having a similar structure and training algorithm, the SVM model can approach PLC-SVM with a smaller $\tau$, as shown for $\tau = 18, 24$. But it is interesting to note that the difference between PLC-SVM and SVM becomes larger with a longer lag: the related metrics of PLC-SVM are significantly better than those of SVM when $\tau = 30$. This indicates that the PLC-SVM model performs better in higher-dimensional problems than the SVM model. It is also worth noting that although the performance of the AR model is not the best, it generally presents a moderate performance in all cases. This indicates that there indeed exists a linear relationship between the current primary energy consumption and the former values. Having a partially linear structure, the PLC-SVM model takes advantage of such linear features, and its improvements come from the partially linear formulation, which makes the most of the linear features of the original series. At this stage, it can be confirmed that such linear features make the series predicted by the PLC-SVM model more accurate and stable than those of the SVM model, as also reflected in Figure 4, Figure 5 and Figure 6.
It should also be noted that the tree-based models are also very competitive compared with the PLC-SVM model. The best tree-based model in each case often presents a very close performance to the PLC-SVM model and is even much better than the other kernel-based models in some cases. Moreover, it is very interesting to see that the XGB model performs the second best when τ = 30 and is much better than the other kernel-based models. This greatly coincides with a well-recognized result that tree-based models have very good performance in high-dimensional problems.
Although the neural network using the GRU model often performs much worse than the other models with shorter lags, it is also very interesting to see that it performs quite well when $\tau = 30$, where its metrics are the closest to the best model and even its MedAe is better than that of the PLC-SVM model. This implies that the GRU model is very competitive with larger lags. However, even under such conditions, the overall performance of GRU is still slightly worse than that of PLC-SVM.
However, the advantages of PLC-SVM over the tree-based models and GRU are still significant. One of the most significant advantages of PLC-SVM is that it has only a few hyperparameters to tune. In the above cases, only the regularization parameter $C$ and the kernel parameter $\gamma$ are tuned, while $\varepsilon$ is set to a fixed value (this is reasonable because the model uses the $\varepsilon$-insensitive cost function). In contrast, all the tree-based models and GRU (like other neural networks) have many hyperparameters to tune, such as the maximum depth of trees, the number of estimators, and other parameters that need fine-tuning. This is very important because fewer hyperparameters often mean that a model is easier to tune and less time-consuming, which further makes it easier to design an optimal prediction scheme in real-world applications. Another advantage of PLC-SVM is its global convergence: as mentioned in Section 4.2, the dual formulation is essentially a convex optimization problem; thus, the PLC-SVM model can be trained with global convergence. However, the algorithms used for the tree-based models and GRU (e.g., bagging for RF and gradient-based algorithms for the other tree-based models and GRU) do not have global convergence; thus, they generally need more trials to obtain well-trained models.
Regarding application implications, it is first suggested to use larger time lags, as the PLC-SVM model performs better with such settings; this implies that more features may further improve the performance of PLC-SVM. Another point is that the forecasting horizons considered in this work are not short. In the above cases, the forecasting steps all number 55, which means that the monthly primary energy consumption over 55 months (almost 5 years) is predicted. Considering its stability and accuracy, it is reasonable to say that PLC-SVM is eligible for mid-/long-term primary energy consumption forecasting in the electric power sector. Such performance may make it a potential tool for decision-making and market planning in the future.

6. Conclusions

A partially linear component support vector machine, named PLC-SVM, was proposed in this work. By using the PCA algorithm, the linear part of PLC-SVM has fewer linear dimensions, reducing the risk of multicollinearity and the computational complexity. The methodology of SVM was used to construct the partially linear framework, and the primal-dual formulation gives the PLC-SVM model global optimality and easy implementation. The case study focused on forecasting the primary energy consumption of the electric power sector in the US using univariate time series data from January 1973 to January 2020, comprising 565 points of monthly primary energy consumption. The results of three sub-cases showed that the PLC-SVM model produces more accurate and stable forecasts than the other three kinds of typical machine learning models and the linear AR model under different lags; larger lags might improve the performance of the PLC-SVM model. Based on the above discussion, the PLC-SVM model is eligible for mid-/long-term forecasting of the primary energy consumption of the electric power sector in the US. Considering its general formulation, it can be expected to be used for forecasting more kinds of energy in future works.
The possible limitations of this work are twofold. The first issue is that this model might not be suitable for cases with very small data sets. In such conditions, the available lags would be very small, which means that the original dimension of the linear part is already small; thus, obviously, the PCA will not work well. Another limitation is that this work only considered the most commonly used Gaussian kernel in the applications. Better performance may be achieved by designing kernels with proper prior knowledge. In this regard, future works can also be extended by using more advanced kernels or new kernels designed for specific cases, as suggested in the kernel cookbook by Duvenaud [86].

Author Contributions

Conceptualization, methodology, writing—original draft preparation, funding acquisition, X.M.; software, X.M. and Y.C.; writing—review and editing, H.Y. and Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Humanities and Social Science Fund of the Ministry of Education of China (19YJCZH119), the Scientific and Technological Achievements Transformation Project of the Sichuan Scientific Research Institute (2022JDZH0035), and the National College Students Innovation and Entrepreneurship Training Program of China (S202210619106).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be found at https://www.eia.gov/totalenergy/data/monthly/, accessed on 1 March 2020.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The expressions of the principal components obtained in the case studies are presented in this section. In all of the following formulae, $z_t^i$ denotes the $i$th element of the vector $z_t$.
The four principal components in Case I:
$$\begin{aligned}
z_t^1 = {} & -0.2298 y_{t-1} - 0.2321 y_{t-2} - 0.2343 y_{t-3} - 0.2353 y_{t-4} - 0.2355 y_{t-5} - 0.2346 y_{t-6} \\
& - 0.2331 y_{t-7} - 0.2326 y_{t-8} - 0.2326 y_{t-9} - 0.2332 y_{t-10} - 0.2337 y_{t-11} - 0.2348 y_{t-12} \\
& - 0.2371 y_{t-13} - 0.2391 y_{t-14} - 0.2410 y_{t-15} - 0.2417 y_{t-16} - 0.2415 y_{t-17} - 0.2401 y_{t-18} - 2.7449, \\
z_t^2 = {} & 0.2538 y_{t-1} + 0.3692 y_{t-2} + 0.1697 y_{t-3} - 0.1758 y_{t-4} - 0.3613 y_{t-5} - 0.2335 y_{t-6} \\
& + 0.0537 y_{t-7} + 0.2129 y_{t-8} + 0.1098 y_{t-9} - 0.1205 y_{t-10} - 0.2098 y_{t-11} - 0.0388 y_{t-12} \\
& + 0.2465 y_{t-13} + 0.3589 y_{t-14} + 0.1618 y_{t-15} - 0.1788 y_{t-16} - 0.3623 y_{t-17} - 0.2350 y_{t-18} + 0.0178, \\
z_t^3 = {} & -0.2789 y_{t-1} + 0.0345 y_{t-2} + 0.3278 y_{t-3} + 0.3192 y_{t-4} + 0.0167 y_{t-5} - 0.2898 y_{t-6} \\
& - 0.2963 y_{t-7} - 0.0460 y_{t-8} + 0.2106 y_{t-9} + 0.2058 y_{t-10} - 0.0534 y_{t-11} - 0.2939 y_{t-12} \\
& - 0.2745 y_{t-13} + 0.0343 y_{t-14} + 0.3226 y_{t-15} + 0.3152 y_{t-16} + 0.0162 y_{t-17} - 0.2853 y_{t-18} - 0.0094, \\
z_t^4 = {} & -0.1046 y_{t-1} - 0.1707 y_{t-2} - 0.1736 y_{t-3} - 0.1765 y_{t-4} - 0.1749 y_{t-5} - 0.1029 y_{t-6} \\
& + 0.0856 y_{t-7} + 0.3274 y_{t-8} + 0.5002 y_{t-9} + 0.4970 y_{t-10} + 0.3211 y_{t-11} + 0.0822 y_{t-12} \\
& - 0.1002 y_{t-13} - 0.1688 y_{t-14} - 0.1716 y_{t-15} - 0.1726 y_{t-16} - 0.1686 y_{t-17} - 0.0982 y_{t-18} + 0.0197.
\end{aligned} \tag{A1}$$
The five principal components in Case II:
$$\begin{aligned}
z_t^1 = {} & -0.1974 y_{t-1} - 0.1983 y_{t-2} - 0.1994 y_{t-3} - 0.2003 y_{t-4} - 0.2006 y_{t-5} - 0.2007 y_{t-6} \\
& - 0.2013 y_{t-7} - 0.2021 y_{t-8} - 0.2031 y_{t-9} - 0.2037 y_{t-10} - 0.2041 y_{t-11} - 0.2042 y_{t-12} \\
& - 0.2045 y_{t-13} - 0.2052 y_{t-14} - 0.2060 y_{t-15} - 0.2064 y_{t-16} - 0.2064 y_{t-17} - 0.2066 y_{t-18} \\
& - 0.2066 y_{t-19} - 0.2073 y_{t-20} - 0.2082 y_{t-21} - 0.2086 y_{t-22} - 0.2088 y_{t-23} - 0.2085 y_{t-24} - 3.1723, \\
z_t^2 = {} & 0.2448 y_{t-1} + 0.2681 y_{t-2} + 0.0273 y_{t-3} - 0.2352 y_{t-4} - 0.2593 y_{t-5} - 0.0202 y_{t-6} \\
& + 0.2412 y_{t-7} + 0.2631 y_{t-8} + 0.0233 y_{t-9} - 0.2385 y_{t-10} - 0.2610 y_{t-11} - 0.0239 y_{t-12} \\
& + 0.2365 y_{t-13} + 0.2601 y_{t-14} + 0.0241 y_{t-15} - 0.2340 y_{t-16} - 0.2580 y_{t-17} - 0.0217 y_{t-18} \\
& + 0.2346 y_{t-19} + 0.2550 y_{t-20} + 0.0197 y_{t-21} - 0.2375 y_{t-22} - 0.2598 y_{t-23} - 0.0274 y_{t-24} + 0.0187, \\
z_t^3 = {} & -0.1634 y_{t-1} + 0.1274 y_{t-2} + 0.2912 y_{t-3} + 0.1644 y_{t-4} - 0.1281 y_{t-5} - 0.2937 y_{t-6} \\
& - 0.1675 y_{t-7} + 0.1244 y_{t-8} + 0.2910 y_{t-9} + 0.1672 y_{t-10} - 0.1206 y_{t-11} - 0.2850 y_{t-12} \\
& - 0.1628 y_{t-13} + 0.1231 y_{t-14} + 0.2846 y_{t-15} + 0.1610 y_{t-16} - 0.1273 y_{t-17} - 0.2893 y_{t-18} \\
& - 0.1642 y_{t-19} + 0.1217 y_{t-20} + 0.2844 y_{t-21} + 0.1630 y_{t-22} - 0.1195 y_{t-23} - 0.2806 y_{t-24} + 0.0035, \\
z_t^4 = {} & + 0.1268 y_{t-1} + 0.2401 y_{t-2} + 0.2935 y_{t-3} + 0.2712 y_{t-4} + 0.1755 y_{t-5} + 0.0331 y_{t-6} \\
& - 0.1173 y_{t-7} - 0.2321 y_{t-8} - 0.2813 y_{t-9} - 0.2537 y_{t-10} - 0.1586 y_{t-11} - 0.0226 y_{t-12} \\
& + 0.1203 y_{t-13} + 0.2325 y_{t-14} + 0.2837 y_{t-15} + 0.2588 y_{t-16} + 0.1609 y_{t-17} + 0.0186 y_{t-18} \\
& - 0.1304 y_{t-19} - 0.2435 y_{t-20} - 0.2899 y_{t-21} - 0.2600 y_{t-22} - 0.1640 y_{t-23} - 0.0287 y_{t-24} + 0.0157, \\
z_t^5 = {} & -0.2626 y_{t-1} - 0.1649 y_{t-2} - 0.0228 y_{t-3} + 0.1231 y_{t-4} + 0.2348 y_{t-5} + 0.2849 y_{t-6} \\
& + 0.2609 y_{t-7} + 0.1695 y_{t-8} + 0.0315 y_{t-9} - 0.1180 y_{t-10} - 0.2375 y_{t-11} - 0.2914 y_{t-12} \\
& - 0.2640 y_{t-13} - 0.1631 y_{t-14} - 0.0192 y_{t-15} + 0.1277 y_{t-16} + 0.2367 y_{t-17} + 0.2839 y_{t-18} \\
& + 0.2578 y_{t-19} + 0.1635 y_{t-20} + 0.0244 y_{t-21} - 0.1229 y_{t-22} - 0.2389 y_{t-23} - 0.2886 y_{t-24} + 0.0057.
\end{aligned} \tag{A2}$$
The five principal components in Case III:
$$\begin{aligned}
z_t^1 = {} & -0.1751 y_{t-1} - 0.1763 y_{t-2} - 0.1776 y_{t-3} - 0.1786 y_{t-4} - 0.1791 y_{t-5} - 0.1789 y_{t-6} \\
& - 0.1783 y_{t-7} - 0.1783 y_{t-8} - 0.1786 y_{t-9} - 0.1794 y_{t-10} - 0.1800 y_{t-11} - 0.1805 y_{t-12} \\
& - 0.1816 y_{t-13} - 0.1826 y_{t-14} - 0.1838 y_{t-15} - 0.1846 y_{t-16} - 0.1850 y_{t-17} - 0.1848 y_{t-18} \\
& - 0.1839 y_{t-19} - 0.1836 y_{t-20} - 0.1838 y_{t-21} - 0.1844 y_{t-22} - 0.1848 y_{t-23} - 0.1852 y_{t-24} \\
& - 0.1860 y_{t-25} - 0.1870 y_{t-26} - 0.1880 y_{t-27} - 0.1888 y_{t-28} - 0.1890 y_{t-29} - 0.1887 y_{t-30} - 3.5511, \\
z_t^2 = {} & 0.1790 y_{t-1} + 0.2921 y_{t-2} + 0.1452 y_{t-3} - 0.1319 y_{t-4} - 0.2838 y_{t-5} - 0.1779 y_{t-6} \\
& + 0.0656 y_{t-7} + 0.2018 y_{t-8} + 0.1083 y_{t-9} - 0.1034 y_{t-10} - 0.2005 y_{t-11} - 0.0685 y_{t-12} \\
& + 0.1748 y_{t-13} + 0.2863 y_{t-14} + 0.1417 y_{t-15} - 0.1319 y_{t-16} - 0.2826 y_{t-17} - 0.1787 y_{t-18} \\
& + 0.0610 y_{t-19} + 0.1952 y_{t-20} + 0.1042 y_{t-21} - 0.1026 y_{t-22} - 0.1974 y_{t-23} - 0.0679 y_{t-24} \\
& + 0.1700 y_{t-25} + 0.2787 y_{t-26} + 0.1371 y_{t-27} - 0.1307 y_{t-28} - 0.2784 y_{t-29} - 0.1765 y_{t-30} + 0.0241, \\
z_t^3 = {} & -0.2259 y_{t-1} + 0.0134 y_{t-2} + 0.2487 y_{t-3} + 0.2509 y_{t-4} + 0.0205 y_{t-5} - 0.2200 y_{t-6} \\
& - 0.2301 y_{t-7} - 0.0257 y_{t-8} + 0.1870 y_{t-9} + 0.1886 y_{t-10} - 0.0226 y_{t-11} - 0.2285 y_{t-12} \\
& - 0.2235 y_{t-13} + 0.0123 y_{t-14} + 0.2446 y_{t-15} + 0.2475 y_{t-16} + 0.0209 y_{t-17} - 0.2171 y_{t-18} \\
& - 0.2266 y_{t-19} - 0.0258 y_{t-20} + 0.1822 y_{t-21} + 0.1840 y_{t-22} - 0.0227 y_{t-23} - 0.2241 y_{t-24} \\
& - 0.2184 y_{t-25} + 0.0123 y_{t-26} + 0.2393 y_{t-27} + 0.2427 y_{t-28} + 0.0206 y_{t-29} - 0.2127 y_{t-30} - 0.0056, \\
z_t^4 = {} & -0.0758 y_{t-1} - 0.1498 y_{t-2} - 0.1754 y_{t-3} - 0.1783 y_{t-4} - 0.1588 y_{t-5} - 0.0878 y_{t-6} \\
& + 0.0486 y_{t-7} + 0.2115 y_{t-8} + 0.3274 y_{t-9} + 0.3331 y_{t-10} + 0.2270 y_{t-11} + 0.0685 y_{t-12} \\
& - 0.0694 y_{t-13} - 0.1451 y_{t-14} - 0.1714 y_{t-15} - 0.1734 y_{t-16} - 0.1522 y_{t-17} - 0.0802 y_{t-18} \\
& + 0.0554 y_{t-19} + 0.2159 y_{t-20} + 0.3289 y_{t-21} + 0.3320 y_{t-22} + 0.2245 y_{t-23} + 0.0658 y_{t-24} \\
& - 0.0718 y_{t-25} - 0.1473 y_{t-26} - 0.1729 y_{t-27} - 0.1743 y_{t-28} - 0.1526 y_{t-29} - 0.0818 y_{t-30} + 0.0134, \\
z_t^5 = {} & -0.1951 y_{t-1} - 0.0796 y_{t-2} - 0.0195 y_{t-3} + 0.0073 y_{t-4} + 0.0727 y_{t-5} + 0.1963 y_{t-6} \\
& + 0.3028 y_{t-7} + 0.2909 y_{t-8} + 0.1277 y_{t-9} - 0.1064 y_{t-10} - 0.2759 y_{t-11} - 0.2963 y_{t-12} \\
& - 0.1997 y_{t-13} - 0.0826 y_{t-14} - 0.0198 y_{t-15} + 0.0100 y_{t-16} + 0.0759 y_{t-17} + 0.1962 y_{t-18} \\
& + 0.2996 y_{t-19} + 0.2851 y_{t-20} + 0.1207 y_{t-21} - 0.1114 y_{t-22} - 0.2777 y_{t-23} - 0.2967 y_{t-24} \\
& - 0.1998 y_{t-25} - 0.0824 y_{t-26} - 0.0190 y_{t-27} + 0.0107 y_{t-28} + 0.0745 y_{t-29} + 0.1899 y_{t-30} - 0.00001.
\end{aligned} \tag{A3}$$

References

  1. Statt, N. Google and DeepMind Are Using AI to Predict the Energy Output of Wind Farms. The Verge. p. 1. Available online: https://www.theverge.com/2019/2/26/18241632/google-deepmind-wind-farm-ai-machine-learning-green-energy-efficiency (accessed on 26 February 2019).
  2. Ma, M.; Ma, X.; Cai, W.; Cai, W. Low carbon roadmap of residential building sector in China: Historical mitigation and prospective peak. Appl. Energy 2020, 273, 115247. [Google Scholar] [CrossRef]
  3. Lu, H.; Ma, X.; Azimi, M. US natural gas consumption prediction using an improved kernel-based nonlinear extension of the Arps decline model. Energy 2020, 194, 116905. [Google Scholar] [CrossRef]
  4. Zeng, B.; Zhou, M.; Liu, X.; Zhang, Z. Application of a new grey prediction model and grey average weakening buffer operator to forecast China’s shale gas output. Energy Rep. 2020, 6, 1608–1618. [Google Scholar] [CrossRef]
  5. Niu, T.; Wang, J.; Lu, H.; Yang, W.; Du, P. A learning system integrating temporal convolution and deep learning for predictive modeling of crude oil price. IEEE Trans. Ind. Inform. 2020, 17, 4602–4612. [Google Scholar] [CrossRef]
  6. Yang, J.; Cai, W.; Ma, M.; Li, L.; Liu, C.; Ma, X.; Li, L.; Chen, X. Driving forces of China’s CO2 emissions from energy consumption based on Kaya-LMDI methods. Sci. Total Environ. 2020, 711, 134569. [Google Scholar] [CrossRef]
  7. Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986, 81, 310–320. [Google Scholar] [CrossRef]
  8. Smola, A.J.; Frieß, T.; Schölkopf, B. Semiparametric support vector and linear programming machines. In Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, CO, USA, 30 November–5 December 1998; pp. 585–591. [Google Scholar]
  9. Espinoza, M.; Suykens, J.A.K.; De Moor, B. Kernel based partially linear models and nonlinear identification. IEEE Trans. Autom. Control 2005, 50, 1602–1606. [Google Scholar] [CrossRef]
  10. Goethals, I.; Pelckmans, K.; Suykens, J.A.K.; De Moor, B. Identification of MIMO Hammerstein models using least squares support vector machines. Automatica 2005, 41, 1263–1272. [Google Scholar] [CrossRef]
  11. Varoquaux, G. Cross-validation failure: Small sample sizes lead to large error bars. Neuroimage 2018, 180, 68–77. [Google Scholar] [CrossRef]
  12. Castro-Garcia, R.; Agudelo, O.M.; Suykens, J.A.K. Impulse response constrained LS-SVM modelling for MIMO Hammerstein system identification. Int. J. Control 2019, 92, 908–925. [Google Scholar] [CrossRef]
  13. Ma, X.; Liu, Z. Predicting the oil production using the novel multivariate nonlinear model based on Arps decline model and kernel method. Neural Comput. Appl. 2018, 29, 579–591. [Google Scholar] [CrossRef]
  14. Ma, X.; Liu, Z. The kernel-based nonlinear multivariate grey model. Appl. Math. Model. 2018, 56, 217–238. [Google Scholar] [CrossRef]
  15. Ma, X. A brief introduction to the grey machine learning. J. Grey Syst. 2019, 31, 1–12. [Google Scholar]
  16. Matías, J.M.; Taboada, J.; Ordóñez, C.; González-Manteiga, W. Partially linear support vector machines applied to the prediction of mine slope movements. Math. Comput. Model. 2010, 51, 206–215. [Google Scholar] [CrossRef]
  17. Xu, Y.; Chen, D.R. Partially-linear least-squares regularized regression for system identification. IEEE Trans. Autom. Control 2009, 54, 2637–2641. [Google Scholar]
  18. Fan, J.; Wu, L.; Zhang, F.; Cai, H.; Zeng, W.; Wang, X.; Zou, H. Empirical and machine learning models for predicting daily global solar radiation from sunshine duration: A review and case study in China. Renew. Sustain. Energy Rev. 2019, 100, 186–212. [Google Scholar] [CrossRef]
  19. Chang, Y.; Choi, Y.; Kim, C.S.; Miller, J.I.; Park, J.Y. Forecasting regional long-run energy demand: A functional coefficient panel approach. Energy Econ. 2021, 96, 105117. [Google Scholar] [CrossRef]
  20. Johannesen, N.J.; Kolhe, M.; Goodwin, M. Relative evaluation of regression tools for urban area electrical energy demand forecasting. J. Clean. Prod. 2019, 218, 555–564. [Google Scholar] [CrossRef]
  21. Akdi, Y.; Gölveren, E.; Okkaoğlu, Y. Daily electrical energy consumption: Periodicity, harmonic regression method and forecasting. Energy 2020, 191, 116524. [Google Scholar] [CrossRef]
  22. Khalifa, A.; Caporin, M.; Di Fonzo, T. Scenario-based forecast for the electricity demand in Qatar and the role of energy efficiency improvements. Energy Policy 2019, 127, 155–164. [Google Scholar] [CrossRef]
  23. Nafil, A.; Bouzi, M.; Anoune, K.; Ettalabi, N. Comparative study of forecasting methods for energy demand in Morocco. Energy Rep. 2020, 6, 523–536. [Google Scholar] [CrossRef]
  24. Dumitru, C.-D.; Gligor, A. Wind energy forecasting: A comparative study between a stochastic model (ARIMA) and a model based on neural network (FFANN). Procedia Manuf. 2019, 32, 410–417. [Google Scholar] [CrossRef]
  25. Rakpho, P.; Yamaka, W. The forecasting power of economic policy uncertainty for energy demand and supply. Energy Rep. 2021, 7, 338–343. [Google Scholar] [CrossRef]
  26. Karia, A.A.; Bujang, I.; Ahmad, I. Fractionally integrated ARMA for crude palm oil prices prediction: Case of potentially overdifference. J. Appl. Stat. 2013, 40, 2735–2748. [Google Scholar] [CrossRef]
  27. Wang, Z.-X.; Jv, T.-Q. A non-linear systematic grey model for forecasting the industrial economy-energy-environment system. Technol. Forecast. Soc. Chang. 2021, 167, 120707. [Google Scholar] [CrossRef]
  28. Ma, X.; Lu, H.; Ma, M.; Wu, L.; Cai, Y. Urban natural gas consumption forecasting by novel wavelet-kernelized grey system model. Eng. Appl. Artif. Intell. 2023, 119, 105773. [Google Scholar] [CrossRef]
  29. Qian, W.; Sui, A. A novel structural adaptive discrete grey prediction model and its application in forecasting renewable energy generation. Expert Syst. Appl. 2021, 186, 115761. [Google Scholar] [CrossRef]
  30. Wang, Y.; Nie, R.; Ma, X.; Liu, Z.; Chi, P.; Wu, W.; Guo, B.; Yang, X.; Zhang, L. A novel Hausdorff fractional NGMC(p, n) grey prediction model with Grey Wolf Optimizer and its applications in forecasting energy production and conversion of China. Appl. Math. Model. 2021, 97, 381–397. [Google Scholar] [CrossRef]
  31. Wang, Z.-X.; He, L.-Y.; Zheng, H.-H. Forecasting the residential solar energy consumption of the United States. Energy 2019, 178, 610–623. [Google Scholar] [CrossRef]
  32. Moonchai, S.; Chutsagulprom, N. Short-term forecasting of renewable energy consumption: Augmentation of a modified grey model with a Kalman filter. Appl. Soft Comput. 2020, 87, 105994. [Google Scholar] [CrossRef]
  33. Xie, N.; Yuan, C.; Yang, Y. Forecasting China’s energy demand and self-sufficiency rate by grey forecasting model and Markov model. Int. J. Electr. Power Energy Syst. 2015, 66, 1–8. [Google Scholar] [CrossRef]
  34. Piazza, A.D.; Piazza, M.C.D.; Tona, G.L.; Luna, M. An artificial neural network-based forecasting model of energy-related time series for electrical grid management. Math. Comput. Simul. 2021, 184, 294–305. [Google Scholar] [CrossRef]
  35. Kobylinski, P.; Wierzbowski, M.; Piotrowski, K. High-resolution net load forecasting for micro-neighbourhoods with high penetration of renewable energy sources. Int. J. Electr. Power Energy Syst. 2020, 117, 105635. [Google Scholar] [CrossRef]
  36. Al-Gabalawy, M.; Hosny, N.S.; Adly, A.R. Probabilistic forecasting for energy time series considering uncertainties based on deep learning algorithms. Electr. Power Syst. Res. 2021, 196, 107216. [Google Scholar] [CrossRef]
  37. Katsatos, A.L.; Moustris, K.P. Application of artificial neuron networks as energy consumption forecasting tool in the building of Regulatory Authority of Energy, Athens, Greece. Energy Procedia 2019, 157, 851–861. [Google Scholar] [CrossRef]
  38. Bento, P.M.R.; Pombo, J.A.N.; Mendes, R.P.G.; Calado, M.R.A.; Mariano, S.J.P.S. Ocean wave energy forecasting using optimised deep learning neural networks. Ocean. Eng. 2021, 219, 108372. [Google Scholar] [CrossRef]
  39. Abu-Salih, B.; Wongthongtham, P.; Morrison, G.; Coutinho, K.; Al-Okaily, M.; Huneiti, A. Short-term renewable energy consumption and generation forecasting: A case study of Western Australia. Heliyon 2022, 8, e09152. [Google Scholar] [CrossRef]
  40. Somu, N.; Gauthama Raman, M.R.; Ramamritham, K. A hybrid model for building energy consumption forecasting using long short term memory networks. Appl. Energy 2020, 261, 114131. [Google Scholar] [CrossRef]
  41. Khan, N.; Haq, I.U.; Khan, S.U.; Rho, S.; Lee, M.Y.; Baik, S.W. DB-Net: A novel dilated CNN based multi-step forecasting model for power consumption in integrated local energy systems. Int. J. Electr. Power Energy Syst. 2021, 133, 107023. [Google Scholar] [CrossRef]
  42. Etxegarai, G.; López, A.; Aginako, N.; Rodríguez, F. An analysis of different deep learning neural networks for intra-hour solar irradiation forecasting to compute solar photovoltaic generators’ energy production. Energy Sustain. Dev. 2022, 68, 1–17. [Google Scholar] [CrossRef]
  43. Gao, Y.; Ruan, Y.; Fang, C.; Yin, S. Deep learning and transfer learning models of energy consumption forecasting for a building with poor information data. Energy Build. 2020, 223, 110156. [Google Scholar] [CrossRef]
  44. Hu, H.; Wang, L.; Peng, L.; Zeng, Y. Effective energy consumption forecasting using enhanced bagged echo state network. Energy 2020, 193, 116778. [Google Scholar] [CrossRef]
  45. Hu, H.; Wang, L.; Lv, S. Forecasting energy consumption and wind power generation using deep echo state network. Renew. Energy 2020, 154, 598–613. [Google Scholar] [CrossRef]
  46. Natarajan, Y.; Kannan, S.; Selvaraj, C.; Mohanty, S.N. Forecasting energy generation in large photovoltaic plants using radial belief neural network. Sustain. Comput. Inform. Syst. 2021, 31, 100578. [Google Scholar] [CrossRef]
  47. Cui, Y.; Jia, L.; Fan, W. Estimation of actual evapotranspiration and its components in an irrigated area by integrating the shuttleworth-wallace and surface temperature-vegetation index schemes using the particle swarm optimization algorithm. Agric. For. Meteorol. 2021, 307, 108488. [Google Scholar] [CrossRef]
  48. Zhang, F.; Deb, C.; Lee, S.E.; Yang, J.; Shah, K.W. Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique. Energy Build. 2016, 126, 94–103. [Google Scholar] [CrossRef]
  49. Wen, L.; Cao, Y. Influencing factors analysis and forecasting of residential energy-related CO2 emissions utilizing optimized support vector machine. J. Clean. Prod. 2020, 250, 119492. [Google Scholar] [CrossRef]
  50. Mason, K.; Duggan, J.; Howley, E. Forecasting energy demand, wind generation and carbon dioxide emissions in Ireland using evolutionary neural networks. Energy 2018, 155, 705–720. [Google Scholar] [CrossRef]
  51. Hu, G.; Xu, Z.; Wang, G.; Zeng, B.; Liu, Y.; Lei, Y. Forecasting energy consumption of long-distance oil products pipeline based on improved fruit fly optimization algorithm and support vector regression. Energy 2021, 224, 120153. [Google Scholar] [CrossRef]
  52. Abba, S.I.; Rotimi, A.; Musa, B.; Yimen, N.; Kawu, S.J.; Lawan, S.M.; Dagbasi, M. Emerging Harris Hawks optimization based load demand forecasting and optimal sizing of stand-alone hybrid renewable energy systems—A case study of Kano and Abuja, Nigeria. Results Eng. 2021, 12, 100260. [Google Scholar] [CrossRef]
  53. Lu, H.; Ma, X.; Huang, K.; Azimi, M. Carbon trading volume and price forecasting in China using multiple machine learning models. J. Clean. Prod. 2020, 249, 119386. [Google Scholar] [CrossRef]
  54. Lu, H.; Cheng, F.; Ma, X.; Hu, G. Short-term prediction of building energy consumption employing an improved extreme gradient boosting model: A case study of an intake tower. Energy 2020, 117756. [Google Scholar] [CrossRef]
  55. Fan, J.; Wang, X.; Zhang, F.; Ma, X.; Wu, L. Predicting daily diffuse horizontal solar radiation in various climatic regions of China using support vector machine and tree-based soft computing models with local and extrinsic climatic data. J. Clean. Prod. 2020, 248, 119264. [Google Scholar] [CrossRef]
  56. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041. [Google Scholar] [CrossRef]
  57. Hong, T.; Xie, J.; Black, J. Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. Int. J. Forecast. 2019, 35, 1389–1399. [Google Scholar] [CrossRef]
  58. Bedi, J.; Toshniwal, D. Energy load time-series forecast using decomposition and autoencoder integrated memory network. Appl. Soft Comput. 2020, 93, 106390. [Google Scholar] [CrossRef]
  59. Adedeji, P.A.; Akinlabi, S.; Ajayi, O.; Madushele, N. Non-linear autoregressive neural network (NARNET) with SSA filtering for a university energy consumption forecast. Procedia Manuf. 2019, 33, 176–183. [Google Scholar] [CrossRef]
  60. Tayab, U.B.; Lu, J.; Yang, F.; AlGarni, T.S.; Kashif, M. Energy management system for microgrids using weighted salp swarm algorithm and hybrid forecasting approach. Renew. Energy 2021, 180, 467–481. [Google Scholar] [CrossRef]
  61. Zhang, G.; Tian, C.; Li, C.; Zhang, J.J.; Zuo, W. Accurate forecasting of building energy consumption via a novel ensembled deep learning method considering the cyclic feature. Energy 2020, 201, 117531. [Google Scholar] [CrossRef]
  62. Xiao, J.; Li, Y.; Xie, L.; Liu, D.; Huang, J. A hybrid model based on selective ensemble for energy consumption forecasting in China. Energy 2018, 159, 534–546. [Google Scholar] [CrossRef]
  63. Khan, W.; Walker, S.; Zeiler, W. Improved solar photovoltaic energy generation forecast using deep learning-based ensemble stacking approach. Energy 2022, 240, 122812. [Google Scholar] [CrossRef]
  64. Kazemzadeh, M.; Amjadian, A.; Amraee, T. A hybrid data mining driven algorithm for long term electric peak load and energy demand forecasting. Energy 2020, 204, 117948. [Google Scholar] [CrossRef]
  65. Tran, D.; Luong, D.; Chou, J. Nature-inspired metaheuristic ensemble model for forecasting energy consumption in residential buildings. Energy 2020, 191, 116552. [Google Scholar] [CrossRef]
  66. Liu, Z.; Wang, X.; Zhang, Q.; Huang, C. Empirical mode decomposition based hybrid ensemble model for electrical energy consumption forecasting of the cement grinding process. Measurement 2019, 138, 314–324. [Google Scholar] [CrossRef]
  67. da Silva, R.G.; Dal Molin Ribeiro, M.H.; Moreno, S.R.; Mariani, V.C.; dos Santos Coelho, L. A novel decomposition-ensemble learning framework for multi-step ahead wind energy forecasting. Energy 2021, 216, 119174. [Google Scholar] [CrossRef]
  68. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  69. Rudin, W. Principles of Mathematical Analysis; McGraw-Hill: New York, NY, USA, 1976; Volume 3. [Google Scholar]
  70. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef]
  71. Takahashi, N.; Guo, J.; Nishi, T. Global convergence of SMO algorithm for support vector regression. IEEE Trans. Neural Netw. 2008, 19, 971–982. [Google Scholar] [CrossRef] [PubMed]
  72. Chang, C.; Lin, C. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. [Google Scholar] [CrossRef]
  73. Rasmussen, C.E.; Williams, C.K.I. Gaussian Process Regression for Machine Learning; MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
  74. Suykens, J.A.K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  75. De Brabanter, J.; De Moor, B.; Suykens, J.A.K.; Van Gestel, T.; Vandewalle, J.P.L. Least Squares Support Vector Machines; World Scientific: Singapore, 2002. [Google Scholar]
  76. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
  77. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  78. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  79. Dong, J.; Zeng, W.; Wu, L.; Huang, J.; Gaiser, T.; Srivastava, A.K. Enhancing short-term forecasting of daily precipitation using numerical weather prediction bias correcting with XGBoost in different regions of China. Eng. Appl. Artif. Intell. 2023, 117, 105579. [Google Scholar] [CrossRef]
  80. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3146–3154. [Google Scholar]
  81. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the NIPS’18: 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 3–8 December 2018; pp. 6638–6648. [Google Scholar]
  82. Cho, K.; van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of the SSST-8, 8th Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014; pp. 103–111. [Google Scholar]
  83. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  84. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  85. Miranian, A.; Abdollahzade, M. Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction. IEEE Trans. Neural Netw. Learn. Syst. 2012, 24, 207–218. [Google Scholar] [CrossRef]
  86. Duvenaud, D. Automatic Model Construction with Gaussian Processes. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 2014. [Google Scholar]
Figure 1. Overview of the models for energy forecasting in recent years.
Figure 2. The general framework of the proposed PLC-SVM model structure and its application to forecasting the primary energy consumption of the US electric power sector.
Figure 3. Raw data of monthly primary energy consumption of the electric power sector in the US from January 1973 to January 2020.
Figure 4. Predicted values using (a) PLC-SVM, (b) SVM, (c) LSSVM, (d) GPR, (e) RF, (f) LGBM, (g) XGB, (h) CATB, (i) GRU, and (j) AR with τ = 18.
Figure 5. Predicted values using (a) PLC-SVM, (b) SVM, (c) LSSVM, (d) GPR, (e) RF, (f) LGBM, (g) XGB, (h) CATB, (i) GRU, and (j) AR with τ = 24.
Figure 6. Predicted values using (a) PLC-SVM, (b) SVM, (c) LSSVM, (d) GPR, (e) RF, (f) LGBM, (g) XGB, (h) CATB, (i) GRU, and (j) AR with τ = 30.
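The τ in these captions is the time lag of the univariate scheme: each model receives the previous τ monthly values as input and predicts the next value, with the tail of the series held out for testing. A minimal sketch of that evaluation loop follows; the SVR baseline, its hyperparameters, and the test-window length n_test are illustrative assumptions, not the paper's settings.

```python
# Sketch of the univariate lag-tau evaluation behind Figures 4-6; the
# SVR baseline and the split size are assumptions for illustration only.
import numpy as np
from sklearn.svm import SVR

def forecast_with_lag(y, tau, n_test):
    # Row i of X is (y_{t-tau}, ..., y_{t-1}) for target y_t.
    X = np.column_stack([y[j : len(y) - tau + j] for j in range(tau)])
    target = y[tau:]
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
    model.fit(X[:-n_test], target[:-n_test])              # train on the head
    return target[-n_test:], model.predict(X[-n_test:])   # forecast the tail

# e.g. actual, predicted = forecast_with_lag(y, tau=18, n_test=36)
```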
Table 2. Metrics used in this paper, with $x^{(0)}(k)$ the observed value, $\hat{x}^{(0)}(k)$ the predicted value, and $\bar{x}^{(0)}$, $\bar{\hat{x}}^{(0)}$ the corresponding means.

Metric | Abbreviation | Formula
Average Error | AE | $\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)$
Average Relative Error | ARE | $\frac{1}{n}\sum_{k=1}^{n}\left|\frac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}\right|$
Index of Agreement | IA | $1-\frac{\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}{\sum_{k=1}^{n}\left(\left|x^{(0)}(k)-\bar{x}^{(0)}\right|+\left|\hat{x}^{(0)}(k)-\bar{\hat{x}}^{(0)}\right|\right)^{2}}$
Mean Arctangent Absolute Percentage Error | MAAPE | $\frac{1}{n}\sum_{k=1}^{n}\arctan\left|\frac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}\right|$
Mean Absolute Error | MAE | $\frac{1}{n}\sum_{k=1}^{n}\left|x^{(0)}(k)-\hat{x}^{(0)}(k)\right|$
Mean Absolute Percentage Error | MAPE | $\frac{1}{n}\sum_{k=1}^{n}\left|\frac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}\right|\times 100\%$
Median Absolute Error | MedAe | $\operatorname{median}_{k}\left|x^{(0)}(k)-\hat{x}^{(0)}(k)\right|$
Mean Percentage Error | MPE | $\frac{1}{n}\sum_{k=1}^{n}\frac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}\times 100\%$
Mean Squared Error | MSE | $\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}$
Mean Squared Logarithmic Error | MSLE | $\frac{1}{n}\sum_{k=1}^{n}\left(\log\left(x^{(0)}(k)+1\right)-\log\left(\hat{x}^{(0)}(k)+1\right)\right)^{2}$
Normalized Root Mean Square Error | NRMSE | $\frac{\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}}{x^{(0)}_{\max}-x^{(0)}_{\min}}$
Percent Bias | Pibas | $\frac{\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)}{\sum_{k=1}^{n}\hat{x}^{(0)}(k)}$
Coefficient of Determination | R2 | $1-\frac{\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}{\sum_{k=1}^{n}\left(x^{(0)}(k)-\bar{x}^{(0)}\right)^{2}}$
Root Mean Square Error | RMSE | $\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}$
Root Mean Square Logarithmic Error | RMSLE | $\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\log\left(x^{(0)}(k)+1\right)-\log\left(\hat{x}^{(0)}(k)+1\right)\right)^{2}}$
Root Mean Square Percentage Error | RMSPE | $\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\frac{x^{(0)}(k)-\hat{x}^{(0)}(k)}{x^{(0)}(k)}\right)^{2}}$
Symmetric Mean Absolute Percentage Error | SMAPE | $\frac{1}{n}\sum_{k=1}^{n}\frac{\left|x^{(0)}(k)-\hat{x}^{(0)}(k)\right|}{0.5x^{(0)}(k)+0.5\hat{x}^{(0)}(k)}\times 100\%$
Theil U Statistic 1 | U1 | $\frac{\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}}{\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x^{(0)}(k)\right)^{2}}+\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\hat{x}^{(0)}(k)\right)^{2}}}$
Theil U Statistic 2 | U2 | $\sqrt{\frac{\sum_{k=1}^{n}\left(x^{(0)}(k)-\hat{x}^{(0)}(k)\right)^{2}}{\sum_{k=1}^{n}\left(x^{(0)}(k)\right)^{2}}}$
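Most of these metrics are one-liners in practice. As a sanity check on the definitions above, the sketch below implements a representative subset (MAPE, RMSE, IA, U1, U2) for NumPy arrays x (observed) and xh (forecast); the function names are ours, not from the original codebase.

```python
# Direct implementations of several Table 2 metrics for NumPy arrays.
import numpy as np

def mape(x, xh):
    return float(np.mean(np.abs((x - xh) / x)) * 100.0)

def rmse(x, xh):
    return float(np.sqrt(np.mean((x - xh) ** 2)))

def index_of_agreement(x, xh):
    num = np.sum((x - xh) ** 2)
    den = np.sum((np.abs(x - x.mean()) + np.abs(xh - xh.mean())) ** 2)
    return float(1.0 - num / den)

def theil_u1(x, xh):
    return float(rmse(x, xh) /
                 (np.sqrt(np.mean(x ** 2)) + np.sqrt(np.mean(xh ** 2))))

def theil_u2(x, xh):
    return float(np.sqrt(np.sum((x - xh) ** 2) / np.sum(x ** 2)))
```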
Table 3. Results of the metrics of the ten models with time lag τ = 18.

Metric | PLC-SVM | SVM | LSSVM | GPR | RF | LGBM | XGB | CATB | GRU | AR
AE | −30.4026 | −31.9230 | −131.3443 | −144.5654 | −105.5290 | −135.2636 | −86.1456 | 251.4290 | 161.4449 | −79.9688
ARE | 0.0372 | 0.0379 | 0.0489 | 0.0511 | 0.0387 | 0.0478 | 0.0398 | 0.0866 | 0.0622 | 0.0423
IA | 0.9371 | 0.9347 | 0.9114 | 0.9224 | 0.9492 | 0.9194 | 0.9354 | 0.5436 | 0.8856 | 0.9202
MAAPE | 0.0372 | 0.0379 | 0.0488 | 0.0510 | 0.0386 | 0.0477 | 0.0397 | 0.0859 | 0.0621 | 0.0422
MAE | 115.2908 | 117.3400 | 145.8265 | 157.6530 | 116.6060 | 144.3825 | 119.7081 | 290.9737 | 193.6625 | 127.7884
MAPE | 3.7240 | 3.7935 | 4.8940 | 5.1123 | 3.8695 | 4.7775 | 3.9777 | 8.6588 | 6.2247 | 4.2286
MedAe | 94.8200 | 97.0436 | 123.2384 | 135.7107 | 106.0792 | 119.4479 | 88.2682 | 205.1306 | 178.9841 | 95.8115
MPE | −1.3489 | −1.4058 | −4.5055 | −4.7546 | −3.5528 | −4.5172 | −3.0012 | 7.2134 | 5.0866 | −2.9021
MSE | 19,396.6580 | 19,970.1730 | 32,269.8433 | 33,986.8550 | 20,322.6371 | 33,104.5516 | 24,215.1642 | 146,391.5235 | 51,779.6932 | 26,994.5302
MSLE | 0.0020 | 0.0021 | 0.0034 | 0.0033 | 0.0022 | 0.0034 | 0.0026 | 0.0142 | 0.0060 | 0.0028
NRMSE | 0.1181 | 0.1199 | 0.1524 | 0.1564 | 0.1209 | 0.1543 | 0.1320 | 0.3245 | 0.1930 | 0.1394
Pibas | −0.0096 | −0.0101 | −0.0402 | −0.0440 | −0.0325 | −0.0413 | −0.0267 | 0.0871 | 0.0543 | −0.0249
R2 | 0.8228 | 0.8175 | 0.7051 | 0.6895 | 0.8143 | 0.6975 | 0.7787 | −0.3376 | 0.5269 | 0.7533
RMSE | 139.2719 | 141.3159 | 179.6381 | 184.3552 | 142.5575 | 181.9466 | 155.6122 | 382.6115 | 227.5515 | 164.3001
RMSLE | 0.0446 | 0.0453 | 0.0587 | 0.0575 | 0.0464 | 0.0582 | 0.0506 | 0.1192 | 0.0777 | 0.0534
RMSPE | 0.0457 | 0.0464 | 0.0615 | 0.0599 | 0.0481 | 0.0610 | 0.0529 | 0.1087 | 0.0739 | 0.0559
SMAPE | 3.6733 | 3.7405 | 4.7180 | 4.9439 | 3.7609 | 4.6019 | 3.8561 | 9.2711 | 6.4815 | 4.1009
U1 | 0.0220 | 0.0223 | 0.0279 | 0.0286 | 0.0222 | 0.0282 | 0.0244 | 0.0633 | 0.0370 | 0.0257
U2 | 0.0441 | 0.0448 | 0.0569 | 0.0584 | 0.0452 | 0.0577 | 0.0493 | 0.1213 | 0.0721 | 0.0521
Table 4. Results of the metrics of the ten models with time lag τ = 24.

Metric | PLC-SVM | SVM | LSSVM | GPR | RF | LGBM | XGB | CATB | GRU | AR
AE | −69.5776 | −88.3138 | −146.5563 | −158.9098 | −122.4937 | −129.8408 | −140.6233 | 198.5871 | −290.2731 | −126.2264
ARE | 0.0396 | 0.0417 | 0.0504 | 0.0532 | 0.0439 | 0.0504 | 0.0505 | 0.0742 | 0.1009 | 0.0488
IA | 0.9309 | 0.9262 | 0.9178 | 0.9200 | 0.9365 | 0.9044 | 0.9078 | 0.6071 | 0.7168 | 0.9054
MAAPE | 0.0395 | 0.0416 | 0.0503 | 0.0531 | 0.0438 | 0.0502 | 0.0503 | 0.0738 | 0.1001 | 0.0486
MAE | 120.1745 | 125.5630 | 152.0696 | 163.6975 | 132.1869 | 152.4988 | 151.0061 | 248.7770 | 298.6850 | 145.9707
MAPE | 3.9617 | 4.1726 | 5.0412 | 5.3196 | 4.3856 | 5.0356 | 5.0492 | 7.4202 | 10.0853 | 4.8787
MedAe | 94.6035 | 104.8577 | 123.1539 | 143.7897 | 119.0864 | 116.1281 | 125.8849 | 169.8898 | 294.6203 | 123.9808
MPE | −2.5357 | −3.1289 | −4.8879 | −5.1840 | −4.1159 | −4.3672 | −4.7381 | 5.5953 | −9.8633 | −4.3340
MSE | 23,899.4588 | 26,147.9685 | 33,695.2393 | 36,364.6930 | 25,312.3053 | 39,167.0882 | 37,253.4219 | 106,653.0431 | 119,973.9310 | 35,560.0139
MSLE | 0.0025 | 0.0028 | 0.0035 | 0.0035 | 0.0027 | 0.0040 | 0.0039 | 0.0100 | 0.0122 | 0.0037
NRMSE | 0.1311 | 0.1372 | 0.1557 | 0.1617 | 0.1349 | 0.1679 | 0.1637 | 0.2770 | 0.2938 | 0.1599
Pibas | −0.0217 | −0.0274 | −0.0446 | −0.0482 | −0.0376 | −0.0397 | −0.0429 | 0.0676 | −0.0847 | −0.0387
R2 | 0.7816 | 0.7611 | 0.6921 | 0.6677 | 0.7687 | 0.6421 | 0.6596 | 0.0255 | −0.0962 | 0.6751
RMSE | 154.5945 | 161.7033 | 183.5626 | 190.6953 | 159.0984 | 197.9068 | 193.0115 | 326.5778 | 346.3725 | 188.5736
RMSLE | 0.0502 | 0.0527 | 0.0591 | 0.0594 | 0.0516 | 0.0629 | 0.0625 | 0.1002 | 0.1106 | 0.0611
RMSPE | 0.0524 | 0.0552 | 0.0620 | 0.0620 | 0.0536 | 0.0663 | 0.0659 | 0.0929 | 0.1193 | 0.0645
SMAPE | 3.8536 | 4.0426 | 4.8589 | 5.1372 | 4.2496 | 4.8406 | 4.8452 | 7.8382 | 9.4289 | 4.6886
U1 | 0.0243 | 0.0253 | 0.0285 | 0.0295 | 0.0248 | 0.0307 | 0.0299 | 0.0536 | 0.0526 | 0.0293
U2 | 0.0490 | 0.0513 | 0.0582 | 0.0604 | 0.0504 | 0.0627 | 0.0612 | 0.1035 | 0.1098 | 0.0598
Table 5. Results of the metrics of the ten models with time lag τ = 30.

Metric | PLC-SVM | SVM | LSSVM | GPR | RF | LGBM | XGB | CATB | GRU | AR
AE | −85.5618 | −123.2492 | −145.0989 | −157.4847 | −131.4612 | −128.6614 | −108.7032 | 173.3637 | −95.0323 | −138.1859
ARE | 0.0390 | 0.0458 | 0.0492 | 0.0517 | 0.0447 | 0.0486 | 0.0428 | 0.0676 | 0.0402 | 0.0494
IA | 0.9321 | 0.9161 | 0.9196 | 0.9215 | 0.9345 | 0.9086 | 0.9317 | 0.6547 | 0.9235 | 0.9063
MAAPE | 0.0389 | 0.0457 | 0.0491 | 0.0516 | 0.0446 | 0.0485 | 0.0427 | 0.0673 | 0.0400 | 0.0492
MAE | 117.0587 | 136.7596 | 148.3506 | 158.8722 | 135.0314 | 145.6343 | 128.2523 | 224.7215 | 120.2737 | 147.6295
MAPE | 3.8959 | 4.5838 | 4.9192 | 5.1695 | 4.4729 | 4.8630 | 4.2765 | 6.7565 | 4.0169 | 4.9359
MedAe | 94.3220 | 111.6924 | 114.1242 | 129.1694 | 121.9611 | 123.7311 | 108.4888 | 147.0317 | 76.1974 | 128.1547
MPE | −3.0065 | −4.2051 | −4.8233 | −5.1293 | −4.3722 | −4.3847 | −3.6771 | 4.8896 | −3.2737 | −4.6751
MSE | 23,675.5999 | 30,820.1597 | 32,348.9584 | 34,821.7870 | 26,206.9815 | 33,890.0351 | 26,217.1006 | 85,262.6857 | 29,073.8466 | 35,635.1807
MSLE | 0.0025 | 0.0033 | 0.0034 | 0.0034 | 0.0027 | 0.0036 | 0.0028 | 0.0080 | 0.0031 | 0.0037
NRMSE | 0.1305 | 0.1489 | 0.1525 | 0.1583 | 0.1373 | 0.1561 | 0.1373 | 0.2477 | 0.1446 | 0.1601
Pibas | −0.0266 | −0.0379 | −0.0444 | −0.0480 | −0.0404 | −0.0395 | −0.0336 | 0.0587 | −0.0295 | −0.0423
R2 | 0.7736 | 0.7053 | 0.6907 | 0.6671 | 0.7494 | 0.6760 | 0.7494 | 0.1848 | 0.7220 | 0.6593
RMSE | 153.8688 | 175.5567 | 179.8582 | 186.6060 | 161.8857 | 184.0925 | 161.9170 | 291.9977 | 170.5105 | 188.7728
RMSLE | 0.0502 | 0.0571 | 0.0579 | 0.0583 | 0.0522 | 0.0597 | 0.0525 | 0.0894 | 0.0553 | 0.0611
RMSPE | 0.0525 | 0.0600 | 0.0606 | 0.0608 | 0.0543 | 0.0628 | 0.0547 | 0.0837 | 0.0586 | 0.0644
SMAPE | 3.7758 | 4.4170 | 4.7445 | 4.9923 | 4.3319 | 4.6801 | 4.1388 | 7.0836 | 3.8643 | 4.7417
U1 | 0.0242 | 0.0274 | 0.0280 | 0.0290 | 0.0252 | 0.0287 | 0.0253 | 0.0479 | 0.0267 | 0.0294
U2 | 0.0490 | 0.0559 | 0.0572 | 0.0594 | 0.0515 | 0.0586 | 0.0515 | 0.0929 | 0.0543 | 0.0601
