Article

Short-Term Energy Forecasting Using Machine-Learning-Based Ensemble Voting Regression

1 Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea
2 Department of Computer Education, Teachers College, Jeju National University, 61 Iljudong-ro, Jeju-si 63294, Korea
* Authors to whom correspondence should be addressed.
Symmetry 2022, 14(1), 160; https://doi.org/10.3390/sym14010160
Submission received: 5 November 2021 / Revised: 4 December 2021 / Accepted: 27 December 2021 / Published: 14 January 2022

Abstract

Meeting the required amount of energy between supply and demand is indispensable for energy manufacturers. Accordingly, electric industries have paid attention to short-term energy forecasting to assist their management systems. This paper first compares multiple machine learning (ML) regressors during the training process. The five best ML algorithms, namely the extra trees regressor (ETR), random forest regressor (RFR), light gradient boosting machine (LGBM), gradient boosting regressor (GBR), and K-nearest neighbors regressor (KNN), are trained to build our proposed voting regressor (VR) model. Final predictions are performed using the proposed ensemble VR and compared with the five selected ML benchmark models. The statistical autoregressive integrated moving average (ARIMA) model is also compared with the proposed model. For the experiments, energy usage and weather data were gathered from four regions of Jeju Island. Error measurements, including the mean absolute percentage error (MAPE), mean absolute error (MAE), and mean squared error (MSE), were computed to evaluate the forecasting performance. Our proposed model outperforms the six baseline models, achieving a minimum daily MAPE of 0.845% on the test set. This improved performance shows that our approach is promising for symmetrical forecasting using time series energy data in the power system sector.

1. Introduction

The energy sector is one of the most influential factors in a country's economic development. The economy may decline if there is not sufficient energy to provide to end users [1]. Therefore, energy industries have to produce enough energy to balance supply and consumption, making energy forecasting essential in the energy management system. Moreover, forecasting also helps reduce operating and generation costs and supports short-term scheduling functions in the power system. Forecasting can be divided into three horizons based on prediction duration: short term, medium term, and long term [2]. This research mainly focuses on short-term energy forecasting, as it considers next-hour prediction using hourly energy data.
Research on short-term energy forecasting has been conducted using different models, which can be classified into statistical time series models and ML models. Autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), etc., belong to the first group [3]. These models are beneficial for non-real-time forecasting, but they cannot handle nonlinear load consumption. Thus, ML algorithms have been widely used in forecasting to overcome nonlinear problems in energy data. They can be split into two groups: regression analysis and neural-network-based models. Many ML regressors, such as decision tree regressors, boosting regressors, and bagging regressors, have been investigated [4]. Among them, tree-based regressors such as the extra trees regressor (ETR) [5], random forest regressor (RFR) [6], and classification and regression tree (CART) [7] have been commonly applied in energy forecasting because of their simple tree structure and easy interpretability.
The random forest (RF) algorithm, proposed by Leo Breiman, is one of the ML algorithms that can predict a large amount of data [8]. The four main processes of the RF algorithm are bootstrap resampling, random feature selection, out-of-bag (OOB) error estimation, and a fully grown decision tree [9]. This algorithm generates multiple decision trees (so-called weak learners) from the given training samples. Each tree’s exact number of samples is randomly chosen to form a new training sample using bootstrap resampling. The unselected samples are defined as out-of-bag samples during the training process. Afterward, the RF tree is grown entirely using the selected new training sample without pruning. Unlike the CART, the RF determines only a small number of features randomly instead of all predictor features. After training multiple times, the decision trees are then created randomly as a forest. The RF algorithm estimates the OOB error during the forest construction instead of using cross-validation like other tree-based models. Finally, it collects predictions from all weak learners and combines them using the bagging ensemble method to perform the final predictions. Lahouar et al. conducted a day-ahead load forecasting using the RF technique [10]. Their research highlighted that the RF was flexible with expert selection, load profiles, and complex customer behaviors and proved to have better performance in all season and calendar effects. Dudek also investigated the RF algorithm, combining regression trees with only a few parameters for short-term load forecasting (STLF). His research also revealed that the RF had the benefit of eliminating nonstationarity and filtering trends and seasonal cycles longer than the daily cycle [11].
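The RF mechanics described above (bootstrap resampling, random feature selection at each split, and OOB error estimation in place of cross-validation) can be sketched with scikit-learn. The data and hyperparameters below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # 8 synthetic predictor features
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(
    n_estimators=100,      # number of fully grown decision trees (weak learners)
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # bootstrap resampling of the training samples
    oob_score=True,        # estimate generalization error from out-of-bag samples
    random_state=0,
)
rf.fit(X, y)
print(round(rf.oob_score_, 3))   # OOB R^2 replaces a separate cross-validation
```

The final prediction for a new point is the bagging average over all 100 trees, exactly the combination step the paragraph describes.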
The ETR algorithm, also known as extremely randomized trees, was proposed by Geurts et al. [5]. It is also related to the class of tree-based ensemble methods for implementing classification and regression tasks. The ETR algorithm extends the randomization of the RF algorithm. ETR uses all the training samples to train each ensemble member, while RF uses the tree-bagging step to generate a training subset for each tree. When splitting the tree, ETR randomly selects the best feature and its corresponding value to reduce overfitting and obtain better performance than RFR [12]. Gabriel analyzed electric load forecasts using ETR, XGBoost, and statistical time series models. In their research, ETR provided the best forecasting performance over traditional time series methods in the case of lengthier historical load data [13]. Alawadi et al. also proved that the increment in forecasting time does not affect the accuracy of ETR after comparing it with multiple ML algorithms on indoor temperature forecasting, considering user comfort levels, and reducing energy consumption [14].
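The contrast with RF described above (each extra tree is trained on the full sample set rather than a bootstrap subset, and split thresholds are drawn at random) can be illustrated on synthetic data; the settings here are illustrative:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
y = 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.2, size=400)

etr = ExtraTreesRegressor(n_estimators=100, random_state=0)    # no bootstrap by default
rfr = RandomForestRegressor(n_estimators=100, random_state=0)  # bootstrap resampling

scores = {}
for name, model in [("ETR", etr), ("RFR", rfr)]:
    # compare out-of-sample fit of the two randomization strategies
    scores[name] = cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    print(name, round(scores[name], 3))
```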
Among the most prominent ensemble ML algorithms, gradient boosting machine (GBM) algorithms are also popular because of their high flexibility and interpretability, achieved by transforming weak learners into strong learners. In the case of regression or classification, their procedure involves multiple weak learners trained sequentially by reweighting the original training data. The final prediction is performed using a weighted combination of the sequentially trained learners. Friedman proposed a statistical view of boosting as an additive logistic regression model around 2000 [15]. Subsequently, it was extended to the estimation of a function by optimizing a loss criterion via steepest gradient descent in function space [16,17]. Some recent boosting variants, such as extreme gradient boosting (XGBoost) [18], light gradient boosting machine (LGBM) [19], and categorical boosting (CatBoost) [20], have been developed with a focus on increasing speed and predictive performance, and have achieved robust results in real applications and forecasting competitions. In recent years, boosting-tree-based algorithms have been widely applied in many areas, namely, computer vision [21], biology [22], chemistry [23], energy [24], etc. In particular, boosting has provided great advances in the energy sector in terms of predictive performance [25,26,27].
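The sequential training idea can be sketched with scikit-learn's GradientBoostingRegressor, whose staged_predict method exposes the ensemble after each boosting round; the data and settings below are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(600, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbr.fit(X, y)

# staged_predict yields the ensemble's prediction after each sequential learner;
# training error shrinks as weak learners are added to correct residuals
errors = [mean_squared_error(y, pred) for pred in gbr.staged_predict(X)]
print(errors[0] > errors[-1])
```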
Fix and Hodges first introduced the nearest neighbor method as nonparametric classification, one of the most straightforward techniques in predictive mining [28]. This method provides the actual regression function without making strong assumptions for the estimation. It is easily understandable as kernel and nearest neighbors regression estimators are local univariate estimators. The K-nearest neighbors (KNN) algorithm was proposed to find K training samples closest to the target in the training set [29]. The most relative K value is chosen based on the nearest distance to classify the input features. The majority of K-nearest neighbors then gather a similar group of the specific input training set. Therefore, the KNN algorithm mainly depends on the distance and voting function of the selected optimal value of K. Fan et al. applied the KNN algorithm to classify Chinese load patterns to improve the accuracy of STLF [30]. In their research, the Euclidean distance function was used as the weight of KNN for a better load classification. Moreover, the KNN algorithm was utilized to classify and predict hourly energy consumption data in the work of Wahid and Kim. They revealed the effectiveness of the KNN algorithm by observing hyperparameters [31].
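A minimal KNN regression sketch with K = 5 and the Euclidean distance mentioned above; the distance weighting used here is an assumption for illustration, and the data is synthetic:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(300, 2))
y = X[:, 0] + np.cos(X[:, 1]) + rng.normal(scale=0.05, size=300)

# K = 5 nearest neighbors under the Euclidean metric; closer neighbors
# get larger voting weight in the local prediction
knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean", weights="distance")
knn.fit(X, y)
print(round(knn.score(X, y), 3))
```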
It is hard to make reliable predictions for decision-making in the energy sector because of asymmetric information. To overcome this, a proper forecasting method with symmetric errors must be selected to obtain better predictions. Accordingly, in this paper, seventeen ML algorithms were trained and their forecasting performances compared using error measurements. After comparing the seventeen ML algorithms, this research aimed to fit a voting regressor combining the five best ML models: ETR, RFR, LGBM, GBR, and KNN. We included meteorological data, such as temperature, humidity, and wind speed, to train all ML algorithms, as these features primarily affect energy consumption through seasonal and meteorological effects. Our proposed ensemble model could reduce the forecasting error relative to the five best standalone ML models.

1.1. Contribution

The highlighted contributions of this research are as follows:
1.
A forecasting performance comparison of seventeen ML algorithms on a training set using error metrics was conducted during the training process.
2.
The top five ML algorithms, namely ETR, RFR, LGBM, GBR, and KNN, that had minimum errors were selected and combined to build the proposed VR algorithm.
3.
To conduct final predictions and improve accuracy, our ensemble VR model performed a majority voting and selected the best predictions among the five ML algorithms.
4.
The performance evaluation was finally conducted by comparing the proposed model and the five standalone ML and ARIMA models.

1.2. Paper Structure

This paper is arranged as follows. Existing research on energy forecasting using different forecasting models is reviewed in Section 2. In Section 3, the proposed system is presented, covering data collection, data analysis, input selection, and system modeling. The generated results of all models are discussed and compared in Section 4. Section 5 concludes the paper.

2. Prior Works

In recent years, ML algorithms have been commonly used in many research areas to lessen the workload. For instance, the GBM model has been applied in many domains using different data types, such as image, biological, chemical, and energy data [21,22,23,24]. In the study of Wu et al., GBM-based multiple kernel learning (MKL) was proposed with an extension of transfer learning algorithms, including antagonistic, homogeneous, and heterogeneous transfer. Their proposed framework for STLF provided better results than baseline models [32]. During the Global Energy Forecasting Competition 2012, Lloyd applied the GBM algorithm and Gaussian process regression with three different kernel functions, which proved very effective for predictive modeling and load forecasting [33]. In the work of Friedrich and Afshari, a transfer function model was applied to load data from Abu Dhabi for one-day and two-day forecasting, and the results were compared with autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models [34].
The principles of similar-pattern-based methods for short-term load forecasting (STLF) were presented by Dudek. Afterward, similar-pattern-based local linear regression models were proposed for STLF using Polish power system data. His proposed stepwise and lasso regressors outperformed other benchmark models: ARIMA, ANN, exponential smoothing, and the Nadaraya–Watson estimator [35,36,37]. For load forecasting, KNN and support vector machine (SVM) methods were implemented, with feature selection conducted by DT regression and recursive feature elimination (RFE) [38]. Even though the above-mentioned standalone ML algorithms provide good results for energy forecasting, some of them cannot handle complex nonlinear relationships or achieve computational efficiency. Considering these weaknesses, hybridizing two or more advanced algorithms can generate reliable forecasts by enhancing their performance.
An ensemble method based on two ML algorithms, variational mode decomposition (VMD) and extreme learning machine (ELM), was proposed for multi-step-ahead load forecasting. This model was optimized by differential evolution algorithms using two electric load series [39]. Likewise, a combination of DT and SVM using smart meter data was proposed for one-week test predictions by Zhang et al. and obtained better forecasting performance [40]. Jihoon et al. also developed a hybrid method combining RFR and a multilayer perceptron (MLP) on power consumption data to predict one week ahead; their hybrid model achieved better forecasting accuracy than standalone ML models [41]. An ensemble model including CatBoost (CB), GBM, and MLP, combined with feature selection by a genetic algorithm (GA), was built and compared with other ML baseline models in the work of Khan and Byun. Moreover, they also hybridized the XGBoost, SVM regression, and KNN regressor algorithms. Their results showed that the ensemble approach could improve forecasting accuracy [42].
In addition, some authors created ensembles in two ways, averaging and stacking, by combining SVM, RFR, and deep belief network (DBN) forecasting models in solar power forecasting [43]. The SVM model was used to generate forecasts, and these results were combined by using the RFR model for solar power forecasting in the cited work [44]. Similarly, three different ensemble methods, namely, linear, normal distribution, and normal distribution with additional features, using seven ML algorithms were proposed by Mohammed and Aung. Their proposed methods were compared and outperformed statistical models [45]. Tree-based ensemble methods such as RFR and ETR were applied to the prediction of photovoltaic generation output, and their predictive results were better than SVM [46]. According to the improved accuracy and the confidence level of the forecasts from the above-mentioned cited papers, combining two or more forecasting models is a noticeably better approach than applying single algorithms. Therefore, this paper also adopts an ensemble technique, which votes among five ML algorithms to improve the forecasting accuracy.

3. Proposed System

This section mainly presents a detailed explanation of our proposed system, including data collection, data analysis, input selection, and system modeling.

3.1. Data Collection

Our energy consumption data were collected from four primary energy sources: fossil-fuel-based energy (FF), behind-the-meter (BTM), photovoltaic (PV), and wind power (WP) sources. We denoted the actual energy data as Total_Load in MW, which was used as the target variable. Jeju Island has four regions: JEJU-SI, SEOGWIPO, SEONSAN, and GOSAN. Each region has its own weather station that provides nine weather features. As we used Total_Load for the whole of Jeju Island, we aggregated all weather information by weighting each region's share of the island: 50% for JEJU-SI, 30% for SEOGWIPO, 10% for SEONSAN, and 10% for GOSAN. The nine weather features considered were: three temperature features, namely the average temperature (Total_TA), the dew point temperature (Total_TD), and the sensible temperature (Total_ST), measured in degrees Celsius (°C); the humidity (Total_HM) in %; the wind speed (Total_WS) in m/s; the wind direction (Total_WD) in degrees (°); the atmospheric pressure on the ground (Total_PA) in hPa; the discomfort index (Total_DI); and the solar irradiation quantity (Total_SI) in MJ/m². The sources of the collected load and weather data are described in Figure 1.
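The regional weighting described above (50/30/10/10) amounts to a weighted average over the four stations, as in the sketch below; the station readings are placeholder values, not the paper's data:

```python
# Regional weights from the paper's aggregation scheme
weights = {"JEJU-SI": 0.5, "SEOGWIPO": 0.3, "SEONSAN": 0.1, "GOSAN": 0.1}

def aggregate(feature_by_region):
    """Combine one weather feature across the four regional stations."""
    return sum(weights[r] * v for r, v in feature_by_region.items())

# hypothetical temperature readings per station (°C)
temps = {"JEJU-SI": 16.0, "SEOGWIPO": 17.0, "SEONSAN": 15.0, "GOSAN": 14.5}
print(aggregate(temps))   # island-wide Total_TA
```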

3.2. Data Analysis and Input Selection

The descriptive statistics for each variable are indicated in Table 1. As our target variable is Total_Load, we considered the previous day's load (Yes_Load) as one of the independent variables. Consequently, there were ten input variables, including Yes_Load and the nine weather variables, to forecast the next day's load. The minimum (Min), maximum (Max), mean (μ), standard deviation (SD), and coefficient of variation (CV) were calculated to gauge the spread of the variables. The mathematical expressions of μ, SD, and CV are:
μ = (Σ x) / n
SD = √( Σ (x − μ)² / (n − 1) )
CV = SD / μ
where x are the observations and n is the number of observations.
In terms of the CV values in Table 1, all input variables except Total_SI obtained CV values of less than 1, indicating low variance in the data. The last variable, however, had high variance, with a CV value of 1.67. To better understand the data, Spearman's correlation coefficient (ρ) was computed to capture nonlinear monotonic correlations between pairs of variables, as represented in Figure 2. It takes values between −1 and +1: a negative ρ indicates a negative monotonic relationship, and a positive ρ indicates a positive one. There is no correlation between two variables if the coefficient is zero. Regarding the correlation diagram, Total_Load is positively correlated with Yes_Load and vice versa. It is negatively correlated with Total_TA, Total_TD, Total_HM, Total_DI, and Total_ST, whereas its correlation with Total_WS, Total_WD, Total_PA, and Total_SI is close to zero. Spearman's rank correlation coefficient (ρ) can be formulated as follows:
ρ = 1 − (6 Σ dᵢ²) / ( n (n² − 1) )
where dᵢ is the difference between the two ranks of each observation and n is the number of observations.
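The statistics and the rank-correlation formula above can be checked on a small synthetic sample; the numbers below are illustrative only:

```python
import numpy as np

x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
mu = x.sum() / len(x)                                # mean
sd = np.sqrt(((x - mu) ** 2).sum() / (len(x) - 1))   # sample standard deviation
cv = sd / mu                                         # coefficient of variation
print(round(mu, 2), round(sd, 2), round(cv, 3))

def spearman_rho(a, b):
    # rank both series (this toy sample has no ties), then apply the formula
    ra = a.argsort().argsort()
    rb = b.argsort().argsort()
    d2 = float(((ra - rb) ** 2).sum())
    n = len(a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

y = np.array([3.0, 4.0, 2.0, 6.0, 5.0])
print(spearman_rho(x, y))
```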

3.3. System Modeling

The overall workflow of the proposed system is shown in Figure 3. Initially, three years of raw data, including load and meteorological features, were loaded into the training process. Secondly, feature engineering involved data cleaning, data arrangement, and data splitting. In the data cleaning step, we checked for missing values and outliers in the raw data and replaced outliers with moving averages. The cleaned data were then arranged into ten independent variables and one target variable in order to train the ML algorithms. The next step consisted of splitting the arranged data into training and testing sets based on a predefined duration. The training data were provided to the model selection process, which was conducted using the compare_models function from the PyCaret open source package. At this stage, seventeen ML models were generated from the training data, and error metrics across all models were compared. We then selected the top five ML algorithms with minimum errors and trained them to perform test predictions. Each single ML model performed predictions using one year of testing data. In Figure 3, ŷ_ETR, ŷ_RFR, ŷ_LGBM, ŷ_GBR, and ŷ_KNN represent the predictions of each ML model. Afterward, these five algorithms were combined to build a voting regressor (VR) algorithm, which voted for the best predictor among them to fit the final trained VR. Final predictions of the trained VR (ŷ_VR) on the testing set were then executed. The evaluation of the proposed model was finally accomplished using error measurements. The five ML algorithms and the ARIMA model were compared with the proposed model for a monthly result comparison.
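One concrete step from the feature-engineering stage above, replacing outliers with a moving average, can be sketched with pandas on toy hourly data; the detection rule (deviation from a rolling median beyond a fixed threshold) and the window length are assumptions, not the paper's exact procedure:

```python
import pandas as pd

load = pd.Series([500.0, 510.0, 505.0, 5000.0, 515.0, 508.0])  # 5000 MW is a spike
med = load.rolling(window=3, center=True, min_periods=1).median()
outlier = (load - med).abs() > 100   # assumed detection threshold in MW

cleaned = load.mask(outlier)         # drop flagged points
# fill the gaps with a centered moving average of the remaining neighbors
cleaned = cleaned.fillna(
    cleaned.rolling(window=3, center=True, min_periods=1).mean())
print(cleaned.tolist())
```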
The detailed error comparison of the seventeen ML algorithms during the training process is given in Table 2, together with the computation time of each trained algorithm. We checked the error performance of each algorithm and then selected the top five with minimum errors. The ETR, RFR, and LGBM models provide forecasting MAPEs of approximately 3%, regardless of computing time. GBR and KNN have MAPEs of 3.7% and 4%, respectively. All the remaining ML models have MAPEs over 4%. Among them, Ada and PAR obtain MAPEs of about 5% and 6%, respectively, which can still be considered acceptable performance. However, the LLAR algorithm is unsuitable for energy forecasting because it has the highest errors in all measurements.
During the training module, each forecasting model was fitted with different parameters. Both the ETR and RFR algorithms were trained with 100 trees in the forest (n_estimators = 100) and the MSE split criterion. Likewise, the LGBM and GBR algorithms used the same parameters and a boosting learning rate of 0.1. The alpha-quantile of the Huber loss function was fixed at 0.9 for the GBR algorithm. Unlike the first four algorithms, the KNN model predicts the target by local interpolation of the targets associated with the nearest neighbors in the training set. Therefore, five neighbors, a leaf size of thirty, and the standard Euclidean distance metric were set for the KNN algorithm. The order of the AR model (p), the degree of differencing (d), and the order of the MA model (q) were defined as one, zero, and one, respectively, to train the time series ARIMA model. The main criterion for the ARIMA model was Akaike's information criterion (AIC), which estimates the relative quality of statistical models for a given dataset. In this research, all experiments were conducted in Google Colab Jupyter notebooks using a desktop machine with the following specifications: 11th Gen Intel Core i7 5.00 GHz processor, 16 GB RAM, 64-bit operating system, x64-based processor.

4. Result and Discussion

In this section, the accuracy of each model is computed on test predictions to compare monthly forecast results. To measure the accuracy of each month, MAPE, MAE, and MSE were chosen, and their mathematical expressions are as below:
MAPE = (1/24) Σ_{t=1}^{24} ( |y_t − ŷ_t| / y_t ) × 100%
MAE = (1/24) Σ_{t=1}^{24} |y_t − ŷ_t|
MSE = (1/24) Σ_{t=1}^{24} (y_t − ŷ_t)²
where y_t is the actual energy value at time t, ŷ_t is the predicted energy value at time t, and t indexes the hourly periods of a day (t = 1, …, 24).
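The three metrics above, computed on a toy day of 24 hourly values; the load profile and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
y = 500 + 50 * np.sin(np.linspace(0, 2 * np.pi, 24))   # actual hourly load (MW)
y_hat = y + rng.normal(scale=5.0, size=24)             # hypothetical predictions

mape = np.mean(np.abs(y - y_hat) / y) * 100   # percentage error
mae = np.mean(np.abs(y - y_hat))              # MW
mse = np.mean((y - y_hat) ** 2)               # MW^2
print(round(mape, 3), round(mae, 2), round(mse, 2))
```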
The forecasts of the proposed model are discussed and compared with the results of the six benchmark models. Correspondingly, the comparisons of monthly MAPE, MAE, and MSE are given in Table 3, Table 4 and Table 5, respectively. Overall, the proposed VR model outperforms the other forecasting baseline models in all error measurements, with an MAPE of 4.28%, an MAE of 29.33 MW, and an MSE of 1549.51 MW². The LGBM and GBR models rank second, providing better accuracy than the others and each showing an MAPE of around 4.3%, an MAE of 29 MW, and an MSE of 1580 MW². Next, both the ETR and RFR models have an MAPE, MAE, and MSE of about 4.4%, 30 MW, and 1650 MW², respectively. The KNN model ranks last among the ML algorithms, with an MAPE of 4.66%, an MAE of 31.66 MW, and an MSE of 1854.29 MW². Compared with our proposed VR and the other ML models, the statistical time series ARIMA model has the worst performance, with an MAPE, MAE, and MSE of over 12%, 87 MW, and 12,000 MW², respectively. Overall, all ML models provided acceptable accuracy because the five ML benchmark models were chosen from the top performers among the seventeen ML models during the training process.
All forecasting models except ARIMA achieve similar accuracy in all error metrics across all months. Consequently, the proposed ensemble VR model was selected to analyze the MAPE of each month. A lower MAPE of around 3% is observed in June, October, and November, followed by 3.5% in May, which has many holidays. The MAPE increases to just over 4% in July, August, September, and April, and to approximately 5% in December, January, and March. The highest MAPE for all ML models, around 6%, occurs in February because of holidays and heating consumption, while the ARIMA model reaches 15.39%.
Furthermore, we also calculated the seasonal MAPE to check whether seasonal effects influenced energy consumption, as Korea has four seasons: spring, summer, fall, and winter. The seasonal MAPE comparison between the proposed VR and the baseline models is given in Table 6. Across the models, the fall season has the lowest MAPE, around 3.5% for all ML models, because of lower energy consumption. In winter, people consume more electricity than usual due to the cold weather and heating systems, yielding an MAPE of about 5%. Spring lies between winter and summer, so its MAPE is slightly higher than that of summer because of weather effects. The KNN and ARIMA models provide worse MAPEs than the other models in all seasons.
The difference between the actual energy and the predictions of all ML models for the best week is shown in Figure 4. The best weekly predictions of all models were observed in the second week of October 2018. In the figure, the actual energy and the prediction of the proposed model are shown as a blue line and a red dashed line, respectively, while the predictions of the benchmark models are denoted by a green line for ETR, a black line for LGBM, a purple line for RFR, an aqua line for GBR, and an orange line for KNN. All models track the actual energy almost precisely, although some gaps occur on 8 October and 10 October. The predicted load fluctuates from 450 MW to over 600 MW.
Moreover, the best predicted day was singled out from all test days to examine the minimum error of our proposed voting regressor. It is presented in Figure 5, where the blue, red, and orange dashed lines represent the actual energy, the prediction, and the MAPE, respectively. On 17 October 2018, the proposed model obtained an average MAPE of around 0.845%. MAPEs of less than 1% are achieved throughout the day, except for higher MAPEs at night. These results confirm that our proposed model can improve the forecasting performance.

5. Conclusions

This paper primarily trained multiple ML algorithms on a training set and compared them using error metrics. The five ML regressors with minimum errors were then chosen to build the proposed VR model during the training process. Our proposed ensemble VR model voted for the best predictor on each test point among the five ML algorithms and predicted the outcomes on the whole testing set. In the experiments, three years of hourly energy data were used, along with meteorological data collected from four different stations. The test predictions for all forecasting models ranged from June 2018 to May 2019. Three error metrics, MAPE, MAE, and MSE, were measured monthly on the predictions of all models. According to the error measurements, our proposed model provided higher accuracy than the five standalone ML models and the statistical ARIMA model, with an average MAPE of 4.28%. In future work, forecasting accuracy could be further improved by applying our ensemble model to different datasets in other areas.

Author Contributions

Conceptualization, P.-P.P.; methodology, P.-P.P.; software, P.-P.P.; validation, P.-P.P.; formal analysis, P.-P.P.; investigation, P.-P.P.; writing—original draft preparation, P.-P.P.; writing—review and editing, Y.-C.B.; visualization, P.-P.P.; supervision, Y.-C.B. and N.P.; funding acquisition, Y.-C.B. and N.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A04083374), and this work was supported by the Korea Foundation for the Advancement of Science and Creativity (KOFAC) grant funded by the Korean government (MOE).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Khan, P.W.; Byun, Y.C.; Lee, S.J.; Kang, D.H.; Kang, J.Y.; Park, H.S. Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources. Energies 2020, 13, 4870.
2. Phyo, P.P.; Jeenanunta, C.; Hashimoto, K. Electricity load forecasting in Thailand using deep learning models. Int. J. Electr. Electron. Eng. Telecommun. 2019, 8, 221–225.
3. Hagan, M.T.; Behr, S.M. The time series approach to short term load forecasting. IEEE Trans. Power Syst. 1987, 2, 785–791.
4. Fernández-Delgado, M.; Sirsat, M.S.; Cernadas, E.; Alawadi, S.; Barro, S.; Febrero-Bande, M. An extensive experimental survey of regression methods. Neural Netw. 2019, 111, 11–34.
5. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
6. Jin, Z.; Shang, J.; Zhu, Q.; Ling, C.; Xie, W.; Qiang, B. RFRSF: Employee turnover prediction based on random forests and survival analysis. Lect. Notes Comput. Sci. 2020, 12343, 503–515.
7. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Oxfordshire, UK, 2017.
8. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
9. Jiang, R.; Tang, W.; Wu, X.; Fu, W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinform. 2009, 10, S65.
10. Lahouar, A.; Slama, J.B.H. Day-ahead load forecast using random forest and expert input selection. Energy Convers. Manag. 2015, 103, 1040–1051.
11. Dudek, G. Short-term load forecasting using random forests. In Proceedings of the 7th IEEE International Conference on Intelligent Systems IS'2014, Warsaw, Poland, 24–26 September 2014.
12. John, V.; Liu, Z.; Guo, C.; Mita, S.; Kidono, K. Real-time lane estimation using deep features and extra trees regression. In Image and Video Technology; Springer: Berlin/Heidelberg, Germany, 2015; pp. 721–733.
13. Dada, G.I. Analysis of Electric Load Forecasts Using Machine Learning Techniques. Ph.D. Thesis, National College of Ireland, Dublin, Ireland, 2019.
14. Alawadi, S.; Mera, D.; Fernández-Delgado, M.; Alkhabbas, F.; Olsson, C.M.; Davidsson, P. A comparison of machine learning algorithms for forecasting indoor temperature in smart buildings. Energy Syst. 2020, 1–17.
15. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407.
16. Friedman, J. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
17. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
18. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
19. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
20. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 1–23.
21. Zhang, F.; Du, B.; Zhang, L. Scene classification via a gradient boosting random convolutional network framework. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1793–1802.
22. Lei, X.; Fang, Z. GBDTCDA: Predicting circRNA-disease associations based on gradient boosting decision tree with multiple biological data fusion. Int. J. Biol. Sci. 2019, 15, 2911.
23. Lu, J.; Lu, D.; Zhang, X.; Bi, Y.; Cheng, K.; Zheng, M.; Luo, X. Estimation of elimination half-lives of organic chemicals in humans using gradient boosting machine. Biochim. Biophys. Acta (BBA)-Gen. Subj. 2016, 1860, 2664–2671.
24. Lu, H.; Cheng, F.; Ma, X.; Hu, G. Short-term prediction of building energy consumption employing an improved extreme gradient boosting model: A case study of an intake tower. Energy 2020, 203, 117756.
  25. Bogner, K.; Pappenberger, F.; Zappa, M. Machine learning techniques for predicting the energy consumption/production and its uncertainties driven by meteorological observations and forecasts. Sustainability 2019, 11, 3328. [Google Scholar] [CrossRef] [Green Version]
  26. Zhang, Y.; Haghani, A. A gradient boosting method to improve travel time prediction. Transp. Res. Part C Emerg. Technol. 2015, 58, 308–324. [Google Scholar] [CrossRef]
  27. Touzani, S.; Granderson, J.; Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 2018, 158, 1533–1543. [Google Scholar] [CrossRef] [Green Version]
  28. Fix, E.; Hodges, J.L. Nonparametric discrimination: Consistency properties. Randolph Field Tex. Proj. 1951, 57, 21–49. [Google Scholar]
  29. Altman, N.S. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am. Stat. 1992, 46, 175–185. [Google Scholar]
  30. Fan, G.F.; Guo, Y.H.; Zheng, J.M.; Hong, W.C. Application of the weighted k-nearest neighbor algorithm for short-term load forecasting. Energies 2019, 12, 916. [Google Scholar] [CrossRef] [Green Version]
  31. Wahid, F.; Kim, D. A prediction approach for demand analysis of energy consumption using k-nearest neighbor in residential buildings. Int. J. Smart Home 2016, 10, 97–108. [Google Scholar] [CrossRef] [Green Version]
  32. Xiao, L.; Wang, J.; Hou, R.; Wu, J. A combined model based on data pre-analysis and weight coefficients optimization for electrical load forecasting. Energy 2015, 82, 524–549. [Google Scholar] [CrossRef]
  33. Lloyd, J.R. GEFCom2012 hierarchical load forecasting: Gradient boosting machines and Gaussian processes. Int. J. Forecast. 2014, 30, 369–374. [Google Scholar] [CrossRef] [Green Version]
  34. Friedrich, L.; Afshari, A. Short-term Forecasting of the Abu Dhabi Electricity Load Using Multiple Weather Variables. Energy Procedia 2015, 75, 3014–3026. [Google Scholar] [CrossRef] [Green Version]
  35. Dudek, G. Pattern similarity-based methods for short-term load forecasting-Part 2: Models. Appl. Soft Comput. J. 2015, 36, 422–441. [Google Scholar] [CrossRef]
  36. Dudek, G. Pattern similarity-based methods for short-term load forecasting-Part 1: Principles. Appl. Soft Comput. J. 2015, 37, 277–287. [Google Scholar] [CrossRef]
  37. Dudek, G. Pattern-based local linear regression models for short-term load forecasting. Electr. Power Syst. Res. 2016, 130, 139–147. [Google Scholar] [CrossRef]
  38. Ashfaq, T.; Javaid, N. Short-term electricity load and price forecasting using enhanced KNN. In Proceedings of the 2019 International Conference on Frontiers of Information Technology, Islamabad, Pakistan, 16–18 December 2019; pp. 266–271. [Google Scholar] [CrossRef]
  39. Lin, Y.; Luo, H.; Wang, D.; Guo, H.; Zhu, K. An ensemble model based on machine learning methods and data preprocessing for short-term electric load forecasting. Energies 2017, 10, 1186. [Google Scholar] [CrossRef] [Green Version]
  40. Zhang, X.; Cheng, M.; Liu, Y.; Li, D.H.; Wu, R.M. Short-term load forecasting based on big data technologies. Appl. Mech. Mater. 2014, 687–691, 1186–1192. [Google Scholar] [CrossRef]
  41. Moon, J.; Kim, Y.; Son, M.; Hwang, E. Hybrid short-term load forecasting scheme using random forest and multilayer perceptron. Energies 2018, 11, 3283. [Google Scholar] [CrossRef] [Green Version]
  42. Khan, P.W.; Byun, Y.C. Adaptive Error Curve Learning Ensemble Model for Improving Energy Consumption Forecasting. Comput. Mater. Contin. 2021, 69, 1893–1913. [Google Scholar] [CrossRef]
  43. Amarasinghe, P.A.; Abeygunawardana, N.S.; Jayasekara, T.N.; Edirisinghe, E.A.; Abeygunawardane, S.K. Ensemble models for solar power forecasting-a weather classification approach. AIMS Energy 2020, 8, 252–271. [Google Scholar] [CrossRef]
  44. Abuella, M.; Chowdhury, B. Random forest ensemble of support vector regression models for solar power forecasting. In Proceedings of the 2017 IEEE Power and Energy Society Innovative Smart Grid Technologies Conference, Torino, Italy, 26–29 September 2017. [Google Scholar] [CrossRef] [Green Version]
  45. Mohammed, A.A.; Aung, Z. Ensemble learning approach for probabilistic forecasting of solar power generation. Energies 2016, 9, 1017. [Google Scholar] [CrossRef]
  46. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Tree-based ensemble methods for predicting PV power generation and their comparison with support vector regression. Energy 2018, 164, 465–474. [Google Scholar] [CrossRef]
Figure 1. Source information for the collected load and weather data.
Figure 2. Correlations for all variables.
Figure 3. The workflow of the proposed ML-based ensemble voting regression mechanism.
Figure 4. Comparison between actual and predicted load for the best predicted week of all models.
Figure 5. The best predicted day of the proposed model.
Table 1. Descriptive statistics of input variables.
| No | Variable | Min | Max | Mean | SD | CV | Unit |
|----|------------|---------|---------|---------|--------|------|-------|
| 1 | Total_Load | 233 | 951 | 629.56 | 103.32 | 0.16 | MW |
| 2 | Yes_Load | 233 | 951 | 629.56 | 103.32 | 0.16 | MW |
| 3 | Total_TA | −2.40 | 33.80 | 16.81 | 7.98 | 0.47 | °C |
| 4 | Total_TD | −10.70 | 28.30 | 11.46 | 9.48 | 0.83 | °C |
| 5 | Total_HM | 21 | 99.20 | 72.33 | 14.67 | 0.20 | % |
| 6 | Total_WS | 0 | 22.40 | 3.04 | 1.46 | 0.48 | m/s |
| 7 | Total_WD | 0 | 360 | 190.55 | 84.52 | 0.44 | ° |
| 8 | Total_PA | 704.30 | 1032.40 | 1011.90 | 11.68 | 0.01 | hPa |
| 9 | Total_DI | 30.10 | 86.50 | 61.97 | 12.35 | 0.20 | - |
| 10 | Total_ST | −7.40 | 33.80 | 16.24 | 8.81 | 0.54 | °C |
| 11 | Total_SI | 0 | 2.40 | 0.32 | 0.54 | 1.67 | MJ/m² |
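The SD and CV columns of Table 1 are related by CV = SD / mean. A small sketch of how such descriptive statistics can be computed (the load values are hypothetical, and the population SD is assumed, as the paper does not state whether the sample or population formula was used):

```python
import numpy as np

def descriptive_stats(x):
    """Return min, max, mean, population SD, and coefficient of variation."""
    x = np.asarray(x, dtype=float)
    sd = x.std()  # population SD (ddof=0); use x.std(ddof=1) for sample SD
    return {"min": x.min(), "max": x.max(), "mean": x.mean(),
            "sd": sd, "cv": sd / x.mean()}

# Hypothetical hourly load values in MW.
load = [520.0, 610.0, 700.0, 680.0, 640.0]
stats = descriptive_stats(load)
```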
Table 2. Comparison of error metrics and computation time for all trained ML algorithms.
| No | ML Algorithm | MAPE (%) | MAE (MW) | MSE (MW²) | Time (s) |
|----|----------------------------------------|-------|-------|---------|------|
| 1 | Extra trees regressor (ETR) | 3.10 | 19.37 | 726.74 | 2.79 |
| 2 | Random forest regressor (RFR) | 3.20 | 20.05 | 771.27 | 5.88 |
| 3 | Light gradient boosting machine (LGBM) | 3.40 | 20.88 | 808.37 | 0.40 |
| 4 | Gradient boosting regressor (GBR) | 3.70 | 22.71 | 947.63 | 1.40 |
| 5 | K neighbors regressor (KNN) | 4.00 | 25.06 | 1175.54 | 0.08 |
| 6 | Bayesian ridge (BR) | 4.20 | 25.77 | 1281.17 | 0.02 |
| 7 | Linear regression (LR) | 4.20 | 25.78 | 1281.17 | 0.16 |
| 8 | Lasso regression (Lasso) | 4.10 | 25.73 | 1286.85 | 0.03 |
| 9 | Ridge regression (Ridge) | 4.20 | 25.78 | 1281.17 | 0.02 |
| 10 | Huber regressor (Huber) | 4.10 | 25.44 | 1300.88 | 0.19 |
| 11 | Elastic net (EN) | 4.20 | 25.83 | 1305.45 | 0.03 |
| 12 | Orthogonal matching pursuit (OMP) | 4.30 | 27.07 | 1427.19 | 0.02 |
| 13 | AdaBoost regressor (Ada) | 5.20 | 30.61 | 1474.62 | 0.72 |
| 14 | Decision tree regressor (DT) | 4.40 | 27.26 | 1549.40 | 0.10 |
| 15 | Least angle regression (LAR) | 4.80 | 29.66 | 1648.04 | 0.02 |
| 16 | Passive aggressive regressor (PAR) | 6.60 | 41.37 | 2890.09 | 0.03 |
| 17 | Lasso least angle regression (LLAR) | 13.20 | 79.50 | 9831.52 | 0.02 |
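The three error metrics reported in Tables 2–5 can be computed as below. This is a generic sketch with illustrative actual/predicted values, not the paper's evaluation code; note that MAPE is expressed in percent.

```python
import numpy as np

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean(np.abs((actual - pred) / actual)) * 100.0)

def mae(actual, pred):
    """Mean absolute error, in the units of the target (here, MW)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean(np.abs(actual - pred)))

def mse(actual, pred):
    """Mean squared error, in squared target units (here, MW²)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean((actual - pred) ** 2))

# Illustrative hourly loads in MW.
actual = [600.0, 650.0, 700.0]
pred = [590.0, 660.0, 690.0]
# mape(actual, pred) ≈ 1.54 %, mae(...) = 10 MW, mse(...) = 100 MW²
```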
Table 3. Monthly MAPE comparison between the proposed model and six baseline models in percent.
| Month | ETR | RFR | LGBM | GBR | KNN | ARIMA | Proposed VR |
|-----------|------|------|------|------|------|-------|------|
| June | 3.59 | 3.62 | 3.28 | 3.31 | 3.77 | 11.48 | 3.31 |
| July | 4.50 | 4.59 | 4.47 | 4.38 | 4.25 | 16.97 | 4.24 |
| August | 4.35 | 4.33 | 4.58 | 4.56 | 5.09 | 18.73 | 4.36 |
| September | 4.48 | 4.52 | 4.38 | 4.20 | 4.70 | 13.41 | 4.20 |
| October | 3.34 | 3.24 | 3.03 | 3.11 | 3.60 | 11.15 | 3.14 |
| November | 3.40 | 3.41 | 3.35 | 3.43 | 3.71 | 8.07 | 3.32 |
| December | 4.85 | 4.83 | 4.70 | 4.70 | 5.16 | 12.59 | 4.71 |
| January | 5.34 | 5.33 | 5.26 | 5.27 | 5.44 | 18.05 | 5.20 |
| February | 6.04 | 6.23 | 6.02 | 6.20 | 6.66 | 15.39 | 6.10 |
| March | 5.08 | 5.23 | 4.91 | 4.81 | 5.11 | 10.68 | 4.89 |
| April | 4.58 | 4.67 | 4.45 | 4.40 | 4.67 | 7.44 | 4.43 |
| May | 3.64 | 3.70 | 3.46 | 3.54 | 3.92 | 8.14 | 3.52 |
| Average | 4.42 | 4.46 | 4.31 | 4.32 | 4.66 | 12.68 | 4.28 |
Table 4. Monthly MAE comparison between the proposed model and six baseline models in MW.
| Month | ETR | RFR | LGBM | GBR | KNN | ARIMA | Proposed VR |
|-----------|-------|-------|-------|-------|-------|--------|-------|
| June | 22.24 | 22.46 | 20.25 | 20.32 | 22.75 | 62.16 | 20.38 |
| July | 33.47 | 34.09 | 33.35 | 32.58 | 30.77 | 127.91 | 31.50 |
| August | 34.22 | 33.86 | 36.19 | 36.26 | 40.13 | 151.54 | 34.56 |
| September | 28.22 | 28.56 | 27.53 | 26.08 | 29.09 | 77.94 | 26.27 |
| October | 19.25 | 18.75 | 17.48 | 17.98 | 20.75 | 57.60 | 18.20 |
| November | 20.57 | 20.63 | 20.34 | 20.83 | 22.42 | 46.02 | 20.16 |
| December | 34.05 | 34.00 | 33.22 | 33.27 | 36.37 | 92.02 | 33.27 |
| January | 41.16 | 40.97 | 40.55 | 40.71 | 41.54 | 141.64 | 40.01 |
| February | 44.25 | 45.57 | 43.98 | 45.27 | 48.40 | 116.95 | 44.57 |
| March | 35.48 | 36.66 | 34.12 | 33.58 | 35.34 | 77.44 | 34.11 |
| April | 29.46 | 36.66 | 28.52 | 28.16 | 29.81 | 48.49 | 28.41 |
| May | 21.94 | 22.04 | 20.85 | 21.29 | 23.44 | 45.37 | 21.23 |
| Average | 30.30 | 30.57 | 29.64 | 29.63 | 31.66 | 87.16 | 29.33 |
Table 5. Monthly MSE comparison between the proposed model and six baseline models in MW².
| Month | ETR | RFR | LGBM | GBR | KNN | ARIMA | Proposed VR |
|-----------|---------|---------|---------|---------|---------|-----------|---------|
| June | 925.66 | 957.60 | 811.24 | 834.40 | 919.92 | 6202.86 | 790.07 |
| July | 1925.44 | 2018.45 | 1910.07 | 1831.40 | 1786.87 | 24,061.93 | 1737.92 |
| August | 2112.71 | 2050.42 | 2348.14 | 2309.57 | 3165.23 | 33,180.54 | 2186.10 |
| September | 1356.39 | 1381.26 | 1263.88 | 1180.53 | 1502.92 | 8314.91 | 1175.74 |
| October | 679.66 | 662.24 | 555.97 | 567.94 | 772.93 | 5604.40 | 603.30 |
| November | 724.45 | 739.55 | 704.54 | 728.45 | 887.75 | 3226.72 | 701.97 |
| December | 1778.52 | 1803.59 | 1733.20 | 1736.54 | 2115.47 | 12,465.83 | 1732.65 |
| January | 2729.31 | 2682.77 | 2618.84 | 2648.64 | 2908.46 | 24,611.42 | 2582.60 |
| February | 3289.47 | 3452.91 | 3228.47 | 3345.29 | 3930.38 | 18,199.15 | 3305.67 |
| March | 2039.32 | 2173.71 | 1890.86 | 1859.06 | 2049.58 | 9063.55 | 1881.25 |
| April | 1330.22 | 1399.04 | 1253.99 | 1264.37 | 1429.82 | 4048.37 | 1257.31 |
| May | 790.84 | 801.67 | 732.51 | 734.64 | 895.44 | 3204.19 | 735.05 |
| Average | 1632.80 | 1668.55 | 1580.60 | 1578.79 | 1854.29 | 12,715.92 | 1549.51 |
Table 6. Seasonal MAPE comparison between our proposed model and six baseline models in percent.
| Season | ETR | RFR | LGBM | GBR | KNN | ARIMA | Proposed VR |
|--------|------|------|------|------|------|-------|------|
| Spring | 4.44 | 4.52 | 4.27 | 4.25 | 4.57 | 8.77 | 4.28 |
| Summer | 4.15 | 4.19 | 4.12 | 4.10 | 4.38 | 15.78 | 3.98 |
| Fall | 3.74 | 3.72 | 3.58 | 3.58 | 4.00 | 10.88 | 3.56 |
| Winter | 5.39 | 5.44 | 5.31 | 5.37 | 5.72 | 15.35 | 5.32 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Phyo, P.-P.; Byun, Y.-C.; Park, N. Short-Term Energy Forecasting Using Machine-Learning-Based Ensemble Voting Regression. Symmetry 2022, 14, 160. https://doi.org/10.3390/sym14010160
