Article

Ensemble Machine Learning Approaches for Prediction of Türkiye’s Energy Demand

by
Merve Kayacı Çodur
Industrial Engineering Department, Faculty of Engineering and Architecture, Erzurum Technical University, 25200 Erzurum, Türkiye
Energies 2024, 17(1), 74; https://doi.org/10.3390/en17010074
Submission received: 6 October 2023 / Revised: 18 December 2023 / Accepted: 19 December 2023 / Published: 22 December 2023
(This article belongs to the Special Issue Advanced Machine Learning Applications in Modern Energy Systems)

Abstract
Energy demand forecasting is a fundamental aspect of modern energy management. It affects resource planning, economic stability, environmental sustainability, and energy security, which makes it critical for countries worldwide, particularly in cases like Türkiye, where the energy dependency ratio is notably high. The goal of this study is to propose ensemble machine learning methods such as boosting, bagging, blending, and stacking with hyperparameter tuning and k-fold cross-validation, and to investigate the application of these methods to predicting Türkiye’s energy demand. This study utilizes population, GDP per capita, imports, and exports as input parameters, based on historical data from 1979 to 2021 in Türkiye. Eleven combinations of the predictor variables were analyzed, and the best one was selected. A very high correlation was observed among population, GDP, imports, exports, and energy demand. In the first phase, the preliminary performance of 19 different machine learning algorithms was investigated using 5-fold cross-validation and measured with five different metrics: MSE, RMSE, MAE, R-squared, and MAPE. In the second phase, ensemble models were constructed from the individual machine learning algorithms, and the performance of these ensemble models was compared both with each other and with the best-performing individual machine learning algorithm. The analysis of the results revealed that using Ridge as the meta-learner and ET, RF, and Ridge as the base learners in the stacking ensemble model yielded the highest R-squared value, 0.9882, indicating its superior performance. It is anticipated that the findings of this research can be applied globally and will prove valuable for energy policy planning in any country.
The results obtained not only highlight the accuracy and effectiveness of the predictive model but also underscore the broader implications of this study within the framework of the United Nations’ Sustainable Development Goals (SDGs).

1. Introduction

Energy plays a vital role in supporting the social and economic development of a country. It drives economic growth, improves living standards, supports social services, enhances national security, and contributes to environmental sustainability. Ensuring reliable, affordable, and sustainable energy access is essential for a country’s overall development and progress; therefore, the development of energy policies and the estimation of energy demand are top priorities for developed and developing countries alike. Türkiye is the 19th largest economy in the world, with a gross domestic product (GDP) of roughly USD 906 billion [1]. As a fast-growing economy, Türkiye has seen its energy consumption grow and diversify significantly over the years: population growth, urbanization, industrialization, and economic expansion have driven a substantial increase in energy demand. In terms of energy sources, Türkiye relies on a mix of fossil fuels, renewable energy, and imports, with electricity constituting a significant portion of total consumption. However, domestic energy production remains rather low, in spite of the considerable increase in energy consumption.
In 2021, Türkiye produced approximately 41.3 MTOE (million tons of oil equivalent), mostly based on coal and lignite (57.5%). In contrast, the country’s energy consumption reached approximately 109.8 MTOE in the same year. This substantial gap between energy production and consumption has led to Türkiye becoming one of the major energy-importing countries in Europe [2]. According to a report by the Ministry of Energy and Natural Resources in September 2021, Türkiye’s energy dependency rate was reported to be around 74%. This means that Türkiye relied on imports to meet approximately 74% of its total energy consumption. The country heavily depended on imports of natural gas and oil to bridge the energy gap between domestic production and demand [3].
To provide energy either by importing or by producing it, forecasting energy consumption, and analyzing the relationship between energy demand and supply, are crucial issues in short- and long-term energy planning. Managing energy demand also involves identifying and prioritizing energy resources, optimizing energy utilization, improving energy efficiency, shaping policy decisions, and devising strategies to reduce emissions.
Furthermore, it is important to emphasize that the United Nations’ SDGs provide a comprehensive blueprint for addressing global challenges and promoting sustainability by 2030 [4]. This study aligns closely with several of these goals, including Goal 7, Goal 8, and Goal 13, making a significant contribution to the broader aims of sustainable development. Accurate energy demand forecasting plays a central role in achieving ‘Goal 7: Affordable and Clean Energy’. By optimizing energy production, distribution, and consumption, this study facilitates the provision of affordable, reliable, and clean energy. This, in turn, supports economic growth, enhances energy access, and reduces environmental impacts. Moreover, this study directly addresses ‘Goal 13: Climate Action’ by mitigating the effects of climate change. Through precise energy demand predictions, it empowers Türkiye to make informed decisions that reduce greenhouse gas emissions, promote renewable energy adoption, and foster a low-carbon, sustainable energy sector. Energy efficiency serves as a catalyst for economic growth and job creation, aligning with ‘Goal 8: Decent Work and Economic Growth’. This study enables Türkiye to implement energy-efficient measures, leading to cost savings for industries and households.
Researchers have developed various statistical techniques, meta-heuristic algorithms, and artificial intelligence techniques in energy modeling. Artificial neural networks (ANNs) have garnered significant interest in energy planning due to their ability to handle complex nonlinear relationships between input and output data [5]. ANNs have been applied in various energy forecasting applications, including gas consumption [6], energy demand [7], electricity consumption [8], transportation energy demand [9,10,11], energy source analysis [12], and energy dependency [7]. Apart from ANNs, other prediction methods have emerged, such as fuzzy logic, adaptive network-based fuzzy inference systems (ANFIS), and general machine learning algorithms [13,14,15]. It is important to recognize that artificial neural networks (ANNs) are a subfield of machine learning (ML), which, in itself, is a subset of artificial intelligence (AI). Frequently, the terms AI, ML, and deep learning (DL) are used interchangeably to refer to intelligent systems or software. DL, specifically, extends the concept of ANNs by incorporating extra hidden layers and employing specialized activation functions that are not typically found in traditional ANN models.
AI-based prediction models have received considerable interest for solving a variety of problems in energy planning recently. These models leverage the power of artificial intelligence techniques to forecast future outcomes based on historical data patterns. These models use advanced algorithms and machine learning methods to analyze large datasets, identify patterns, and make predictions.
Research on predicting Türkiye’s energy requirements began in the 1960s, with the State Planning Organization (SPO) employing basic regression techniques for energy forecasting. In the late 1970s, the Ministry of Energy and Natural Resources (MENR) and the Turkish Statistical Institute (TurkStat) began preparing energy demand projections [16], but the estimates provided by MENR were found to exceed actual energy demand [17]. Numerous econometric modeling techniques were applied to forecasting energy consumption after 1984. The Model for Analysis of Energy Demand (MAED), developed by MENR, is the most frequently used approach [18]. Nevertheless, the energy demand predictions generated by MAED continued to overstate actual demand, rendering them unreliable [19,20]. Utgikar and Scott [21] investigated the reasons behind unsuccessful energy predictions and found that, although statistical models are favored by researchers for their simplicity, they tend to deliver acceptable results only over short horizons while becoming increasingly unstable for longer-term forecasts. While statistical models have contributed valuable insights to energy forecasting studies, El-Telbany and El-Karmi [22] argue that such models perform well under normal conditions but struggle to account for sudden changes in environmental or sociological variables. For these reasons, researchers have increasingly turned to alternative approaches to energy demand modeling.
AI-based prediction methods utilize advanced algorithms and models that learn patterns, relationships, and trends from historical data to make accurate predictions about future events or outcomes. AI-based prediction methods have gained significant popularity due to their ability to handle complex, non-linear relationships and adapt to changing data patterns. Machine learning (ML) is an AI-based prediction method, which encompasses a range of algorithms that automatically learn and improve from data without being explicitly programmed. Deep learning (DL), a subset of ML, has emerged as a powerful AI-based prediction method.
AI-based prediction methods have demonstrated their effectiveness in various fields, including finance, healthcare, weather forecasting, sales forecasting, demand prediction, and fraud detection [23,24,25,26]. However, when applying these methods to real-world scenarios, factors such as data quality, feature selection, model complexity, and interpretability should be carefully considered. Machine learning approaches can be broadly categorized into three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning covers classification and regression tasks; regression is used for energy demand forecasting in this work. Within machine learning, ensemble learning and deep learning methods often outperform traditional algorithms. Ensemble methods are learning algorithms that build a set of predictors and then combine their (weighted) outputs to produce a final prediction for new data points. The effectiveness of an ensemble method depends on several factors, including how the underlying models are trained and how they are combined. In the literature, there are common approaches to building ensemble models that have been successfully demonstrated in various domains [27,28,29].
Ensemble machine learning is a powerful technique in the field of machine learning that involves combining the predictions of multiple individual models (base models) to create a more accurate and robust predictive model. This approach, preferred over single methods, offers several key advantages. Firstly, ensembles enhance prediction accuracy by aggregating multiple models, reducing errors and biases. Secondly, they mitigate overfitting, a common issue in machine learning, by balancing out individual model weaknesses. Moreover, ensembles prove their robustness by effectively handling noisy data and outliers, making them suitable for real-world applications [30]. In automated decision-making applications, especially in engineering, ensemble methods have demonstrated superior performance compared to individual learners. This is attributed to their ability to capture diverse patterns, reduce bias and variance, and improve generalization. Ensemble methods are particularly effective when there is a large amount of data, complex relationships, and a need for high predictive accuracy.
Common ensemble strategies comprise bagging, boosting, blending, and stacking. Bagging, exemplified by the Random Forest algorithm, enhances model robustness by reducing overfitting and improving prediction accuracy through the wisdom of the crowd [31]. Boosting iteratively builds a strong predictive model by giving more weight to the data points that previous models mispredicted [32]. In short, bagging reduces variance by averaging over multiple models, while boosting reduces bias through reweighted data points. Blending, sometimes referred to as model stacking or meta-ensembling, trains multiple diverse base models on the same dataset and then combines their predictions using a separate model trained on a held-out validation set. Stacking likewise combines multiple base models through a meta-model but differs in its approach: the out-of-fold predictions of the base models serve as input features for the meta-model, which learns to make the final predictions [33]. The choice among these ensemble methods depends on the specific problem, the dataset, and the trade-off between bias and variance in the model [28].
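As a concrete illustration of the stacking strategy, the following is a minimal scikit-learn sketch. The estimator combination (Extra Trees, Random Forest, and Ridge as base learners, with Ridge as the meta-learner) mirrors the best-performing stack reported in this study, but the synthetic data and all hyperparameter values are purely illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

# Stand-in for the real dataset: 43 yearly observations (1979-2021), 4 predictors
X, y = make_regression(n_samples=43, n_features=4, noise=5.0, random_state=0)

base_learners = [
    ("et", ExtraTreesRegressor(n_estimators=100, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("ridge", Ridge()),
]

# Out-of-fold predictions of the base learners (cv=5) become the
# training features of the Ridge meta-learner.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)
stack.fit(X, y)
preds = stack.predict(X)
```

A blending variant would instead train the meta-learner on base-model predictions for a single held-out validation split rather than on cross-validated out-of-fold predictions.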
The main objective of this study is to use ensemble machine learning methodologies, which have not received much attention in prior research on energy, to assess energy demand in Türkiye. In this paper, several significant contributions to the field of energy demand forecasting are presented. This study stands out for its exhaustive examination of 19 distinct ML algorithms, evaluated using five different performance metrics, offering a detailed understanding of their strengths and weaknesses. The study involves comprehensive hyperparameter tuning, ensuring that the models are finely tailored to Türkiye’s energy demand data, enhancing their predictive accuracy. Additionally, the utilization of ensemble methods, which combine the predictions of multiple ML algorithms, leveraging their individual strengths, has led to an improved forecasting performance compared to relying on a single algorithm. This approach contributes to the understanding of how different ensemble strategies can be applied effectively in the domain of energy forecasting and provides valuable insights for future research and applications. To the best of my knowledge, this paper is the first to investigate Türkiye’s energy demand using ensemble ML models. Collectively, these innovative elements contribute not only to the accuracy and efficacy of the predictive model but also have broader implications for energy policy planning, aligning with the United Nations’ SDGs.
This paper is organized as follows: Section 2 reviews the literature on energy demand forecasting. Section 3 presents the primary methods and approaches employed, along with the main data sources and the challenges associated with energy demand modeling. Section 4 discusses the principal findings and trends obtained with the various ensemble learning algorithms. Section 5 summarizes the main implications and recommendations drawn from the results. Finally, the paper concludes with a discussion of limitations and directions for future research.

2. Literature Review

Energy demand forecasting is an important task for planning and managing energy systems. It involves predicting the future energy consumption of different sectors, regions, or appliances based on various factors such as weather, economic activity, population, lifestyle, etc. This literature review aims to summarize some of the key findings and trends from recent articles on this topic.

2.1. Review of Energy Demand Forecasting in the World

Global energy demand has been affected by the COVID-19 pandemic and the economic recovery in 2021. According to the Global Energy Review 2021 by the International Energy Agency (IEA), global energy demand is expected to grow by 4.6% in 2021. The IEA projects that global energy demand will grow by 0.8% per year on average between 2021 and 2030 in its Stated Policies Scenario (STEPS), which reflects current and announced policies and targets [34].
This literature review provides a brief overview of some of the main findings and trends from recent articles on energy demand in the world. There are many methods of energy demand forecasting, spanning from conventional approaches like econometric and time series models [35,36,37,38] to contemporary soft computing techniques, including artificial intelligence methods and evolutionary algorithms [39,40,41,42,43,44]. A systematic literature review encompassing 419 articles on energy demand modeling, covering the period between 2015 and 2020, was conducted by Verwiebe et al. [45]. They analyzed the methodologies, prediction accuracy, input variables, energy sources, sectors, temporal scopes, and spatial resolutions employed in these models. They found that machine learning techniques were the most frequently used, followed by engineering-based models, metaheuristic and uncertainty techniques, and statistical techniques, and they also discussed the drawbacks and countermeasures of each technique. Another systematic literature review, of energy demand forecasting methods published in 2005–2015, was conducted by Ghalehkhondabi et al. [46]. They focused on the methods used to predict energy consumption and compared their performance and applicability. They reported that neural networks were the most cited technique and delivered notable performance, though at a high computational cost, and they suggested that hybrid methods could be a promising field for future research.

2.2. Energy Demand Forecasting in Türkiye

A summary of studies for Türkiye’s energy demand forecasting is tabulated in Table 1. However, to the best of my knowledge, there is no research paper that employs ensemble machine learning methods and compares them with each other to forecast Türkiye’s energy demand.

3. Materials and Methods

In this section, the proposed methodology is introduced in detail. Ensemble methods refer to algorithms that combine multiple machine learning models into a unified framework. These methods have gained significant attention and recognition in the machine learning community due to their ability to enhance prediction accuracy and robustness [40,41]. By combining the predictions of multiple models, ensemble methods can mitigate the limitations of individual models and provide more accurate and reliable results. Several types of ensemble methods commonly used in machine learning are bagging, boosting, blending, Random Forest, and stacking [28,31,32,33]. This paper proposes and analyzes different ensemble combination models that can be achieved by using diverse base models, varying model architectures, or training on different subsets of the data.

3.1. ML Algorithms

In the context of forecasting Türkiye’s energy demand, a set of 19 ML algorithms was generated automatically using AutoML. The choice of these 19 algorithms was guided by the need to thoroughly explore, compare, and evaluate various modeling approaches for energy demand forecasting, while considering the specific requirements and characteristics of the Türkiye dataset.
Given the complex and dynamic patterns inherent in energy demand data, the objective was to identify models that are robust and effective in capturing intricate patterns under varying conditions. AutoML streamlined this process, providing a systematic and efficient means to evaluate multiple algorithms without manual intervention. The resulting set of 19 ML algorithms encompasses a diverse range of machine learning techniques, including linear and non-linear models, tree-based methods, neural networks, and ensemble methods.
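The screening step described above can be sketched as a simple comparison loop: each candidate algorithm is scored with 5-fold cross-validated R-squared and the results are collected for ranking. The candidate list below is a small illustrative subset of the 19 algorithms, and the synthetic data merely stands in for the real 43-year, 4-predictor dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Stand-in for the real dataset (43 annual records, 4 predictors)
X, y = make_regression(n_samples=43, n_features=4, noise=5.0, random_state=0)

candidates = {
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated R-squared per candidate, as in the first phase
mean_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
```

Sorting `mean_r2` then gives the ranking from which the strongest individual learners are carried forward into the ensemble stage.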
These algorithms are briefly described below.
  • Light Gradient Boosting Machine (LightGBM) [71]
LightGBM is a popular machine learning algorithm used for both regression and classification tasks. It is designed to efficiently handle large-scale datasets with high-dimensional features. LightGBM is known for its speed, accuracy, and ability to handle complex problems. LightGBM is based on the gradient boosting framework, similar to other boosting algorithms.
  • XGBoost [72]
XGBoost Regressor is a powerful machine learning algorithm used for regression tasks. XGBoost Regressor is known for its efficiency, accuracy, and ability to handle complex datasets. The algorithm minimizes a loss function by iteratively adding decision trees to the ensemble. Each tree is trained to predict the residuals (the differences between the actual and predicted values) of the previous ensemble. The process continues until a specified number of trees is reached or the desired level of performance is achieved.
  • Extra Trees Regression [73]
Extra Trees Regression is a machine learning algorithm used for regression tasks. It belongs to the ensemble learning family and is an extension of the popular Random Forest algorithm. Extra Trees Regression combines multiple decision trees to make predictions by aggregating their outputs. The algorithm builds a user-defined number of decision trees using random subsets of the training data and random subsets of features.
  • Passive Aggressive Regressor (PAR) [74]
PAR is a machine learning algorithm used for regression tasks. In PAR, the algorithm updates the regression model incrementally, making predictions on new instances as they arrive. It adapts to new data points by adjusting the model’s parameters without revisiting the entire training set. This property makes it suitable for handling large-scale datasets or scenarios where data arrives in a streaming fashion.
  • Elastic Net [75]
Elastic Net is a regression method that combines the Lasso and Ridge regression techniques. It is used for feature selection and regularization in linear models, providing a balance between the two methods. In Elastic Net, the algorithm aims to minimize the sum of squared residuals between the predicted and actual values, similar to ordinary least-squares (OLS) regression.
  • Least Angle Regression (LARS) [76]
LARS is a regression method used for feature selection and model building. LARS starts with an empty set of selected features and gradually adds features in a way that balances their correlations and coefficients. The algorithm continues this process until it reaches the desired number of selected features or the maximum number of available features.
  • Lasso Least Angle Regression [76]
Lasso Least Angle Regression is a regression method that combines the features of the Lasso regularization and the Least Angle Regression algorithm. It is used for feature selection and regularization in linear regression tasks. Lasso Least Angle Regression aims to estimate the coefficients of a linear regression model while simultaneously performing feature selection by encouraging sparsity in the solution.
  • Orthogonal Matching Pursuit (OMP) [77]
OMP is an algorithm used for sparse signal recovery and feature selection tasks. It aims to find the most relevant features or components of a signal by iteratively selecting and reconstructing the signal based on a small subset of measurements or features. OMP leverages the orthogonality property to efficiently select features and estimate the signal. At each iteration, the algorithm ensures that the selected features are orthogonal or nearly orthogonal to each other, which helps in accurate signal reconstruction and efficient convergence.
  • Random Forest Regressor [78]
Random Forest Regressor is a popular machine learning algorithm used for regression tasks. It belongs to the ensemble learning family and is built upon the concept of decision trees. In Random Forest Regressor, a user-defined number of decision trees are constructed. Each tree is built using a random subset of the training data and a random subset of features. The process of constructing each tree involves recursively splitting the data based on different features and their respective splitting points. The splitting is done in a way that minimizes the variance of the target variable within each resulting subset.
  • Gradient Boosting Regressor [32]
Gradient Boosting Regressor is a powerful machine learning algorithm used for regression tasks. It belongs to the boosting family of algorithms and is designed to create a strong learner by iteratively combining weak learners. Gradient Boosting Regressor works by minimizing a loss function through an additive approach, where each new model is built to correct the errors made by the previous models.
  • AdaBoost Regressor [79]
AdaBoost Regressor, short for Adaptive Boosting Regressor, is a machine learning algorithm used for regression tasks. The algorithm iteratively trains a series of weak regressors, each focusing on the instances that were wrongly predicted by the previous regressors, to improve the overall prediction accuracy. In AdaBoost Regressor, each weak regressor is trained on a subset of the training data. During training, the algorithm assigns weights to each instance, with initially equal weights for all instances.
  • Linear Regression [80]
Linear regression is a statistical method that models the relationship between a dependent variable (y) and one or more independent variables (x). It can be used to estimate how the dependent variable changes as the independent variables change, and to test hypotheses about the strength and direction of the relationship. There are different types of linear regression, such as simple linear regression, multiple linear regression, and multivariate linear regression.
  • Lasso Regression [81]
Lasso Regression (LASSO) is a method of regression analysis that performs both variable selection and regularization. It aims to improve the prediction accuracy and interpretability of the regression model by shrinking the coefficients of some predictor variables to zero and reducing the magnitude of others.
  • K-Neighbors Regressor [82]
K-Neighbors Regressor is a machine learning algorithm used for regression tasks. It is a non-parametric method that predicts the target value of an instance by considering the average or weighted average of the target values of its k nearest neighbors in the training data. In K-Neighbors Regressor, the algorithm identifies the k nearest neighbors of a given instance based on a distance metric, such as Euclidean distance. The target values of these neighbors are then used to calculate the predicted value for the instance.
  • Bayesian Ridge Regression [83]
Bayesian Ridge Regression is a regression method that incorporates Bayesian principles into the linear regression framework. In Bayesian Ridge Regression, the algorithm places a prior distribution on the regression coefficients, typically assuming a Gaussian distribution. This prior distribution represents the initial belief about the likely values of the coefficients before observing the data.
  • Decision Tree Regressor [84]
Decision Tree Regressor is a machine learning algorithm used for regression tasks. It is based on the concept of a decision tree, which partitions the input space into regions and predicts the target value based on the average or majority value of the training instances within each region. In Decision Tree Regressor, the algorithm recursively splits the data based on different features and their respective splitting points to create a tree-like structure.
  • Ridge Regression [85]
Ridge Regression is a linear regression method used for modeling and prediction tasks. It is an extension of ordinary least-squares (OLS) regression that introduces a regularization term to handle multicollinearity and prevent overfitting. In Ridge Regression, the algorithm seeks to minimize the sum of squared residuals between the predicted and actual values, similar to OLS regression. However, Ridge Regression adds a penalty term, known as the Ridge or L2 penalty, to the cost function.
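For reference, the Ridge objective described above can be written out explicitly. This is the standard textbook formulation, where $\lambda \geq 0$ is the regularization strength (setting $\lambda = 0$ recovers ordinary least squares):

$$\hat{\beta}^{\,\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

The L2 penalty shrinks the coefficients toward zero without setting them exactly to zero, which stabilizes the solution in the presence of multicollinearity among the predictors.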
  • Huber Regressor [86]
Huber Regressor is a robust regression method that combines the benefits of least-squares regression and robust regression techniques. Whereas ordinary least squares is highly sensitive to outliers, Huber Regressor addresses this by introducing a hybrid loss function that behaves like least squares for small residuals and like a scaled absolute loss for large residuals.
  • Dummy Regressor
Dummy Regressor is a simple baseline model used for regression tasks. It provides a straightforward way to establish a baseline performance against which other regression models can be compared. Dummy Regressor makes predictions based on simple rules or heuristics rather than learning patterns from the data.

3.2. Structure of the Proposed Methods

This section presents the structure and an abstract overview of the study, with the basic conceptual flow shown in Figure 1. The methodology consists of several key steps designed to improve model accuracy. The first step entails data preparation, encompassing data preprocessing, normalization, and transformation to ensure the dataset is primed for analysis. Next, the performance of 19 different ML algorithms was assessed; this evaluation forms the basis for creating various ensemble combinations that leverage the strengths of individual models. In the third step, four ensemble techniques, namely bagging, boosting, blending, and stacking, were used to build ensemble models capable of capturing complex patterns and relationships in the data. Finally, in the fourth step, these ensemble models were carefully evaluated and compared, yielding insights into their strengths and weaknesses. This comprehensive evaluation process informed the final predictions and supports data-driven decisions for energy demand planning.

3.2.1. k-Fold Cross-Validation

The next step in the processing block involves selecting machine learning algorithms that exhibit superior performance and diverse learning capabilities when applied to the energy dataset. k-fold cross-validation is a technique used in machine learning algorithms to evaluate the performance of a model by dividing the available data into k equally sized subsets or “folds”. The process entails iteratively training the model on k − 1 folds and then evaluating it on the remaining fold. This cycle is repeated k times, with each fold being used as the test set exactly once. The final evaluation is obtained by averaging the performance results from each iteration. The value of k is usually chosen as k = 5 or k = 10. A 5-fold cross-validation was used to obtain groups of performance measures in this study.
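The fold mechanics described above can be made explicit with a short NumPy sketch (a hand-rolled illustration; in practice a library splitter such as scikit-learn's `KFold` would typically be used):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs so that every sample
    appears in the test fold exactly once across the k iterations."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle once, then partition
    folds = np.array_split(idx, k)            # k near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# 5-fold splits for a 43-observation dataset (1979-2021)
splits = list(kfold_indices(43, k=5))
```

The model is fitted on each `train_idx` subset and scored on the corresponding `test_idx` subset; the reported performance is the average of the k per-fold scores.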

3.2.2. Model Hyperparameters Tuning

Hyperparameter tuning is the process of finding the optimal values for the hyperparameters of a machine learning algorithm. Hyperparameters are parameters that are set before the learning process begins and determine how the algorithm learns and generalizes from the training data [87].
In this study, hyperparameter tuning was employed to improve model performance and prevent overfitting before proceeding to the next stage of the framework. The commonly used approaches for hyperparameter tuning are Grid Search, Random Search, and Bayesian Optimization. The choice of the technique ultimately depends on the specific problem, available computational resources, and the characteristics of the hyperparameter search space. In this work, the Grid Search approach has been used.
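A minimal sketch of the Grid Search approach, using scikit-learn's `GridSearchCV` with 5-fold cross-validation; the parameter grid and data below are illustrative assumptions, not the grid used in the study:

```python
# Hedged sketch: Grid Search over Extra Trees hyperparameters with 5-fold CV.
# The grid values and synthetic data are illustrative only.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(43, 4))
y = X[:, 0] * 3 + X[:, 1] * 2 + rng.normal(scale=0.1, size=43)

# Every combination in the grid is trained and scored via cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(ExtraTreesRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_)
```

`best_params_` holds the combination with the highest mean cross-validated R-squared, which is then used to refit the model on the full training set.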

3.2.3. Performance Metrics

The performance of ML algorithms in the energy demand problem was estimated using powerful validation techniques. Five validation methods, Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and R-squared (R2), were employed to evaluate the models [10].
The standard predictive performance metrics are represented by Equations (1)–(5):
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\breve{y}_i - O_i\right)^{2} \quad (1)
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\breve{y}_i - O_i\right)^{2}} \quad (2)
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\breve{y}_i - O_i\right| \quad (3)
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\breve{y}_i - O_i}{\breve{y}_i}\right| \quad (4)
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(\breve{y}_i - O_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}} \quad (5)
Here, $\breve{y}_i$ is the model’s predicted value, $O_i$ is the actual (observed) value, $\bar{O}$ is the mean of the actual values, and $n$ is the number of observed data points.
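Equations (1)–(5) translate directly into a few lines of NumPy. The sketch below is a minimal reference implementation; note that Equation (4), as written, places the predicted value in the denominator of MAPE (the more common convention divides by the actual value):

```python
# Minimal NumPy implementation of the five metrics in Equations (1)-(5).
import numpy as np

def metrics(y_pred, y_true):
    err = y_pred - y_true
    mse = np.mean(err ** 2)                  # Equation (1)
    rmse = np.sqrt(mse)                      # Equation (2)
    mae = np.mean(np.abs(err))               # Equation (3)
    mape = np.mean(np.abs(err / y_pred))     # Equation (4): predicted value in denominator
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Equation (5)
    return mse, rmse, mae, mape, r2

# Toy check: predictions [2, 4] against actuals [1, 5].
mse, rmse, mae, mape, r2 = metrics(np.array([2.0, 4.0]), np.array([1.0, 5.0]))
print(mse, rmse, mae, mape, r2)
```

For this toy pair the errors are +1 and −1, giving MSE = RMSE = MAE = 1, MAPE = (0.5 + 0.25)/2 = 0.375, and R² = 1 − 2/8 = 0.75.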

3.3. Data Collection

The dataset used in this paper includes independent variables such as population (in millions), gross domestic product (GDP), import, and export, which were selected based on a comprehensive literature review. This dataset spans the years 1979–2021 and was sourced from various government agencies, including the Turkish Statistical Institute [88], the Turkish Ministry of Energy and Natural Resources (MENR) [89], the World Bank [1] and European Commission [2]. Additionally, the energy consumption data (measured in million tons of oil equivalents, MTOE) was obtained from the MENR. The details of the variables are given in Table 2.
The predictors mentioned above have been commonly utilized in numerous energy forecasting studies, as seen in Table 1. Over the data collection period from 1979 to 2021, the population grew from 43.19 million to 84.78 million and GDP increased from USD 82 billion to USD 819.04 billion, roughly two-fold and ten-fold increases, respectively. Import and export volumes also grew substantially, rising from 5.07 and 2.26 to 271.42 and 225.29, approximately 55-fold and 100-fold increases, respectively. Furthermore, energy demand surged nearly fivefold, from 26.37 Mtoe in 1979 to 123.86 Mtoe in 2021. Detailed historical data for these parameters from 1979 to 2021 can be found in Table 3.
The dataset used for predicting energy demand was divided into training and test subsets comprising approximately 75% and 25% of the total observations, respectively: the training set consists of 32 observations and the test set of 11 samples.
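A sketch of this ~75/25 split (32 training rows and 11 test rows out of the 43 annual observations) using scikit-learn; the placeholder arrays below merely stand in for the actual dataset:

```python
# Sketch of the ~75/25 train/test split of the 43-year series.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(43 * 4).reshape(43, 4)  # placeholder for the 43 annual observations
y = np.arange(43)

# test_size=11 yields exactly 32 training and 11 test samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=11,
                                                    random_state=0)
print(len(X_train), len(X_test))
```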

4. Results and Discussion

4.1. Implementation Setup

A detailed overview of the implementation setup is given in this section. Python, a popular general-purpose programming language with high-level, purpose-built ML libraries, was used; it allows users to quickly start building models and experiment with different configurations, which has made it a common choice for data science and ML. PyCaret, a Python-based open-source machine learning library, provides automated machine learning capabilities; by default, it automates several steps of the ML process, including data preprocessing, feature engineering, and model selection. In this study, PyCaret was utilized to automate the machine learning workflow, streamlining the process and enhancing efficiency. PyCaret’s automation tools made it possible to quickly begin constructing the models described in Section 3.1 and to experiment with various setups once the dataset was provided. This exhaustive approach enabled a comprehensive evaluation of models across different categories (linear, non-linear, tree-based, etc.) to identify the algorithms most suitable for capturing hidden patterns that a smaller set of models might miss. Python version 3.11.0 and PyCaret version 3.0.4 were used, the latest versions as of October 2022 and July 2023, respectively. All code was run in Google Colab, a cloud-based platform that provides free access to GPUs and TPUs for machine learning experiments, on a system equipped with an Intel i7 3.40 GHz processor and 8 GB of memory.
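What PyCaret's `setup()`/`compare_models()` workflow automates can be sketched manually with scikit-learn: train a pool of candidate regressors and rank them by mean 5-fold cross-validated R-squared. The four candidates and synthetic data below are illustrative assumptions, not the study's 19-model pool:

```python
# Manual sketch of multi-model comparison (what PyCaret automates):
# rank several regressor families by mean 5-fold CV R-squared.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(43, 4))
y = X @ np.array([3.0, 2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=43)

candidates = {
    "et": ExtraTreesRegressor(random_state=0),
    "rf": RandomForestRegressor(random_state=0),
    "ridge": Ridge(),
    "lasso": Lasso(alpha=0.01),
}
# Sort candidates from best to worst mean CV score.
ranking = sorted(
    ((cross_val_score(m, X, y, cv=5, scoring="r2").mean(), name)
     for name, m in candidates.items()),
    reverse=True)
print(ranking)
```

The top-ranked entries correspond to the "best models" a leaderboard like Table 6 reports.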

4.2. Feature Selection

In machine learning, a correlation matrix is a table that shows how different features in a dataset are related to each other and how they affect the outcome of a model. Figure 2 presents the correlation matrix of the dataset, which includes the target variable as one of the features.
The values within the matrix describe both the strength and direction of the correlation between pairs of features; each element represents the correlation between two specific features. In Figure 2, the maximum correlation value is 1, while the minimum is 0.92, observed for the ‘import–population’ and ‘population–GDP’ pairs. A positive correlation between two features implies that, as one feature’s value increases, the other feature’s value also tends to increase. It is worth noting that all features exhibit strong correlations with each other. Additionally, population exhibits the strongest correlation with the target variable ‘Energy’, while GDP shows the weakest.
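A correlation matrix like the one in Figure 2 is typically produced with pandas' `DataFrame.corr()`. The four-row frame below uses a handful of values loosely based on the figures quoted in the text purely for illustration; it is not the study's dataset:

```python
# How a correlation matrix like Figure 2 is produced with pandas
# (toy values for illustration only, not the study's data).
import pandas as pd

df = pd.DataFrame({
    "population": [43.2, 56.5, 67.8, 84.8],   # million
    "gdp":        [82.0, 207.4, 273.1, 819.0],
    "energy":     [26.4, 52.7, 81.2, 123.9],  # Mtoe
})
corr = df.corr()  # pairwise Pearson correlations, diagonal = 1
print(corr.loc["energy"])
```

The row (or column) for the target variable shows how strongly each predictor correlates with energy demand.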
The selection and combination of features is important in machine learning because it affects the performance and complexity of the model. Different features may carry different levels of relevance, redundancy, and noise for a given problem and algorithm. By selecting and combining the most appropriate features, the dimensionality of the data, the computational cost, and the risk of overfitting can be reduced. All combinations of the predictor variables (i.e., population, GDP, import, and export) are outlined in Table 4. For instance, Model 1 (M1) comprises two independent variables, GDP and Population, while Model 7 (M7) comprises GDP, Population, and Import. The ML performance results with 5-fold cross-validation on the training set, using the created models that include different combinations of features, are given in Table 4. To test the models, 19 ML algorithms were run, and the best three results, sorted by the highest R-squared, are provided in Table 5. The Extra Tree Regressor performed well on the provided metrics using Model 11 (M11): the lowest MAE (2296.86), MSE (8,756,864.57), and RMSE (2932.96) values indicate accurate predictions, the highest R-squared value of 0.9788 suggests a good fit to the data, and the low MAPE of 0.0464 indicates that the model’s percentage errors are relatively small.
During the model-building process, all possible combinations were explored, and finally, the configuration with four inputs, displaying the highest R-squared and lowest error terms, was chosen for application in the next part of the study.

4.3. Performance Evaluation

Firstly, the performance of the 19 individual ML algorithms was compared using five different metrics with all features (GDP, Population, Import, Export). Table 6 presents the performance results achieved by training the 19 ML algorithms with 5-fold cross-validation. The first column in Table 6 lists the base ML algorithms; the second through sixth columns display the best training-phase values of MAE, MSE, RMSE, R2, and MAPE. The results are sorted in descending order of R-squared. As demonstrated in Table 6, the Extra Tree Regressor yields the best results among the algorithms during the training phase. The prediction performance of the selected Extra Tree model on the test set is presented in Table 7. The R-squared value in Table 7 is slightly higher than the training-set value, indicating the absence of overfitting.
The hyperparameters were tuned via grid search, a critical step in the machine learning model development process. When tuning degraded performance, this step was skipped and the untuned model was carried forward to the successive stages. After this stage, ensemble methods were applied to predict Türkiye’s energy demand. The prediction performance of the ensemble methods (bagging, boosting, blending, and stacking) on both the training and test sets is shown in Table 8. Compared to the preliminary training results of the 19 ML algorithms shown in Table 6, the mean R-squared values are 0.9801 with bagging, 0.9809 with boosting, 0.9874 with blending, and 0.9882 with stacking. Among these, the stacking ensemble model yielded the highest R-squared value, indicating its superior performance. Moreover, on the other evaluation metrics (MAE, MSE, RMSE, and MAPE), the stacking ensemble model consistently outperforms the others, further confirming its superior predictive accuracy.
Bagging and boosting techniques were used to improve the accuracy and robustness of the individual machine learning model. The Extra Tree Regressor (ET) algorithm was trained on different subsets (which were created through a process called bootstrapping) of the training data by the bagging ensemble method. In boosting, the focus is on correcting the errors made by previous models. The base model ET was trained until a certain level of accuracy was achieved.
In the blending approach, the predictive power of three distinct machine learning algorithms was harnessed: Extra Trees Regressor (ET), Random Forest Regressor (RF), and Ridge Regression (Ridge). To execute blending, each of these algorithms was initially trained separately on a portion of the training dataset, generating individual predictions for the target variable. Subsequently, the predictions from ET, RF, and Ridge were combined using a straightforward averaging technique. Averaging these predictions effectively creates an ensemble prediction that capitalizes on the strengths of each algorithm.
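A minimal sketch of this blending step on synthetic data: train ET, RF, and Ridge separately, then average their predictions on held-out samples:

```python
# Minimal blending sketch: average the held-out predictions of ET, RF, Ridge.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(43, 4))
y = X @ np.array([3.0, 2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=43)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=11, random_state=0)

models = [ExtraTreesRegressor(random_state=0),
          RandomForestRegressor(random_state=0),
          Ridge()]
# Each model is trained on the same training portion and predicts the test rows.
preds = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in models])
blend = preds.mean(axis=1)  # simple average of the three prediction vectors
print(blend.shape)
```

Each column of `preds` holds one model's test-set predictions; the row-wise mean is the blended forecast.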
Unlike traditional ensemble methods such as bagging and boosting, stacking takes a more sophisticated approach, using the predictions of base models as input features to train a higher-level model that makes the final predictions. The 14 ML algorithms with R-squared values above 0.90 in Table 6 were combined in various ways to select a set of diverse base models. After trial-and-error experiments with many combinations, three diverse base machine learning algorithms were chosen: ET, RF, and Ridge. To implement stacking, each of these base models was first trained separately on a portion of the training dataset, producing individual predictions for the target variable. Next, a new dataset was created in which each data point consisted of these base-model predictions; this dataset served as the input for the meta-learner. The empirical investigation determined that a linear regression algorithm was the optimal meta-learner for the second level of the stacking regressor, as it consistently achieved a higher R-squared than alternative machine learning algorithms. Accordingly, Ridge Regression was selected as the meta-learner on the second level of the stacking ensemble and trained to learn how to best combine the predictions from ET, RF, and Ridge. To prevent overfitting during the training phase, 5-fold cross-validation was employed. The use of the Ridge model within the stacking ensemble offered notable advantages in this energy-related problem: known for its ability to mitigate multicollinearity and overfitting, Ridge Regression proved effective in enhancing the robustness of the model for energy demand forecasting.
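The described architecture maps directly onto scikit-learn's `StackingRegressor`: ET, RF, and Ridge as base learners, a Ridge meta-learner, and 5-fold cross-validation generating the out-of-fold base predictions that the meta-learner trains on. A sketch on synthetic data:

```python
# Sketch of the stacking ensemble: ET, RF, Ridge as base learners,
# Ridge as meta-learner, 5-fold CV for out-of-fold meta-features.
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
X = rng.normal(size=(43, 4))
y = X @ np.array([3.0, 2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=43)

stack = StackingRegressor(
    estimators=[("et", ExtraTreesRegressor(random_state=0)),
                ("rf", RandomForestRegressor(random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge(),  # meta-learner combining the base predictions
    cv=5)                     # out-of-fold predictions guard against overfitting
stack.fit(X, y)
print(stack.score(X, y))
```

Internally, `cv=5` ensures the meta-learner never sees a base model's prediction for a point that model was trained on, which is what keeps the second level from overfitting.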
Stacking’s flexibility in utilizing both weak and strong learners makes it a powerful technique for enhancing predictive performance across machine learning tasks. In practice, researchers often employ a mixture of weak and strong learners to construct a versatile ensemble that performs effectively across diverse datasets and problem domains; whether the base models are weak or strong depends on the problem and the effectiveness of the ensemble. In recent years, researchers have also begun to utilize AutoML approaches, which automatically select the best-performing models for ensembles [90,91,92]. In this study, different combinations of both weak and strong base models were tested to create a diverse ensemble, with the aim of enhancing interpretability and transparency through the manual construction of ensemble models. It was observed that combinations composed of strong base learners consistently delivered superior results in forecasting Türkiye’s energy demand.
The results also show that the stacking ensemble model yielded the best accuracy rates when applied to a small dataset. The robustness of the evaluation is emphasized through metrics such as R-squared, which reached an impressive accuracy rate of 0.9882 with the stacking model. This rigorous evaluation process provides confidence in the reliability of the results despite the dataset’s size. This aligns with Dietterich’s [28] assertion that, when the available data is limited, ensemble learning can assist in finding a good approximation and enhancing prediction accuracy by averaging the outputs of individual models.
In order to evaluate the efficacy of the developed ensemble methods, the prediction performance on the test set is presented in Table 8. Notably, the stacking model achieved a remarkable R-squared value of 0.9826. Compared with the findings in Table 7 and with the other ensemble models, the stacking model’s metrics consistently reveal a significant enhancement. These observations indicate that the proposed stacking ensemble model exhibits neither overfitting nor underfitting. The use of readily available features (Population, GDP, Import, Export) enhances the model’s accessibility and interpretability, facilitating accurate and reliable forecasts of Türkiye’s energy demand.
Table 9 provides detailed descriptions of the model’s predicted outcomes in the ‘Prediction’ column, alongside the corresponding ground truth values for ‘Energy’. It presents the prediction performance of the stacking ensemble model for each of the five folds, utilizing all dataset features and evaluating it using five different metrics.
Three scenarios have been used to predict Türkiye’s energy demand for the years 2024, 2025, and 2030:
Scenario 1: It is assumed that the average growth rate of GDP is 4%, population growth rate is 0.5%, and the average imports and exports growth rate is 2%.
Scenario 2: It is assumed that the average growth rate of GDP is 5%, population growth rate is 0.6%, the average import growth is 3.5%, and export growth rate is 2%.
Scenario 3: It is assumed that the average growth rate of GDP is 6%, population growth rate is 1.5%, and the average imports and exports growth rate is 5%.
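One plausible way to roll the scenario inputs forward is compound annual growth from a 2021 base year; the sketch below assumes this (the paper states only average growth rates, so the exact projection mechanism is an assumption). Base values are the 2021 figures quoted in the text, and the rates are those of Scenario 1:

```python
# Hedged sketch: projecting scenario inputs by compound annual growth
# from the 2021 base values quoted in the text (Scenario 1 rates).
def project(value_2021, annual_rate, year):
    """Compound a 2021 base value forward at a fixed annual growth rate."""
    return value_2021 * (1 + annual_rate) ** (year - 2021)

gdp_2030 = project(819.04, 0.04, 2030)   # GDP in USD billion, 4% per year
pop_2030 = project(84.78, 0.005, 2030)   # population in million, 0.5% per year
print(round(gdp_2030, 2), round(pop_2030, 2))
```

The projected predictor values for each scenario and target year are then fed into the trained stacking model to obtain the demand forecasts compared in Figure 3.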
The comparison of the results from these three scenarios is illustrated in Figure 3. Considering economic advancement and the rising number of electric vehicles, all scenarios indicate higher values than in previous years. Scenario 1 estimates the lowest energy consumption, while Scenario 3 predicts the highest. Ultimately, the three scenarios indicate that Türkiye’s predicted energy consumption in 2030 would be 144.56, 147.25, and 154.93 Mtoe under Scenarios 1, 2, and 3, respectively.

5. Conclusions

This paper has presented a comprehensive methodology for applying ensemble techniques and machine learning algorithms to the crucial task of forecasting Türkiye’s energy demand. The primary objectives of this methodology were threefold: Firstly, to enhance the accuracy of energy demand predictions in Türkiye. Secondly, to provide authorities and institutions with an interpretable model that facilitates informed decision-making and policy development. Lastly, this study aligns closely with the United Nations’ SDGs, contributing to the broader aims of sustainable development by addressing global challenges.
Accordingly, the following key findings can be derived based on the current research.
  • GDP, population, import, export, and energy data from 1979 to 2021 were used, and a strong correlation was observed among them.
  • Five statistical metrics are discussed to evaluate the performance of the algorithms in the forecast.
  • A total of 19 machine learning algorithms were constructed and analyzed to select models for diverse ensemble combinations.
  • Considering all metrics collectively, the stacking ensemble model utilizing Ridge Regressor as a meta-learner outperforms single ML algorithms as well as other bagging, boosting, and blending models.
  • The predicted values reveal that the stacking ensemble model has delivered highly satisfactory outcomes in comparison to the actual energy demand outputs.
  • These ensemble models can readily be adapted and recommended for future energy demand forecasts in other countries. Notably, the stacking ensemble model demonstrates statistically superior results compared to other models, making it a more suitable choice for accurate forecasting.
It is anticipated that the outcomes of this study will make a significant contribution to the field of energy forecasting, laying the groundwork for Türkiye’s sustainable energy future. Furthermore, this research represents a meaningful step toward a more equitable, prosperous, and sustainable world for all. As future research, further improvements can be explored through the use of different hybrid techniques for optimizing hyperparameter tuning, feature selection, and more.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. World Bank Data World Development Indicators. Available online: https://www.worldbank.org/en/country/turkey/overview (accessed on 15 June 2023).
  2. European Commission Eurostat. EU Energy and Climate Reports. 2023. Available online: https://commission.europa.eu/ (accessed on 4 May 2023).
  3. Republic of Türkiye, Ministry of Foreign Affairs. Türkiye’s International Energy Strategy. Available online: https://www.mfa.gov.tr/turkeys-energy-strategy.en.mfa (accessed on 18 June 2023).
  4. United Nations. Available online: https://sdgs.un.org/goals (accessed on 9 April 2023).
  5. Kong, K.G.H.; How, B.S.; Teng, S.Y.; Leong, W.D.; Foo, D.C.; Tan, R.R.; Sunarso, J. Towards data-driven process integration for renewable energy planning. Curr. Opin. Chem. Eng. 2021, 31, 100665. [Google Scholar] [CrossRef]
  6. Singh, S.; Bansal, P.; Hosen, M.; Bansal, S.K. Forecasting annual natural gas consumption in USA: Application of machine learning techniques-ANN and SVM. Resour. Policy 2023, 80, 103159. [Google Scholar] [CrossRef]
  7. Sözen, A. Future projection of the energy dependency of Turkey using artificial neural network. Energy Policy 2009, 37, 4827–4833. [Google Scholar] [CrossRef]
  8. Panklib, K.; Prakasvudhisarn, C.; Khummongkol, D. Electricity consumption forecasting in Thailand using an artificial neural network and multiple linear regression. Energy Sources Part B Econ. Plan. Policy 2005, 10, 427–434. [Google Scholar] [CrossRef]
  9. Murat, Y.S.; Ceylan, H. Use of artificial neural networks for transport energy demand modeling. Energy Policy 2006, 34, 3165–3172. [Google Scholar] [CrossRef]
  10. Sahraei, M.A.; Çodur, M.K. Prediction of transportation energy demand by novel hybrid meta-heuristic ANN. Energy 2022, 249, 123735. [Google Scholar] [CrossRef]
  11. Çodur, M.Y.; Ünal, A. An estimation of transport energy demand in Turkey via artificial neural networks. Promet-Traffic Transp. 2019, 31, 151–161. [Google Scholar] [CrossRef]
  12. Ferrero Bermejo, J.; Gómez Fernández, J.F.; Olivencia Polo, F.; Crespo Márquez, A. A review of the use of artificial neural network models for energy and reliability prediction. A study of the solar PV, hydraulic and wind energy sources. Appl. Sci. 2019, 9, 1844. [Google Scholar] [CrossRef]
  13. Kaya, T.; Kahraman, C. Multicriteria decision making in energy planning using a modified fuzzy TOPSIS methodology. Expert Syst. Appl. 2011, 38, 6577–6585. [Google Scholar] [CrossRef]
  14. Azadeh, A.; Asadzadeh, S.M.; Ghanbari, A. An adaptive network-based fuzzy inference system for short-term natural gas demand estimation: Uncertain and complex environments. Energy Policy 2010, 38, 1529–1536. [Google Scholar] [CrossRef]
  15. Guevara, E.; Babonneau, F.; Homem-de-Mello, T.; Moret, S. A machine learning and distributionally robust optimization framework for strategic energy planning under uncertainty. Appl. Energy 2020, 271, 115005. [Google Scholar] [CrossRef]
  16. Erdogdu, E. Electricity demand analysis using cointegration and ARIMA modelling: A case study of Turkey. Energy Policy 2007, 35, 1129–1146. [Google Scholar] [CrossRef]
  17. Ünler, A. Improvement of energy demand forecasts using swarm intelligence: The case of Turkey with projections to 2025. Energy Policy 2008, 36, 1937–1944. [Google Scholar] [CrossRef]
  18. Hamzaçebi, C. Forecasting of Turkey’s net electricity energy consumption on sectoral bases. Energy Policy 2007, 35, 2009–2016. [Google Scholar] [CrossRef]
  19. Kavaklioglu, K. Modeling and prediction of Turkey’s electricity consumption using Support Vector Regression. Appl. Energy 2011, 88, 368–375. [Google Scholar] [CrossRef]
  20. Hotunoglu, H.; Karakaya, E. Forecasting Turkey’s Energy Demand Using Artificial Neural Networks: Three Scenario Applications. Ege Acad. Rev. 2011, 11, 87–94. [Google Scholar]
  21. Utgikar, V.P.; Scott, J.P. Energy forecasting: Predictions, reality and analysis of causes of error. Energy Policy 2006, 34, 3087–3092. [Google Scholar] [CrossRef]
  22. El-Telbany, M.; El-Karmi, F. Short-term forecasting of Jordanian electricity demand using particle swarm optimization. Electr. Power Syst. Res. 2008, 78, 425–433. [Google Scholar] [CrossRef]
  23. Zhang, Z.; Wu, C.; Qu, S.; Chen, X. An explainable artificial intelligence approach for financial distress prediction. Inf. Process. Manag. 2022, 59, 102988. [Google Scholar] [CrossRef]
  24. Hewage, P.; Trovati, M.; Pereira, E.; Behera, A. Deep learning-based effective fine-grained weather forecasting model. Pattern Anal. Appl. 2021, 24, 343–366. [Google Scholar] [CrossRef]
  25. Suganthi, L.; Samuel, A.A. Energy models for demand forecasting—A review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240. [Google Scholar] [CrossRef]
  26. Bao, Y.; Hilary, G.; Ke, B. Artificial intelligence and fraud detection. Innov. Technol. Interface Financ. Oper. 2022, 1, 223–247. [Google Scholar]
  27. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. -Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  28. Dietterich, T.G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15. [Google Scholar]
  29. Hategan, S.M.; Stefu, N.; Paulescu, M. An Ensemble Approach for Intra-Hour Forecasting of Solar Resource. Energies 2023, 16, 6608. [Google Scholar] [CrossRef]
  30. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  31. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  32. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  33. Wolpert, D.H. Stacked Generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  34. World Energy Outlook IEA. International Energy Agency, Paris. 2022. Available online: https://www.iea.org/data-and-statistics/data-product/world-energybalances#energy-balances (accessed on 4 April 2022).
  35. Zhang, M.; Mu, H.; Li, G.; Ning, Y. Forecasting the transport energy demand based on PLSR method in China. Energy 2009, 34, 1396–1400. [Google Scholar] [CrossRef]
  36. Kumar, U.; Jain, V.K. Time series models (Grey-Markov, Grey Model with rolling mechanism and singular spectrum analysis) to forecast energy consumption in India. Energy 2010, 35, 1709–1716. [Google Scholar] [CrossRef]
  37. Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N.A. Comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy 2022, 168, 113097. [Google Scholar] [CrossRef]
  38. Sahraei, M.A.; Duman, H.; Çodur, M.Y.; Eyduran, E. Prediction of transportation energy demand: Multivariate adaptive regression splines. Energy 2021, 224, 120090. [Google Scholar] [CrossRef]
  39. Javanmard, M.E.; Ghaderi, S.F. Energy demand forecasting in seven sectors by an optimization model based on machine learning algorithms. Sustain. Cities Soc. 2023, 95, 104623. [Google Scholar] [CrossRef]
  40. Ye, J.; Dang, Y.; Ding, S.; Yang, Y. A novel energy consumption forecasting model combining an optimized DGM (1, 1) model with interval grey numbers. J. Clean. Prod. 2019, 229, 256–267. [Google Scholar] [CrossRef]
  41. Zhang, P.; Wang, H. Fuzzy Wavelet Neural Networks for City Electric Energy Consumption Forecasting. Energy Procedia 2012, 17, 1332–1338. [Google Scholar] [CrossRef]
  42. Mason, K.; Duggan, J.; Howley, E. Forecasting energy demand, wind generation and carbon dioxide emissions in Ireland using evolutionary neural networks. Energy 2018, 155, 705–720. [Google Scholar] [CrossRef]
  43. Muralitharan, K.; Sakthivel, R.; Vishnuvarthan, R. Neural network based optimization approach for energy demand prediction in smart grid. Neurocomputing 2018, 273, 199–208. [Google Scholar] [CrossRef]
  44. Yu, S.; Zhu, K.; Zhang, X. Energy demand projection of China using a path-coefficient analysis and PSO–GA approach. Energy Convers. Manag. 2012, 53, 142–153. [Google Scholar] [CrossRef]
  45. Verwiebe, P.A.; Seim, S.; Burges, S.; Schulz, L.; Müller-Kirchenbauer, J. Modeling Energy Demand—A Systematic Literature Review. Energies 2021, 14, 7859. [Google Scholar] [CrossRef]
  46. Ghalehkhondabi, I.; Ardjmand, E.; Weckman, G.R.; Young, W.A. An overview of energy demand forecasting methods published in 2005–2015. Energy Syst. 2017, 8, 411–447. [Google Scholar] [CrossRef]
  47. Aslan, M. Archimedes optimization algorithm based approaches for solving energy demand estimation problem: A case study of Turkey. Neural Comput. Appl. 2023, 35, 19627–19649. [Google Scholar] [CrossRef]
  48. Korkmaz, E. Energy demand estimation in Turkey according to modes of transportation: Bezier search differential evolution and black widow optimization algorithms-based model development and application. Neural Comput. Appl. 2023, 35, 7125–7146. [Google Scholar] [CrossRef]
  49. Aslan, M.; Beşkirli, M. Realization of Turkey’s energy demand forecast with the improved arithmetic optimization algorithm. Energy Rep. 2022, 8, 18–32. [Google Scholar] [CrossRef]
  50. Ağbulut, Ü. Forecasting of transportation-related energy demand and CO2 emissions in Turkey with different machine learning algorithms. Sustain. Prod. Consum. 2022, 29, 141–157. [Google Scholar] [CrossRef]
  51. Özdemir, D.; Dörterler, S.; Aydın, D. A new modified artificial bee colony algorithm for energy demand forecasting problem. Neural Comput. Appl. 2022, 34, 17455–17471. [Google Scholar] [CrossRef]
  52. Özkış, A. A new model based on vortex search algorithm for estimating energy demand of Turkey. Pamukkale Univ. J. Eng. Sci. 2020, 26, 959–965. [Google Scholar] [CrossRef]
  53. Tefek, M.F.; Uğuz, H.; Güçyetmez, M. A new hybrid gravitational search–teaching–learning-based optimization method for energy demand estimation of Turkey. Neural Comput. Appl. 2019, 31, 2939–2954. [Google Scholar] [CrossRef]
  54. Beskirli, A.; Beskirli, M.; Hakli, H.; Uguz, H. Comparing energy demand estimation using artificial algae algorithm: The case of Turkey. J. Clean Energy Technol. 2018, 6, 349–352. [Google Scholar] [CrossRef]
  55. Cayir Ervural, B.; Ervural, B. Improvement of grey prediction models and their usage for energy demand forecasting. J. Intell. Fuzzy Syst. 2018, 34, 2679–2688. [Google Scholar] [CrossRef]
  56. Koç, İ.; Nureddin, R.; Kahramanlı, H. Implementation of GSA (Gravitation Search Algorithm) and IWO (Invasive Weed Optimization) for the Prediction of the Energy Demand in Turkey Using Linear Form. Selcuk. Univ. J. Eng. Sci. Technol. 2018, 6, 529–543. [Google Scholar]
  57. Özturk, S.; Özturk, F. Forecasting energy consumption of Turkey by Arima model. J. Asian Sci. Res. 2018, 8, 52. [Google Scholar] [CrossRef]
  58. Beskirli, M.; Hakli, H.; Kodaz, H. The energy demand estimation for Turkey using differential evolution algorithm. Sādhanā 2017, 42, 1705–1715. [Google Scholar] [CrossRef]
  59. Daş, G.S. Forecasting the energy demand of Turkey with a NN based on an improved Particle Swarm Optimization. Neural Comput. Appl. 2017, 28, 539–549. [Google Scholar] [CrossRef]
  60. Kankal, M.; Uzlu, E. Neural network approach with teaching–learning-based optimization for modeling and forecasting long-term electric energy demand in Turkey. Neural Comput. Appl. 2017, 28, 737–747. [Google Scholar] [CrossRef]
  61. Uguz, H.; Hakli, H.; Baykan, Ö.K. A new algorithm based on artificial bee colony algorithm for energy demand forecasting in Turkey. In Proceedings of the 2015 4th International Conference on Advanced Computer Science Applications and Technologies (ACSAT), Kuala Lumpur, Malaysia, 8–10 December 2015; pp. 56–61. [Google Scholar]
  62. Tutun, S.; Chou, C.A.; Canıyılmaz, E. A new forecasting for volatile behavior in net electricity consumption: A case study in Turkey. Energy 2015, 93, 2406–2422. [Google Scholar] [CrossRef]
  63. Kıran, M.S.; Özceylan, E.; Gündüz, M.; Paksoy, T. Swarm intelligence approaches to estimate electricity energy demand in Turkey. Knowl. Based Syst. 2012, 36, 93–103. [Google Scholar] [CrossRef]
  64. Kankal, M.; Akpınar, A.; Kömürcü, M.İ.; Özşahin, T.Ş. Modeling and forecasting of Turkey’s energy consumption using socio-economic and demographic variables. Appl. Energy 2011, 88, 1927–1939. [Google Scholar] [CrossRef]
  65. Ediger, V.S.; Akar, S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007, 35, 1701–1708. [Google Scholar] [CrossRef]
  66. Toksari, M.D. Ant colony optimization approach to estimate energy demand of Turkey. Energy Policy 2007, 35, 3984–3990. [Google Scholar] [CrossRef]
  67. Sözen, A.; Arcaklioğlu, E.; Özkaymak, M. Turkey’s net energy consumption. Appl. Energy 2005, 81, 209–221. [Google Scholar] [CrossRef]
  68. Canyurt, O.E.; Ceylan, H.; Ozturk, H.K.; Hepbasli, A. Energy demand estimation based on two-different genetic algorithm approaches. Energy Sources 2004, 26, 1313–1320. [Google Scholar] [CrossRef]
  69. Ceylan, H.; Ozturk, H.K. Estimating energy demand of Turkey based on economic indicators using genetic algorithm approach. Energy Convers. Manag. 2004, 45, 2525–2537. [Google Scholar] [CrossRef]
  70. Ceylan, H.; Ozturk, H.K.; Hepbasli, A.; Utlu, Z. Estimating energy and exergy production and consumption values using three different genetic algorithm approaches, part 2: Application and scenarios. Energy Sources 2005, 27, 629–639. [Google Scholar] [CrossRef]
  71. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 3149–3157. [Google Scholar]
  72. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  73. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  74. Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online Passive-Aggressive Algorithms. J. Mach. Learn. Res. 2006, 7, 551–585. [Google Scholar]
  75. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  76. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef]
  77. Elad, M.; Bruckstein, A. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inf. Theory 2002, 48, 2558–2567. [Google Scholar] [CrossRef]
  78. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  79. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML), Bari, Italy, 3–6 July 1996; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1996; Volume 96, pp. 148–156. [Google Scholar]
  80. Neter, J.; Kutner, M.H.; Nachtsheim, C.J.; Wasserman, W. Applied Linear Statistical Models, 4th ed.; Irwin: Huntersville, NC, USA, 1996. [Google Scholar]
  81. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  82. Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [Google Scholar] [CrossRef]
  83. MacKay, D.J. Bayesian Interpolation. Neural Comput. 1992, 4, 415–447. [Google Scholar] [CrossRef]
  84. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Chapman & Hall/CRC: Boca Raton, FL, USA, 1984. [Google Scholar]
  85. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  86. Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 492–518. [Google Scholar] [CrossRef]
  87. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316. [Google Scholar] [CrossRef]
  88. Turkish Statistical Institute (Turkstat). Statistical Tables, Ankara, Türkiye. Available online: https://www.tuik.gov.tr/ (accessed on 8 June 2023).
  89. Turkish Ministry of Energy and Natural Resources (MENR). Ankara, Türkiye. Available online: https://enerji.gov.tr/eigm-raporlari (accessed on 8 June 2023).
  90. Krzywanski, J. Advanced AI Applications in Energy and Environmental Engineering Systems. Energies 2022, 15, 5621. [Google Scholar] [CrossRef]
  91. Huybrechts, T.; Reiter, P.; Mercelis, S.; Famaey, J.; Latré, S.; Hellinckx, P. Automated Testbench for Hybrid Machine Learning-Based Worst-Case Energy Consumption Analysis on Batteryless IoT Devices. Energies 2021, 14, 3914. [Google Scholar] [CrossRef]
  92. Toquica, D.; Agbossou, K.; Malhamé, R.; Henao, N.; Kelouwani, S.; Cardenas, A. Adaptive Machine Learning for Automated Modeling of Residential Prosumer Agents. Energies 2020, 13, 2250. [Google Scholar] [CrossRef]
Figure 1. A basic conceptual flow of this study.
Figure 2. Correlation matrix of all variables including the target variable.
Figure 3. Estimation of total energy demand according to Scenarios 1–3.
Table 1. A summary of the literature on Türkiye’s energy demand.
| Author(s) | Year | Method Used | Dataset | Input Parameters | Performance Metric | Forecasting for |
|---|---|---|---|---|---|---|
| Aslan [47] | 2023 | Archimedes Optimization Algorithm | 1979–2005; 1979–2011 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Korkmaz [48] | 2022 | Bezier Search Differential Evolution; Black Widow Optimization (BWO) | 2000–2017 | Passenger-km, Freight-km, Carbon dioxide emissions, GDP, Infrastructure Investment | AE, APE, Std_AE, Std_APE, R², Adj. R², MAE, MAPE, RMSE | Transportation Energy |
| Aslan and Beşkirli [49] | 2022 | Improved Arithmetic Optimization Algorithm | 1979–2011 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Ağbulut [50] | 2022 | Deep Learning (DL); Support Vector Machine (SVM); Artificial Neural Network (ANN) | 1970–2016 | GDP, Population, Vehicle-km, Year | R², RMSE, MAPE, MBE, rRMSE, MABE | Transportation Energy |
| Özdemir et al. [51] | 2022 | Modified Artificial Bee Colony Algorithm | 1979–2005 | GDP, Population, Import, Export | AE, APE, Std_AE, Std_APE, R², MAE, MAPE, RMSE | Energy |
| Özkış [52] | 2020 | Vortex Search Algorithm (VS) | 1979–2005; 1979–2011 | GDP, Population, Import, Export | Amount of Error | Energy |
| Tefek et al. [53] | 2019 | Hybrid Gravitational Search, Teaching–Learning-Based Optimization Method | 1980–2014 | Population, Gross Generation, Net Consumption, GDP, Installed Power | R², RMSE, MAPE | Energy |
| Beskirli et al. [54] | 2018 | Artificial Algae Algorithm (AAA) | 1979–2005 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Cayir Ervural and Ervural [55] | 2018 | Grey Prediction Model Based on GA; Grey Prediction Model Based on PSO | 1996–2016 | Previous Annual Electricity Consumption Data | RMSE, MAPE | Electricity Energy Consumption |
| Koç et al. [56] | 2018 | Gravity Search Algorithm (GSA); Invasive Weed Optimization Algorithm (IWO) | 1979–2011 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Öztürk and Öztürk [57] | 2018 | ARIMA | 1970–2015 | Previous Energy Consumption Data | AIC | Energy |
| Beskirli et al. [58] | 2017 | Differential Evolution Algorithm (DE) | 1979–2011 | GDP, Population, Import, Export | Mean Absolute Relative Error, Relative Error (%), Magnitude of Error | Energy |
| Daş [59] | 2017 | Neural Network Based on Particle Swarm Optimization | 1979–2005 | GDP, Population, Import, Export | Absolute Relative Error, Relative Error (%), R², RMSE, MAPE, MAD | Energy |
| Kankal and Uzlu [60] | 2017 | ANN | 1980–2012 | GDP, Population, Import, Export | Average Relative Error, RMSE, MAE | Electricity Energy |
| Uguz et al. [61] | 2015 | Artificial Bee Colony with Variable Search Strategies (ABCVSS) | 1979–2005 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Tutun et al. [62] | 2015 | Regression and ANN | 1975–2010 | Import, Export, Gross Generation, Transmitted Energy | R², RMSE, MAPE, MSE, MA, SSE | Electricity Energy Consumption |
| Kıran et al. [63] | 2012 | Hybrid Meta-Heuristic (Particle Swarm Optimization, Ant Colony Optimization) | 1979–2005 | GDP, Population, Import, Export | Relative Error (%), R² | Electricity Energy Consumption |
| Kankal et al. [64] | 2011 | Regression Analysis; ANN | 1980–2007 | GDP, Population, Import, Export, Employment | Relative Error (%), R², RMSE | Energy |
| Ünler [17] | 2008 | Particle Swarm Optimization | 1979–2005 | GDP, Population, Import, Export | Amount of Error, Relative Error (%) | Energy |
| Ediger and Akar [65] | 2007 | Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) | 1950–2005 | Previous Energy Consumption Data | MSE, MAED | Energy |
| Toksarı [66] | 2007 | Ant Colony Optimization | 1970–2005 | Population, GDP, Import, Export | R² | Energy |
| Sözen et al. [67] | 2005 | ANN | 1975–2003 | Population, Gross Generation, Installed Capacity, Import, Export | R², RMSE, MAPE | Energy |
| Canyurt et al. [68] | 2004 | Genetic Algorithm | 1970–2001 | GDP, Population, Import, Export | Relative Error (%) | Energy |
| Ceylan and Öztürk [69] | 2004 | Genetic Algorithm | 1970–2001 | GDP, Population, Import, Export | Relative Error (%), MSE, R² | Energy |
| Ceylan et al. [70] | 2004 | Genetic Algorithm | 1990–2001 | GDP, Population, Import, Export | Average Relative Error | Energy and exergy production and consumption |
Table 2. Input parameter-use rationales.
| Variable | Rationale for Using This Variable |
|---|---|
| GDP | GDP correlates strongly with energy consumption, since the level of economic activity directly drives the demand for energy. Rising GDP generally signals growth in industrial and commercial activity, and hence higher energy consumption. Given this substantial impact, GDP is often chosen as an independent variable in studies of energy consumption patterns. |
| Population | Population growth directly increases the demand for energy in a country or region: a growing population needs more energy across the residential, commercial, industrial, and transportation sectors. Considering population as an independent variable is therefore crucial for analyzing and planning energy resources. |
| Import | The availability of, and reliance on, imported energy resources can directly affect a country's energy demand. Import values are chosen as an independent variable in this study because of their influence on overall energy consumption patterns. |
| Export | The relationship between exports and energy consumption is likewise important for understanding a country's energy demand. Export values are chosen as an independent variable in this study because of their potential impact on overall energy consumption patterns. |
Table 3. Observed historical data related to the energy demand in Türkiye.
| Years | Population (10⁶) | GDP (USD 10⁹) | Import (USD 10⁹) | Export (USD 10⁹) | Energy (Mtoe) |
|---|---|---|---|---|---|
| 1979 | 43.19 | 82.00 | 5.07 | 2.26 | 26.37 |
| 1980 | 44.09 | 68.82 | 7.91 | 2.91 | 27.51 |
| 1981 | 44.98 | 71.04 | 8.93 | 4.70 | 27.60 |
| 1982 | 45.95 | 64.55 | 8.84 | 5.75 | 29.59 |
| 1983 | 47.03 | 61.68 | 9.24 | 5.73 | 30.25 |
| 1984 | 48.11 | 59.99 | 10.76 | 7.13 | 31.75 |
| 1985 | 49.18 | 67.23 | 11.34 | 7.96 | 32.73 |
| 1986 | 50.22 | 75.73 | 11.10 | 7.46 | 34.59 |
| 1987 | 51.25 | 87.17 | 14.16 | 10.20 | 38.70 |
| 1988 | 52.28 | 90.85 | 14.34 | 11.66 | 39.73 |
| 1989 | 53.31 | 107.14 | 15.80 | 11.62 | 40.40 |
| 1990 | 54.32 | 150.68 | 22.30 | 12.96 | 42.24 |
| 1991 | 55.32 | 150.03 | 21.05 | 13.59 | 43.09 |
| 1992 | 56.30 | 158.46 | 22.87 | 14.71 | 44.70 |
| 1993 | 57.30 | 180.17 | 29.43 | 15.35 | 48.26 |
| 1994 | 58.31 | 130.69 | 23.27 | 18.11 | 45.77 |
| 1995 | 59.31 | 169.49 | 35.71 | 21.64 | 50.53 |
| 1996 | 60.29 | 181.48 | 43.63 | 23.22 | 54.85 |
| 1997 | 61.28 | 189.83 | 48.56 | 26.26 | 57.99 |
| 1998 | 62.24 | 275.97 | 45.92 | 26.97 | 57.12 |
| 1999 | 63.19 | 256.39 | 40.67 | 26.59 | 55.22 |
| 2000 | 64.11 | 274.30 | 54.50 | 27.77 | 61.60 |
| 2001 | 65.07 | 201.75 | 41.40 | 31.33 | 55.60 |
| 2002 | 65.99 | 240.25 | 51.55 | 36.06 | 59.49 |
| 2003 | 66.87 | 314.59 | 69.34 | 47.25 | 64.59 |
| 2004 | 67.79 | 408.88 | 97.54 | 63.17 | 68.24 |
| 2005 | 68.70 | 506.31 | 116.77 | 73.48 | 70.33 |
| 2006 | 69.60 | 557.06 | 139.58 | 85.53 | 74.82 |
| 2007 | 70.47 | 681.34 | 170.06 | 107.27 | 79.79 |
| 2008 | 71.32 | 770.46 | 201.96 | 132.03 | 77.76 |
| 2009 | 72.23 | 649.27 | 140.93 | 102.14 | 78.36 |
| 2010 | 73.20 | 776.99 | 185.54 | 113.88 | 79.84 |
| 2011 | 74.17 | 838.76 | 240.84 | 134.91 | 84.91 |
| 2012 | 75.28 | 880.56 | 236.55 | 152.46 | 88.84 |
| 2013 | 76.58 | 957.78 | 260.82 | 161.48 | 88.07 |
| 2014 | 78.11 | 938.95 | 251.14 | 166.50 | 89.25 |
| 2015 | 79.65 | 864.32 | 213.62 | 150.98 | 99.47 |
| 2016 | 81.02 | 869.69 | 202.19 | 149.25 | 104.57 |
| 2017 | 82.09 | 859.00 | 238.72 | 164.50 | 111.65 |
| 2018 | 82.81 | 778.47 | 231.15 | 177.17 | 109.44 |
| 2019 | 83.48 | 759.94 | 210.35 | 180.83 | 110.65 |
| 2020 | 84.14 | 720.30 | 219.52 | 169.64 | 113.70 |
| 2021 | 84.78 | 819.04 | 271.42 | 225.29 | 123.86 |
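The very high correlation reported among population, GDP, imports, exports, and energy demand (Figure 2) can be checked directly from the historical data. The snippet below is a minimal sketch, not the paper's original code, using a five-year subset of Table 3:

```python
# Minimal sketch of the correlation check behind Figure 2, using a
# subset of the historical data from Table 3 (units as in the table).
import pandas as pd

data = pd.DataFrame(
    {
        "Population": [43.19, 54.32, 64.11, 73.20, 84.78],  # millions
        "GDP": [82.00, 150.68, 274.30, 776.99, 819.04],     # USD billions
        "Import": [5.07, 22.30, 54.50, 185.54, 271.42],     # USD billions
        "Export": [2.26, 12.96, 27.77, 113.88, 225.29],     # USD billions
        "Energy": [26.37, 42.24, 61.60, 79.84, 123.86],     # Mtoe
    },
    index=[1979, 1990, 2000, 2010, 2021],
)

# Pearson correlation of each predictor with the target variable.
corr = data.corr()["Energy"].drop("Energy")
print(corr.round(3))
```

Even on this small subset, every predictor shows a strong positive correlation with energy demand, consistent with the full-sample correlation matrix.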
Table 4. The combinations of features.
| Model | Input |
|---|---|
| M1 | GDP, Population |
| M2 | GDP, Import |
| M3 | GDP, Export |
| M4 | Population, Import |
| M5 | Population, Export |
| M6 | Import, Export |
| M7 | GDP, Population, Import |
| M8 | GDP, Population, Export |
| M9 | Population, Import, Export |
| M10 | GDP, Import, Export |
| M11 | GDP, Population, Import, Export * |

* All features.
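The eleven models in Table 4 are exactly the subsets of the four predictors containing at least two features. As an illustrative sketch (the enumeration order of the three-feature subsets may differ slightly from the table's M9/M10 labeling), they can be generated programmatically:

```python
# Enumerate every subset of at least two of the four input features:
# 4C2 + 4C3 + 4C4 = 6 + 4 + 1 = 11 combinations, matching Table 4.
from itertools import combinations

features = ["GDP", "Population", "Import", "Export"]

subsets = [
    list(combo)
    for size in range(2, len(features) + 1)
    for combo in combinations(features, size)
]

for i, combo in enumerate(subsets, start=1):
    print(f"M{i}: {', '.join(combo)}")
```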
Table 5. Top three results of ML algorithms based on the training set using models’ combinations.
| Models | ML Algorithm | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|---|
| M1 | Extra Trees Regressor | 2743.27 | 14,435,527.00 | 3622.18 | 0.9751 | 0.0546 |
| | Huber Regressor | 3419.46 | 20,734,897.00 | 4395.82 | 0.9642 | 0.0749 |
| | Extreme Gradient Boosting | 3725.91 | 22,807,495.50 | 4551.60 | 0.9625 | 0.0655 |
| M2 | K-Neighbors Regressor | 6881.86 | 84,149,616.80 | 8596.44 | 0.8710 | 0.1104 |
| | Random Forest Regressor | 6809.08 | 114,145,888.80 | 9963.98 | 0.8252 | 0.1167 |
| | Extra Trees Regressor | 6452.43 | 123,658,755.00 | 9929.56 | 0.8117 | 0.1178 |
| M3 | Extra Trees Regressor | 3977.99 | 44,054,367.30 | 5730.78 | 0.9299 | 0.0695 |
| | Random Forest Regressor | 4631.80 | 55,354,193.10 | 6583.28 | 0.9162 | 0.0797 |
| | Gradient Boosting Regressor | 5351.58 | 64,199,913.90 | 7220.71 | 0.9031 | 0.0901 |
| M4 | Extra Trees Regressor | 3042.48 | 17,937,995.55 | 3844.90 | 0.9733 | 0.0591 |
| | Random Forest Regressor | 3666.07 | 22,957,290.18 | 4448.29 | 0.9716 | 0.0685 |
| | Gradient Boosting Regressor | 4156.41 | 26,308,872.75 | 4930.11 | 0.9652 | 0.0742 |
| M5 | Huber Regressor | 3864.21 | 36,775,418.87 | 5527.85 | 0.9541 | 0.0601 |
| | Lasso Regression | 4003.55 | 36,200,997.00 | 5456.62 | 0.9530 | 0.0678 |
| | Least Angle Regression | 4003.72 | 36,196,280.60 | 5456.35 | 0.9530 | 0.0678 |
| M6 | K-Neighbors Regressor | 5707.22 | 80,792,845.60 | 7992.88 | 0.8962 | 0.1035 |
| | Random Forest Regressor | 5455.21 | 77,021,388.80 | 8193.73 | 0.8644 | 0.0930 |
| | Extra Trees Regressor | 5511.29 | 82,472,925.80 | 8300.80 | 0.8540 | 0.0950 |
| M7 | Extra Trees Regressor | 2308.94 | 12,339,053.90 | 3277.79 | 0.9754 | 0.0488 |
| | Random Forest Regressor | 2972.75 | 17,408,515.60 | 3854.32 | 0.9608 | 0.0576 |
| | AdaBoost Regressor | 3400.27 | 18,546,026.70 | 4014.45 | 0.9538 | 0.0640 |
| M8 | Extra Trees Regressor | 3189.81 | 17,110,325.38 | 3874.79 | 0.9716 | 0.0537 |
| | AdaBoost Regressor | 4293.89 | 29,715,232.81 | 5282.08 | 0.9475 | 0.0728 |
| | Random Forest Regressor | 4287.09 | 35,730,037.10 | 5185.02 | 0.9460 | 0.0700 |
| M9 | Extra Trees Regressor | 3018.13 | 30,503,239.40 | 4394.13 | 0.9477 | 0.0407 |
| | Random Forest Regressor | 3583.32 | 45,575,422.90 | 5273.91 | 0.9304 | 0.0473 |
| | AdaBoost Regressor | 4069.59 | 37,580,683.81 | 5358.08 | 0.9285 | 0.0609 |
| M10 | K-Neighbors Regressor | 5670.51 | 74,121,506.00 | 7652.91 | 0.9017 | 0.1003 |
| | Random Forest Regressor | 5372.22 | 75,187,291.90 | 7930.92 | 0.8896 | 0.0942 |
| | Ridge Regression | 7009.35 | 69,191,604.80 | 8277.45 | 0.8621 | 0.1643 |
| M11 | Extra Trees Regressor | 2296.86 | 8,756,864.57 | 2932.96 | 0.9788 | 0.0464 |
| | Random Forest Regressor | 3186.05 | 14,777,499.11 | 3817.37 | 0.9684 | 0.0658 |
| | Ridge Regression | 3676.12 | 21,641,675.00 | 4466.13 | 0.9655 | 0.0736 |
Table 6. Results of ML algorithms based on train set.
| ML Algorithm | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| Extra Trees Regressor | 2296.86 | 8,756,864.57 | 2932.96 | 0.9788 | 0.0464 |
| Random Forest Regressor | 3186.05 | 14,777,499.11 | 3817.37 | 0.9684 | 0.0658 |
| Ridge Regression | 3676.12 | 21,641,675.00 | 4466.14 | 0.9655 | 0.0736 |
| Linear Regression | 3780.00 | 23,739,669.80 | 4668.86 | 0.9635 | 0.0825 |
| Lasso Regression | 3779.85 | 23,736,214.80 | 4668.54 | 0.9635 | 0.0825 |
| Least Angle Regression | 3780.00 | 23,739,655.90 | 4668.86 | 0.9635 | 0.0825 |
| Lasso Least Angle Regression | 3779.85 | 23,736,205.30 | 4668.54 | 0.9635 | 0.0825 |
| Orthogonal Matching Pursuit | 3780.00 | 23,739,655.90 | 4668.86 | 0.9635 | 0.0825 |
| Huber Regressor | 3828.58 | 22,823,167.86 | 4595.40 | 0.9634 | 0.0785 |
| AdaBoost Regressor | 3575.50 | 15,583,915.43 | 3934.88 | 0.9611 | 0.0691 |
| Gradient Boosting Regressor | 3772.78 | 16,872,570.61 | 4096.12 | 0.9556 | 0.0707 |
| Decision Tree Regressor | 3768.74 | 16,856,622.84 | 4094.38 | 0.9554 | 0.0707 |
| Extreme Gradient Boosting | 3768.71 | 16,856,282.40 | 4094.34 | 0.9554 | 0.0706 |
| K-Neighbors Regressor | 3987.62 | 29,417,635.00 | 5274.57 | 0.9493 | 0.0848 |
| Elastic Net | 7402.15 | 81,150,325.20 | 8721.88 | 0.8666 | 0.1248 |
| Bayesian Ridge | 22,303.79 | 705,872,003.20 | 25,684.29 | −0.0970 | 0.4306 |
| Light Gradient Boosting Machine | 22,303.79 | 705,872,041.79 | 25,684.29 | −0.0970 | 0.4306 |
| Dummy Regressor | 22,303.79 | 705,872,041.60 | 25,684.29 | −0.0970 | 0.4306 |
| Passive Aggressive Regressor | 40,863.75 | 2,361,836,689.9 | 48,178.33 | −3.9041 | 0.5635 |
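The first-phase screening summarized in Tables 5 and 6 can be approximated with standard scikit-learn tooling. The sketch below is not the paper's actual pipeline; it runs three of the screened regressors on synthetic data of the same shape as the study's dataset (43 annual observations, 4 predictors) with shuffled 5-fold cross-validation:

```python
# Hedged sketch of 5-fold CV screening of candidate regressors,
# on a synthetic stand-in for the 1979-2021 dataset (43 rows, 4 features).
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=43, n_features=4, noise=10.0, random_state=0)

models = {
    "Extra Trees": ExtraTreesRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Ridge": Ridge(),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean R2 = {r2.mean():.4f} (std {r2.std():.4f})")
```

In practice the same loop would run over all 19 algorithms and report MAE, MSE, RMSE, R², and MAPE per model, as in Table 6.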
Table 7. Results of the Extra Trees Regressor algorithm based on the test set.
| ML Algorithm | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| Extra Trees Regressor | 2989.27 | 17,145,375.48 | 4140.6975 | 0.9811 | 0.0406 |
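The five evaluation metrics used throughout (MAE, MSE, RMSE, R², and MAPE) are all standard; assuming a recent scikit-learn, they can be computed as below. The observed/predicted values here are just a small illustrative sample drawn from Table 9, not a full evaluation:

```python
# Computing the paper's five metrics for a few observed vs. predicted
# energy-demand values (Mtoe), using scikit-learn's metrics module.
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

y_true = np.array([59.49, 79.79, 88.07, 123.86])   # observed, sample years
y_pred = np.array([62.74, 76.95, 91.51, 113.73])   # predicted

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                # RMSE = sqrt(MSE)
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.4f} MAPE={mape:.4f}")
```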
Table 8. Performance of the ensemble models in both the training and test dataset.
Folds 0–4 report the 5-fold cross-validation results on the training set, with their Mean and Std; the Test row reports performance on the held-out test set.

Bagging (base learner: ET)

| Fold | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| 0 | 2384.30 | 10,532,299.28 | 3245.35 | 0.9855 | 0.0425 |
| 1 | 2428.40 | 10,139,704.39 | 3184.29 | 0.9870 | 0.0583 |
| 2 | 1320.12 | 4,588,805.18 | 2142.15 | 0.9606 | 0.0211 |
| 3 | 2943.76 | 18,949,550.99 | 4353.11 | 0.9814 | 0.0811 |
| 4 | 2833.04 | 10,421,373.56 | 3228.22 | 0.9859 | 0.0642 |
| Mean | 2381.92 | 10,926,346.68 | 3230.62 | 0.9801 | 0.0534 |
| Std | 574.24 | 4,594,895.36 | 699.59 | 0.0099 | 0.0204 |
| Test | 3247.76 | 20,526,807.39 | 4530.65 | 0.9773 | 0.0476 |

Boosting (base learner: ET)

| Fold | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| 0 | 2324.68 | 10,142,168.21 | 3184.68 | 0.9861 | 0.0401 |
| 1 | 2292.10 | 7,616,694.33 | 2759.84 | 0.9902 | 0.0526 |
| 2 | 1591.23 | 6,141,944.75 | 2478.29 | 0.9473 | 0.0249 |
| 3 | 2450.57 | 10,639,983.49 | 3261.90 | 0.9895 | 0.0582 |
| 4 | 2256.76 | 6,211,550.01 | 2492.30 | 0.9916 | 0.0484 |
| Mean | 2183.07 | 8,150,468.16 | 2835.41 | 0.9809 | 0.0448 |
| Std | 303.05 | 1,910,132.98 | 333.12 | 0.0169 | 0.0116 |
| Test | 2791.95 | 16,986,253.22 | 4121.44 | 0.9811 | 0.0367 |

Blending (base learners: ET, RF, Ridge)

| Fold | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| 0 | 2621.68 | 13,996,026.78 | 3741.13 | 0.9808 | 0.0456 |
| 1 | 2115.51 | 6,368,872.54 | 2523.66 | 0.9918 | 0.0388 |
| 2 | 1227.95 | 2,599,808.47 | 1612.39 | 0.9777 | 0.0213 |
| 3 | 2266.80 | 7,245,254.56 | 2691.70 | 0.9929 | 0.0348 |
| 4 | 1783.21 | 4,566,049.47 | 2136.83 | 0.9938 | 0.0311 |
| Mean | 2003.03 | 6,955,202.36 | 2541.14 | 0.9874 | 0.0343 |
| Std | 472.02 | 3,864,676.67 | 705.55 | 0.0068 | 0.0081 |
| Test | 3138.73 | 20,627,053.08 | 4541.70 | 0.9772 | 0.0430 |

Stacking (base learners: ET, RF, Ridge; meta-learner: Ridge)

| Fold | MAE | MSE | RMSE | R² | MAPE |
|---|---|---|---|---|---|
| 0 | 2332.41 | 10,667,480.27 | 3266.11 | 0.9853 | 0.0383 |
| 1 | 2359.09 | 6,559,821.14 | 2561.21 | 0.9916 | 0.0470 |
| 2 | 1133.61 | 2,887,446.54 | 1699.25 | 0.9752 | 0.0187 |
| 3 | 2110.38 | 6,221,784.33 | 2494.35 | 0.9939 | 0.0343 |
| 4 | 1520.56 | 3,762,509.98 | 1939.72 | 0.9949 | 0.0294 |
| Mean | 1891.21 | 6,019,808.45 | 2392.13 | 0.9882 | 0.0335 |
| Std | 484.35 | 2,714,418.86 | 545.46 | 0.0073 | 0.0094 |
| Test | 2704.34 | 15,710,000.99 | 3963.58 | 0.9826 | 0.0359 |
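The best-performing configuration in Table 8 — ET, RF, and Ridge as base learners with Ridge as the meta-learner — maps naturally onto scikit-learn's `StackingRegressor`. The sketch below is a hedged illustration on synthetic data of the dataset's shape, not the paper's exact implementation or hyperparameters:

```python
# Sketch of the stacking configuration from Table 8: ET, RF, and Ridge
# base learners with a Ridge meta-learner, scored with 5-fold CV.
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 1979-2021 data (43 rows, 4 predictors).
X, y = make_regression(n_samples=43, n_features=4, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("et", ExtraTreesRegressor(random_state=0)),
        ("rf", RandomForestRegressor(random_state=0)),
        ("ridge", Ridge()),
    ],
    final_estimator=Ridge(),  # meta-learner trained on base predictions
    cv=5,                     # internal CV used to build meta-features
)

scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
print(f"Stacking mean R2: {scores.mean():.4f}")
```

The meta-learner sees only the cross-validated predictions of the base learners, which is what distinguishes stacking from the blending setup (a single hold-out split) in the same table.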
Table 9. Comparison of the actual values ‘Energy’ and predicted values using the proposed stacking ensemble model.
| Years | Observed Energy Demand (Mtoe) | Predicted Energy Demand (Mtoe) | Amount of Error | Relative Error (%) |
|---|---|---|---|---|
| 1980 | 27.51 | 26.96 | 0.55 | 1.99 |
| 1983 | 30.25 | 28.97 | 1.28 | 4.23 |
| 1984 | 31.75 | 30.42 | 1.33 | 4.18 |
| 1988 | 39.73 | 38.69 | 1.04 | 2.62 |
| 1989 | 40.40 | 40.53 | −0.13 | −0.32 |
| 2002 | 59.49 | 62.74 | −3.25 | −5.46 |
| 2007 | 79.79 | 76.95 | 2.84 | 3.56 |
| 2010 | 79.84 | 81.93 | −2.09 | −2.62 |
| 2013 | 88.07 | 91.51 | −3.44 | −3.91 |
| 2014 | 89.25 | 96.05 | −6.80 | −7.62 |
| 2021 | 123.86 | 113.73 | 10.13 | 8.18 |
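The two error columns in Table 9 follow directly from the observed and predicted values: the amount of error is observed minus predicted, and the relative error expresses that difference as a percentage of the observed demand. A short sketch on three of the table's rows:

```python
# Amount of error and relative error as used in Table 9, for three
# sample years (1980, 1989, 2021) from the table.
observed = [27.51, 40.40, 123.86]   # Mtoe
predicted = [26.96, 40.53, 113.73]  # Mtoe, stacking-model predictions

for obs, pred in zip(observed, predicted):
    amount = obs - pred                  # amount of error (Mtoe)
    relative = 100.0 * amount / obs      # relative error (%)
    print(f"error = {amount:.2f} Mtoe, relative = {relative:.2f}%")
```

Reproducing the 1980 row, for example, gives an error of 0.55 Mtoe and a relative error of about 1.99%, matching the table.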