Modeling Carbon Release of Brazilian Highest Economic Pole and Major Urban Emitter: Comparing Classical Methods and Artificial Neural Networks

Debone, Daniela; Martins, Tiago Dias; Miraglia, Simone Georges El Khouri

doi:10.3390/cli10010009

by Daniela Debone *, Tiago Dias Martins and Simone Georges El Khouri Miraglia

Institute of Environmental, Chemical and Pharmaceutical Sciences, Federal University of Sao Paulo (UNIFESP), Diadema 09913030, Brazil

* Author to whom correspondence should be addressed.

Climate 2022, 10(1), 9; https://doi.org/10.3390/cli10010009

Submission received: 11 December 2021 / Revised: 8 January 2022 / Accepted: 10 January 2022 / Published: 15 January 2022

(This article belongs to the Section Climate and Economics)

Abstract:

Despite the concern about climate change and its associated negative impacts, fossil fuels continue to prevail in global energy consumption. This paper aimed to propose the first model that relates the CO₂ emissions of Sao Paulo, the main urban emitter in Brazil, to gross domestic product and energy consumption. To this end, we investigated the accuracy of three different methods: multivariate linear regression, elastic-net regression, and multilayer perceptron artificial neural networks. Comparing the results, we demonstrated the superiority of artificial neural networks over the other models. They presented the lowest mean absolute percentage error (MAPE = 0.76%) and the highest possible coefficient of determination (R² = 1.00). This investigation provides an innovative integrated climate-economic approach for the accurate prediction of carbon emissions. Therefore, it can be considered a potentially valuable decision-support tool for policymakers to design and implement effective environmental policies.

Keywords:

climate change; carbon dioxide emissions; economic growth; energy consumption; artificial neural networks; artificial intelligence

1. Introduction

1.1. Global Carbon Emissions and Economic Indicators

Despite increasing concern about climate change, the need to share the associated economic impacts equitably, and the urgency of developing a low-carbon economy, fossil fuels continue to prevail in global energy consumption [1,2].

In this regard, economic advancement and urbanization are the major processes that contribute to the high consumption of fossil fuels. Consequently, the release of CO₂ and other pollutants to the atmosphere also increases. Energy is considered a fundamental input to the production process and a determinant of economic growth [3,4,5,6]. As a consequence, there has been increased interest in modeling approaches that explore the relation between emissions of carbon dioxide (CO₂), the main greenhouse gas (GHG) emitted worldwide [7], and economic indicators such as gross domestic product (GDP) and energy consumption (EN).

China is globally recognized as the largest CO₂ emitter [8,9], accounting for approximately 9.5 billion tons in 2018, according to the International Energy Agency [7]. Thus, it is the most studied country in this field [10,11,12,13,14,15,16,17,18,19]. India and the European Union, the third and fourth largest emitters [7], respectively, are also important objects of investigation [9,20,21,22,23,24,25]. However, to the best of our knowledge, studies that disentangle the relation among carbon emissions, GDP, and energy consumption are scarce for Brazilian data, particularly for Sao Paulo state, which is an important CO₂ emitter in the country, responsible for almost 92 million tons in 2019 [26]. This represents an important knowledge gap, since Brazil was the 13th largest emitter worldwide in 2019, according to the Statista database [27].

In 2016, Brazil established ambitious goals for mitigation and adaptation to climate change by 2030, such as reducing greenhouse gas (GHG) emissions by 43%; restoration and reforestation of million hectares of forests, achieving Amazon zero-deforestation targets; and increasing the share of renewable energies in the energy matrix to around 45% [28,29,30]. However, contrary to these goals, the current government has acted to dismantle environmental policy, resulting in negative impacts such as increased fires in the Amazon and Pantanal regions [31,32], where burned areas reached 77,396 km² and 40,606 km², respectively, in 2020, according to the National Institute for Space Research [33].

Given the continental dimensions of Brazil, in addition to preventing burning and deforestation, increasing efforts to mitigate GHG emissions in regions that host large urban centers could contribute to the achievement of the national targets. For instance, in the state of Sao Paulo, the energy sector is the most carbon-intensive [34]. In 2019, 84% of all CO₂ emitted in the state was generated by this sector, mostly from fossil fuel burning in transportation, industry, and fuel production. Furthermore, Sao Paulo is the third largest greenhouse gas emitter in the country; excluding the land-use change and forestry sector, mainly represented by deforestation and associated fires, Sao Paulo is the major Brazilian emitter of GHG [26]. Therefore, efficient modeling techniques that relate carbon emissions to economic indicators for this state could be of great importance at a global level.

1.2. Literature Review

Several studies investigated this relationship using different mathematical methods, such as Granger causality [12,35,36,37,38], autoregressive distributed lags (ARDL) [9,39,40], and stochastic impacts by regression on population, affluence, and technology (STIRPAT model) [41,42,43,44].

Using the ARDL model, Leal et al. (2018) found a trade-off between economic growth and CO₂ intensity (a measure of CO₂ produced per USD of GDP) by analyzing annual Australian data from 1965 to 2015. The authors reported that increased GDP raised investments in renewable energy, but without reducing CO₂ intensity [40]. On the other hand, Ghazali and Ali (2019) investigated different drivers of CO₂ for ten newly industrialized countries, using the STIRPAT model. They found that urbanization, GDP per capita, and CO₂ intensity were related to the increase in CO₂ emissions [45]. Polloni-Silva et al. (2021) also applied the STIRPAT model to investigate the Brazilian growth-CO₂ nexus. The authors found that both population size and GDP per capita were the main driving factors of CO₂ emissions, since a 1% increase in population size was associated with an increase of 2.4–2.7% in CO₂ emissions, and a 1% increase in GDP per capita was related to an increase of about 1.2% in CO₂ emissions [44].

Ben Jebli et al. (2020), using the Granger causality approach, investigated the relationship among renewable energy consumption, industrial added value, value-added service, economic growth, and carbon emissions of 102 countries [36]. Bayar et al. (2021), through the bootstrap panel Granger causality test, explored the mutual causality among renewable energy, globalization, economic indicators, and CO₂ emissions. The authors found that the renewable energy use has influenced the decrease in CO₂ emissions of Eastern Europe from 1995 to 2015 [37]. The observed causal relationships usually varied considerably in terms of the causality direction among the variables, mainly according to the development degree of the analyzed country/region. However, renewable energy resources have been considered as a crucial option for economic decarbonization [36,37,40,45].

Based on previously published articles, divergences among the results are easily perceived, since the studies analyze different countries, adopt regional or global perspectives, and apply different mathematical methods, variable types, and periods. These differences motivate the use of more sophisticated and accurate methods, such as machine learning approaches. In this context, artificial neural network (ANN) models have gained remarkable popularity in the field. This approach is better suited to cases in which there are nonlinear relationships between the target and the independent variables [46,47,48]. Additionally, ANNs are able to provide predictions with higher accuracy compared with other methods, such as multivariate linear regression and even other machine learning tools [1,49,50].

Recently, many authors have shown this feature. Guo et al. (2018) used a back-propagation neural network to forecast Chinese CO₂ emissions in 2030 under different scenarios, and found a strong similarity between fitted and observed data, with an R-squared higher than 0.99 [48]. Ahmadi et al. (2019) used the group method of data handling neural network approach to analyze the CO₂ emissions of five Middle East countries between 2000 and 2017. The authors found a precision of 0.99 (R-squared) in predicting CO₂ emissions [46]. Acheampong and Boateng (2019) also used a back-propagation neural network and found a high level of accuracy when predicting CO₂ emissions in Brazil, China, and India, reporting low mean squared error values between 0.00042 and 0.00420 [47]. Mardani et al. (2020), considering the period between 1962 and 2016 and using economic growth and energy consumption as input parameters, found a mean absolute error (MAE) equal to 0.104 when predicting CO₂ emissions for the G20 countries using an adaptive neuro-fuzzy inference system and artificial neural networks [1]. Bamisile et al. (2021) applied different architectures of feed-forward back-propagation ANNs to predict the carbon emissions of several African countries and also obtained very accurate models, with R-squared higher than 0.98 [51].

Modeling applications using artificial intelligence are an emerging area, mainly for investigating relationships between carbon emissions and socioeconomic indicators. However, this type of analysis remains little explored for Brazilian data, as demonstrated in a recently published systematic review [52]. Concerning ANN modeling for the state of Sao Paulo, the few available studies are limited to epidemiological studies of the health effects of outdoor air pollution, which also showed outstanding results in terms of prediction accuracy [53,54,55].

1.3. Purpose of the Study

Our work aims to fill this gap and to contribute to the state of the art, by developing, for the first time, a model of CO₂ emissions in the state of Sao Paulo, based on GDP and energy consumption. Additionally, we aimed to compare the accuracy of three different methods: multivariate linear regression, elastic-net regression, and multilayer perceptron artificial neural networks. Figure 1 provides the study design.

The remainder of this paper is structured as follows: Section 2 describes in detail the data, models, metrics, and software used. Section 3 discusses the obtained results. Section 4 presents the concluding remarks and policy implications, as well as the study’s limitations.

2. Materials and Methods

In this work, the annual CO₂ emission data were considered as the outcome variable, measured in tons. All emission data were collected from the System for Estimating Greenhouse Gas Emissions (SEEG) database [26]. The choice to focus this investigation on CO₂ emissions is justified, since it is the main greenhouse gas emitted in the state of Sao Paulo [26,34]. The predictors were the annual energy consumption (EN) and gross domestic product (GDP). The EN data were collected from the Energy Balance database of Sao Paulo. They correspond to the annual final energy consumption, expressed in tons of oil equivalent (toe), and comprise 34% petroleum by-products, 24% sugarcane bagasse, 19% electricity, 10% ethanol, 7% natural gas, and 6% other sources [56].

The annual GDP was obtained from the Data Analysis System of the state of Sao Paulo [57] and converted to US dollars, considering the average exchange rate of each year available from the Institute for Applied Economic Research [58].

The period considered in this work ranged from 2000 to 2018. We investigated the relation among CO₂, GDP, and EN by using three different approaches: multivariate linear regression, penalized regression and multilayer perceptron artificial neural network.

Considering that the dataset of this study is on an annual basis and consists of a short time series (n = 19), we used the current time (t) for all the variables in all tested models. In addition, our analyses have an exploratory aspect, since they investigate the relationship between CO₂, EN, and GDP, and are not intended to predict carbon emissions by using, for example, ARDL and Granger causality, or to forecast future emissions scenarios. Therefore, the lags of the variables were not considered, in line with previously published works [1,59,60,61].

First, to check the intercorrelation of GDP and EN, the variance inflation factor (VIF) analysis was performed. As a general rule, the proposed cutoff for a multicollinearity problem is that the VIF value of each predictor should not exceed the value of 10 [62,63,64,65].
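The VIF check can be sketched as follows, under the assumption that the model has only two predictors (here, GDP and EN). In that case, regressing one predictor on the other yields an R² equal to the squared Pearson correlation, so both predictors share VIF = 1/(1 − r²). The function names and sample values are hypothetical and are not the study's data or code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values for illustration only (not the study's data)
gdp = [1.0, 2.0, 3.0, 4.0]
en = [1.0, 3.0, 2.0, 4.0]
print(vif_two_predictors(gdp, en))
```

With more than two predictors, each VIF would instead require regressing that predictor on all the others.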

Then, by using the R package car, which contains the main functions for applied regression [66], we tested different combinations of multivariate linear regression (MLR) models, including quadratic forms, since this strategy was previously established in some studies [59,67,68], totaling 5 tested models. This first step can be summarized by the following generic equation:

y ~ β_{0} + β_{1} x_{1} + β_{2} x_{1}^{2} + β_{3} x_{2} + β_{4} x_{2}^{2} + \dots + β_{n} x_{n}

(1)

where y represents the outcome variable, x are the predictors, β are the regression coefficients, and n is the index of the last term in the model.

The adjusted coefficient of determination (adjusted-R²), calculated from R², according to Equations (2) and (3), and mean absolute percentage error (MAPE), Equation (4), were used as the performance functions, as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(2)

{adjusted R}^{2} = 1 - \frac{(1 - R^{2}) (n - 1)}{n - p - 1}

(3)

MAPE = 100 \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} \right|

(4)

where y represents the real values, ŷ are the predicted values, ȳ refers to the average of the real values, n is the number of observations, and p is the number of predictors in the model.
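As a minimal sketch, the three performance functions can be written as below. It assumes the standard adjusted-R² definition, whose denominator is n − p − 1 for p predictors (the symbol p and all function names are introduced here for illustration). It is not the authors' R code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, Equation (2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R², assuming the standard n - p - 1 denominator."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Equation (4)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))

# Hypothetical observed and fitted values (not the study's data)
observed = [100.0, 200.0, 300.0, 400.0]
fitted = [110.0, 190.0, 310.0, 390.0]
r2 = r_squared(observed, fitted)
print(r2, adjusted_r_squared(r2, n_obs=4, n_predictors=2), mape(observed, fitted))
```

Because MAPE divides by the observed value, it is undefined when any observation is zero. That does not arise for annual emission totals.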

By using tidymodels, a collection of R packages for modeling and machine learning [69], we also evaluated models based on elastic-net regression, which is a mix of two regression methods: ridge and lasso. Ridge regression reduces the model’s complexity through coefficient shrinkage and improves on the least-squares error by applying a degree of bias to the regression estimates. Lasso regression (Least Absolute Shrinkage and Selection Operator) works similarly, since it also adds a penalty for non-zero coefficients. However, whereas ridge regression penalizes the sum of the squared coefficients (L2 penalty, the first term inside the parentheses in Equation (5)), lasso penalizes the sum of their absolute values (L1 penalty, the second term inside the parentheses in Equation (5)) [70,71,72]. Thus, elastic-net regression combines the lasso and ridge penalties to improve on the regularization of either approach alone. Aside from handling multicollinearity among predictor variables, this method overcomes the limitations of lasso regression [73,74]. This approach aims to minimize the following loss function:

L (\hat{β}) = \frac{\sum_{i = 1}^{n} {(y_{i} - x_{i}^{j} \hat{β})}^{2}}{2 n} + λ \left( \overbrace{\frac{1 - α}{2} \sum_{j = 1}^{m} {\hat{β}}_{j}^{2}}^{L2} + \underbrace{α \sum_{j = 1}^{m} | {\hat{β}}_{j} |}_{L1} \right)

(5)

where y represents the outcome variable, x are the predictors, β are the coefficients, α sets the degree of mixing between ridge regression and lasso, λ is the shrinkage parameter, n is the number of observations of the dataset, and m the number of variables. When λ = 0, no shrinkage is performed, and as λ increases, the coefficients are shrunk.
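To make the roles of α and λ concrete, the following sketch evaluates the loss of Equation (5) for a given coefficient vector. It assumes a model without an intercept and with predictors stored as rows of a list. The function name and toy values are hypothetical.

```python
def elastic_net_loss(X, y, beta, alpha, lam):
    """Evaluate the elastic-net loss of Equation (5) for coefficients beta.

    X is a list of rows (one per observation); no intercept is assumed.
    """
    n = len(y)
    residual_ss = 0.0
    for row, target in zip(X, y):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        residual_ss += (target - fitted) ** 2
    l2_penalty = (1.0 - alpha) / 2.0 * sum(b ** 2 for b in beta)  # ridge part
    l1_penalty = alpha * sum(abs(b) for b in beta)                # lasso part
    return residual_ss / (2.0 * n) + lam * (l2_penalty + l1_penalty)

# Toy data for illustration only
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
beta = [1.0, 1.0]
print(elastic_net_loss(X, y, beta, alpha=0.5, lam=2.0))
```

Setting `lam=0` leaves only the least-squares term, `alpha=1` gives a pure lasso penalty, and `alpha=0` gives a pure ridge penalty.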

The training set was defined as 80% of the sample size (n = 15) and the remaining 20% was used as a test set (n = 4). In addition, to estimate accuracy more reliably and to minimize the mean-squared prediction error, we tuned the parameters α and λ through 5-fold cross-validation [75,76]. Data division was random and performed automatically by the software. R², MAPE, and the root mean square error (RMSE), Equation (6), were used as the performance functions.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(6)
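The resampling scheme described above can be sketched as follows. It assumes a simple random partition into folds; the software used in the study may partition the data differently. With 15 training observations and 5 folds, each fold holds 3 observations for validation and 12 for fitting. The function names and the seed are illustrative.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error, Equation (6)."""
    n = len(y_true)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n)

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_validation_splits(n_samples, k, seed=0):
    """Yield (training, validation) index lists; each fold validates once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

for training, validation in train_validation_splits(n_samples=15, k=5):
    print(len(training), len(validation))
```

For each candidate (α, λ) pair, the model would be fitted on the training indices of every split and scored on the validation indices, and the pair with the best mean score would be retained.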

Finally, we evaluated several multilayer perceptron (MLP) neural networks using MATLAB programming language. Artificial Neural Networks (ANNs) are computational methods inspired by the biological neurons’ structure, which are composed of dendrites, cell body and axon. Electrical signals from other neurons (inputs) arrive through the dendrites, the cell body processes this information, and the axon transmits the nervous signal from cell body to its ends and generates the output signals to the other neurons [77,78,79,80].

An ANN simulates this process, since it is composed by artificial neurons distributed into layers (three or more) that link the inputs to the desired output(s) (Figure 2). In this work, the variables GDP and EN were considered as inputs and CO₂ emissions were the output, all at the current time (t).

Also called a feed-forward neural network, a multilayer perceptron (MLP) passes information from the input layer to the output layer and works as follows: each neuron is assigned synaptic weights and a bias parameter. The synaptic weights represent the relative influence of the different inputs on the neuron. Once it enters the neuron, the information is transformed into an activation coefficient, which is the bias plus the sum of the received inputs, each multiplied by the corresponding synaptic weight, as follows:

α_{j} = \sum_{i = 1}^{z} w_{ij} x_{i} + b_{j}

(7)

where z is the number of inputs and α_j, b_j, x_i, and w_ij are the activation value of neuron j, the bias of neuron j, the i-th input variable, and the weight connecting input i to neuron j, respectively. To generate the output, an activation function is usually applied to the activation coefficient [80,81,82].
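Equation (7) amounts to a weighted sum plus a bias, as the short sketch below shows for a single neuron. The weights, bias, and inputs are hypothetical, and the hyperbolic tangent is used only as one example of an activation function.

```python
import math

def neuron_activation(inputs, weights, bias):
    """Activation coefficient of one neuron, Equation (7)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Apply an activation function to the activation coefficient."""
    return activation(neuron_activation(inputs, weights, bias))

# Two normalized inputs (for example GDP and EN) with hypothetical parameters
x = [0.5, -0.2]
w = [0.4, 0.3]
b = 0.1
print(neuron_activation(x, w, b), neuron_output(x, w, b))
```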

The MLP usually uses a supervised learning technique to determine the weights and bias of each neuron, aiming to obtain an output value closer to the expected one. For this, an optimization algorithm is used to minimize a statistical metric between the real and the fitted values, also called objective function (OF) [80,81,82].

To estimate accuracy more reliably, we tested different numbers of neurons in the hidden layer, training algorithms, and combinations of activation functions, totaling 156 trained architectures, as can be seen in Table 1. The training set was defined as 70% of the sample size (n = 13), leaving 15% for validation (n = 3) and the remaining 15% for the test (n = 3). Data division was random and performed automatically by the software. In addition, we used the 5-fold cross-validation approach to select the best performing model. R², MAPE, and mean squared error (MSE, Equation (8)) were used as performance functions. Equation (8) was also used as the OF. Before training the ANNs, all input and output variables were normalized to the range [−1, 1]. The statistical metrics of each step were analyzed by one-way ANOVA followed by the Bonferroni post-test. The p-values ≤ 0.05 were considered to be significant.

MSE = \frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}

(8)
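The normalization step and the objective function can be sketched as below. The sketch assumes a linear min-max mapping to [−1, 1]; the exact routine used by the software is not stated in the text. Function names and values are illustrative.

```python
def scale_to_unit_range(values):
    """Map values linearly so that the minimum becomes -1 and the maximum 1.

    Assumes the values are not all identical (max > min).
    """
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def unscale(scaled, lo, hi):
    """Invert scale_to_unit_range given the original minimum and maximum."""
    return [(s + 1.0) / 2.0 * (hi - lo) + lo for s in scaled]

def mse(y_true, y_pred):
    """Mean squared error, Equation (8)."""
    n = len(y_true)
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n

# Hypothetical series (not the study's data)
series = [10.0, 20.0, 30.0]
scaled = scale_to_unit_range(series)
print(scaled, unscale(scaled, min(series), max(series)))
```

Network outputs produced on the normalized scale must be mapped back with the inverse transform before computing errors in the original units.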

Among the analyzed different approaches, the optimal model was defined by the statistical analysis of the MAPE metric, using the Mann–Whitney test. The p-values ≤ 0.05 were considered to be significant.
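For intuition about the Mann–Whitney comparison, the sketch below computes only the U statistics by direct pair counting. It does not compute a p-value; in practice that would come from a statistical package (for example, SciPy provides `scipy.stats.mannwhitneyu`). The error values are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by counting pairs; ties contribute 0.5 to each."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical percentage errors of two models (not the study's results)
errors_model_1 = [0.2, 0.5, 0.9]
errors_model_2 = [2.1, 3.4, 4.8]
print(mann_whitney_u(errors_model_1, errors_model_2))
```

A U value of zero for one sample means that all of its values lie below every value of the other sample, which is the strongest possible separation between the two error distributions.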

3. Results and Discussion

3.1. Descriptive Analysis and Multivariate Linear Regression Models

Sao Paulo is the most populous state of Brazil and the largest in economic and industrial terms. It has high atmospheric pollutant concentrations and greenhouse gas emissions [83,84]. As shown in Figure 3A, between 2000 and 2018, the GDP of Sao Paulo state grew markedly until 2011, remained above USD 780 billion until 2014, and has since contracted annually due to the advance of the Brazilian economic crisis [85], reaching USD 603 billion in 2018. The registered CO₂ emissions were above 75 million tons, with peaks between 2011 and 2014 and a strong reduction from 2015 on, a pattern similar to that of the GDP time series. Therefore, the behavior of CO₂ emissions follows economic performance. In turn, annual energy consumption grew until 2015 and has since stabilized at around 69 million tons of oil equivalent (toe).

These results are in agreement with Leite et al. (2020), which showed that the energy sector, with emphasis on the transport economic activity, was the most intensive CO₂ emitter in Sao Paulo state, between 2000 and 2018, with high average diesel consumption, playing a fundamental role not only in the economy, but also in the increase in negative impacts on human health associated with air pollution [34].

The results of the correlation analysis are shown in Figure 3B. We also observed a strong positive and significant correlation between CO₂ and GDP (R = 0.832), CO₂ and EN (R = 0.742), and GDP and EN (R = 0.945), with all p-values < 0.001. This result is in agreement with the studies carried out in China and Tunisia, which also analyzed these variables [86,87].

On the other hand, these findings are the opposite of those of a study performed in another Brazilian region (Mato Grosso do Sul state), which reported an inverse relation between economic development and CO₂ emissions [59]. This divergence shows that the different regions of a country of continental dimensions cannot be treated equally. Thus, it is suggested that predictive models should be developed for each region. This is an important observation, since the regions of Brazil differ in terms of socioeconomic indicators, sources of GHG emissions, and mitigation efforts, such as clean energy production [26,88,89].

Regarding the MLR models, we observed that all of the tested models presented one beta coefficient with a negative value, mostly related to the EN variable. In addition, with the exception of ID3, the models also resulted in at least one non-significant beta coefficient (Table 2).

These results contradict the Pearson’s correlation analysis, since we reported a positive significant correlation among these variables, implying the presence of multicollinearity effects, even with VIF < 10. VIF is a widely used cutoff [62,63,64,65]. However, for some types of model, values above 2.5 could already lead to the development of these effects [63], which could explain the low performance of our analyses. We found that VIF = 9.4 for both variables of GDP and EN.
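As a consistency check on this value, note that with only two predictors the auxiliary regression of one predictor on the other has an R² equal to their squared Pearson correlation, so both predictors share the same VIF. Using the rounded correlation between GDP and EN reported above (R = 0.945):

```latex
\mathrm{VIF} = \frac{1}{1 - R_{GDP, EN}^{2}} = \frac{1}{1 - 0.945^{2}} = \frac{1}{0.107} \approx 9.35
```

This agrees with the reported VIF of 9.4 up to rounding of the correlation coefficient.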

Despite the identified multicollinearity effects, the MLR models demonstrated adjusted-R² values of at least 0.68. The model without the quadratic variables (ID1) presented the worst performance (adjusted-R² = 0.68 and MAPE = 5.5%), and the best result, in terms of R² and significance levels of beta coefficients, was observed in ID3 (adjusted-R² = 0.91 and MAPE = 2.68%).

Other studies have also shown results with low accuracy when using MLR models to investigate CO₂ emissions. This could be explained by the presence of multicollinearity effects [59,60,90].

Ratanavaraha and Jomnonkwao (2015) analyzed CO₂ from the transportation sector in Thailand by applying different approaches. The log-linear regression models showed positive and significant beta coefficients, and R² ranging between 0.76 and 0.89 (values close to those shown in Table 2) [60]. However, the path analysis showed a non-significant coefficient for GDP, and the cubic and quadratic curve fit models presented negative values for the coefficients. Zaidi et al. (2017), by comparing different MLR models, found a strong relationship between CO₂, GDP, and energy in a multi-country analysis, with R² ranging between 0.77 and 0.84 [90] (values close to those shown in Table 2). However, the models also presented negative values of the beta coefficient for the GDP. Considering the Brazilian data, Kunimoto et al. (2018) analyzed CO₂ from GDP and its quadratic form for different economic sectors, and the regression models presented non-significant coefficients for energy and industry sectors, similarly to what we observed in our MLR models [59].

Although these studies have obtained efficient models, in terms of R², the authors did not discuss the potential presence of multicollinearity between the independent variables, by using approaches, such as VIF analysis or the application of penalized methods, which may reflect on the instability of these models and inaccurate results [91,92,93].

3.2. Elastic-Net Regression Method

To minimize the observed multicollinearity effects and to find the best approach in terms of CO₂ emission prediction, we applied the elastic-net regression method. This choice is justified, since this technique can solve the effects of multicollinearity, from the regularization that works to keep all the model features, but reducing the magnitude of the coefficients [69]. In the cross validation and training phases, we tuned the optimal hyperparameter pair (α = 0.103 and λ = 0.00000642) that returned the most appropriate RMSE (4.44 ± 0.78) and R² (0.913 ± 0.07) mean values (Table 3). Based on these results, the testing phase presented an acceptable performance (RMSE = 4.65 and R² = 0.868).

The application of the elastic-net regression approach resulted in a penalization that kept all variables, giving a structure similar to that of the ID5 MLR model (Table 2). In addition, it solved the problem of sign inversions of the coefficients. As can be seen in Figure 4, the significant variables of the final model presented values greater than zero. The coefficients of the elastic-net regression model can be found in Appendix A (Table A2).

The model’s accuracy was R² = 0.923 and MAPE = 2.67% (Table 3). Therefore, it presented a higher performance than the multivariate linear regression models previously tested and shown in Table 2.

Penalized methods are still poorly explored for analyzing CO₂ emissions. However, some authors have found similar results to those presented here in terms of precision quality. Elastic-net model has shown acceptable MSE (0.83) for estimating the building energy use intensity in New York City. Additionally, a study carried out using data from Africa, using this same approach, showed a reduced RMSE (0.30), evaluating CO₂ emissions from different types of fuels [94,95].

3.3. Multilayer Perceptron Neural Network

Aiming to find the best approach to predict carbon emissions, we also trained 156 different MLP architectures. Since this method does not require any assumption regarding the relationship among the independent variables and can manage interactions among them [80,96,97], and since the predictors presented VIF < 10, we chose to analyze GDP and EN without any kind of transformation, such as the quadratic form.

Among all the tested structures, three models generated the best values of R² and MAPE, and their results are shown in Table 4. Concerning the R² values of test and validation steps, the models showed very similar performance with R² > 0.99. On the other hand, the ID47 (with 6 neurons in the hidden layer) model stands out and presents the lowest values of MAPE and OF in these two datasets. In fact, it is important to highlight that, in this work, the objective functions presented high values because of the magnitude of the analyzed variables.

Given these findings, we performed a 5-fold cross-validation to suitably select the highest performing model. Figure 5A shows the R² values obtained in the training and validation steps. A significant difference was found between models 47 and 98 and between 65 and 98, only in the train phase. There were no differences between models 47 and 65. Regarding the MAPE values, we found a significant difference between models 47 and 98 and between models 65 and 98, for both phases. Moreover, no differences were observed between models 47 and 65 (Figure 5B), indicating that the data used in this analysis did not negatively influence the performance of the ANN.

Models 47 and 65 were revealed to be equally efficient. However, model 47 was chosen as the optimal one, since it has a smaller number of neurons in the hidden layer and has also returned the lowest OF value for test and validation data (Table 4), which are important parameters that must be considered to avoid overfitting and achieve better optimization [82,98].

3.4. Comparative Analysis

For comparison purposes, Figure 6 shows the predictions using the ID3 MLR model, elastic-net regression, and ID47 MLP neural network, since these models returned the highest R² values: 0.91, 0.92, and 1.00, respectively.

Our findings demonstrate that the ANN-based model provides the best prediction accuracy. We can observe that for ID47 model, the predicted values were very close to the real ones, and showed significant deviations only in the period of 2007–2009. Although the metrics of the elastic-net model can be considered acceptable, the same accuracy could not be observed since its forecast showed significant deviations in several periods, around the years 2002–2003, 2007–2009, and 2014–2018. The ID3 MLR model presented similar accuracy to the elastic-net regression, with deviations in the same periods of the time series.

To analyze each model’s performance in more detail, we plotted the absolute percentage error (APE) frequency, according to Equation (9), as shown in Figure 7.

APE = 100 | \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} |

(9)
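The per-observation errors of Equation (9) and their summary statistics (mean, median, and maximum) can be computed as in this sketch. The function names and values are hypothetical and are not the study's data.

```python
import statistics

def absolute_percentage_errors(y_true, y_pred):
    """APE of each observation in percent, Equation (9)."""
    return [100.0 * abs((y - p) / y) for y, p in zip(y_true, y_pred)]

def summarize_ape(ape):
    """Mean (MAPE), median, and maximum of a list of APE values."""
    return {
        "mape": statistics.mean(ape),
        "median": statistics.median(ape),
        "max": max(ape),
    }

# Hypothetical observed and predicted values (not the study's data)
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(summarize_ape(absolute_percentage_errors(observed, predicted)))
```

The mean of the APE values is the MAPE of Equation (4), while the median and maximum describe the shape of the error distribution.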

The ID47 MLP model presented the majority of its APE values close to zero, ranging between 0.01 and 0.57, and did not present any error greater than 5.48. The other two models presented distributions similar to each other, with most APE values between 0.79 and 4.90 and also values above 9. Despite the similarity of the results between elastic-net regression and the ID3 MLR model, it is important to highlight that the ID3 MLR model maintained the effects of multicollinearity among the independent variables and was considered unsuitable to estimate CO₂ emissions, since it may produce inaccurate results [91,92,93].

Table 5 summarizes the metrics of the analyzed models. We observed that the ID47 MLP neural network showed the lowest maximum APE and the lowest value of the APE median. Additionally, it presented a significantly lower MAPE when compared with the other models, analyzed by Mann–Whitney test (p < 0.001), and the highest possible R² value, confirming its good performance in predicting CO₂ emissions. The elastic-net regression and ID3 MLR model showed similar results for all presented metrics.

Our findings demonstrate that the best result was obtained by model ID47, an MLP neural network with a 2-6-1 structure trained with the Scaled Conjugate Gradient (trainscg) algorithm. This structure has one hidden layer, and can be written according to the following equation [80]:

y = g {\left( \sum_{j = 1}^{6} w_{j, k} f {\left( \sum_{i = 1}^{2} w_{i, j} x_{i} + b_{j} \right)}_{j} + b_{k} \right)}_{k}

(10)

where y is the output k, and f and g are the hyperbolic tangent sigmoid and linear activation functions, respectively. The weights and the bias for each neuron can be found in Appendix A (Table A3 and Table A4, and Figure A1).
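The forward pass of such a 2-6-1 network can be sketched as follows, with a hyperbolic tangent hidden layer and a linear output. All weights and biases below are hypothetical placeholders and not the fitted values from Appendix A. Inputs and outputs are assumed to be on the normalized [−1, 1] scale.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer forward pass: tanh hidden units, linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights_j, x)) + b_j)
        for weights_j, b_j in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical parameters for 6 hidden neurons and 2 inputs (GDP, EN)
hidden_weights = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2],
                  [0.9, 0.4], [-0.2, -0.7], [0.3, 0.3]]
hidden_biases = [0.0, 0.1, -0.1, 0.2, 0.0, -0.3]
output_weights = [0.4, -0.5, 0.3, 0.6, -0.2, 0.1]
output_bias = 0.05

print(mlp_forward([0.25, -0.4], hidden_weights, hidden_biases,
                  output_weights, output_bias))
```

Substituting the fitted weights and biases would give the model's normalized prediction, which would then be mapped back to tons of CO₂.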

The small number of observations (a 19-year time series) is an important limitation of this study. To mitigate it, 5-fold cross-validation was applied to both the elastic-net regression and the multilayer perceptron artificial neural network models as a strategy to reduce bias and variance. In this step, every data point was used once for validation. Since R², RMSE, and MAPE are averaged over the five subsets, the model assessment is less sensitive to how the data are split, which reduces both bias and variance. In this research field, it is quite common to use short time series and to apply k-fold cross-validation as an analysis step. Ahmadi et al. (2019) used the ANN method with n = 25; Rezaei et al. (2018) also used ANN, with n = 26; Ardakani and Seyedaliakbar (2019) used multivariate regression with n = 20; and Kunimoto et al. (2018) also used multivariate regression, with n = 14. Ahmadi et al. (2019) and Rezaei et al. (2018) did not report the application of k-fold cross-validation; however, it may have been a step that the software performed automatically [46,59,61,99].
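
The splitting step can be illustrated with a short Python sketch. It partitions 19 observations into 5 folds so that every observation is used exactly once for validation, and averages a per-fold metric. The contiguous, unshuffled fold assignment and the function names are assumptions made for illustration; the text does not state how the study assigned observations to folds.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous validation folds.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        # The first `extra` folds receive one additional observation.
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, validation))
        start += size
    return folds


def cross_validated_mean(scores):
    """Average a per-fold metric (e.g., MAPE) over all folds."""
    return sum(scores) / len(scores)


folds = kfold_indices(19, 5)  # 19 annual observations, 5 folds
for train, validation in folds:
    print(len(train), len(validation))
```

With 19 observations and 5 folds, the validation sets contain 4, 4, 4, 4, and 3 observations, so each model is fitted on 15 or 16 points per fold.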

Notwithstanding this limitation, the applied method and the obtained results proved promising for analyzing data from a developing country such as Brazil, where the availability of public data is limited. Such analyses can provide guidelines for public policies at a critical moment for climate change effects at both the global and regional scales, and they highlight the importance of this challenging area of research.

Our results are consistent with other studies that showed the superior predictive performance of ANN-based methods [1,46,48,49,99,100]. This technique has proven powerful for modeling highly complex problems, with higher precision than conventional approaches such as multivariate linear regression. This is mainly due to its capability of capturing possible nonlinear relationships between the input and output variables [1,49,100].

Similarly, other authors have demonstrated the high efficiency of ANN-based models in estimating carbon emissions. Xu et al. (2019) evaluated China’s CO₂ emissions and demonstrated the higher prediction efficiency of a dynamic nonlinear artificial neural network, which achieved a lower mean absolute error (MAE = 3.51) than linear regression (MAE = 6.15) [100]. Mardani et al. (2020) also showed that a method combining a neuro-fuzzy inference system with artificial neural network techniques (MAE = 0.065) was more accurate than multiple linear regression (MAE = 0.522) in estimating CO₂ emissions based on the energy consumption and economic growth indicators of G20 countries [1]. According to Hosseini et al. (2019), conventional methods have performed poorly in predicting carbon emissions, since they have generated results with divergent trends [49]. Consequently, innovative and sophisticated approaches, such as ANN-based techniques, are increasingly needed to produce effective, efficient, and accurate analyses.

Another interesting finding is that the better performance of our ANN model compared with elastic-net regression, itself a machine learning method, in terms of R², APE, and MAPE, agrees with the findings of Bakay and Ağbulut (2021). In their study, ANN-based models yielded a higher R² than the support vector machine when modeling Turkish CO₂ and N₂O emissions [50]. Moreover, Magazzino et al. (2021b), who applied the D2C algorithm to investigate the causal relationships among green energy production, economic indicators, and CO₂ emissions in different countries, obtained an R² of 0.9, a value very close to that of our elastic-net regression model [101].

Regarding the R² metric, our results are close to those of authors who reached almost the maximum possible value in estimating CO₂ emissions, such as Ahmadi et al. (2019), Guo et al. (2018), and Rezaei et al. (2018), who reached an R² of 0.99 [46,48,99].

Concerning models used to evaluate atmospheric pollution, an area closely related to carbon emissions, our results also agree with recent studies. Araujo et al. (2020) obtained better results, in terms of MSE and MAE, by combining GLM, a traditional time series model, with artificial neural networks to estimate daily hospital admissions due to air pollution in two cities of Sao Paulo state. Similarly, Magazzino et al. (2021a) and Tadano et al. (2020) successfully estimated the relationships between air quality and COVID-19 outcomes for New York and Sao Paulo, respectively, using ANN-based techniques [53,54,102].

In light of this, the current global health crisis caused by the COVID-19 pandemic has forced us to reflect on the urgency of changing the “business as usual” way of life. This is an unprecedented opportunity to move toward more sustainable global actions: speeding up the transition to more inclusive cities, developing urban resilience, and implementing green economic strategies, especially for food, transport, and energy systems, since these are crucial drivers of carbon emissions [103,104,105]. Therefore, given that climate change is a key global challenge, applying accurate models, such as ANN-based techniques, to model and predict carbon emissions is a highly promising way to improve current and future mitigation strategies.

4. Conclusions and Policy Implications

The present study was designed to obtain an equation to relate CO₂ emissions with GDP and energy consumption of Sao Paulo state, Brazil. For this purpose, three different approaches were proposed: multivariate linear regression, elastic-net regression, and a multilayer perceptron artificial neural network.

This investigation is of great relevance because it proposes, for the first time, the application of neural network models for predicting CO₂ emissions in Sao Paulo state, Brazil. Our findings show that the MLP neural network model provided more accurate results than multivariate linear regression and elastic-net regression, according to the performance metrics R² (1.000) and MAPE (0.76%). This innovative integrated climate-economic approach provides a pioneering path for analyzing environmental and economic indicators.

ANN-based methods have been increasingly recognized as a powerful approach for estimating several parameters of interest in the field of environmental economics (e.g., carbon emissions and economic indicators). Their main advantage is the capability of capturing possible nonlinear relationships among the selected variables. They can effectively model complex problems and offer error tolerance, high adaptability, and learning capabilities. Therefore, they can be considered an effective tool for decision-making and policy implementation, especially for promoting environmentally friendly solutions for economic growth toward net-zero emissions targets, since efficient mitigation policies can attract substantial financial incentives to explore energy structure transition and low-carbon economic development.

The limitations of our study mostly concern the dataset, which is available only at annual frequency and for a limited period (19 years), because Brazilian records are scarce and date back only a few years. Moreover, an analysis across the different Brazilian states could provide a comparative view, and predictions of future emission patterns on a finer time scale (e.g., monthly) could help governments implement public policies aimed at diminishing the adverse impacts.

Finally, our work suggests the following directions for future research in this field: employing modeling approaches to analyze Brazilian emissions as a whole, as well as in its different regions; investigating health indicators; and assessing mitigation practices in terms of cost–benefit and emission reduction, since these aspects are little discussed and would bring fundamental contributions.

Author Contributions

Conceptualization, S.G.E.K.M., T.D.M. and D.D. Formal analysis, D.D. and T.D.M. Writing—original draft preparation, D.D. Writing—review and editing, S.G.E.K.M. and T.D.M. Supervision, S.G.E.K.M. and T.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Grant Number 1808529 from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and by Grant Number 2018/26193-3 from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no competing interests.

Appendix A

Table A1. Number of non-zero coefficients and R² of elastic-net final model.

Number of Non-Zero Coefficients	R²	Number of Non-Zero Coefficients	R²
4	0.9234	4	0.9157
4	0.9233	4	0.9144
4	0.9233	4	0.9129
4	0.9233	4	0.911
4	0.9233	4	0.9089
4	0.9233	4	0.9063
4	0.9233	4	0.9033
4	0.9233	4	0.8997
4	0.9232	4	0.8954
4	0.9232	4	0.8903
4	0.9232	4	0.8843
4	0.9231	4	0.8772
4	0.9231	4	0.8688
4	0.9231	4	0.8588
4	0.923	4	0.8471
4	0.9229	4	0.8333
4	0.9229	4	0.817
4	0.9228	4	0.798
4	0.9227	4	0.7757
4	0.9226	4	0.7496
4	0.9224	4	0.7193
4	0.9223	3	0.6868
4	0.9221	3	0.6504
4	0.9219	3	0.6087
4	0.9217	3	0.5612
4	0.9214	3	0.5071
4	0.9211	3	0.4457
4	0.9207	3	0.3765
4	0.9203	2	0.3064
4	0.9198	2	0.2516
4	0.9192	1	0.1916
4	0.9185	1	0.1322
4	0.9177	1	0.0682
4	0.9168	0	0

Table A2. Elastic-net final model. Coefficient values related to the largest R².

	Coefficients
Intercept	88,219,707
GDP	36,946,643
GDP²	4,234,893
EN	508,937.2
EN²	19,410,830
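
To show how the coefficients in Table A2 combine into a prediction, the following Python sketch evaluates the second-order model they imply. It assumes that GDP and EN are supplied in the same (possibly standardized) scale that was used when the model was fitted and that the squared terms are the squares of those scaled values; neither detail is stated in this table, so the sketch should not be applied to raw data as is. The function name `elastic_net_predict` is hypothetical.

```python
# Coefficients copied from Table A2.
INTERCEPT = 88_219_707.0
COEF = {
    "GDP": 36_946_643.0,
    "GDP2": 4_234_893.0,
    "EN": 508_937.2,
    "EN2": 19_410_830.0,
}


def elastic_net_predict(gdp, en):
    """Evaluate intercept + linear and quadratic terms in GDP and EN.

    Assumption: gdp and en are already in the scale used for fitting.
    """
    return (
        INTERCEPT
        + COEF["GDP"] * gdp
        + COEF["GDP2"] * gdp ** 2
        + COEF["EN"] * en
        + COEF["EN2"] * en ** 2
    )


# With both predictors at zero, the prediction equals the intercept.
print(elastic_net_predict(0.0, 0.0))
```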

Figure A1. ANN design of ID47 model.

Table A3. Bias and weights of hidden layer of ID47 model.

Bias 1	Weights
−2.8754	4.7578	1.2974
2.2133	−1.5901	3.3818
−1.3108	0.111	−3.3552
0.6798	4.5322	−3.643
−3.2999	−3.8249	−0.9763
2.5992	3.9668	−1.2442

Table A4. Bias and weights of output layer of ID47 model.

Bias 2	Weights 2
0.3126	1.3675	−1.2193	−0.2508	−1.1435	1.1634	2.2346

References

1. Mardani, A.; Liao, H.; Nilashi, M.; Alrasheedi, M.; Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. 2020, 275, 122942.
2. Taconet, N.; Méjean, A.; Guivarch, C. Influence of climate change impacts and mitigation costs on inequality between countries. Clim. Change 2020, 160, 1–20.
3. Salari, M.; Javid, R.J.; Noghanibehambari, H. The nexus between CO₂ emissions, energy consumption, and economic growth in the U.S. Econ. Anal. Policy 2021, 69, 182–194.
4. Ghali, K.H.; El-Sakka, M.I.T. Energy use and output growth in Canada: A multivariate cointegration analysis. Energy Econ. 2004, 26, 225–238.
5. Espíndola, I.B.; Ribeiro, W.C. Cidades e mudanças climáticas: Desafios para os planos diretores municipais brasileiros. Cad. Metrópole 2020, 22, 365–396.
6. Lupi, V.; Marsiglio, S. Population growth and climate change: A dynamic integrated climate-economy-demography model. Ecol. Econ. 2021, 184, 107011.
7. IEA. International Energy Agency. Available online: https://www.iea.org/countries (accessed on 1 March 2021).
8. Niu, D.; Wang, K.; Wu, J.; Sun, L.; Liang, Y.; Xu, X.; Yang, X. Can China achieve its 2030 carbon emissions commitment? Scenario analysis based on an improved general regression neural network. J. Clean. Prod. 2020, 243, 118558.
9. Ali, S.; Ying, L.; Anjum, R.; Nazir, A.; Shalmani, A.; Shah, T.; Shah, F. Analysis on the nexus of CO₂ emissions, energy use, net domestic credit, and GDP in Pakistan: An ARDL bound testing analysis. Environ. Sci. Pollut. Res. 2021, 28, 4594–4614.
10. Li, Q.; Wu, S.; Lei, Y.; Li, S.; Li, L. Evolutionary path and driving forces of inter-industry transfer of CO₂ emissions in China: Evidence from structural path and decomposition analysis. Sci. Total Environ. 2020, 765, 142773.
11. Cui, H.; Wu, R.; Zhao, T. Sustainable Development Study on an Energy-Economic-Environment System Based on a Vector Autoregression Model in Shanxi, China. Pol. J. Environ. Stud. 2019, 28, 1623–1635.
12. Ummalla, M.; Samal, A. The impact of natural gas and renewable energy consumption on CO₂ emissions and economic growth in two major emerging market economies. Environ. Sci. Pollut. Res. 2019, 26, 20893–20907.
13. Gong, B.; Zheng, X.; Guo, Q.; Ordieres-Meré, J. Discovering the patterns of energy consumption, GDP, and CO₂ emissions in China using the cluster method. Energy 2019, 166, 1149–1167.
14. Zhao, H.; Huang, G.; Yan, N. Forecasting Energy-Related CO₂ Emissions Employing a Novel SSA-LSSVM Model: Considering Structural Factors in China. Energies 2018, 11, 781.
15. Yang, L.; Xia, H.; Zhang, X.; Yuan, S. What matters for carbon emissions in regional sectors? A China study of extended STIRPAT model. J. Clean. Prod. 2018, 180, 595–602.
16. Sun, C.; Zhang, F.; Xu, M. Investigation of pollution haven hypothesis for China: An ARDL approach with breakpoint unit root tests. J. Clean. Prod. 2017, 161, 153–164.
17. Riti, J.S.; Song, D.; Shu, Y.; Kamah, M. Decoupling CO₂ emission and economic growth in China: Is there consistency in estimation results in analyzing environmental Kuznets curve? J. Clean. Prod. 2017, 166, 1448–1461.
18. Song, J.; Zhang, K.; Cao, Z. 3Es System Optimization under Uncertainty Using Hybrid Intelligent Algorithm: A Fuzzy Chance-Constrained Programming Model. Sci. Program. 2016, 1–13.
19. Chang, N. Changing industrial structure to reduce carbon dioxide emissions: A Chinese application. J. Clean. Prod. 2015, 103, 40–48.
20. Gupta, D.; Ghersi, F.; Vishwanathan, S.S.; Garg, A. Achieving sustainable development in India along low carbon pathways: Macroeconomic assessment. World Dev. 2019, 123, 104623.
21. Akalpler, E.; Hove, S. Carbon emissions, energy use, real GDP per capita and trade matrix in the Indian economy-an ARDL approach. Energy 2019, 168, 1081–1093.
22. Baležentis, T.; Streimikiene, D.; Zhang, T.; Liobikiene, G. The role of bioenergy in greenhouse gas emission reduction in EU countries: An Environmental Kuznets Curve modelling. Resour. Conserv. Recycl. 2019, 142, 225–231.
23. Cucchiella, F.; D’Adamo, I.; Gastaldi, M.; Miliacca, M. Efficiency and allocation of emission allowances and energy consumption over more sustainable European economies. J. Clean. Prod. 2018, 182, 805–817.
24. Lazăr, D.; Minea, A.; Purcel, A.-A. Pollution and economic growth: Evidence from Central and Eastern European countries. Energy Econ. 2019, 81, 1121–1131.
25. Obradović, S.; Lojanica, N. Does environmental quality reflect on national competitiveness? The evidence from EU-15. Energy Environ. 2019, 30, 559–585.
26. SEEG. System for Estimating Greenhouse Gas Emissions (SEEG) Database. Available online: http://seeg.eco.br/ (accessed on 1 March 2021).
27. STATISTA. Statista Database. Available online: https://www.statista.com/statistics/270499/co2-emissions-in-selected-countries/ (accessed on 1 March 2021).
28. Azevedo-Ramos, C.; Moutinho, P. No man’s land in the Brazilian Amazon: Could undesignated public forests slow Amazon deforestation? Land Use Policy 2018, 73, 125–127.
29. Silva Junior, C.H.L.; Heinrich, V.H.A.; Freire, A.T.G.; Broggio, I.S.; Rosan, T.M.; Doblas, J.; Anderson, L.O.; Rousseau, G.X.; Shimabukuro, Y.E.; Silva, C.A.; et al. Benchmark maps of 33 years of secondary forest age for Brazil. Sci. Data 2020, 7, 269.
30. Carvalho, N.B.; Berrêdo Viana, D.; Muylaert de Araújo, M.S.; Lampreia, J.; Gomes, M.S.P.; Freitas, M.A.V. How likely is Brazil to achieve its NDC commitments in the energy sector? A review on Brazilian low-carbon energy perspectives. Renew. Sustain. Energy Rev. 2020, 133, 110343.
31. Barbosa, L.G.; Alves, M.A.S.; Grelle, C.E.V. Actions against sustainability: Dismantling of the environmental policies in Brazil. Land Use Policy 2021, 104, 105384.
32. Pereira, E.J.D.A.L.; de Santana Ribeiro, L.C.; da Silva Freitas, L.F.; de Barros Pereira, H.B. Brazilian policy and agribusiness damage the Amazon rainforest. Land Use Policy 2020, 92, 104491.
33. INPE. Queimadas. Available online: https://queimadas.dgi.inpe.br/queimadas/aq1km/ (accessed on 1 March 2021).
34. Leite, V.P.; Debone, D.; Miraglia, S.G.E.K. Emissões de gases de efeito estufa no estado de São Paulo: Análise do setor de transportes e impactos na saúde. VITTALLE Rev. Ciências Saúde 2020, 32, 143–153.
35. Cosmas, N.C.; Chitedze, I.; Mourad, K.A. An econometric analysis of the macroeconomic determinants of carbon dioxide emissions in Nigeria. Sci. Total Environ. 2019, 675, 313–324.
36. Jebli, M.B.; Farhani, S.; Guesmi, K. Renewable energy, CO₂ emissions and value added: Empirical evidence from countries with different income levels. Struct. Chang. Econ. Dyn. 2020, 53, 402–410.
37. Bayar, Y.; Sasmaz, M.U.; Ozkaya, M.H. Impact of Trade and Financial Globalization on Renewable Energy in EU Transition Economies: A Bootstrap Panel Granger Causality Test. Energies 2021, 14, 19.
38. Piłatowska, M.; Geise, A. Impact of Clean Energy on CO₂ Emissions and Economic Growth within the Phases of Renewables Diffusion in Selected European Countries. Energies 2021, 14, 812.
39. Zamil, A.M.A.; Furqan, M.; Mahmood, H. Trade openness and CO₂ emissions nexus in Oman. Entrep. Sustain. Issues 2019, 7, 1319–1329.
40. Leal, P.H.; Marques, A.C.; Fuinhas, J.A. How economic growth in Australia reacts to CO₂ emissions, fossil fuels and renewable energy consumption. Int. J. Energy Sect. Manag. 2018, 12, 696–713.
41. Miao, L.; Gu, H.; Zhang, X.; Zhen, W.; Wang, M. Factors causing regional differences in China’s residential CO₂ emissions—evidence from provincial data. J. Clean. Prod. 2019, 224, 852–863.
42. Zhang, S.; Zhao, T. Identifying major influencing factors of CO₂ emissions in China: Regional disparities analysis based on STIRPAT model from 1996 to 2015. Atmos. Environ. 2019, 207, 136–147.
43. Lohwasser, J.; Schaffer, A.; Brieden, A. The role of demographic and economic drivers on the environment in traditional and standardized STIRPAT analysis. Ecol. Econ. 2020, 178, 106811.
44. Polloni-Silva, E.; Silveira, N.; Ferraz, D.; de Mello, D.S.; Moralles, H.F. The drivers of energy-related CO₂ emissions in Brazil: A regional application of the STIRPAT model. Environ. Sci. Pollut. Res. 2021, 28, 51745–51762.
45. Ghazali, A.; Ali, G. Investigation of key contributors of CO₂ emissions in extended STIRPAT model for newly industrialized countries: A dynamic common correlated estimator (DCCE) approach. Energy Rep. 2019, 5, 242–252.
46. Ahmadi, M.H.; Jashnani, H.; Chau, K.-W.; Kumar, R.; Rosen, M.A. Carbon dioxide emissions prediction of five Middle Eastern countries using artificial neural networks. Energy Sources Part A Recover. Util. Environ. Eff. 2019, 1–13.
47. Acheampong, A.O.; Boateng, E.B. Modelling carbon emission intensity: Application of artificial neural network. J. Clean. Prod. 2019, 225, 833–856.
48. Guo, D.; Chen, H.; Long, R. Can China fulfill its commitment to reducing carbon dioxide emissions in the Paris Agreement? Analysis based on a back-propagation neural network. Environ. Sci. Pollut. Res. 2018, 25, 27451–27462.
49. Hosseini, S.M.; Saifoddin, A.; Shirmohammadi, R.; Aslani, A. Forecasting of CO₂ emissions in Iran based on time series and regression analysis. Energy Rep. 2019, 5, 619–631.
50. Bakay, M.S.; Ağbulut, Ü. Electricity production based forecasting of greenhouse gas emissions in Turkey with deep learning, support vector machine and artificial neural network algorithms. J. Clean. Prod. 2021, 285, 125324.
51. Bamisile, O.; Obiora, S.; Huang, Q.; Yimen, N.; Abdelkhalikh Idriss, I.; Cai, D.; Dagbasi, M. Impact of economic development on CO₂ emission in Africa; the role of BEVs and hydrogen production in renewable energy integration. Int. J. Hydrogen Energy 2021, 46, 2755–2773.
52. Debone, D.; Leite, V.P.; Miraglia, S.G.E.K. Modelling approach for carbon emissions, energy consumption and economic growth: A systematic review. Urban Clim. 2021, 37, 100849.
53. Tadano, Y.S.; Potgieter-Vermaak, S.; Kachba, Y.R.; Chiroli, D.M.G.; Casacio, L.; Santos-Silva, J.C.; Moreira, C.A.B.; Machado, V.; Alves, T.A.; Siqueira, H.; et al. Dynamic model to predict the association between air quality, COVID-19 cases, and level of lockdown. Environ. Pollut. 2020, 268, 115920.
54. Araujo, L.N.; Belotti, J.T.; Alves, T.A.; de Souza Tadano, Y.; Siqueira, H. Ensemble method based on Artificial Neural Networks to estimate air pollution health risks. Environ. Model. Softw. 2020, 123, 104567.
55. Kachba, Y.; Chiroli, D.M.D.G.; Belotti, J.T.; Antonini Alves, T.; de Souza Tadano, Y.; Siqueira, H. Artificial Neural Networks to Estimate the Influence of Vehicular Emission Variables on Morbidity and Mortality in the Largest Metropolis in South America. Sustainability 2020, 12, 2621.
56. BEESP. Energy balance of the state of São Paulo 2019/Secretariat of Infrastructure and Environment. Available online: http://dadosenergeticos.energia.sp.gov.br/ (accessed on 1 March 2021).
57. SEADE. State Data Analysis System. Available online: https://www.seade.gov.br/institucional/ (accessed on 1 March 2021).
58. IPEA. Câmbio—Ipeadata. Available online: http://www.ipeadata.gov.br (accessed on 1 March 2021).
59. Kunimoto, S.Y.; Boson, D.S.; De Oliveira, M.A.C.; Mendes, D.R.F. Economic development’s impact on CO₂ emissions: An application of the kuznets environmental curve for mato grosso do sul. Veredas Direito 2018, 15, 321–345.
60. Ratanavaraha, V.; Jomnonkwao, S. Trends in Thailand CO₂ emissions in the transportation sector and Policy Mitigation. Transp. Policy 2015, 41, 136–146.
61. Ardakani, M.K.; Seyedaliakbar, S.M. Impact of energy consumption and economic growth on CO₂ emission using multivariate regression. Energy Strateg. Rev. 2019, 26, 100428.
62. Lee, J.; Kwon, H.-B. Progressive performance modeling for the strategic determinants of market value in the high-tech oriented SMEs. Int. J. Prod. Econ. 2017, 183, 91–102.
63. Midi, H.; Sarkar, S.K.; Rana, S. Collinearity diagnostics of binary logistic regression model. J. Interdiscip. Math. 2010, 13, 253–267.
64. Lim, H.; Ahn, S.; Lee, K.S.; Han, J.; Shim, Y.M.; Woo, S.; Kim, J.-H.; Yie, M.; Lee, H.Y.; Yi, C.A. Persistent Pure Ground-Glass Opacity Lung Nodules ≥ 10 mm in Diameter at CT Scan. Chest 2013, 144, 1291–1299.
65. Al-Musharaf, S. Prevalence and Predictors of Emotional Eating among Healthy Young Saudi Women during the COVID-19 Pandemic. Nutrients 2020, 12, 2923.
66. Fox, J.; Friendly, G.G.; Graves, S.; Heiberger, R.; Monette, G.; Nilsson, H.; Ripley, B.; Weisberg, S.; Fox, M.J.; Suggests, M. The car package. R Found. Stat. Comput. 2007. Available online: http://ftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/car.pdf (accessed on 10 December 2021).
67. Pao, H.-T.; Tsai, C.-M. Multivariate Granger causality between CO₂ emissions, energy consumption, FDI (foreign direct investment) and GDP (gross domestic product): Evidence from a panel of BRIC (Brazil, Russian Federation, India, and China) countries. Energy 2011, 36, 685–693.
68. Begum, R.A.; Sohag, K.; Abdullah, S.M.S.; Jaafar, M. CO₂ emissions, energy consumption, economic and population growth in Malaysia. Renew. Sustain. Energy Rev. 2015, 41, 594–601.
69. Kuhn, M.; Wickham, H. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. 2020. Available online: https://www.tidymodels.org (accessed on 1 March 2021).
70. Khusfi, E.Z.; Roustaei, F.; Ebrahimi Khusfi, M.; Naghavi, S. Investigation of the relationship between dust storm index, climatic parameters, and normalized difference vegetation index using the ridge regression method in arid regions of Central Iran. Arid Land Res. Manag. 2020, 34, 239–263.
71. Shahid, N.; Shah, M.A.; Khan, A.; Maple, C.; Jeon, G. Towards Greener Smart Cities and Road Traffic Forecasting Using Air Pollution Data. Sustain. Cities Soc. 2021, 72, 103062.
72. Melkumova, L.E.; Shatskikh, S.Y. Comparing Ridge and LASSO estimators for data analysis. Procedia Eng. 2017, 201, 746–755.
73. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. Shrinkage Methods. In An Introduction to Statistical Learning; Springer Texts in Statistics; Springer International Publishing: Cham, Switzerland, 2013; Volume 103, pp. 214–222. ISBN 978-1-4614-7137-0.
74. Shahbazi, H.; Karimi, S.; Hosseini, V.; Yazgi, D.; Torbatian, S. A novel regression imputation framework for Tehran air pollution monitoring network using outputs from WRF and CAMx models. Atmos. Environ. 2018, 187, 24–33.
75. Al-Jawarneh, A.S.; Ismail, M.T.; Awajan, A.M.; Alsayed, A.R.M. Improving accuracy models using elastic net regression approach based on empirical mode decomposition. Commun. Stat. Simul. Comput. 2020, 1–20.
76. Cho, S.; Kim, H.; Oh, S.; Kim, K.; Park, T. Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proc. 2009, 3, S25.
77. Stetco, A.; Dinmohammadi, F.; Zhao, X.; Robu, V.; Flynn, D.; Barnes, M.; Keane, J.; Nenadic, G. Machine learning methods for wind turbine condition monitoring: A review. Renew. Energy 2019, 133, 620–635.
78. Romano, A.V.C.; Martins, T.D.; Maciel, R.; De Paula, E.V.; Annichino-Bizzacchi, J.M. Artificial Neural Network for Prediction of Venous Thrombosis Recurrence. Blood 2016, 128, 3771.
79. Mosavi, A.; Ardabili, S.F.; Shamshirband, S. Demand Prediction with Machine Learning Models; State of the Art and a Systematic Review of Advances. 2019, 1–21.
80. Haykin, S. Neural Networks—A Comprehensive Foundation, 2nd ed.; Prentice Hall PTR: Upper Saddle River/Hoboken, NJ, USA, 2004; ISBN 978-0-13-273350-2.
81. Heidari, E.; Sobati, M.A.; Movahedirad, S. Accurate prediction of nanofluid viscosity using a multilayer perceptron artificial neural network (MLP-ANN). Chemom. Intell. Lab. Syst. 2016, 155, 73–85.
82. Melo, E.B.; Oliveira, E.T.; Martins, T.D. A neural network correlation for molar density and specific heat of water: Predictions at pressures up to 100 MPa. Fluid Phase Equilib. 2020, 506, 112411.
83. CETESB. QUALAR. Sistema de Informações da Qualidade do Ar. Available online: https://qualar.cetesb.sp.gov.br/qualar/home.do (accessed on 1 March 2021).
84. Brazilian Institute of Statistics and Geography. Estimativa da População—Diretoria de Pesquisas, Coordenação de População e Indicadores Sociais. Available online: https://www.ibge.gov.br/cidades-e-estados.html (accessed on 1 March 2021).
85. Barbosa, F.D.H. A crise econômica de 2014/2017. Estud. Avançados 2017, 31, 51–60.
86. Cherni, A.; Jouini, S.E. An ARDL approach to the CO₂ emissions, renewable energy and economic growth nexus: Tunisian evidence. Int. J. Hydrogen Energy 2017, 42, 29056–29066.
87. Bai, Y.; Deng, X.; Gibson, J.; Zhao, Z.; Xu, H. How does urbanization affect residential CO₂ emissions? An analysis on urban agglomerations of China. J. Clean. Prod. 2019, 209, 876–885.
88. Nardoto, G.B.; Sena-Souza, J.P.; Kisaka, T.B.; Costa, F.J.V.; Duarte-Neto, P.J.; Ehleringer, J.; Martinelli, L.A. Increased in carbon isotope ratios of Brazilian fingernails are correlated with increased in socioeconomic status. NPJ Sci. Food 2020, 4, 9.
89. Cruz, T.; Schaeffer, R.; Lucena, A.F.P.; Melo, S.; Dutra, R. Solar water heating technical-economic potential in the household sector in Brazil. Renew. Energy 2020, 146, 1618–1639.
90. Zaidi, I. Examining the relationship between economic growth, energy consumption and CO₂ emission using inverse function regression. Appl. Ecol. Environ. Res. 2017, 15, 473–484.
91. Harrell, F.E. Regression Modeling Strategies, 2nd ed.; Springer Series in Statistics; Springer International Publishing: Cham, Switzerland, 2015; Volume 3, ISBN 978-3-319-19424-0.
92. Keith, T.Z. Multiple Regression and Beyond: An Introduction to Multiple Regression and Structural Equation Modeling, 3rd ed.; Routledge: London, UK, 2019; ISBN 1315162342.
93. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42.
94. Abu Sheha, M.A.; Tsokos, C.P. Statistical Modeling of Emission Factors of Fossil Fuels Contributing to Atmospheric Carbon Dioxide in Africa. Atmos. Clim. Sci. 2019, 9, 438–455.
95. Ma, J.; Cheng, J.C.P. Estimation of the building energy use intensity in the urban scale by integrating GIS and big data technology. Appl. Energy 2016, 183, 182–192.
96. Briesch, R.; Rajagopal, P. Neural network applications in consumer behavior. J. Consum. Psychol. 2010, 20, 381–389.
97. Larasati, A.; DeYong, C.; Slevitch, L. Comparing Neural Network and Ordinal Logistic Regression to Analyze Attitude Responses. Serv. Sci. 2011, 3, 304–312.
98. Alkinani, H.H.; Al-Hameedi, A.T.T.; Dunn-Norman, S.; Lian, D. Application of artificial neural networks in the drilling processes: Can equivalent circulation density be estimated prior to drilling? Egypt. J. Pet. 2020, 29, 121–126.
99. Rezaei, M.H.; Sadeghzadeh, M.; Alhuyi Nazari, M.; Ahmadi, M.H.; Astaraei, F.R. Applying GMDH artificial neural network in modeling CO₂ emissions in four nordic countries. Int. J. Low-Carbon Technol. 2018, 13, 266–271.
100. Xu, G.; Schwarz, P.; Yang, H. Determining China’s CO₂ emissions peak with a dynamic nonlinear artificial neural network approach and scenario analysis. Energy Policy 2019, 128, 752–762.
101. Magazzino, C.; Mele, M.; Schneider, N. A machine learning approach on the relationship among solar and wind energy production, coal consumption, GDP, and CO₂ emissions. Renew. Energy 2021, 167, 99–115.
102. Magazzino, C.; Mele, M.; Sarkodie, S.A. The nexus between COVID-19 deaths, air pollution and economic growth in New York state: Evidence from Deep Machine Learning. J. Environ. Manag. 2021, 286, 112241.
103. D’Adamo, I.; Rosa, P. How Do You See Infrastructure? Green Energy to Provide Economic Growth after COVID-19. Sustainability 2020, 12, 4738.
104. Debone, D.; da Costa, M.V.; Miraglia, S.G.E.K. 90 Days of COVID-19 Social Distancing and Its Impacts on Air Quality and Health in Sao Paulo, Brazil. Sustainability 2020, 12, 7440.
105. Giudice, F.; Caferra, R.; Morone, P. COVID-19, the Food System and the Circular Economy: Challenges and Opportunities. Sustainability 2020, 12, 7939.

Figure 1. The study design scheme.

Figure 2. Schematic representation of ANN model with one hidden layer.

Figure 3. Descriptive analysis of the variables. (A) GDP (green line), CO₂ (black line) and EN (red line) time series. (B) Pearson’s correlation analysis. Significance level: *** p < 0.001.

Figure 4. Elastic-net regression model outcomes. Importance level of the significant variables.

Figure 5. Cross-validation metrics of the best MLP models. (A) R-squared values of the train and validation phases. (B) MAPE values of the train and validation phases. Boxplots with gray borders refer to the train phase and those with black borders to the validation phase. * indicates a significant difference between models 47 and 98; # indicates a significant difference between models 65 and 98; x marks the mean of the values. Data were analyzed by one-way ANOVA followed by the Bonferroni post-test. Significance levels: * or # p < 0.05; ** or ## p < 0.01.

Figure 6. CO₂ predictions using elastic-net regression (dotted blue line), ID3 MLR (dotted green line) and ID47 MLP neural network (dotted red line) models. The black line represents the original dataset of CO₂ emissions.

Figure 7. APE distribution for each model. The ID3 MLR model, the elastic-net regression, and the ID47 MLP neural network are represented by green, blue, and red bars, respectively.

Table 1. Summary of the different trained architectures.

Performed Tests
Structures	Number of neurons in the hidden layer varied from 1 to 20
Algorithms	Levenberg–Marquardt (trainlm) and Scaled Conjugate Gradient (trainscg)
Activation functions	linear (purelin), hyperbolic tangent sigmoid (tansig) and logarithmic sigmoid (logsig)
Total trained architectures	156

Table 2. MLR model analysis.

ID	Predictors	β₀ (intercept)	β₁ (GDP)	β₂ (GDP²)	β₃ (EN)	β₄ (EN²)	Adjusted-R²	MAPE (%)
1	GDP + EN	9.28 × 10⁷ **	5.75 × 10⁻⁵ **	-	−5.32 × 10⁻¹	-	0.68	5.50
2	GDP² + EN²	7.87 × 10⁷ ***	-	4.58 × 10⁻¹⁷ **	-	−1.16 × 10⁻⁹	0.75	4.79
3	GDP + EN + EN²	4.84 × 10⁸ ***	4.59 × 10⁻⁵ ***	-	−1.36 × 10 ***	1.08 × 10⁻⁷ ***	0.91	2.68
4	GDP + GDP² + EN	4.83 × 10⁷ *	−1.36 × 10⁻⁴ *	1.41 × 10⁻¹⁶ **	1.02	-	0.81	4.01
5	GDP + GDP² + EN + EN²	4.43 × 10⁸ ***	1.81 × 10⁻⁵	2.10 × 10⁻¹⁷	−1.22 × 10 **	9.85 × 10⁻⁸ **	0.90	2.66

CO₂ as the outcome variable. Significance level: * p < 0.05, ** p < 0.01 and *** p < 0.001.
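
As an illustration of how a model with the predictor set of ID3 (GDP + EN + EN²) can be fitted and scored, the following is a minimal Python sketch, not the authors' implementation. It assumes ordinary least squares via NumPy; the coefficient order (intercept, GDP, EN, EN²) is inferred from the rows of Table 2, and the data, the coefficient values, and the function names are hypothetical placeholders rather than the study's dataset.

```python
import numpy as np

def design_matrix(gdp, en):
    """Columns: intercept, GDP, EN, EN^2 (the predictor set of model ID3)."""
    gdp = np.asarray(gdp, dtype=float)
    en = np.asarray(en, dtype=float)
    return np.column_stack([np.ones_like(gdp), gdp, en, en ** 2])

def fit_mlr(gdp, en, co2):
    """Ordinary least-squares estimate of the four coefficients."""
    X = design_matrix(gdp, en)
    beta, *_ = np.linalg.lstsq(X, np.asarray(co2, dtype=float), rcond=None)
    return beta

def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = y.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical, noise-free synthetic data (arbitrary units, NOT the paper's data).
rng = np.random.default_rng(0)
gdp = rng.uniform(1.0, 3.0, size=30)
en = rng.uniform(4.0, 8.0, size=30)
true_beta = np.array([50.0, 4.0, -6.0, 0.5])  # arbitrary illustrative values
co2 = design_matrix(gdp, en) @ true_beta

beta = fit_mlr(gdp, en, co2)
adj_r2 = adjusted_r2(co2, design_matrix(gdp, en) @ beta, n_predictors=3)
print(beta, adj_r2)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients and the adjusted R² is essentially 1. With real data the raw GDP and EN magnitudes differ by many orders, so rescaling the predictors before fitting would likely be advisable.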

Table 3. Metrics of the elastic-net regression model.

Cross Validation (Train)		Test		Predictions
RMSE	R²	RMSE	R²	RMSE	MAPE	R²
4.44 (±0.78)	0.913 (±0.07)	4.65	0.868	2.95	2.67%	0.923

RMSE expressed as millions of tons of CO₂.

Table 4. Metrics of the three best MLP neural network models.

Models’ characteristics				Train			Validation			Test
#ID	Algorithm	Structure	AF	R²	MAPE (%)	OF	R²	MAPE (%)	OF	R²	MAPE (%)	OF
47	trainscg	2-6-1	tansig; purelin	0.988	1.05	3.09 × 10¹²	1.000	0.23	6.98 × 10¹⁰	1.000	0.04	1.76 × 10⁹
65	trainscg	2-8-1	tansig; purelin	0.987	1.18	3.21 × 10¹²	0.997	0.48	2.06 × 10¹¹	0.998	0.74	5.00 × 10¹¹
98	trainlm	2-15-1	logsig; purelin	0.960	1.97	8.56 × 10¹²	0.999	2.46	7.32 × 10¹²	0.999	0.37	1.10 × 10¹¹

AF: activation function; OF: objective function; Structure: number of neurons in the input layer—number of neurons in the hidden layer—number of neurons in the output layer.
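
To make the structure notation concrete, the sketch below shows a forward pass through a 2-6-1 network with a tansig hidden layer and a purelin output layer, as listed for model ID47. It is a hedged illustration only: the weights are random placeholders rather than the trained parameters, the two inputs are assumed to be scaled GDP and EN values, and `mlp_forward` is a hypothetical helper name. The tansig function is written as the hyperbolic tangent, to which it is mathematically equivalent.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, i.e. tanh(x)."""
    return np.tanh(x)

def purelin(x):
    """Linear (identity) activation."""
    return x

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: inputs -> tansig hidden layer -> purelin output."""
    hidden = tansig(x @ W1 + b1)
    return purelin(hidden @ W2 + b2)

# Random placeholder parameters for a 2-6-1 structure (NOT the trained model).
rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 6))   # input layer (2) -> hidden layer (6)
b1 = rng.normal(size=6)
W2 = rng.normal(size=(6, 1))   # hidden layer (6) -> output layer (1)
b2 = rng.normal(size=1)

# Two hypothetical samples of scaled inputs, assumed to be (GDP, EN).
x = np.array([[0.2, -0.5],
              [1.0, 0.3]])
y = mlp_forward(x, W1, b1, W2, b2)

# Trainable parameters of a 2-6-1 network: 2*6 + 6 + 6*1 + 1
n_params = W1.size + b1.size + W2.size + b2.size
print(y.shape, n_params)
```

A training algorithm such as trainscg or trainlm would adjust `W1`, `b1`, `W2`, and `b2` to minimize the objective function; that step is omitted here.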

Table 5. Metrics of CO₂ predictions of the analyzed models.

	R²	APE (Max)	APE (Median)	MAPE (%)
ID47 MLP ANN	1.00	5.48	0.13	0.76 ***
Elastic-net Regression	0.92	9.34	1.74	2.67
ID3 MLR model	0.91	9.63	1.34	2.68

MAPE values were analyzed by Mann–Whitney test. Significance level: *** p < 0.001.
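
The error summaries reported in Table 5 can be computed as in the following minimal Python sketch. It assumes the usual definition of the absolute percentage error, APE = 100 × |actual − predicted| / |actual|, with MAPE as its mean; the numbers and the function names are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def ape(actual, predicted):
    """Absolute percentage error of each observation, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.abs((actual - predicted) / actual)

def summarize(actual, predicted):
    """Maximum APE, median APE, and MAPE (mean APE), all in percent."""
    errors = ape(actual, predicted)
    return {
        "ape_max": float(errors.max()),
        "ape_median": float(np.median(errors)),
        "mape": float(errors.mean()),
    }

# Hypothetical values (NOT the paper's data); the APEs are about 10, 5, 0 and 5 percent.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [110.0, 190.0, 400.0, 475.0]
summary = summarize(actual, predicted)
print(summary)
```

Reporting the median alongside the mean is informative when the error distribution is skewed. In Table 5 the ID47 MLP has a median APE of 0.13 but a MAPE of 0.76, which is consistent with a few larger errors pulling the mean above the median.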

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

