Modeling Sulphur Dioxide (SO2) Quality Levels of Jeddah City Using Machine Learning Approaches with Meteorological and Chemical Factors

Alamoudi, Mohammed; Taylan, Osman; Keshtegar, Behrooz; Abusurrah, Mona; Balubaid, Mohammed

doi:10.3390/su142316291


by Mohammed Alamoudi ¹,*, Osman Taylan ¹,², Behrooz Keshtegar ³, Mona Abusurrah ⁴ and Mohammed Balubaid ¹

¹ Department of Industrial Engineering, Faculty of Engineering, King Abdulaziz University, P.O. Box 80204, Jeddah 21589, Saudi Arabia

² Department of Industrial Engineering, OSTIM Technical University, Ankara 06374, Türkiye

³ School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

⁴ Department of Management Information Systems, College of Business Administration, Taibah University, P.O. Box 344, Al-Madinah 42353, Saudi Arabia

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(23), 16291; https://doi.org/10.3390/su142316291

Submission received: 11 October 2022 / Revised: 12 November 2022 / Accepted: 2 December 2022 / Published: 6 December 2022


Abstract:

Modeling air quality in city centers is essential because of its environmental and health implications. In this study, machine learning (ML) approaches were used to approximate the impact of air pollutants and meteorological parameters on SO₂ quality levels. The parameters NO, NO₂, O₃, PM10, RH, HyC, T, and P are significant factors affecting air pollution in Jeddah city. These factors were used as the input parameters of the ANN, MARS, SVR, and hybrid models to determine their effect on the SO₂ quality level. ANN was employed to approximate the nonlinear relation between SO₂ and the input parameters. MARS, an ML tool with successful applications in air pollution prediction, was also employed. SVR was used as a nonlinear modeling tool to predict the SO₂ quality level. Furthermore, the MARS and SVR approaches were integrated into a novel hybrid modeling scheme that provides a nonlinear approximation of SO₂ concentration. The main innovation of this hybrid approach is an efficient prediction of SO₂ quality levels that reduces the time-consuming calibration process. Four comparative statistical measures, MAE, RMSE, NSE, and d, were applied to measure accuracy and tendency. The hybrid SVR model outperforms the other models, with the lowest RMSE and MAE and the highest d and NSE in both the training and testing processes.

Keywords:

air quality; sulphur dioxide pollution; machine learning; artificial neural networks (ANN); support vector regression (SVR); multivariate adaptive regression spline (MARS); environmental conditions

1. Introduction

Worldwide, air pollution is considered a significant danger arising from sources such as soil, traffic, suspended particles, and industrial activities. Air pollution generates long-term adverse effects, such as acid rain, global warming, and other risks to the environment and human health [1]. Air pollutants consist of gaseous pollutants and particulate matter suspended in the air [2]. Although the concentration of these pollutants has been regulated internationally, they still endanger human and environmental health [3]. One of the most critical gaseous pollutants is sulphur dioxide (SO₂), a toxic gas emitted by fossil fuel combustion [4]. In urban areas, SO₂ originates from thermal power plants and motor vehicles [5]. Jeddah is the second major city in Saudi Arabia, with a population of almost four million people. It has experienced fast growth in population, urbanization, and industrialization during the last few years [6]. It is an important commercial city in Saudi Arabia and serves as the main gate for Muslim pilgrims traveling to Mecca. Despite the city's cultural and economic importance, information about its air pollution remains scarce. Several researchers found that high exposure to polluted air is related to mortality risks [7,8]. Worldwide awareness of air pollution has been increasing because of its great risk to the environment. According to the World Health Organization (WHO), the concentration of air pollution has increased by 8% each year during the previous five years [9,10].

As a result, researchers presented several forecasting models that estimate the level of air pollutants to foresee the threat before it occurs. The demand for such models has increased significantly, particularly with the growing demand for early warning systems which will assist in taking immediate and preventive steps to minimize pollution when circumstances producing air pollutants are foreseen [11]. Moreover, forecasting the level of SO₂ concentration will help identify the areas that exceed the standards, set measures to meet the standards, and predict the economic and health consequences.

As an ML approach, ANNs can be used to approximate the nonlinear relation between SO₂ and the input variables. The multilayer perceptron neural network (MLPNN) has been applied successfully to provide nonlinear relations for predicting air pollution. In this approach, all input variables are placed in the input layer, while the output layer has one node that predicts SO₂. Hence, the MLPNN is a nonlinear mapping process that is employed as an ML approach to set the relations between the input variables and the output response. On the other hand, pollution forecasting is a time series prediction problem for which several models are used, among which support vector regression (SVR) and multivariate adaptive regression spline (MARS) are well-known modeling techniques in atmospheric science because of their statistical accuracy [12,13]. SVR is one of the most popular and widely used ML algorithms for regression analysis, and the underlying support vector approach is also used for classification problems. This algorithm recognizes nonlinearity in the data and provides an appropriate forecasting model. Researchers found that the SVR model encapsulates the key idea of statistical learning theory and thereby provides effective forecasting of air pollutants in urban regions [11,13]. Moreover, a recent study used SVR to forecast the level of CO in the atmosphere [14] and concluded that SVR had less uncertainty in forecasting the pollutant's quality level than any of the other models. Additionally, SVR was found to be a superior tool for predicting the level of air quality [15]. Several researchers also found that SVR and related hybrid models were appropriate scientific tools for forecasting future time series trends such as air pollution [16,17,18].

Similarly, MARS is a regression analysis tool. In particular, it is a non-parametric regression technique with multivariate and nonlinear properties that offers an acceptable way to select effective input variables when predicting a complex nonlinear response such as SO₂, and it can model nonlinearities and interactions between variables. Non-parametric regression is a type of regression analysis in which the predictor does not take a predetermined form but is built from the data. That is, a nonparametric form is assumed for the relations between independent and dependent variables. Nonparametric regression requires larger sample sizes than parametric regression because the data must supply both the model structure and the model estimates. In the MARS modeling process, the degree of effect of the different variables can be evaluated as a sensitivity analysis [19,20]. MARS is used to determine the input variables of nonlinear prediction models [21]. The algorithm finds a set of simple linear functions and aggregates them to achieve the best predictive performance. MARS is thus a kind of ensemble of simple linear functions that can perform well on difficult regression problems with many input variables and nonlinear relationships.

MARS and SVR together have been applied to forecast complex data, such as stock prices [22], stock indices [23], electricity demand [17], and short-term wind time series [24]. The MARS technique can be used to determine the best input variables for the SVR model, which then serves as the nonlinear prediction function. The main contribution of this paper is an accurate prediction of SO₂ levels using nonlinear predictors such as ANN, SVR, and MARS. Recently, ANN has been used to approximate complex nonlinear relations [25,26]. Selecting the best input variables from the meteorological and chemical data points remains a gap in evaluating SVR for this complex nonlinear problem. Consequently, MARS is combined with SVR in the current work as a novel modeling approach for the accurate prediction of SO₂ as an air quality parameter: MARS supplies the best input variables as SVR candidates, and the SVR model is then built on them. The performance of the proposed hybrid SVR-MARS for the prediction of SO₂ is compared using different statistical tools, including the root mean square error for evaluating accuracy, the agreement index for evaluating tendency, and the standard deviation of errors for evaluating uncertainty. A recent study developed an adaptive neuro-fuzzy inference system (ANFIS) approach to determine the impact of important atmospheric conditions on air quality and the problems arising from the ozone level and suspended PM10 concentration in the city of Jeddah [27]. Artificial intelligence approaches have been used successfully for nonlinear modeling problems [28]. The ANFIS and ANN methods were applied to approximate the effect of specific environmental conditions on air quality and SO₂ pollution levels in the city of Konya [29].

2. Description of Data for SO₂ Estimation

Since the level of SO₂ can be affected by climatic conditions that stimulate chemical reactions and produce a set of harmful pollutants, there is a need to address the association between the concentration of air pollutants and meteorological parameters [30]. Meteorological conditions, such as atmospheric pressure (P), relative humidity (RH), and temperature (T), play a significant role in the concentration of air pollutants [31]. For example, the concentration of air pollutants was inversely correlated with wind speed and relative humidity but positively correlated with atmospheric pressure, implying that as atmospheric pressure rises, so does the concentration of air pollutants [1,16,32,33]. Therefore, the meteorological conditions P, RH, and T were included in the input variables. Additionally, to approximate the impact of air pollutants on the SO₂ level, the chemical air properties NO, NO₂, CO, H₂S, O₃, HyC, MHC, NMHC, and PM10 are included in the input variables. For the input variables, the maximum level (Xmax), minimum level (Xmin), average level (mean), standard deviation (STD), and coefficient of variation (COV) are reported in Table 1. Sulphur dioxide (SO₂), the output of the modeling process, varied from 0 to 283 μg/m³ in the training and testing databases, and the STD of the testing data is slightly larger than that of the training data. The meteorological variables have the lowest STD, while the chemical variables show significant uncertainties with high COV (e.g., NO, H₂S, PM10, and NMHC). This means that the physical air conditions are stable, while the chemical properties of the air vary over a large domain with the most scattered data. It can therefore be concluded that almost all chemical air variables may follow a non-normal distribution for the studied data points.

The bar diagrams for SO₂ for the training and testing data points are plotted in Figure 1A and Figure 1B, respectively. Most of the data fall in the range of 12.5–240 μg/m³ for the training data and 20–240 μg/m³ for the testing data. For both the training and testing phases, the distributions are skewed and asymmetric; thus, the studied SO₂ database follows a non-normal distribution, which may be related to a highly nonlinear relationship between the input variables and the predicted variable, SO₂.

3. Methods and Modeling Procedures

Three well-known nonlinear approaches are presented for the approximation of SO₂. These models have a strong ability to predict nonlinear relations, and they serve as benchmarks for the tendency, accuracy, uncertainty, and agreement of the novel hybrid model applied in this study.

3.1. Artificial Neural Network (ANN)

The ANN is implemented to approximate the nonlinear relation between the output variable (SO₂) and the input variables, which include chemical and meteorological air properties. The multilayer perceptron ANN model (MLPNN) is commonly and successfully applied to provide a strongly nonlinear relation for predicting air pollution [34,35,36]. Generally, the MLPNN is made up of three layers: input, hidden, and output. Each layer contains several elements called neurons (nodes). All input variables are placed in the input layer, while the output layer has one node, which is the measured SO₂. Since the variables have various units, the data in the input and output layer nodes were normalized using the Min/Max method, which removes the effect of the units and improves the predictor performance by mapping the values to the range −1 to 1 [37]. The schematic view of the three-layer MLPNN framework is shown in Figure 2, in which the ANN model generates a nonlinear map between the input and output nodes. As the MLPNN is a nonlinear mapping ML tool, M nodes are used in the hidden layer, and the hidden nodes connect the input data to the nonlinear response at the output node. The number of neurons in the hidden layer is varied from 5 to 15 by trial and error to obtain the lowest mean square error (MSE) between the observed (O) and predicted (Y) SO₂ for the N training data points, computed using Equation (1):

MSE = \frac{1}{N} \sum_{i = 1}^{N} {[O_{i} - Y_{i}]}^{2}

(1)

In the MLPNN, the approximated function for predictions of SO₂ is determined based on the following relations:

Y = b + \sum_{j = 1}^{M} w_{j} φ_{j}

(2)

where b is the bias and w_j denotes the weights of the output layer, through which the M hidden nodes are linked to the output node. The term φ_j is the response of the jth hidden node, which is determined by a nonlinear sigmoid function. The nonlinear relation in the hidden nodes is given as follows:

φ_{j} = \frac{1}{1 + \exp [- (b_{j} + \sum_{i = 1}^{n} w_{ji} x_{i})]}

(3)

where b_j is the bias of the jth hidden node and w_ji denotes the weight of the connection between the jth hidden node and the ith node of the input layer, which has n input nodes. Thus, the sigmoid function and the linear function are used as the activation functions of the hidden and output nodes, respectively, in the current study.
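As an illustration of Equations (2) and (3), the forward pass of a one-hidden-layer MLPNN can be sketched in a few lines of plain Python. This is only a minimal sketch: the function names and the toy weights are hypothetical, and the study itself used the MATLAB ANN toolbox rather than this code.

```python
import math


def sigmoid(z):
    """Logistic sigmoid used for the hidden nodes, as in Equation (3)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_predict(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with a linear output node.

    x              : list of n input values
    hidden_weights : M lists of n weights (w_ji)
    hidden_biases  : M biases (b_j)
    output_weights : M weights (w_j)
    output_bias    : scalar bias (b)
    """
    # Equation (3): response of each hidden node j
    phi = [
        sigmoid(b_j + sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)))
        for w_j, b_j in zip(hidden_weights, hidden_biases)
    ]
    # Equation (2): linear combination of the hidden responses
    return output_bias + sum(w_j * p_j for w_j, p_j in zip(output_weights, phi))


# Toy example with n = 2 inputs and M = 2 hidden nodes (all values hypothetical)
x = [0.5, -0.25]
hidden_weights = [[0.0, 0.0], [1.0, 2.0]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, 4.0]
output_bias = 1.0
y = mlp_predict(x, hidden_weights, hidden_biases, output_weights, output_bias)
```

In this toy case both hidden nodes receive a zero pre-activation, so each returns 0.5 and the output reduces to the bias plus half the sum of the output weights.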

To generate the optimal linkages between the input and output layers, the learning approach that finds the optimal weights (w) and biases (b) is crucial for accurate prediction. A training algorithm is used to obtain acceptable weights and biases. The backpropagation (BP) method is a well-known training approach for determining the relationship between the input and output parameters. Gradient-based optimization methods can be used for BP in the training phase because they are computationally efficient. In this study, the gradient-based Levenberg-Marquardt (LM) algorithm was applied in the learning phase of the MLPNN model, and the ANN model was computed with the MATLAB ANN toolbox [38].

3.2. Multivariate Adaptive Regression Spline (MARS)

In statistics, MARS is considered a nonlinear nonparametric regression analysis tool as well as a flexible nonlinear ML approach that connects input and output variables [39]. For models with high-dimensional inputs, MARS is a popular nonparametric modeling technique that provides a nonlinear map using splines. Recently, MARS has had successful applications as an ML tool in complex air pollution predictions, including atmospheric particulate matter (PM10) [40], O₃ [41], SO₂ [42], NO₂ [43], and benzene concentration [44]. MARS builds a nonlinear relationship from piecewise linear spline basis functions (BF) using the following equation:

Y = b_{0} + \sum_{i = 1}^{m} w_{i} {BF}_{i}

(4)

where b_0 represents the bias and w_i denotes the weights, which are unknown coefficients that connect the m basis functions (BF) to the complex response SO₂ and are determined by the least-squares method. The effective BF_i are found using a forward–backward stepwise scheme, and each is calculated from the piecewise linear function shown below:

{BF}_{i} = \{\max (0, x - C_{i}), \max (0, C_{i} - x)\}

(5)

where C_i is a knot, which is a constant coefficient. The piecewise linear basis function is defined by its knot C, as presented in Figure 3. The term “max” denotes that only the positive part of the BF is used in the modeling process, as plotted in Figure 4. The BFs of the MARS model are generated by a stepwise search over possible univariate knots and interactions between all variables. The MARS algorithm involves a forward and a backward phase. The forward phase searches for candidate knots at positions within the range of each input variable, and each knot defines a pair of BFs. This selection of BFs is repeated until the maximum number of terms is reached, which yields an overfitted model. The backward phase eliminates the BFs that have no meaningful impact on the model. By using several knots and various basis functions, the structure of the MARS model is calibrated on the training data set. In summary, the MARS model is built in two phases: (i) in the first phase, the BFs and their potential knots are selected to provide an accurate estimation, and (ii) in the second phase, the BF terms with the least effect are removed [45]. A BF can be represented as a product of truncated functions of the inputs as follows:

{BF}_{i} = \prod_{k = 1}^{K} S_{ki} (x (k, i) - C_{ki})

(6)

where K is the number of knots, S_ki denotes the right/left associated linear step function, which takes the value 1 or −1, x(k, i) is the input variable i at knot k, and C_ki is the knot location.
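The hinge pair of Equation (5) and the additive model of Equation (4) can be illustrated for a single scalar input with the following sketch. The helper names, the `(weight, knot, side)` encoding of a term, and the numeric values are assumptions made for illustration only; a real MARS fit would choose the knots and weights from data through the forward and backward phases.

```python
def hinge_pair(x, knot):
    """Mirrored pair of hinge functions from Equation (5)."""
    return max(0.0, x - knot), max(0.0, knot - x)


def mars_predict(x, b0, terms):
    """Evaluate Equation (4) for one scalar input x.

    terms is a list of (weight, knot, side) tuples, where side = +1 selects
    max(0, x - C) and side = -1 selects max(0, C - x).
    """
    y = b0
    for weight, knot, side in terms:
        right, left = hinge_pair(x, knot)
        y += weight * (right if side > 0 else left)
    return y


# Hypothetical model: one knot at C = 1 with a different slope on each side
terms = [(2.0, 1.0, +1), (0.5, 1.0, -1)]
above_knot = mars_predict(3.0, 1.0, terms)   # only the right hinge is active
below_knot = mars_predict(0.0, 1.0, terms)   # only the left hinge is active
```

Each hinge is zero on one side of its knot, which is what lets the weighted sum change slope at the knot while remaining continuous.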

Generalized Cross-Validation (GCV) is a simple technique used to compute the subset model in MARS for N response data as observed points. The GCV for the training data set is determined by the following relation [46]:

GCV = \frac{\frac{1}{N} \sum_{i = 1}^{N} {(O_{i} - Y_{i})}^{2}}{{(1 - \frac{c (b)}{N})}^{2}}

(7)

where N denotes the total number of training data and c(b) represents the complexity penalty, which increases with the number of BFs and is defined as follows:

c (b) = (B + 1) + dB

(8)

where B is the number of basis functions in the MARS model, which is related to the number of parameters of the nonlinear functions, and d is the penalty for each BF. The effective input variables are selected using the MARS model; thus, the representation of the nonlinear cross-correlation and internal interaction between the input variables may be improved by removing the inefficient variables.
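Equations (7) and (8) translate directly into a short function. The sketch below is illustrative only: the function name, the sample values, and the penalty d = 3 are assumptions, since the value of d used in the study is not stated here.

```python
def gcv(observed, predicted, n_basis, penalty):
    """Generalized cross-validation score from Equations (7) and (8).

    n_basis is B, the number of basis functions, and penalty is d.
    """
    n = len(observed)
    mse = sum((o - y) ** 2 for o, y in zip(observed, predicted)) / n
    complexity = (n_basis + 1) + penalty * n_basis  # c(b), Equation (8)
    if complexity >= n:
        raise ValueError("complexity penalty must be smaller than N")
    return mse / (1.0 - complexity / n) ** 2


# Hypothetical data: every prediction is off by exactly 1, so the MSE is 1
observed = [float(v) for v in range(1, 11)]
predicted = [v + 1.0 for v in observed]
score = gcv(observed, predicted, n_basis=1, penalty=3.0)
```

With N = 10, B = 1, and d = 3, the complexity is c(b) = 5, the denominator is (1 − 0.5)² = 0.25, and the score is four times the raw MSE, which shows how the penalty inflates the error of more complex models.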

3.3. Support Vector Regression (SVR)

The accuracy of the predicted function is one of the essential challenges in modeling. SVR has been successfully applied to the prediction of air pollution because of its efficiency and accuracy. For example, SVR gave accurate hourly predictions of SO₂ at Wanliu, Beijing, China [47]. Additionally, SVR was applied to model the total column ozone using daily temperature and humidity as input variables [48], and it was concluded that SVR outperformed ANNs. The SVR modeling technique builds the nonlinear relationship between the input variables and the prediction with the relation below:

Y = b + \sum_{i = 1}^{N} w_{i} K (x, x_{i})

(9)

where b is the bias and K(x, x_i) denotes the kernel function that maps the input variables from the real space into an N-dimensional feature space, where N is the number of training data points. The Gaussian kernel function was used to transfer the input data, as follows [49]:

K (x, x_{i}) = \exp (\frac{- 0.5 {| | x - x_{i} | |}^{2}}{σ^{2}})

(10)

where σ is the kernel parameter that controls the smoothness of the kernel function and w_i are the weights that connect the inputs in the feature space, which are calculated with two slack variables ξ_i and ξ_i^* by solving the following optimization problem [50]:

Minimise \frac{{| | w | |}^{2}}{2} + C \sum_{i = 1}^{N} (ξ_{i} + ξ_{i}^{*})

(11)

Subject to \left\{ \begin{matrix} y_{i} - w \cdot K (x, x_{i}) - b \leq ε + ξ_{i} \\ w \cdot K (x, x_{i}) + b - y_{i} \leq ε + ξ_{i}^{*} \\ ξ_{i}, ξ_{i}^{*} \geq 0, i = 1, 2, \dots, N . \end{matrix} \right.

where the factor C ≥ 0 is the regularization coefficient and ε is the tolerance of the ε-insensitive loss function. The schematic view of the SVR model is illustrated in Figure 4, and its structure is shown in Figure 4b. The input data set (x) is used to calibrate the probabilistic model using SVR.
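To make the SVR formulation more concrete, the sketch below evaluates the Gaussian kernel of Equation (10), the prediction of Equation (9) for given weights, and the ε-insensitive error that the slack variables in Equation (11) absorb. It does not solve the optimization problem itself; the function names and all numeric values are hypothetical.

```python
import math


def gaussian_kernel(x, x_i, sigma):
    """Gaussian kernel of Equation (10) for two points given as lists."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-0.5 * sq_dist / sigma ** 2)


def svr_predict(x, train_points, weights, bias, sigma):
    """Prediction of Equation (9) for already-calibrated weights and bias."""
    return bias + sum(
        w * gaussian_kernel(x, x_i, sigma)
        for w, x_i in zip(weights, train_points)
    )


def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the tube of half-width eps cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical calibrated model with a single training point at the origin
prediction = svr_predict([0.0, 0.0], [[0.0, 0.0]], [2.0], 0.5, sigma=1.0)
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside_tube = eps_insensitive_loss(1.0, 1.5, eps=0.1)
```

A larger σ makes the kernel decay more slowly with distance, which smooths the fitted function, while a larger ε widens the tube within which errors are ignored.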

Three parameters of SVR (σ, C, and ε) control the accuracy of the prediction, and a good selection of these hyper-parameters leads to accurate predictions. Hybrid algorithms that combine an optimizer with SVR are generally used to determine the optimal SVR parameters. The variables influencing the acid-rain pollutants SO₂ and NO₂ have been predicted in four cities in central China using two hybrid SVR models combined with the Cuckoo Search and Grey Wolf optimization approaches; these hybrid models gave the most accurate predictions of NO₂ and SO₂ [51]. A hybrid SVR with an improved glowworm swarm optimizer has been presented for forecasting air pollutants in the Jing-Jin-Ji region. The hybrid method produced higher accuracy than standard SVR because the SVR parameters were adjusted by the optimization approach [52]. A hybrid model using a genetic algorithm as an optimizer to find the factors C and σ of SVR has been presented for CO concentration with O₂, CO₂, and C₂H₆ as input variables [53]. This hybrid method showed superior accuracy compared to the standard SVR, random forest, and MLPNN models. To improve the prediction of PM2.5, the SVR model, a hybrid SVR with an optimization method, and a generalized regression ANN were applied to three cities in China [54]. The results demonstrated that the hybrid SVR had the best ability to forecast PM2.5. To achieve accurate predictions, SVR combined with quantum-behaved particle swarm optimization (QPSO) was used to forecast atmospheric PM2.5 and NO₂ at Wanliu Station in Beijing, China [55]. Three hybrid methods, PSO-SVR, GA-SVR, and grid search–SVR, were compared with QPSO-SVR, and QPSO-SVR proved to be the most efficient modeling approach. The main issue with hybrid SVR modeling is the computational cost of determining the best parameter values. Thus, developing a hybrid SVR whose computational burden is comparable to that of the original SVR is a major challenge for databases with many training points. The present work therefore proposes a novel hybrid SVR model that is computationally efficient in determining a nonlinear SVR model.

3.4. Hybrid Intelligent Model

The accuracy of SVR is improved by using effective input variables in the training procedure. The MARS model can reveal the influence of the input variables during the modeling procedure in which it finds the knots. In the current work, the MARS model is established on all input variables in the first stage. In the second stage, the effective input variables for SO₂ among the chemical and meteorological conditions are used to build the nonlinear prediction model with the SVR approach. The nonlinear cross-correlation and internal interaction of the input variables are handled by the nonparametric regression of the MARS model, so MARS controls the input variables of the SVR in this modeling approach. The two models, MARS and SVR, are thus integrated to provide a nonlinear approximation of air pollution as SO₂ concentration in the urban region. The framework of the hybrid MARS-SVR for supplying the effective input variables is plotted in Figure 5. As presented in this figure, the hybrid model involves five main steps with two main modeling approaches. In the first modeling approach, the effective input variables are selected from the database presented in Table 1 using MARS; in the second, the nonlinear relation is provided by SVR using the effective inputs acquired from the first. Thus, the steps are:

Step 1: Provide the input variables. A total of 12 inputs are supplied, from which the effective parameters are selected using the MARS model.

Step 2: Normalize the database given in Step 1 by the Min/Max method. The studied data points, presented in Table 1, cover a large variety of units. To remove the effect of the units and to improve the predictor performance, the data points are normalized to the range −1 to 1 using the relation below:

Z = 2 \times \frac{x - X_{\min}}{X_{\max} - X_{\min}} - 1

(12)

where Xmax and Xmin were given from the results of Table 1.

Step 3: Calibrate the MARS model. Using the normalized data from Step 2, the MARS model is established for modeling SO₂. In this step, MARS is used to predict SO₂ and to select the final input variables, which serve as the inputs of the SVR model.

Step 4: Extract the influential input variables from the results of Step 3.

Step 5: Calibrate the SVR using the influential input data obtained in Step 4. The main innovation of this hybrid approach for predicting SO₂ is an efficient procedure that reduces the time-consuming calibration. The approach thus addresses the computational efficiency of the SVR model.
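The five steps can be sketched as a small pipeline in which the two models are passed in as callables. This is a structural sketch only: `select_inputs` stands in for the calibrated MARS model and `fit_regressor` for the SVR calibration, both of which are hypothetical placeholders, and the Min/Max scaling is written so that Xmin maps to −1 and Xmax maps to +1.

```python
def minmax_scale(x, x_min, x_max):
    """Map x from [x_min, x_max] to [-1, 1] (Step 2)."""
    if x_max == x_min:
        raise ValueError("constant column cannot be scaled")
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def normalize_columns(rows):
    """Scale every column of a row-major table to [-1, 1]."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    scaled = [
        [minmax_scale(v, lo, hi) for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]
    return scaled, bounds


def hybrid_fit(rows, targets, select_inputs, fit_regressor):
    """Steps 1 to 5 with the two models supplied as callables.

    select_inputs(scaled, targets) -> indices of the retained columns
    fit_regressor(reduced, targets) -> fitted model object
    """
    scaled, bounds = normalize_columns(rows)              # Steps 1 and 2
    kept = select_inputs(scaled, targets)                 # Steps 3 and 4
    reduced = [[row[j] for j in kept] for row in scaled]
    model = fit_regressor(reduced, targets)               # Step 5
    return model, kept, bounds


# Hypothetical stand-ins: keep columns 0 and 2, and "fit" by storing the data
rows = [[0.0, 5.0, 10.0], [10.0, 7.0, 20.0], [5.0, 6.0, 15.0]]
targets = [1.0, 3.0, 2.0]
model, kept, bounds = hybrid_fit(
    rows, targets,
    select_inputs=lambda scaled, y: [0, 2],
    fit_regressor=lambda reduced, y: (reduced, y),
)
```

Because the regressor only ever sees the retained columns, its calibration cost depends on the reduced input set rather than on all 12 candidates, which is the efficiency argument made for the hybrid scheme.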

4. Results and Discussion

4.1. Comparative Metrics

The novel hybrid modeling scheme, in which SVR is combined with MARS to select the effective inputs, was evaluated on the SO₂ concentration database to illustrate its accuracy and tendency. Four comparative statistical measures, MAE, RMSE, NSE, and d, were employed to measure the tendency and accuracy of the models used in this work (ANN, MARS, SVR, and SVR-MARS) against the observed data. For the measures in Equations (13)–(16), low RMSE and MAE and high d and NSE indicate superior tendency and accuracy, meaning that the model produces more accurate forecasts and better agreement with the observed SO₂ data.

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {[O_{i} - Y_{i}]}^{2}}

(13)

MAE = \frac{1}{N} \sum_{i = 1}^{N} | O_{i} - Y_{i} |

(14)

d = 1 - \frac{\sum_{i = 1}^{N} | O_{i} - Y_{i} |}{\sum_{i = 1}^{N} ( | O_{i} - \bar{O} | + | Y_{i} - \bar{O} | )}, 0 < d \leq 1

(15)

NSE = 1 - \frac{\sum_{i = 1}^{N} | O_{i} - Y_{i} |}{\sum_{i = 1}^{N} | O_{i} - \bar{O} |}, - \infty < NSE \leq 1

(16)

where N is the number of data points in the training or testing stage, and O_i, Y_i, and \bar{O} denote the ith observed value, the ith predicted value, and the mean of the observed SO₂, respectively, with the mean computed using

\bar{O} = \frac{\sum_{i = 1}^{N} O_{i}}{N}
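The four measures of Equations (13)–(16) can be computed with a few lines of plain Python. The sketch below follows the absolute-value forms of d and NSE given in this section rather than the squared-error variants found elsewhere; the function names and the sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error, Equation (13)."""
    return math.sqrt(sum((o - y) ** 2 for o, y in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, Equation (14)."""
    return sum(abs(o - y) for o, y in zip(obs, pred)) / len(obs)


def agreement_index(obs, pred):
    """Agreement index d, Equation (15), in its absolute-value form."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(o - y) for o, y in zip(obs, pred))
    den = sum(abs(o - mean_obs) + abs(y - mean_obs) for o, y in zip(obs, pred))
    return 1.0 - num / den


def nse(obs, pred):
    """NSE, Equation (16), in its absolute-value form."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(o - y) for o, y in zip(obs, pred))
    den = sum(abs(o - mean_obs) for o in obs)
    return 1.0 - num / den


# Hypothetical observed and predicted values
obs = [1.0, 2.0, 3.0]
pred = [1.0, 2.0, 6.0]
scores = (rmse(obs, pred), mae(obs, pred), agreement_index(obs, pred), nse(obs, pred))
```

A perfect prediction gives RMSE = MAE = 0 and d = NSE = 1, while a negative NSE, as in this toy case, means the model is worse than simply predicting the observed mean.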

4.2. Sensitivity Analysis of Model and Effective Variables

To build the ANN and SVR models, several parameters must be defined manually to obtain accurate results. The number of hidden nodes in the MLPNN is crucial for an acceptable prediction of SO₂. The MSE of the MLPNN on the training data points is examined for 5 to 15 hidden nodes, as presented in Figure 6. The highest and lowest MSE are obtained with 5 and 12 hidden nodes, respectively. With 15 hidden nodes, the training process is time-consuming and inefficient, and the model still cannot provide a calibration as accurate as the models with 14 and 12 hidden nodes. Thus, the number of hidden nodes in the training of the ANN model is set to 12 for the current prediction.

The parameters of SVR (i.e., C, ε, and σ) in the original SVR and the hybrid SVR should be selected to give acceptable performance. As mentioned, an optimization approach can be used to determine the optimum values of these parameters. However, the computational burden of calibrating the nonlinear relation then increases significantly compared to plain SVR. In the current work, to balance computational time and prediction accuracy, three levels of C (500, 1000, and 2000) and four levels each of σ (0.5, 1, 2, 5) and ε (0.05, 0.1, 0.2, 0.5) are examined, and the combination with the lowest MSE is selected. The MSE results for the different parameters are presented in Table 2 and Table 3 for SVR and the hybrid SVR with MARS, respectively. According to these tables, the best parameters are C = 500, ε = 0.2, and σ = 2 for SVR and C = 1000, ε = 0.1, and σ = 1 for the hybrid SVR in the proposed model. As seen, the SVR parameters affect the accuracy of the SVR models, and different optimal values were obtained for SVR and the hybrid SVR.
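The level-based parameter selection can be sketched as an exhaustive search over the listed values. The sketch assumes a full factorial grid (3 × 4 × 4 = 48 combinations), which may differ from the combinations actually tabulated, and `evaluate_mse` is a hypothetical callable that would calibrate an SVR with the given parameters and return its MSE.

```python
from itertools import product

C_LEVELS = (500, 1000, 2000)
SIGMA_LEVELS = (0.5, 1, 2, 5)
EPS_LEVELS = (0.05, 0.1, 0.2, 0.5)


def grid_search(evaluate_mse):
    """Return (mse, C, sigma, eps) for the combination with the lowest MSE.

    evaluate_mse(C, sigma, eps) is a user-supplied, hypothetical callable.
    """
    best = None
    for c, sigma, eps in product(C_LEVELS, SIGMA_LEVELS, EPS_LEVELS):
        score = evaluate_mse(c, sigma, eps)
        if best is None or score < best[0]:
            best = (score, c, sigma, eps)
    return best


# Toy stand-in for the SVR calibration, minimized at C=1000, sigma=1, eps=0.1
def toy_mse(c, sigma, eps):
    return abs(c - 1000) + abs(sigma - 1) + abs(eps - 0.1)


best = grid_search(toy_mse)
n_combinations = len(list(product(C_LEVELS, SIGMA_LEVELS, EPS_LEVELS)))
```

The cost of this search is fixed at the number of grid points, which is what keeps the calibration time bounded compared with an iterative metaheuristic optimizer.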

The MSE of the hybrid SVR (MSE = 43.99) is about 62% lower than that of the original SVR (MSE = 114.57) for modeling SO₂, i.e., smaller by a factor of about 2.6. Thus, selecting the effective input variables yields more accurate results for this nonlinear problem. Using MARS, the effective input variables are extracted as NO, NO₂, CO, H₂S, O₃, PM10, P, RH, and T. These inputs are obtained from the modeling process using MARS. To confirm them, the correlation coefficient between the observed data (O) and each input variable (x) was calculated as a sensitivity analysis that illustrates the impact of the input variables. The correlation coefficient is computed for each input variable as below:

r_{i} = \frac{\sum_{k = 1}^{n} (x_{i, k} \times O_{k})}{\sqrt{\sum_{k = 1}^{n} x_{i, k}^{2} \sum_{k = 1}^{n} O_{k}^{2}}}

(17)

where x_{i,k} and O_k represent the ith input variable and the output for the k = 1, 2, …, n data points, respectively. The most effective parameter gives the highest absolute value. Figure 7 presents the sensitivity diagram obtained by Equation (17) for all inputs considered in the SO₂ models. It shows that the meteorological inputs, i.e., T, P, and RH, are sensitive input variables, while CO and HyC are insensitive input variables. Compared with the results obtained by MARS, CO is an effective input variable while NMHC and NO are neglected as ineffective variables in the MARS modeling approach, yet NMHC and NO show the highest positive influence on the prediction of SO₂ according to the linear correlation coefficient.
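Equation (17) can be evaluated directly from the paired samples. Note that, as written, the formula does not subtract the means, so it coincides with the usual Pearson coefficient only when both series are centered; the sketch below implements the formula as given, and the function name and sample values are hypothetical.

```python
import math


def correlation(x, o):
    """Correlation coefficient of Equation (17) between an input and the output."""
    num = sum(a * b for a, b in zip(x, o))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in o))
    return num / den


# Hypothetical series: proportional, opposite, and orthogonal cases
r_proportional = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite = correlation([1.0, 2.0], [-1.0, -2.0])
r_orthogonal = correlation([1.0, 0.0], [0.0, 1.0])
```

Ranking the inputs by the absolute value of this coefficient gives the kind of linear sensitivity ordering described for Figure 7.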

The influential variables identified by MARS and by the correlation coefficient of Equation (17) largely agree, but two input variables, CO and NMHC, behave differently from the MARS results. This suggests that there may be a nonlinear effect between the input variables and the output response. A linear correlation cannot capture such an effect, whereas the nonlinear modeling provided by nonparametric regression is used in the modeling procedure to obtain the effective input variables. In the current work, a nonlinear sensitivity vector is therefore presented to determine the effect of the inputs. An exponential map-based regression analysis is used to examine the sensitivity of the input variables. The proposed sensitivity measure is computed as follows:

α_{i} = \frac{w_{i}}{\sqrt{\sum_{k = 1}^{nv} w_{k}^{2}}}

(18)

where α_{i} (i = 1, 2, …, nv) is the effective degree for input variable x_{i}, and nv represents the number of input variables. The factor w_{i} is computed using nonlinear regression with the following function:

Y_{i} = b_{i} + w_{i} N_{i}^{2}

(19)

where b_{i} and w_{i} are, respectively, the bias and the weight, which are calibrated with a least-squares estimator to provide a nonlinear connection between the observed SO₂ data (Y_{i}) and the normalized input variable (N_{i}), which is determined as below:

N_{i} = \frac{x_{i} - μ_{i}}{σ_{i}}

(20)

where μ_{i} and σ_{i} are the mean and standard deviation of input variable x_{i}, respectively. The values of μ and σ are taken from the experimental database presented in Table 1.
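Equations (18) to (20) can be chained in a few lines. The following is a minimal sketch under stated assumptions: the population standard deviation is used in Equation (20) (the text does not say which estimator was applied), the least-squares fit of Equation (19) is done in closed form, and all function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def normalize(x):
    """Equation (20): N = (x - mu) / sigma (population sigma assumed)."""
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / sigma for v in x]


def fit_quadratic_weight(x, y):
    """Equation (19): least-squares fit of Y = b + w * N**2; returns (b, w)."""
    z = [n * n for n in normalize(x)]  # regress Y on the squared input
    z_bar, y_bar = mean(z), mean(y)
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    w = s_zy / s_zz
    return y_bar - w * z_bar, w


def sensitivity_degrees(inputs, y):
    """Equation (18): alpha_i = w_i / sqrt(sum_k w_k**2)."""
    weights = [fit_quadratic_weight(x, y)[1] for x in inputs]
    norm = sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


# Made-up data: y was built as 2 + 3 * N1**2, so the fit should recover it.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 3, 5, 4]
y = [8.0, 3.5, 2.0, 3.5, 8.0]
b1, w1 = fit_quadratic_weight(x1, y)
alphas = sensitivity_degrees([x1, x2], y)
print(b1, w1, alphas)
```

Because of the normalization in Equation (18), the squared sensitivity degrees sum to one. Only their relative size and sign carry information.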

Using the proposed nonlinear sensitivity degree of Equation (18), Figure 8 shows the sensitivity degree of the input variables obtained from the nonlinear prediction. As seen, the sensitivity degree based on the nonlinear relation closely agrees with the effective input variables obtained by MARS and r_i. Temperature shows the highest positive effect on SO₂ in the current study, which means that the SO₂ concentration in Jeddah increases with increasing temperature. Three input variables, CO, NO, and NMHC, are insensitive inputs according to the proposed nonlinear sensitivity analysis as well as the MARS model. The input variables can be grouped into four sets: strongly sensitive (e.g., T, RH, P), highly sensitive (e.g., H₂S, O₃, NO₂), moderately sensitive (e.g., NO, MHC, PM10), and weakly sensitive (e.g., CO, NO, NMHC). Unlike the linear correlation sensitivity analysis, the proposed method indicates that PM10 and H₂S are insensitive inputs while P is a sensitive factor for SO₂. The SO₂ concentration decreases with increasing RH and P.

4.3. Comparative Results for Accuracy and Tendency

The accuracy (i.e., lowest RMSE and MAE) and tendency (i.e., highest d and NSE) of the different models in the training and testing phases are compared in Table 4. The hybrid SVR model outperforms the other studied models (SVR, ANN, and MARS), with the lowest RMSE and MAE and the highest d and NSE for both the training and testing data points. Compared with MARS, SVR, and ANN, the hybrid SVR model improved the MAE for the training (testing) data points by around 115% (70%), 65% (50%), and 95% (70%), and the RMSE by around 190% (50%), 60% (40%), and 180% (80%), respectively. It can be concluded that the proposed model achieved the highest accuracy for the prediction of SO₂ among the compared models, and that the accuracy of SVR for the testing (validation) data points is effectively increased by using the influential input variables. The proposed hybrid model captures the nonlinearity through SVR and cross-correlation with a simple strategy and greater computational efficiency. The MARS model shows the highest MAE and RMSE in the training phase compared with the SVR and ANN models. Considering d and NSE to compare the tendency of the models, the highest d and NSE were obtained by the proposed machine learning model, while SVR ranks second among the models. The hybrid SVR achieved a higher tendency than the SVR model. The NSE for the testing data points using the hybrid SVR is improved by around 25%, 20%, and 25% compared with MARS, SVR, and ANN, respectively. The hybrid SVR model increased the agreement index d by around 13% (12%) compared with MARS, 7% (8%) compared with SVR, and 11% (12%) compared with ANN for the training (testing) phase. Therefore, applying the effective input variables in the SVR modeling process provides superior tendency (highest agreement of the model) and accuracy (lowest error of the model).
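The four comparative statistics used above can be computed as follows. This sketch assumes the standard definitions, namely the Nash and Sutcliffe efficiency for NSE and Willmott's index of agreement for d. The paper's own definitions appear in an earlier section and should take precedence if they differ. The sample values are made up.

```python
from math import sqrt


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash and Sutcliffe efficiency (standard form, assumed)."""
    o_bar = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - o_bar) ** 2 for o in obs)


def agreement_index(obs, pred):
    """Willmott's index of agreement d (standard form, assumed)."""
    o_bar = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                    for o, p in zip(obs, pred))
    return 1.0 - sse / potential


# Made-up observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mae(observed, predicted), rmse(observed, predicted),
      nse(observed, predicted), agreement_index(observed, predicted))
```

MAE and RMSE measure error size in the units of SO₂ (lower is better), while NSE and d are dimensionless and approach 1 for a perfect model.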

To consider accuracy and tendency together, the d-to-RMSE ratio (d/RMSE) was used for a better comparison of the models. This ratio combines the error measured by RMSE with the tendency measured by the d index, so the model with the highest d/RMSE provides the best performance. The d/RMSE values of the models for the training and testing datasets are presented in Figure 9. The figure shows that the hybrid SVR provides the highest d/RMSE for both the training and testing datasets. The d/RMSE of the proposed model is significantly higher than that of the MARS model for the training dataset. The nonlinear mapping provided by the kernel function in the SVR model is improved in the hybrid model by using the effective input variables.
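As a quick check, the d/RMSE ratio can be recomputed from the d and RMSE values reported in Table 4. The sketch below only restates those tabulated numbers. The dictionary layout and function names are illustrative choices.

```python
# (d, RMSE) pairs copied from Table 4.
TABLE_4 = {
    "MARS": {"training": (0.789, 19.280), "testing": (0.762, 22.175)},
    "SVR": {"training": (0.846, 10.704), "testing": (0.795, 20.458)},
    "ANN": {"training": (0.809, 18.528), "testing": (0.758, 25.033)},
    "SVR-MARS": {"training": (0.905, 6.633), "testing": (0.861, 14.699)},
}


def d_to_rmse(phase):
    """Return {model: d / RMSE} for the given phase."""
    return {model: vals[phase][0] / vals[phase][1]
            for model, vals in TABLE_4.items()}


def ranking(phase):
    """Models sorted from highest to lowest d/RMSE."""
    ratios = d_to_rmse(phase)
    return sorted(ratios, key=ratios.get, reverse=True)


for phase in ("training", "testing"):
    print(phase, ranking(phase))
```

With the Table 4 values, the hybrid SVR-MARS model has the largest ratio in both phases, followed by SVR.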

Figure 10 shows the scatterplots of the observed data against the predicted SO₂ for the four studied models (hybrid SVR, SVR, MARS, and ANN) on the testing datasets. It also presents the fitted line between observed and predicted data points, y = b + ax (where b is the intercept, or bias, and a is the slope), together with R² of the linear fit. When b tends to 0 and a tends to 1, the predicted data approximate the observations perfectly, and R² equal to 1 indicates perfect agreement between the model and the data points. The results in Figure 10 illustrate that the hybrid SVR model provides the most accurate nonlinear relation for this complex problem, while MARS performs similarly to ANN. The SVR model agrees better with the ideal b and R² than the ANN and MARS models. Comparing a and R², the models can be ranked from best to worst as hybrid SVR-MARS, SVR, MARS, and ANN.
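The slope a, intercept b, and R² of such a scatterplot can be obtained with an ordinary least-squares line. In this sketch the observed values are placed on the x-axis and the predictions on the y-axis, which is an assumption about the figure layout. The data are made up.

```python
def linear_fit(x, y):
    """Least-squares line y = b + a * x; returns (a, b, r2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = s_xy / s_xx                 # slope
    b = y_bar - a * x_bar           # intercept (bias)
    r2 = s_xy ** 2 / (s_xx * s_yy)  # squared linear correlation
    return a, b, r2


# Made-up observed (x) and predicted (y) values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
a, b, r2 = linear_fit(observed, predicted)
print(a, b, r2)
```

A model whose predictions reproduce the observations exactly would return a = 1, b = 0, and R² = 1, which is the ideal case described in the text.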

4.4. Comparative Results for Uncertainties

The model uncertainties are compared using the mean (average), the standard deviation (STD), and the error bar diagram of the errors for the data points in the test phase, where the error is the difference between the observed and predicted values. If the mean and STD of the errors are close to zero, the model is superior and has the lowest uncertainty in predicting SO₂. The error bar diagram for the testing data points is plotted in Figure 11. The ANN shows a large STD compared with the other models, and the hybrid model improves on it by around 90%. The lowest and highest uncertainties are obtained using the hybrid SVR and ANN models, respectively. The error bounds of the hybrid SVR are narrower than those of SVR and MARS. Thus, selecting the input variables through effective inputs or a nonlinear sensitivity analysis can provide a platform for a robust model with the lowest uncertainty. In the future, nonlinear sensitivity analysis can be used in complex nonlinear prediction problems to illustrate the nonlinear effects of input variables. Generally, comparing the STD of SVR and hybrid SVR indicates that the kernel function applied in SVR provides acceptable flexibility for the nonlinear mapping between several input variables and the SO₂ output. It is suggested that, in future work, the hybrid SVR be coupled with an optimization algorithm to find the best parameter values.
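The error statistics behind an error bar diagram reduce to the mean and standard deviation of the residuals. The sketch below uses the sample standard deviation, which is an assumption because the estimator is not specified in the text. The numbers are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def error_summary(obs, pred):
    """Return (mean, sample STD) of the errors, error = observed - predicted."""
    errors = [o - p for o, p in zip(obs, pred)]
    return mean(errors), stdev(errors)


# Made-up observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
err_mean, err_std = error_summary(observed, predicted)
print(err_mean, err_std)
```

A mean near zero indicates little systematic bias, and a small STD indicates a narrow error band, which is the sense in which a model is described as having low uncertainty.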

Figure 12 shows the Taylor diagram of the STD and agreement index (d) for the model predictions and the observed data (the observation is the blue point on the horizontal axis). The Taylor diagram shows that ANN and MARS have similar agreement and scatter, while SVR and hybrid SVR give more accurate results with higher agreement than the ANN and MARS models. The proposed hybrid SVR lies closest to the observed data in terms of the agreement index and achieved the best agreement with the observed data among the studied models. From the Taylor diagram, the models can be ranked from best to worst as (1) hybrid SVR-MARS, (2) SVR, (3) MARS, and (4) ANN for this dataset.

5. Conclusions

Air pollution usually rises in parallel with increasing population and industrial development, and meteorological and topographic conditions can aggravate it. This study aims to determine the relationship between the meteorological parameters of the industrial district and air pollution in the city of Jeddah. Machine learning algorithms were used to estimate the influence of environmental conditions and pollutants on air quality, considering SO₂ as an important pollutant in Jeddah. In the present work, SO₂ was the output of the modeling process. The SO₂ concentration in Jeddah ranges from 0 to 283 μg/m³ and is treated as the critical response parameter in this study. The proposed model produced more accurate forecasts and the best agreement with the observed SO₂, as shown by the comparative statistical measures: low RMSE and MAE and high d and NSE indicate the superior accuracy and tendency of the proposed model. Using MARS, the effective input variables were extracted as NO, NO₂, CO, H₂S, O₃, PM10, P, RH, and T. A nonlinear sensitivity vector was presented to determine the effect of the inputs, and an exponential map-based regression analysis was used to examine the sensitivity of the input variables. The sensitivity degree based on the nonlinear relation closely agrees with the effective input variables obtained by MARS and r_i. Temperature shows the highest positive effect on SO₂, which means that the SO₂ concentration in Jeddah increases with increasing temperature. Three input variables, CO, NO, and NMHC, are insensitive inputs according to the proposed nonlinear sensitivity analysis as well as the MARS model.
The input variables can be grouped into four sets by sensitivity degree: strongly sensitive (e.g., T, RH, P), highly sensitive (e.g., H₂S, O₃, NO₂), moderately sensitive (e.g., NO, MHC, PM10), and weakly sensitive (e.g., CO, NO, NMHC). The SO₂ concentration decreases with increasing RH and P. The accuracy (i.e., lowest RMSE and MAE) and tendency (i.e., highest d and NSE) of the different models were compared for the training and testing phases. The hybrid SVR model outperforms the other models for both the training and testing data points. Compared with MARS, SVR, and ANN, the hybrid SVR model improved the MAE for the training (testing) data points by around 115% (70%), 65% (50%), and 95% (70%), and the RMSE by around 190% (50%), 60% (40%), and 180% (80%), respectively. It can be concluded that the accuracy of SVR for the testing (validation) data points is effectively increased by using the influential input variables. The NSE for the testing data points using the hybrid SVR is improved by around 25%, 20%, and 25% compared with MARS, SVR, and ANN, respectively. The hybrid SVR model increased the agreement index d by around 13% (12%) compared with MARS, 7% (8%) compared with SVR, and 11% (12%) compared with ANN for the training (testing) phase. The scatterplots of observed data against predicted SO₂ illustrate that the hybrid SVR model provides the most accurate nonlinear relation for this complex problem, while MARS performs similarly to ANN. Comparing R², the models can be ranked from best to worst as hybrid SVR-MARS, SVR, MARS, and ANN. The lowest and highest uncertainties are obtained using the hybrid SVR and ANN models, respectively. The error bounds of the hybrid SVR are narrower than those of SVR and MARS.
Thus, selecting the input variables through effective inputs or a nonlinear sensitivity analysis can provide a platform for a robust model with the lowest uncertainty. In the future, nonlinear sensitivity analysis can be used in complex nonlinear prediction problems to illustrate the nonlinear effects of input variables. Generally, comparing the STD of SVR and hybrid SVR indicates that the kernel function applied in SVR provides acceptable flexibility for the nonlinear mapping between several input variables and the SO₂ output. The Taylor diagram shows that the proposed hybrid SVR achieved the best agreement with the observed SO₂ data among the studied models. From the Taylor diagram, the models can be ranked from best to worst as (1) hybrid SVR-MARS, (2) SVR, (3) MARS, and (4) ANN for this dataset.

Author Contributions

Conceptualization, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; methodology, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; software, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; validation, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; formal analysis, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; writing—original draft preparation, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B.; visualization, M.A. (Mohammed Alamoudi), O.T., B.K., M.A. (Mona Abusurrah), M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under grant No. (RG-28-135-42). The authors, therefore, gratefully acknowledge the DSR technical and financial support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbreviation	Definition
ANNs	Artificial neural networks
SVR	Support vector regression
MARS	Multivariate adaptive regression spline
SO₂	Sulphur dioxide
CO	Carbon monoxide
NO	Nitrogen monoxide
NO₂	Nitrogen dioxide
O₃	Ozone
H₂S	Hydrogen sulphide
HyC	Hydrocarbons
MHC	Methanic hydrocarbons
PM10	Suspended particulates 10
NMHC	Non-methane hydrocarbons
P	Atmospheric pressure
RH	Relative humidity
T	Temperature
MLPNN	The multilayer perceptron neural network
QPSO	Quantum-behaved particle swarm optimization
MAE	Mean absolute error
RMSE	Root mean square error
NSE	Nash and Sutcliffe efficiency
d	Agreement index

Figure 1. Bar diagram for data points (A) training phase and (B) testing phase.

Figure 2. Structure of MLPNN with n-input, M-hidden, and 1-output nodes as n-M-1.

Figure 3. The structure of MARS model.

Figure 4. Schematic view of SVR model (a) SVR structure (b) calibrating data, ɛ-insensitive loss function and predicted model.

Figure 5. Framework of a hybrid model for prediction of SO₂.

Figure 6. Comparison of MSE for various hidden nodes in MLPNN for prediction of SO₂.

Figure 7. Sensitivity diagram for input variables on SO₂ by a linear coefficient.

Figure 8. Sensitivity diagram for input variables on SO₂ by nonlinear relation.

Figure 9. d-to-RMSE ratio for different models.

Figure 10. Scatter plot for models in testing phase.

Figure 11. Error bar diagram for different models in the testing phase.

Figure 12. Taylor diagram for the testing phase of different models.

Table 1. Statistical properties of the variables for training and testing phases.

Variables	Training (75% Total Data Points)					Testing (25% Total Data Points)
Variables	Xmin	Xmax	Mean	STD	COV	Xmin	Xmax	Mean	STD	COV
Nitrogen monoxide (NO) μg/m³	0	160	19.26	22.89	1.19	0	160	19.28	22.39	1.16
Nitrogen dioxide (NO₂) μg/m³	0	105	55.64	18.06	0.32	0	105	53.16	20.47	0.39
Carbon monoxide (CO) μg/m³	0	4.89	1.02	0.70	0.69	0	4.89	1.07	0.74	0.69
Hydrogen sulphide (H₂S) μg/m³	0	24,507	543.72	2295.56	4.22	0	20,348	556.76	2350.47	4.22
Ozone (O₃) μg/m³	7	217	34.43	20.17	0.59	7	192	34.94	21.25	0.61
Nitrogen oxides (NOx) μg/m³	0	229	54.80	29.79	0.54	0	229	52.19	30.07	0.58
Hydrocarbons (HyC) μg/m³	0.02	2.37	1.36	0.33	0.25	0.02	2.15	1.35	0.34	0.25
Methanic hydrocarbons (CH₄) μg/m³	0.02	1.89	1.30	0.24	0.19	0.02	1.84	1.29	0.25	0.20
Non methanic hydrocarbons (NMHC) μg/m³	0	1.3	0.14	0.14	1.03	0	1.3	0.14	0.15	1.07
Suspended particulates 10 (PM10) μg/m³	19	2223	115.52	115.06	1.00	19	2223	115.06	164.99	1.43
Atmospheric pressure (P) hPa	968	1016.7	1006.01	5.31	0.01	971.7	1014.7	1005.95	5.90	0.01
Temperature (T) °C	21.1	37	30.74	3.64	0.12	21.9	37	30.57	3.75	0.12
Relative humidity (RH) %	24.7	69.4	48.56	9.89	0.20	28.4	68.5	49.95	9.40	0.19
Sulphur Dioxide (SO₂) μg/m³	0	283	71.61	44.45	0.62	0	283	70.38	47.14	0.67

Table 2. MSE for different parameters of SVR models in the training phase.

ε	C = 500				C = 1000				C = 2000
ε	σ = 0.5	σ = 1	σ = 2	σ = 5	σ = 0.5	σ = 1	σ = 2	σ = 5	σ = 0.5	σ = 1	σ = 2	σ = 5
0.05	131.33	136.12	116.35	134.97	130.72	131.20	127.49	130.85	127.93	133.40	127.37	128.50
0.1	119.86	129.94	120.07	116.62	123.26	127.86	118.78	124.86	123.63	122.81	130.32	200.89
0.2	121.38	121.12	114.57	119.68	129.39	123.91	124.95	123.13	122.95	125.56	126.46	200.49
0.5	124.20	121.11	125.01	126.11	125.82	125.76	127.18	128.41	123.37	123.82	126.66	200.31

Table 3. MSE for different parameters of hybrid SVR models in the training phase.

ε	C = 500				C = 1000				C = 2000
ε	σ = 0.5	σ = 1	σ = 2	σ = 5	σ = 0.5	σ = 1	σ = 2	σ = 5	σ = 0.5	σ = 1	σ = 2	σ = 5
0.05	55.33	50.28	52.75	56.40	47.01	48.08	48.19	47.95	46.25	49.33	48.30	50.23
0.1	52.87	51.98	50.26	54.33	45.79	43.99	46.33	45.90	46.25	45.40	46.88	46.76
0.2	54.25	51.69	51.77	56.98	47.79	48.68	47.86	50.05	47.91	44.84	47.57	49.83
0.5	54.91	53.52	51.88	55.36	48.97	48.20	48.18	51.26	48.26	46.70	47.68	50.94

Table 4. Comparative results of different models for the training and testing phases.

Models	Training				Testing
Models	MAE	RMSE	d	NSE	MAE	RMSE	d	NSE
MARS	13.997	19.280	0.789	0.598	16.257	22.175	0.762	0.548
SVR	10.448	10.704	0.846	0.700	14.537	20.458	0.795	0.596
ANN	12.582	18.528	0.809	0.639	16.258	25.033	0.758	0.548
SVR-MARS	6.424	6.633	0.905	0.816	9.577	14.699	0.861	0.734

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
