Air Quality Prediction Based on Machine Learning Algorithms II

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 26 August 2024 | Viewed by 14828

Special Issue Editors

Prof. Dr. Yves Rybarczyk
Faculty of Data and Information Sciences, Dalarna University, 791 88 Falun, Sweden
Interests: artificial intelligence and cognitive systems; machine learning-based models; prediction of air quality; programming and software development
Prof. Dr. Rasa Zalakeviciute
Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud, Universidad de Las Américas, 170125 Quito, Ecuador
Interests: urban air pollution; natural aerosol formation; climate; conservation

Special Issue Information

Dear Colleagues,

Worsening air quality is one of the major global causes of premature mortality and a major environmental risk, claiming seven million lives every year. Nearly all urban areas fail to comply with the air quality guidelines of the World Health Organization (WHO). This health threat could be reduced by developing models that forecast air quality and inform citizens of the risks of certain activities during episodes of elevated pollution.

The traditional predictive approach is based on deterministic models that calculate physical processes and transport within the atmosphere. The most commonly used approaches are chemical transport models (CTMs), which process input information on the emission, transport, mixing, and chemical transformation of trace gases and aerosols together with meteorological data. However, the interactions between air pollutants and the factors that influence them are highly nonlinear, which makes the mechanisms of air pollutant formation very complex. Statistical learning (or machine learning) algorithms are therefore increasingly used to model the nonlinear behaviour of air pollution. Although statistical models do not explicitly simulate environmental processes, they generally exhibit higher predictive performance than CTMs on fine spatiotemporal scales when extensive monitoring data are available.

Several machine learning (ML) approaches have been used in recent years to predict a set of air pollutants using different combinations of predictor parameters. However, as the number of studies grows, it is often unclear why one algorithm is chosen over another for a given task. The objective of this Special Issue is to gather innovative research studies on ML models of air quality in order to better understand their predictive power. We are especially interested in papers focusing on (i) state-of-the-art algorithms (e.g., support vector machine, ensemble learning, artificial neural networks, extreme learning, deep learning, and hybrid models); (ii) models able to predict pollution peaks; (iii) the prediction of contaminants recently put in the spotlight (e.g., nanoparticles); and (iv) comparative studies between CTM-based and ML-based predictions.

Prof. Dr. Yves Rybarczyk
Prof. Dr. Rasa Zalakeviciute
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • air pollution
  • particulate matter, COx, NOx, SO2, O3
  • prediction and forecasting
  • statistical modeling
  • data mining and big data
  • support vector machine
  • extreme and deep learning
  • reinforcement learning
  • hybrid models
  • time series analysis

Published Papers (9 papers)

Editorial

2 pages, 172 KiB  
Editorial
Special Issue on Air Quality Prediction Based on Machine Learning Algorithms
by Yves Rybarczyk and Rasa Zalakeviciute
Appl. Sci. 2023, 13(11), 6460; https://doi.org/10.3390/app13116460 - 25 May 2023
Viewed by 625
Abstract
Atmospheric pollution is one of the major causes of premature mortality and climate change, as nearly all urban areas fail to comply with the air quality guidelines of the World Health Organization (WHO) [...]
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms II)

Research

16 pages, 2908 KiB  
Article
FedDeep: A Federated Deep Learning Network for Edge Assisted Multi-Urban PM2.5 Forecasting
by Yue Hu, Ning Cao, Wangyong Guo, Meng Chen, Yi Rong and Hao Lu
Appl. Sci. 2024, 14(5), 1979; https://doi.org/10.3390/app14051979 - 28 Feb 2024
Viewed by 356
Abstract
Accurate urban PM2.5 forecasting plays a crucial role in air pollution warning and human health monitoring. Recently, deep learning techniques have been widely employed for urban PM2.5 forecasting. Unfortunately, two problems exist: (1) Most techniques focus on training and prediction on a central cloud. As the number of monitoring sites and the volume of data grow, handling a large amount of data on the central cloud can cause tremendous computational pressure and increase the risk of data leakage. (2) Existing methods lack an adaptive layer to capture the varying impacts of different external factors (e.g., weather conditions, temperature, and wind speed). In this paper, a federated deep learning network (FedDeep) is developed for edge-assisted multi-urban PM2.5 forecasting. First, we assign each urban region to an edge cloud server (ECS). An external spatio-temporal network (ESTNet) is then deployed on each ECS. Data from different urban regions are uploaded to the corresponding ECS for training, which avoids processing all the data on the central cloud and effectively alleviates computational pressure and data leakage issues. Second, in ESTNet, we develop a gating fusion layer to adaptively fuse external factors to improve prediction accuracy. Finally, we used PM2.5 data collected from air quality monitoring sites in 13 prefecture-level cities of Jiangsu Province for validation. The experimental results showed that FedDeep outperformed the advanced baselines in terms of prediction accuracy and model efficiency.
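
One generic way to picture an adaptive gate of the kind this abstract mentions is a sigmoid-weighted blend of two feature vectors. The sketch below is an illustration under stated assumptions only: the function name `gated_fusion`, its arguments, and the blending formula are hypothetical and are not taken from the FedDeep paper, where the gate values would come from learned parameters rather than being passed in.

```python
from math import exp


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def gated_fusion(main_features, external_features, gate_logits):
    """Blend two equally long feature vectors element by element.

    For each position the gate g = sigmoid(logit) decides how much of the
    main feature is kept: fused = g * main + (1 - g) * external.
    All three arguments are hypothetical inputs for this sketch.
    """
    if not (len(main_features) == len(external_features) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for main, external, logit in zip(main_features, external_features, gate_logits):
        gate = sigmoid(logit)
        fused.append(gate * main + (1.0 - gate) * external)
    return fused


# A logit of 0 gives a gate of 0.5, so both sources contribute equally.
print(gated_fusion([1.0], [3.0], [0.0]))
```

A large positive logit pushes the result towards the main feature and a large negative logit towards the external one, which is the sense in which such a gate is adaptive.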

21 pages, 1235 KiB  
Article
Short-Term Forecasting of Ozone Concentration in Metropolitan Lima Using Hybrid Combinations of Time Series Models
by Natalí Carbo-Bustinza, Hasnain Iftikhar, Marisol Belmonte, Rita Jaqueline Cabello-Torres, Alex Rubén Huamán De La Cruz and Javier Linkolk López-Gonzales
Appl. Sci. 2023, 13(18), 10514; https://doi.org/10.3390/app131810514 - 21 Sep 2023
Cited by 3 | Viewed by 634
Abstract
In the modern era, air pollution is one of the most harmful environmental issues on the local, regional, and global stages. Its negative impacts go far beyond ecosystems and the economy, harming human health and environmental sustainability. Given these facts, efficient and accurate modeling and forecasting of ozone concentration are vital. This study therefore presents an in-depth analysis of ozone concentration forecasting by comparing many hybrid combinations of time series models. To this end, in the first phase, the hourly ozone time series is decomposed into three new sub-series (the long-term trend, the seasonal trend, and the stochastic series) by applying the seasonal trend decomposition method. In the second phase, we forecast every sub-series with three popular time series models and all their combinations. In the final phase, the results of each sub-series forecast are combined to obtain the final forecast. The proposed hybrid time series forecasting models were applied to four Metropolitan Lima monitoring stations (ATE, Campo de Marte, San Borja, and Santa Anita) for the winter season of the years 2017, 2018, and 2019. The considered time series models thus generated 27 combinations for each sampling station. In descriptive, statistical, and graphical analysis tests, the optimized forecast models produced accurate forecasts with a lower mean error than the baseline models. The most effective hybrid models for the ATE, Campo de Marte, San Borja, and Santa Anita stations were identified based on their superior out-of-sample forecast results, as measured by RMSE (4.611, 3.637, 1.495, and 1.969), RMSPE (4.464, 11.846, 1.864, and 15.924), MAE (1.711, 2.356, 1.078, and 1.462), and MAPE (14.862, 20.441, 7.668, and 76.261) errors. These models significantly outperformed other models due to their lower error values. In addition, the best models are statistically significant (p < 0.05) and superior to the rest of the combination models. Furthermore, the final proposed models show significant performance with the least mean error, which is comparatively better than the considered baseline models. Finally, the authors also recommend using the proposed hybrid time series combination forecasting models to predict ozone concentrations in other districts of Lima and other parts of Peru.
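
The figure of 27 combinations per station follows from assigning one of three models to each of three sub-series (3 × 3 × 3). The sketch below enumerates such assignments and gives common textbook definitions of the four error measures named in the abstract. It is a minimal illustration under assumptions: the model names are placeholders, the additive recombination in `combine` is assumed, and the metric formulas may differ in detail from those used in the paper.

```python
from itertools import product
from math import sqrt

SUB_SERIES = ("trend", "seasonal", "stochastic")
MODELS = ("model_a", "model_b", "model_c")  # hypothetical placeholder names


def hybrid_combinations():
    """Return every assignment of one model to each sub-series."""
    return [dict(zip(SUB_SERIES, choice))
            for choice in product(MODELS, repeat=len(SUB_SERIES))]


def combine(sub_forecasts):
    """Add sub-series forecasts element-wise (additive recombination, assumed)."""
    return [sum(values) for values in zip(*sub_forecasts)]


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmspe(actual, predicted):
    """Root mean squared percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


print(len(hybrid_combinations()))  # number of model assignments per station
```

Because MAPE and RMSPE divide by the observed value, they grow quickly when observations are close to zero, whereas RMSE and MAE stay in the units of the measured concentration.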

22 pages, 4788 KiB  
Article
Hybrid LSTM Model to Predict the Level of Air Pollution in Montenegro
by Kruna Ratković, Nataša Kovač and Marko Simeunović
Appl. Sci. 2023, 13(18), 10152; https://doi.org/10.3390/app131810152 - 09 Sep 2023
Cited by 1 | Viewed by 714
Abstract
Air pollution is a critical environmental concern that poses significant health risks and affects multiple aspects of human life. ML algorithms provide promising results for air pollution prediction. In the existing scientific literature, Long Short-Term Memory (LSTM) predictive models, as well as their combination with other statistical and machine learning approaches, have been utilized for air pollution prediction. However, these combined algorithms may not always provide suitable results due to the stochastic nature of the factors that influence air pollution, improper hyperparameter configurations, or inadequate datasets and data characterized by great variability and extreme dispersion. The focus of this paper is applying and comparing the performance of Support Vector Machine and hybrid LSTM regression models for air pollution prediction. To identify optimal hyperparameters for the LSTM model, a hybridization with the Genetic Algorithm is proposed. To mitigate the risk of overfitting, the bagging technique is employed on the best LSTM model. The proposed predictive model aims to determine the Common Air Quality Index level for the next hour in Niksic, Montenegro. With the hybridization of the LSTM algorithm and by applying the bagging technique, our approach aims to significantly enhance the accuracy and reliability of hourly air pollution prediction. The major contribution of this paper is in the application of advanced machine learning analysis and the combination of the LSTM, Genetic Algorithm, and bagging techniques, which have not been previously employed in the analysis of air pollution in Montenegro. The proposed model will be made available to interested management structures, local governments, national entities, or other relevant institutions, empowering them to make effective pollution level predictions and take appropriate measures.
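
Bagging, as mentioned in this abstract, trains several copies of a model on bootstrap resamples of the training data and averages their predictions. The sketch below shows only that generic idea. Everything in it is an assumption for illustration: `fit_mean` is a toy stand-in for the paper's LSTM, and the function names, data, and parameters are hypothetical.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean(sample):
    """Toy model (stand-in for a real learner): always predict the mean target."""
    target_mean = mean(target for _, target in sample)
    return lambda x: target_mean


def bagged_predict(train, fit, x, n_models=10, seed=0):
    """Average the predictions of n_models models fitted on bootstrap samples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    models = [fit(bootstrap_sample(train, rng)) for _ in range(n_models)]
    return mean(model(x) for model in models)


# Hypothetical (hour, index value) pairs, not data from the paper.
train = [(0, 40.0), (1, 55.0), (2, 48.0), (3, 61.0)]
print(bagged_predict(train, fit_mean, x=4))
```

Averaging over resampled fits reduces the variance of the final prediction, which is why bagging is commonly used against overfitting.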

16 pages, 5757 KiB  
Article
Numerical Study of the Flow of Pollutants during Air Purification, Taking into Account the Use of Eco-Friendly Material for the Filter—Mycelium
by Vaidotas Vaišis, Aleksandras Chlebnikovas and Raimondas Jasevičius
Appl. Sci. 2023, 13(3), 1703; https://doi.org/10.3390/app13031703 - 29 Jan 2023
Cited by 2 | Viewed by 1709
Abstract
To improve air quality, it is customary to apply technological measures to isolate or retain pollutants by influencing the polluted stream in various ways to effectively remove the pollutants. One of the most commonly used measures is a filter, in which the air flow passes through a porous aggregate. A variety of filter materials allows very selective and precise cleaning of the air flow in non-standard or even aggressive microclimate conditions. In this paper, the environmental aspect of the materials used is discussed, and a theoretical model of an adapted mycelium is proposed as an alternative filter material to predict air flow purification. In the numerical model of an idealized filter, several cases are considered in which the pore size of the mycelial fillers is 1.0, 0.5 and 0.1 mm and the feed flow velocity is 1–5 m/s. Moreover, in the mycelium itself, the flow velocity can decrease, approaching a value of 0.3 m/s near the wall, which is used in additional numerical studies of the interaction with the surface. These preliminary studies are aimed at establishing indicative theoretical parameters for favorable air flow movement in the structure of the mycelium.

24 pages, 849 KiB  
Article
Space-Time Prediction of PM2.5 Concentrations in Santiago de Chile Using LSTM Networks
by Billy Peralta, Tomás Sepúlveda, Orietta Nicolis and Luis Caro
Appl. Sci. 2022, 12(22), 11317; https://doi.org/10.3390/app122211317 - 08 Nov 2022
Cited by 8 | Viewed by 1432
Abstract
Currently, air pollution is a highly important issue in society due to its harmful effects on human health and the environment. The prediction of pollutant concentrations in Santiago de Chile is typically based on statistical methods or classical neural networks. Existing methods often assume that historical values are known at a fixed geographic point, such that air pollution can be predicted at a future hour using time series analysis. However, these methods are inapplicable when it is necessary to know the pollutant concentrations at every point in space. This work proposes a method that addresses the space-time prediction of PM2.5 concentration in Santiago de Chile at any spatial point through the use of the LSTM recurrent network model. In particular, by considering historical values of air pollutants (PM2.5, PM10 and nitrogen dioxide) and meteorological variables (temperature, wind speed and direction and relative humidity), measured at fixed monitoring stations, the proposed model can predict PM2.5 concentrations for the next 24 h in a new location where measurements are not available. This work describes the experiments carried out, with particular emphasis on the pre-processing step, which constitutes an important factor for obtaining relatively good results. The proposed multilayer LSTM model obtained R2 values equal to 0.74 and 0.38 in seven stations when considering forecasts of 1 and 24 h, respectively. As future work, we plan to include more input variables in the proposed model and to use attention-based networks.

20 pages, 5900 KiB  
Article
Air Contaminants and Atmospheric Black Carbon Association with White Sky Albedo at Hindukush Karakorum and Himalaya Glaciers
by Irfan Zainab, Zulfiqar Ali, Usman Ahmad, Syed Turab Raza, Rida Ahmad, Zaidi Zona and Safdar Sidra
Appl. Sci. 2022, 12(3), 962; https://doi.org/10.3390/app12030962 - 18 Jan 2022
Cited by 2 | Viewed by 1667
Abstract
Environmental contaminants are becoming a growing issue due to their effects on the cryosphere and their impact on the ecosystem. Mountain glaciers are receding in the HKH region and are anticipated to diminish further as black carbon (BC) concentrations rise along with other pollutants in the air, increasing global warming. Air contaminants and BC concentrations were estimated from June 2017 to May 2018. An inventory of different pollutants at three glaciers in Karakoram, Hindukush, and the Himalayas was recorded with Aeroqual 500 and TSI DRX 8533, as follows: ozone (28.14 ± 3.58 µg/m3), carbon dioxide (208.58 ± 31.40 µg/m3), sulfur dioxide (1.73 ± 0.33 µg/m3), nitrogen dioxide (2.84 ± 0.37 µg/m3), PM2.5 (15.90 ± 3.32 µg/m3), PM10 (28.05 ± 2.88 µg/m3), total suspended particles (76.05 ± 10.19 µg/m3), BC in river water (88.74 ± 19.16 µg/m3), glaciers (17.66 ± 0.82 µg/m3), snow/rain (57.43 ± 19.66 ng/g), and air (2.80 ± 1.20 µg/m3). BC was estimated by using DRI Model 2015, Multi-Wavelength Thermal/Optical Carbon Analyzer, in conjunction with satellite-based white-sky albedo (WSA). The average BC concentrations in the Karakoram, Himalaya, and Hindukush were 2.35 ± 0.94, 4.38 ± 1.35, and 3.32 ± 1.09 (µg/m3), whereas WSA was 0.053 ± 0.024, 0.045 ± 0.015, and 0.045 ± 0.019 (µg/m3), respectively. Regression analysis revealed the inverse relationship between WSA and BC. The resulting curves provide a better understanding of the non-empirical link between BC and WSA. Increased BC will have ecological consequences for the region, ultimately resulting in biodiversity loss.

15 pages, 2631 KiB  
Article
Gradient Boosting Machine to Assess the Public Protest Impact on Urban Air Quality
by Rasa Zalakeviciute, Yves Rybarczyk, Katiuska Alexandrino, Santiago Bonilla-Bedoya, Danilo Mejia, Marco Bastidas and Valeria Diaz
Appl. Sci. 2021, 11(24), 12083; https://doi.org/10.3390/app112412083 - 18 Dec 2021
Cited by 4 | Viewed by 2332
Abstract
Political and economic protests build up due to the financial uncertainty and inequality spreading throughout the world. In 2019, Latin America took the main stage in a wave of protests. While the social side of protests is widely explored, the focus of this study is the evolution of gaseous urban air pollutants during and after one of these events. Changes in concentrations of NO2, CO, O3 and SO2 during and after the strike were studied in Quito, Ecuador using two approaches: (i) inter-period observational analysis; and (ii) comparison of the observations with a business-as-usual (BAU) scenario developed with a machine learning (ML) gradient boosting machine (GBM). During the strike, both methods showed a large reduction in the concentrations of NO2 (31.5–32.36%) and CO (15.55–19.85%) and a slight reduction for O3 and SO2. The GBM approach showed an exclusive potential, especially for a lengthier period of predictions, to estimate strike impact on air quality even after the strike was over. This advocates for the use of machine learning techniques to estimate an extended effect of changes in human activities on urban gaseous pollution.
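
The business-as-usual comparison described in this abstract amounts to measuring how far observed concentrations deviate from what a model predicts for the same period without the event. The sketch below shows only that final comparison step, as a percentage change of the period means. It is an assumption for illustration: the function name and the numbers are hypothetical, the BAU values are hard-coded rather than produced by a trained GBM, and the paper may compute its percentages differently.

```python
from statistics import mean


def percent_change_vs_bau(observed, bau_predicted):
    """Percentage change of the observed mean relative to the BAU mean.

    Negative values mean the observed concentrations were lower than the
    business-as-usual prediction for the same period.
    """
    if not observed or not bau_predicted:
        raise ValueError("both series must be non-empty")
    bau_mean = mean(bau_predicted)
    if bau_mean == 0:
        raise ValueError("BAU mean is zero; percentage change is undefined")
    return 100.0 * (mean(observed) - bau_mean) / bau_mean


# Hypothetical hourly concentrations, not data from the study.
observed = [30.0, 40.0]
bau_predicted = [50.0, 50.0]
print(percent_change_vs_bau(observed, bau_predicted))
```

Using a model-based baseline rather than an earlier observation period lets the comparison account for weather and seasonal effects that the model has learned.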

24 pages, 4776 KiB  
Article
Evaluation of Machine Learning Models for Estimating PM2.5 Concentrations across Malaysia
by Nurul Amalin Fatihah Kamarul Zaman, Kasturi Devi Kanniah, Dimitris G. Kaskaoutis and Mohd Talib Latif
Appl. Sci. 2021, 11(16), 7326; https://doi.org/10.3390/app11167326 - 09 Aug 2021
Cited by 20 | Viewed by 3889
Abstract
Southeast Asia (SEA) is a hotspot region for atmospheric pollution and haze conditions, due to extensive forest, agricultural and peat fires. This study aims to estimate the PM2.5 concentrations across Malaysia using machine-learning (ML) models such as Random Forest (RF) and Support Vector Regression (SVR), based on satellite AOD (aerosol optical depth) observations, ground-measured air pollutants (NO2, SO2, CO, O3) and meteorological parameters (air temperature, relative humidity, wind speed and direction). The estimated PM2.5 concentrations for a two-year period (2018–2019) are evaluated against measurements performed at 65 air-quality monitoring stations located at urban, industrial, suburban and rural sites. PM2.5 concentrations varied widely between the stations, with higher values (mean of 24.2 ± 21.6 µg m−3) at urban/industrial stations and lower values (mean of 21.3 ± 18.4 µg m−3) at suburban/rural sites. Furthermore, pronounced seasonal variability in PM2.5 is recorded across Malaysia, with the highest concentrations during the dry season (June–September). Seven models were developed for PM2.5 predictions, i.e., separately for urban/industrial and suburban/rural sites, for the four dominant seasons (dry, wet and two inter-monsoon), and an overall model, which displayed accuracies in the order of R2 = 0.46–0.76. The validation analysis reveals that the RF model (R2 = 0.53–0.76) exhibits slightly better performance than SVR, except for the overall model. This is the first study conducted in Malaysia for PM2.5 estimations at a national scale combining satellite aerosol retrievals with ground-based pollutants, meteorological factors and ML techniques. The satisfactory prediction of PM2.5 concentrations across Malaysia allows continuous monitoring of pollution levels in remote areas that lack measurement networks.
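
For readers comparing the R2 values quoted in these abstracts, the sketch below computes the coefficient of determination in its common form, 1 − SS_res / SS_tot. This is an assumption about the definition: some studies instead report the squared Pearson correlation, and the abstracts do not say which was used. The function name and sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 values, not data from the study.
print(r_squared([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

With this definition a value of 1 means perfect predictions, 0 means the model does no better than predicting the observed mean, and negative values are possible for models that do worse than that.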
