Air Quality Prediction Based on Machine Learning Algorithms

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2019) | Viewed by 25340

Special Issue Editors


Guest Editor
Faculty of Data and Information Sciences, Dalarna University, 791 88 Falun, Sweden
Interests: artificial intelligence and cognitive systems; machine learning-based models; prediction of air quality; programming and software development

Guest Editor
Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud, Universidad de Las Américas, 170125 Quito, Ecuador
Interests: urban air pollution; natural aerosol formation; climate; conservation

Special Issue Information

Dear Colleagues,

Worsening air quality is one of the major global causes of premature mortality and the main environmental risk, claiming seven million lives every year. Nearly all urban areas fail to comply with the air quality guidelines of the World Health Organization (WHO). This health threat could be reduced by developing models that forecast air quality and inform citizens of the risks of certain activities during high-pollution episodes.

The traditional predictive approach relies on deterministic models that simulate physical processes and transport in the atmosphere. The approach most commonly used by the community is the chemical transport model (CTM), which processes emissions, transport, mixing, and the chemical transformation of trace gases and aerosols simultaneously with meteorology. However, the relationships between air pollutants and the factors that influence them are highly non-linear, which makes the mechanisms of air pollutant formation very complex. Statistical learning (or machine learning) algorithms are therefore increasingly used to model these non-linear relationships. Although statistical models do not explicitly simulate environmental processes, they generally achieve higher predictive performance than CTMs at fine spatiotemporal scales when extensive monitoring data are available.

Several machine learning (ML) approaches have been used in recent years to predict various air pollutants from different combinations of predictor parameters. However, as the number of studies grows, it is often unclear why one algorithm is chosen over another for a given task. The objective of this Special Issue is to gather innovative research on ML models of air quality in order to better understand their predictive power. We are especially interested in papers focusing on: (i) state-of-the-art algorithms (e.g., support vector machines, ensemble learning, artificial neural networks, extreme learning machines, deep learning, and hybrid models); (ii) models able to predict pollution peaks; (iii) the prediction of contaminants that have recently come into the spotlight (e.g., nanoparticles); and (iv) comparative studies of CTM-based and ML-based predictions.

Prof. Dr. Yves Rybarczyk
Prof. Dr. Rasa Zalakeviciute
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other information relevant to manuscript submission are available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Air pollution
  • Particulate matter, COx, NOx, SO2, O3
  • Prediction and forecasting
  • Statistical modeling
  • Data mining and big data
  • Support vector machine
  • Extreme and deep learning
  • Reinforcement learning
  • Hybrid models
  • Time series analysis

Published Papers (7 papers)


Research

18 pages, 10356 KiB  
Article
A Traffic-Based Method to Predict and Map Urban Air Quality
by Rasa Zalakeviciute, Marco Bastidas, Adrian Buenaño and Yves Rybarczyk
Appl. Sci. 2020, 10(6), 2035; https://doi.org/10.3390/app10062035 - 17 Mar 2020
Cited by 28 | Viewed by 3667
Abstract
As global urbanization, industrialization, and motorization keep worsening air quality, a continuous rise in health problems is projected. The limited spatial resolution of air quality information hinders a full understanding of urban population exposure. Therefore, we propose a method to predict urban air pollution from traffic by extracting data from Web-based applications (Google Traffic). We apply a machine learning approach by training a decision tree algorithm (C4.8) to predict the concentration of PM2.5 during the morning pollution peak from: (i) an interpolation (inverse distance weighting) of the values registered at the monitoring stations, (ii) traffic flow, and (iii) traffic flow + time of day. The results show that the prediction from traffic outperforms the one provided by the monitoring network (average accuracy of 65.5% for the former vs. 57% for the latter). Adding the time of day increases the accuracy by an average of 6.5%. Considering the good accuracy on different days, the proposed method seems robust enough to create general models able to predict air pollution from traffic conditions. This affordable method, although beneficial for any city, is particularly relevant for low-income countries, because it offers an economically sustainable technique to address the air quality issues faced by the developing world.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
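The traffic-based method above uses inverse distance weighting (IDW) of station readings as one of its baselines. The sketch below illustrates plain IDW, assuming planar (x, y) coordinates in kilometres and a power parameter of 2; the station positions and PM2.5 values are hypothetical, and the C4.8 decision-tree step of the article is not reproduced.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighting estimate at point (x, y).

    stations: list of (sx, sy, value) tuples, e.g. PM2.5 readings.
    Each station is weighted by 1 / distance**power, so nearby
    stations dominate the estimate.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # The query point coincides with a station: return its reading.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical stations: (x km, y km, PM2.5 in ug/m3).
stations = [(0.0, 0.0, 20.0), (4.0, 0.0, 40.0), (0.0, 3.0, 10.0)]
print(round(idw(stations, 2.0, 0.0), 2))  # 27.33
```

The point (2, 0) is equidistant from the first two stations, so they share equal weight, while the farther third station pulls the estimate slightly below their mean of 30.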

18 pages, 877 KiB  
Article
Bayesian Proxy Modelling for Estimating Black Carbon Concentrations using White-Box and Black-Box Models
by Martha A. Zaidan, Darren Wraith, Brandon E. Boor and Tareq Hussein
Appl. Sci. 2019, 9(22), 4976; https://doi.org/10.3390/app9224976 - 19 Nov 2019
Cited by 17 | Viewed by 3987
Abstract
Black carbon (BC) is an important component of particulate matter (PM) in urban environments. BC is typically emitted from gas and diesel engines, coal-fired power plants, and other sources that burn fossil fuel. In contrast to PM, BC measurements are not always available on a large scale due to the operational cost and complexity of the instrumentation. Therefore, it is advantageous to develop a mathematical model for estimating the quantity of BC in the air, termed a BC proxy, to enable wider spatial air pollution mapping. This article presents the development of BC proxies based on a Bayesian framework using measurements of PM concentrations and size distributions from 10 to 10,000 nm from a recent mobile air pollution study across several areas of Jordan. Bayesian methods using informative priors can naturally prevent over-fitting in the modelling process and generate a confidence interval around the prediction, so the estimated BC concentration can be directly quantified and assessed. In particular, two types of models are developed based on their transparency and interpretability, referred to as white-box and black-box models. The proposed methods are tested on extensive data sets obtained from the measurement campaign in Jordan. In this study, black-box models perform slightly better due to their greater model complexity. Nevertheless, the results demonstrate that the performance of the two models does not differ significantly. In practice, white-box models are more convenient to deploy, their methods are well understood by scientists, and they can be used to better understand key relationships.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
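The BC proxy article relies on Bayesian regression with informative priors, which yields an interval around each estimate. As a rough illustration only, the sketch below fits a one-predictor conjugate Bayesian linear model (BC approximated as intercept + slope * PM) with a Gaussian prior and an assumed known noise variance, then returns an approximate 95% posterior predictive interval. The prior settings, noise variance, and data are hypothetical; the article's actual white-box and black-box models use richer inputs (PM size distributions).

```python
import math


def bayes_linreg(xs, ys, prior_mean=(0.0, 0.0), prior_var=10.0, noise_var=1.0):
    """Conjugate Bayesian linear regression y = b0 + b1 * x + Gaussian noise.

    Prior: (b0, b1) ~ N(prior_mean, prior_var * I); the noise variance is
    assumed known. Returns the posterior mean and 2x2 posterior covariance.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Posterior precision A = I / prior_var + X^T X / noise_var, rows of X are [1, x].
    a11 = 1.0 / prior_var + n / noise_var
    a12 = sx / noise_var
    a22 = 1.0 / prior_var + sxx / noise_var
    # Posterior covariance C = A^-1 (closed-form inverse of a symmetric 2x2 matrix).
    det = a11 * a22 - a12 * a12
    c11, c12, c22 = a22 / det, -a12 / det, a11 / det
    # Posterior mean m = C (prior_mean / prior_var + X^T y / noise_var).
    r0 = prior_mean[0] / prior_var + sum(ys) / noise_var
    r1 = prior_mean[1] / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    mean = (c11 * r0 + c12 * r1, c12 * r0 + c22 * r1)
    return mean, ((c11, c12), (c12, c22))


def predict(mean, cov, x, noise_var=1.0, z=1.96):
    """Posterior predictive mean and ~95% interval at a new input x."""
    mu = mean[0] + mean[1] * x
    # [1, x] C [1, x]^T (parameter uncertainty) plus the observation noise.
    var = cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1] + noise_var
    half = z * math.sqrt(var)
    return mu, (mu - half, mu + half)


# Hypothetical data: PM concentration (x) and co-located BC measurements (y).
pm = [1.0, 2.0, 3.0, 4.0, 5.0]
bc = [1.1, 1.9, 3.2, 3.9, 5.1]
m, c = bayes_linreg(pm, bc, noise_var=0.04)
print(predict(m, c, 6.0, noise_var=0.04))
```

A tighter `prior_var` pulls the coefficients toward `prior_mean`, which is the regularising effect the abstract attributes to informative priors.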

14 pages, 2378 KiB  
Article
Integrated Predictor Based on Decomposition Mechanism for PM2.5 Long-Term Prediction
by Xuebo Jin, Nianxiang Yang, Xiaoyi Wang, Yuting Bai, Tingli Su and Jianlei Kong
Appl. Sci. 2019, 9(21), 4533; https://doi.org/10.3390/app9214533 - 25 Oct 2019
Cited by 54 | Viewed by 3016
Abstract
It is crucial to predict PM2.5 concentration for early warning and control of air pollution. However, accurate PM2.5 prediction has been challenging, especially in long-term prediction. PM2.5 monitoring data comprise a complex time series that contains multiple components with different characteristics; therefore, it is difficult to obtain an accurate prediction with a single model. In this study, an integrated predictor is proposed, in which the original data are decomposed into three components, that is, trend, period, and residual components, and then different sub-predictors, including autoregressive integrated moving average (ARIMA) and two gated recurrent units, are used to predict the different components separately. Finally, all the predictions from the sub-predictors are combined in a fusion node to obtain the final prediction for the original data. The results of predicting the PM2.5 time series for Beijing, China show that the proposed predictor can effectively improve accuracy for long-term prediction.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
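The integrated predictor above first splits the PM2.5 series into trend, period, and residual components. A minimal sketch of a classical additive decomposition follows, assuming a known odd period and a centred moving-average trend; it is not necessarily the decomposition used in the article. The ARIMA and GRU sub-predictors are not reproduced, and combining component forecasts by summation in `fuse` is only an assumed form of the fusion node.

```python
def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual.

    trend:    centred moving average over `period` points (the window is
              truncated at both ends of the series);
    seasonal: mean detrended value at each phase of the cycle, shifted so
              that one full cycle sums to zero;
    residual: what remains after removing the trend and seasonal parts.
    """
    n = len(series)
    if period % 2 == 0 or n < period:
        raise ValueError("sketch assumes an odd period no longer than the series")
    half = period // 2
    trend = []
    for i in range(n):
        window = series[max(0, i - half):min(n, i + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [s - t for s, t in zip(series, trend)]
    phase_means = []
    for p in range(period):
        vals = detrended[p::period]  # every value sharing phase p
        phase_means.append(sum(vals) / len(vals))
    offset = sum(phase_means) / period
    pattern = [m - offset for m in phase_means]
    seasonal = [pattern[i % period] for i in range(n)]
    residual = [s - t - c for s, t, c in zip(series, trend, seasonal)]
    return trend, seasonal, residual


def fuse(*component_forecasts):
    """Combine per-component forecasts by element-wise summation (assumed rule)."""
    return [sum(vals) for vals in zip(*component_forecasts)]


# Hypothetical series: linear trend plus a 3-step cycle.
series = [0.5 * i + (3.0, -1.0, -2.0)[i % 3] for i in range(12)]
trend, seasonal, residual = decompose(series, 3)
print([round(v, 2) for v in seasonal[:3]])
```

In a full pipeline, each of `trend`, `seasonal`, and `residual` would be handed to its own sub-predictor, and their forecasts passed to `fuse`.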

15 pages, 5241 KiB  
Article
Gridded Visibility Products over Marine Environments Based on Artificial Neural Network Analysis
by Yulong Shan, Ren Zhang, Ismail Gultepe, Yaojia Zhang, Ming Li and Yangjun Wang
Appl. Sci. 2019, 9(21), 4487; https://doi.org/10.3390/app9214487 - 23 Oct 2019
Cited by 8 | Viewed by 2195
Abstract
The reconstruction and monitoring of visibility over marine environments is critically important because of a lack of observations. To travel safely in marine environments, high-quality visibility data are needed to evaluate navigation risk. Currently, although visibility is available through numerical weather prediction models as well as ground and spaceborne remote sensing platforms and ship measurements, issues remain over remote marine environments and northern latitudes. To improve visibility prediction and reduce navigational risks, gridded visibility data based on artificial neural network analysis can be used over marine environments, and the problem can be treated as an air quality prediction problem based on machine learning algorithms. The new artificial intelligence-based method developed here is tested over the Indian Ocean. The mean error of the visibility inferred from the artificial neural network analysis is found to be less than 8.0%. The results suggest that satellite-based optical thickness and numerical model-based reanalysis data can be used to infer gridded visibility values through artificial neural network analysis, which could help reconstruct and monitor surface gridded visibility over marine and remote environments.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
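The visibility article infers visibility from satellite optical thickness and reanalysis fields with an artificial neural network. The sketch below shows the general idea with a one-hidden-layer network trained by full-batch gradient descent on mean squared error. The architecture, learning rate, input features, and toy data are hypothetical assumptions, not the configuration used in the study.

```python
import math
import random


class TinyMLP:
    """One hidden tanh layer, linear output, trained on mean squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def loss(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train_step(self, xs, ys, lr=0.05):
        """One full-batch gradient-descent step computed by backpropagation."""
        n_hidden, n_in = len(self.w2), len(self.w1[0])
        g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            h = self._hidden(x)
            out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
            d_out = 2.0 * (out - y) / len(xs)  # d(MSE)/d(out) for this sample
            g_b2 += d_out
            for j in range(n_hidden):
                g_w2[j] += d_out * h[j]
                # Through tanh: d tanh(a)/da = 1 - tanh(a)^2.
                d_a = d_out * self.w2[j] * (1.0 - h[j] ** 2)
                g_b1[j] += d_a
                for i in range(n_in):
                    g_w1[j][i] += d_a * x[i]
        # Apply all updates after the gradients are fully accumulated.
        for j in range(n_hidden):
            self.w2[j] -= lr * g_w2[j]
            self.b1[j] -= lr * g_b1[j]
            for i in range(n_in):
                self.w1[j][i] -= lr * g_w1[j][i]
        self.b2 -= lr * g_b2


# Hypothetical normalised inputs (optical thickness, humidity) and visibility.
xs = [(i / 10.0, (i % 3) / 2.0) for i in range(10)]
ys = [1.0 - 0.6 * a - 0.2 * b for a, b in xs]
net = TinyMLP(n_in=2, n_hidden=8)
before = net.loss(xs, ys)
for _ in range(300):
    net.train_step(xs, ys, lr=0.05)
print(before, net.loss(xs, ys))
```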

20 pages, 1650 KiB  
Article
Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies
by Martha A. Zaidan, Lubna Dada, Mansour A. Alghamdi, Hisham Al-Jeelani, Heikki Lihavainen, Antti Hyvärinen and Tareq Hussein
Appl. Sci. 2019, 9(20), 4475; https://doi.org/10.3390/app9204475 - 22 Oct 2019
Cited by 23 | Viewed by 3366
Abstract
An air pollutant proxy is a mathematical model that estimates an unobserved air pollutant using other measured variables. A proxy can fill missing data in a research campaign or substitute for a real measurement to minimise cost and the number of operators involved (i.e., act as a virtual sensor). In this paper, we present a generic concept of pollutant proxy development based on an optimised data-driven approach. We propose a mutual information concept to determine the interdependence of different variables and thus select the most correlated inputs. The most relevant variables are selected as the best proxy inputs, guided by several metrics and by the amount of data loss. The input selection method determines which data are used to train pollutant proxies based on a probabilistic machine learning method. In particular, we use a Bayesian neural network that naturally prevents overfitting and provides confidence intervals around its output prediction. In this way, the prediction uncertainty can be assessed and evaluated. To demonstrate the effectiveness of our approach, we test it on an extensive air pollution database to estimate ozone concentration.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
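The proxy article ranks candidate inputs by their mutual information (MI) with the target pollutant. A minimal sketch of that ranking follows, assuming continuous variables are discretised into equal-width bins and MI is estimated from histogram counts (in nats). The bin count and candidate names are hypothetical, and the article's additional metrics, data-loss criterion, and Bayesian neural network stage are not reproduced.

```python
import math
from collections import Counter


def discretise(values, bins=8):
    """Map values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=8):
    """Histogram estimate of I(X; Y) in nats."""
    a = discretise(xs, bins)
    b = discretise(ys, bins)
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (i, j), c in pab.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), written with raw counts.
        mi += (c / n) * math.log(c * n / (pa[i] * pb[j]))
    return mi


def rank_inputs(candidates, target, bins=8):
    """Return candidate names sorted by decreasing MI with the target, plus scores."""
    scores = {name: mutual_information(vals, target, bins)
              for name, vals in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical candidate inputs for an ozone proxy.
target = [float(i % 4) for i in range(40)]
candidates = {
    "copy": list(target),                         # fully informative
    "parity": [float(i % 2) for i in range(40)],  # partially informative
    "const": [1.0] * 40,                          # carries no information
}
order, scores = rank_inputs(candidates, target)
print(order)  # ['copy', 'parity', 'const']
```

Histogram MI is sensitive to the number of bins; the top-ranked inputs would then form the training set for the proxy model.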

14 pages, 1645 KiB  
Article
Prediction of Air Pollution Concentration Based on mRMR and Echo State Network
by Xinghan Xu and Weijie Ren
Appl. Sci. 2019, 9(9), 1811; https://doi.org/10.3390/app9091811 - 01 May 2019
Cited by 22 | Viewed by 3279
Abstract
Air pollution has become a global environmental problem because it has a great adverse impact on human health and the climate. One way to explore this problem is to monitor and predict the air quality index in an economical way. Accurate monitoring and prediction of air quality indicators, such as the air quality index (AQI) or PM2.5 concentration, is a challenging task. To accurately predict the PM2.5 time series, we propose a supplementary leaky integrator echo state network (SLI-ESN) in this paper. It adds a historical state term, taken at an earlier time step, to the leaky integrator reservoir update, which strengthens the influence of the historical evolution of the state on the current state. Considering the redundancy and correlation between multivariable time series, the minimum redundancy maximum relevance (mRMR) feature selection method is introduced to reduce redundant and irrelevant information and increase computation speed. A variety of evaluation indicators are used to assess the overall performance of the proposed method. The effectiveness of the proposed model is verified by an experiment on Beijing PM2.5 time series prediction. A comparison of learning times also shows the efficiency of the algorithm.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
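The SLI-ESN above extends the leaky integrator echo state network. The sketch below shows only the standard leaky integrator reservoir update, x(t+1) = (1 - a) x(t) + a tanh(W_in u(t+1) + W x(t)), with a hypothetical leak rate `a`, reservoir size, and random weights. The article's supplementary historical-state term, the mRMR input selection, and the trained linear readout are not reproduced.

```python
import math
import random


def make_reservoir(n_res, n_in, scale=0.9, seed=0):
    """Random input and recurrent weights for a small reservoir.

    The recurrent matrix is rescaled so that its infinity norm (maximum
    absolute row sum), which bounds the spectral radius from above,
    equals `scale`.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)] for _ in range(n_res)]
    max_row = max(sum(abs(v) for v in row) for row in w)
    w = [[v * scale / max_row for v in row] for row in w]
    return w_in, w


def leaky_update(state, u, w_in, w, leak=0.3):
    """x(t+1) = (1 - leak) * x(t) + leak * tanh(W_in u + W x(t))."""
    new_state = []
    for i in range(len(state)):
        a = sum(wi * ui for wi, ui in zip(w_in[i], u))
        a += sum(wij * xj for wij, xj in zip(w[i], state))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(a))
    return new_state


def run_reservoir(inputs, w_in, w, leak=0.3):
    """Collect reservoir states for a sequence of input vectors, starting from zero."""
    state = [0.0] * len(w)
    states = []
    for u in inputs:
        state = leaky_update(state, u, w_in, w, leak)
        states.append(state)
    return states


w_in, w = make_reservoir(n_res=20, n_in=2)
inputs = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(50)]
states = run_reservoir(inputs, w_in, w)
print(len(states), len(states[0]))
```

A smaller leak rate makes each state change more slowly and retain more of its history, which is the property the SLI-ESN further strengthens with its extra historical term.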

21 pages, 1298 KiB  
Article
Majority Voting Based Multi-Task Clustering of Air Quality Monitoring Network in Turkey
by Goksu Tuysuzoglu, Derya Birant and Aysegul Pala
Appl. Sci. 2019, 9(8), 1610; https://doi.org/10.3390/app9081610 - 18 Apr 2019
Cited by 12 | Viewed by 4379
Abstract
Air pollution, which results from the urbanization brought by modern life, has a dramatic impact on the global scale as well as on local and regional scales. Since air pollution has important effects on human health and other living things, the issue of air quality is of great importance all over the world. Accordingly, many studies based on classification, clustering, and association rule mining applications for air pollution have been proposed in the field of data mining and machine learning to extract hidden knowledge from environmental parameters. One approach is to model a region by identifying cities with similar characteristics and placing them into the same clusters. Instead of using traditional clustering algorithms, a novel algorithm, named Majority Voting based Multi-Task Clustering (MV-MTC), is proposed and used to consider multiple air pollutants jointly. Experimental studies showed that the proposed method is superior to five well-known clustering algorithms: K-Means, Expectation Maximization, Canopy, Farthest First, and Hierarchical clustering.
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)
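The MV-MTC article combines clusterings of several pollutants through majority voting. The abstract does not give the exact procedure, so the sketch below shows one simple, assumed form of majority-vote consensus: two cities are linked when they share a cluster in more than half of the per-pollutant clusterings, and the final clusters are the connected groups of linked cities (found with union-find). This co-association scheme is a stand-in, not the published algorithm, and the city names and labels are hypothetical.

```python
from itertools import combinations


def majority_consensus(labelings):
    """Consensus clustering from several labelings of the same items.

    labelings: list of dicts {item: cluster_label}, one per pollutant.
    Items a and b are linked when they share a label in more than half of
    the labelings; final clusters are the connected groups of linked items.
    """
    items = sorted(labelings[0])
    parent = {item: item for item in items}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in combinations(items, 2):
        votes = sum(1 for lab in labelings if lab[a] == lab[b])
        if votes * 2 > len(labelings):
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Hypothetical per-pollutant clusterings of four cities.
pm = {"A": 0, "B": 0, "C": 1, "D": 1}
no2 = {"A": 0, "B": 0, "C": 0, "D": 1}
so2 = {"A": 1, "B": 1, "C": 2, "D": 2}
print(majority_consensus([pm, no2, so2]))  # [['A', 'B'], ['C', 'D']]
```

Comparing pairs rather than raw labels avoids having to align cluster IDs across pollutants, since label 0 for PM need not mean the same group as label 0 for NO2.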
