# Establishing a Real-Time Prediction System for Fine Particulate Matter Concentration Using Machine-Learning Models

## Abstract

_{2.5}) concentration. This enables individuals to anticipate conditions in their immediate surroundings and reduce their exposure to high concentrations of fine particulate matter. The research area comprises Keelung City and Xizhi District in New Taipei City, in northern Taiwan. This study establishes five prediction models based on machine-learning algorithms: the deep neural network (DNN), the M5’ decision tree algorithm (M5P), the M5’ rules decision tree algorithm (M5Rules), the alternating model tree (AMT), and multiple linear regression (MLR). Based on the predictive results of these five models, the study identifies the optimal model for each forecast horizon and proposes a real-time PM_{2.5} concentration prediction system that integrates the models. The results show that prediction errors vary across models and forecast horizons, with no single model consistently outperforming the others. A hybrid prediction system therefore predicts future PM_{2.5} concentration more accurately than a single model. To assess the practicality of the system, the study simulated its operation, with a particular focus on the winter season, when high PM_{2.5} concentrations are prevalent. The system performed well, although errors increased at longer forecast horizons. It promptly adjusts its predictions over time and effectively forecasts the PM_{2.5} concentration for the next 12 h.

## 1. Introduction

_{2.5}) pose a significant risk to human health. However, because air quality monitoring stations are costly to establish, their number and locations are often limited. Moreover, the data collected at these stations represent concentrations only within specific ranges, which falls short of public expectations for information on ambient air quality. There is therefore a need for precise methods of predicting the PM_{2.5} concentration in specific regions. Such methods would allow PM_{2.5} concentrations to be forecast for the upcoming hours (such as 1 to 12 h), so that individuals can take appropriate measures before engaging in outdoor activities.

_{2.5}, becoming a growing public crisis in metropolitan areas. The purpose of this study is to predict fine particulate matter concentrations and develop a hybrid real-time prediction system. This system aims to provide the public with forecasts of PM_{2.5} concentration for the upcoming hours, allowing them to make informed decisions regarding protective measures before heading outdoors. The study locations are the monitoring stations in Keelung and Xizhi, for which the real-time prediction system is established to forecast PM_{2.5} concentrations.

_{2.5} transport through atmospheric dynamics has been extensively applied to simulate regional PM_{2.5} [7,8,9,10,11]. Nonetheless, the formation and development of PM_{2.5} are complex, and atmospheric dispersion models have limitations in predicting PM_{2.5} concentration [12].

_{3} and PM_{10} concentrations, in Milan, Italy. The results showed no significant differences in the predictive accuracy. Bai et al. [18] presented an air quality prediction method using backpropagation neural networks combined with wavelet analysis. They used wavelet coefficients of air pollutant concentrations from the previous day and local meteorological data to predict daily air pollutant concentrations. Siwek and Osowski [19] presented a system for predicting next-day air pollution using multilayer perceptrons (MLPs), radial basis functions, and support vector machines (SVMs). They used genetic algorithms and a stepwise fitting approach for feature selection and applied the selected feature set as input features for random forests. The results showed that the prediction through this method outperformed direct feature application in random forests. Li and Zhu [20] proposed a pollution source selection method using fuzzy evaluation and rough set theory. They used a neural network extreme learning machine to build a deterministic prediction model, effectively predicting the air quality in six major Chinese cities. This novel hybrid air quality warning technique simplifies the existing air quality warning system’s work and enhances its efficiency. Mehdipour et al. [21] used meteorological data and chemical components to predict fine particulate matter concentrations. They built three models, decision trees, Bayesian networks, and SVM, and compared their prediction results. The SVM model exhibited the best performance, but the decision tree and Bayesian network methods also provided good results. Wang and Song [22] developed a spatial–temporal ensemble model based on long short-term memory using Granger causality for monitoring data and meteorological data between air quality monitoring stations. Lee et al. [23] introduced a gradient-boosted machine-learning approach for predicting PM_{2.5} concentration in Taiwan. Ma et al. [24] adopted four types of machine-learning models, including the random forest model, ridge regression model, SVM, and extremely randomized trees model, to predict PM_{2.5} concentration. Dai et al. [25] proposed a hybrid VAR-tree model to estimate the spatial and temporal distribution of O_{3} concentration by using atmospheric pollutant data and meteorological data. Dai et al. [26] proposed a high-dimensional multi-objective optimal dispatch strategy for power systems based on the air pollutant dispersion model, which focuses on cost reduction and CO emission reduction in non-pollution days.

_{2.5} concentration exhibit regional variations [28,29,30,31]. Hsu et al. [32] studied weather patterns and the associated air pollution in Taiwan. Their findings revealed that concentrations of air pollutants such as PM_{2.5}, particulate matter (PM_{10}), and ozone (O_{3}) vary from region to region under different weather patterns. In light of these challenges, this study aims to develop an hourly conceptual system for PM_{2.5} forecasting using machine-learning techniques. The objectives of the study are as follows:

- (1) This study utilizes machine-learning models, including artificial neural networks, decision trees, and linear regression, to construct predictive models for PM_{2.5} concentration. Subsequently, the study aims to identify the most appropriate models for real-time forecasting analysis.
- (2) Given the uncertainty in determining the optimal model for predictions ranging from 1 to 12 h, this study seeks to develop an effective real-time prediction system. The proposed system aims to create an ensemble table for each prediction time horizon by incorporating results from diverse models. Subsequently, the system selects the most suitable prediction model for each time horizon through the ensemble table, thereby enhancing the accuracy of PM_{2.5} concentration predictions.

## 2. Methodology

_{2.5} concentration in a specific region. The research establishes five predictive models: deep neural networks (DNNs), the non-linear continuous data tree regression algorithm M5’ (referred to as M5P), the non-linear optimization rule-based tree regression algorithm (M5’Rules, referred to as M5Rules), the alternating model tree (AMT), and multiple linear regression (MLR). Because no single machine-learning model can be expected to be the most accurate in every situation, the study integrates the predictive capabilities of the various models.

#### 2.1. Modeling Procedure

- The first step is data collection, which involves gathering meteorological and pollutant data from the air quality monitoring website (https://airtw.moenv.gov.tw/ accessed on 1 December 2022) of Taiwan’s Ministry of Environment.
- Step two involves data filtering. All collected attribute data undergo a correlation analysis with the target variable, PM_{2.5} concentration, which serves as the primary filtering criterion. Attributes exhibiting low correlations with PM_{2.5} concentration are removed during screening, whereas attributes demonstrating high correlations are retained as input data for the models. This selection aims to improve the models’ ability to capture meaningful relationships between the input features and the target variable.
- Step three involves dataset partitioning, where all input data are divided to define the training and validation data required for model construction, as well as the testing data used after model development is completed.
- Step four is model development, in which the parameters of each model are tuned by trial and error. Model parameters are adjusted systematically on the basis of the prediction results and recalibrated until the best-performing combination is identified.
- Step five is performance evaluation, in which the models are assessed with a set of established metrics: root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute percentage error (MAPE), efficiency coefficient (CE), and correlation coefficient (r). These metrics are defined as follows:

  $$\mathrm{RMSE}=\sqrt{\frac{{\sum}_{i=1}^{n}{\left({O}_{i}^{Pre}-{O}_{i}^{Obs}\right)}^{2}}{n}}$$

  $$\mathrm{rRMSE}=\frac{\sqrt{\frac{1}{n}{\sum}_{i=1}^{n}{\left({O}_{i}^{Pre}-{O}_{i}^{Obs}\right)}^{2}}}{{\overline{O}}^{Obs}}$$

  $$\mathrm{MAPE}=\frac{1}{n}{\sum}_{i=1}^{n}\left|\frac{{O}_{i}^{Pre}-{O}_{i}^{Obs}}{{O}_{i}^{Obs}}\right|\times 100$$

  $$\mathrm{CE}=1-\frac{{\sum}_{i=1}^{n}{\left({O}_{i}^{Obs}-{O}_{i}^{Pre}\right)}^{2}}{{\sum}_{i=1}^{n}{\left({O}_{i}^{Obs}-{\overline{O}}^{Obs}\right)}^{2}}$$

  $$\mathrm{r}=\frac{{\sum}_{i=1}^{n}\left({O}_{i}^{Obs}-{\overline{O}}^{Obs}\right)\left({O}_{i}^{Pre}-{\overline{O}}^{Pre}\right)}{\sqrt{{\sum}_{i=1}^{n}{\left({O}_{i}^{Obs}-{\overline{O}}^{Obs}\right)}^{2}}\sqrt{{\sum}_{i=1}^{n}{\left({O}_{i}^{Pre}-{\overline{O}}^{Pre}\right)}^{2}}}$$

  Here ${O}_{i}^{Pre}$ and ${O}_{i}^{Obs}$ denote the predicted and observed values of record i, ${\overline{O}}^{Pre}$ and ${\overline{O}}^{Obs}$ denote their respective means, and n is the number of records.
- Finally, step six involves the establishment and testing of the real-time prediction system. Based on the prediction results of the various models, a model selection table is created, from which the composite real-time prediction system is assembled. By selecting the best model for each forecast horizon from this table, the system can provide accurate predictions at different forecast horizons.
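
The evaluation metrics of step five can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study: the function name `evaluate` and the sample values are hypothetical, and CE is computed in its squared-deviation form normalised by the variance of the observations, which is an assumption about the intended definition.

```python
import math


def evaluate(pred, obs):
    """Return RMSE, rRMSE, MAPE (%), CE, and r for paired predictions/observations."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pre = sum(pred) / n

    # Sum of squared prediction errors, shared by RMSE, rRMSE, and CE.
    sq_err = sum((p - o) ** 2 for p, o in zip(pred, obs))
    rmse = math.sqrt(sq_err / n)
    rrmse = rmse / mean_obs
    mape = 100.0 / n * sum(abs((p - o) / o) for p, o in zip(pred, obs))

    # Assumed CE form: 1 - SSE / sum of squared deviations of the observations.
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ce = 1.0 - sq_err / ss_obs

    # Pearson correlation coefficient between observations and predictions.
    ss_pre = sum((p - mean_pre) ** 2 for p in pred)
    cov = sum((o - mean_obs) * (p - mean_pre) for p, o in zip(pred, obs))
    r = cov / math.sqrt(ss_obs * ss_pre)

    return {"RMSE": rmse, "rRMSE": rrmse, "MAPE": mape, "CE": ce, "r": r}


# Hypothetical hourly concentrations (in ug/m^3), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(evaluate(predicted, observed))
```

Note that MAPE is undefined when an observed value is zero, so records with zero concentration would need separate handling.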

#### 2.2. Model Theory

#### 2.2.1. DNN

_{i} represents the input values, w_{i} represents the connection weights, b_{i} represents the bias values, f represents the activation function, and y_{i} represents the output values.
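
For orientation, the output of a single neuron in a feed-forward network is commonly written in the textbook form below. The expression is given on the assumption that it corresponds to the quantities listed above; the input symbol x, the separate neuron index j, and the number of inputs n are introduced here only for illustration.

$${y}_{j}=f\left({\sum}_{i=1}^{n}{w}_{ij}{x}_{i}+{b}_{j}\right)$$

In this form, each neuron j forms a weighted sum of its inputs, adds its bias, and passes the result through the activation function f.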

#### 2.2.2. Decision Tree

- M5P

_{i} is the subset created by splitting nodes based on the selected attributes, sd(T_{i}) is the standard deviation of the target attribute for the ith subset, and n is the total number of branches created by splitting attributes.
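
The splitting criterion built from these quantities is usually given as the standard deviation reduction, SDR = sd(T) − Σ (|T_{i}|/|T|) · sd(T_{i}), as in Quinlan's M5. The sketch below assumes that form, since the equation itself does not appear in this section; the function name `sdr` and the sample values are hypothetical.

```python
import statistics


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting `parent` into `subsets`.

    Assumed form: sd(T) - sum(|T_i| / |T| * sd(T_i)), with the population
    standard deviation used for sd. Empty subsets are skipped.
    """
    total = len(parent)
    weighted = sum(
        len(subset) / total * statistics.pstdev(subset)
        for subset in subsets
        if subset
    )
    return statistics.pstdev(parent) - weighted


# Hypothetical target values at one node, for illustration only.
node = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
clean_split = sdr(node, [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]])
mixed_split = sdr(node, [[1.0, 10.0, 3.0], [2.0, 11.0, 12.0]])
print(clean_split, mixed_split)
```

A split that separates low and high target values yields a larger reduction than one that mixes them, which is why the attribute maximising this quantity is chosen at each node.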

- M5Rules

- AMT

_{i} is the corresponding target value, ${f}_{j}\left(\overrightarrow{x}\right)$ is the base model, and k is the number of base models.

#### 2.2.3. MLR

_{0} is the constant term, β_{1}, β_{2}, … β_{n} are the regression coefficients, X_{1}, X_{2}, … X_{n} are the independent variables, n is the total number of independent variables, and ε represents the error.
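
Written out with these symbols, the multiple linear regression model takes the standard form below. The symbol Y for the dependent variable (here, the PM_{2.5} concentration to be predicted) is an assumption, as it is not named in the definitions above.

$$Y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+\dots +{\beta}_{n}{X}_{n}+\epsilon$$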

#### 2.3. Study Area and Data

_{2.5}; μg/m^{3}), particulate matter (PM_{10}; μg/m^{3}), ozone (O_{3}; ppb), sulfur dioxide (SO_{2}; ppb), carbon monoxide (CO; ppm), nitrogen oxides (NO_{X}; ppb), nitric oxide (NO; ppb), nitrogen dioxide (NO_{2}; ppb), total hydrocarbons (THC; ppm), methane (CH_{4}; ppm), and non-methane hydrocarbons (NMHC; ppm). Table 1 displays the monitored parameters for each air quality monitoring station.

## 3. Model Development and Evaluation

#### 3.1. Feature Selection

_{2.5} concentration has been discussed in detail in [28,29,30,31,32]. In response to concerns regarding the absence of meteorological parameters in the selected features, we conducted a correlation analysis to identify pertinent variables for our model. Specifically, this study employed correlation analysis to select suitable input variables for the target monitoring stations, namely, Keelung Station and Xizhi Station.

_{2.5} at each target station. In other words, an attribute was selected if the correlation coefficient between the attribute data and PM_{2.5} at the target station was greater than or equal to 0.3. This approach ensures that only attributes with a meaningful correlation are considered for inclusion in the model.

_{2.5} concentration in the target stations. As shown in Table 2, 21 and 25 attributes were selected for Keelung Station and Xizhi Station, respectively. The correlation analysis showed particularly high correlations for PM_{2.5}, PM_{10}, and CO at each station.
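
The correlation-based screening described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and sample series are hypothetical, the attribute names merely follow the station-prefix convention of Figure 2, and the signed coefficient is compared with the 0.3 threshold exactly as the text states.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_attributes(candidates, target, threshold=0.3):
    """Keep attributes whose correlation with the target is >= threshold."""
    return [
        name
        for name, values in candidates.items()
        if pearson(values, target) >= threshold
    ]


# Hypothetical series, for illustration only.
pm25 = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "K_PM10": [2.0, 4.0, 6.0, 8.0, 10.0],  # r = 1.0
    "K_RH": [5.0, 1.0, 4.0, 2.0, 3.0],     # r = -0.3
    "X_CO": [1.0, 3.0, 2.0, 5.0, 4.0],     # r = 0.8
}
print(select_attributes(candidates, pm25))
```

If strongly negative correlations should also qualify, the comparison would use the absolute value of the coefficient instead; the text does not say which variant was applied.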

#### 3.2. Calibration of Model Parameters

#### 3.3. Model Testing Results

#### 3.4. Model Comparison

## 4. Simulation

#### 4.1. A Real-Time Prediction System

_{2.5} concentration can be used to test the system’s performance during this specific season. Because the system described so far was constructed from annual data, we established a separate real-time PM_{2.5} concentration prediction system specifically for the winter season. Table 6 presents the model selection ensemble table for the winter season.
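
The idea of a model selection table can be sketched in Python as below. This is an illustration only: the selection criterion (smallest test error at each horizon), the function names, and all numbers are assumptions, since this section does not state how the tables were derived.

```python
def build_selection_table(errors_by_horizon):
    """Pick, for each forecast horizon, the model with the smallest error.

    errors_by_horizon maps horizon (h) -> {model name: error value}.
    Ties are resolved in favour of the model listed first.
    """
    return {
        horizon: min(scores, key=scores.get)
        for horizon, scores in errors_by_horizon.items()
    }


def hybrid_forecast(selection_table, forecasts_by_model):
    """Assemble one forecast per horizon from the model chosen for it.

    forecasts_by_model maps model name -> {horizon (h): predicted value}.
    """
    return {
        horizon: forecasts_by_model[model][horizon]
        for horizon, model in selection_table.items()
    }


# Hypothetical test errors and forecasts, for illustration only.
errors = {
    1: {"AMT": 3.9, "MLR": 4.0, "DNN": 5.2},
    6: {"AMT": 7.6, "MLR": 7.5, "DNN": 7.2},
}
forecasts = {
    "AMT": {1: 20.0, 6: 25.0},
    "MLR": {1: 20.5, 6: 26.0},
    "DNN": {1: 21.0, 6: 27.0},
}
table = build_selection_table(errors)
print(table)
print(hybrid_forecast(table, forecasts))
```

A seasonal system in this scheme would simply use a second table built from errors computed on winter data only, with the forecasting step left unchanged.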

#### 4.2. System Simulations

_{2.5}, and the orange solid lines represent the predictions generated by the system. Each prediction period covers the next 12 h from the current moment. The predictions start at 5 AM on the given day and are adjusted hourly. Figure 10d,h indicate that the system underestimated the concentrations at Keelung Station from 11 AM until 8 PM. At Xizhi Station, by contrast, it slightly overestimated them, as seen in Figure 11d,h. Overall, the real-time prediction systems built on annual data and on winter season data both capture the changing trends in fine particulate matter concentrations at Keelung Station and Xizhi Station.

## 5. Conclusions

- (1) When comparing different models, it was observed that decision tree models provided a more stable predictive performance across both Keelung Station and Xizhi Station. In contrast, the DNN model exhibited larger fluctuations in error values compared to other models across various prediction time horizons. The MLR model also displayed greater fluctuations in prediction errors at Xizhi Station compared to other models. However, it is worth noting that the DNN and MLR models outperformed other models in certain time horizons, indicating that each model has its own strengths and weaknesses in prediction.
- (2) If the PM_{2.5} concentration remains above 35 μg/m^{3} for an extended period, there is a higher probability that the predicted results will be underestimated. The main reason for this may be the infrequent occurrence of high concentrations in historical data, causing the model to be ineffective in predicting high values.
- (3) At different prediction time horizons, all models performed well in predicting results for the next 3 h. However, the prediction errors began to increase for predictions in the 4 to 6 h range. For longer-term predictions covering 12 h, all models exhibited larger prediction errors. In general, as the prediction time interval increased, the prediction errors also increased, indicating that longer-term forecasts are more challenging for all models.
- (4) Ultimately, this study developed a real-time fine particulate matter concentration prediction system that can predict the changes in fine particulate matter concentration for the next 12 h. The system also includes hourly adjustments to the predictions. Compared to the performance of individual models, the real-time prediction system provides a more comprehensive and accurate prediction trend.

## References

- Kong, S.; Li, X.; Li, L.; Yin, Y.; Chen, K.; Yuan, L.; Zhang, Y.; Shan, Y.; Ji, Y. Variation of polycyclic aromatic hydrocarbons in atmospheric PM_{2.5} during winter haze period around 2014 Chinese Spring Festival at Nanjing: Insights of source changes, air mass direction and firework particle injection. Sci. Total Environ. **2015**, 520, 59–72.
- Alvarez, H.A.O.; Myers, O.B.; Weigel, M.; Armijos, R.X. The value of using seasonality and meteorological variables to model intra-urban PM_{2.5} variation. Atmos. Environ. **2018**, 182, 1–8.
- Tai, A.P.; Mickley, L.J.; Jacob, D.J. Correlations between fine particulate matter (PM_{2.5}) and meteorological variables in the United States: Implications for the sensitivity of PM_{2.5} to climate change. Atmos. Environ. **2010**, 44, 3976–3984.
- Li, L.; Qian, J.; Ou, C.Q.; Zhou, Y.X.; Guo, C.; Guo, Y. Spatial and temporal analysis of Air Pollution Index and its timescale-dependent relationship with meteorological factors in Guangzhou, China, 2001–2011. Environ. Pollut. **2014**, 190, 75–81.
- Shah, A.S.V.; Langrish, J.P.; Nair, H.; McAllister, D.A.; Hunter, A.L.; Donaldson, K.; Newby, D.E.; Mills, N.L. Global association of air pollution and heart failure: A systematic review and meta-analysis. Lancet **2013**, 382, 1039–1048.
- Srimuruganandam, B.; Nagendra, S.S. Source characterization of PM_{10} and PM_{2.5} mass using a chemical mass balance model at urban roadside. Sci. Total Environ. **2012**, 433, 8–19.
- Atkinson, R.W.; Fuller, G.W.; Anderson, H.R.; Harrison, R.M.; Armstrong, B. Urban ambient particle metrics and health: A time-series analysis. Epidemiology **2010**, 21, 501–511.
- Brauer, M.; Amann, M.; Burnett, R.T.; Cohen, A.; Dentener, F.; Ezzati, M.; Henderson, S.B.; Krzyzanowski, M.; Martin, R.V.; Van Dingenen, R.; et al. Exposure assessment for estimation of the global burden of disease attributable to outdoor air pollution. Environ. Sci. Technol. **2012**, 46, 652–660.
- Cheung, K.; Daher, N.; Kam, W.; Shafer, M.M.; Ning, Z.; Schauer, J.J.; Sioutas, C. Spatial and temporal variation of chemical composition and mass closure of ambient coarse particulate matter (PM_{10–2.5}) in the Los Angeles area. Atmos. Environ. **2011**, 45, 2651–2662.
- Yang, F.; Tan, J.; Zhao, Q.; Du, Z.; He, K.; Ma, Y.; Duan, F.; Chen, G.; Zhao, Q. Characteristics of PM_{2.5} speciation in representative megacities and across China. Atmos. Chem. Phys. **2011**, 11, 5207–5219.
- Zhang, Y. Dynamic effect analysis of meteorological conditions on air pollution: A case study from Beijing. Sci. Total Environ. **2019**, 684, 178–185.
- Ni, X.Y.; Huang, H.; Du, W.P. Relevance analysis and short-term prediction of PM_{2.5} concentrations in Beijing based on multi-source data. Atmos. Environ. **2017**, 150, 146–161.
- Anagnostopoulos, F.K.; Rigas, S.; Papachristou, M.; Chaniotis, I.; Anastasiou, I.; Tryfonopoulos, C.; Raftopoulou, P. A novel AI framework for PM pollution prediction applied to a Greek Port City. Atmosphere **2023**, 14, 1413.
- Lai, K.; Xu, H.; Sheng, J.; Huang, Y. Hour-by-hour prediction model of air pollutant concentration based on EIDW-informer—A case study of Taiyuan. Atmosphere **2023**, 14, 1274.
- Liu, X.; Zhao, K.; Liu, Z.; Wang, L. PM_{2.5} Concentration prediction based on LightGBM optimized by adaptive multi-strategy enhanced sparrow search algorithm. Atmosphere **2023**, 14, 1612.
- Mampitiya, L.; Rathnayake, N.; Leon, L.P.; Mandala, V.; Azamathulla, H.M.; Shelton, S.; Hoshino, Y.; Rathnayake, U. Machine learning techniques to predict the air quality using meteorological data in two urban areas in Sri Lanka. Environments **2023**, 10, 141.
- Corani, G. Air quality prediction in Milan: Feed-forward neural networks, pruned neural networks and lazy learning. Ecol. Model. **2005**, 185, 513–529.
- Bai, Y.; Li, Y.; Wang, X.; Xie, J.; Li, C. Air pollutants concentrations forecasting using back propagation neural network based on wavelet decomposition with meteorological conditions. Atmos. Pollut. Res. **2016**, 7, 557–566.
- Siwek, K.; Osowski, S. Data mining methods for prediction of air pollution. Int. J. Appl. Math. Comput. Sci. **2016**, 26, 467–478.
- Li, C.; Zhu, Z. Research and application of a novel hybrid air quality early-warning system: A case study in China. Sci. Total Environ. **2018**, 626, 1421–1438.
- Mehdipour, V.; Stevenson, D.S.; Memarianfard, M.; Sihag, P. Comparing different methods for statistical modeling of particulate matter in Tehran, Iran. Air Qual. Atmos. Health **2018**, 11, 1155–1165.
- Wang, J.; Song, G. A deep spatial-temporal ensemble model for air quality prediction. Neurocomputing **2018**, 314, 198–206.
- Lee, M.; Lin, L.; Chen, C.Y.; Tsao, Y.; Yao, T.H.; Fei, M.H.; Fang, S.H. Forecasting air quality in Taiwan by using machine learning. Sci. Rep. **2020**, 10, 4153.
- Ma, X.; Chen, T.; Ge, R.; Xv, F.; Cui, C.; Li, J. Prediction of PM_{2.5} concentration using spatiotemporal data with machine learning models. Atmosphere **2023**, 14, 1517.
- Dai, H.; Huang, G.; Wang, J.; Zeng, H. VAR-tree model based spatio-temporal characterization and prediction of O_{3} concentration in China. Ecotoxicol. Environ. Saf. **2023**, 257, 114960.
- Dai, H.; Huang, G.; Zeng, H. Multi-objective optimal dispatch strategy for power systems with Spatio-temporal distribution of air pollutants. Sustain. Cities Soc. **2023**, 98, 104801.
- Liu, C.M.; Young, C.Y.; Lee, Y.C. Influence of Asian dust storms on air quality in Taiwan. Sci. Total Environ. **2006**, 368, 884–897.
- Misra, C.; Geller, M.D.; Shah, P.; Sioutas, C.; Solomon, P.A. Development and Evaluation of a Continuous Coarse (PM_{10}–PM_{2.5}) Particle Monitor. J. Air Waste Manag. Assoc. **2001**, 51, 1309–1317.
- Juda-Rezler, K.; Reizer, M.; Oudinet, J.P. Determination and analysis of PM_{10} source apportionment during episodes of air pollution in Central Eastern European urban areas: The case of wintertime 2006. Atmos. Environ. **2011**, 45, 6557–6566.
- Liu, Y.; Zhao, N.; Vanos, J.K.; Cao, G. Effects of synoptic weather on ground-level PM_{2.5} concentrations in the United States. Atmos. Environ. **2017**, 148, 297–305.
- Zhang, R.; Jing, J.; Tao, J.; Hsu, S.-C.; Wang, G.; Cao, J.; Lee, C.S.L.; Zhu, L.; Chen, Z.; Zhao, Y.; et al. Chemical characterization and source apportionment of PM_{2.5} in Beijing: Seasonal perspective. Atmos. Chem. Phys. Discuss. **2013**, 13, 9953–10007.
- Hsu, C.H.; Cheng, F.Y. Synoptic weather patterns and associated air pollution in Taiwan. Aerosol Air Qual. Res. **2019**, 19, 1139–1151.
- De Villiers, J.; Barnard, E. Backpropagation neural nets with one and two hidden layers. IEEE Trans. Neural Netw. **1993**, 4, 136–141.
- Kwok, T.Y.; Yeung, D.Y. Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Trans. Neural Netw. **1997**, 8, 630–645.
- Trenn, S. Multilayer perceptrons: Approximation order and necessary number of hidden units. IEEE Trans. Neural Netw. **2008**, 19, 836–844.
- Chien, C.F.; Chen, L.F. Data mining to improve personnel selection and enhance human capital: A case study in high-technology industry. Expert Syst. Appl. **2008**, 34, 280–290.
- Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes; Working Paper 96/23; University of Waikato, Department of Computer Science: Hamilton, New Zealand, 1996.
- Quinlan, J.R. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 16–18 November 1992; pp. 343–348.
- Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984.
- Holmes, G.; Hall, M.; Frank, E. Generating rule sets from model trees. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, Sydney, Australia, 6–10 December 1999; pp. 1–12.
- Jaakkola, H.; Thalheim, B.; Kiyoki, Y.; Yoshida, N. (Eds.) Information Modelling and Knowledge Bases XXVIII; Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands, 2017; Volume 292.
- Holmes, G.; Pfahringer, B.; Kirkby, R.; Frank, E.; Hall, M. Multiclass alternating decision trees. In Proceedings of the European Conference on Machine Learning, Helsinki, Finland, 19–23 August 2002; pp. 161–172.
- Frank, E.; Mayo, M.; Kramer, S. Alternating model trees. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, Salamanca, Spain, 13–17 April 2015; pp. 871–878.
- Freund, Y.; Mason, L. The alternating decision tree learning algorithm. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; pp. 124–133.
- Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 821.
- Taylor, R. Interpretation of the correlation coefficient: A basic review. J. Diagn. Med. Sonogr. **1990**, 6, 35–39.
- Mahjoobi, J.; Etemad-Shahidi, A.; Kazeminezhad, M.H. Hindcasting of wave parameters using different soft computing methods. Appl. Ocean Res. **2008**, 30, 28–36.
- Tso, G.K.F.; Yau, K.K.W. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks. Energy **2007**, 32, 1761–1768.
- Mahjoobi, J.; Etemad-Shahidi, A. An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl. Ocean Res. **2008**, 30, 172–177.

**Figure 2.** Correlation coefficients of various attributes with PM_{2.5} concentration in the target stations: (**a**–**c**) for Keelung Station; and (**d**–**f**) for Xizhi Station. In the figure, variables with the prefix “K_” denote Keelung Station, “X_” represents Xizhi Station, “W_” corresponds to Wanli Station, and “S_” indicates Songshan Station.

**Figure 7.** Comparison of rRMSE for 1 to 12 h forecasts at Keelung and Xizhi Stations: (**a**) DNN; (**b**) M5P; (**c**) M5Rules; (**d**) AMT; and (**e**) MLR.

**Figure 10.** The real-time prediction system simulation results for Keelung Station: (**a**–**d**) for the annual prediction system, and (**e**–**h**) for the winter seasonal prediction system.

**Figure 11.** The real-time prediction system simulation results for Xizhi Station: (**a**–**d**) for the annual prediction system, and (**e**–**h**) for the winter seasonal prediction system.

**Table 1.** Monitored parameters for each air quality monitoring station.

Station | Monitoring Parameters |
---|---|
Keelung | AMB_TEMP, WIND_SPEED, WIND_DIREC, WS_HR, WD_HR, RAINFALL, RH, PM_{2.5}, PM_{10}, O_{3}, SO_{2}, CO, NO_{X}, NO, NO_{2}, THC, CH_{4}, NMHC |
Xizhi | AMB_TEMP, WIND_SPEED, WIND_DIREC, WS_HR, WD_HR, RAINFALL, RH, PM_{2.5}, PM_{10}, O_{3}, SO_{2}, CO, NO_{X}, NO, NO_{2} |
Wanli | AMB_TEMP, WIND_SPEED, WIND_DIREC, WS_HR, WD_HR, RAINFALL, RH, PM_{2.5}, PM_{10}, O_{3}, SO_{2}, CO, NO_{X}, NO, NO_{2} |
Songshan | AMB_TEMP, WIND_SPEED, WIND_DIREC, WS_HR, WD_HR, RAINFALL, RH, PM_{2.5}, PM_{10}, O_{3}, SO_{2}, CO, NO_{X}, NO, NO_{2}, THC, CH_{4}, NMHC |

**Table 2.** Attributes selected for each target station.

Target Station | Selected Attributes |
---|---|
Keelung | PM_{2.5}, PM_{10}, SO_{2}, CO, NO_{X}, NO_{2}, and NMHC of Keelung Station; PM_{2.5}, PM_{10}, CO, and NO_{2} of Xizhi Station; PM_{2.5}, PM_{10}, SO_{2}, CO, and NO_{2} of Wanli Station; PM_{2.5}, PM_{10}, SO_{2}, CO, and NO_{2} of Songshan Station |
Xizhi | PM_{2.5}, PM_{10}, SO_{2}, CO, and NO_{2} of Keelung Station; PM_{2.5}, PM_{10}, SO_{2}, CO, NO_{X}, and NO_{2} of Xizhi Station; PM_{2.5}, PM_{10}, SO_{2}, CO, NO_{X}, and NO_{2} of Wanli Station; PM_{2.5}, PM_{10}, SO_{2}, CO, NO_{X}, NO_{2}, THC, and CH_{4} of Songshan Station |

Lead Time (h) | Performance | DNN | M5P | M5Rules | AMT | MLR |
---|---|---|---|---|---|---|
t + 1 | RMSE (μg/m^{3}) | 5.22 | 3.99 | 3.99 | 3.99 | 4.00 |
t + 1 | MAPE (%) | 16.10 | 12.06 | 13.40 | 11.72 | 16.10 |
t + 1 | CE | 0.82 | 0.87 | 0.84 | 0.87 | 0.82 |
t + 1 | r | 0.92 | 0.91 | 0.91 | 0.91 | 0.91 |
t + 3 | RMSE (μg/m^{3}) | 6.65 | 5.96 | 5.96 | 6.02 | 6.05 |
t + 3 | MAPE (%) | 19.62 | 17.74 | 18.52 | 18.87 | 19.62 |
t + 3 | CE | 0.76 | 0.82 | 0.81 | 0.77 | 0.76 |
t + 3 | r | 0.81 | 0.81 | 0.81 | 0.80 | 0.81 |
t + 6 | RMSE (μg/m^{3}) | 7.18 | 7.68 | 7.49 | 7.57 | 7.63 |
t + 6 | MAPE (%) | 21.14 | 22.93 | 21.64 | 22.88 | 21.14 |
t + 6 | CE | 0.73 | 0.70 | 0.71 | 0.71 | 0.74 |
t + 6 | r | 0.68 | 0.67 | 0.68 | 0.67 | 0.68 |
t + 12 | RMSE (μg/m^{3}) | 10.28 | 9.13 | 8.89 | 8.91 | 8.98 |
t + 12 | MAPE (%) | 35.78 | 32.10 | 31.56 | 32.07 | 35.78 |
t + 12 | CE | 0.58 | 0.61 | 0.60 | 0.62 | 0.58 |
t + 12 | r | 0.54 | 0.51 | 0.53 | 0.53 | 0.53 |

Lead Time (h) | Performance | DNN | M5P | M5Rules | AMT | MLR |
---|---|---|---|---|---|---|
t + 1 | RMSE (μg/m^{3}) | 2.61 | 2.59 | 2.59 | 2.67 | 3.08 |
t + 1 | MAPE (%) | 10.63 | 10.38 | 9.72 | 9.17 | 10.63 |
t + 1 | CE | 0.87 | 0.87 | 0.86 | 0.87 | 0.87 |
t + 1 | r | 0.97 | 0.97 | 0.97 | 0.97 | 0.96 |
t + 3 | RMSE (μg/m^{3}) | 5.61 | 5.21 | 5.21 | 5.35 | 5.92 |
t + 3 | MAPE (%) | 19.53 | 19.13 | 18.75 | 19.18 | 19.53 |
t + 3 | CE | 0.76 | 0.82 | 0.78 | 0.77 | 0.76 |
t + 3 | r | 0.88 | 0.88 | 0.88 | 0.87 | 0.85 |
t + 6 | RMSE (μg/m^{3}) | 7.61 | 6.97 | 6.97 | 7.34 | 8.37 |
t + 6 | MAPE (%) | 25.91 | 23.45 | 25.11 | 25.98 | 25.91 |
t + 6 | CE | 0.67 | 0.71 | 0.69 | 0.69 | 0.67 |
t + 6 | r | 0.78 | 0.78 | 0.78 | 0.76 | 0.68 |
t + 12 | RMSE (μg/m^{3}) | 9.93 | 9.19 | 8.84 | 9.03 | 9.86 |
t + 12 | MAPE (%) | 38.15 | 36.28 | 36.31 | 36.21 | 38.15 |
t + 12 | CE | 0.49 | 0.50 | 0.52 | 0.53 | 0.49 |
t + 12 | r | 0.65 | 0.60 | 0.64 | 0.61 | 0.60 |

Lead Time (h) | Keelung Station | Xizhi Station |
---|---|---|
t + 1 | AMT | MLR |
t + 2 | MLR | MLR |
t + 3 | AMT | MLR |
t + 4 | MLR | M5P, M5Rules |
t + 5 | MLR | M5P, M5Rules |
t + 6 | DNN | M5P, M5Rules |
t + 7 | AMT | M5P |
t + 8 | AMT | M5P, M5Rules |
t + 9 | DNN | DNN |
t + 10 | AMT | DNN |
t + 11 | DNN | AMT |
t + 12 | AMT | AMT |

Lead Time (h) | Keelung Station | Xizhi Station |
---|---|---|
t + 1 | MLR | MLR |
t + 2 | MLR | MLR |
t + 3 | MLR | M5P, M5Rules |
t + 4 | M5P, M5Rules | DNN |
t + 5 | DNN | M5P |
t + 6 | DNN | DNN |
t + 7 | AMT | M5Rules |
t + 8 | AMT | DNN |
t + 9 | AMT | DNN |
t + 10 | AMT | DNN |
t + 11 | DNN | DNN |
t + 12 | AMT | M5Rules |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Wei, C.-C.; Kao, W.-J. Establishing a Real-Time Prediction System for Fine Particulate Matter Concentration Using Machine-Learning Models. *Atmosphere* **2023**, *14*, 1817. https://doi.org/10.3390/atmos14121817