Peer-Review Record

Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Mach. Learn. Knowl. Extr. 2022, 4(1), 150-171; https://doi.org/10.3390/make4010008
by Scarlet Stadtler 1,*, Clara Betancourt 1 and Ribana Roscher 2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 22 December 2021 / Revised: 25 January 2022 / Accepted: 26 January 2022 / Published: 11 February 2022
(This article belongs to the Special Issue Explainable Machine Learning)

Round 1

Reviewer 1 Report

Ozone and PM are pretty well-behaved parameters. I did not see any functional relationships, nor the natural variability of the PM and ozone values used.

How do the assumptions in the equations shown contribute to the results? If this was covered, this reviewer missed that discussion.

Comparing the results from the various techniques is of interest, but it would be more useful to compare them with natural values, such as their natural variations. Otherwise, how do we know the values are real and not just a manifestation of the particular dataset used? The dataset may not be representative of nature.

Did you consider using anomalies in PM values in your study? I am curious to know what results those might reveal.

Author Response

Reply to Review #1

 

We thank reviewer #1 for the time and effort spent providing us with valuable comments and open questions. In the following, we reply to each comment individually and indicate the changes we made in the text to improve our work.

 

Comment: Ozone and PM are pretty well-behaved parameters. I did not see any functional relationships, nor the natural variability of the PM and ozone values used.

 

Response: We hope that we have understood the reviewer's comment correctly: the reviewer questions the functional relationship between the input features and the target ozone values.

We agree with the reviewer that ozone has been studied for decades, both experimentally and through numerical modeling. However, we chose a completely data-driven approach without prior statistical or functional assumptions on the data, as is necessary for many processes in the environmental sciences. The input features are merely proxies and have no known functional relationship to ozone, but they were chosen based on knowledge of the functional relationships governing ozone. These proxies were selected because of their relevance (see Betancourt et al. 2021 [1]) and their global availability for the downstream task of global mapping (see Betancourt et al. 2022 [2]). In a separate study, Betancourt et al. 2022 [2] also show the plausibility of the proxies using explainable machine learning. To clarify this for the reader, we added the following statement to the description of the dataset in Section 3, line 173: “The geospatial features in AQ-Bench characterize the measurement site. Although there are no functional relationships available as prior knowledge for machine learning in the dataset, these geospatial features were selected because they serve as proxies for ozone formation, destruction, and transport processes.” This can be found in the marked-up version of the manuscript.

Moreover, we emphasize that the goal of this study is to show how novel insights can be derived using explainable machine learning to analyze the dataset, and how the reliability and plausibility of the obtained insights can be checked. Although we model the functional relationships between single input features and the target ozone value, our primary focus is on the prediction patterns of the two machine learning models. Additionally, we presented our analysis in a way that underlines the transferability of the approaches used, so that other geospatial datasets can be analyzed in the same manner to understand them and their shortcomings.

 

Comment: How do the assumptions in the equations shown contribute to the results? If this was covered, this reviewer missed that discussion.

 

Response: If we understood correctly, the reviewer asks us to clarify the assumptions underlying our study and to discuss our results in light of them.

Indeed, we did not list our assumptions explicitly in the discussion section. We agree with the reviewer that clarifying the assumptions supports the discussion of our results; therefore, we added a paragraph at the beginning of the discussion (Section 7, line 513): “The following discussion is based on several assumptions. First, we assume that the SHAP values, which indicate the impact a feature has on the prediction, are related to the global importance of a feature when the entire set of SHAP values is taken into account. Moreover, to be able to use the Euclidean distance as a measure of similarity, we assume that the weighted feature space and the representation space are smooth. This is paired with the assumption that the Euclidean distance in the weighted feature space and the representation space reflects similar samples and similar prediction patterns. We also assume that the weights in the neural network and the structure of the decision trees within the random forest carry meaning. Finally, we assume that the k-nearest neighbors in the representation space are the training samples that were influential for the prediction. This assumption is weak for the random forest, since we identified the training samples sharing leaf nodes with the predicted test sample. It is a somewhat stronger assumption for the neural network, where we are not able to verify whether the training stations we identified as k-nearest neighbors in the representation space are indeed the stations on which the prediction for the test sample is based.”
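
For illustration, the sketch below is a minimal Python example of the two constructions named in the last assumptions; it is our simplified illustration assuming scikit-learn and NumPy, not the code used in the manuscript. It shows how training samples sharing leaf nodes with a test sample can be identified in a random forest, and how k-nearest neighbors can be queried in a representation space.

```python
# Minimal sketch (not the manuscript's code): leaf-node sharing in a random
# forest and a k-nearest-neighbor query in a representation space.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 5)), rng.normal(size=200)
x_test = rng.normal(size=(1, 5))

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# apply() returns, for each sample, the index of the leaf it falls into in every tree.
train_leaves = rf.apply(X_train)   # shape: (n_train, n_trees)
test_leaves = rf.apply(x_test)     # shape: (1, n_trees)

# For each training sample, count the trees in which it shares a leaf with the
# test sample; the highest counts mark the most influential training samples.
shared = (train_leaves == test_leaves).sum(axis=1)
most_influential = np.argsort(shared)[::-1][:10]

# For the neural network, the analogue is a k-nearest-neighbor query in the
# representation space (hidden-layer activations). X_train stands in for the
# activations here, since this sketch trains no network.
knn = NearestNeighbors(n_neighbors=10).fit(X_train)
_, neighbor_idx = knn.kneighbors(x_test)
```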

 

Comment: Comparing the results from the various techniques is of interest, but it would be more useful to compare them with natural values, such as their natural variations. Otherwise, how do we know the values are real and not just a manifestation of the particular dataset used? The dataset may not be representative of nature.

 

Response: We agree with the reviewer that we cannot know whether the values are real or merely a manifestation of the particular dataset. Currently, the plausibility of our results can only be checked using domain knowledge in the field of atmospheric chemistry. Nevertheless, we emphasize that we did use the ground-truth values in our analysis: first, to compare the performance of the approaches, and second, to calculate the residuals, which determine the resulting cases we discuss. We agree that the dataset is not representative on a global scale and is limited in terms of the relationships that can be learned by machine learning models, as we discuss in Section 7, lines 603 onwards. In the manuscript, we clearly state that our recommendations are limited to the machine learning tasks at hand and the given test dataset (Section 6.4, line 488). We studied the representativeness of our dataset on a global scale in Betancourt et al. 2022 [2].

We added the following text for clarity where we report our performance metrics (Section 6.2, line 386): “The coefficients of determination for the neural network and random forest on the training, validation, and test sets can be found in Table 3. All performance metrics are calculated using the observed values (ground truth) following equations (7) and (8). The coefficient of determination R² is over 95% on the training set for the random forest, compared to 64.21% for the neural network.”
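
As a minimal illustration of how such a coefficient of determination is obtained from observed and predicted values, consider the sketch below; the values are hypothetical and the exact definitions are those in equations (7) and (8) of the manuscript.

```python
# Minimal illustration (assumed standard form, not the manuscript's code) of
# the coefficient of determination R², reported as a percentage.
import numpy as np

def r_squared(y_obs: np.ndarray, y_pred: np.ndarray) -> float:
    """R² = 1 - SS_res / SS_tot, expressed as a percentage."""
    ss_res = np.sum((y_obs - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)   # total sum of squares
    return 100.0 * (1.0 - ss_res / ss_tot)

# Hypothetical ozone metric values for illustration only:
y_obs = np.array([30.0, 35.0, 40.0, 28.0, 33.0])
y_pred = np.array([31.0, 34.0, 41.0, 27.5, 32.0])
print(f"R² = {r_squared(y_obs, y_pred):.2f}%")
```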

 

Comment: Did you consider using anomalies in PM values in your study? I am curious to know what results those might reveal.

Response: We thank reviewer #1 for sharing the idea of using anomalies to train our machine learning models. Although it would be interesting to apply our approach to particulate matter and temporally resolved data, the current version of the TOAR database mainly contains ozone observations. For this reason, the AQ-Bench dataset does not include preprocessed particulate matter values ready for machine learning. Moreover, AQ-Bench contains aggregated ozone values without any time dimension, because time-resolved features describing the environmental characteristics are not available. To calculate anomalies from a climatological mean, a time-resolved dataset is necessary. We share the reviewer's curiosity about what those results might reveal, but at this point we have to leave this question open for future studies.
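
To illustrate why a time dimension is indispensable here, the sketch below computes anomalies as deviations from a per-month climatological mean. The column names and values are hypothetical, since AQ-Bench itself contains no such time series.

```python
# Hedged sketch: anomalies require a time-resolved series, because the
# climatological mean is computed per calendar month and then subtracted.
import pandas as pd

ts = pd.DataFrame({
    "time": pd.date_range("2010-01-01", periods=48, freq="MS"),
    "pm_value": range(48),  # stand-in for monthly PM observations
})
month = ts["time"].dt.month
climatology = ts.groupby(month)["pm_value"].transform("mean")
ts["pm_anomaly"] = ts["pm_value"] - climatology  # deviation from the monthly mean
```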

 

[1] Betancourt, C., Stomberg, T., Roscher, R., Schultz, M. G., and Stadtler, S.: AQ-Bench: a benchmark dataset for machine learning on global air quality metrics, Earth Syst. Sci. Data, 13, 3013–3033, https://doi.org/10.5194/essd-13-3013-2021, 2021.

 

[2] Betancourt, C., Stomberg, T. T., Edrich, A.-K., Patnala, A., Schultz, M. G., Roscher, R., Kowalski, J., and Stadtler, S.: Global, high-resolution mapping of tropospheric ozone – explainable machine learning and impact of uncertainties, Geosci. Model Dev. Discuss. [preprint], https://doi.org/10.5194/gmd-2022-2, in review, 2022.

Reviewer 2 Report

The manuscript is dedicated to analyzing air quality model predictions using explainable machine learning techniques and provides valuable insights into the AQ-Bench dataset.
It was a pleasure to read: it is very well structured and clearly written, and the research work is excellent.

Author Response

Reply to Review #2

 

We appreciate the very positive feedback from reviewer #2 and thank them for supporting the publication of our manuscript.

Reviewer 3 Report

The paper is fine to be accepted.

Author Response

Reply to Review #3

 

We thank reviewer #3 for reading our manuscript and supporting its publication.
