Article
Peer-Review Record

Iterative Oblique Decision Trees Deliver Explainable RL Models

Algorithms 2023, 16(6), 282; https://doi.org/10.3390/a16060282
by Raphael C. Engelhardt 1, Marc Oedingen 1, Moritz Lange 2, Laurenz Wiskott 2 and Wolfgang Konen 1,*
Reviewer 1:
Reviewer 2:
Submission received: 25 April 2023 / Revised: 25 May 2023 / Accepted: 30 May 2023 / Published: 31 May 2023
(This article belongs to the Special Issue Advancements in Reinforcement Learning Algorithms)

Round 1

Reviewer 1 Report

This paper reports on a method that uses Decision Trees (DTs) to explain Reinforcement Learning (RL) algorithms. The researchers did extensive work applying DTs to several environments typically used for RL benchmarking and also tested many RL algorithms. Explaining RL algorithms is quite tricky, and even good explanations do not always yield human-sensible results; the authors discuss many of these issues at the very beginning of the paper. They present background work and how their research differs from it with respect to sampling state-action pairs, they provide the analyzed pseudocode, and then they present their results from many angles: they show how the return is influenced by the depth of the DT, how complexity and generalization relate to each other through the decision boundary, and which parameters of the resulting explanation rule were most important for the long-term reward achieved. At the end of the paper, there is an organized presentation of the benefits and details of this method, in particular that the DT outperforms the original, more complex RL algorithm. This is a remarkable result, but the overall work needs to provide more details to be persuasive about its correctness and soundness.

This research work contains significant content that justifies publication. Works that try to improve RL explainability are rare in general; that an explanation is found which even outperforms the model makes this all the more interesting for the international scientific community. This and future work in this direction should be supported, even if the results are preliminary and based on experiments and a few heuristics.

The authors use several heuristics in their research, but they are aware of this and mention it explicitly in the footnote on page 6. In future work, these need to be replaced. The work is something like post-hoc XRL. In "Related Work" it is not clear why the results were not reproducible (line 67). What does "visually convincing" mean on page 3? How can the less apparent dependencies in lines 154-155 be detected with xAI? The imperfect decision-making in lines 162-163 is not shown. The phrase "a little bit beside" is not concrete. The main question about this paper is: how did the algorithm presented in Figure 1 produce such strong results? Is there a mathematical explanation or proof? Since the GitHub repository is empty, the reviewers cannot verify and reproduce this. How did the authors measure the "hardly differed" samples (line 194)? By what criteria will the hyperparameters mentioned in the footnote on page 6 be set in future work? The results are impressive, but the reader somehow lacks the connection to the design of the algorithm. If the researchers suffer from situations where the goal is missed (page 12), why did they not use Hindsight Experience Replay (HER)?

 

- Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., ... & Zaremba, W. (2017). Hindsight experience replay. Advances in Neural Information Processing Systems, 30.

 

If the goal is not reached, could it be that the results are not sound? The results in lines 294-295 need to be shown.

The GitHub repository is empty. This is a very bad and unacceptable practice, and the paper might be spam. The sensitivity analysis presented on page 15 is very important and is reminiscent of the perturbation analysis performed in Layer-wise Relevance Propagation (LRP):

 

- Montavon, G., Binder, A., Lapuschkin, S., Samek, W., & Müller, K. R. (2019). Layer-wise relevance propagation: an overview. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 193-209.

https://doi.org/10.1007/978-3-030-28954-6_1 

 

Furthermore, for the evaluation of xAI quality, the following meta-taxonomy should be consulted. The method needs to be assessed with more quantitative quality metrics. If a comparison with SHAP is not feasible because it is too time-consuming, please consider another method or some other quality metrics:

 

- Schwalbe, G., & Finzel, B. (2023). A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, 1-59.

https://doi.org/10.1007/s10618-022-00867-8

 

Are there situations in RL problems where positive and negative relevance (as in LRP) could help?

The paper is well-written, clear, readable and well-structured. The researchers have a clear challenging problem and goals they need to fulfil. 

No comments about the quality of the English language; the paper was readable and no typos were found.

Author Response

Please see the attached document for the detailed answers to all of the reviewer's comments, suggestions and questions.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors of this paper investigate the application of three algorithms for generating training data for axis-parallel and oblique Decision Trees using Deep Reinforcement Learning. They claim that their method yields powerful, explainable RL agents, highlighting the potential of DTs.

Overall, the paper is well-structured, and the experimental results effectively illustrate the proposed method. However, the introduction section could be more detailed, and the method section would benefit from a more comprehensive overview of the proposed framework. 

Additionally, the paper would be improved by discussing the limitations of the proposed method.

Lastly, the paper lacks a comparison of the proposed method with prior works. Therefore, it would be beneficial to include such a comparison to provide a more complete understanding of the proposed approach in the context of existing literature.

 

Several typos should be addressed. 

 

 

Author Response

Please see the attached document for the detailed answers to all of the reviewer's comments, suggestions and questions.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I appreciate the GitHub repository and the extended explanations.

Minor editing of English language required

Reviewer 2 Report

Accept the paper with the current revision.
