Peer-Review Record

RLSchert: An HPC Job Scheduler Using Deep Reinforcement Learning and Remaining Time Prediction

Appl. Sci. 2021, 11(20), 9448; https://doi.org/10.3390/app11209448

by Qiqi Wang¹, Hongjie Zhang¹, Cheng Qu¹, Yu Shen², Xiaohui Liu² and Jing Li¹,²,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 9 September 2021 / Revised: 1 October 2021 / Accepted: 7 October 2021 / Published: 12 October 2021

(This article belongs to the Special Issue State-of-the-Art High-Performance Computing and Networking)

Round 1

Reviewer 1 Report

The manuscript proposes an HPC scheduling algorithm based on deep reinforcement learning.

The proposed algorithm, RLSchert, provides better resource management than existing job schedulers.

The manuscript also presents various experiments to validate the method, which is good.

Overall, I recommend accepting this manuscript.

Author Response

Dear Reviewer:
Thank you very much for taking the time to review this manuscript. We really appreciate all your comments, and we thank you for your approval and support of our research.

Reviewer 2 Report

This paper presents an ML-based HPC job scheduler. The paper is well written and easy to follow. I enjoyed reading the paper.
My primary concern with this paper is the time predictor, which is the crux of this paper. While the paper gives the impression that the time predictor is generic, in reality, it is not. It is heavily tied to one type of application (VASP jobs). Predicting the runtime of a single class of applications is not difficult and is a well-studied problem; hence the prediction itself is not novel.

The primary requirement of an HPC job scheduler is to schedule applications with different resource requirement characteristics. For a given class of applications, it is easy to identify the relevant input parameters for ML prediction and model the execution time. The authors must address how this will be done for arbitrary applications, whose input parameters could even be file paths. The authors must also experiment with a mix of applications in Section 4.

I am good with the rest of the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Even though the authors did not fully resolve my concern, I am fine with the current text and new data.
