Peer-Review Record

Research on Self-Recovery Control Algorithm of Quadruped Robot Fall Based on Reinforcement Learning

Actuators 2023, 12(3), 110; https://doi.org/10.3390/act12030110
by Guichen Zhang 1, Hongwei Liu 1,*, Zihao Qin 2, Georgy V. Moiseev 3 and Jianwen Huo 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 7 January 2023 / Revised: 11 February 2023 / Accepted: 21 February 2023 / Published: 1 March 2023
(This article belongs to the Section Actuators for Robotics)

Round 1

Reviewer 1 Report

# MDPI, Actuators

Title: Research on self-recovery control algorithm of quadruped robot fall
based on reinforcement learning
Authors: Guichen Zhang, Hongwei Liu, Jianwen Huo
Submitted to section: Actuators for Robotics

# General Comments

Writing needs to be improved. Some sentences took me multiple readings to understand. For example,

"When the robot is working in a special environment, such as soil radioactive contaminant detection under complex terrain conditions, due to the complex and changeable mountain environment, the combined effect of various factors such as the unevenness of the ground, the shift of its center of gravity, and the instability of the control during the robot's walking process, it is easy to get stuck in rollover."

Punctuation (e.g., missing periods in "et al."), spacing (no space between references and the square bracket), and capitalization (some incorrect capitalizations throughout) need to be fixed.

As mentioned in the introduction, do the authors plan on including a spine at some point?

The authors mention complex terrain in the introduction. Will this work be extended for such conditions?

The results section is lacking some way for the reader to "see" what is happening. The plots are OK, but it would be great if they could provide an animation or a video. At least, a sequence of images showing the simulation would be great (similar to the manually-drawn images in figure 1).


# Specific Comments


(118) "chapters" --> "sections"

(130) Please define "bionic principle."

(fig 2) Please label the "shank," "thigh," etc.

(fig 3(b)) Does O' not move along with the robot shell?

(lines 181-183) These equations should be moved out of the paragraph for easier reading.

(figs 3 and 4) It would be easier to understand if these figures were grouped. Or at least closer together.

(192) Move the equations out.

(244) "The control strategy used in this paper is..." --> "In this paper, we develop a control strategy we refer to as..."

(246) "DQN" is never defined or cited.

(fig 5) This figure needs a longer explanation of the actor-critic algorithm. Either in the surrounding text, or in the caption. The text from 258-270 needs to be more clear and it needs to better reference the figure.

(fig 5) The "dot" is missing from several symbols.

(256) Please explain "The action space is A = [−1,1], expand it to [−40,40] as required"

(fig 6) This figure has a caption on top and below.

(fig 6) Please label the inputs and outputs.

(288-290) Delete this paragraph.

(292) LQR does not have a citation or description.

(297) "As can be seen from the figure" reference the exact figure.

(297-299) "the control method of PID makes the robot's foot position move more, and the displacement changes quickly in the early stage of the movement, so that the landing is easy to be unstable, causing damage to the robot" --> Will the authors please describe the plots instead of making these assumptions about what the reader can see?

(301) How do the authors decide what is inferior? What specific metrics?

(303) Does the reward system specifically account for smaller motions?

(figs 7-12) It might be better to put each method on its own smaller plot. It is hard to tell the lines apart. I think separating them will work better and not require any additional space. The exact time steps do not matter, so these can be made smaller.

(section 4) What simulation software is used? Will the authors provide a link to source code?

(section 4) How many experiments were run? Is the system deterministic? How many initial conditions were tested? Quite a bit of detail seems missing.



Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Research on self-recovery control algorithm of quadruped robot fall based on reinforcement learning”. Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the editor’s and reviewers’ comments are as follows:

Responses to the reviewer’s comments:

Reviewer #1:

1- Writing needs to be improved. Some sentences take multiple times to read for me to understand them.

Response: We apologize for the poor language of our manuscript. We worked on the manuscript for a long time and the repeated addition and removal of sentences and sections obviously led to poor readability. We have now worked on both language and readability and have also involved native English speakers for language corrections. We really hope that the flow and language level have been substantially improved.

 

2- As mentioned in the introduction, do the authors plan on including a spine at some point?

Response: Thank you very much for your valuable comments. Crustaceans are arthropods (subphylum Crustacea). Arthropods are all invertebrates, the more primitive forms of animals, accounting for about 95% of all animal species, and this article mainly mimics the self-recovery mode of such organisms. Vertebrates evolved from lower invertebrates; they have the most complex structure and the highest evolutionary status, and they place higher requirements on control. Future studies will therefore consider the self-recovery of vertebrates, such as adding a spine, but this article does not.

 

3- The authors mention complex terrain in the introduction. Will this work be extended for such conditions?

Response: Thanks for your comment. Yes. Complex terrain requires more time and more swings to achieve re-standing than self-recovery on flat ground, and because reinforcement learning relies on extensive training in the early stage, if the complex terrain the robot faces is not included in the training library, the time required for self-recovery will also be extended. So it can be said that this work will be extended to such conditions.

 

4- The results section is lacking some way for the reader to "see" what is happening. The plots are OK, but it would be great if they could provide an animation or a video. At least, a sequence of images showing the simulation would be great (similar to the manually-drawn images in figure 1).

Response: Thanks for your comment. To compare the three control methods more intuitively, and taking into account the settings of practical applications, the authors revised the experimental part. First, considering that the DDPG algorithm used is composed of two neural networks, the plain neural-network controller was removed as a comparison reference; the comparison is instead made from the point of view of classical control, modern control, and intelligent control. Second, the rotation angle was converted from radians to degrees, which makes the state of the robot easier to understand. Finally, the three control methods were compared at the same magnitude, that is, with the same input size and the same environment, and we found that at the same magnitude, owing to the change of model, the control effects of PID and LQR differed from before. In addition, the authors provide a video of the self-recovery of the quadruped robot to aid understanding.
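For illustration of the two-network structure mentioned above, a minimal DDPG actor-critic pair might look like the following Python/PyTorch sketch (the paper itself uses MATLAB/Simulink); the layer widths and state/action dimensions are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a robot state to a deterministic action in [-1, 1]."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # tanh bounds the output to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates the action value Q(s, a) for a state-action pair."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```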

 

5- Please define "bionic principle."

Response: Biomimicry is a science that studies the structure and properties of biological systems to provide new design ideas and working principles for engineering. The principle is one of transplanting and borrowing: the required principles are taken from animals and carried over into engineering. The bionic principle referred to in this paper is the same; that is, the control strategy for quadruped robot fall self-recovery is designed by drawing on the structure of crustaceans and the way their legs swing during self-recovery.

 

6-(fig 2) Please label the "shank," "thigh," etc.

Response: Thanks for your suggestion. I previously thought that lines 136-143 had already stated which symbol represents the mass of the thigh and which the mass of the shank, so only the symbols were indicated in the figure; following your suggestion, I have now labeled the "shank" and "thigh".

 

7-(fig 3(b)) Does O' not move along with the robot shell? (figs 3 and 4) It would be easier to understand if these figures were grouped. Or at least closer together.

Response: Thanks for your suggestion. Sorry, there were some problems with fig 3(b): O' is the center of the robot's cylinder-like body and moves with the shell of the robot. As for the suggestion that figs 3 and 4 would be easier to understand if grouped, or at least placed closer together, I think it makes sense, and it has been corrected.

 

8-(256) Please explain "The action space is A = [−1,1], expand it to [−40,40] as required"

Response: Thanks for your comment. If the input state varies over a relatively large range and its dimensions are not on the same order of magnitude, the state fed to the neural network must be normalized. The fully connected layer at the top of the actor network uses the tanh activation function, which normalizes the action to [−1, 1]; the action is then linearly transformed to the specific action range. The output range depends on the environment. For the quadruped robot, the action corresponds to the input force, and, taking into account the length of the robot's legs, the weight of the body, and the power and torque of the motors, the action must be de-normalized to the real range when it is fed back to the environment. So the action space is A = [−1, 1], expanded to [−40, 40] as required.
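As a concrete illustration of this mapping, the following Python sketch shows the linear transformation between the actor's tanh output range [−1, 1] and the force range [−40, 40]; the function names are hypothetical, not taken from the paper.

```python
import numpy as np

ACTION_LOW, ACTION_HIGH = -40.0, 40.0  # force range stated in the response

def scale_action(a_norm):
    """Map a tanh-normalized action in [-1, 1] to the environment range [-40, 40]."""
    a_norm = np.clip(a_norm, -1.0, 1.0)
    return ACTION_LOW + 0.5 * (a_norm + 1.0) * (ACTION_HIGH - ACTION_LOW)

def unscale_action(a_env):
    """Inverse map: environment action back to the network's [-1, 1] range."""
    return 2.0 * (a_env - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW) - 1.0
```

For the symmetric range here this reduces to multiplying by 40, but the general form also covers asymmetric action ranges.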

 

9-(288-290) Delete this paragraph.

Response: Thanks for your suggestion. I am not sure why you made this suggestion, but I think Algorithm 1 is an explanation of the actor-critic algorithm in fig 5, so I believe this table should be kept.

 

10-(301) How do the authors decide what is inferior? What specific metrics?

Response: Thanks for your question. I judge the quality of the automatic control system according to the three performance indicators mentioned in this part (line 344): stability, rapidity, and accuracy, of which stability and accuracy are the most important. The specific evaluation criterion for stability is whether the system, after being disturbed, can return to the original expected value after a certain adjustment period, as shown in the last item of Table 1. Accuracy refers to whether the system can accurately follow a given reference, that is, whether the robot can recover itself according to the expected trajectory and movement mode. The specific indicators of rapidity are shown in the first three items of Table 2.
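As a sketch of how such indicators could be computed from a simulated response, the hypothetical Python helpers below estimate settling time (stability/rapidity) and steady-state error (accuracy); the ±2% band and the averaging window are conventional assumptions, not the thresholds used in the paper's tables.

```python
import numpy as np

def settling_time(t, y, y_ref, tol=0.02):
    """Time after which the response stays within a +/-2% band of the reference."""
    band = tol * abs(y_ref) if y_ref != 0 else tol
    outside = np.abs(np.asarray(y) - y_ref) > band
    if not outside.any():
        return t[0]                        # settled from the start
    last_out = np.nonzero(outside)[0][-1]  # last sample outside the band
    return t[last_out + 1] if last_out + 1 < len(t) else np.inf

def steady_state_error(y, y_ref, tail=50):
    """Mean absolute deviation from the reference over the final samples."""
    return float(np.mean(np.abs(np.asarray(y)[-tail:] - y_ref)))
```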

 

11-(303) Does the reward system specifically account for smaller motions?

Response: Thanks for your question. The reward mechanism does not specifically consider smaller movements, but the state of the robot during small movements is taken into account. In addition, one of the discount terms is a penalty on the robot's foot position, which is intended to make the robot's feet move as little as possible during self-recovery.
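Since the exact reward terms are not reproduced in this record, the following is a purely hypothetical Python sketch of a reward that favors an upright body while penalizing foot-position movement, as described above; the weights and term names are assumptions.

```python
import numpy as np

# Hypothetical weights; the paper's actual reward coefficients are not given here.
W_UPRIGHT, W_FOOT = 1.0, 0.1

def reward(body_roll, foot_displacement):
    """Reward an upright body while penalizing cumulative foot movement."""
    upright_term = np.cos(body_roll)  # 1 when upright, -1 when fully inverted
    foot_penalty = np.sum(np.abs(foot_displacement))
    return W_UPRIGHT * upright_term - W_FOOT * foot_penalty
```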

 

12-(figs 7-12) It might be better to put each method on its own smaller plot. It is hard to tell the lines apart. I think separating them will work better and not require any additional space. The exact time steps do not matter, so these can be made smaller.

Response: Thank you very much for your advice. However, I think the time steps are necessary, because one of the criteria for judging quality is rapidity, which is reflected by the time steps: the time required for the robot to complete self-recovery differs under different control methods.

 

13-(section 4) What simulation software is used? Will the authors provide a link to source code?

Response: Thanks for your question. The simulation software used in this article is MATLAB/Simulink. The authors will not provide a link to the source code, but the details of the experiment have been added to the text; in addition, if you have questions about the procedure, you can contact the authors.

 

14-(section 4) How many experiments were run? Is the system deterministic? How many initial conditions were tested? Quite a bit of detail seems missing.

Response: Thanks for your question. The experiments and the parameters related to them have been supplemented in the article. Regarding whether the system is deterministic: reinforcement learning is sensitive both to initialization and to dynamic changes during training, because the data are always collected online and the only supervision available is a single scalar reward. The system is therefore deterministic when the initial conditions are consistent with the training environment.
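In a Python/PyTorch reimplementation (the paper itself uses MATLAB/Simulink), making runs repeatable under fixed initial conditions would typically start by seeding every random source, along the lines of this sketch:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0):
    """Fix the common random sources so repeated runs start identically."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```

Even with fixed seeds, GPU kernels can introduce nondeterminism, so exact repeatability may also require the framework's deterministic-algorithms mode.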

We would like to express our great appreciation to you and the reviewers for your comments on our paper. We look forward to hearing from you.

Thank you and best regards.

Yours sincerely,

Author

Reviewer 2 Report

This paper addresses the problem of self-recovery for quadruped robots that fall due to external disturbances or unexpected events. The authors propose a solution based on the self-recovery mechanism observed in crustaceans, where a resonance effect is generated through the swing of legs and the shifting of the center of gravity.

The paper establishes a kinematics model of one-leg swing and a self-recovery motion model for the falling quadruped robot. A control strategy based on reinforcement learning is then proposed, and the learning network architecture is designed according to the established mathematical model. The proposed algorithm is then experimentally compared with PID, LQR, and NN controllers to verify its feasibility.

The idea of using bio-inspired solutions for robotic self-recovery is an interesting approach and the use of reinforcement learning for control strategy is commendable. The experimental comparison with other controllers provides solid evidence for the feasibility of the proposed algorithm. Overall, this paper presents a well-thought-out and thorough solution to the problem of self-recovery for quadruped robots, and it is a valuable contribution to the field.

The following are some suggestions to improve this paper:

  1. The authors could consider providing more detailed information on the experimental setup and results, such as the specific hardware and software used, as well as more detailed results and statistics to support the conclusions.

  2. It would be helpful if the authors could include a discussion on the limitations of their proposed approach and how it could be improved in future work.

  3. The authors could consider providing more information on the specific crustaceans that were studied and how their self-recovery mechanism was adapted for use in the quadruped robot.

  4. The authors could also consider comparing their proposed approach with other bio-inspired methods for self-recovery in quadruped robots, in order to provide a more comprehensive view of the state-of-the-art.

  5. The authors could also consider providing more details on the learning network architecture and the specific parameters used in the reinforcement learning algorithm, which would be useful for other researchers in the field.

  6. The authors could consider providing a more detailed explanation of the mathematical models used to describe the kinematics and self-recovery motion of the quadruped robot, which would be useful for researchers who want to understand the underlying mechanics of the proposed approach.

The conclusions provided are consistent with the goals previously stated in that they both state that the proposed control method using reinforcement learning is able to achieve self-recovery of the quadruped robot after falls. The conclusions further elaborate on the specific advantages of the reinforcement learning method over traditional control methods such as PID and LQR. The conclusions also mention that the control method of reinforcement learning dynamically adjusts the parameters, making the self-adaptation ability to the environment stronger, which ensures the stability, accuracy, and rapidity of the self-recovery process.

Maybe the abstract could include information on the specific time in which the quadruped robot can recover a stable standing posture. This work concludes that the proposed algorithm is experimentally compared with PID, LQR, and NN controllers, and the feasibility of this control method for achieving self-recovery of quadruped robot falls is proved. The conclusions provided are consistent with the paper in that they state that the proposed control method using reinforcement learning is able to achieve self-recovery of the quadruped robot after falls and that it is better than the other controllers in terms of adaptability and stability.

It would be helpful to know the specific hardware and software used in the experiments, as well as more detailed results and statistics to support the conclusions. Additionally, it would be beneficial if the authors provided more information on the specific crustaceans that were studied and how their self-recovery mechanism was adapted for use in the quadruped robot.

Another aspect that could be improved is the discussion on the limitations of the proposed approach and how it could be improved in future work. The authors mention that the control training model can be optimized from the aspect of time consumption in the later stage, but it would be helpful to have more information on specific steps that could be taken to optimize the model.

Lastly, it would be useful to have more details on the mathematical models used to describe the kinematics and self-recovery motion of the quadruped robot, which would be useful for researchers who want to understand the underlying mechanics of the proposed approach.

 

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Research on self-recovery control algorithm of quadruped robot fall based on reinforcement learning”. Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the editor’s and reviewers’ comments are as follows:

Responses to the reviewer’s comments:

Reviewer #2:

1- The authors could consider providing more detailed information on the experimental setup and results, such as the specific hardware and software used, as well as more detailed results and statistics to support the conclusions.

Response: Thank you for your advice. More details about the experimental setup and results have been supplemented in Sections 3 and 4.

2- It would be helpful if the authors could include a discussion on the limitations of their proposed approach and how it could be improved in future work.

Response: Thank you for your advice. The authors have added a discussion of the limitations of the proposed methodology and how it can be improved in future work in Section 5.

 

3- The authors could consider providing more information on the specific crustaceans that were studied and how their self-recovery mechanism was adapted for use in the quadruped robot.

Response: Thank you for your advice. Because the author does not have a deep background in biology, we have little information about crustaceans beyond observing their movements: they make the body tilt briefly by constantly swinging their legs and then seize the opportunity to complete the turnover, so the author cannot provide more information about the organism itself. However, the authors have added Figure 1, a schematic diagram of the crustacean self-recovery observed in this article, and hope this diagram will help. Their self-recovery mechanism reminded the authors of the swing-up mechanism of the inverted pendulum, and by combining the robot's leg design and movement mode with the structure and movement mode of the inverted pendulum, the self-recovery mechanism of crustaceans was adapted for use in the quadruped robot.
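To illustrate the swing-up analogy, here is a minimal Python sketch of a classical energy-based swing-up law in the style of Astrom and Furuta; the pendulum parameters and gain are assumptions for illustration, and this is not the controller developed in the paper.

```python
import numpy as np

# Illustrative pendulum parameters, not the paper's robot model.
M, L, G = 1.0, 0.5, 9.81  # mass [kg], length [m], gravity [m/s^2]
K_E = 2.0                 # energy-error gain

def swing_up_accel(theta, theta_dot):
    """Pivot acceleration that pumps the pendulum's energy toward the upright level.

    theta is measured from the upright position, so the target energy is 0.
    """
    energy = 0.5 * M * (L * theta_dot) ** 2 + M * G * L * (np.cos(theta) - 1.0)
    # Below the target energy, push in the direction that raises it.
    return K_E * (0.0 - energy) * np.sign(theta_dot * np.cos(theta))
```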

4- The authors could also consider comparing their proposed approach with other bio-inspired methods for self-recovery in quadruped robots, in order to provide a more comprehensive view of the state-of-the-art.

Response: Thank you for your advice. In Section 1, I have summarized the literature on other fall-recovery methods. Their robot structures differ from ours; for example, some use a robotic arm on the back to help the robot return to a standing posture. The advantages and disadvantages of each are also discussed in Section 1.

5- The authors could also consider providing more details on the learning network architecture and the specific parameters used in the reinforcement learning algorithm, which would be useful for other researchers in the field.

Response: Thank you for your advice. More details on the learning network architecture and the specific parameters used in the reinforcement learning algorithm have been supplemented in Sections 3 and 4.

6- The authors could consider providing a more detailed explanation of the mathematical models used to describe the kinematics and self-recovery motion of the quadruped robot, which would be useful for researchers who want to understand the underlying mechanics of the proposed approach.

Response: Thank you for your advice. In response to this question, I have added material in Sections 3 and 4. In addition, you may also refer to the following literature.

  1. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. https://doi.org/10.48550/arXiv.1509.02971.
  2. Mohamed, F. Anayi, M. Packianather, B. A. Samad and K. Yahya, "Simulating LQR and PID controllers to stabilise a three-link robotic system," 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 2022, pp. 2033-2036, doi: 10.1109/ICACITE53722.2022.9823512.
  3. Ratnayake, D. T., & Parnichkun, M. (2020, July). LQR-based stabilization and position control of a mobile double inverted pendulum. In IOP Conference Series: Materials Science and Engineering (Vol. 886, No. 1, p. 012034). IOP Publishing. doi: 10.1088/1757-899X/886/1/012034.

We would like to express our great appreciation to you and the reviewers for your comments on our paper. We look forward to hearing from you.

Thank you and best regards.

Yours sincerely,

Author

Reviewer 3 Report

In the abstract the authors presented the circumstances which can bring the robot to fall, but in further sections the flipped-over robot is considered to be lying on a horizontal plane, which seems like a contradiction and casts doubt on the effectiveness of the proposed algorithm in real rugged terrain.

In section 4, it is not clear what exactly the presented method is compared with, because there is no information on how the corresponding PID, LQR and NN controllers were obtained.

The paper requires proofreading. On page 2, "the claws of the degrees of freedom are limited" probably should be "the degrees of freedom of the claws are limited" for the whole sentence to make sense. On page 4 and further, "Among them" is used instead of "where", etc.

On figure 3 the red notations are hardly visible; they should not be placed on the gray background (see figure 4, where this problem is not present).

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Research on self-recovery control algorithm of quadruped robot fall based on reinforcement learning”. Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the editor’s and reviewers’ comments are as follows:

Responses to the reviewer’s comments:

Reviewer #3:

1- In the abstract the authors presented the circumstances which can bring the robot to fall, but in further sections the flipped-over robot is considered to be lying on a horizontal plane, which seems like a contradiction and casts doubt on the effectiveness of the proposed algorithm in real rugged terrain.

Response: Thank you very much for your advice. There are many situations in which the robot falls: for example, when the robot climbs stairs whose height is slightly beyond the limit the robot can accept, or whose treads are relatively smooth, and many other emergencies can cause the robot to fall with its four legs up. The flat environment is the most common in daily life, and this article mainly experiments with that situation, so the training environment is designed to be flat. Indeed, rough terrain is more likely to cause robots to fall, and application in extreme environments is a focus of research on robot application scenarios, so this is one of our future research directions; it is not appropriate to describe such practical applications directly now. In the abstract and introduction, I have therefore modified the description of the algorithm's actual application scenarios.

2 – In section 4, it is not clear what exactly the presented method is compared with, because there is no information on how the corresponding PID, LQR and NN controllers were obtained.

Response: Thank you for your advice. Taking into account the reviewers' suggestions and the settings of practical application, the authors have revised the experimental part. First, considering that the DDPG algorithm used is composed of two neural networks, the plain neural-network controller was removed as a comparison reference, and the comparison is instead made from the point of view of classical control, modern control, and intelligent control. Second, the author believes that PID and LQR are already very mature algorithms, and since the author has not improved these algorithms, no further implementation details are provided; reference citations have been added in the original text, which we hope will help.
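For completeness, a minimal discrete PID controller of the kind used as the classical-control baseline might look like the Python sketch below; the gains and sampling time are placeholders, not the values tuned for the quadruped model in the paper.

```python
class PID:
    """Minimal discrete PID controller; the gains here are placeholders."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a body angle toward 0 with hypothetical gains.
controller = PID(kp=20.0, ki=0.5, kd=2.0, dt=0.01)
u = controller.step(setpoint=0.0, measurement=0.3)
```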

3 - The paper requires proofreading. On page 2, "the claws of the degrees of freedom are limited" probably should be "the degrees of freedom of the claws are limited" for the whole sentence to make sense. On page 4 and further, "Among them" is used instead of "where", etc.

Response: We apologize for the description errors in the manuscript. We double-checked the text content in the paper and made revisions, which were marked in red.

4- On figure 3 red notations are hardly visible, they should not be placed on the gray background (see figure 4 where this problem is not present).

Response: Thank you for your advice. I have modified Figure 3.

 

We would like to express our great appreciation to you and the reviewers for your comments on our paper. We look forward to hearing from you.

Thank you and best regards.

Yours sincerely,

Author

 

Reviewer 4 Report

The paper considers the problem of fall recovery of the quadruped robot and presents a bio-mimetic solution to this problem. The suggested solution includes the design of hardware which mimics the insect's semi-spherical body and a mobility control algorithm based on reinforcement learning.

In my opinion, the paper is written in a clear and readable form. Concerning the form, the only required corrections are:

- since the algorithm of Deep Deterministic Policy Gradient (DDPG) is known, it would be better to add references to the appropriate sources;

- the text requires minor corrections and English editing; in many places uppercase letters are required instead of lowercase letters, some of the references must be corrected (see, e.g., source [1]), and so forth.

Concerning the content, the paper suggests an interesting idea; formal considerations and verifications sound correct.

The main question here is the feasibility of the idea and its practical need. In my opinion, the paper must include:

- consideration (at least theoretical) of engineering issues like types of motors, batteries, transmissions, on-board computer and so on needed for implementation of the robot;

- consideration of the limitations of size, mass, velocity and so on of the robot;

- description of possible practical applications of the suggested robot.

Summarizing, the paper presents an interesting idea described clearly and correctly; the text requires only minor grammar corrections. Together with that, the content requires major corrections: the paper must include the discussion of the robot’s implementation and its practical use.

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Research on self-recovery control algorithm of quadruped robot fall based on reinforcement learning”. Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the editor’s and reviewers’ comments are as follows:

Responses to the reviewer’s comments:

Reviewer #4:

1- since the algorithm of Deep Deterministic Policy Gradient (DDPG) is known, it would be better to add references to the appropriate sources.

Response: Thank you very much for your advice. DDPG is indeed an existing algorithm; the cited literature can be found in the references, i.e., reference [31], and I have added source citations to this section in the article.

2 – the text requires minor corrections and English editing; in many places uppercase letters are required instead of lowercase letters, some of the references must be corrected (see, e.g., source [1]), and so forth.

Response: Thank you very much for your advice. We apologize for the description errors in the manuscript. We double-checked the text content in the paper and made revisions, which were marked in red.

3 - the feasibility of the idea and its practical need.

- consideration (at least theoretical) of engineering issues like types of motors, batteries, transmissions, on-board computer and so on needed for implementation of the robot;

- consideration of the limitations of size, mass, velocity and so on of the robot;

Response: Thank you very much for your advice. I quite agree with the discussion about robot implementation, but in fact, during the simulation I only considered the parameters of the robot itself, such as size, mass, speed, and joint limits, which I have supplemented in Section 4. For actual engineering deployment, the motor is intended to be the same one used in the MIT Mini Cheetah drive, the T-Motor U8. Because the control strategy used is DDPG, the on-board main controller needs high computing power, so we would consider a Raspberry Pi 4B+, NVIDIA's Jetson series, or an industrial computer with an i7 processor, as used in robots such as the MIT Cheetah. However, due to the lack of engineering deployment capability, how to apply it in actual engineering is currently only conjecture, and this will become one of the directions of later research.

We would like to express our great appreciation to you and the reviewers for your comments on our paper. We look forward to hearing from you.

Thank you and best regards.

Yours sincerely,

Author

Round 2

Reviewer 4 Report

In my opinion, the paper is fine and can be published in the present form.

Thank you for your efforts.
