Deep Reinforcement Learning for Robots and Agents

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 June 2023) | Viewed by 15911

Special Issue Editor


Prof. Dr. Jee Hang Lee
Guest Editor
Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Korea
Interests: decision making; reinforcement learning; computational cognitive modelling; brain-inspired AI

Special Issue Information

Dear Colleagues,

Coupled with deep learning, reinforcement learning (RL) algorithms have demonstrated super-human performance in many challenging domains, ranging from games and robotics to networking, energy, and finance. Building on this, deep RL has also made remarkable advances in multi-agent reinforcement learning (MARL), in which multiple agents interact repeatedly to find an optimal policy for a complex goal under real-world conditions.

The main objectives of this Special Issue are (i) to report recent progress in deep RL research (in both single- and multi-agent settings), (ii) to share successful examples of real-world applications built on deep RL agents, and (iii) to identify the future research issues that are crucial to advancing the state of the art in deep RL agents.

We thus invite paper submissions exhibiting the success of deep RL agents in the theoretical and practical domains indicated above, as well as those addressing fundamental and/or practical issues in the design of deep RL agents, including (but not limited to):

  • Deep RL algorithms and architectures that address fundamental algorithmic challenges in engineering deep RL agents;
  • Successful examples of deploying applications that utilize deep RL agents in any science or engineering domain;
  • Reviews of deep RL agents offering holistic perspectives on state-of-the-art algorithms and architectures; and
  • Perspectives on future deep RL research, delivering insights into substantial advances in the theory and practice of deep RL agents.

For any enquiries on this Special Issue, please do not hesitate to get in touch with us. We look forward to receiving your contributions.

Prof. Dr. Jee Hang Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep RL algorithms and architecture
  • model-based and model-free RL
  • MARL
  • multi-objective RL
  • on-line/off-line RL
  • sample/time/space efficiency
  • neuroscience of RL
  • generalization
  • applications
  • real-world RL examples

Published Papers (6 papers)


Research


27 pages, 5648 KiB  
Article
Improving End-To-End Latency Fairness Using a Reinforcement-Learning-Based Network Scheduler
by Juhyeok Kwon, Jihye Ryu, Jee Hang Lee and Jinoo Joung
Appl. Sci. 2023, 13(6), 3397; https://doi.org/10.3390/app13063397 - 07 Mar 2023
Viewed by 1031
Abstract
In services such as the metaverse, which should provide a constant quality of service (QoS) regardless of the user’s physical location, the end-to-end (E2E) latency must be fairly distributed over every flow in the network. To this end, we propose a reinforcement learning (RL)-based scheduler for minimizing the maximum network E2E latency. The RL model uses a double deep Q-network (DDQN) with prioritized experience replay (PER). To see how performance changes with the type of RL agent, we implemented a single-agent environment, where the controller is the agent, and a multi-agent environment, where each node is an agent. Since the agents were unable to identify E2E latencies in the multi-agent environment, the state and reward were formulated using estimated E2E latencies. To precisely evaluate the RL-based scheduler, we set out two benchmark algorithms to compare against: a network-arrival-time-based heuristic algorithm (NAT-HA) and a maximum-estimated-delay-based heuristic algorithm (MED-HA). The RL-based scheduler, first-in-first-out (FIFO), round-robin (RR), NAT-HA, and MED-HA were compared through large-scale simulations on four network topologies. In fixed-packet-generation scenarios, the simulation results showed that our proposal, the RL-based scheduler, minimized the maximum E2E latency in all the topologies. In the other scenarios, with random flow generation, the RL-based scheduler and MED-HA showed the lowest maximum E2E latency for all topologies. Depending on the topology, the maximum E2E latency of NAT-HA was equal to or larger than that of the RL-based scheduler. In terms of fairness, the RL-based scheduler was fairer than FIFO and RR. NAT-HA had similar or lower fairness than the RL-based scheduler depending on the topology, and MED-HA had the same level of fairness as the RL-based scheduler.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
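
The scheduler described above couples a double deep Q-network with prioritized experience replay. As a rough, hedged sketch of that combination (not the authors' implementation; the priority exponents, discount factor, and array shapes are illustrative assumptions):

```python
import numpy as np

def ddqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online net picks the next action, the target net scores it."""
    best_actions = np.argmax(q_next_online, axis=1)                  # action selection (online net)
    next_values = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values             # bootstrap unless terminal

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=np.random.default_rng(0)):
    """Prioritized experience replay: sample indices by priority and return IS weights."""
    probs = priorities ** alpha
    probs /= probs.sum()
    idx = rng.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)              # importance-sampling correction
    return idx, weights / weights.max()
```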

14 pages, 2798 KiB  
Article
Experience Replay Optimisation via ATSC and TSC for Performance Stability in Deep RL
by Richard Sakyi Osei and Daphne Lopez
Appl. Sci. 2023, 13(4), 2034; https://doi.org/10.3390/app13042034 - 04 Feb 2023
Cited by 1 | Viewed by 1205
Abstract
Catastrophic forgetting is a significant challenge in deep reinforcement learning (RL). To address this problem, researchers introduced the experience replay (ER) concept to complement the training of a deep RL agent. However, the buffer size, experience selection, and experience retention strategies adopted for ER can negatively affect the agent’s performance stability, especially for complex continuous state-action problems. This paper investigates how to address the stability problem using an enhanced ER method that combines a replay policy network, a dual memory, and an alternating transition selection control (ATSC) mechanism. Two frameworks were designed: an experience replay optimisation via alternating transition selection control (ERO-ATSC) without transition storage control (TSC) and an ERO-ATSC with TSC. The first is a hybrid of experience replay optimisation (ERO) and dual-memory experience replay (DER); the second, which comes in two versions, integrates a TSC into the first framework. After comprehensive experimental evaluations of the frameworks on the Pendulum-v0 environment and across multiple buffer sizes, retention strategies, and sampling ratios, the reward version of ERO-ATSC with TSC exhibits superior performance over the first framework and over other methods such as the deep deterministic policy gradient (DDPG) and ERO.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
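
The ERO-ATSC frameworks above revolve around a dual replay memory with an alternating transition selection control. The following minimal sketch only illustrates the general idea of alternating between two memories when sampling; the buffer names, capacities, and the alternation and storage rules are assumptions, not the paper's design:

```python
import random
from collections import deque

class DualReplayBuffer:
    """Two memories with a simple alternating, ATSC-style sampling switch."""
    def __init__(self, capacity=10_000):
        self.recent = deque(maxlen=capacity)     # short-term memory of fresh transitions
        self.reserve = deque(maxlen=capacity)    # long-term memory retained for stability
        self._turn = 0                           # alternation counter

    def store(self, transition, keep_long_term=False):
        # A transition storage control (TSC) rule would decide keep_long_term;
        # here the caller supplies it directly.
        self.recent.append(transition)
        if keep_long_term:
            self.reserve.append(transition)

    def sample(self, batch_size):
        # Alternate which memory the mini-batch is drawn from on successive calls.
        self._turn ^= 1
        source = self.recent if (self._turn or not self.reserve) else self.reserve
        return random.sample(source, min(batch_size, len(source)))
```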

20 pages, 505 KiB  
Article
Tensor Implementation of Monte-Carlo Tree Search for Model-Based Reinforcement Learning
by Marek Baláž and Peter Tarábek
Appl. Sci. 2023, 13(3), 1406; https://doi.org/10.3390/app13031406 - 20 Jan 2023
Cited by 2 | Viewed by 2289
Abstract
Monte-Carlo tree search (MCTS) is a widely used heuristic search algorithm. In model-based reinforcement learning, MCTS is often utilized to improve the action selection process. However, model-based reinforcement learning methods need to process a large number of observations during training. If MCTS is involved, one instance of MCTS must be run for each observation in every iteration of training, so an efficient method for processing multiple instances of MCTS is needed. We propose an MCTS implementation that can process a batch of observations in a fully parallel fashion on a single GPU using tensor operations. We demonstrate the efficiency of the proposed approach on the MuZero reinforcement learning algorithm. Empirical results show that our method outperforms other approaches and scales well with an increasing number of observations and simulations.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
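
The tensor MCTS above advances many search trees at once with batched array operations. A minimal sketch of what a vectorised PUCT selection step could look like under such a layout (array names, shapes, and the exploration constant are assumptions, not the paper's implementation):

```python
import numpy as np

def batched_puct_select(visit_counts, value_sums, priors, c_puct=1.25):
    """Select one child per tree in a batch using vectorised PUCT scores.

    visit_counts, value_sums, priors: arrays of shape (batch, num_actions),
    one row per independent search tree, as in a tensorised MCTS.
    """
    q_values = value_sums / np.maximum(visit_counts, 1)                   # mean value, avoid /0
    parent_visits = visit_counts.sum(axis=1, keepdims=True)
    exploration = c_puct * priors * np.sqrt(parent_visits) / (1 + visit_counts)
    scores = q_values + exploration
    return scores.argmax(axis=1)                                          # one chosen action per tree
```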

13 pages, 607 KiB  
Article
Fresher Experience Plays a More Important Role in Prioritized Experience Replay
by Jue Ma, Dejun Ning, Chengyi Zhang and Shipeng Liu
Appl. Sci. 2022, 12(23), 12489; https://doi.org/10.3390/app122312489 - 06 Dec 2022
Cited by 1 | Viewed by 1349
Abstract
Prioritized experience replay (PER) is an important technique in deep reinforcement learning (DRL). It improves the sampling efficiency of data in various DRL algorithms and achieves great performance. PER uses the temporal difference error (TD-error) to measure the value of experiences and adjust their sampling probability. Although PER can sample valuable experiences according to the TD-error, freshness is also an important characteristic of an experience: it implicitly reflects the experience's potential value. Fresh experiences are produced by the current networks, and they are more valuable for updating the current network parameters than older ones. Sampling fresh experiences to train the neural networks can increase the learning speed of the agent, but few algorithms do this efficiently. To solve this issue, a novel experience replay method is proposed in this paper. We first define experience freshness as negatively correlated with the number of replays. A new hyper-parameter, the freshness discount factor μ, is introduced in PER to measure experience freshness. Furthermore, a novel experience replacement strategy for the replay buffer is proposed to increase the experience replacement efficiency. In our method, the sampling probability of fresh experiences is increased by raising their priority appropriately, so the algorithm is more likely to choose fresh experiences to train the neural networks during the learning process. We evaluated this method in both discrete and continuous control tasks via OpenAI Gym. The experimental results show that our method achieves better performance in both modes of operation.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
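
The abstract above ties an experience's freshness to how many times it has already been replayed, discounted by the factor μ. A hedged reading of that rule, where the exact combination with the TD-error priority is our illustrative assumption rather than the paper's formula:

```python
def fresh_priority(td_error, replay_count, mu=0.99, eps=1e-6):
    """Replay priority that decays as an experience is replayed more often.

    mu is the freshness discount factor described in the abstract; how it is
    combined with the TD-error term here is an illustrative assumption.
    """
    return (abs(td_error) + eps) * (mu ** replay_count)

# A transition replayed 10 times is weighted less than a fresh one with the same TD-error:
# fresh_priority(0.5, 0)  -> ~0.500
# fresh_priority(0.5, 10) -> ~0.452
```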

16 pages, 1481 KiB  
Article
Adherence Improves Cooperation in Sequential Social Dilemmas
by Yuyu Yuan, Ting Guo, Pengqian Zhao and Hongpu Jiang
Appl. Sci. 2022, 12(16), 8004; https://doi.org/10.3390/app12168004 - 10 Aug 2022
Cited by 3 | Viewed by 1325
Abstract
Social dilemmas have guided research on mutual cooperation for decades, especially the two-person social dilemma. Most famously, Tit-for-Tat performs very well in tournaments of the Prisoner’s Dilemma. Nevertheless, such strategies treat the options to cooperate or defect only as atomic actions, which cannot capture the complexity of the real world. In recent research, these options to cooperate or defect have been temporally extended. Here, we propose a novel adherence-based multi-agent reinforcement learning algorithm that achieves cooperation and coordination by rewarding agents who adhere to other agents. The evaluation of adherence is based on counterfactual reasoning. During training, each agent observes the changes in the actions of other agents when its own current action is replaced, thereby calculating the degree of adherence of other agents to its behavior. Using adherence as an intrinsic reward enables agents to consider the collective, thus promoting cooperation. In addition, the adherence rewards of all agents are calculated in a decentralized way. We experiment in sequential social dilemma environments, and the results demonstrate the potential of the algorithm to enhance cooperation and coordination and significantly increase the scores of the deep RL agents.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
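
The adherence reward above is computed by counterfactual reasoning: an agent is rewarded when other agents' choices would change had it acted differently. A rough sketch of that idea under an assumed policy interface (the aggregation into a scalar reward is likewise an assumption):

```python
def adherence_reward(policies, observations, actions, agent_id, counterfactual_action):
    """Counterfactual adherence: how strongly other agents' choices depend on agent_id's action.

    policies[j](obs, joint_actions) is assumed to return agent j's greedy response;
    the reward counts the fraction of other agents who would act differently
    under the counterfactual replacement of agent_id's action.
    """
    factual = dict(actions)
    counterfactual = dict(actions)
    counterfactual[agent_id] = counterfactual_action
    changed = 0
    for j, policy in policies.items():
        if j == agent_id:
            continue
        if policy(observations[j], factual) != policy(observations[j], counterfactual):
            changed += 1            # agent j adheres to agent_id's behaviour
    return changed / max(len(policies) - 1, 1)
```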

Review


23 pages, 5293 KiB  
Review
Reinforcement Learning in Game Industry—Review, Prospects and Challenges
by Konstantinos Souchleris, George K. Sidiropoulos and George A. Papakostas
Appl. Sci. 2023, 13(4), 2443; https://doi.org/10.3390/app13042443 - 14 Feb 2023
Cited by 3 | Viewed by 7524
Abstract
This article focuses on recent advances in the field of reinforcement learning (RL) as well as present state-of-the-art applications in games. First, we give a general panorama of RL and underline the way it has progressed to its current degree of application. Moreover, we conduct a keyword analysis of the literature on deep learning (DL) and reinforcement learning in order to analyze the extent to which the scientific literature is based on games such as Atari, Chess, and Go. Finally, we explore a range of public data to create a unified framework and identify trends for the present and future of this sector (RL in games). Our work led us to conclude that deep RL accounts for roughly 25.1% of the DL literature, and a sizable amount of this literature focuses on RL applications in the game domain, pointing the way toward newer and more sophisticated algorithms capable of surpassing human performance.
(This article belongs to the Special Issue Deep Reinforcement Learning for Robots and Agents)
