# Comprehensive Automated Driving Maneuvers under a Non-Signalized Intersection Adopting Deep Reinforcement Learning

## Abstract

## 1. Introduction

- Considering the efficiency of comprehensive automated driving (AD) maneuvers (e.g., going straight ahead and turning left) at a non-signalized intersection over different market penetration rates (MPRs) and real traffic volumes by adopting the deep reinforcement learning (DRL) method;
- Evaluating the performance of a set of clipped proximal policy optimization (PPO) hyperparameters in the context of comprehensive AD maneuvers at a non-signalized intersection;
- Demonstrating a meaningful improvement in traffic quality at a non-signalized intersection with a higher MPR and a lower traffic volume.

## 2. Methods

#### 2.1. Car-Following Model

${}_{a}$ refers to the bumper-to-bumper distance to the leading vehicle, $v_{a}$ is the host vehicle’s actual speed, and $v_{a-1}$ refers to the leading vehicle’s speed.

The IDM acceleration ($a_{IDM}$) was determined as follows:

where $v_{0}$ is the desired velocity, $\delta$ is the acceleration exponent, $s^{*}$ is the desired headway, and $s$ is the actual headway. In addition, the formula for the desired headway is expressed as follows:

where $s_{0}$ refers to the minimum headway, $T$ is the time headway, $\Delta v$ is the velocity difference, $a$ is the acceleration term, and $b$ is the comfortable deceleration.
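The definitions above match the standard Intelligent Driver Model, in which the acceleration is $a\,[1 - (v/v_0)^{\delta} - (s^{*}/s)^{2}]$ with desired headway $s^{*} = s_0 + \max(0,\, vT + v\Delta v / (2\sqrt{ab}))$. The following is a minimal Python sketch of that standard form, not necessarily the exact implementation used in the experiments. The defaults are taken from the IDM parameter table of this paper; the sign convention $\Delta v = v - v_{lead}$ and all function and argument names are assumptions made for illustration.

```python
import math


def idm_acceleration(v, s, delta_v, v0=15.0, T=1.0, s0=2.0,
                     delta=4.0, a=1.0, b=1.5):
    """Standard IDM acceleration (illustrative sketch).

    v       : speed of the host vehicle (m/s)
    s       : actual bumper-to-bumper headway (m), must be > 0
    delta_v : v - v_lead, the approach rate (m/s); assumed sign convention
    """
    # Desired headway s*: minimum gap plus a speed-dependent safety term.
    s_star = s0 + max(0.0, v * T + v * delta_v / (2.0 * math.sqrt(a * b)))
    # Free-road term (v / v0)^delta and interaction term (s* / s)^2.
    return a * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)


if __name__ == "__main__":
    # A vehicle at 10 m/s following at exactly its desired headway.
    print(idm_acceleration(v=10.0, s=12.0, delta_v=0.0))
```

With these defaults, a stationary vehicle on an empty road accelerates at close to the maximum rate $a$, and the acceleration falls to zero as the speed approaches the desired velocity.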

#### 2.2. Reinforcement Learning Algorithm

#### 2.3. Proposed Method Architecture

where $\pi_{\theta}$ refers to the stochastic policy, $\log \pi_{\theta}$ refers to the log-probabilities of the stochastic policy, and $A^{\pi,\gamma}$ refers to the advantage function.
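These terms are the ingredients of a policy-gradient update, and the clipped PPO variant evaluated in this paper limits how far the new policy can move from the old one in a single step. The following is a minimal sketch of the clipped surrogate objective from Schulman et al. (2017), written in plain Python for readability. The default clip value of 0.2 is taken from the hyperparameter table of this paper; the function name and the list-based interface are assumptions for illustration only.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """Mean clipped PPO surrogate over a batch (illustrative sketch).

    logp_new   : log-probabilities of the taken actions under the new policy
    logp_old   : log-probabilities of the same actions under the old policy
    advantages : advantage estimates for those actions
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - clip, 1 + clip].
        clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    print(clipped_surrogate([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum means that a large ratio earns no extra credit when the advantage is positive, while a large ratio with a negative advantage is still penalized in full.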

#### 2.3.1. Initial Setting

#### 2.3.2. State

${}_{0}$ defines the coordinates of AVs, $v_{0}$ defines the speeds of AVs, $d_{l}$ defines the bumper-to-bumper headways of leading vehicles, $v_{l}$ defines the speeds of leading vehicles, $d_{f}$ defines the bumper-to-bumper headways of following vehicles, and $v_{f}$ defines the speeds of following vehicles.
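To make the state layout concrete, the following sketch flattens the six quantities listed above into one vector per AV. It assumes one leading and one following vehicle per AV and a scalar coordinate; the class name, field names, and ordering are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AVObservation:
    """Quantities observed around one AV (hypothetical field names)."""
    position: float        # coordinate of the AV
    speed: float           # speed of the AV
    lead_headway: float    # bumper-to-bumper headway to the leading vehicle
    lead_speed: float      # speed of the leading vehicle
    follow_headway: float  # bumper-to-bumper headway to the following vehicle
    follow_speed: float    # speed of the following vehicle


def build_state(avs: List[AVObservation]) -> List[float]:
    """Concatenate the six quantities of every AV into a flat state vector."""
    state: List[float] = []
    for av in avs:
        state.extend([av.position, av.speed,
                      av.lead_headway, av.lead_speed,
                      av.follow_headway, av.follow_speed])
    return state


if __name__ == "__main__":
    print(build_state([AVObservation(120.0, 8.5, 14.2, 9.0, 11.7, 7.8)]))
```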

#### 2.3.3. Action

#### 2.3.4. Observation

#### 2.3.5. Reward

$v_{des}$ defines the arbitrary desired speed and $v \in R_{k}$ defines the speeds of all vehicles at a non-signalized intersection.
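The reward formula itself is not reproduced in this text. As an illustration only, one common way to build a reward from $v_{des}$ and the speed vector $v$ is a normalized desired-speed reward: it equals 1 when every vehicle travels at $v_{des}$ and drops to 0 as the speeds deviate. The sketch below assumes that form; the function name and the normalization are assumptions, not the confirmed reward of this paper.

```python
import math
from typing import Sequence


def desired_speed_reward(speeds: Sequence[float], v_des: float) -> float:
    """Normalized desired-speed reward in [0, 1] (assumed form)."""
    k = len(speeds)
    # Largest meaningful deviation: the norm of the vector (v_des, ..., v_des).
    max_cost = math.sqrt(k) * abs(v_des)
    if max_cost == 0.0:
        return 0.0  # no vehicles or zero target speed: nothing to reward
    # Euclidean distance between the actual speeds and the desired speed.
    cost = math.sqrt(sum((v_des - v) ** 2 for v in speeds))
    return max(max_cost - cost, 0.0) / max_cost


if __name__ == "__main__":
    print(desired_speed_reward([6.0, 7.5, 12.0], v_des=12.0))
```

Because the reward is normalized by the number of vehicles and the desired speed, values remain comparable across traffic volumes.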

#### 2.3.6. Termination

## 3. Hyperparameter Setting and Evaluation Indicators

#### 3.1. Hyperparameter Setting

#### 3.2. Evaluation Indicators

- Average speed: the mean value of the speed for all vehicles;
- Emissions: the mean value of emissions for all vehicles, including nitrogen oxide (NOx) and hydrocarbons (HC).

## 4. Experiments and Results

#### 4.1. Simulation Experiments

#### 4.2. Experimental Results

#### 4.2.1. Training Policy Evaluation

#### 4.2.2. The Efficiency of Leading AVs According to Measures of Effectiveness

#### 4.2.3. Comparison of the Entirely Human-Driven Vehicle Experiment

#### 4.2.4. Comparison of the Experiment with Go Straight Movements

## 5. Conclusions

- Consider the effects of comprehensive turning maneuvers (e.g., left turn, right turn, going straight, and lane change) in an urban network, with scenarios made as realistic as possible;
- Compare our method with other machine learning algorithms to achieve better decision-making performance for AVs in a mixed-traffic environment.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 4.** The leading automated vehicle experiments in the context of a non-signalized intersection: (**a**) mixed-traffic conditions with market penetration rates ranging from 20% to 80%; (**b**) fully autonomous conditions with a 100% market penetration rate.

**Figure 5.** The entirely human-driven vehicle experiment in the context of a non-signalized intersection.

**Figure 6.** A comparison experiment with go straight movements in the context of a non-signalized intersection: (**a**) mixed-traffic conditions with market penetration rates ranging from 20% to 80%; (**b**) fully autonomous conditions with a 100% market penetration rate.

**Figure 9.** Emissions with different market penetration rates and real traffic volumes: (**a**) NOx; (**b**) HC.

**Figure 10.** A comparison of the entirely human-driven vehicle experiment with different traffic volumes: (**a**) Average speeds; (**b**) Average rewards.

Parameters | Value |
---|---|
Desired velocity | 15 m/s |
Time headway | 1.0 s |
Minimum headway | 2.0 m |
Acceleration exponent | 4.0 |
Acceleration | 1.0 m/s^{2} |
Comfortable deceleration | 1.5 m/s^{2} |

Parameters | Value |
---|---|
Number of training iterations | 200 |
Time horizon per training iteration | 6000 |
Hidden layers | 256 × 256 × 256 |
GAE Lambda | 1.0 |
Clip parameter | 0.2 |
Step size | 5 × 10^{4} |
Value function clip parameter | 10^{4} |
Number of SGD iterations | 10 |
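The GAE Lambda entry in the table above controls how advantages are estimated. With a value of 1.0, generalized advantage estimation reduces to the discounted return minus the value baseline. The sketch below shows the standard backward recursion; the discount factor `gamma` is not listed in the table, so its default here is an assumption, as are the function and argument names.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 1.0) -> List[float]:
    """Generalized advantage estimation (illustrative sketch).

    rewards : r_0 .. r_{n-1}
    values  : V(s_0) .. V(s_n); the last entry is the bootstrap value
    gamma   : discount factor (assumed; not given in the table)
    lam     : GAE lambda (1.0 in the table)
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    print(gae_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.0], gamma=1.0))
```

Lower values of lambda trade some of this variance for bias by relying more heavily on the value function.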

Parameters | Value |
---|---|
Lane width | 3.2 m |
Number of lanes in each direction | 2 |
Length in each direction | 420 m |
Maximum acceleration | 3 m/s^{2} |
Minimum acceleration | −3 m/s^{2} |
Maximum speed | 12 m/s |
Horizon | 600 |
Traffic volume | 1000 veh/h |

**Table 4.** A comparison of the proposed experiment and the experiment with going straight movements with a traffic volume of 1000 veh/h.

Market Penetration Rate (%) | Average Speed, Proposed Experiment (m/s) | Average Speed, Experiment with Going Straight Movements (m/s) |
---|---|---|
20 | 6.09 | 6.56 |
40 | 6.27 | 6.71 |
60 | 6.31 | 7.01 |
80 | 6.70 | 7.24 |
100 | 6.93 | 7.51 |
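The gap between the two experiments in Table 4 is easier to compare across market penetration rates as a relative difference. The short sketch below computes it from the table values; the variable and function names are illustrative, and expressing the gap relative to the going straight experiment is a choice made here, not a metric defined in the paper.

```python
# Average speeds (m/s) from Table 4: MPR (%) -> (proposed, going straight only).
TABLE_4 = {
    20: (6.09, 6.56),
    40: (6.27, 6.71),
    60: (6.31, 7.01),
    80: (6.70, 7.24),
    100: (6.93, 7.51),
}


def relative_gap(proposed: float, straight: float) -> float:
    """Speed reduction of the proposed experiment relative to going straight."""
    return (straight - proposed) / straight


gaps = {mpr: relative_gap(p, s) for mpr, (p, s) in TABLE_4.items()}

if __name__ == "__main__":
    for mpr, gap in gaps.items():
        print(f"MPR {mpr:3d}%: proposed experiment is {gap:.1%} slower")
```

In both experiments the average speed rises with the market penetration rate, and the experiment that includes left turns is slower at every rate.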

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tran, Q.-D.; Bae, S.-H.
Comprehensive Automated Driving Maneuvers under a Non-Signalized Intersection Adopting Deep Reinforcement Learning. *Appl. Sci.* **2022**, *12*, 9653.
https://doi.org/10.3390/app12199653
