Platooning Cooperative Adaptive Cruise Control for Dynamic Performance and Energy Saving: A Comparative Study of Linear Quadratic and Reinforcement Learning-Based Controllers
Abstract
1. Introduction
2. Model Description
2.1. Vehicle Model
2.2. Spacing Policy
3. LQ Controller
3.1. Open Loop System: Model Linearization
3.2. Control: Closed Loop System
3.2.1. Tuning of Q and R Matrices and Time Headway
 Comfortable drive: to ensure comfort for each vehicle, threshold values for longitudinal acceleration and jerk are set to ${a}_{x}<2\frac{m}{{s}^{2}}$ and $\frac{d{a}_{x}}{dt}<0.9\frac{m}{{s}^{3}}$, as specified by the authors of [26].
 Safety: the control system must be able to safely stop each vehicle in the platoon, i.e., avoid rear-end collisions, when an emergency braking condition occurs at the maximum speed permitted by the regulations, e.g., considering highway speed limits (80 km/h for HDVs).
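Under these comfort and safety requirements, the role of the $Q$ and $R$ weights can be sketched with a standard continuous-time LQR synthesis. The matrices below form a hypothetical two-state spacing/speed-error model for a single follower, not the paper's full platoon model; the weight magnitudes merely mirror the orders of magnitude explored in the tuning tables.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR: solve the algebraic Riccati equation
    and return the state-feedback gain K (control law u = -K x)."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Illustrative 2-state (spacing error, speed error) model for one follower,
# with acceleration as the control input. Hypothetical, for intuition only.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1e2, 1e5])   # state weights on spacing and speed errors
R = np.array([[1e5]])     # control-effort weight R0

K = lqr_gain(A, B, Q, R)
# Increasing R0 penalises acceleration commands more strongly, which
# lowers peak jerk, consistent with the trend in the table below.
```

A larger $R_0$ trades tracking tightness for smoother (more comfortable) control action, which is exactly the trade-off the tuning study quantifies.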
3.2.2. WLTP Class 3 Driving Cycle
3.2.3. Emergency Braking Manoeuvre
Driving cycle: jerk [m/s³] (limit 0.9 m/s³) and $a_x$ [m/s²] (limit 2 m/s²) for each follower, control weight $R_0$, and time headway $t_h$.

| Vehicle | $R_0$ | Jerk, $t_h=1.5$ s | Jerk, $t_h=3$ s | Jerk, $t_h=4.5$ s | $a_x$, $t_h=1.5$ s | $a_x$, $t_h=3$ s | $a_x$, $t_h=4.5$ s |
|---|---|---|---|---|---|---|---|
| Follower 1 | $10^7$ | 0.156 | 0.113 | 0.088 | 0.49 | 0.423 | 0.373 |
| Follower 1 | $10^5$ | 0.154 | 0.111 | 0.086 | 0.495 | 0.428 | 0.374 |
| Follower 1 | $10^3$ | 0.148 | 0.106 | 0.082 | 0.505 | 0.431 | 0.374 |
| Follower 2 | $10^7$ | 0.122 | 0.073 | 0.049 | 0.452 | 0.36 | 0.294 |
| Follower 2 | $10^5$ | 0.117 | 0.070 | 0.047 | 0.453 | 0.359 | 0.293 |
| Follower 2 | $10^3$ | 0.108 | 0.063 | 0.042 | 0.456 | 0.354 | 0.287 |
4. RL Control System
4.1. Reinforcement Learning for Control
4.2. The DDPG Algorithm
 The actor, which receives observations and returns actions (thus playing the role of the policy).
 The critic, which receives the observations and the actions taken by the actor, evaluates them, and returns a prediction of the discounted long-term reward.
Algorithm 1: DDPG Algorithm 

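The actor-critic interplay at the heart of Algorithm 1 can be sketched through the critic's bootstrap target, $y = r + \gamma\, Q'(s', \mu'(s'))$, computed with the target networks. The toy linear "networks" below are stand-ins for the actual neural networks; all names, weights, and dimensions are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic_target(reward, next_obs, gamma, target_actor, target_critic, done):
    """DDPG bootstrap target for the critic update:
    y = r + gamma * Q'(s', mu'(s')) for non-terminal transitions."""
    next_action = target_actor(next_obs)
    return reward + gamma * (1.0 - done) * target_critic(next_obs, next_action)

# Toy linear "target networks" standing in for the neural networks.
W_actor = rng.standard_normal((1, 5))    # 5 observations -> 1 action
W_critic = rng.standard_normal((1, 6))   # (obs, action) -> scalar Q-value

def target_actor(s):
    return np.tanh(W_actor @ s)          # bounded action, as with a tanh output layer

def target_critic(s, a):
    return (W_critic @ np.concatenate([s, a])).item()

s_next = rng.standard_normal(5)
y = critic_target(reward=-0.3, next_obs=s_next, gamma=0.99,
                  target_actor=target_actor, target_critic=target_critic, done=0.0)
```

The critic is then regressed toward $y$, while the actor is updated along the gradient of the critic's Q-value with respect to the action.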
4.3. Agent Structure
 ${t}_{h,1}$ and ${t}_{h,2}$ are the time headway of follower 1 with respect to the leader and follower 2 with respect to follower 1, respectively;
 ${v}_{L}$ and ${a}_{L}$ are the velocity and acceleration of the leader;
 ${v}_{F1}$ and ${v}_{F2}$ are the velocities of follower 1 and follower 2.
 
 The reward is set to the maximum negative value (−10);
 The corresponding training episode is terminated.
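These two safety rules can be sketched as a reward-and-termination function. The function name, the minimum-headway threshold, and the quadratic error shaping below are hypothetical illustrations; only the weights $w_1$, $w_2$, the desired headway of 1.4 s, and the −10 saturated penalty come from the paper.

```python
def step_reward(th1, th2, th_des=1.4, w1=1.0, w2=1.0, th_min=0.2):
    """Hypothetical reward shaping: penalise the deviation of both
    followers' time headways from the desired value; on a safety
    violation (headway below a minimum), return the maximum negative
    reward (-10) and signal episode termination."""
    if th1 < th_min or th2 < th_min:
        return -10.0, True   # saturated penalty, stop the training episode
    reward = -(w1 * (th1 - th_des) ** 2 + w2 * (th2 - th_des) ** 2)
    return reward, False

r, done = step_reward(1.4, 1.4)          # on-target: no penalty, episode continues
r_bad, done_bad = step_reward(0.1, 1.4)  # unsafe headway: -10 and terminate
```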
4.4. Evaluation of the RL Agent
 percentage energy savings with respect to the leader vehicle,
 RMS of time headway,
 RMS of time headway error with respect to the desired value,
 RMS of jerk.
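A minimal sketch of how such indices can be computed from sampled traces follows; the signals, sampling time, and finite-difference jerk estimate are illustrative assumptions, not the paper's post-processing code.

```python
import numpy as np

def rms(x):
    """Root-mean-square of a sampled signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def jerk_rms(accel, dt):
    """RMS jerk estimated from a sampled acceleration trace
    via first-order finite differences."""
    return rms(np.diff(accel) / dt)

# Hypothetical traces: a follower's time headway and acceleration,
# sampled at dt = 0.1 s.
t_h = np.array([1.5, 1.45, 1.42, 1.40, 1.41])     # s
a_x = np.array([0.0, 0.2, 0.3, 0.25, 0.2])        # m/s^2

rms_headway = rms(t_h)                 # RMS of time headway
rms_headway_error = rms(t_h - 1.4)     # RMS error w.r.t. the desired 1.4 s
rms_jerk = jerk_rms(a_x, dt=0.1)       # compare against the 0.9 m/s^3 limit
```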
5. Results
 
 Standard driving cycle: the leader vehicle tracks the FTP75 driving cycle (saturated to a minimum speed of 2 m/s to avoid backward movement of the platoon), and the followers try to keep the reference inter-vehicle distance defined by a linear spacing policy and a reference platoon velocity.
 
 Cut-in scenario: a new vehicle (not belonging to the platoon) cuts into the lane occupied by the platoon, altering the equilibrium of the controlled system. This scenario verifies the controller's ability to react when an unexpected obstacle breaks the platoon's equilibrium.
5.1. String Stability
5.2. LQC—RL Comparison
5.2.1. Driving Cycle Results
5.2.2. Cut-In Scenario Results
6. Conclusions
 The proposed model of the truck platoon includes the dependency of the aerodynamic drag on the inter-vehicle distance, which is vital for quantifying the fuel savings of each vehicle in the platoon.
 The virtual environment that has been developed enables one to tune and train classical and AI controllers and assess platoon performance under different driving cycles.
 The LQC controller is string stable across the entire frequency range, while the RL-based controller may have a limited bandwidth of string stability.
 Regardless of the type of controller, a linear spacing policy proved to be a suitable choice to meet all the requirements (dynamic performance and energy savings).
 The trained RL agent provides satisfactory results even for driving cycles different from those used during the learning phase.
 The simulation results of an RL-based controller are affected by nonlinearities; this control-related nonlinear behaviour comes from the combined operation of the safety-related penalty in the reward function and the activation functions of the neural network.
 The comparison through the selected performance indices (i.e., acceleration and jerk, final SOC, and energy consumption) during a standard driving cycle showed that, by properly selecting the reward function, RL and LQC achieve similar dynamic and energetic targets.
 The comparison during a cutin simulation scenario showed that both controllers properly reduced the vehicle speed when the disturbance occurred, avoiding accidents.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
 ${\mathrm{B}\mathrm{B}}^{+}\mathrm{B}=\mathrm{B}$
 ${\mathrm{B}}^{+}\mathrm{B}{\mathrm{B}}^{+}={\mathrm{B}}^{+}$
 ${\left({\mathrm{B}\mathrm{B}}^{+}\right)}^{\mathrm{T}}=\mathrm{B}{\mathrm{B}}^{+}$
 ${\left({\mathrm{B}}^{+}\mathrm{B}\right)}^{\mathrm{T}}={\mathrm{B}}^{+}\mathrm{B}$
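The four defining conditions of the Moore-Penrose pseudoinverse listed above can be verified numerically with `numpy.linalg.pinv`; the test matrix below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 2))   # a generic tall matrix, as in Appendix B
B_pinv = np.linalg.pinv(B)        # Moore-Penrose pseudoinverse via SVD

# The four defining Moore-Penrose conditions:
assert np.allclose(B @ B_pinv @ B, B)              # B B+ B  = B
assert np.allclose(B_pinv @ B @ B_pinv, B_pinv)    # B+ B B+ = B+
assert np.allclose((B @ B_pinv).T, B @ B_pinv)     # (B B+)^T = B B+
assert np.allclose((B_pinv @ B).T, B_pinv @ B)     # (B+ B)^T = B+ B
```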
References
 Alam, A.; Gattami, A.; Johansson, K.H.; Tomlin, C.J. Guaranteeing safety for heavy duty vehicle platooning: Safe set computations and experimental evaluations. Control Eng. Pract. 2014, 24, 33–41. [Google Scholar] [CrossRef]
 Vahidi, A.; Sciarretta, A. Energy saving potentials of connected and automated vehicles. Transp. Res. Part C Emerg. Technol. 2018, 95, 822–843. [Google Scholar] [CrossRef]
 Tsugawa, S.; Jeschke, S.; Shladover, S.E. A review of truck platooning projects for energy savings. IEEE Trans. Intell. Veh. 2016, 1, 68–77. [Google Scholar] [CrossRef]
 Bergenhem, C.; Pettersson, H.; Coelingh, E.; Englund, C.; Shladover, S.; Tsugawa, S. Overview of platooning systems. In Proceedings of the 19th ITS World Congress, Vienna, Austria, 22–26 October 2012. [Google Scholar]
 Ellis, M.; Gargoloff, J.I.; Sengupta, R. Aerodynamic Drag and Engine Cooling Effects on Class 8 Trucks in Platooning Configurations. SAE Int. J. Commer. Veh. 2015, 8, 732–739. [Google Scholar] [CrossRef]
 Kaluva, S.T.; Pathak, A.; Ongel, A. Aerodynamic drag analysis of autonomous electric vehicle platoons. Energies 2020, 13, 4028. [Google Scholar] [CrossRef]
 McAuliffe, B.; Lammert, M.; Lu, X.Y.; Shladover, S.; Surcel, M.D.; Kailas, A. Influences on Energy Savings of Heavy Trucks Using Cooperative Adaptive Cruise Control. In SAE Technical Papers; SAE International: Warrendale, PA, USA, 2018. [Google Scholar] [CrossRef]
 Zhang, R.; Li, K.; Wu, Y.; Zhao, D.; Lv, Z.; Li, F.; Chen, X.; Qiu, Z.; Yu, F. A Multi-Vehicle Longitudinal Trajectory Collision Avoidance Strategy Using AEBS with Vehicle-Infrastructure Communication. IEEE Trans. Veh. Technol. 2022, 71, 1253–1266. [Google Scholar] [CrossRef]
 Wu, C.; Xu, Z.; Liu, Y.; Fu, C.; Li, K.; Hu, M. Spacing policies for adaptive cruise control: A survey. IEEE Access 2020, 8, 50149–50162. [Google Scholar] [CrossRef]
 Gunter, G.; Gloudemans, D.; Stern, R.E.; McQuade, S.; Bhadani, R.; Bunting, M.; Monache, M.L.D.; Lysecky, R.; Seibold, B.; Sprinkle, J.; et al. Are Commercially Implemented Adaptive Cruise Control Systems String Stable? IEEE Trans. Intell. Transp. Syst. 2021, 22, 6992–7003. [Google Scholar] [CrossRef]
 Naus, G.J.L.; Vugts, R.P.A.; Ploeg, J.; Van De Molengraft, M.J.G.; Steinbuch, M. String-stable CACC design and experimental validation: A frequency-domain approach. IEEE Trans. Veh. Technol. 2010, 59, 4268–4279. [Google Scholar] [CrossRef]
 Besselink, B.; Johansson, K.H. String Stability and a Delay-Based Spacing Policy for Vehicle Platoons Subject to Disturbances. IEEE Trans. Autom. Control 2017, 62, 4376–4391. [Google Scholar] [CrossRef]
 Sugimachi, T.; Fukao, T.; Suzuki, Y.; Kawashima, H. Development of autonomous platooning system for heavy-duty trucks. In IFAC Proceedings Volumes (IFAC-PapersOnLine); IFAC Secretariat: Laxenburg, Austria, 2013; pp. 52–57. [Google Scholar] [CrossRef]
 Turri, V.; Besselink, B.; Johansson, K.H. Cooperative Look-Ahead Control for Fuel-Efficient and Safe Heavy-Duty Vehicle Platooning. IEEE Trans. Control Syst. Technol. 2017, 25, 12–28. [Google Scholar] [CrossRef]
 Ward, J.W.; Stegner, E.M.; Hoffman, M.A.; Bevly, D.M. A Method of Optimal Control for Class 8 Vehicle Platoons Over Hilly Terrain. J. Dyn. Syst. Meas. Control 2022, 144, 011108. [Google Scholar] [CrossRef]
 Gao, W.; Gao, J.; Ozbay, K.; Jiang, Z.P. Reinforcement-Learning-Based Cooperative Adaptive Cruise Control of Buses in the Lincoln Tunnel Corridor with Time-Varying Topology. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3796–3805. [Google Scholar] [CrossRef]
 Yang, J.; Liu, X.; Liu, S.; Chu, D.; Lu, L.; Wu, C. Longitudinal tracking control of vehicle platooning using DDPG-based PID. In Proceedings of the 2020 4th CAA International Conference on Vehicular Control and Intelligence, CVCI 2020, Hangzhou, China, 18–20 December 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 656–661. [Google Scholar] [CrossRef]
 Peake, A.; McCalmon, J.; Raiford, B.; Liu, T.; Alqahtani, S. Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control. In Proceedings of the International Conference on Tools with Artificial Intelligence, ICTAI, Baltimore, MD, USA, 9–11 November 2020; IEEE Computer Society: Washington, DC, USA, 2020; pp. 15–22. [Google Scholar] [CrossRef]
 Chu, T.; Kalabić, U. Model-based deep reinforcement learning for CACC in mixed-autonomy vehicle platoon. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 4079–4084. [Google Scholar] [CrossRef]
 Lin, Y.; McPhee, J.; Azad, N.L. Comparison of Deep Reinforcement Learning and Model Predictive Control for Adaptive Cruise Control. IEEE Trans. Intell. Veh. 2021, 6, 221–231. [Google Scholar] [CrossRef]
 Hussein, A.A.; Rakha, H.A. Vehicle Platooning Impact on Drag Coefficients and Energy/Fuel Saving Implications. IEEE Trans. Veh. Technol. 2022, 71, 1199–1208. [Google Scholar] [CrossRef]
 Vogel, K. A comparison of headway and time to collision as safety indicators. Accid. Anal. Prev. 2003, 35, 427–433. [Google Scholar] [CrossRef]
 Ostertag, E. Mono and Multivariable Control and Estimation: Linear, Quadratic and LMI Methods; Springer Science & Business Media: Berlin, Germany, 2011. [Google Scholar] [CrossRef]
 Dimauro, L.; Tota, A.; Galvagno, E.; Velardocchia, M. Torque Allocation of Hybrid Electric Trucks for Drivability and Transient Emissions Reduction. Appl. Sci. 2023, 13, 3704. [Google Scholar] [CrossRef]
 Barata, J.C.A.; Hussein, M.S. The Moore-Penrose Pseudoinverse: A Tutorial Review of the Theory. Braz. J. Phys. 2012, 42, 146–165. [Google Scholar] [CrossRef]
 Bae, I.; Moon, J.; Jhung, J.; Suk, H.; Kim, T.; Park, H.; Cha, J.; Kim, J.; Kim, D.; Kim, S. Self-Driving like a Human Driver instead of a Robocar: Personalized comfortable driving experience for autonomous vehicles. arXiv 2020, arXiv:2001.03908. [Google Scholar]
 Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
 Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
 Acquarone, M.; Borneo, A.; Misul, D.A. Acceleration control strategy for Battery Electric Vehicle based on Deep Reinforcement Learning in V2V driving. In Proceedings of the 2022 IEEE Transportation Electrification Conference & Expo (ITEC), Anaheim, CA, USA, 15–17 June 2022. [Google Scholar]
 Wei, S.; Zou, Y.; Zhang, T.; Zhang, X.; Wang, W. Design and experimental validation of a cooperative adaptive cruise control system based on supervised reinforcement learning. Appl. Sci. 2018, 8, 1014. [Google Scholar] [CrossRef]
 MATLAB & Simulink—MathWorks Italia. Train DDPG Agent for Adaptive Cruise Control; MATLAB & Simulink—MathWorks Italia: Torino, Italy, 2023. [Google Scholar]
Values of the control-effort weight $R_0$ and time headway $t_h$ used in the tuning study:

| $R_0$ | $t_h$ [s] |
|---|---|
| $10^7$ | 1.5, 3, 4.5 |
| $10^5$ | 1.5, 3, 4.5 |
| $10^3$ | 1.5, 3, 4.5 |
| Quantity | Symbol | Value |
|---|---|---|
| Equivalent vehicle mass | $m_{tot,i}$ | 13,175 kg |
| Wheel radius | $r_{w,i}$ | 0.5715 m |
| Electric motor maximum power | $P_{m,max}$ | 300 kW |
| Electric motor maximum torque | $T_{m,max}$ | 600 Nm |
| Total transmission efficiency | $\eta$ | 95% |
| Total transmission ratio | $\tau$ | 19.74 |
| Battery maximum capacity | $Q_{batt}$ | 693 Ah |
| Battery nominal voltage | $V_{batt,n}$ | 500 V |
| Initial SOC | $SOC_0$ | 80% |
| Isolated vehicle drag coefficient | $c_{x,0}$ | 0.57 |
| Air density | $\rho$ | 1.2 kg/m³ |
| Vehicle frontal area | $A_{f,i}$ | 8.9 m² |
| Road-tyre friction coefficient | $\mu$ | 0.9 |
| Rolling resistance coefficient | $f_0$ | 0.0041 |
| Rolling resistance coefficient | $f_2$ | 0 |
| Desired time headway | $t_{h,des}$ | 1.4 s |
| Nominal speed | $v_n$ | 80 km/h |
| Nominal inter-vehicle distance | $d_n$ | 36 m |
| Nominal road slope | $\alpha_n$ | 5° |
| LQC: Q matrix | $Q$ | $\mathrm{diag}(10^2, 10^5, 10^2, 10^5, 20, 20)$ |
| LQC: R matrix | $R$ | $10^5 \times \mathrm{diag}(1, 1)$ |
| RL: reward function weight 1 | $w_1$ | 1 |
| RL: reward function weight 2 | $w_2$ | 1 |
| RL: actor learning rate | $lr_a$ | $5\times 10^{-5}$ |
| RL: critic learning rate | $lr_c$ | $10^{-4}$ |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Borneo, A.; Zerbato, L.; Miretti, F.; Tota, A.; Galvagno, E.; Misul, D.A. Platooning Cooperative Adaptive Cruise Control for Dynamic Performance and Energy Saving: A Comparative Study of Linear Quadratic and Reinforcement Learning-Based Controllers. Appl. Sci. 2023, 13, 10459. https://doi.org/10.3390/app131810459