Article

Design and Simulation-Based Optimization of an Intelligent Autonomous Cruise Control System

by Milad Andalibi 1, Alireza Shourangizhaghighi 2, Mojtaba Hajihosseini 1, Seyed Saeed Madani 3, Carlos Ziebert 3 and Jalil Boudjadar 4,*
1 Department of Control and Computer Engineering, University of Zagreb, 1000 Zagreb, Croatia
2 Department of Mechanical Engineering, Shiraz University of Technology, Shiraz 71557, Iran
3 Institute of Applied Materials-Applied Materials Physics (IAM-AWP), Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
4 Department of Electrical and Computer Engineering, Aarhus University, 8200 Aarhus, Denmark
* Author to whom correspondence should be addressed.
Computers 2023, 12(4), 84; https://doi.org/10.3390/computers12040084
Submission received: 28 February 2023 / Revised: 17 April 2023 / Accepted: 18 April 2023 / Published: 20 April 2023
(This article belongs to the Special Issue Recent Advances in Digital Twins and Cognitive Twins)

Abstract

Significant progress has recently been made in transportation automation to alleviate human faults in traffic flow. Recent breakthroughs in artificial intelligence have provided justification for replacing human drivers with digital control systems. This paper proposes the design of a self-adaptive real-time cruise control system to enable path-following control of autonomous ground vehicles, so that a self-driving car can drive along a road while following a lead vehicle. To achieve the cooperative objectives, we use a multi-agent deep reinforcement learning (MADRL) technique, with one agent controlling the acceleration and another handling the steering. Since the steering of an autonomous automobile can be adjusted by a stepper motor, a well-known DQN agent is used to provide the discrete angle values for the closed-loop lateral control. We performed a simulation-based analysis to evaluate the efficacy of the proposed MADRL path-following control for autonomous vehicles (AVs). Moreover, we carried out a thorough comparison with two state-of-the-art controllers to examine the accuracy and effectiveness of our proposed control system.

1. Introduction

With rapid technological advances, autonomous vehicles have received extensive attention over the past decade [1,2]. Computer-based driving promises considerable benefits, including the elimination of human error in critical circumstances, improved occupant comfort, reduced traffic problems, and reduced environmental impact, which are among the main impetuses for the automation of driving. Any autonomous driving system consists of several perception-level tasks that must be considered in its design. Autonomous driving tasks are normally divided into three categories, namely navigation, guidance, and stabilization [3]. Although autonomous vehicles have the potential to revolutionize the transportation industry, there are significant risks associated with their use that must be taken into account, such as cybersecurity concerns, technical failures, and ethical implications. One of the major challenges in self-driving cars is path-following control, in which the vehicle must keep the cruise velocity and a safe distance while simultaneously following another vehicle.
Different practical control methodologies have been studied to deliver the capability of keeping the cruise velocity and a safe distance, such as active steering [3], differential braking [4], integrated chassis control [5], and torque vectoring [6]. In particular, the driver-assist system extended in [7] enables better lane-keeping and tracking control. In [8], the authors presented a smooth route control for autonomous transport that satisfies both the initial and final conditions, where the constraints are implemented as a parameterized sixth-order polynomial model [9]. In that study, the tracking control module was used along with the model predictive control technique. In [10], a Kalman filter and a linear time-varying model predictive control scheme are applied to predict the future trajectory of an autonomous vehicle (AV), determine the optimal path, and optimize control [9]. A common challenge in using machine learning to design autonomous control applications is the dependency of the decision making on the training data: encountering operating data that differs completely from the training data may lead to inconsistent decisions.
Designing control systems that regulate both throttle and brake is the key part of adaptive cruise control, since it ensures that the vehicle tracks the speed of the preceding vehicle and consequently maintains a safe inter-vehicle distance under the driving constraints [11]. In [12], an adaptive cruise control structure based on a variety of funnel controllers is introduced. In [13], the authors presented a least-violating control applied to cruise control to regulate the system with the properties of safety, uniform reachability, and uniform attractivity. In [14], a varying-prediction-horizon nonlinear model predictive control (NMPC) scheme, using a continuation/generalized minimal residual (C/GMRES) optimizer with a dead-zone penalty function to guarantee smoothness and satisfy the inequality constraints, is designed for the path-following control application. Sliding mode control for speed regulation of AVs has been applied in [15]. Robust H-infinity control methods are investigated in [16]. A super-twisting sliding mode controller is presented in [17] for AV path-following control, with Lyapunov stability proven via the backstepping method. Although these analytic control methods demonstrated acceptable outcomes, they not only require a complex design but are also unable to account for unknown uncertainties arising from the highly dynamic structure of automobiles. Thus, relying on the complete mathematical model is impractical. To address this issue and reach high accuracy, reinforcement learning techniques can be adopted to search for optimal controllers for systems with unknown or highly nonlinear and stochastic dynamics.
Deep reinforcement learning (DRL), which is well known as an efficient learning framework, is able to train an agent to find the right control command by interacting with the system so as to optimize the reward function [18]. Recent significant advances in DRL have prompted the application of this technique in various fields of engineering. DRL algorithms are divided into three categories: (1) continuous; (2) discrete; and (3) either continuous or discrete. Based on the environment (or system) type, the appropriate algorithm needs to be selected. In the field of transportation, efforts have been devoted to using RL for AV path-following control. Unmanned vehicle tracking control can be divided into three environments: (1) land; (2) water and underwater; and (3) aerial. In [19], a deep deterministic policy gradient (DDPG) agent was adopted to find a suitable vessel steering policy in the presence of ocean currents. In [20], a neural network (NN)-based RL algorithm was adopted to predict the unknown disturbances, parameter uncertainties, and nonlinearities of autonomous underwater vehicles (AUVs) in trajectory tracking. To obtain adaptive control of AUVs, the study in [21] relies on an actor-critic NN-based RL agent. The authors of [22] presented a strategy for AUV route following by combining the benefits of DRL with interactive RL, which receives a reward from both the environment and the human operator at the same time.
In the field of aerial path-following, Hung and Givigi [23] implemented a Q-learning agent for UAVs to mitigate the curse-of-dimensionality problem. In [24], Rubi et al. investigated a DDPG agent for a quadrotor, probing its stability and path-following performance in the presence of wind turbulence and other disturbances.
Hartmann et al. proposed a model-based RL (MBRL) approach for high-speed autonomous driving path tracking [25]; they combined a Failure Prediction and Intervention Module (FIM) with MBRL to achieve high performance in a self-driving system. Desjardins and Chaib-Draa [26] proposed a scheme that obtains high-performance longitudinal control with an NN-based policy gradient algorithm. Wang et al. [27] applied reinforcement learning to learn automated lane-change behavior in an interactive driving environment.
In the aforementioned literature, most studies consider a single control feature of the autonomous driving system, for example, either steering control or speed control, or only the issue of constant speed tracking is addressed. In this paper, we propose a DRL-based solution to control a constellation of driving system features simultaneously, namely speed control, steering control, and safety. To control the speed and distance while following the lead vehicle, two DRL agents are considered: the first agent controls the steering wheel of the car, while the second manages the acceleration according to the speed and distance. Simulation-based experiments are conducted to test the accuracy, efficiency, and response time of the proposed DRL-based control solution. Although the proposed solution integrates agents supervising the key features of conventional vehicles, each agent is treated as a black box to keep the complexity manageable: for example, the speed to be applied is computed, but details such as how much fuel to inject to realize the acceleration are omitted, as these parameters depend on the actual state of the AV and its environment.
The rest of the paper is organized as follows. The dynamic model of the ground autonomous vehicle is presented in Section 2. The design of the MADRL controller is described in Section 3. The TD3 agent for acceleration control, the DQN algorithm, and the DQN agent for steering angle control are presented in Section 4, Section 5 and Section 6, respectively. Simulation results are discussed in Section 7. Finally, Section 8 concludes the paper.

2. Two Degree of Freedom Dynamic Model of Ground Autonomous Vehicles

It is important to give a brief overview of the mathematical model to pave the way for connecting the control agents later. Hence, a schematic representation of the proposed model is depicted in Figure 1. Note that ψ is the yaw angle, ψ̇ is the yaw rate, v_x and v_y denote the velocities in the vehicle coordinate frame, v_lf and v_cf are the longitudinal and lateral velocities of the front wheel, v_f is their resultant vector, δ is the front wheel angle, and F_l and F_c are the longitudinal and lateral wheel forces. Thus, based on Newton's second law of motion, the dynamic equations governing the system are introduced as in [28]:
$$\begin{aligned} m\dot{v}_x &= m v_y \dot{\psi} + 2F_{xf} + 2F_{xr}\\ m\dot{v}_y &= -m v_x \dot{\psi} + 2F_{yf} + 2F_{yr}\\ I\ddot{\psi} &= 2 l_f F_{yf} - 2 l_r F_{yr} \end{aligned}$$
where m and I are the mass and yaw inertia, F_x is the longitudinal force, and F_y is the lateral force at the center of gravity (CoG) of the vehicle. The yaw rate can be calculated as follows:
$$\dot{\psi} = \frac{v_x}{l_f + l_r}\tan\delta$$
where l_f and l_r are the distances from the front and rear axles to the CoG. In addition, the position states can be obtained as:
$$\begin{aligned} \dot{X} &= v_x\cos\psi - v_y\sin\psi\\ \dot{Y} &= v_x\sin\psi + v_y\cos\psi \end{aligned}$$
F_x and F_y, which are the forces acting on the CoG, can be calculated by:
$$\begin{aligned} F_x &= F_l\cos\delta - F_c\sin\delta\\ F_y &= F_l\sin\delta + F_c\cos\delta \end{aligned}$$
The longitudinal force F_l and the lateral force F_c are given by:
$$\begin{aligned} F_l &= f_l(\alpha,\mu,s,F_z)\\ F_c &= f_c(\alpha,\mu,s,F_z) \end{aligned}$$
As shown in Figure 1, α is the slip angle between the wheel velocity vector and the wheel direction, and μ is the road friction coefficient. The slip ratio s is the difference between the ground contact point velocity and the rotational velocity, and F_z is the vertical load acting on the wheel. Under the assumption of small slip angles, the lateral tire forces can be obtained as:
$$\begin{aligned} F_{cf} &= C_f\,\alpha_f\\ F_{cr} &= C_r\,\alpha_r \end{aligned}$$
where C_f and C_r are the tire stiffness parameters, and
$$\begin{aligned} \alpha_f &= \delta - \theta_f\\ \alpha_r &= -\theta_r \end{aligned}$$
where $\theta_f = \arctan\!\left(\frac{v_y + l_f\dot{\psi}}{v_x}\right)$ and $\theta_r = \arctan\!\left(\frac{v_y - l_r\dot{\psi}}{v_x}\right)$.
Assuming that the vehicle is traveling on flat ground, it is not affected by the gravitational force along the path, but it is affected by air drag, $F_a = \frac{1}{2} C_D A_a \rho_a (v + v_{wind})^2$, and rolling resistance, $F_r = D_r m g \cos\alpha$. C_D is the air drag coefficient, A_a is the maximum cross-sectional area of the vehicle, ρ_a is the air density, v_wind is the wind velocity, D_r is the rolling resistance coefficient, m is the vehicle mass, and g is the gravitational acceleration. For convenience, α can be neglected, and v_wind is very small in comparison with the vehicle velocity.
In conclusion, the equivalent dynamic state of the system can be expressed as follows:
$$\begin{aligned} \dot{X} &= v_x\cos\psi - v_y\sin\psi\\ \dot{Y} &= v_x\sin\psi + v_y\cos\psi\\ \dot{\psi} &= \frac{v_x}{l_f + l_r}\tan\delta\\ m\dot{v}_x &= F_x + m v_y\dot{\psi} - 2F_{cf}\sin\delta - F_a - F_r\\ m\dot{v}_y &= -m v_x\dot{\psi} + 2\left(F_{cf}\cos\delta + F_{cr}\right)\\ I\ddot{\psi} &= 2\left(l_f F_{cf}\cos\delta - l_r F_{cr}\right) \end{aligned}$$
It should be noted that δ and F_x are the control inputs of the vehicle.
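To make the model concrete, the sketch below integrates the state equations above with a simple forward-Euler step in Python. The parameter values, the time step, and the linear-tire simplification are placeholders inspired by Table 3, not the exact simulation settings of this work.

```python
import numpy as np

# Illustrative parameters (placeholders; see Table 3 for the paper's values)
m, I = 1500.0, 1300.0          # mass [kg], yaw inertia
lf, lr = 1.2, 1.4              # distances from CoG to front/rear axle [m] (assumed)
Cf, Cr = 13000.0, 12500.0      # tire cornering stiffness [N/rad]
CD, Aa, rho = 0.6, 10.25, 1.3  # air drag coefficient, frontal area, air density
Dr, g = 1.5e-3, 9.81           # rolling resistance coefficient, gravity

def vehicle_step(state, Fx, delta, dt=0.01):
    """One Euler step of the 2-DOF model; state = [X, Y, psi, vx, vy, psi_dot]."""
    X, Y, psi, vx, vy, psi_dot = state

    # Linear tire model for small slip angles (Section 2)
    theta_f = np.arctan2(vy + lf * psi_dot, vx)
    theta_r = np.arctan2(vy - lr * psi_dot, vx)
    Fcf = Cf * (delta - theta_f)   # front lateral force
    Fcr = Cr * (-theta_r)          # rear lateral force

    # Resistive forces (flat road, wind neglected)
    Fa = 0.5 * CD * Aa * rho * vx**2
    Fr = Dr * m * g

    # Dynamic state equations
    X_dot = vx * np.cos(psi) - vy * np.sin(psi)
    Y_dot = vx * np.sin(psi) + vy * np.cos(psi)
    vx_dot = (Fx + m * vy * psi_dot - 2 * Fcf * np.sin(delta) - Fa - Fr) / m
    vy_dot = (-m * vx * psi_dot + 2 * (Fcf * np.cos(delta) + Fcr)) / m
    psi_ddot = 2 * (lf * Fcf * np.cos(delta) - lr * Fcr) / I

    return np.array([X + X_dot * dt, Y + Y_dot * dt, psi + psi_dot * dt,
                     vx + vx_dot * dt, vy + vy_dot * dt, psi_dot + psi_ddot * dt])
```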

3. Design of MADRL Controller

To control an AV following a lead car, conventional methodologies such as fuzzy logic, PID, and model predictive control (MPC) are limited in stabilizing the velocity and distance during tracking, for the following reasons:
  • Since the simulation time steps for vehicles are on the order of microseconds, the computation time of model-based schemes is a decisive factor for real-time operation.
  • Due to the destabilizing properties and highly nonlinear characteristics of AVs, achieving cruise velocity and tracking control simultaneously is one of the main objectives in this field. Thus, further effort is required not only to mitigate the effect of nonlinearities on optimal control performance, but also to ensure the stability requirements.
As can be seen in Figure 2, the main control objective is to keep a safe distance and a set velocity while tracking the lead vehicle. In this case, the ego car tracking the lead car must follow two rules: (1) when the distance between the ego car and the lead car is greater than the safe distance, the ego car must speed up to track the set velocity; (2) otherwise, the acceleration must be reduced to maintain the safe distance.
Due to the deficiencies of the existing control methodologies, a MADRL-based scheme is proposed in this work to provide a promising solution to the aforementioned challenges. Moreover, in multi-agent RL (MARL) algorithms, agents learn their own distinctive duties, which provides a helpful perspective on control. First, the nonlinear model of the system is employed so that sensors can be considered to identify the system states and implement a suitable algorithm. Then, according to the system control inputs, a continuous agent is used for the acceleration, while a discrete agent is considered for the steering angle. As the acceleration force of the car is continuous, an actor-critic, model-free, policy-based agent called twin-delayed DDPG (TD3) is used and compared to a DQN agent operating on a discretized view of the acceleration. The yaw angle is set by a stepper motor, which must be handled with purely discrete values, so a DQN is utilized for this task.

3.1. Markov Decision Process

In the RL framework, a task can be described by a Markov decision process (MDP) specified by a quintuple $(S, A, r, p, \gamma)$, where $S \subseteq \mathbb{R}^n$ is the state space, $A \subseteq \mathbb{R}^m$ is the action space, $r: S \times A \rightarrow \mathbb{R}$ is the reward function, $p: S \times A \times S \rightarrow [0, 1]$ is the transition function, which represents the probability of transiting to a new state $s_{t+1}$ and emitting a reward $r_t$ when action $a_t$ is executed in state $s_t$, and $\gamma \in [0, 1]$ is the discount factor. Starting from an initial state $s_0$, the RL agent aims to maximize the expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$.
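As a small illustration of the return being maximized, the following sketch evaluates the discounted sum of rewards for one episode; the reward values are arbitrary placeholders.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return sum_t gamma^t * r_t for one finite episode."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Example with placeholder rewards
print(discounted_return([-1.2, -0.8, -0.4, -0.1]))
```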

3.2. The Twin Delayed DDPG (TD3)

A twin-delayed deep deterministic policy gradient (TD3) agent is an actor-critic RL agent that learns an optimal policy maximizing the long-term reward while acting on a continuous action space. Note that the TD3 algorithm is an off-policy, model-free, online RL technique.
The TD3 algorithm, an extension of the DDPG algorithm, addresses the value overestimation of DDPG by learning two Q-value functions at a time and using the minimum of the two during policy updates. Besides, it adds noise to the target actions to explore the environment more effectively. To estimate the policy and value functions, TD3 uses the following function approximators: (1) the actor μ(S) observes the system states (or observations) S and acts on the environment so as to maximize the long-term reward; (2) to increase the stability of the optimization process, the agent periodically updates the target actor parameters with the latest actor parameter values; (3) the critics take the state S and action A as inputs and return the corresponding expectation of the long-term reward; (4) the target critics are periodically updated in the same manner. The critics share the same structure but hold different parameter values, and each target critic mirrors the structure of its corresponding critic. For a better understanding, Figure 3 demonstrates the TD3 algorithm in detail.
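The core of the critic update described above, target-policy smoothing plus the minimum over the two target critics, can be sketched as follows. The networks are passed in as plain callables, and the delayed actor and target-network updates are omitted; this is an illustrative sketch rather than the implementation used in this paper.

```python
import numpy as np

def td3_target(r, s_next, gamma, target_actor, target_critic1, target_critic2,
               noise_std=0.2, noise_clip=0.5, a_min=-1.0, a_max=1.0):
    """Clipped double-Q target: y = r + gamma * min(Q1'(s', a~), Q2'(s', a~)),
    where a~ is the target action perturbed by clipped Gaussian noise."""
    a_next = target_actor(s_next)
    noise = np.clip(np.random.normal(0.0, noise_std, size=np.shape(a_next)),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, a_min, a_max)
    q_min = np.minimum(target_critic1(s_next, a_next),
                       target_critic2(s_next, a_next))
    return r + gamma * q_min
```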

4. MADRL AV TD3 Agent for Acceleration Control

Based on the AV's sensor outputs, the exact observations and reward function can be defined. Figure 4 shows that the observation and reward function consist of four parameters: (1) the relative distance (the difference between the lead and ego vehicle positions); (2) the lead vehicle velocity; (3) the ego vehicle velocity; and (4) the last exerted acceleration. A few quantities must be calculated explicitly before dealing with the observations. To do so, the safe distance between the lead car and the ego car, D_safe, is defined as:
$$D_{safe} = D_{default} + T_{gap} V_{ego}$$
where D_default is the default standstill distance in meters, T_gap is the time gap between the vehicles in seconds, and V_ego is the ego vehicle velocity. The distance error is
$$D_{error} = D_{safe} - D_{rel}$$
The velocity error is another significant quantity, which can be calculated as follows:
$$V_{error} = \begin{cases} \min(V_{lead},\, V_{set}) - V_{ego}, & \text{if } D_{rel} \le D_{safe}\\ V_{set} - V_{ego}, & \text{otherwise} \end{cases}$$
where V_lead is the lead vehicle's velocity, and V_set is the velocity at which the ego vehicle is set to drive. Therefore, the observation is defined as $\{\int V_{error}\,dt,\ V_{error},\ V_{lead}\}$.
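A minimal sketch of the spacing and velocity-error computations above is given below. The branch condition follows the rule stated in Section 3, and the default numeric values are placeholders (see Table 3 for the paper's settings).

```python
def acc_observation(d_rel, v_lead, v_ego, v_set=30.0, d_default=10.0, t_gap=1.5):
    """Safe distance, distance error and velocity error for the acceleration agent.
    d_default is a placeholder value; see Table 3 for the paper's settings."""
    d_safe = d_default + t_gap * v_ego
    d_error = d_safe - d_rel
    # Track min(v_lead, v_set) when the spacing does not exceed the safe distance,
    # otherwise track the driver-set velocity (rule of Section 3).
    v_ref = min(v_lead, v_set) if d_rel <= d_safe else v_set
    v_error = v_ref - v_ego
    return d_safe, d_error, v_error
```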
To compose the reward function, the first step is to calculate the cost function, which penalizes the velocity error and the control action (i.e., the consumed energy) with different weights. The cost function is defined as:
$$\text{Cost function}_1 = w_1 V_{error}^2 + w_2 F_x^2$$
where w_1 and w_2 are the weights of the considered terms, and F_x (or a) is the acceleration. The reward function, which is to be maximized during training, is defined as the negative of the cost function. Throughout this study, the TD3 actor and critic each consist of two fully connected hidden layers (HLs) with 50 neurons. The rectified linear unit (ReLU) activation function is applied to all HLs in the network. Readers are directed to Figure 5 for a schematic representation of the MADRL AV configuration. Additionally, the parameters of the implemented algorithm are given in Table 1.
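The following sketch illustrates the resulting reward and the 2 × 50 ReLU layer layout described above. The weight values w1 and w2, the use of PyTorch, the 3-dimensional observation input, and the Tanh output bounding are assumptions of this sketch, not the exact configuration of this work.

```python
import torch.nn as nn

def acceleration_reward(v_error, accel, w1=1.0, w2=0.1):
    """Reward = -(w1 * V_error^2 + w2 * Fx^2); w1, w2 are placeholder weights."""
    return -(w1 * v_error**2 + w2 * accel**2)

# Actor sketch: two fully connected hidden layers of 50 ReLU neurons, mapping the
# 3-dimensional observation to a single bounded acceleration command.
actor = nn.Sequential(
    nn.Linear(3, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 1), nn.Tanh(),   # Tanh bounding is an assumption of this sketch
)
```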

5. Deep-Q-Network (DQN)

A DQN agent is an off-policy, value-based RL agent that acts on the environment with discrete actions. Moreover, the DQN algorithm is a model-free, online RL technique. In complex state-action spaces, it is challenging to learn by evaluating the Q-value of every state-action pair separately. In DRL, several parts of the agent, such as the policy Π(s, a) or the values q(s, a), are represented by deep NNs whose parameters are trained to minimize a loss function via gradient descent. In DQN, a deep network estimates the action values from the given state S_t. At each step, based on the current state, the agent chooses an action ϵ-greedily according to the action values and stores the transition (S_t, A_t, R_{t+1}, γ_{t+1}, S_{t+1}), which contains all the relevant data at time t. The NN parameters are then trained using stochastic gradient descent to minimize the following loss function:
$$\left( R_{t+1} + \gamma_{t+1}\max_{a'} q_{\bar{\theta}}(S_{t+1}, a') - q_{\theta}(S_t, A_t) \right)^2$$
where t is the time step. The gradient of this loss for updating θ is computed via back-propagation. Here, θ̄ denotes the target network parameters, which are a copy of the online network parameters refreshed over a given time period. Optimization is performed using RMSprop on small batches sampled from the replay memory. The structure of the DQN algorithm is shown in Figure 4.
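A compact sketch of the temporal-difference loss above, evaluated on a sampled mini-batch of Q-values stored as NumPy arrays; it illustrates the target computation rather than the exact implementation used in this work.

```python
import numpy as np

def dqn_td_errors(r_batch, gamma_batch, q_next_target, q_taken_online):
    """Squared TD error (R + gamma * max_a' q_theta_bar(S', a') - q_theta(S, A))^2.
    q_next_target: [batch, n_actions] values from the target network;
    q_taken_online: [batch] online-network values of the actions actually taken."""
    targets = r_batch + gamma_batch * q_next_target.max(axis=1)
    return (targets - q_taken_online) ** 2   # averaged into the loss, minimized e.g. with RMSprop
```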

6. The MADRL AV DQN Agent for Steering Angle Control

Similarly to the observations and reward function defined for TD3, some calculations must be carried out for the DQN agent in order to control the steering angle. As shown in Figure 5, the agent receives the following inputs from the environment: (1) L̇_error (lateral error derivative); (2) Ẏ_error (yaw error derivative); (3) L_error (lateral error); (4) Y_error (yaw error); and (5) θ (steering angle). The observation set is then $\{\int L_{error}\,dt,\ \dot{L}_{error},\ L_{error},\ \int Y_{error}\,dt,\ \dot{Y}_{error},\ Y_{error}\}$. To obtain the corresponding reward function, the cost function is first calculated as follows:
$$\text{Cost function}_2 = w_1 L_{error}^2 + w_2 \theta^2$$
where w_1 and w_2 are the weights of the considered terms, and θ is the steering angle. The reward function, which is to be maximized during training, is defined as the negative of the cost function.
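For completeness, the steering agent's reward mirrors that of the acceleration agent: the negative of the weighted cost on the lateral error and the steering angle. The weights below are placeholders.

```python
def steering_reward(l_error, steer_angle, w1=1.0, w2=0.1):
    """Reward = -(w1 * L_error^2 + w2 * theta^2); w1, w2 are placeholder weights."""
    return -(w1 * l_error**2 + w2 * steer_angle**2)
```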
In this work, similar to the TD3 agent in Section 4, the DQN consists of two fully connected HLs with 50 neurons each, with the ReLU activation function applied to all HLs. The configuration is specified in the lower agent of Figure 5, and the parameters of the implemented algorithm are provided in Table 2.

7. Results and Discussion

In this section, we evaluate the effectiveness of our proposed multi-agent DRL-based control technique and compare it with two state-of-the-art control alternatives, namely holistic adaptive multi-model predictive control (HMPC) and hierarchical predictive control (HPC). HMPC [29] has a linear structure, which makes it perform well in real time; it also employs a weight-adaptive mechanism to improve its handling ability in uncertain situations and a multi-model adaptive law to account for tire cornering stiffness uncertainties. HPC [30] provides a structure that switches between multiple model predictive controllers in order to decrease the response time; the switching at runtime is based on a metric called uncontrollable divergence, which reveals the divergence between predicted and true states caused by return time and model mismatch. The relevant parameters of the AV are given in Table 3. To investigate the reliability and effectiveness of the proposed MADRL controller, the yaw angle error, lateral distance error, inter-vehicle distance, velocities, and control efforts are examined, together with the iterations over which the agents are trained. Figure 6a plots the training reward against the number of completed epochs. As seen, the training reward improves with every epoch, indicating the algorithm's gradual learning and performance enhancement.
To visualize the trend better, Figure 6b also shows the same training progress with an added trendline, which displays a clear improvement in the training reward over time, affirming the algorithm’s successful learning and performance enhancement.
As shown in Figure 6a,b, our MADRL-based controller succeeds in stabilizing the system under the required conditions of tracking the lead vehicle at a set velocity without violating the safety distance. The progress depicted in Figure 6a,b resulted from executing the proposed algorithm on the training data over multiple epochs; every epoch entailed iterating through the data set and adjusting the algorithm based on the feedback from the training reward.
The training reward is a metric that reflects the algorithm’s performance during training, which is determined by a specific objective function. This function aims to minimize the error between the algorithm’s predicted and actual outputs, and it gauges the degree to which the algorithm is learning and enhancing its performance.
As presented in Figure 7, in the early stages of driving, the acceleration fluctuates sharply to maintain a safe distance, and the range of acceleration decreases over time. With the MADRL algorithm, to achieve the set speed, the controller starts at a relatively high acceleration and reduces it smoothly over time, whereas with the HPC and HMPC algorithms the vehicle only accelerates after about 7 s. In the velocity diagram, the MADRL method clearly proves its superiority over the HPC and HMPC algorithms: it can be seen in Figure 8 that the two alternative algorithms track a velocity of 30 m/s with overshoots and some fluctuations, whereas the neural network trained with MADRL controls the velocity with both smooth tracking behavior and a fast settling time. This optimized behavior of the MADRL system is also visible in the steering angle changes shown in Figure 9, while the HPC and HMPC methods exhibit oscillating behavior that is impractical and would be uncomfortable, and potentially harmful, for passengers. For all three algorithms, maintaining the distance between the two vehicles is another significant aspect to be investigated. Figure 10a–c shows the relative and safe distances of the ego and lead vehicles for MADRL, HPC, and HMPC, respectively. By adjusting its speed through acceleration, the ego vehicle keeps its distance from the lead vehicle without violating the safe distance, while sustaining a relative distance that allows it to follow properly. The HPC and HMPC results, shown in Figure 10b,c, follow the same principle formulated as constraints in their optimal control problems. As is evident in Figure 10a compared to Figure 10b,c, the safe distance curve of MADRL exhibits smooth behavior, whereas the HPC and HMPC algorithms experience oscillations, for example, at around 750 and 1080 s for HPC and around 420, 760, and 1080 s for HMPC. It can be perceived that, in terms of path following and cruise control, all the controllers manage to keep a safe distance and track the trajectory of the leading vehicle. Moreover, the root mean square error (RMSE) of the steering and lateral distance is studied for the three algorithms, as depicted in Figure 11. It is evident that the lowest RMSE is obtained with our MADRL controller. As a result, the MADRL clearly outperforms the two state-of-the-art algorithms considered.

8. Conclusions

This paper proposes a multi-agent deep reinforcement learning-based method to control both the speed and steering (cruise control) of unmanned vehicles using DRL agents, in which the agents learn to select the optimal actions for steering and acceleration. The proposed method has the potential to enhance the safety and efficiency of autonomous vehicles, particularly in challenging environments, owing to its reduced computation requirements distributed among the agents. The study's findings reveal that the suggested approach surpasses existing state-of-the-art techniques, demonstrating its potential for application in real-world situations. To meet the real-time learning requirement, both the DQN and the TD3 actor and critic networks follow a structure of two hidden layers of 50 neurons with ReLU as the activation function. To meet the control requirements, the MADRL technique was used, where one agent is in charge of the acceleration and the other of the steering angle. As a result, the following outcomes were obtained: (1) the yaw and lateral errors reached approximately zero in less than 4 s; (2) the ego vehicle's velocity reached the set-point velocity in less than 10 s while intelligently not violating the safe distance; and (3) acceleration and steering act in such a way that the smallest amount of control energy is required. Lastly, the performance of the proposed control method was tested and compared with two state-of-the-art techniques, HPC and HMPC, with the clear outcome that our proposal outperforms them. Future work will consider other challenges and risks related to delays, data loss, and control compromise of AVs and propose new mitigation agents to maintain safety.

Author Contributions

Conceptualization, M.A., A.S. and M.H.; methodology, M.A., S.S.M., C.Z. and J.B.; software, M.A. and A.S.; validation, A.S., S.S.M., C.Z. and J.B.; formal analysis, M.A., A.S., S.S.M. and C.Z.; investigation, M.A., A.S. and C.Z.; resources, all authors; data curation, M.A. and M.H.; writing—original draft preparation, all authors; writing—review and editing, J.B., S.S.M. and C.Z.; visualization, M.A. and J.B.; supervision, C.Z. and J.B.; funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lv, M.; Peng, Z.; Wang, D.; Han, Q.-L. Event-Triggered Cooperative Path Following of Autonomous Surface Vehicles Over Wireless Network with Experiment Results. IEEE Trans. Ind. Electron. 2021, 69, 11479–11489. [Google Scholar] [CrossRef]
  2. Jain, R.P.; Aguiar, A.P.; de Sousa, J.B. Cooperative Path Following of Robotic Vehicles Using an Event-Based Control and Communication Strategy. IEEE Robot. Autom. Lett. 2018, 3, 1941–1948. [Google Scholar] [CrossRef]
  3. Li, W.; Xie, Z.; Wong, P.K.; Mei, X.; Zhao, J. Adaptive-Event-Trigger-Based Fuzzy Nonlinear Lateral Dynamic Control for Autonomous Electric Vehicles Under Insecure Communication Networks. IEEE Trans. Ind. Electron. 2020, 68, 2447–2459. [Google Scholar] [CrossRef]
  4. Chen, J.; Shuai, Z.; Zhang, H.; Zhao, W. Path Following Control of Autonomous Four-Wheel-Independent-Drive Electric Vehicles via Second-Order Sliding Mode and Nonlinear Disturbance Observer Techniques. IEEE Trans. Ind. Electron. 2020, 68, 2460–2469. [Google Scholar] [CrossRef]
  5. Zhang, L.; Ding, H.; Guo, K.; Zhang, J.; Pan, W.; Jiang, Z. Cooperative chassis control system of electric vehicles for agility and stability improvements. IET Intell. Transp. Syst. 2018, 13, 134–140. [Google Scholar] [CrossRef]
  6. Lucchini, A.; Formentin, S.; Corno, M.; Piga, D.; Savaresi, S.M. Torque Vectoring for High-Performance Electric Vehicles: An Efficient MPC Calibration. IEEE Control Syst. Lett. 2020, 4, 725–730. [Google Scholar] [CrossRef]
  7. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 2020, 8, 58443–58469. [Google Scholar] [CrossRef]
  8. Kanchwala, H.; Viana, I.B.; Ceccoti, M.; Aouf, N. Model predictive tracking controller for a high fidelity vehicle dynamics model. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019. [Google Scholar] [CrossRef]
  9. Mazzilli, V.; De Pinto, S.; Pascali, L.; Contrino, M.; Bottiglione, F.; Mantriota, G.; Gruber, P.; Sorniotti, A. Integrated chassis control: Classification, analysis and future trends. Annu. Rev. Control. 2021, 54, 172–205. [Google Scholar] [CrossRef]
  10. Xiang, S.; Gao, H.; Liu, Z.; Gosselin, C. Dynamic transition trajectory planning of three-DOF cable-suspended parallel robots via linear time-varying MPC. Mech. Mach. Theory 2020, 146, 103715. [Google Scholar] [CrossRef]
  11. Zhang, J.; Feng, T.; Yan, F.; Qiao, S.; Wang, X. Analysis and design on intervehicle distance control of autonomous vehicle platoons. ISA Trans. 2019, 100, 446–453. [Google Scholar] [CrossRef] [PubMed]
  12. Berger, T.; Rauert, A.-L. Funnel cruise control. Automatica 2020, 119, 109061. [Google Scholar] [CrossRef]
  13. Girard, A.; Eqtami, A. Least-violating symbolic controller synthesis for safety, reachability and attractivity specifications. Automatica 2021, 127, 109543. [Google Scholar] [CrossRef]
  14. Guo, N.; Zhang, X.; Zou, Y.; Lenzo, B.; Zhang, T. A Computationally Efficient Path-Following Control Strategy of Autonomous Electric Vehicles with Yaw Motion Stabilization. IEEE Trans. Transp. Electrif. 2020, 6, 728–739. [Google Scholar] [CrossRef]
  15. Liang, Z.; Zhao, J.; Liu, B.; Wang, Y.; Ding, Z. Velocity-based path following control for autonomous vehicles to avoid exceeding road friction limits using sliding mode method. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1947–1958. [Google Scholar] [CrossRef]
  16. Ni, J.; Hu, J.; Xiang, C. Robust Path Following Control at Driving/Handling Limits of an Autonomous Electric Racecar. IEEE Trans. Veh. Technol. 2019, 68, 5518–5526. [Google Scholar] [CrossRef]
  17. Ao, D.; Huang, W.; Wong, P.K.; Li, J. Robust Backstepping Super-Twisting Sliding Mode Control for Autonomous Vehicle Path Following. IEEE Access 2021, 9, 123165–123177. [Google Scholar] [CrossRef]
  18. Wu, Y.; Liao, S.; Liu, X.; Li, Z.; Lu, R. Deep Reinforcement Learning on Autonomous Driving Policy with Auxiliary Critic Network. IEEE Trans. Neural Networks Learn. Syst. 2021, 10, 1–11. [Google Scholar] [CrossRef]
  19. Martinsen, B.; Lekkas, A.M. Curved path following with deep reinforcement learning: Results from three vessel models. In Proceedings of the OCEANS 2018 MTS/IEEE Charleston, Charleston, SC, USA, 22–25 October 2018. [Google Scholar]
  20. Cui, R.; Yang, C.; Li, Y.; Sharma, S. Adaptive Neural Network Control of AUVs With Control Input Nonlinearities Using Reinforcement Learning. IEEE Trans. Syst. Man, Cybern. Syst. 2017, 47, 1019–1029. [Google Scholar] [CrossRef]
  21. Carlucho, I.; De Paula, M.; Wang, S.; Petillot, Y.; Acosta, G.G. Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning. Robot. Auton. Syst. 2018, 107, 71–86. [Google Scholar] [CrossRef]
  22. Zhang, Q.; Lin, J.; Sha, Q.; He, B.; Li, G. Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle. IEEE Access 2020, 8, 24258–24268. [Google Scholar] [CrossRef]
  23. Hung, S.-M.; Givigi, S.N. A Q-Learning Approach to Flocking with UAVs in a Stochastic Environment. IEEE Trans. Cybern. 2016, 47, 186–197. [Google Scholar] [CrossRef] [PubMed]
  24. Rubi, B.; Morcego, B.; Perez, R. A Deep Reinforcement Learning Approach for Path Following on a Quadrotor. In Proceedings of the 2020 European Control Conference (ECC), St. Petersburg, Russia, 12–15 May 2020. [Google Scholar] [CrossRef]
  25. Hartmann, G.; Shiller, Z.; Azaria, A. Model-Based Reinforcement Learning for Time-Optimal Velocity Control. IEEE Robot. Autom. Lett. 2020, 5, 6185–6192. [Google Scholar] [CrossRef]
  26. Desjardins, C.; Chaib-Draa, B. Cooperative Adaptive Cruise Control: A Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1248–1260. [Google Scholar] [CrossRef]
  27. Wang, P.; Chan, C.-Y.; De La Fortelle, A. A Reinforcement Learning Based Approach for Automated Lane Change Maneuvers. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26 June–1 July 2018; pp. 1379–1384. [Google Scholar]
  28. Artunedo, A.; Villagra, J.; Godoy, J. Jerk-Limited Time-Optimal Speed Planning for Arbitrary Paths. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8194–8208. [Google Scholar] [CrossRef]
  29. Liang, Y.; Li, Y.N.; Khajepour, A.; Zheng, L. Holistic Adaptive Multi-Model Predictive Control for the Path Following of 4WID Autonomous Vehicles. IEEE Trans. Veh. Technol. 2020, 70, 69–81. [Google Scholar] [CrossRef]
  30. Zhang, K.; Sprinkle, J.; Sanfelice, R.G. Computationally aware control of autonomous vehicles: A hybrid model predictive control approach. Auton. Robot. 2015, 39, 503–517. [Google Scholar] [CrossRef]
Figure 1. A 2-DoF schematic representation of the vehicle.
Figure 2. Path tracking from the cruise control point of view.
Figure 3. Flowchart of the TD3 algorithm with actor-critic architecture.
Figure 4. Flowchart of the DQN algorithm.
Figure 5. Structure of the implemented MADRL AV.
Figure 6. Suggested algorithm training rewards. (a) TD3; (b) DQN.
Figure 7. Control effort on acceleration of the ego vehicles with a comparative perspective between the MADRL, HPC, and HMPC algorithms.
Figure 8. Different ego vehicle velocities in the scenario for the proposed MADRL algorithm and a comparison with HPC and HMPC.
Figure 9. Control effort on steering angle of the ego vehicles with a comparative perspective between the MADRL, HPC, and HMPC algorithms.
Figure 10. Relative and safe distances between the ego and lead vehicles for MADRL, HPC, and HMPC, respectively, shown in (a–c).
Figure 11. Bar chart comparison of the different algorithms' RMSE indices.
Table 1. Parameters of the TD3 agent.

Parameter | Value
TD3 training episode length | 500 ts
Minimum batch size | 1024
Actor learning rate | 1 × 10−4
Critic 1 learning rate | 1 × 10−3
Critic 2 learning rate | 1 × 10−3
Number of MC cycles | 200
Discount factor | 0.99

Table 2. Parameters of the DQN agent.

Parameter | Value
DQN training episode length | 500 ts
Minimum batch size | 1024
Learning rate | 1 × 10−3
Number of MC cycles | 500
Discount factor | 0.99

Table 3. Parameters used in the vehicle model.

Parameter | Abbreviation | Value
Vehicle mass | m | 1500 kg
Inertia around z-axis | I | 1300 mNs²
Cornering stiffness, front wheels | C_f | 13,000 N/rad
Cornering stiffness, rear wheels | C_r | 12,500 N/rad
Distance from front wheels to CoG | l_f | m
Distance from rear wheels to CoG | l_r | m
Cross-sectional area | A_a | 10.25 m²
Roll resistance coefficient | D_r | 1.5 × 10−3
Air drag coefficient | C_D | 0.6
Air density | ρ_a | 1.3 kg/m³
Default spacing between lead and ego cars | D_default | m
Time gap for distance maintaining | T_gap | 1.5 s
Set velocity for ego car | V_set | 30 m/s

