# Deep Reinforcement Learning-Based Torque Vectoring Control Considering Economy and Safety


## Abstract


## 1. Introduction

- Unlike reference [18], this paper proposes a TVC method that accounts for both economy and safety. Specifically, the deep-RL-based torque allocation layer adaptively adjusts the torque of each wheel according to the current vehicle state.
- An improved heuristic randomized ensembled double Q-learning (REDQ) algorithm is introduced for EV control, which reduces training complexity compared with existing RL algorithms that control motor torque directly.

## 2. The TVC Framework and System Model

#### 2.1. The TVC Framework

#### 2.2. Tire Model Identification

${F}_{z}$ is the vertical load of the tire, and ${\lambda}_{ij}$ and ${\alpha}_{ij}$ are the tire longitudinal slip ratio and the tire slip angle, respectively, with $ij\in \left\{fl,fr,rl,rr\right\}$. The longitudinal coefficients are ${B}_{x}=BC{D}_{x}/({C}_{x}\cdot {D}_{x})$, ${C}_{x}={b}_{0}$, ${D}_{x}={b}_{1}{F}_{z}^{2}+{b}_{2}{F}_{z}$, $BC{D}_{x}=\left({b}_{3}{F}_{z}^{2}+{b}_{4}{F}_{z}\right){e}^{-{b}_{5}{F}_{z}}$, and ${E}_{x}={b}_{6}{F}_{z}^{2}+{b}_{7}{F}_{z}+{b}_{8}$; the lateral coefficients are ${B}_{y}=BC{D}_{y}/({C}_{y}\cdot {D}_{y})$, ${C}_{y}={a}_{0}$, ${D}_{y}={a}_{1}{F}_{z}^{2}+{a}_{2}{F}_{z}$, $BC{D}_{y}={a}_{3}\mathrm{sin}(2\mathrm{arctan}({F}_{z}/{a}_{4}))$, and ${E}_{y}={a}_{6}{F}_{z}+{a}_{7}$.
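The longitudinal branch of this Magic Formula can be sketched in code from the coefficient definitions above; the `B_COEFFS` values below are illustrative Pacejka-'89-style numbers, not the paper's FTO-identified fit:

```python
import math

# Illustrative Pacejka-'89-style longitudinal coefficients b0..b8
# (placeholder values, NOT the paper's identified fit).
B_COEFFS = [1.65, -21.3, 1144.0, 49.6, 226.0, 0.069, -0.006, 0.056, 0.486]

def magic_formula_fx(Fz, slip, b=B_COEFFS):
    """Longitudinal tire force from the Magic Formula branch above.

    Fz   : vertical load in kN
    slip : longitudinal slip ratio in percent
    """
    Cx = b[0]
    Dx = b[1] * Fz**2 + b[2] * Fz                       # peak factor
    BCDx = (b[3] * Fz**2 + b[4] * Fz) * math.exp(-b[5] * Fz)
    Bx = BCDx / (Cx * Dx)                               # stiffness factor
    Ex = b[6] * Fz**2 + b[7] * Fz + b[8]                # curvature factor
    phi = (1.0 - Ex) * slip + (Ex / Bx) * math.atan(Bx * slip)
    return Dx * math.sin(Cx * math.atan(Bx * phi))
```

The lateral branch follows the same pattern with the $a_i$ coefficients and the slip angle in place of the slip ratio.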

${F}_{i}$ is the Fibonacci series, and its general formula is as follows:

The local trial nodes ${V}_{1}{\sim}{V}_{b}$ are generated according to the best node ${B}_{i1}$ of the current nodes and the current node ${B}_{ij}$:

The global trial nodes ${W}_{a}$, the local trial nodes ${V}_{b}$, and the current nodes ${B}_{ij}$ are sorted according to fitness, and the best ${F}_{i+1}$ nodes are retained as the next-generation nodes ${B}_{(i+1)j}$:

**Algorithm 1.** FTO algorithm

1. Set the depth of the Fibonacci tree $N$ and the number of identification parameters $n$;
2. Randomly generate an initial node ${B}_{11}$ and a global random node ${N}_{1}$;
3. **Repeat:**
4. Generate ${F}_{i}$ global trial nodes ${W}_{1}{\sim}{W}_{{F}_{i}}$ according to the global random node ${N}_{i}$ and the nodes ${B}_{ij}$;
5. Generate ${F}_{i-1}$ local trial nodes ${V}_{1}{\sim}{V}_{{F}_{i-1}}$ according to the best-fitness node ${B}_{i1}$ among the current nodes and the remaining nodes;
6. Get the next-generation nodes ${B}_{(i+1)j}$;
7. Update the node set: incorporate the newly generated trial nodes into the current node set $S$, calculate the fitness function, sort, and retain the first ${F}_{i+1}$ nodes;
8. **Until** ${F}_{i+1}\ge {F}_{N}$
9. Output the optimal node.
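Under one reading of Algorithm 1, a minimal FTO loop can be sketched as follows. The trial-node generation rules (midpoints with Gaussian perturbation) are stand-ins for the paper's update equations, and `fto_minimize` is a hypothetical helper name:

```python
import random

def fibonacci(n):
    """First n terms of the Fibonacci series, F_1 = F_2 = 1."""
    F = [1, 1]
    while len(F) < n:
        F.append(F[-1] + F[-2])
    return F[:n]

def fto_minimize(fitness, dim, bounds, depth=12, seed=0):
    """Minimal sketch of Algorithm 1 for parameter identification."""
    rng = random.Random(seed)
    low, high = bounds
    F = fibonacci(depth + 1)
    rand_point = lambda: [rng.uniform(low, high) for _ in range(dim)]
    nodes = [rand_point()]                       # step 2: initial node B_11
    for i in range(1, depth):
        global_node = rand_point()               # global random node N_i
        trials = []
        for _ in range(F[i]):                    # step 4: F_i global trials
            base = rng.choice(nodes)
            trials.append([(p + q) / 2 + rng.gauss(0.0, 0.05 * (high - low))
                           for p, q in zip(base, global_node)])
        best = min(nodes, key=fitness)           # best node B_i1
        for _ in range(F[i - 1]):                # step 5: F_{i-1} local trials
            other = rng.choice(nodes)
            trials.append([(p + q) / 2 for p, q in zip(best, other)])
        # step 7: pool the trials, sort by fitness, keep the first F_{i+1}
        nodes = sorted(nodes + trials, key=fitness)[:F[i + 1]]
    return min(nodes, key=fitness)
```

The Fibonacci series makes the retained population grow with the tree depth, so early generations explore cheaply while later generations refine with many nodes.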

${F}_{xi}^{*}$, ${F}_{yi}^{*}$, ${\lambda}_{i}$, and ${\alpha}_{i}$ are the experimental data of longitudinal force, lateral force, slip ratio, and side slip angle under different vertical loads, respectively. ${N}_{x}=29$ and ${N}_{y}=20$ represent the number of tests.
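The identification fitness is reported below as a relative residual between model output and test data. The exact residual definition is not reproduced here; one plausible normalized form is:

```python
def relative_residual(model, measured):
    """Normalized residual ||model - measured||_2 / ||measured||_2.
    One plausible reading of the residuals in the identification tables;
    the paper's exact definition may differ."""
    num = sum((m - d) ** 2 for m, d in zip(model, measured))
    den = sum(d ** 2 for d in measured)
    return (num / den) ** 0.5
```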

#### 2.3. Vehicle Reference Model

${C}_{f}$ and ${C}_{r}$ are the front and rear tire cornering stiffnesses, respectively. $a$ and $b$ are the distances from the front and rear wheel axles to the CG, respectively. ${\delta}_{f}$ is the steering angle of the front wheels, and ${I}_{z}$ is the yaw mass moment of inertia.

The desired yaw rate ${\gamma}_{d}$ is limited by the following equation:
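The reference yaw rate and its adhesion limit can be sketched as follows, assuming the standard steady-state 2-DOF bicycle-model formulation with a 0.85 safety margin; the paper's exact limit equation may differ:

```python
def desired_yaw_rate(vx, delta_f, a, b, m, Cf, Cr, mu, g=9.81):
    """Steady-state yaw-rate reference of the 2-DOF bicycle model,
    saturated by a road-adhesion bound (standard formulation, 0.85 margin).

    vx: longitudinal speed [m/s]; delta_f: front steering angle [rad]
    a, b: CG-to-axle distances [m]; m: vehicle mass [kg]
    Cf, Cr: front/rear cornering stiffness [N/rad]; mu: adhesion coeff.
    """
    L = a + b
    K = (m / L**2) * (a / Cr - b / Cf)      # stability factor
    gamma = vx * delta_f / (L * (1.0 + K * vx**2))
    gamma_max = 0.85 * mu * g / vx          # adhesion-limited yaw rate
    return max(-gamma_max, min(gamma_max, gamma))
```

On a slippery road the adhesion bound dominates: a steering input that would demand a large yaw rate is clipped to what the tires can sustain.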

#### 2.4. Vehicle 7-DOF Dynamic Model

${h}_{g}$ is the height of the center of gravity.

## 3. The TVC Algorithm

#### 3.1. Active Safety Control Layer

#### 3.2. Torque Allocation Layer

#### 3.2.1. Average Allocation Method

#### 3.2.2. RL-Based Torque Allocation Algorithm

${B}_{1}$ and ${B}_{2}$ are the parameters associated with the adhesion coefficient; their values are specified in reference [28]. This paper aims to improve the economy of EVs while ensuring safety, so the reward function $R$ is expressed as the following four parts:
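A weighted-sum sketch of such a four-part reward is shown below; the weights and the exact terms are placeholders, not the paper's equations:

```python
def reward(yaw_rate_err, sideslip, motor_power_kw, torque_delta,
           w=(1.0, 1.0, 0.01, 0.001)):
    """Illustrative four-part TVC reward: yaw-rate tracking error and
    sideslip angle (safety), motor power (economy), and torque-change
    penalty (smoothness). Weights w are placeholders."""
    w1, w2, w3, w4 = w
    return -(w1 * yaw_rate_err**2 + w2 * sideslip**2
             + w3 * motor_power_kw + w4 * torque_delta**2)
```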

**Algorithm 2.** Heuristic REDQ algorithm

1. Initialize an ensemble of Q-networks ${Q}_{{\theta}_{i}}({s}_{t},{a}_{t})$ with parameters ${\theta}_{i}$;
2. Initialize the target Q-networks ${\tilde{Q}}_{{\tilde{\theta}}_{i}}({s}_{t},{a}_{t})$ with parameters ${\tilde{\theta}}_{i}\leftarrow {\theta}_{i}$;
3. Initialize the replay buffer;
4. **For** each step $t$ **do**:
5. Randomly sample an action ${a}_{t}$ from the set of action strategies $\left\{{a}_{OP},{a}_{rule}\right\}$ with distribution $\left\{1-{P}_{rule},{P}_{rule}\right\}$;
6. Execute the action ${a}_{t}$ and observe the next state ${s}_{t+1}$ and reward ${r}_{t}$;
7. Store the experience tuple $\left({s}_{t},{a}_{t},{r}_{t},{s}_{t+1}\right)$ in the replay buffer;
8. **For** $G$ updates **do**:
9. Sample a mini-batch of experiences $\left\{\left({s}_{t},{a}_{t},{r}_{t},{s}_{t+1}\right)\right\}$ from the replay buffer;
10. Randomly select $m$ numbers from the set $\left\{1,2,\cdots ,N\right\}$ as a subset ${\mathcal{I}}_{M}$;
11. Compute the Q-value estimate ${y}^{REDQ}$ based on (42);
12. **For** $i=1,2,\cdots ,N$ **do**:
13. Update the parameters ${\theta}_{i}$ by gradient descent based on (43);
14. Update each target Q-network ${\tilde{Q}}_{\tilde{\theta}}\left({s}_{t},{a}_{t}\right)$ based on (44);
15. **End for**
16. **End for**
17. **End for**
18. Return the learned Q-network ensemble.
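The core of the REDQ update, the in-target minimization over a random subset ${\mathcal{I}}_{M}$ of the ensemble (steps 10 and 11), can be sketched for a single transition as follows; the function name and scalar Q-values are illustrative, and the paper's exact form is Equation (42):

```python
import random

def redq_target(r, done, gamma, next_qs, m, rng=None):
    """One-transition REDQ target: minimize over a random size-m subset
    of the N target Q-values. Using m < N tames Q over-estimation bias
    while the ensemble keeps variance low.

    next_qs : list of N target Q-values Q_i(s', a') for the next state
    """
    rng = rng or random.Random(0)
    subset = rng.sample(range(len(next_qs)), m)  # the random index set I_M
    min_q = min(next_qs[i] for i in subset)
    return r + gamma * (1.0 - done) * min_q
```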

## 4. Evaluation Indicators and Simulation Results

#### 4.1. Simulation Environment

- RLES. The torque allocation algorithm proposed in this paper. The active safety control layer is a nonlinear MPC controller, and the lower torque allocation layer is based on the heuristic REDQ deep RL algorithm, which jointly considers economy and safety.
- MPC-CO. The torque allocation algorithm proposed in reference [26], which jointly considers economy and safety; the lower controller is a quadratic programming algorithm.
- LQR-EQ. The active safety control layer is the LQR controller of reference [31], and the torque allocation layer is the common average allocation method of Section 3.2.1. This controller considers vehicle safety only.
- w/o control. No additional vehicle lateral control; steering is handled by the driver alone.

#### 4.2. Performance Indicators

1. Handling stability
2. Driver workload
3. Motor load
4. Additional yaw moment
5. Velocity tracking

#### 4.3. Training Performance

#### 4.4. DLC Maneuver on Slippery Road

#### 4.5. DLC Maneuver on Joint Road

#### 4.6. Step Steering Maneuver

#### 4.7. Driving Cycles

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Wu, J.; Zhang, J.; Nie, B.; Liu, Y.; He, X. Adaptive Control of PMSM Servo System for Steering-by-Wire System With Disturbances Observation. IEEE Trans. Transp. Electrif. **2022**, 8, 2015–2028.
2. Wu, J.; Kong, Q.; Yang, K.; Liu, Y.; Cao, D.; Li, Z. Research on the Steering Torque Control for Intelligent Vehicles Co-Driving With the Penalty Factor of Human–Machine Intervention. IEEE Trans. Syst. Man Cybern. Syst. **2023**, 53, 59–70.
3. Lei, F.; Bai, Y.; Zhu, W.; Liu, J. A novel approach for electric powertrain optimization considering vehicle power performance, energy consumption and ride comfort. Energy **2019**, 167, 1040–1050.
4. Karki, A.; Phuyal, S.; Tuladhar, D.; Basnet, S.; Shrestha, B.P. Status of Pure Electric Vehicle Power Train Technology and Future Prospects. Appl. Syst. Innov. **2020**, 3, 35.
5. Dalboni, M.; Tavernini, D.; Montanaro, U.; Soldati, A.; Concari, C.; Dhaens, M.; Sorniotti, A. Nonlinear Model Predictive Control for Integrated Energy-Efficient Torque-Vectoring and Anti-Roll Moment Distribution. IEEE/ASME Trans. Mechatron. **2021**, 26, 1212–1224.
6. Chatzikomis, C.; Zanchetta, M.; Gruber, P.; Sorniotti, A.; Modic, B.; Motaln, T.; Blagotinsek, L.; Gotovac, G. An energy-efficient torque-vectoring algorithm for electric vehicles with multiple motors. Mech. Syst. Sig. Process. **2019**, 128, 655–673.
7. Xu, W.; Chen, H.; Zhao, H.; Ren, B. Torque optimization control for electric vehicles with four in-wheel motors equipped with regenerative braking system. Mechatronics **2019**, 57, 95–108.
8. Hu, X.; Wang, P.; Hu, Y.; Chen, H. A stability-guaranteed and energy-conserving torque distribution strategy for electric vehicles under extreme conditions. Appl. Energy **2020**, 259, 114162.
9. Ding, S.H.; Liu, L.; Zheng, W.X. Sliding Mode Direct Yaw-Moment Control Design for In-Wheel Electric Vehicles. IEEE Trans. Ind. Electron. **2017**, 64, 6752–6762.
10. Zhao, B.; Xu, N.; Chen, H.; Guo, K.; Huang, Y. Stability control of electric vehicles with in-wheel motors by considering tire slip energy. Mech. Syst. Sig. Process. **2019**, 118, 340–359.
11. Zhang, L.; Chen, H.; Huang, Y.; Wang, P.; Guo, K. Human-Centered Torque Vectoring Control for Distributed Drive Electric Vehicle Considering Driving Characteristics. IEEE Trans. Veh. Technol. **2021**, 70, 7386–7399.
12. Li, Q.; Zhang, J.; Li, L.; Wang, X.; Zhang, B.; Ping, X. Coordination Control of Maneuverability and Stability for Four-Wheel-Independent-Drive EV Considering Tire Sideslip. IEEE Trans. Transp. Electrif. **2022**, 8, 3111–3126.
13. Deng, H.; Zhao, Y.; Nguyen, A.T.; Huang, C. Fault-Tolerant Predictive Control With Deep-Reinforcement-Learning-Based Torque Distribution for Four In-Wheel Motor Drive Electric Vehicles. IEEE/ASME Trans. Mechatron. **2023**, early access.
14. Aradi, S. Survey of Deep Reinforcement Learning for Motion Planning of Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. **2020**, 23, 740–759.
15. Zhu, Y.; Wang, Z.; Chen, C.; Dong, D. Rule-Based Reinforcement Learning for Efficient Robot Navigation With Space Reduction. IEEE/ASME Trans. Mechatron. **2022**, 27, 846–857.
16. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Sallab, A.A.A.; Yogamani, S.; Pérez, P. Deep Reinforcement Learning for Autonomous Driving: A Survey. IEEE Trans. Intell. Transp. Syst. **2022**, 23, 4909–4926.
17. Wei, H.; Zhang, N.; Liang, J.; Ai, Q.; Zhao, W.; Huang, T.; Zhang, Y. Deep reinforcement learning based direct torque control strategy for distributed drive electric vehicles considering active safety and energy saving performance. Energy **2022**, 238, 121725.
18. Peng, H.; Wang, W.; Xiang, C.; Li, L.; Wang, X. Torque Coordinated Control of Four In-Wheel Motor Independent-Drive Vehicles With Consideration of the Safety and Economy. IEEE Trans. Veh. Technol. **2019**, 68, 9604–9618.
19. Cabrera, J.A.; Ortiz, A.; Carabias, E.; Simon, A. An Alternative Method to Determine the Magic Tyre Model Parameters Using Genetic Algorithms. Veh. Syst. Dyn. **2004**, 41, 109–127.
20. Alagappan, A.; Rao, K.V.N.; Kumar, R.K. A comparison of various algorithms to extract Magic Formula tyre model coefficients for vehicle dynamics simulations. Veh. Syst. Dyn. **2015**, 53, 154–178.
21. Hu, C.; Wang, R.R.; Yan, F.J.; Chen, N. Should the Desired Heading in Path Following of Autonomous Vehicles be the Tangent Direction of the Desired Path? IEEE Trans. Intell. Transp. Syst. **2015**, 16, 3084–3094.
22. Ji, X.; He, X.; Lv, C.; Liu, Y.; Wu, J. A vehicle stability control strategy with adaptive neural network sliding mode theory based on system uncertainty approximation. Veh. Syst. Dyn. **2018**, 56, 923–946.
23. Zhang, H.; Liang, J.; Jiang, H.; Cai, Y.; Xu, X. Stability Research of Distributed Drive Electric Vehicle by Adaptive Direct Yaw Moment Control. IEEE Access **2019**, 7, 106225–106237.
24. Houska, B.; Ferreau, H.J.; Diehl, M. An auto-generated real-time iteration algorithm for nonlinear MPC in the microsecond range. Automatica **2011**, 47, 2279–2285.
25. Wang, J.; Luo, Z.; Wang, Y.; Yang, B.; Assadian, F. Coordination Control of Differential Drive Assist Steering and Vehicle Stability Control for Four-Wheel-Independent-Drive EV. IEEE Trans. Veh. Technol. **2018**, 67, 11453–11467.
26. Deng, H.; Zhao, Y.; Feng, S.; Wang, Q.; Zhang, C.; Lin, F. Torque vectoring algorithm based on mechanical elastic electric wheels with consideration of the stability and economy. Energy **2021**, 219, 119643.
27. Wu, X.; Zhou, B.; Wen, G.; Long, L.; Cui, Q. Intervention criterion and control research for active front steering with consideration of road adhesion. Veh. Syst. Dyn. **2018**, 56, 553–578.
28. Zhai, L.; Sun, T.M.; Wang, J. Electronic Stability Control Based on Motor Driving and Braking Torque Distribution for a Four In-Wheel Motor Drive Electric Vehicle. IEEE Trans. Veh. Technol. **2016**, 65, 4726–4739.
29. Chen, X.; Wang, C.; Zhou, Z.; Ross, K. Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. arXiv **2021**, arXiv:2101.05982.
30. Parra, A.; Tavernini, D.; Gruber, P.; Sorniotti, A.; Zubizarreta, A.; Perez, J. On Nonlinear Model Predictive Control for Energy-Efficient Torque-Vectoring. IEEE Trans. Veh. Technol. **2021**, 70, 173–188.
31. Mirzaei, M. A new strategy for minimum usage of external yaw moment in vehicle dynamic control system. Transp. Res. Part C Emerg. Technol. **2010**, 18, 213–224.

**Figure 2.** Tire model identification results. (**a**) Longitudinal force recognition results; (**b**) lateral force recognition results.

**Figure 6.** Simulation results on slippery road. (**a**) Vehicle displacement; (**b**) yaw rate; (**c**) side slip angle.

**Figure 7.** Simulation results on slippery road. (**a**) Phase trajectory portrait; (**b**) motor power consumption.

**Figure 9.** Simulation results under joint road. (**a**) Vehicle displacement; (**b**) yaw rate; (**c**) side slip angle.

**Figure 10.** Simulation results under joint road. (**a**) Phase trajectory portrait; (**b**) longitudinal velocity.

**Figure 11.** Simulation results under joint road. (**a**) Stability index ${\epsilon}_{s}$; (**b**) motor power consumption.

**Figure 12.** Simulation results under step steering maneuver. (**a**) Vehicle displacement; (**b**) yaw rate; (**c**) side slip angle.

**Figure 13.** Simulation results under step steering maneuver. (**a**) Longitudinal velocity; (**b**) motor power consumption.

**Figure 15.** Simulation results under driving cycles. (**a**) Motor efficiency; (**b**) motor power consumption.

Item | 10 kN | 15 kN | 20 kN
---|---|---|---
FTO relative residual of longitudinal force | 1.40% | 1.37% | 1.40%
GA relative residual of longitudinal force | 4.61% | 2.21% | 1.73%
PSO relative residual of longitudinal force | 3.44% | 1.74% | 1.66%
FTO relative residual of lateral force | 1.48% | 1.37% | 1.39%
GA relative residual of lateral force | 1.78% | 1.63% | 1.48%
PSO relative residual of lateral force | 1.65% | 1.69% | 1.56%

Controller | 𝜺_{s} | vs. w/o control | 𝜺_{driver} | vs. w/o control | 𝜺_{motor} | vs. w/o control | 𝜺_{Mz} | 𝜺_{v}
---|---|---|---|---|---|---|---|---
RLES | 0.4085 | −96% | 16.28 | −81% | 20,820 | −93% | 735,200 | 1.1 × 10^{−5}
MPC-CO | 0.5640 | −94% | 19.58 | −77% | 23,430 | −92% | 887,200 | 1.5 × 10^{−5}
LQR-EQ | 3.6532 | −61% | 32.54 | −62% | 29,680 | −90% | 892,000 | 3.5 × 10^{−4}
w/o control | 9.260 | - | 85.46 | - | 306,300 | - | 0 | 2.9 × 10^{2}

Controller | 𝜺_{s} | vs. w/o control | 𝜺_{driver} | vs. w/o control | 𝜺_{motor} | vs. w/o control | 𝜺_{Mz} | 𝜺_{v}
---|---|---|---|---|---|---|---|---
RLES | 0.5574 | −94% | 16.28 | −81% | 20,820 | −93% | 735,200 | 0.88
MPC-CO | 0.6368 | −93% | 19.58 | −77% | 23,430 | −92% | 887,200 | 1.42
LQR-EQ | 4.2190 | −57% | 32.54 | −62% | 29,680 | −90% | 892,000 | 2.47
w/o control | 9.707 | - | 85.46 | - | 306,300 | - | 0 | 36.78

Controller | 𝜺_{s} | vs. w/o control | 𝜺_{driver} | vs. w/o control | 𝜺_{motor} | 𝜺_{Mz} | 𝜺_{v}
---|---|---|---|---|---|---|---
RLES | 0.0183 | −62% | 42.46 | −1.7% | 9930 | 1 × 10^{7} | 0.01895
MPC-CO | 0.0193 | −60% | 42.46 | −1.7% | 9873 | 1 × 10^{7} | 0.0273
LQR-EQ | 0.0274 | −43% | 43.21 | 0.1% | 4166 | 170,100 | 0.04486
w/o control | 0.0483 | - | 43.18 | - | 132.7 | 0 | 2.202


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Deng, H.; Zhao, Y.; Lin, F.; Wang, Q.
Deep Reinforcement Learning-Based Torque Vectoring Control Considering Economy and Safety. *Machines* **2023**, *11*, 459.
https://doi.org/10.3390/machines11040459
