# Energy-Efficient Driving for Adaptive Traffic Signal Control Environment via Explainable Reinforcement Learning


## Abstract


## 1. Introduction

- (1) A general MDP for eco-driving control is established, in which the reward function jointly considers energy consumption, traffic mobility, and driving comfort;
- (2) Unlike systems that operate in a fixed-timing traffic light environment, the proposed system is designed for a DRL-based ATSC environment, and the proposed state sharing mechanism allows the ego vehicle to take the state of the traffic light agent as part of its own state;
- (3) Whereas conventional optimization-based methods suffer from computational complexity, a model-free DRL algorithm is adopted to generate the speed profile of the vehicle in real time, and the SHAP approach is used to explain and visualize the decisions of the ego vehicle, helping to reveal the black-box decision process of the DRL agent.

## 2. Preliminary

#### 2.1. Deep Reinforcement Learning

#### 2.2. Adaptive Traffic Signal Control

- (1) State: The first part of the state for a traffic light is represented by the queue length ratio in each lane. We assume that the length of the homogeneous vehicles is denoted as ${l}_{v}$ and the minimal gap between vehicles is $\varpi$. The time-dependent queue length ratio of traffic light $j$ can then be defined as:
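Since the defining equation did not survive extraction here, the following sketch shows one plausible form of the queue length ratio, assuming it is the fraction of the lane occupied by queued vehicles; the function name, default values, and `lane_length` parameter are illustrative, not taken from the paper:

```python
def queue_length_ratio(n_halted: int, l_v: float = 5.0, varpi: float = 2.5,
                       lane_length: float = 500.0) -> float:
    """Fraction of the lane occupied by the standing queue.

    Each queued vehicle is assumed to occupy its own length l_v plus the
    minimal inter-vehicle gap varpi; the result is clipped to [0, 1].
    """
    occupied = n_halted * (l_v + varpi)
    return min(occupied / lane_length, 1.0)
```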

- (2) Action: The action is defined as $a=1$: switch to the next phase, and $a=0$: maintain the current phase. The action is carried out at an interval of $25$ s, and the specific definition of the signal phases is given in Figure 2. It should be noted that a 3 s yellow phase is inserted whenever the phase changes.
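The keep/switch logic with the inserted yellow phase can be sketched as follows; the integer phase encoding and the `"yellow"` marker are illustrative assumptions rather than the paper's implementation:

```python
def phase_sequence(current: int, action: int, n_phases: int = 4) -> list:
    """Phase sequence produced by one 25 s control decision.

    action == 0 maintains the current phase; action == 1 inserts a 3 s
    yellow phase (represented here by the string "yellow", an
    illustrative convention) before advancing to the next phase.
    """
    if action == 0:
        return [current]
    return ["yellow", (current + 1) % n_phases]
```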

- (3) Reward: The reward of a traffic signal agent is defined in terms of the summed waiting time of the vehicles on the lane set ${I}_{j}$. This definition aims to reduce the average waiting time of vehicles and thereby improve traffic efficiency.
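A minimal sketch of this reward follows, under the assumption that the summed waiting time enters with a negative sign so that maximizing the reward minimizes waiting:

```python
def signal_reward(waiting_times: list) -> float:
    """Reward of a traffic signal agent over its lane set I_j.

    waiting_times holds the accumulated waiting time of each vehicle on
    the incoming lanes; the negative sign is an assumption consistent
    with the stated goal of reducing average waiting time.
    """
    return -sum(waiting_times)
```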

## 3. Problem Formulation

- (1) The ego vehicle knows its position on the lane, its speed, and its acceleration;
- (2) Data packets can be transmitted from the traffic light to the ego vehicle through V2I communication without delay or packet loss;
- (3) The acceleration/deceleration of the ego vehicle can be controlled accurately by the onboard control module;
- (4) The ego vehicle does not change lanes, i.e., only longitudinal movement is considered.

## 4. Methodology

#### 4.1. MDP Model

- (1) State: The ego vehicle selects its action according to the state, so the definition of the state is crucial. The state for this problem should include the ego vehicle's own dynamics, the state of its surrounding vehicles, and the state of the traffic lights. Therefore, a multi-element vector is built to serve as the state of the ego vehicle:
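The exact state vector did not survive extraction, so the sketch below only illustrates the three stated components (own dynamics, surrounding vehicles, traffic-light state); every field name is a hypothetical placeholder:

```python
from dataclasses import dataclass

@dataclass
class EgoState:
    """Illustrative multi-element state vector of the ego vehicle."""
    position: float        # position on the lane (m)
    speed: float           # ego speed (m/s)
    acceleration: float    # ego acceleration (m/s^2)
    leader_gap: float      # distance to the preceding vehicle (m)
    leader_speed: float    # speed of the preceding vehicle (m/s)
    tl_phase: int          # shared state of the traffic-light agent
    tl_queue_ratio: float  # queue length ratio reported by the light

    def to_vector(self) -> list:
        """Flatten to the vector form consumed by the policy network."""
        return [self.position, self.speed, self.acceleration,
                self.leader_gap, self.leader_speed,
                float(self.tl_phase), self.tl_queue_ratio]
```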

- (2) Action: The action is consistent with the control variable in the optimal control problem, that is, the acceleration of the ego vehicle. The action space should satisfy Equation (10) to generate an effective value. Meanwhile, the speed of the agent should also satisfy Equation (9), which means that the speed update of the vehicle is given by:$$v\left(t\right)=\max\left(\min\left({v}_{max},\ v\left(t-1\right)+{\widehat{a}}_{t}\right),\ 0\right),$$$${a}_{t}=\min\left({\widehat{a}}_{t},\ {a}^{CF}\left(t\right)\right),$$
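The two update rules above can be sketched directly; `v_max` defaults to the 13.89 m/s road speed limit listed in the paper's parameter table, and the function name is illustrative:

```python
def apply_action(v_prev: float, a_hat: float, a_cf: float,
                 v_max: float = 13.89) -> tuple:
    """Execute one raw policy acceleration a_hat.

    The speed follows v(t) = max(min(v_max, v(t-1) + a_hat), 0), and the
    executed acceleration is a_t = min(a_hat, a_cf), where a_cf is the
    safe acceleration a^CF(t) from the car-following model.
    """
    v_t = max(min(v_max, v_prev + a_hat), 0.0)  # clamp speed to [0, v_max]
    a_t = min(a_hat, a_cf)                      # never exceed the safe value
    return v_t, a_t
```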

- (3) Reward: The definition of the reward function should be similar to the cost function (6) to obtain the same control effect. However, the travel time term in (6) is a Mayer term, which can only be evaluated at the end of the process; in other words, its value is available only once the ego vehicle reaches the stop line. In our RL framework, it is difficult to allocate credit to every step because the contribution of each step to the travel time is hard to measure. Thus, we use the travel distance in each step, denoted as ${l}^{d}\left(t\right)$, in place of that value. In addition, we add jerk, defined as the time derivative of the acceleration, to the reward function with the intent of improving driving comfort. The reward function is given by:$${r}^{v}\left(t\right)=-{\omega}_{1}\varphi \left(v\left(t\right),a\left(t\right)\right)+{\omega}_{2}{l}^{d}\left(t\right)-{\omega}_{3}\dot{a}\left(t\right)-\delta \frac{1}{\epsilon},$$
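This reward can be sketched as follows. The default weights are one of the combinations explored in the paper's sensitivity study ($\omega_1=1$, $\omega_2=3$, $\omega_3=7$), and interpreting the indicator $\delta$ as a stop/violation flag is an assumption:

```python
def vehicle_reward(energy_rate: float, dist: float, jerk: float,
                   penalized: bool, w1: float = 1.0, w2: float = 3.0,
                   w3: float = 7.0, eps: float = 0.01) -> float:
    """One-step reward r^v(t) of the ego vehicle.

    energy_rate plays the role of phi(v(t), a(t)); dist is the per-step
    travel distance l^d(t); jerk is the time derivative of acceleration.
    The penalty 1/eps (eps = 0.01 in the parameter table) is applied
    only when the indicator delta is active.
    """
    delta = 1.0 if penalized else 0.0
    return -w1 * energy_rate + w2 * dist - w3 * jerk - delta / eps
```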

#### 4.2. Proximal Policy Optimization

**Algorithm 1** PPO for eco-driving training

1. Initialize policy parameters ${\theta}^{p}$ and value function parameters ${\theta}^{v}$ with random values
2. **for** $k=1,2,3,\dots,K$ **do**
3. Carry out the preloading process and insert the ego vehicle into the road network
4. Control the vehicle according to policy $\pi\left({\theta}^{p}\right)$ and collect the set of trajectories ${\mathcal{D}}_{k}=\left\{{\eta}_{i}\right\}$
5. Compute the reward-to-go values ${G}_{t}$
6. Compute advantage estimates ${\widehat{A}}_{t}$ (optimized by GAE) based on the current value function ${V}_{{\theta}^{v}}$
7. Update the policy by maximizing the clipped surrogate objective via the Adam optimizer: ${\theta}_{k+1}^{p}=\arg\underset{{\theta}^{p}}{\max}\frac{1}{\left|{\mathcal{D}}_{k}\right|T}\sum_{{\mathcal{D}}_{k}}\sum_{t=0}^{T}\min\left(\frac{{\pi}_{{\theta}^{p}}\left({a}_{t}|{s}_{t}\right)}{{\pi}_{{\theta}_{k}^{p}}\left({a}_{t}|{s}_{t}\right)}{A}^{{\pi}_{{\theta}_{k}^{p}}}\left({s}_{t},{a}_{t}\right),\ g\left(\epsilon,{A}^{{\pi}_{{\theta}_{k}^{p}}}\left({s}_{t},{a}_{t}\right)\right)\right)$
8. Fit the value function by regression to minimize the mean-squared error via the Adam optimizer: ${\theta}_{k+1}^{v}=\arg\underset{{\theta}^{v}}{\min}\frac{1}{\left|{\mathcal{D}}_{k}\right|T}\sum_{{\mathcal{D}}_{k}}\sum_{t=0}^{T}{\left({V}_{{\theta}^{v}}\left({s}_{t}\right)-{G}_{t}\right)}^{2}$
9. **end for**
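Lines 5 and 7 of the algorithm can be sketched in numpy, assuming $g(\epsilon, A)$ is the standard PPO clipping term $\mathrm{clip}(\mathrm{ratio},\,1-\epsilon,\,1+\epsilon)\,A$; the clip value defaults to the paper's policy clip parameter ${\epsilon}_{1}=0.3$, and the discount $\gamma$ is an illustrative choice:

```python
import numpy as np

def ppo_policy_loss(ratio: np.ndarray, adv: np.ndarray,
                    clip_eps: float = 0.3) -> float:
    """Clipped surrogate objective of line 7, negated for minimization.

    ratio is pi_theta(a|s) / pi_theta_k(a|s) per step; adv is the
    advantage estimate A^{pi_theta_k}(s, a).
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -float(np.mean(np.minimum(ratio * adv, clipped)))

def reward_to_go(rewards: list, gamma: float = 0.99) -> list:
    """Reward-to-go values G_t of line 5: discounted future return."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]
```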

#### 4.3. SHapley Additive exPlanations

## 5. Simulation Analysis

#### 5.1. Simulation Configuration

#### 5.2. Simulation and Discussion

#### 5.3. Sensitivity Analysis

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

| Description | Value |
|---|---|
| Multi-step returns | $3$ |
| Distributional atoms | $51$ |
| Distributional min/max values | $\left[-50,\ 50\right]$ |
| Initial value of noisy net | $0.5$ |
| Prioritization exponent | $0.5$ |
| Prioritization importance sampling | $0.4\to 1.0$ |
| Learning rate of the Adam optimizer | $0.001$ |


**Figure 7.** Changes in reward, energy consumption, and travel time during the training processes of PPO and RAINBOW.

**Figure 8.** The evaluation results for PPO, RAINBOW, and IDM. (**a**) Travel time; (**b**) Energy consumption; (**c**) Mean jerk.

| Symbol | Description | Value |
|---|---|---|
| ${v}_{max}$ | Road speed limit | $13.89\ \mathrm{m}/\mathrm{s}$ |
| ${a}_{max}$ | Maximal acceleration of vehicles | $3.0\ \mathrm{m}/{\mathrm{s}}^{2}$ |
| ${d}_{max}$ | Maximal deceleration of vehicles | $-4.5\ \mathrm{m}/{\mathrm{s}}^{2}$ |
| $\Delta {v}_{d}$ | Default value for speed difference | $-14\ \mathrm{m}/\mathrm{s}$ |
| $\Delta {a}_{d}$ | Default value for acceleration difference | $-7.2\ \mathrm{m}/{\mathrm{s}}^{2}$ |
| $\epsilon$ | Value of the penalty term in Equation (18) | $0.01$ |
| $\chi$ | GAE parameter | $1.0$ |
| $-$ | Learning rate of the Adam optimizer | $0.0001$ |
| ${\epsilon}_{1}$ | Clip parameter for policy | $0.3$ |
| ${\epsilon}_{2}$ | Clip parameter for value function | $10.0$ |

| ${\omega}_{1}$ | ${\omega}_{2}$ | ${\omega}_{3}$ | Energy Consumption (Wh) | Travel Time (s) | Mean Jerk (m/s³) |
|---|---|---|---|---|---|
| 1 | 1 | 4 | 77.47 | 650.10 | 0.54 |
| 1 | 1 | 7 | 72.32 | 644.10 | 0.56 |
| 1 | 1 | 10 | 80.59 | 567.30 | 0.47 |
| 1 | 2 | 4 | 72.56 | 623.05 | 0.53 |
| 1 | 2 | 7 | 76.70 | 593.80 | 0.73 |
| 1 | 2 | 10 | 81.18 | 583.15 | 0.68 |
| 1 | 3 | 4 | 67.43 | 554.30 | 0.66 |
| 1 | 3 | 7 | 66.90 | 536.65 | 0.50 |
| 1 | 3 | 10 | 76.64 | 440.50 | 0.60 |
| 3 | 1 | 4 | 75.83 | 607.30 | 0.62 |
| 3 | 1 | 7 | 75.43 | 650.10 | 0.54 |
| 3 | 1 | 10 | 77.36 | 695.20 | 0.66 |
| 3 | 2 | 4 | 71.13 | 631.85 | 0.58 |
| 3 | 2 | 7 | 74.73 | 673.65 | 0.76 |
| 3 | 2 | 10 | 72.75 | 625.95 | 0.64 |
| 3 | 3 | 4 | 74.57 | 530.10 | 0.56 |
| 3 | 3 | 7 | 68.50 | 545.30 | 0.57 |
| 3 | 3 | 10 | 72.76 | 573.85 | 0.61 |
| 5 | 1 | 4 | 69.43 | 651.95 | 0.47 |
| 5 | 1 | 7 | 69.67 | 667.70 | 0.61 |
| 5 | 1 | 10 | 63.87 | 609.95 | 0.63 |
| 5 | 2 | 4 | 70.96 | 533.25 | 0.53 |
| 5 | 2 | 7 | 69.58 | 536.65 | 0.50 |
| 5 | 2 | 10 | 68.56 | 554.75 | 0.37 |
| 5 | 3 | 4 | 66.47 | 595.10 | 0.40 |
| 5 | 3 | 7 | 66.05 | 563.95 | 0.36 |
| 5 | 3 | 10 | 72.76 | 476.65 | 0.61 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Jiang, X.; Zhang, J.; Wang, B.
Energy-Efficient Driving for Adaptive Traffic Signal Control Environment via Explainable Reinforcement Learning. *Appl. Sci.* **2022**, *12*, 5380.
https://doi.org/10.3390/app12115380
