Article

Deep Reinforcement Learning-Based Real-Time Joint Optimal Power Split for Battery–Ultracapacitor–Fuel Cell Hybrid Electric Vehicles

Department of Computer Science, Hanyang University, Seoul 04763, Korea
* Author to whom correspondence should be addressed.
Electronics 2022, 11(12), 1850; https://doi.org/10.3390/electronics11121850
Submission received: 1 May 2022 / Revised: 26 May 2022 / Accepted: 8 June 2022 / Published: 10 June 2022

Abstract

Hybrid energy storage systems for hybrid electric vehicles (HEVs) consisting of multiple complementary energy sources are becoming increasingly popular, as they reduce the risk of running out of electricity and increase the overall lifetime of the battery. However, designing an efficient power split optimization algorithm for HEVs is a challenging task due to their complex structure. Thus, in this paper, we propose a model that jointly learns the optimal power split for a battery/ultracapacitor/fuel cell HEV. Concerning the mechanical system of the HEV, two propulsion machines with complementary operating characteristics are employed to achieve higher efficiency. To train and evaluate the model, standard driving cycles and real driving cycles are used as inputs to the mechanical system. Given these inputs, a temporal attention long short-term memory model predicts the velocity of the next time step, from which the predicted load power and its corresponding optimal power split are computed by a soft actor–critic deep reinforcement learning model whose training phase is aided by shaped reward functions. In contrast to global optimization techniques, predicting the velocity and load power locally, without future knowledge of the driving cycle, is a step toward real-time optimal energy management. The experimental results show that the proposed method is robust to different initial state-of-charge values, allocates power to the energy sources more effectively, and thus better manages the state of charge of the battery and the ultracapacitor. Additionally, the use of two motors significantly increases the efficiency of the system, and the prediction step is shown to be a reliable way to plan the HESS power split in advance.

1. Introduction

The basic operating principle of internal combustion engine (ICE) vehicles involves transforming energy from fossil fuels into thermal energy. During this combustion, the generated gases are released into the atmosphere, with a negative impact on the environment and human health. In particular, the transportation sector accounted for 29% of total United States greenhouse gas (GHG) emissions in 2019 [1]. Furthermore, there is growing concern regarding the scarcity of fossil fuels and the need to build a sustainable economy based on renewable energy sources. The Sustainable Development Goals (SDGs) established in 2015 by the United Nations General Assembly have sustainable energy at their core; the seventh goal specifically emphasizes the need for renewable energy sources [2]. In light of society's growing concern with environmental issues, the development of battery electric vehicles (BEVs) is a step toward fulfilling the seventh SDG.
The engine is the primary distinction between a traditional internal combustion engine (ICE) car and an electric vehicle (EV). The latter is powered by an electric motor that transforms the chemical energy stored in rechargeable batteries into electrical energy, which is then converted into mechanical and kinetic energy. EVs also have multiple advantages over ICE vehicles: they are efficient, emit no tailpipe pollutants, and produce little noise pollution. In addition, recharging an EV is cheaper than refueling an ICE car [3,4].
Electric vehicles include battery electric vehicles (BEVs), hybrid electric vehicles (HEVs), plug-in hybrid electric vehicles (PHEVs), and fuel cell hybrid electric vehicles (FCHEVs), depending on the energy source of the vehicle [5,6].
Fuel cells are considered an ideal energy source for electric vehicles because of their high efficiency and zero emissions [7]. However, the refueling infrastructure for fuel cells is very limited compared to charging stations for EVs and gas stations for ICE vehicles. Moreover, the cost of refueling a fuel cell vehicle is much higher than that of charging an electric vehicle. Because of these shortcomings, FCHEVs require additional energy sources.
Batteries are the most widely used energy source in EVs because of their high energy density [8,9]. However, they have disadvantages such as long charging times, high prices, and high temperature sensitivity. Therefore, solid-state batteries with better performance and shorter charging times than conventional batteries have recently been the subject of continuous study [10,11].
Ultracapacitors (UCs) have high power density and very long lifetimes and are largely insensitive to temperature, so they are suitable for devices with high peak currents [12]. However, their low energy density, the need for voltage balancing, and high self-discharge are major drawbacks. One of the most promising solutions proposed for FCHEVs is a hybrid energy storage system (HESS) consisting of three energy sources: a fuel cell (FC), a battery, and an ultracapacitor [13].
Since HESS structures for EVs are diverse, a different energy management system (EMS) control strategy is required for each HESS structure [14]. One of the most common HESS structures consists of a battery and a UC [15]. Other studies have reviewed HESSs consisting of an FC and a battery [16] or an FC and a UC [17]. As a more complex structure, hybrid energy storage systems that include all three (a battery, a UC, and an FC) have also been studied [18].
In addition, some studies [19,20] consider EMSs for a more complex HESS structure in which not only multiple energy storage units but also more than one propulsion machine are connected. To improve power performance and propulsion efficiency, two propulsion machines with complementary torque–speed characteristics can be used in the EV powertrain.
The energy management systems of electric vehicles with hybrid energy storage systems are largely classified into two types: rule-based and optimization-based [21]. Rule-based EMSs are divided into deterministic and fuzzy rule-based EMSs according to the characteristics of the rules used. Optimization-based EMSs can be classified into global optimization and real-time optimization EMSs according to the level of information available about the driving conditions. Global optimization-based methods have the advantage of finding the global optimum of the problem, but they demand prior knowledge of the entire driving cycle, which is unfeasible in a real-time optimization scenario.
Many previous studies focused on convex optimization techniques [19,20,22,23]. However, convex optimization of complex systems is very difficult due to the excessive number of parameters to consider and the difficulty of linearizing all subsystems. Therefore, a neural network (NN)-based machine learning algorithm was proposed in [20] to solve the multi-objective energy problem for a dual-motor battery/FC/UC vehicle. The control was performed in two steps: first, the load power was divided between the two complementary motors, and then it was split between the battery pack, the FC, and the UC in the HESS. For the HESS power split, a convex optimization technique was used to generate optimal target values, which became the training data for the NN. Convex optimization extended the battery life by about 5 years, whereas the NN model achieved about 92.5% of that extension.
Another technique that has been emerging in the area of HEV EMSs is deep reinforcement learning (DRL) [24,25,26,27,28]. These works explore different DRL techniques, such as deep Q-learning (DQN), soft actor–critic (SAC), and deep deterministic policy gradient (DDPG), for different HEV topologies. In particular, by applying reinforcement learning to real driving data, energy savings of about 16% compared to an existing binary control strategy were confirmed [28]. There, instead of simply following the rules set by the EMS, energy efficiency was improved by analyzing the vehicle's past driving cycles and dividing the power output of the ICE into 24 levels through reinforcement learning.
However, owing to its complex structure, we found only one work that addresses an EMS control strategy for a dual-motor battery/UC/FC HEV [20], and it relies on convex optimization and supervised machine learning rather than reinforcement learning. Additionally, although the authors of [27] develop a reinforcement learning-based model for an FC/battery/UC HEV, they use a deep Q-network and explore neither the use of multiple motors nor the forecasting of future load power. Thus, in this paper, a real-time deep reinforcement learning (SAC)-based EMS control strategy is proposed for the HESS of a vehicle with three energy sources (battery, FC, and UC) and two complementary motors. First, a method for real-time velocity and load power prediction using standard driving cycles (SDCs) and real driving cycles (RDCs) is proposed. This prediction step allows the EMS to better plan the use of the HEV's resources and prevent it from running out of electricity. Given the predicted load power, the SAC model is then used to distribute energy efficiently within the HESS. The experimental results confirm that, compared to traditional rule-based methods, the proposed method allocates the power of the energy sources more effectively, achieving good FC efficiencies and better managing the SOCs of the battery and the ultracapacitor. The model is also shown to be robust, as it can handle different initial SOC values while satisfying all the constraints of the system. Moreover, adding reward shaping to the training phase of the SAC agent accelerated its convergence. The use of two complementary motors also leads to a large improvement in vehicle efficiency compared to a single-motor architecture. Finally, the results of the prediction step show that the method reliably forecasts future speeds and load power and therefore helps the model allocate the resources of the HEV.

2. Preliminaries

2.1. Overall Procedure for a Reinforcement Learning-Based Energy Management Strategy (EMS)

Figure 1 illustrates the proposed method. The inputs of the neural network are the five past velocity and acceleration values of either standard driving cycles (SDCs) or real driving cycles (RDCs). Given these values, the neural network predicts the velocity and acceleration one time step ahead, which are used to compute the load powers of the two electric motors. These load powers are fed into the EMS, which distributes the load power between the battery, the fuel cell, and the ultracapacitor using deep reinforcement learning (DRL).

2.2. Input Data

For the input, both SDCs and RDCs were used. In particular, the worldwide harmonized light vehicles test cycle (WLTC) and the urban dynamometer driving schedule (UDDS) were used as SDCs. Despite their ease of use, they do not represent reality well enough, so they were mainly used for model validation and performance evaluation purposes. RDCs, on the other hand, overcome these limitations because of the higher degree of complexity associated with them. There are personal elements, such as the driver's behavior, which is hard to model because of its time-varying nature. There are also factors that do not depend on the driver, such as weather conditions (snowstorms, heavy rain, flooding), traffic signals, and traffic conditions (construction and accidents may slow down traffic). The RDC data, collected by an On-Board Diagnostics (OBD)-II dongle connected to one of the authors' vehicles, are the same as the data used by Hong et al. [29]. The four RDCs employed in this paper are shown in Figure 2.

2.3. Neural Network for Velocity Forecast

The problem that the neural network is trying to solve can be modeled as a univariate time series; that is, there is only one time-varying variable, the velocity of the vehicle. Despite the existing challenges in time series forecasting, it is an important field of study because knowing the future allows us to better plan short-term and long-term goals. In particular, in our work we predict the velocity to optimally allocate the available resources so that the energy storage components achieve maximum efficiency, their lifetime is maximized, and the vehicle does not run out of electricity during the trip.
Based on the previous work of Hong et al. [29], which used the same RDC dataset as this paper and evaluated different time series and deep learning models for velocity prediction (seasonal autoregressive integrated moving average (SARIMA), recurrent neural network (RNN), gated recurrent unit (GRU), and long short-term memory (LSTM)), we adopted the attention-based LSTM model with step size T = 5 and hidden state size m = 32 proposed by those authors, as it outperformed the other models tested.
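To make the forecasting setup concrete, the following is a minimal sketch of a temporal-attention LSTM predictor with step size T = 5 and hidden size m = 32. The layer choices and training configuration are illustrative assumptions and do not claim to reproduce the exact architecture of [29].

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

T, m = 5, 32  # step size and hidden state size from [29]

# Inputs: the past T (velocity, acceleration) pairs of the driving cycle.
inputs = layers.Input(shape=(T, 2))
hidden = layers.LSTM(m, return_sequences=True)(inputs)   # hidden state at each of the T steps
scores = layers.Dense(1)(hidden)                          # one attention score per step
weights = layers.Softmax(axis=1)(scores)                  # temporal attention weights
context = tf.reduce_sum(weights * hidden, axis=1)         # attention-weighted summary
v_next = layers.Dense(1)(context)                         # predicted next-step velocity

model = Model(inputs, v_next)
model.compile(optimizer="adam", loss="mae")

# Example call with a placeholder window; real inputs come from the SDC/RDC traces.
window = np.random.rand(1, T, 2).astype("float32")
print(model.predict(window, verbose=0).shape)   # (1, 1)
```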

2.4. Vehicle System Overview

The vehicle is mainly composed of the HESS and the mechanical components shown in Figure 3. The HESS is composed of three energy sources (battery/ultracapacitor/fuel cell), and the HEV has two complementary propulsion machines.

2.5. Power Split Optimization for the Propulsion Machines

The mechanical model of the vehicle is computed by Equations (1)–(14), and its parameters are given in Table 1. The acceleration values needed for the equations are obtained by differentiating the forecast velocity values from Section 2.3. These acceleration values are then used to compute the total force F_t acting on the vehicle, given by Equation (1), which consists of the rolling resistance F_rr, the aerodynamic drag F_ad, the grading resistance F_gr, and the linear acceleration force F_la. Given the total force, the load power and the power loss for each motor are computed according to the motor efficiencies. Then, combining Equations (1)–(14) with the power split optimization procedure proposed in [19], the optimal power split and losses for the two motors are computed.
In this paper, the motors chosen are the Toyota Camry MG1 [30] and the UQM PowerPhase 125 [31]. Their parameters are shown in Table 2 and their efficiency curves may be seen in Figure 4. Finally, Figure 5 shows the output of the method (load power and losses) for the two motors for the four RDCs shown in Figure 2.
F_t = F_{rr} + F_{ad} + F_{gr} + F_{la} (1)
F_{rr} = C_r \, m \, g \cos(\alpha) (2)
C_r = a_0 + a_1 v + a_2 v^2 + a_3 v^3 + a_4 v^4 + a_5 v^5 (3)
F_{ad} = \frac{1}{2} \rho A C_d v^2 (4)
F_{gr} = m \, g \sin(\alpha) (5)
F_{la} = \delta \, m \, a (6)
T_t = F_t \cdot r_{wh} (7)
P_t = F_t \cdot v (8)
\omega_w = \frac{v}{r_{wh}} (9)
\tau = \frac{r_{wh} \, F_t}{G_r} (10)
\omega_m = \frac{v}{r_{wh}} G_r (11)
P_{Load} = \tau \, \omega_m (12)
P_{loss,i}(t) = \begin{cases} \omega(t) \, \tau_i(t) \, (1 - \eta_i(t)), & \tau_i(t) \geq 0 \\ \omega(t) \, \tau_i(t) \, \frac{1 - \eta_i(t)}{\eta_i(t)}, & \tau_i(t) < 0 \end{cases}, \quad i \in \{1, 2\} \text{ is the motor index} (13)
\eta_i = \mathrm{lookup2d}(\text{Torque (Nm)}, \text{Speed (rpm)}) (14)
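As a concrete illustration of Equations (1)–(12), the sketch below maps a velocity/acceleration sample to the motor-shaft load power using the Table 1 parameters. The rotational inertia factor δ is assumed to be 1 here, which is a simplification rather than a value taken from the paper.

```python
import numpy as np

# Table 1 parameters; delta (rotational inertia factor) is assumed to be 1 for simplicity.
m_veh, r_wh, g, rho, A, alpha = 1650.0, 0.33602, 9.81, 1.2, 2.12344, 0.0
a_coef = [8.8e-3, 6.42e-5, 9.27e-6, 3.3e-7, 6.68e-11, 4.46e-11]
C_d, G_r, delta = 0.24, 5.5, 1.0

def load_power(v, a):
    """Load power (W) at the motor shaft for velocity v (m/s) and acceleration a (m/s^2)."""
    C_r = sum(c * v**i for i, c in enumerate(a_coef))   # Eq. (3)
    F_rr = C_r * m_veh * g * np.cos(alpha)              # Eq. (2), rolling resistance
    F_ad = 0.5 * rho * A * C_d * v**2                   # Eq. (4), aerodynamic drag
    F_gr = m_veh * g * np.sin(alpha)                    # Eq. (5), grading resistance
    F_la = delta * m_veh * a                            # Eq. (6), linear acceleration force
    F_t = F_rr + F_ad + F_gr + F_la                     # Eq. (1), total force
    tau = r_wh * F_t / G_r                              # Eq. (10), motor torque
    omega_m = v / r_wh * G_r                            # Eq. (11), motor speed
    return tau * omega_m                                # Eq. (12), load power

print(load_power(v=15.0, a=0.5))   # example operating point
```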

2.6. Electrical Model of the Hybrid Energy Storage System (HESS)

The battery and the ultracapacitor packs adopted are based on the ones proposed by [19]. They are composed of, respectively, multiple K2 High Capacity lithium iron phosphate (LiFePO4) 26650P battery cells [32] and multiple Maxwell BCAP1500 ultracapacitors [33]. Their basic models are given, respectively, in Figure 6 and Figure 7; the parameters are listed in Table 3 and Table 4, and the equations used to compute the parameters are Equations (15)–(22) and (23)–(29).
Finally, the fuel cell (FC) and the DC/DC converter were modeled based on the system proposed in [20]. The DC/DC converter has a maximum output of 35 kW, the FC parameters are given in Table 5, and the FC stack's efficiency and power loss are computed according to Equations (30) and (31).
The two inverters for the two propulsion machines were modeled based on the datasheet [34]. They are insulated gate bipolar transistor (IGBT)-based pulse width modulated (PWM) inverters, and the inverter efficiency is assumed to be 97%.
V_T = N_s \, V_{cell} (15)
R_n = N_s \, R, \quad n = 1, \ldots, N_s (16)
R_T = \frac{R_n}{N_p} (17)
E_{cell} = V_{cell} \, C_{cell} (18)
C_T = C_{cell} \, N_p (19)
E_T = E_{cell} \, N_p \, N_s (20)
i_{bat} = \frac{V_T - \sqrt{V_T^2 - 4 R_T P_{bat}}}{2 R_T} (21)
SoC_{bat} = SoC_{bat,prev} + \beta \, \frac{\int i_{bat} \, dt}{C_T} (22)
V_T = N_s \, V_{cell} (23)
R_T = N_s \, R (24)
C_T = \frac{C_{cell}}{N_s} (25)
E_{cell} = 0.5 \, C_{cell} \, V_{cell}^2 (26)
E_T = E_{cell} \, N_s (27)
i_{UC} = \frac{V_T - \sqrt{V_T^2 - 4 R_T P_{UC}}}{2 R_T} (28)
SoC_{UC} = SoC_{UC,prev} + \beta \, \frac{\int i_{UC} \, dt}{C_T} (29)
\eta_{fc} = a \, \frac{P_{fc}}{P_{fc,nom}} + b (30)
P_{loss,fc} = P_{fc} \, (1 - \eta_{fc}) (31)
P_{loss,inv} = P_{inv} \, (1 - \eta_{inv}) (32)
P_{PM} = P_{inv} \, \eta_{inv} (33)
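The following sketch illustrates Equations (21)–(22), (28)–(29), and (30): the pack current solved from the requested power, the Coulomb-counting SOC update, and the piecewise FC efficiency of Table 5. The power term inside the square root, the sign convention of the coefficient β, and the 1 s time step are assumptions made for illustration.

```python
import math

def pack_current(P, V_T, R_T):
    """Current drawn from a pack with open-circuit voltage V_T and resistance R_T (Eqs. (21)/(28))."""
    return (V_T - math.sqrt(V_T**2 - 4.0 * R_T * P)) / (2.0 * R_T)

def soc_update(soc_prev, i, C_T, dt=1.0, beta=-1.0 / 3600.0):
    """Coulomb-counting SOC update (Eqs. (22)/(29)); beta converts A*s to Ah and sets the discharge sign."""
    return soc_prev + beta * i * dt / C_T

def fc_efficiency(P_fc, P_nom=8000.0):
    """Piecewise-linear FC efficiency from Table 5, with a breakpoint at 15% of nominal power (Eq. (30))."""
    x = P_fc / P_nom
    return 2.0 * x + 0.4 if x <= 0.15 else -0.040625 * x + 0.620625

# Example: 10 kW drawn from the battery pack (V_T = 300 V, R_T = 23.5 mOhm, C_T = 93.6 Ah).
i_bat = pack_current(10e3, 300.0, 23.5e-3)
print(i_bat, soc_update(0.9, i_bat, 93.6), fc_efficiency(4000.0))
```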

3. Soft Actor–Critic (SAC) for HESS Power Split Optimization

3.1. Problem Overview

Reinforcement learning (RL) at its core involves an agent trying to learn decision making and control by interacting with an environment. Such interaction happens through actions taken by the agent, which can modify the environment and cause a state transition. As every action has an impact on the environment, each yields rewards to encourage good actions and discourage bad ones. The choice of action for each state depends on the policy of the agent, which maps the states to the probabilities of choosing each action given a specific state.
Therefore, the problem of power split optimization for the presented HESS can be modeled as a deep reinforcement learning problem. As shown in the next sections, the agent of our proposed model must learn the best possible policy to maximize the rewards of the system as a whole. Moreover, as it is not possible to map all possible state–action pairs of the environment to their respective rewards, we use deep neural networks (DNNs) to approximate them. Finally, all the code related to the deep reinforcement learning-based model was implemented in the Python programming language using TensorFlow's TF-Agents framework.

3.2. Agent, Environment, Action and State

In our case, the HESS is the agent and the HEV is the environment. The HEV is supposed to move for a specified amount of time according to a driving cycle with varying velocity, while optimizing the fuel cell's efficiency and minimizing the battery power magnitude and variation. However, initially the HEV does not know how to distribute the load power computed in Section 2.5 among the three energy storage systems (battery, ultracapacitor, and fuel cell). The power split is performed by the HESS, which must decide among infinitely many possible combinations of battery, ultracapacitor, and fuel cell powers. Each chosen combination is an action performed by the agent, which leads to a state transition. Thus, in this paper, the state and action spaces are given, respectively, by S = {P_load, SOC_bat, SOC_UC} and A = {P_bat,%, P_UC,%, P_FC,%}. The actions are not absolute power values but continuous values ranging from zero to one, which are transformed through the softmax function into percentages summing to one. For instance, if P_load = 10 kW and {P_bat,%, P_UC,%, P_FC,%} = {0.8, 0.5, 0.2}, the softmax function outputs approximately {0.44, 0.32, 0.24}, which corresponds to absolute power values {P_bat, P_UC, P_FC} = {4.4 kW, 3.2 kW, 2.4 kW}.
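A small numerical sketch of this action-to-power mapping is given below; it reproduces the worked example above.

```python
import numpy as np

def split_load(action, p_load):
    """Map raw agent actions in [0, 1] to absolute powers via a softmax scaled by the load power."""
    exp = np.exp(action - np.max(action))   # numerically stable softmax
    ratios = exp / exp.sum()
    return ratios * p_load                  # [P_bat, P_UC, P_FC] in watts

# P_load = 10 kW and action = {0.8, 0.5, 0.2} give roughly {4.4, 3.2, 2.4} kW, matching the text.
print(split_load(np.array([0.8, 0.5, 0.2]), 10e3))
```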

3.3. Rewards and Penalties

The learning process of the agent is driven by rewards and penalties depending on how good the agent's actions are. In our case, the agent should ideally satisfy all of the constraints (35)–(41), which are based on [20]. The parameters related to the constraints may be found in Table 6, and the objective function to be optimized is shown in Equation (34), where the coefficients a to f are positive penalty coefficients.
g = a \, P_{bat} + b \, P_{UC} + c \, P_{FC} + d \, SOC_{bat}^2 + e \, SOC_{UC}^2 + f \, \eta_{FC}^2 (34)
P_{min,bat} \leq P_{bat}(t) \leq P_{max,bat} (35)
P_{min,bat\_diff} \leq P_{bat}(t) - P_{bat}(t-1) \leq P_{max,bat\_diff} (36)
P_{min,DC} \leq P_{DC}(t) \leq P_{max,DC} (37)
P_{min,FC} \leq P_{FC}(t) \leq P_{max,FC} (38)
P_{min,UC} \leq P_{UC}(t) \leq P_{max,UC} (39)
P_{bat}(t) + P_{FC}(t) - P_{DC}(t) = P_{M1}(t) (40)
P_{UC}(t) + P_{DC}(t) = P_{M2}(t) (41)
Considering the constraints (35)–(41), terminal states were designed. A terminal state or action occurs whenever SOC_bat, SOC_UC, P_bat, P_UC or P_FC falls outside the minimum and maximum values allowed according to Table 6. Thus, whenever the vehicle reached an illegal state or tried to perform an illegal action, the environment was reset, and the agent received a penalty proportional to the length of the driving cycle. As the driving cycles used were all shorter than 2000 steps, the agent receives a penalty of 1000 and the environment is reset. On the other hand, if the states and actions chosen are valid, the vehicle receives a reward according to r = SOC_bat^2 + SOC_UC^2 + η_fc^2 + g(x), where g(x) = 1 - (d/d_max)^p is the output of the shaped reward function, as explained below.
Reward shaping is a technique that involves changing the structure of a sparse reward function to offer more regular feedback to the agent [35] and thus accelerate the learning process. Figure 8 shows an example of a sparse and a shaped reward function.
In our work, we designed three reward functions and one penalty function. Equations (42)–(44) show the reward functions, and Equation (45) represents the penalty function, for different values of d and d_max in g(x) = 1 - (d/d_max)^p. The value of p was set to 0.4 after hyperparameter tuning.
All four functions are activated depending on the SOC of the ultracapacitor, to encourage higher or lower power from the three energy sources. Their thresholds and descriptions are shown in Table 7.
g_1(x) = 1 - \left( \frac{P_{max,UC} - P_{UC}}{P_{max,UC}} \right)^{0.4} (42)
g_2(x) = 1 - \left( \frac{P_{max,FC} - P_{FC}}{P_{max,FC}} \right)^{0.4} (43)
g_3(x) = 1 - \left( \frac{P_{max,bat} - P_{bat}}{P_{max,bat}} \right)^{0.4} (44)
g_4(x) = 1 - \left( \frac{P_{max,UC} - P_{UC}}{P_{max,UC}} \right)^{0.4} (45)
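The sketch below shows how the shaped term g(x) = 1 - (d/d_max)^0.4 can be evaluated and selected according to the SOC_UC thresholds of Table 7. The exact way the reward terms (42)–(44) and the penalty (45) are combined into a single scalar is not spelled out in the text, so the combination used here is an assumption for illustration only.

```python
# Power limits from Table 6 and the tuned exponent p = 0.4.
P_MAX_UC, P_MAX_FC, P_MAX_BAT = 30e3, 8e3, 30e3
P_EXP = 0.4

def shaped(d, d_max):
    """Shaped term g(x) = 1 - (d / d_max)^p; abs() guards against negative d during regeneration."""
    return 1.0 - (abs(d) / d_max) ** P_EXP

def shaped_reward(soc_uc, p_load, p_bat, p_uc, p_fc):
    if soc_uc > 0.65 or p_load < 0:
        # g1: encourage high UC power (or absorbing regenerative power into the UC)
        return shaped(P_MAX_UC - p_uc, P_MAX_UC)
    # SOC_UC below the threshold: encourage FC and battery power, penalize UC power
    g2 = shaped(P_MAX_FC - p_fc, P_MAX_FC)
    g3 = shaped(P_MAX_BAT - p_bat, P_MAX_BAT)
    g4 = shaped(P_MAX_UC - p_uc, P_MAX_UC)
    return g2 + g3 - g4   # assumed combination of rewards (42)-(44) and penalty (45)

print(shaped_reward(soc_uc=0.7, p_load=12e3, p_bat=5e3, p_uc=6e3, p_fc=1e3))
```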

3.4. Soft Actor–Critic (SAC)

The essence of solving a reinforcement learning problem lies in optimizing the trade-off between exploration and exploitation. In contrast to supervised learning, in RL there are no labels, and the agent must learn to satisfy all the rules of the environment through exploration alone. During the training stage, the model initially performs random actions and gradually finds an optimal balance between exploration and exploitation, and thus the optimal policy. During the testing stage, on the other hand, the model performs only exploitation; that is, it acts according to the optimal policy learned during training.
The soft actor–critic (SAC) model [36] is an off-policy DRL method that finds the optimal balance between exploration and exploitation by maximizing both the reward and the entropy of the policy. By maximizing the entropy, the model is encouraged to keep exploring, assigning similar probabilities to actions with similar action values rather than excessively large probabilities to a specific set of actions. By maximizing the rewards, the model strives toward the optimal policy. Therefore, given the large action and state spaces of our problem (see Figure 5 and Table 6), we consider the SAC model appropriate for learning the power split policy of the HESS.
The architecture chosen for our SAC agent is a two-network design without shared features between them: one for the actor and one for the critic. Their parameters after hyperparameter tuning are shown in Table 8.
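For reference, the following sketch builds a SAC agent with the Table 8 hyperparameters using TensorFlow's TF-Agents, the framework the implementation is stated to use. The observation and action specifications follow Section 3.2; the specific network classes and optimizers are illustrative assumptions rather than the authors' exact code.

```python
import tensorflow as tf
from tf_agents.agents.ddpg import critic_network
from tf_agents.agents.sac import sac_agent, tanh_normal_projection_network
from tf_agents.networks import actor_distribution_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Observation: {P_load, SOC_bat, SOC_UC}; action: three raw split ratios in [0, 1] (Section 3.2).
obs_spec = tensor_spec.TensorSpec([3], tf.float32, name="observation")
act_spec = tensor_spec.BoundedTensorSpec([3], tf.float32, minimum=0.0, maximum=1.0, name="action")
time_step_spec = ts.time_step_spec(obs_spec)

# Actor and critic networks, each with two hidden layers of 256 ReLU units (Table 8).
actor_net = actor_distribution_network.ActorDistributionNetwork(
    obs_spec, act_spec, fc_layer_params=(256, 256),
    continuous_projection_net=tanh_normal_projection_network.TanhNormalProjectionNetwork)
critic_net = critic_network.CriticNetwork(
    (obs_spec, act_spec), joint_fc_layer_params=(256, 256))

agent = sac_agent.SacAgent(
    time_step_spec, act_spec,
    actor_network=actor_net, critic_network=critic_net,
    actor_optimizer=tf.keras.optimizers.Adam(3e-4),   # learning rate 3e-4
    critic_optimizer=tf.keras.optimizers.Adam(3e-4),
    alpha_optimizer=tf.keras.optimizers.Adam(3e-4),
    target_update_tau=0.005,                          # target smoothing coefficient
    target_update_period=1,                           # target update interval
    gamma=0.999,                                      # discount factor
    train_step_counter=tf.Variable(0))
agent.initialize()
```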

4. Experimental Results

4.1. SAC Agent Training Phase

The training phase of the SAC agent was performed using the WLTC class 3 data and the parameters shown in Table 8. Figure 9 shows the rewards over 1,400,000 iterations, while Figure 10 shows the rewards averaged over every 10,000 steps. The average reward converges to a value of about 800.
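A minimal training-loop sketch matching this setup (replay buffer of 1,000,000 transitions, batch size 2048, an averaged reward logged every 10,000 steps) is shown below. HessEnv is a hypothetical py_environment wrapping the HESS model described above and is not part of the paper's code; the warm-up length and the logged quantity are rough illustrative choices, not a reproduction of Figures 9 and 10.

```python
import numpy as np
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import tf_py_environment
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# `HessEnv` is hypothetical; `agent` is the SAC agent from the sketch in Section 3.4.
train_env = tf_py_environment.TFPyEnvironment(HessEnv(driving_cycle="WLTC_class_3"))

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    agent.collect_data_spec, batch_size=train_env.batch_size, max_length=1_000_000)

collect_driver = dynamic_step_driver.DynamicStepDriver(
    train_env, agent.collect_policy,
    observers=[replay_buffer.add_batch], num_steps=1)

# Warm up the buffer before sampling, then draw 2048-transition batches (Table 8).
for _ in range(5_000):
    collect_driver.run()
dataset = iter(replay_buffer.as_dataset(sample_batch_size=2048, num_steps=2).prefetch(3))

recent_rewards = []
for it in range(1, 1_400_001):
    collect_driver.run()                          # one environment step per training iteration
    experience, _ = next(dataset)
    loss_info = agent.train(experience)
    recent_rewards.append(float(experience.reward.numpy().mean()))
    if it % 10_000 == 0:                          # log an averaged reward every 10,000 steps
        print(it, np.mean(recent_rewards), float(loss_info.loss))
        recent_rewards = []
```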

4.2. SAC Agent Evaluation with SDCs

To evaluate the proposed method, a rule-based power split technique was also applied to the HESS for the WLTC class 1 standard driving cycle. The power split rule changed based on the sign of the load power: for positive load powers, ratios of {P_bat, P_UC, P_FC} = {40% P_load, 40% P_load, 20% P_load} were used, and in the case of regenerative braking, ratios of {P_bat, P_UC, P_FC} = {50% P_load, 50% P_load, 0% P_load} were used. The results are shown in Figure 11, Figure 12 and Figure 13.
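For clarity, the rule-based baseline reduces to the following fixed-ratio split (a direct restatement of the rule above):

```python
def rule_based_split(p_load):
    """Fixed-ratio baseline: 40/40/20 when driving, 50/50/0 under regenerative braking."""
    if p_load >= 0:
        return 0.4 * p_load, 0.4 * p_load, 0.2 * p_load   # P_bat, P_UC, P_FC
    return 0.5 * p_load, 0.5 * p_load, 0.0                # braking: the FC cannot absorb power

print(rule_based_split(10e3))   # (4000.0, 4000.0, 2000.0)
print(rule_based_split(-6e3))   # (-3000.0, -3000.0, 0.0)
```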
Both techniques yield similar results in terms of power split, but the rule-based technique struggles to manage the SOCs and currents of the ultracapacitor and the battery. The initial SOCs are 0.9, but after 1023 s of simulation, the final SOCs of the ultracapacitor are about 62.8% and 46.9% for the DRL and rule-based techniques, respectively, as shown in Figure 13. The latter value is below the acceptable range, as the theoretical minimum allowed SOC for the ultracapacitor is 50% (Table 6), owing to the approximately linear discharge behavior of the ultracapacitor.
Additionally, the SAC model is far more robust than the rule-based one. Figure 14 shows the SOCs of the battery and the UC when their initial values are 0.7. In contrast to the rule-based technique, the SAC model is able to find the optimal power split while satisfying the constraints of Table 6.
Finally, the DRL model also has the advantage of being able to perform well in new data without the need for retraining. It is shown in Section 4.3 and Section 4.4 that the model can obtain good results, even for different data on which it was never trained.

4.3. SAC Agent Evaluation with RDCs

As explained in Section 2.2, four RDCs were employed to evaluate the model. Table 9 shows the results obtained by the proposed method when prediction of the speed and load power is not considered. First, it is interesting to analyze SOC_UC. Two initial values were considered: 55% and 90%. In general, when the initial SOC_UC = 55%, the model could perform the optimal power split while respecting the constraints of Table 6, as the minimum SOC_UC did not fall below 50%. Moreover, the model focused on recharging the UC through regenerative braking, as the final SOC_UC was greater than the initial one. On the other hand, when the initial SOC_UC = 90%, the model allocated more power to the UC instead of recharging it, as the final SOC_UC of approximately 82% is smaller than the initial value.
Second, considering that the minimum and maximum FC efficiencies η_FC are, respectively, 40% and 62.1%, the obtained results in the range of 55.8% to 57.3% are satisfactory. We also plotted the FC powers for RDC1 with initial SOC_UC = 55% and SOC_UC = 90% in Figure 15 to better analyze the FC results. Notably, there is no significant difference between the two plots. However, one would expect the model to allocate more power to the FC for low values of SOC_UC, in order to reduce the use of the UC and prevent it from falling below the minimum of 50%. This means the power split method could be further optimized.
Third, compared to the Toyota Camry MG1-only configuration, there is a significant efficiency improvement when two motors are used, reaching up to 17.6% because the two motors are complementary.

4.4. Evaluation of the Prediction Method with RDCs

In this section, we evaluate the velocity and load power prediction method using the model described in Section 2.3 and Section 2.5, together with the SAC agent. Figure 16, Figure 17 and Figure 18 show the results obtained: the predicted powers and SOCs are not far from the actual ones. Table 10 provides further insight into the prediction method. It shows the mean absolute error (MAE) between the predicted and actual values for 10 different quantities: speed, load power, battery and ultracapacitor SOCs, fuel cell efficiency, battery and ultracapacitor currents, and battery, ultracapacitor and fuel cell powers. In general, the MAE is good for the speed, SOC and current predictions. In the case of the load and energy source powers, the mean deviation is relatively high (around 5000 W) because the prediction model is highly sensitive to differences in the acceleration values. Analogously, the deviation in the fuel cell efficiency is also relatively high (around 11% to 14%) because it is highly sensitive to small changes in the fuel cell power values. For instance, there is a pair of points (P_FC,predicted, P_FC,label) = (945.7 W, 0) and (η_FC,predicted, η_FC,label) = (61.5%, 0) where the large difference between the predicted and actual efficiencies can be clearly seen.
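For completeness, the MAE metric reported in Table 10 can be computed per quantity as in the short sketch below; the traces shown are placeholders, not data from the paper.

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error between a predicted and an actual trace of one quantity."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.mean(np.abs(predicted - actual)))

# Placeholder example with a short speed trace (m/s); real traces span the full RDC.
print(mae([10.1, 12.3, 13.0], [10.0, 12.0, 13.4]))   # approximately 0.267
```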

5. Conclusions

In this paper, a DRL-based method for real-time joint power split optimization for a battery/UC/FC HEV was proposed. First, a temporal attention LSTM (TA-LSTM) predicts the future velocity, which is converted into the required load power through the mechanical model. The load power is then optimally split between the two motors and among the battery, UC, and FC by the proposed SAC agent, which uses shaped reward functions to accelerate the training process. Compared to traditional rule-based techniques, the proposed method is robust to different initial SOC values and satisfies the system constraints. Moreover, the results show that using two complementary motors greatly increases the efficiency of the system. Finally, the average MAEs of the prediction step are reliable; therefore, the method may be used to plan the HESS power split in advance.
In the future, a better model of the HEV, including the auxiliary systems (air conditioning, lights, sound, power-assisted seats, windows), could be designed to compute a more precise load power. Additionally, the velocity forecast method currently predicts velocity only for the next time step; a method able to forecast the velocity over multiple time steps would allow the model to allocate its resources even more effectively. Finally, more emphasis could be placed on optimizing the fuel cell to increase its efficiency and its usage whenever the SOC of the ultracapacitor falls below the minimum operating value.

Author Contributions

Methodology, D.K. and S.H.; project administration, S.H.; software, D.K., S.H. and S.C.; supervision, I.J.; writing–original draft, D.K. and S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported partly by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2020-0-00107, Development of the technology to automate the recommendations for big data analytic models that define data characteristics and problems), and partly by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1I1A1A01058964).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. United States Environmental Protection Agency and Office of Transportation and Air Quality and Standards Division. Fast Facts: U.S. Transportation Sector Greenhouse Gas Emissions, 1990–2019; 2021. Available online: https://www.epa.gov/greenvehicles/fast-facts-transportation-greenhouse-gas-emissions (accessed on 30 April 2022).
2. United Nations Department of Economic and Social Affairs. Ensure Access to Affordable, Reliable, Sustainable and Modern Energy for All. Available online: https://sdgs.un.org/goals/goal7 (accessed on 1 April 2022).
3. Zuo, W.; Li, R.; Zhou, Z.; Li, Y.; Xia, J.; Liu, J. Battery-Supercapacitor Hybrid Devices: Recent Progress and Future Prospects. Adv. Sci. 2017, 4, 1600539.
4. Sanguesa, J.A.; Torres-Sanz, V.; Garrido, P.; Martinez, F.J.; Marquez-Barja, J.M. A Review on Electric Vehicles: Technologies and Challenges. Smart Cities 2021, 4, 372–404.
5. Li, H.; Ravey, A.; N’Diaye, A.; Djerdir, A. Equivalent consumption minimization strategy for hybrid electric vehicle powered by fuel cell, battery and supercapacitor. In Proceedings of the IECON 2016—42nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy, 23–26 October 2016; pp. 4401–4406.
6. Li, H.; Ravey, A.; N’Diaye, A.; Djerdir, A. A novel equivalent consumption minimization strategy for hybrid electric vehicle powered by fuel cell, battery and supercapacitor. J. Power Sources 2018, 395, 262–270.
7. Zhang, W.; Li, J.; Xu, L.; Ouyang, M. Optimization for a fuel cell/battery/capacity tram with equivalent consumption minimization strategy. Energy Convers. Manag. 2017, 134, 59–69.
8. Khan, F.; Oh, M.; Kim, J.H. N-functionalized graphene quantum dots: Charge transporting layer for high-rate and durable Li4Ti5O12-based Li-ion battery. Chem. Eng. J. 2019, 369, 1024–1033.
9. Zhang, M.; Liang, R.; Or, T.; Deng, Y.P.; Yu, A.; Chen, Z. Recent Progress on High-Performance Cathode Materials for Zinc-Ion Batteries. Small Struct. 2021, 2, 2000064.
10. Yao, P.; Yu, H.; Ding, Z.; Liu, Y.; Lu, J.; Lavorgna, M.; Wu, J.; Liu, X. Review on polymer-based composite electrolytes for lithium batteries. Front. Chem. 2019, 7, 522.
11. Sun, Y.K. Promising all-solid-state batteries for future electric vehicles. ACS Energy Lett. 2020, 5, 3221–3223.
12. Khan, F. N and S co-doped graphene enfolded Ni–Co-layered double hydroxides: An excellent electrode material for high-performance energy storage devices. RSC Adv. 2021, 11, 33895–33904.
13. Kasimalla, V.K.; Velisala, V. A review on energy allocation of fuel cell/battery/ultracapacitor for hybrid electric vehicles. Int. J. Energy Res. 2018, 42, 4263–4283.
14. Geetha, A.; Subramani, C. A comprehensive review on energy management strategies of hybrid energy storage system for electric vehicles. Int. J. Energy Res. 2017, 41, 1817–1834.
15. Sellali, M.; Abdeddaim, S.; Betka, A.; Djerdir, A.; Drid, S.; Tiar, M. Fuzzy-Super twisting control implementation of battery/super capacitor for electric vehicles. ISA Trans. 2019, 95, 243–253.
16. Mokrani, Z.; Rekioua, D.; Mebarki, N.; Rekioua, T.; Bacha, S. Proposed energy management strategy in electric vehicle for recovering power excess produced by fuel cells. Int. J. Hydrogen Energy 2017, 42, 19556–19575.
17. Majeed, M.A.; Khan, M.G.; Asghar, F. Nonlinear control of hybrid energy storage system for hybrid electric vehicles. Int. Trans. Electr. Energy Syst. 2020, 30, e12268.
18. Zhu, L.; Han, J.; Peng, D.; Wang, T.; Tang, T.; Charpentier, J.F. Fuzzy logic based energy management strategy for a fuel cell/battery/ultra-capacitor hybrid ship. In Proceedings of the 2014 First International Conference on Green Energy ICGE 2014, Sfax, Tunisia, 25–27 March 2014; pp. 107–112.
19. Yavasoglu, H.A.; Shen, J.; Shi, C.; Gokasan, M.; Khaligh, A. Power Split Control Strategy for an EV Powertrain with Two Propulsion Machines. IEEE Trans. Transp. Electrif. 2015, 1, 382–390.
20. Yavasoglu, H.A.; Tetik, Y.E.; Ozcan, H.G. Neural network-based energy management of multi-source (battery/UC/FC) powered electric vehicle. Int. J. Energy Res. 2020, 44, 12416–12429.
21. Zhang, P.; Yan, F.; Du, C. A comprehensive analysis of energy management strategies for hybrid electric vehicles based on bibliometrics. Renew. Sustain. Energy Rev. 2015, 48, 88–104.
22. Wu, X.; Hu, X.; Yin, X.; Li, L.; Zeng, Z.; Pickert, V. Convex programming energy management and components sizing of a plug-in fuel cell urban logistics vehicle. J. Power Sources 2019, 423, 358–366.
23. Choi, M.E.; Lee, J.S.; Seo, S.W. Real-Time Optimization for Power Management Systems of a Battery/Supercapacitor Hybrid Energy Storage System in Electric Vehicles. IEEE Trans. Veh. Technol. 2014, 63, 3600–3611.
24. Li, Y.; He, H.; Peng, J.; Wang, H. Deep reinforcement learning-based energy management for a series hybrid electric vehicle enabled by history cumulative trip information. IEEE Trans. Veh. Technol. 2019, 68, 7416–7430.
25. Xu, D.; Cui, Y.; Ye, J.; Cha, S.W.; Li, A.; Zheng, C. A soft actor-critic-based energy management strategy for electric vehicles with hybrid energy storage systems. J. Power Sources 2022, 524, 231099.
26. Xu, B.; Rathod, D.; Zhang, D.; Yebi, A.; Zhang, X.; Li, X.; Filipi, Z. Parametric study on reinforcement learning optimized energy management strategy for a hybrid electric vehicle. Appl. Energy 2020, 259, 114200.
27. Sun, H.; Fu, Z.; Tao, F.; Zhu, L.; Si, P. Data-driven reinforcement-learning-based hierarchical energy management strategy for fuel cell/battery/ultracapacitor hybrid electric vehicles. J. Power Sources 2020, 455, 227964.
28. Qi, X.; Luo, Y.; Wu, G.; Boriboonsomsin, K.; Barth, M.J. Deep reinforcement learning-based vehicle energy efficiency autonomous learning system. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1228–1233.
29. Hong, S.; Hwang, H.; Kim, D.; Cui, S.; Joe, I. Real Driving Cycle-Based State of Charge Prediction for EV Batteries Using Deep Learning Methods. Appl. Sci. 2021, 11, 11285.
30. Burress, T.A.; Coomer, C.L.; Campbell, S.L.; Seiber, L.E.; Marlino, L.D.; Staunton, R.H.; Cunningham, J.P. Evaluation of the 2007 Toyota Camry Hybrid Synergy Drive System; Oak Ridge National Laboratory: Oak Ridge, TN, USA, 2008.
31. UQM. “UQM PowerPhase 125”, Datasheet. Available online: https://wiki.neweagle.net/ProductDocumentation/EV_Software_and_Hardware/Traction_Inverters/UQM/PowerPhase_125_DataSheet.pdf (accessed on 30 April 2022).
32. K2 Energy. “High Capacity LFP26650P Power Cell Data”, Datasheet. Available online: http://liionbms.com/pdf/k2/LFP26650P.pdf (accessed on 30 April 2022).
33. Maxwell Technologies. “Datasheet: K2 Series Ultracapacitors”, Datasheet. Available online: https://mp36c.ru/pdf/library/datasheets/SC/K2%20series%20maxwell.pdf (accessed on 30 April 2022).
34. Rinehart Motion Systems, LLC. “Technical Datasheet of PM100 and PM150 Family Propulsion Inverter”, Datasheet. Available online: http://www.rinehartmotion.com/uploads/5/1/3/0/51309945/pm100-150datasheet_1.pdf (accessed on 30 April 2022).
35. Wiewiora, E. Reward Shaping. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2011.
36. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290.
Figure 1. Procedure of reinforcement learning-based EMS.
Figure 2. Speed and acceleration over time of the different time series collected by Hong et al. [29].
Figure 3. HESS architecture.
Figure 4. Efficiency maps: (a) Motor 1; (b) Motor 2. Adapted from [30,31].
Figure 5. Load power and losses for the two motors and the four RDCs: (a) load power and loss for motor 1 and RDC1; (b) load power and loss for motor 2 and RDC1; (c) load power and loss for motor 1 and RDC2; (d) load power and loss for motor 2 and RDC2; (e) load power and loss for motor 1 and RDC3; (f) load power and loss for motor 2 and RDC3; (g) load power and loss for motor 1 and RDC4; (h) load power and loss for motor 2 and RDC4.
Figure 6. Basic model of the battery.
Figure 7. Basic model of the ultracapacitor.
Figure 8. Reward shaping: (a) sparse reward function; (b) shaped reward function.
Figure 9. Rewards obtained by the SAC agent.
Figure 10. Average rewards computed every 10,000 steps.
Figure 11. Battery, ultracapacitor and fuel cell powers by rule-based optimization.
Figure 12. Battery, UC and FC powers by DRL.
Figure 13. Comparison of SOCs obtained by rule-based and DRL techniques when the initial SOCs are 0.9.
Figure 14. Comparison of SOCs obtained by rule-based and DRL techniques when the initial SOCs are 0.7.
Figure 15. FC powers for different initial SOC_UC for RDC1.
Figure 16. Predicted battery, ultracapacitor and fuel cell powers by DRL.
Figure 17. Actual battery, ultracapacitor and fuel cell powers by DRL.
Figure 18. Comparison of the predicted and actual SOCs obtained by DRL when the initial SOCs are 0.9.
Table 1. Mechanical model parameters.
Parameter | Description | Value
m | Mass of the vehicle (kg) | 1650
r_wh | Wheel radius (m) | 0.33602
g | Gravity acceleration (m/s^2) | 9.81
ρ | Air density (kg/m^3) | 1.2
A | Front area of vehicle (m^2) | 2.12344
α | Angle of driving surface (rad) | 0
a_0 | Rolling resistance coefficient | 8.8 × 10^-3
a_1 | Rolling resistance coefficient | 6.42 × 10^-5
a_2 | Rolling resistance coefficient | 9.27 × 10^-6
a_3 | Rolling resistance coefficient | 3.3 × 10^-7
a_4 | Rolling resistance coefficient | 6.68 × 10^-11
a_5 | Rolling resistance coefficient | 4.46 × 10^-11
C_d | Aerodynamic drag coefficient | 0.24
G_r | Gearbox ratio | 5.5
Table 2. Parameters of the motors.
Motor | Parameter | Value
Toyota Camry MG1 | Motor peak power rating (kW) | 105
 | Motor peak torque rating (Nm) | 270
 | Top rotational speed (rpm) | 14,000
 | Motor mass (kg) | 41.7
UQM PowerPhase 125 | Motor peak power rating (kW) | 125
 | Motor peak torque rating (Nm) | 300
 | Top rotational speed (rpm) | 8000
 | Motor mass (kg) | 41
Table 3. Battery parameters.
Parameter | Description | Value
R | Internal resistance | 9 mΩ
N_s | Number of serial cells | 94
N_p | Number of parallel branches | 36
V_T | Battery voltage (V) | 300
R_n, n = 1, 2, ..., N_s | Equivalent resistance of serial cells | 846 mΩ
R_T | Total equivalent resistance | 23.5 mΩ
V_cell | Cell voltage (V) | 3.2
E_cell | Maximum energy stored in a cell (Wh) | 8.32
C_cell | Capacity of a cell (Ah) | 2.6
E_T | Total energy stored (kWh) | 28.2
C_T | Total capacity (Ah) | 93.6
SOC_bat | SOC of the battery | -
i_bat | Current of the battery | -
Table 4. Ultracapacitor parameters.
Parameter | Description | Value
R | Internal resistance | 0.47 mΩ
N_s | Number of serial cells | 222
V_T | Ultracapacitor pack voltage (V) | 600
R_T | Total equivalent resistance | 103.34 mΩ
C_cell | Capacitance of a cell (F) | 1500
V_cell | Cell voltage (V) | 2.7
E_cell | Maximum energy stored in a cell (Wh) | 1.52
E_T | Total energy stored (kWh) | 0.338
C_T | Total capacitance (F) | 6.76
SOC_UC | SOC of the ultracapacitor | -
i_UC | Current of the ultracapacitor | -
Table 5. FC and DC/DC converter parameters.
Parameter | Description | Value
P_nom,fc | FC nominal power | 8 kW
a_1 | Slope (if P_fc/P_fc,nom ≤ 0.15) | 2
b_1 | Intercept (if P_fc/P_fc,nom ≤ 0.15) | 0.4
a_2 | Slope (if P_fc/P_fc,nom > 0.15) | −0.040625
b_2 | Intercept (if P_fc/P_fc,nom > 0.15) | 0.620625
η_fc | FC efficiency | Equation (30)
P_fc,loss | Loss in the FC | Equation (31)
η_inv | Inverter efficiency | 0.97
P_inv,loss | Loss in the inverter | Equation (32)
Table 6. Optimization constraint parameters.
Parameter | Description | Value
P_min,bat | Minimum battery power | −30 kW
P_max,bat | Maximum battery power | 30 kW
SOC_min,bat | Minimum battery SOC | 0.2
SOC_max,bat | Maximum battery SOC | 0.9
P_min,bat_diff | Minimum battery power difference | −9 kW
P_max,bat_diff | Maximum battery power difference | 9 kW
P_min,DC | Minimum DC/DC converter power | −35 kW
P_max,DC | Maximum DC/DC converter power | 35 kW
P_min,FC | Minimum FC power | 0
P_max,FC | Maximum FC power | 8 kW
P_min,UC | Minimum ultracapacitor power | −30 kW
P_max,UC | Maximum ultracapacitor power | 30 kW
SOC_min,UC | Minimum ultracapacitor SOC | 0.5
SOC_max,UC | Maximum ultracapacitor SOC | 1
Table 7. Shaped reward functions.
Function | Threshold | Description
g_1(x) | Activated when SOC_UC > 0.65, or when SOC_UC ≤ 0.65 and P_load < 0 | Encourages high UC power or regenerative power
g_2(x) | Activated when SOC_UC < 0.65 | Encourages high FC power
g_3(x) | Activated when SOC_UC < 0.65 | Encourages high battery power
g_4(x) | Activated when SOC_UC < 0.65 | Encourages low UC power
Table 8. SAC agent parameters.
Parameter | Value
Activation function | ReLU
Discount factor | 0.999
Learning rate | 3 × 10^-4
Batch size | 2048
Number of hidden layers | 2
Number of hidden units per layer | 256
Target smoothing coefficient | 0.005
Target update interval | 1
Gradient steps | 1
Replay buffer size | 1,000,000
Table 9. Model results without prediction step.
RDC | SOC_UC (Initial / Final / Min / Max) | SOC_bat (Initial / Final / Min / Max) | Mean η_FC | Motors efficiency (M1 only / M1 + M2 / Improvement)
1 | 55% / 61.2% / 50.7% / 61.2% | 55% / 54.5% / 54.5% / 55% | 55.8% | 69.6% / 79.8% / 14.7%
1 | 90% / 81.5% / 77.3% / 90% | 90% / 89.7% / 89.6% / 90% | 56% |
2 | 55% / 62.7% / 50.4% / 62.7% | 55% / 54.5% / 54.4% / 55% | 56.2% | 73.8% / 80.5% / 9.1%
2 | 90% / 82% / 75.2% / 90% | 90% / 89.7% / 89.6% / 90% | 56.4% |
3 | 55% / 62.5% / 50.2% / 62.9% | 55% / 54.4% / 54.3% / 55% | 57.3% | 68.3% / 80.3% / 17.6%
3 | 90% / 82% / 75.3% / 90% | 90% / 89.6% / 89.5% / 90% | 57.3% |
4 | 55% / 61.3% / 50% / 61.5% | 55% / 54.5% / 54.4% / 55% | 55.9% | 70% / 81.3% / 16.1%
4 | 90% / 81.5% / 75.6% / 90% | 90% / 89.7% / 89.6% / 90% | 56.2% |
Table 10. Model results with prediction step (MAE between predicted and actual values).
RDC | Speed (m/s) | P_load (W) | SOC_bat | SOC_UC | η_FC | i_bat (A) | i_UC (A) | P_bat (W) | P_UC (W) | P_FC (W)
1 | 0.261 | 4708.9 | 0.03% | 0.65% | 11.3% | 0.007 | 0.11 | 2778.9 | 1618.7 | 458.1
2 | 0.273 | 5818.7 | 0.02% | 0.68% | 13.1% | 0.007 | 0.13 | 3476.6 | 2012.5 | 553.9
3 | 0.293 | 5634.5 | 0.01% | 0.61% | 11.4% | 0.006 | 0.11 | 3456.5 | 1861.9 | 480.5
4 | 0.281 | 5641.1 | 0.02% | 0.32% | 14.3% | 0.003 | 0.14 | 3318.3 | 1912.1 | 522
Average | 0.277 | 5450.8 | 0.02% | 0.56% | 12.5% | 0.006 | 0.13 | 3257.6 | 1851.3 | 503.7