Electric Vehicle Lithium-Ion Battery Fault Diagnosis Based on Multi-Method Fusion of Big Data

Wang, Zhifu; Luo, Wei; Xu, Song; Yan, Yuan; Huang, Limin; Wang, Jingkai; Hao, Wenmei; Yang, Zhongyi

doi:10.3390/su15021120

by Zhifu Wang ¹,²,†, Wei Luo ²,†, Song Xu ¹, Yuan Yan ¹, Limin Huang ³,*, Jingkai Wang ¹, Wenmei Hao ¹ and Zhongyi Yang ²

¹ School of Mechanical and Vehicle Engineering, Beijing Institute of Technology, Beijing 100081, China

² School of Automation, Guangxi University of Science and Technology, Liuzhou 545000, China

³ School of Mechanical Engineering, ChengDu University, Chengdu 610106, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work and should be considered co-first authors.

Sustainability 2023, 15(2), 1120; https://doi.org/10.3390/su15021120

Submission received: 5 December 2022 / Revised: 26 December 2022 / Accepted: 4 January 2023 / Published: 6 January 2023

(This article belongs to the Special Issue Emerging Research in Intelligent New Energy Vehicles)

Abstract:

Power batteries are the core of electric vehicles, but minor faults can easily cause accidents; therefore, fault diagnosis of the batteries is very important. In order to improve the practicality of battery fault diagnosis methods, a fault diagnosis method for lithium-ion batteries in electric vehicles based on multi-method fusion of big data is proposed. First, early fault analysis is performed with t-distributed stochastic neighbor embedding (t-SNE), and anomalies are removed by wavelet transform denoising. Then, the vehicle features that have a large influence on battery faults are identified by factor analysis, and the fault features are extracted by a bidirectional long short-term memory network combined with a convolutional neural network (CNN-BiLSTM). Finally, a self-learning Bayesian network is used to diagnose the battery fault. The results show that the method improves the accuracy of fault diagnosis by about 12% when verified with data from different vehicles. Compared with other methods, it not only achieves higher fault diagnosis accuracy but also reduces the response time of fault diagnosis, and it shows superiority in graded fault diagnosis, which is more in line with practical engineering applications.

Keywords:

lithium-ion battery; electric vehicle; real-world vehicle data; fault diagnosis; data-driven; machine learning

1. Introduction

In the context of the global energy transformation and the proposal of “dual carbon”, lithium-ion batteries play a crucial role in the development of vehicle mileage and global energy storage technology. Lithium-ion batteries are widely used in intelligent devices and electric vehicles due to their high specific energy density, strong endurance and long life [1]. However, electric vehicle accidents caused by battery faults are threatening people’s lives and property worldwide. During the operation of electric vehicles, small failures in the battery are not easily detected, and continued operation increases the risk of generating more heat, which may trigger thermal runaway [2]. It is difficult to change the model and structure of the lithium-ion battery in a short time to improve safety; therefore, it is necessary to use real vehicle data to give a timely warning of lithium-ion battery failure.

The battery has nonlinear characteristics and a complex internal structure. When a fault occurs, the parameters that can be referred to are limited, such as current, voltage, state of charge (SOC) and temperature, which makes battery fault diagnosis more difficult [3]. The current methods for power battery fault diagnosis are mainly divided into knowledge-based, model-based and data-driven methods [4,5,6,7]. Knowledge-based diagnosis requires experience and a rule base for qualitative diagnosis, which are difficult to establish, so it is difficult to apply this type of diagnosis to most fault situations. Model-based fault diagnosis is performed by comparing the parameters obtained from an accurate established model with a set threshold.

For example, Xiong et al. [8] compared the residual generated by the difference in SOC with a set threshold in order to diagnose faults of the battery pack sensors. Pan et al. [9,10] used an observer to diagnose faults of the battery input current sensor based on an established thermoelectric coupling dynamic model of the battery, and used a norm-based residual evaluation function for evaluation. Schmid et al. [11] found the sensor set with the best fault isolation characteristics through structural analysis of the thermoelectric model of the battery, and carried out fault diagnosis by calculating the minimal structurally overdetermined (MSO) sets. Wang et al. [12] obtained a transition probability correction function based on the n-th variance of the model probability, and improved the accuracy of battery fault diagnosis by introducing a jump threshold of the model to achieve rapid model conversion. Dey et al. [13] performed battery thermal fault diagnosis by identifying the temperature difference between two batteries through an established partial differential equation model. Model-based fault diagnosis needs to establish different fault models for different faults, and it is not easy to apply in engineering practice because of its cumbersome calculation and poor robustness. In contrast, fault diagnosis based on laboratory fault data can be flexibly applied to different fault types. For example, Yang et al. [14] and Zhao et al. [15] used failure life data to establish a long short-term memory (LSTM) neural network model and set a threshold for fault diagnosis. Zhao et al. [16] used data-driven models and multi-scale system statistics to predict battery failure through health status. Xue et al. [17] used the K-means clustering algorithm and a 3σ screening strategy to detect abnormal battery cells. However, methods based on laboratory data do not fully consider the complex conditions of actual operation and lack integration with engineering applications.

In order to improve the accuracy of battery fault diagnosis in the real vehicle environment, this paper proposes a fault diagnosis method based on multi-method fusion driven by real vehicle data. First, the data from the China National Monitoring and Management Center for New Energy Vehicles are preliminarily analyzed statistically, t-distributed stochastic neighbor embedding (t-SNE) is used for early fault diagnosis, and the wavelet transform is used to process the data so that points with large deviations do not affect the results of feature extraction. Then, the factor analysis method is used to analyze the influence of battery features on faults, and a bidirectional long short-term memory network combined with a convolutional neural network is used to extract the features with great influence on the fault. Finally, a self-learning Bayesian network is used to diagnose the fault. Compared with traditional methods, this method accurately extracts the features, reduces the diagnosis time and improves the robustness.

The fault diagnosis method based on the big data of real vehicles avoids the tedious calculation process of battery modeling and has broad application prospects in electric vehicles. The proposed method reaches an accuracy of about 92% in graded fault diagnosis of the battery, and is more reliable, accurate and fast than the other methods compared. It opens up a new direction for solving the fault diagnosis problem of nonlinear coupled objects.

2. Fault Data Processing and Feature Extraction of Lithium Ion Battery

The lithium-ion battery fault diagnosis scheme designed in this paper is shown in Figure 1. The fault diagnosis scheme is mainly divided into four parts, which are data pre-processing, feature selection, feature extraction, and safety warning and fault diagnosis. The data pre-processing mainly identifies the early faults and processes the acquired data for denoising, slicing and reconstructing.

2.1. Monitoring Platform and Data Analysis

The safety of the power battery is one of the important factors limiting the promotion of the electric vehicle market. The National Monitoring and Management Center of New Energy Vehicles can identify the cause of battery failure in real time through the display of an alarm flag, remind users and businesses of the operating conditions of the vehicle battery, and indicate whether it is necessary to replace the battery or suspend the use of the vehicle in case of serious failure. This platform is connected to many electric vehicle enterprises. This paper selects 10 types of vehicle data for analysis. The battery pack of each vehicle is composed of 95 single batteries. For a certain type of vehicle, the change characteristics of parameters such as vehicle speed, total voltage, total current, mileage and SOC value are analyzed over half a month in June. The typical parameter change curve of the half month is shown in Figure 2. Due to the huge amount of data, half a month’s data are selected for rough analysis, and, to ensure the visual effect, the data are reduced to one day. It can be seen from Figure 2 that the SOC range of the vehicle decreases as the vehicle mileage increases. Since the vehicle speed is controlled by the driver’s behavior, it is not easy to determine the change trend of the vehicle speed, which results in severe fluctuations in the voltage and current. However, the change trend of the voltage and current is roughly the same as that of the data measured in the laboratory, so the voltage, current, SOC value and temperature are directly related to the vehicle status and mileage. Three minutes of fault data of a certain vehicle model are selected from the whole data for analysis. When the SOC value decreases, the battery temperature shows an overall upward trend and finally rises to 32 °C, while the minimum temperature remains at 0 °C.

It can be seen from Figure 2 that the vehicle speed is 0 km/h for a long time, indicating that the vehicle is parked. Nevertheless, the voltage, SOC value, minimum temperature and maximum temperature have changed, and the SOC value has dropped sharply to 0, indicating that the vehicle power battery has failed.

2.1.1. Early Identification of Power Battery Failure

The early identification of power battery failure involves using the original vehicle data to detect the abnormality of a certain characteristic quantity, which helps to identify the cause of vehicle power battery failure. If the voltage of a single battery is abnormal, it is necessary to obtain the voltage values of each battery in the time series of the same vehicle through data separation. The voltage values of each single cell are arranged in time series as one dimension of the new data set. However, since the new data set is multi-dimensional, which is not conducive to data processing and analysis, it is necessary to reduce the dimension of the data. t-SNE dimensionality reduction not only allows the data to be visualized directly but also preserves local features during the reduction; the reduced data are obtained by minimizing its loss function, which facilitates the subsequent early fault identification through the K-means clustering algorithm. One of the early fault discrimination results is shown in Figure 3. The outlying data points in the figure may correspond to battery failure, and whether or not a failure has occurred is determined by the subsequent fault diagnosis.
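The early identification step can be sketched as follows. This is a minimal illustration and not the implementation used in this paper: it assumes scikit-learn is available, that each row of the input is one sampling instant and each column one of the 95 cell voltages, and that the smaller of two K-means clusters is the suspected one. The function name `flag_early_faults`, the perplexity value and the two-cluster choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def flag_early_faults(cell_voltages, perplexity=30.0, seed=0):
    """Embed cell-voltage samples in 2-D with t-SNE and flag the minority cluster.

    cell_voltages: array of shape (n_samples, n_cells); n_samples must exceed
    the perplexity. Returns the 2-D embedding and a boolean mask of suspects.
    """
    cell_voltages = np.asarray(cell_voltages, dtype=float)
    # Dimensionality reduction: t-SNE keeps neighbouring samples close together.
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(cell_voltages)
    # Clustering of the reduced data (two clusters is an assumption).
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(embedding)
    # Assumption: the smaller cluster holds the suspected abnormal samples.
    minority = int(np.argmin(np.bincount(labels, minlength=2)))
    return embedding, labels == minority
```

The flagged samples are only candidates; as stated above, whether a failure has actually occurred is decided by the later diagnosis stage.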

2.1.2. Real Vehicle Data Preprocessing

Data preprocessing mainly includes data cleaning and data noise reduction. These steps mainly deal with data loss and with isolated points that deviate from the actual values because of factors such as the sensor manufacturing process. Before data processing, for convenience, the original date information is replaced by sampling points with an interval of 1.

(1) Data cleaning

Data collection exceptions and failures are inevitable during mass transmission; therefore, data cleaning is required. The main data types of data cleaning include data loss, numerical error and data duplication. According to different data types, the data processing methods are different. For example, the deletion method is mainly used for data packet loss, and the interpolation method is used for data with large errors. The interpolation method can also be used for the loss of certain attribute data.
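The cleaning rules described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the processing code used in this paper: duplicated records are deleted, and missing or out-of-range values are rebuilt by linear interpolation. The function name `clean_series` and the cell-voltage limits of 2.5 V and 4.5 V are hypothetical.

```python
import numpy as np


def clean_series(t, v, v_min=2.5, v_max=4.5):
    """Drop duplicated samples, then interpolate missing or out-of-range values."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    # Data duplication: keep the first record of every sampling point.
    t_unique, first_idx = np.unique(t, return_index=True)
    v_unique = v[first_idx]
    # Numerical error or attribute loss: NaN and out-of-range values are invalid.
    valid = np.isfinite(v_unique)
    valid[valid] = (v_unique[valid] >= v_min) & (v_unique[valid] <= v_max)
    # Interpolation: rebuild invalid samples from the neighbouring valid ones.
    v_clean = np.interp(t_unique, t_unique[valid], v_unique[valid])
    return t_unique, v_clean
```

For example, a series with one repeated sampling point, one missing value and one implausible reading of 9.9 V is returned with five unique sampling points and the two bad values replaced by interpolated ones.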

(2) Data noise reduction

The data after data cleaning may still contain random errors or deviations, as well as outliers that deviate from the expected values, which constitute the noise in the data [18]. This noise affects the accuracy and robustness of the model, so the wavelet transform is used for noise reduction. The basic principle of wavelet transform denoising is shown in Figure 4.

Data denoising by wavelet transform is mainly divided into three parts: decomposition, threshold processing and data reconstruction. In the process of wavelet transform denoising, the data signal is first transformed into the wavelet domain, and the main signal and the noise can be distinguished by computing the wavelet coefficients. The wavelet coefficients of the main signal are larger than those of the noise. Then, an appropriate wavelet coefficient threshold is selected. Wavelet coefficients larger than the threshold are considered to be generated by the normal signal, while wavelet coefficients smaller than the threshold are considered to be noise. Finally, the noise coefficients are zeroed or interpolated, and the signal is reconstructed to complete the noise reduction. The comparison of voltage data before and after noise reduction is shown in Figure 5. The isolated points in the data processed by the wavelet transform are clearly reduced compared with the data before processing, and the expected data processing effect is achieved.
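The three steps of decomposition, threshold processing and reconstruction can be sketched as follows. The wavelet basis and the threshold rule are not specified in the text, so this sketch assumes the Haar wavelet, hard thresholding (small coefficients are zeroed) and the universal threshold with a median-based noise estimate; all function names are illustrative.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def wavelet_denoise(x, levels=3):
    """Decompose, zero the small detail coefficients, and reconstruct."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    # 1) Decomposition: details[0] holds the finest-scale coefficients.
    approx, details = x, []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # 2) Threshold (assumed rule): universal threshold, noise level from the
    #    median absolute value of the finest detail coefficients.
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    details = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
    # 3) Reconstruction from the coarsest level back to the original length.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

Hard thresholding matches the description above, in which coefficients below the threshold are treated as noise and zeroed; a smoother wavelet or soft thresholding could be substituted without changing the three-step structure.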

(3) Data slicing and integration

Due to the high data dimension and huge data volume, processing the data at once creates an equipment bottleneck, so it is necessary to slice and integrate the data. The data are segmented by time, vehicle type, driving conditions, charging conditions and SOC values, and the segmented data are then reconstructed and integrated.

2.2. Feature Selection Based on Factor Analysis

The logit model is currently the most widely used discrete choice model: the values of the predictor variables are used to predict the presence or absence of a feature or outcome, and the dependent variable is a binary variable, so binary logistic regression is usually used. Binary logistic regression analysis includes the chi-square test and the Hosmer–Lemeshow test. The chi-square test measures the degree of deviation between the actual observed values of the statistical sample and the theoretically inferred values; the larger the chi-square value, the greater the deviation between the two. The Hosmer–Lemeshow goodness-of-fit test is based on grouping the samples according to their predicted probability or risk. Specifically, based on the estimated parameter values, a probability is calculated for each observation in the sample from its covariate values.

The fit or effectiveness of the binary logistic regression model is judged by three indicators: the Hosmer–Lemeshow test, the R² values and the model prediction accuracy table. The Hosmer–Lemeshow test checks whether the observed data are consistent with the model fit results; if the p-value is greater than 0.05, the observed data are consistent with the fitted results, which means that the model fit is good.

The binary logistic regression analysis method is used to conduct regression analysis on the cleaned driving data and the dynamic indicators, such as the vehicle running state, charging state, gear position, insulation resistance, battery voltage and battery temperature difference, in order to determine the characteristic indicators related to the new energy vehicle alarm and the impact weight of each characteristic indicator on the corresponding alarm information. The binary logistic regression analysis can be expressed as:

\operatorname{Logit}(P_{j}) = β_{0} + \sum_{q = 1}^{p} β_{q} x_{q}

(1)

where P_{j} is the probability of alarm type j, β_{0} is the constant term, p is the number of characteristic indicators, x_{q} is the q-th influencing factor, β_{q} is the weight coefficient of the q-th characteristic indicator, and q = 1, 2, 3, \dots, p is the indicator index.
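As a small worked illustration of Equation (1), the alarm probability follows from the logit by inverting the log-odds, and exponentiating a coefficient gives the odds ratio that SPSS reports as Exp(B). The coefficient and feature values used with this sketch are invented for the example and are not the fitted values of this paper.

```python
import math


def alarm_probability(beta0, betas, x):
    """Invert Equation (1): P = 1 / (1 + exp(-(beta0 + sum_q beta_q * x_q)))."""
    logit = beta0 + sum(b * xq for b, xq in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratios(betas):
    """Exp(B): multiplicative change in the alarm odds per unit change in x_q."""
    return [math.exp(b) for b in betas]
```

A coefficient of 0 gives an odds ratio of 1, meaning the indicator does not change the alarm odds; values above 1 raise the odds and values below 1 lower them.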

The SPSS software is used to analyze the data during vehicle operation, and the influence weight of each battery indicator on the failure is obtained. The results of the Hosmer–Lemeshow test are shown in Table 1, and the results of the SPSS analysis are shown in Figure 6. On the x-axis of the figure, 0 represents the vehicle operation state, 1 represents the vehicle operation state (1), 2 represents the vehicle operation state (2), 3 represents the charging state, 4 represents the charging state (1), 5 represents the charging state (2), 6 represents insulation resistance, 7 represents maximum temperature, 8 represents maximum voltage, 9 represents gear, 10 represents gear (1), 11 represents gear (2), 12 represents gear (3), 13 represents SOC and 14 represents the constant. Indicators are selected according to Sig. and Exp(B): Sig. must be less than 0.05, and Exp(B) must be greater than 0 and less than 2. On this basis, the vehicle running state, charging state, gear, insulation resistance, battery voltage and battery temperature difference may all have an impact on the fault (Table 2).

In the logistic regression model, the chi-square value is 87.589 with p < 0.001, indicating that the model is statistically significant. The results show that the insulation resistance, the maximum voltage, the maximum temperature and the gear position have a great influence on the vehicle fault.

2.3. Feature Extraction Based on CNN-BiLSTM

The basic idea of the recurrent neural network (RNN) is to use the historical information of time series data when processing the current data. An RNN can process sequences of any length. At time t, the hidden state is:

h_{t} = ϕ (U x_{t} + W h_{t - 1} + b)

(2)

where ϕ is the activation function, which can be a function such as \tanh, ReLU or sigmoid, U and W are the input and recurrent weight matrices, and b is the bias. The output at this time is:

o_{t} = σ (V h_{t} + c)

(3)

where σ is the activation function, generally the softmax function, V is the output weight matrix and c is the output bias. A plain RNN retains information poorly over long sequences. LSTM addresses the gradient explosion and gradient vanishing problems of the recurrent neural network by adding an adaptive forget gate that updates what is relearned and releases internal resources, and the BiLSTM structure outperforms a single LSTM structure in feature extraction efficiency and performance. The structure of the BiLSTM network is shown in Figure 7.
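Equations (2) and (3), together with the idea of reading the sequence in both directions, can be sketched as follows. This is a deliberately simplified illustration: a plain tanh RNN cell stands in for the LSTM cell, so it shows only the recurrence and the forward/backward concatenation, not the gating of BiLSTM. The function names and the parameter tuple layout `(U, W, b, V, c)` are assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def rnn_forward(xs, U, W, b, V, c):
    """Run Equations (2) and (3) over a sequence xs of shape (T, D)."""
    h = np.zeros(W.shape[0])
    hidden, outputs = [], []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h + b)       # Eq. (2), with phi = tanh
        outputs.append(softmax(V @ h + c))     # Eq. (3), with sigma = softmax
        hidden.append(h)
    return np.array(hidden), np.array(outputs)


def bidirectional_states(xs, fwd_params, bwd_params):
    """Concatenate hidden states from a forward and a backward pass."""
    h_fwd, _ = rnn_forward(xs, *fwd_params)
    h_bwd, _ = rnn_forward(xs[::-1], *bwd_params)
    # Re-reverse the backward states so that row t describes time step t.
    return np.concatenate([h_fwd, h_bwd[::-1]], axis=1)
```

Each row of the concatenated result summarizes both the past and the future of the corresponding time step, which is the property a bidirectional network adds over a single-direction one.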

The calculation steps of BiLSTM are shown in Table 3.

The CNN-BiLSTM model is established to extract the features of the time series. The feature extraction structure is shown in Figure 8. As can be seen from Figure 8, the CNN-BiLSTM model is used for feature extraction of voltage and current, and the extracted results follow the original feature data very well, with the extraction accuracy of 98.28%. As can be seen from Figure 9, after more than 700 iterations of the model, the root mean square error (RMSE) is maintained at about 0.05, which shows that the feature extraction effect is good.

3. Battery Fault Diagnosis Based on Self-Learning Bayesian Network

The Bayesian network is an important method for dealing with uncertain fault information. It handles the uncertainty caused by conditional correlation between various pieces of knowledge and information in a probabilistic way. For the whole Bayesian network, the joint probability distribution function of the random variables represented by all nodes can be expressed as:

P (X_{1}, X_{2}, \dots, X_{n}) = P (X_{1}) P (X_{2} | X_{1}) \cdots P (X_{n} | X_{1}, X_{2}, \dots, X_{n - 1})

(4)

where X₁, X₂, X₃……X_n is the extracted time series feature, represents the convolution kernel corresponding to the feature channel, and is the dimension of the feature vector after fusion.

The method of binary logistic regression analysis for regression analysis of dynamic indicators has been introduced above. According to the determined characteristic indexes and their corresponding alarm information, the characteristic indexes are taken as the data layer, the alarm information is taken as the alarm layer, and the security evaluation results are taken as the index layer to build the Bayesian network model. Since the input of the Bayesian network is not a specific value but a state, this paper divides the actual data of each layer into states. For the data layer, the non-overlapping parts of the alarm data and the normal data are defined as Faulty and Good, respectively. For the overlapping data, this paper defines a new state, Fair. For the alarm layer, two states are defined according to whether the alarm is given. For the index layer, based on the original data, the results are divided into three categories: Level 1 fault, Level 2 fault and Level 3 fault.

Based on the division of different states, a priori probability is calculated according to the frequency of occurrence of each characteristic index in different states, and the data of the data layer and the alarm layer are weighted according to the weight of the characteristic index in the binary logistic regression equation and the frequency of alarm occurrence, in which the membership function is used to calculate the conditional probability of the alarm layer and the index layer. The status score of the alarm information directly connected to each characteristic indicator can be expressed as:

CS = \sum_{q = 1}^{p} S_{q}^{c} β_{q}

(5)

Here, the three status levels of the data layer in the Bayesian network, Good, Fair and Faulty, are assigned the scores 1–3 in turn. In S_{q}^{c}, c represents the status level and q represents the characteristic indicator; β_{q} represents the fixed weight assigned to each characteristic indicator, with \sum β_{q} = 1.
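Equation (5) is a weighted sum of state scores, which can be sketched as follows. The indicator names and weights used with this example are hypothetical; only the scores 1–3 for Good, Fair and Faulty and the constraint that the weights sum to 1 come from the text.

```python
STATE_SCORE = {"Good": 1, "Fair": 2, "Faulty": 3}


def status_score(states, weights):
    """Equation (5): CS = sum_q S_q^c * beta_q, with the weights summing to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(STATE_SCORE[states[name]] * weights[name] for name in weights)
```

With hypothetical weights of 0.4, 0.3, 0.2 and 0.1, a vehicle whose indicators are all Good scores 1.0, and the score rises toward 3.0 as more heavily weighted indicators become Fair or Faulty.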

Next, the score-probability conversion formula is used to convert the state score into the conditional probabilities of different states of the alarm layer and the index layer:

f_{b} (x) = \begin{cases} \frac{1}{2} - \frac{1}{2} \sin \left[ \frac{π}{s_{b + 1} - s_{b}} \left( x - \frac{s_{b + 1} + s_{b}}{2} \right) \right], & b = 1, 3, 5 \\ \frac{1}{2} + \frac{1}{2} \sin \left[ \frac{π}{s_{b} - s_{b - 1}} \left( x - \frac{s_{b} + s_{b - 1}}{2} \right) \right], & b = 2, 4, 6 \end{cases}

(6)

\begin{cases} μ_{1} (x) = \begin{cases} 1, & 1 \leq x < s_{1} \\ f_{1} (x), & s_{1} \leq x < s_{2} \\ 0, & x \geq s_{2} \end{cases} \\ μ_{2} (x) = \begin{cases} 0, & 1 \leq x < s_{1} \\ f_{2} (x), & s_{1} \leq x < s_{2} \\ 1, & x \geq s_{2} \end{cases} \end{cases}

(7)

\begin{cases} μ_{1} (x) = \begin{cases} 1, & 1 \leq x < s_{1} \\ f_{1} (x), & s_{1} \leq x < s_{2} \\ 0, & x \geq s_{2} \end{cases} \\ μ_{2} (x) = \begin{cases} f_{2} (x), & s_{1} \leq x < s_{2} \\ 1, & s_{2} \leq x < s_{3} \\ f_{3} (x), & s_{3} \leq x < s_{4} \end{cases} \\ μ_{3} (x) = \begin{cases} f_{4} (x), & s_{3} \leq x < s_{4} \\ 1, & s_{4} \leq x < s_{5} \\ f_{5} (x), & s_{5} \leq x < s_{6} \end{cases} \\ μ_{4} (x) = \begin{cases} 0, & x < s_{5} \\ f_{6} (x), & s_{5} \leq x < s_{6} \\ 1, & s_{6} \leq x \leq 4 \end{cases} \end{cases}

(8)

Here, μ_{i} represents the membership function of the different state levels, and f_{1} (x) to f_{6} (x) are determined by Equation (6). To ensure that the calculated data are within the 95% confidence interval of the function, the values of s_{1} to s_{6} are taken as 1.05, 1.95, 2.05, 2.95, 3.05 and 3.95, respectively.

First, the network structure is designed. Since many factors affect whether a fault occurs, the four factors with the greatest impact in the binary logistic regression analysis are selected as the data layer of the Bayesian network model to reduce the amount of computation, and the Bayesian network model is established for fault diagnosis. The Bayesian network is shown in Figure 10.
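The score-probability conversion of Equations (6) and (8) can be sketched as follows, using the values of s₁ to s₆ given above. The sketch rests on one stated assumption: the odd-indexed functions fall from 1 to 0 and the even-indexed functions rise from 0 to 1 across their transition interval, so that neighbouring memberships always sum to 1. The function names are illustrative.

```python
import math

S = [1.05, 1.95, 2.05, 2.95, 3.05, 3.95]  # s_1 ... s_6 from the text


def f(b, x):
    """Transition function f_b of Equation (6), for b = 1 ... 6."""
    if b % 2 == 1:
        lo, hi, sign = S[b - 1], S[b], -1.0       # odd b: falls over [s_b, s_{b+1}]
    else:
        lo, hi, sign = S[b - 2], S[b - 1], 1.0    # even b: rises over [s_{b-1}, s_b]
    return 0.5 + sign * 0.5 * math.sin(math.pi / (hi - lo) * (x - (hi + lo) / 2.0))


def memberships(x):
    """Four-level membership vector [mu_1, mu_2, mu_3, mu_4] for a score x in [1, 4]."""
    s1, s2, s3, s4, s5, s6 = S
    if x < s1:
        return [1.0, 0.0, 0.0, 0.0]
    if x < s2:
        return [f(1, x), f(2, x), 0.0, 0.0]
    if x < s3:
        return [0.0, 1.0, 0.0, 0.0]
    if x < s4:
        return [0.0, f(3, x), f(4, x), 0.0]
    if x < s5:
        return [0.0, 0.0, 1.0, 0.0]
    if x < s6:
        return [0.0, 0.0, f(5, x), f(6, x)]
    return [0.0, 0.0, 0.0, 1.0]
```

Because the memberships sum to 1 for every score, the vector can be used directly as a conditional probability distribution over the state levels.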

The next step is to determine the parameters of the network, that is, the conditional probability on each edge. In this paper, the parameter learning method is used to learn these parameters automatically from the data.

There are two typical methods of parameter learning: maximum likelihood estimation and Bayesian estimation. The former is prone to serious overfitting; therefore, the latter is generally used for parameter learning. The Bayesian estimator provided by pgmpy supports three prior distributions, ‘Dirichlet’, ‘BDeu’ and ‘K2’, all of which are in fact Dirichlet distributions.

Under the framework of Bayesian analysis, the parameter to be calculated is regarded as a random variable, and it is estimated by using the data to calculate a posterior from its prior, so a suitable prior must be assumed first. The prior distribution usually taken is the Dirichlet distribution. For a node with r discrete states, its parameter is set as

θ_{i} = P (X = x_{i}), i = 1, 2, \dots, r

(9)

Its prior is Dirichlet distribution D

p (θ) = \frac{Γ (α)}{\prod_{i = 1}^{r} Γ (α_{i})} \prod_{i = 1}^{r} θ_{i}^{α_{i} - 1}

(10)

This prior has r parameters α_{i}, which is mathematically equivalent to expressing the prior as α virtual samples, of which the number of samples satisfying X = x_{i} is α_{i}; this α is called the equivalent sample size. This equivalence is the reason why the prior takes this functional form. In addition, the posterior distribution after calculation is also a Dirichlet distribution. In the program, all conditional probabilities are estimated together using the provided fit function.
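The virtual-sample interpretation can be made concrete with a short sketch. It does not call pgmpy; it only reproduces by hand the arithmetic of a Dirichlet-prior estimate for a single node without parents, with invented counts. The function names are illustrative.

```python
def mle_estimate(counts):
    """Maximum likelihood estimate: relative frequencies of the observed states."""
    total = sum(counts)
    return [n / total for n in counts]


def dirichlet_posterior_mean(counts, alphas):
    """Posterior mean under a Dirichlet prior: (N_i + alpha_i) / (N + alpha)."""
    total = sum(counts) + sum(alphas)
    return [(n + a) / total for n, a in zip(counts, alphas)]
```

With 10 observations that all fall in the first state, the maximum likelihood estimate assigns probability 0 to the second state, whereas one virtual sample per state (the K2 choice, α_i = 1) gives 11/12 and 1/12. This is the sense in which the Bayesian estimate is less prone to overfitting sparse data.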

After parameter learning, the probability of failure under different states is verified.

The probability of failure under different charging states is shown in Table 4.

The probability of failure for normal and abnormal battery voltage is shown in Table 5.

The probability of failure for normal and abnormal battery temperature is shown in Table 6.

The probability of failure for normal and abnormal insulation resistance is shown in Table 7.

The final fault level judgment is shown in Table 8.

It can be seen from the results that when the vehicle is in the non-charging mode, the accident probability is small, which is highly consistent with the previous data processing results. In the charging mode, faults are more likely to occur, and the proportion of Level 2 faults is high, which is also consistent with the data processing results.

Then, after data cleaning and binary logistic regression feature selection, the data are divided into a training set and a test set for fault prediction simulation, and the parameters of each node in the Bayesian network model are learned automatically from the data.

Under the framework of Bayesian analysis, the parameters to be determined are represented by θ, which is regarded as a random variable. It is estimated by using the data to obtain a posterior from its prior, so it is necessary to make an assumption about the prior of θ. The prior distribution usually taken is the Dirichlet distribution. For a node with r discrete states, we set its parameter as

θ_{i} = P (X = x_{i}), i = 1, 2, \dots, r

and let its prior be a Dirichlet distribution. The probability of failure occurrence is finally obtained through the Bayes formula:

P (θ | X) = \frac{P (X | θ) P (θ)}{P (X)}

(11)
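For reference, substituting the Dirichlet prior of Equation (10) into Equation (11) gives the posterior in closed form. The symbol N_{i}, the number of training samples with X = x_{i}, and N = \sum_{i} N_{i} are introduced here only for this derivation.

```latex
P(θ | X) \propto P(X | θ) \, p(θ)
         \propto \prod_{i = 1}^{r} θ_{i}^{N_{i}} \prod_{i = 1}^{r} θ_{i}^{α_{i} - 1}
         = \prod_{i = 1}^{r} θ_{i}^{N_{i} + α_{i} - 1},
\qquad
E [θ_{i} | X] = \frac{N_{i} + α_{i}}{N + α}
```

The posterior is therefore again a Dirichlet distribution with parameters N_{i} + α_{i}, and its mean adds the α_{i} virtual samples to the observed counts.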

Two groups of data from two types of vehicles are selected as the test data for fault prediction. First, the data are divided into a training set and a test set, and the Bayesian network model is established by learning from the training set. Next, the different parameters in the test set are input, and the probability of failure occurrence is calculated through the Bayesian estimator to predict the occurrence of failure. Finally, the prediction results are compared with the test set, and the residual error and accuracy are calculated. The prediction results for the vehicle model I data and the vehicle model II data are shown in Figure 11 and Figure 12.

ID on the horizontal axis indicates the data ID, and alarm_info on the vertical axis indicates the fault code, which shows whether a fault occurs and the type of the fault. Residual represents the difference between the actual failure (real) and the predicted failure (predict). A positive residual indicates that a fault actually occurs but is not predicted, a negative residual indicates that a fault is predicted but no fault actually occurs, and a residual of 0 indicates that the predicted result is consistent with the actual result. The fault diagnosis accuracy of the two groups of data is shown in Table 9: 91.82% and 90.49%, respectively, which is about 12% higher than that of a Bayesian network without feature extraction and self-learning. For 200 data samples, the proposed method is compared in Figure 13 with the back-propagation neural network (BP) algorithm, the genetic-algorithm-optimized BP neural network (GA-BP) algorithm, the support vector machine (SVM) algorithm, the random forest (RF) algorithm and the K-nearest neighbor (KNN) algorithm from the literature [19,20]. The accuracy is improved by 19.42%, 5.16%, 6.02%, 3.72% and 17.62%, respectively. The algorithm designed in this paper is also faster in fault diagnosis, with the diagnosis time reduced by 15.27 s at most and 3.37 s at least, indicating that it can respond to hierarchical faults in a shorter time and achieve better results.
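The residual and accuracy calculation described above can be sketched as follows. The sketch assumes that the fault codes are integers with 0 meaning no fault, and the sample values used with it are invented; the function name `evaluate` is illustrative.

```python
def evaluate(real, predict):
    """Residual = real - predict, with counts of each error type and the accuracy."""
    residual = [r - p for r, p in zip(real, predict)]
    missed = sum(1 for e in residual if e > 0)        # positive: fault not predicted
    false_alarm = sum(1 for e in residual if e < 0)   # negative: fault wrongly predicted
    accuracy = sum(1 for e in residual if e == 0) / len(residual)
    return residual, missed, false_alarm, accuracy
```

For the codes real = [0, 1, 0, 2] and predict = [0, 0, 1, 2], the residual is [0, 1, -1, 0]: one missed fault, one false alarm and an accuracy of 0.5.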

4. Conclusions

To improve the accuracy of the self-learning Bayesian hierarchical fault diagnosis model, data preprocessing and fault feature extraction with the CNN-BiLSTM algorithm are required. Data preprocessing reduces the influence of noise on fault diagnosis, and the fusion of the CNN and BiLSTM algorithms is chosen for fault feature extraction because the features are time series and data from both directions need to be considered. The fault model is trained with a large amount of collected real vehicle data, and it is verified that the model can perform hierarchical fault diagnosis of the battery system online or offline with improved robustness. The method demonstrates superiority in fault diagnosis accuracy and fault diagnosis time when compared with the BP, GA-BP and SVM methods on a database of the same size. It also enables quantitative assessment of the battery safety risk status of new energy vehicles and more accurate identification of the battery safety status, which helps guarantee the normal and smooth operation of new energy vehicles and leaves considerable room for further improvement in fault warning.

Future research will focus on whether a large amount of real vehicle data and laboratory data can be combined to study specific battery faults, reduce the computational load of the algorithm and improve the accuracy of fault diagnosis, so that the battery fault diagnosis method becomes universal and can be further applied to the safety management of complex dynamic systems.

Author Contributions

Conceptualization, Z.W. and W.H.; methodology, W.L. and S.X.; software, S.X.; validation, Y.Y. and J.W.; formal analysis, S.X. and W.L.; investigation, Z.Y.; resources, Z.W. and W.H.; data curation, W.L. and S.X.; writing—original draft preparation, W.L., S.X. and Y.Y.; writing—review and editing, W.L. and S.X.; visualization, W.L., S.X. and J.W.; supervision, Z.W. and L.H.; project administration, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant number 51775042).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the editors and reviewers for their valuable suggestions, and the National Monitoring and Management Center for New Energy Vehicles and the National Big Data Alliance for New Energy Vehicles for providing the data.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Flow Chart of Lithium-Ion Battery Fault Diagnosis.

Figure 2. New Energy Vehicle Real Vehicle Data.

Figure 3. Early fault identification results.

Figure 4. Flow Chart of Wavelet Transform Denoising.

Figure 5. Comparison of real vehicle data before and after noise reduction: (a) voltage versus time before noise reduction; (b) voltage versus time after noise reduction.

Figure 6. Analysis results of SPSS.

Figure 7. The BiLSTM network structure.

Figure 8. CNN-BiLSTM feature extraction diagram of time series: (a) current; (b) voltage.

Figure 9. Root mean square error plot of feature extraction.

Figure 10. Bayesian network model.

Figure 11. Fault diagnosis results of vehicle Model 1: (a) direct fault diagnosis after data processing; (b) fault diagnosis with the method designed in this paper.

Figure 12. Fault diagnosis results of vehicle Model 2: (a) direct fault diagnosis after data processing; (b) fault diagnosis with the method designed in this paper.

Figure 13. Comparison chart of the method in this paper and other methods.

Table 1. Results of the Hosmer–Lemeshow test.

Step	Chi-Square Value	df	Sig.
1	312.822	8	0.000
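
As a quick consistency check on Table 1, the significance value can be recomputed from the reported chi-square statistic and degrees of freedom. The sketch below is not taken from the paper's SPSS workflow; it uses the standard closed form of the chi-square upper-tail probability for an even number of degrees of freedom, and the function name `chi2_sf_even_df` is a hypothetical helper introduced only for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# Values transcribed from Table 1.
p_value = chi2_sf_even_df(312.822, 8)
print(f"p-value = {p_value:.3e}")
```

The resulting p-value is far below 0.001, which agrees with the value of 0.000 that SPSS prints after rounding to three decimals.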

Table 2. Model Summary.

Step	−2 Log Likelihood	Cox & Snell R²	Nagelkerke R²
1	26,752.453 ¹	0.024	0.376

¹ Estimation terminated at iteration 20 because the maximum number of iterations was reached; a final solution could not be found.

Table 3. BiLSTM Calculation Step.

Step	Formula
Compute the forget gate and select the information to forget	$f_{t} = \sigma (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})$
Compute the memory gate and the temporary cell state	$\begin{array}{l} i_{t} = \sigma (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i}) \\ {\tilde{C}}_{t} = \tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C}) \end{array}$
Compute the current cell state	$C_{t} = f_{t} * C_{t - 1} + i_{t} * {\tilde{C}}_{t}$
Compute the output gate and the current hidden layer state	$\begin{array}{l} o_{t} = \sigma (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o}) \\ h_{t} = o_{t} * \tanh (C_{t}) \end{array}$
Feed the result to the softmax classifier for prediction	$p (y \mid T) = \operatorname{softmax} (W_{S} H + b_{S})$

Notation: $x_{t}$ is the input, $C_{t}$ the cell state, ${\tilde{C}}_{t}$ the temporary cell state and $h_{t}$ the hidden layer state; $f_{t}$, $i_{t}$ and $o_{t}$ are the forget gate, memory gate and output gate, respectively; $W$ and $b$ are the weights and biases of the neurons, with subscripts $f$, $i$, $C$ and $o$ referring to the forget gate, memory gate, temporary cell state and output gate, respectively; $*$ denotes the element-wise product of two vectors; $T$ is the input sequence; $W_{S}$ and $b_{S}$ are the weight matrix and bias of the softmax layer, which are initialized randomly for training.
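
The calculation steps in Table 3 can be traced in a few lines of code. The following is a minimal sketch in plain Python, not the authors' implementation: it omits the CNN stage, and the function names (`lstm_step`, `bilstm`, `init_params`), the random initialization range, the hidden size and the sample input values are all assumptions made for illustration. The bidirectional pass simply runs one cell over the sequence from left to right, a second cell from right to left, and concatenates the two hidden states at each time step.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W.v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the rows of Table 3."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]        # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]        # memory gate
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], z, p["b_C"])]  # temporary cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical random initialization; real weights come from training."""
    rng = random.Random(seed)
    p = {}
    for gate in ("f", "i", "C", "o"):
        p["W_" + gate] = [[rng.uniform(-0.5, 0.5)
                           for _ in range(hidden_size + input_size)]
                          for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p


def run_direction(sequence, p, hidden_size):
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(sequence, fwd, bwd, hidden_size):
    """Concatenate forward and backward hidden states at every time step."""
    forward = run_direction(sequence, fwd, hidden_size)
    backward = run_direction(sequence[::-1], bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical, already normalized (current, voltage) samples.
sequence = [[0.10, 0.52], [0.12, 0.50], [0.40, 0.31], [0.11, 0.51]]
hidden_size = 3
H = bilstm(sequence,
           init_params(2, hidden_size, seed=1),
           init_params(2, hidden_size, seed=2),
           hidden_size)
```

Each entry of `H` has length `2 * hidden_size`. In a classifier, a linear layer with weights $W_{S}$ and bias $b_{S}$ would map these features to class scores before `softmax` is applied, as in the last row of Table 3.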

Table 4. Charging state failure probability table.

Charging Status	T	F
Failure probability	0.12	0.88

Table 5. Battery voltage failure probability table.

Battery Voltage	T	F
Failure probability	0.08	0.92

Table 6. Battery temperature failure probability table.

Battery Temperature	T	F
Failure probability	0.21	0.79

Table 7. Insulation resistance failure probability table.

Insulation Resistance Value	T	F
Failure probability	0.02	0.98

Table 8. Fault level judgement.

Failure Probability	No Fault	Level 1	Level 2	Level 3
No fault/No charge	1	0	0	0
No fault/Charge	1	0	0	0
Fault/No charge	0.80	0.03	0.16	0.01
Fault/Charge	0.72	0.07	0.18	0.03
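
To illustrate how a conditional probability table such as Table 8 is used, the sketch below marginalizes the fault level over its two parent states. It rests on assumptions that the table itself does not state: the row labels are read as (fault present or not, vehicle charging or not), the two parents are treated as independent, and the probabilities passed to the function are hypothetical inputs rather than values from the paper. The function name `fault_level_distribution` is likewise introduced only for this example.

```python
LEVELS = ("No fault", "Level 1", "Level 2", "Level 3")

# P(fault level | fault present, charging), transcribed from Table 8.
# Keys are (fault_present, charging); this reading of the row labels
# is an assumption.
CPT = {
    (False, False): (1.00, 0.00, 0.00, 0.00),
    (False, True):  (1.00, 0.00, 0.00, 0.00),
    (True, False):  (0.80, 0.03, 0.16, 0.01),
    (True, True):   (0.72, 0.07, 0.18, 0.03),
}


def fault_level_distribution(p_fault, p_charge):
    """Marginal fault-level distribution by the law of total probability.

    Assumes the fault indicator and the charging status are independent.
    """
    dist = [0.0] * len(LEVELS)
    for fault in (False, True):
        for charge in (False, True):
            weight = ((p_fault if fault else 1.0 - p_fault)
                      * (p_charge if charge else 1.0 - p_charge))
            for k, p in enumerate(CPT[(fault, charge)]):
                dist[k] += weight * p
    return dict(zip(LEVELS, dist))


# Hypothetical parent probabilities, chosen only to exercise the function.
example = fault_level_distribution(p_fault=0.2, p_charge=0.5)
```

When the evidence is certain (for example `p_fault=1.0, p_charge=1.0`), the function returns the corresponding row of Table 8 unchanged.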

Table 9. Accuracy of the method designed in this paper compared with direct fault diagnosis after data processing.

	Vehicle Model 1 (%)	Vehicle Model 2 (%)
Direct fault diagnosis method	80.33	78.51
The method designed in this paper	91.82	90.49
Improvement of accuracy	11.49	11.98
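
The last row of Table 9 is the difference between the two accuracies in percentage points. The short sketch below recomputes it and, for comparison, also expresses the gain relative to the direct method; the helper name `accuracy_gain` is an assumption introduced for illustration.

```python
# Accuracies (%) transcribed from Table 9.
direct = {"Vehicle Model 1": 80.33, "Vehicle Model 2": 78.51}
proposed = {"Vehicle Model 1": 91.82, "Vehicle Model 2": 90.49}


def accuracy_gain(before, after):
    """Return (gain in percentage points, gain relative to `before` in %)."""
    points = after - before
    relative = 100.0 * points / before
    return round(points, 2), round(relative, 2)


gains = {model: accuracy_gain(direct[model], proposed[model]) for model in direct}
for model, (points, relative) in gains.items():
    print(f"{model}: +{points} percentage points ({relative}% relative)")
```

The percentage-point gains reproduce the 11.49 and 11.98 listed in the table; expressed relative to the direct method, the gain is roughly 14 to 15%.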

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
