Article

Data-Driven Thermal Anomaly Detection in Large Battery Packs †

1 Department of Mechanical Engineering, The Pennsylvania State University, State College, PA 16802, USA
2 Wabtec Corporation, Pittsburgh, PA 15212, USA
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 5277–5281.
Batteries 2023, 9(2), 70; https://doi.org/10.3390/batteries9020070
Submission received: 1 December 2022 / Revised: 10 January 2023 / Accepted: 13 January 2023 / Published: 18 January 2023
(This article belongs to the Collection Recent Advances in Battery Management Systems)

Abstract

The early detection and tracing of anomalous operations in battery packs are critical to improving performance and ensuring safety. This paper presents a data-driven approach for online anomaly detection in battery packs that uses real-time voltage and temperature data from multiple Li-ion battery cells. Mean-based residuals are generated for cell groups and evaluated using Principal Component Analysis. The evaluated residuals are then thresholded using a cumulative sum control chart to detect anomalies. The mild external short circuits associated with cell balancing are detected in the voltage signals and necessitate voltage retraining after balancing. Temperature residuals prove to be critical, enabling anomaly detection of module balancing events within 14 min that are unobservable from the voltage residuals. Statistical testing of the proposed approach is performed on the experimental data from a battery electric locomotive injected with model-based anomalies. The proposed anomaly detection approach has a low false-positive rate and accurately detects and traces the synthetic voltage and temperature anomalies. The performance of the proposed approach compared with direct thresholding of mean-based residuals shows a 56% faster detection time, 42% fewer false negatives, and 60% fewer missed anomalies while maintaining a comparable false-positive rate.

1. Introduction

Li-ion batteries (LiBs) are widely used in energy storage applications, such as power grids, electric vehicles, and electric locomotives, due to their high energy density, power density, long cycle life, and extended calendar life. Feng et al. [1], however, list several recent accidents due to the failure of LiBs, often due to thermal runaway. Thermal runaway is often preceded by an internal short circuit caused by thermal, mechanical, and electrical abuse. Overcharge and over-discharge can also lead to thermal runaway [2]. Other critical anomalies in battery packs include balancing circuit failures and external short circuits (ESCs). Furthermore, sensor anomalies can lead to inaccurate control actions by the battery management system (BMS). Thus, it becomes critical to have an early and quick detection method followed by appropriate actions to avoid fault propagation, ensuring the safe and reliable operation of LiB packs.
The time-series data outputs of a battery system are non-stationary due to the time-varying current and environmental conditions. Anomalies may not be detected by directly thresholding the voltage and temperature measurements, especially at anomaly initiation when the voltage and temperature deviations are small. Therefore, the data are made stationary by estimating the voltage and temperature residuals as the difference between the measurements and the expected responses. Previous research focuses on cell-level anomaly detection using model-based residual estimation and thresholding [3,4,5,6,7,8,9]. State observers, such as extended Kalman filters (EKF) [10,11], adaptive EKF [9], unscented Kalman filters (UKF) [12], dual EKF [13], and nonlinear observers [3], have been used, along with parameter estimation techniques, such as recursive least squares [7,11,12,14] and particle swarm optimization [8,14], to generate residuals. Anomaly detection is also performed by thresholding the model-based voltage, temperature, and state of charge (SoC) residuals against predetermined thresholds [9,13,15,16]. However, generating model-based residuals is computationally expensive for battery packs, as it involves estimators for many cells. Computational complexity can be reduced through bar-delta filtering, cell mean models, and cell difference models to estimate the SoC of each cell [11,14,17]. Several works are reported in the literature that detect different types of battery-related [4,5,6,7,8] and sensor-related [9,10,18,19] anomalies. However, most of the aforementioned approaches are applicable to only one type of fault [7,8,9,10]. Some techniques work only if no two faults occur at the same time [16,19,20]. Some of the aforementioned approaches require parameter estimation by performing specific characteristic tests [6,9,18].
Apart from model-based approaches, data-driven models, which utilize the cell-to-cell redundant voltage information in battery packs, are used for anomaly detection. Correlation-based methods detect and trace voltage anomalies using the correlation coefficient between cell voltages [20,21,22]. However, these methods can be sensitive to measurement noise [23]. Entropy-based anomaly detection methods detect voltage anomalies by monitoring an entropy measure, such as Shannon entropy [24,25,26]. Sun et al. [27] detected and located short-circuit anomalies in battery packs by thresholding the modified Z-score of the relative entropy of individual cells with the pack median. Shannon entropy is also used for thermal runaway prognosis by detecting thermal faults [28]. These methods have high computational costs, and their performance depends on the choice of entropy measure and computation window, especially in the case of noisy data [23]. Machine learning (ML)-based anomaly detection approaches that have been applied to other domains and LiBs [29] include classification, clustering, nearest-neighbor, statistical, information-theoretic, and spectral-based techniques [30]. ML techniques, such as neural networks [31], the k-means clustering algorithm [32], support vector machines [33], and random forest classifiers [34,35], have also been applied to anomaly detection in battery systems. However, most of these techniques require large amounts of labeled battery-fault data for training.
Among the other data-driven approaches, Principal Component Analysis (PCA) is a promising unsupervised anomaly detection algorithm that has been extensively used in anomaly detection for multivariate systems [36,37,38]. Wang et al. [38], for example, proposed sensor fault detection for a chiller system using PCA on the process variables. Schmid et al. [39] proposed a PCA-based approach that detects voltage anomalies in a group of cells by applying PCA on voltage data processed using outlier robust sample studentization. In [40], these researchers extended their method to include the kernel PCA-based method to detect internal short-circuit (ISC) faults using voltage signals but it is computationally expensive. The approaches in [39,40] detected anomalies with a single anomalous voltage but their applicability in the case of multiple anomalous signals was not studied. The effect of cell balancing on detection performance was also not studied. Furthermore, the literature lacks an effective anomaly detection approach that can also detect thermal anomalies, even in the case of multiple anomalous signals.
This paper extends and improves the work in [39] to present an anomaly detection scheme that combines PCA and the cumulative sum (CUSUM) control chart to detect and locate both voltage and temperature anomalies in groups of Li-ion cells in real time. Addressing the aforementioned research gaps, the proposed approach detects voltage and temperature anomalies, even in the case of multiple simultaneous anomalous signals. The voltage and temperature residuals are the difference between the measured cell signals and the mean signals of the cell group. Unlike model-based approaches, median-based and mean-based residuals (MBRs) reduce the effect of aging, as all the cells in the cell group experience similar loading and environmental conditions during their life. In the proposed approach, the MBRs are processed using PCA to capture cell-to-cell information including the inconsistencies and thresholded using the CUSUM control chart to detect anomalies, which reduces the false positives and improves the detection rate. Experimental validation of the proposed approach is performed on external short-circuit data from a battery electric locomotive. We compare the proposed approach using PCA-processed MBRs (PCA method) with the direct thresholding of voltage and temperature MBRs (direct method) [41]. To further evaluate performance, statistical testing of the proposed approach is performed using model-based synthetic anomalies injected into nominal experimental data. The detection time, recovery time, false-negative rate, missed anomaly rate, and false-positive rate statistics are compared for the two methods. Finally, the effect of balancing on the performance of the anomaly detection approach is also studied.

2. Anomaly Detection Algorithms

Mean-based residual generation is proposed for both voltage and temperature. We assume that the cells within a group behave similarly under nominal conditions. These cell groups could be battery strings consisting of series/parallel connected cells that are spatially, thermally, chemically, and electrically similar. The MBRs of voltage and temperature in a cell group with n cells are calculated by
$$x_i(t) = X_i(t) - \mu_X(t)$$
where $X_i$ is the voltage/temperature of the $i$th cell and the mean $\mu_X(t) = \frac{1}{n} \sum_{i=1}^{n} X_i(t)$.
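The residual definition above can be illustrated with a minimal Python sketch. The function name `mean_based_residuals` and the voltage values are hypothetical and chosen only for illustration; they are not taken from the locomotive data.

```python
def mean_based_residuals(samples):
    """Return x_i(t) = X_i(t) - mean_X(t) for every time step.

    samples: list of time steps, each a list with one reading per cell.
    """
    residuals = []
    for readings in samples:
        mean = sum(readings) / len(readings)
        residuals.append([value - mean for value in readings])
    return residuals


# Hypothetical voltages (V) for a 4-cell group at two time steps.
voltages = [
    [3.70, 3.71, 3.69, 3.70],
    [3.60, 3.61, 3.59, 3.52],  # last cell sags relative to its neighbours
]
mbr = mean_based_residuals(voltages)
```

By construction the residuals at each time step sum to zero, so a common-mode change caused by the load current cancels and only cell-to-cell differences remain.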
The first and simplest anomaly detection scheme (direct method) directly compares each residual signal with a predetermined threshold [41]. For earlier anomaly detection, the PCA method captures cell-to-cell heterogeneity using PCA. Figure 1 shows a block diagram for anomaly detection using the PCA method. Both methods rely on parameters derived from $k$ samples of the residuals of nominal anomaly-free data. These training data provide the mean of the voltage/temperature residuals of each cell ($\mu_{V_r,i}$/$\mu_{T_r,i}$) and the standard deviation of the voltage/temperature residuals for all cells ($\sigma_{V_r}$/$\sigma_{T_r}$), which are used to calculate the Z-score. PCA is applied on the Z-score because the application of PCA directly on residuals would point the first principal component toward the mean of the data instead of the direction of the highest variance of the residuals.
The training data are placed in the matrix $X \in \mathbb{R}^{n \times k}$ and decomposed via singular value decomposition to $X = U S V^T$, where $U \in \mathbb{R}^{n \times n}$ is the left singular matrix, $S \in \mathbb{R}^{n \times n}$ is the singular value matrix, and $V \in \mathbb{R}^{k \times n}$ is the right singular matrix. The number of principal components, $p$, is selected to provide a cumulative variance of 90% [37]. The truncated left singular matrix $U_r$ is the first $p$ columns of $U$.
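A possible training step is sketched below using NumPy. It is only an illustration under stated assumptions: the function name `train_pca`, the synthetic data, and the choice to pool the standard deviation over all mean-shifted residuals are assumptions, and the cumulative variance is computed from the squared singular values.

```python
import numpy as np


def train_pca(residuals, variance_target=0.90):
    """Fit the PCA model on nominal residuals.

    residuals: array of shape (n_cells, k_samples) of anomaly-free MBRs.
    Returns the per-cell means, the pooled standard deviation, and U_r.
    """
    mu = residuals.mean(axis=1, keepdims=True)    # mean residual of each cell
    centered = residuals - mu
    sigma = centered.std()                        # one value for all cells (assumed pooling)
    X = centered / sigma                          # Z-scores, shape (n, k)

    U, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)    # cumulative variance ratio
    p = int(np.searchsorted(explained, variance_target) + 1)
    return mu, sigma, U[:, :p]


# Synthetic nominal data: a shared load signal with a small cell-to-cell gain spread.
rng = np.random.default_rng(0)
n_cells, k = 11, 2000
load = rng.normal(size=k)
gain = np.linspace(0.9, 1.1, n_cells)[:, None]
raw = 3.7 + 0.05 * gain * load + 0.0005 * rng.normal(size=(n_cells, k))
residuals = raw - raw.mean(axis=0)                # mean-based residuals
mu, sigma, U_r = train_pca(residuals)
```

In this synthetic case a single direction (the gain spread) carries most of the variance, so one principal component is enough to reach the 90% target.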
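In real time, $x(t)$ is measured, mean shifted, and normalized by the training data $\mu_{X_r,i}$ and $\sigma_{X_r}$ to estimate the Z-score as
$$\bar{x}_i(t) = \frac{x_i(t) - \mu_{X_r,i}}{\sigma_{X_r}}$$
for each cell $i$, with the vector $\bar{x}(t)$ collecting the $n$ Z-scores.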
The matrix multiplication $U_r^T \bar{x}(t)$ is the projection into the lower dimensional space (i.e., principal subspace), where only nominal data points exist, and multiplying this projection by $U_r$ gives the projection back to the original dimension. The reconstruction of $\bar{x}(t)$ [42] is
$$\hat{x}(t) = U_r U_r^T \bar{x}(t).$$
Because the principal subspace spanned by $U_r$ is anomaly-free, the reconstruction $\hat{x}(t)$ is the expected value in the case of nominal operation. Therefore, an anomaly can be detected by monitoring the difference between $\bar{x}(t)$ and $\hat{x}(t)$.
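The reconstruction step can be sketched as follows. The helper name `reconstruction_rmse` and the 3-cell basis are hypothetical; the sketch only shows that a Z-score vector lying in the principal subspace is reconstructed exactly, whereas a vector that departs from the nominal cell-to-cell pattern leaves a nonzero error.

```python
import numpy as np


def reconstruction_rmse(U_r, z):
    """RMSE between a Z-score vector z and its reconstruction U_r U_r^T z."""
    z_hat = U_r @ (U_r.T @ z)
    return float(np.sqrt(np.mean((z - z_hat) ** 2)))


# Hypothetical 3-cell example whose nominal subspace is spanned by one direction.
U_r = np.array([[1.0], [-1.0], [0.0]]) / np.sqrt(2.0)
nominal = np.array([2.0, -2.0, 0.0])      # lies in the principal subspace
anomalous = np.array([2.0, -2.0, 3.0])    # third cell departs from the pattern

rmse_nominal = reconstruction_rmse(U_r, nominal)
rmse_anomalous = reconstruction_rmse(U_r, anomalous)
```

The scalar error is non-negative by construction, which is why a one-sided test on it can be sufficient.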
Statistical process control (SPC) charts have been widely used in residual-based anomaly detection for stationary processes. Shewhart, CUSUM, and exponentially weighted moving average (EWMA) control charts are commonly used in univariate SPC [36,43]. Among these, the CUSUM control chart is one of the most effective in detecting small deviations in monitored signals [36,43]. CUSUM control charts have been used in model-based anomaly detection for battery systems [9,10,41].
The PCA method reduces the normalized residual vector to a scalar using the RMSE and uses CUSUM statistics [43] to threshold the filtered RMSE. A simple first-order low-pass filter with a cutoff frequency manually tuned to 4.9 mHz is used to filter the RMSE for robust detection by filtering out high-frequency variations. However, the direct method directly thresholds the absolute values of the filtered voltage and temperature residuals using CUSUM statistics, where the filter has a cutoff frequency of 8.4 mHz. The positive deviation CUSUM is $C^+[t] = \max(0, C^+[t-1] + (y[t] - \mu_c) - K)$, and the negative deviation CUSUM is $C^-[t] = \max(0, C^-[t-1] - (y[t] - \mu_c) - K)$, with $C^+[0] = C^-[0] = 0$. $K$ is chosen to be $4\sigma_c$ for lower false positives, where $\mu_c$ and $\sigma_c$ are the mean and standard deviation of the thresholding variable, $y[t]$, for the anomaly-free training data. Both $C^+$ and $C^-$ are compared against the $5\sigma_c$ control limits for the direct method [41]. In the PCA method, it is sufficient to compare $C^+$ against the $5\sigma_c$ control limit because PCA always produces a positive deviation in the RMSE. Voltage and temperature anomalies are detected independently from $C_V^+$ and $C_T^+$, respectively.
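The filtering and thresholding steps can be sketched in plain Python. The names `low_pass` and `cusum_flags` are hypothetical, and the discretization of the first-order filter (a backward-Euler form with 1 s sampling) is an assumption, since the text gives only the cutoff frequencies. The CUSUM recursion uses the reference value $K = 4\sigma_c$ and the $5\sigma_c$ control limit.

```python
import math


def low_pass(signal, cutoff_hz, dt=1.0):
    """First-order low-pass filter (backward-Euler discretization, an assumption)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered, state = [], signal[0]
    for value in signal:
        state += alpha * (value - state)
        filtered.append(state)
    return filtered


def cusum_flags(y, mu_c, sigma_c, k_factor=4.0, limit_factor=5.0):
    """Two-sided CUSUM; returns one boolean flag per sample."""
    K, H = k_factor * sigma_c, limit_factor * sigma_c
    c_pos = c_neg = 0.0
    flags = []
    for value in y:
        c_pos = max(0.0, c_pos + (value - mu_c) - K)
        c_neg = max(0.0, c_neg - (value - mu_c) - K)
        flags.append(c_pos > H or c_neg > H)
    return flags


# Hypothetical training statistics and a step change at sample 10.
mu_c, sigma_c = 1.0, 0.1
y = [1.0] * 10 + [1.6] * 10
flags = cusum_flags(y, mu_c, sigma_c)
first_alarm = flags.index(True)

# Smoothing a unit step with the 4.9 mHz cutoff quoted for the PCA method.
smooth = low_pass([0.0] * 3 + [1.0] * 50, cutoff_hz=0.0049)
```

Because the statistic accumulates the excess over $K$, a persistent deviation eventually crosses the limit, while the accumulated history also delays the flag reset once the deviation ends.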
If an anomaly is detected, the anomalous cell can be identified as the cell with the maximum absolute error in the reconstructed residuals. The first and first two principal components are used to reconstruct the voltage and temperature residuals, respectively, for good tracing performance [42].
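A minimal sketch of the tracing rule, assuming the left singular matrix is available as a NumPy array: the function name `trace_anomalous_cell` and the 4-cell example are hypothetical.

```python
import numpy as np


def trace_anomalous_cell(U, z, n_components):
    """Index of the cell with the largest absolute reconstruction error."""
    U_p = U[:, :n_components]
    error = z - U_p @ (U_p.T @ z)
    return int(np.argmax(np.abs(error)))


# Hypothetical nominal pattern for four cells, plus an offset on cell index 1.
U = np.array([[1.0], [1.0], [-1.0], [-1.0]]) / 2.0
z = np.array([1.0, 1.0, -1.0, -1.0]) + np.array([0.0, 2.5, 0.0, 0.0])
suspect = trace_anomalous_cell(U, z, n_components=1)
```

Passing `n_components=1` or `n_components=2` would correspond to the choices stated for the voltage and temperature residuals, respectively.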

3. Synthetic Anomalous Data

LiBs have been commonly modeled using electrochemical and equivalent circuit models (ECM). The former are more accurate and explain the electrochemical processes that occur inside a battery but are computationally expensive [44]. The latter are computationally efficient and can provide sufficient accuracy to be widely used in real-time applications [45]. Thevenin’s equivalent circuit models have been widely used to model LiBs [3,12,13], sometimes including a short-circuit resistance to model LiB cells under internal short circuits [4,46,47]. Higher-order dynamic thermal models are available in the literature [5,6] but a lumped thermal model is often sufficiently accurate [3].
One of the main challenges in testing anomaly detection algorithms is the lack of experimental anomalous data. We adopted a hybrid experimental model approach rather than relying exclusively on model-based data. Anomalies are injected into the voltage and/or temperature of the anomalous cell. Two sensor anomalies, loose voltage and temperature sense leads, are injected by adding bias terms with noise into the experimental data. Figure 2 shows the schematics of the anomaly injection approach to create synthetic anomalous data for internal short circuits, air flow anomalies, and voltage dropouts. Thevenin’s equivalent circuit model with short circuit resistance and a first-order lumped thermal model are used to generate anomalous voltage and temperature data, respectively. In the electrical model, the state propagation for the SoC ($z$) and diffusion voltage ($V_c$), and output equations [41] are:
$$I_b(t) = I_{sc}(t) + I(t), \quad \dot{z}(t) = -\frac{I_b(t)}{36Q}, \quad \dot{V}_c(t) = -\frac{1}{R_1 C_1} V_c(t) + \frac{I_b(t)}{C_1}, \quad V(t) = \frac{OCV(z(t)) - V_c(t) - I(t) R_0}{R_0 + R_{sc}} R_{sc},$$
where $R_0$ is the Ohmic resistance, $R_1$ is the polarization resistance, $C_1$ is the polarization capacitance, $Q$ is the capacity, $R_{sc}$ is the short-circuit resistance, and $OCV$ is the open circuit voltage as a linear function of $z$. The thermal behavior can be modeled using the dynamic model [41],
$$\dot{T}(t) = a\left[I(t)^2 R_0 + I_{sc}(t)^2 R_{sc} + \frac{V_c(t)^2}{R_1}\right] + b\left[T(t) - T_{amb}(t)\right] F(t),$$
where $a$ is the thermal dissipation coefficient and $b$ is the thermal inertia coefficient. $T_{amb}$ is the ambient temperature and $F$ is the fan status (0 for off and 1 for on). This thermal model incorporates heat generation due to the ISC modeled as Joule’s heating [48].
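The electrical and thermal models can be stepped forward in time with a simple forward-Euler scheme, sketched below. This is an illustration under several assumptions rather than the authors' implementation: all parameter values are hypothetical placeholders, the SoC is taken in percent, the short-circuit current is taken as $I_{sc} = V / R_{sc}$ (which is consistent with the voltage-divider form of the output equation), and $b$ is given a negative value so that the convective term cools the cell.

```python
def simulate_cell(current, r_sc, dt=1.0, t_amb=25.0, fan_on=1.0):
    """Forward-Euler simulation of the Thevenin model with a short-circuit branch.

    current: list of load currents in A (positive = discharge).
    All parameter values below are hypothetical placeholders.
    """
    r0, r1, c1 = 1.0e-3, 2.0e-3, 2.0e4    # Ohm, Ohm, F
    q = 111.0                             # Ah, a 3P equivalent of 37 Ah cells
    a, b = 1.0e-3, -1.0e-3                # thermal coefficients; b < 0 gives cooling
    ocv0, ocv_slope = 3.0, 0.012          # OCV(z) = ocv0 + ocv_slope * z, z in %

    z, v_c, temp = 80.0, 0.0, t_amb
    history = []
    for i_load in current:
        ocv = ocv0 + ocv_slope * z
        v = (ocv - v_c - i_load * r0) * r_sc / (r0 + r_sc)    # terminal voltage
        i_sc = v / r_sc                                       # assumed short-circuit current
        i_b = i_sc + i_load                                   # total cell current
        history.append((z, v, temp))

        heat = i_load**2 * r0 + i_sc**2 * r_sc + v_c**2 / r1
        z += dt * (-i_b / (36.0 * q))
        v_c += dt * (-v_c / (r1 * c1) + i_b / c1)
        temp += dt * (a * heat + b * (temp - t_amb) * fan_on)
    return history


# One hour at rest: a practically open short branch versus a 5 Ohm short.
rest = [0.0] * 3600
healthy = simulate_cell(rest, r_sc=1.0e6)
shorted = simulate_cell(rest, r_sc=5.0)
```

Each entry of the returned history is a `(z, v, temp)` tuple, so the anomalous voltage and temperature traces can be compared against the nominal ones sample by sample.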
As the proposed approach is a purely data-driven approach, it essentially detects the cell-to-cell voltage and temperature deviations from the nominal setting and does not depend on the cause of the deviation. Therefore, synthetic anomalous data are generated to test the effectiveness of the proposed approach for different fault signatures. For example, even though a real air–flow anomaly could lead to multiple anomalous temperatures, we inject an anomaly only in one cell temperature to test the detection performance because the higher the number of anomalous signals, the easier they can be detected by PCA. Hence, statistical testing using multiple cell anomalies may not be ideal for evaluating detection performance and anomalous cell identification accuracy.
Six performance indices are used to evaluate anomaly detection performance: the detection time (DT), recovery time (RT), false-negative rate (FNR), false-positive rate (FPR), missed anomaly rate (MAR), and true tracing rate (TTR). The DT is the time between the start and detection of the anomaly. The RT is the elapsed time between the end and flag reset of the anomaly. The FNR is the percentage of false negatives between the first detection and the end of the anomaly. The FPR is the percentage of the time that the anomalies are flagged in the nominal data. The MAR is the ratio of missed detections to total anomalies. The TTR is the percentage of time the anomalous cell is located accurately when an anomaly is detected. Thus, an ideal approach will have a low DT, low RT, low FPR, low FNR, low MAR, and a high TTR.
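The time-based indices can be computed from a detector's per-sample flags, as in the following sketch. The function names and the flag sequence are hypothetical, and treating the anomaly as the half-open sample range `[start, end)` is an assumption about bookkeeping rather than something stated in the text.

```python
def detection_metrics(flags, start, end, dt=1.0):
    """DT, RT and FNR for one anomaly spanning samples [start, end).

    flags: per-sample booleans from the detector. Returns None if missed.
    """
    detected = [i for i in range(start, end) if flags[i]]
    if not detected:
        return None                        # counts toward the missed anomaly rate
    first = detected[0]
    detection_time = (first - start) * dt

    # False negatives: unflagged samples between first detection and anomaly end.
    window = flags[first:end]
    fnr = 100.0 * window.count(False) / len(window)

    # Recovery time: how long the flag stays raised after the anomaly ends.
    reset = end
    while reset < len(flags) and flags[reset]:
        reset += 1
    recovery_time = (reset - end) * dt
    return detection_time, recovery_time, fnr


def false_positive_rate(nominal_flags):
    """Percentage of nominal samples that are flagged."""
    return 100.0 * sum(nominal_flags) / len(nominal_flags)


# Hypothetical flags: anomaly active on samples 5..10, flag lingers for 2 samples.
flags = [False] * 5 + [False, False, True, True, False, True] + [True, True, False, False]
metrics = detection_metrics(flags, start=5, end=11)
fpr = false_positive_rate([False, False, True, False])
```

The MAR then follows as the fraction of anomalies for which `detection_metrics` returns `None`.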

4. Results and Discussion

4.1. Battery System and Data

This study used experimental current, voltage, temperature, and fan-status data from a Wabtec FLXDrive battery electric locomotive battery pack consisting of 825 Li-ion NMC cells with a 37 Ah capacity in a 275S-3P arrangement. The 3P cells were considered a single equivalent cell with the same voltage, three times the capacity, and each cell receiving 1/3 of the current. Twenty-five cell groups were formed, each with 11 similar cells. The voltage ($V$) and surface temperature ($T$) were measured for each cell and the current ($I$) was measured for the entire battery pack (VTI data) during nominal locomotive operations. The current was positive during discharging and negative during charging. The battery pack was air-cooled and a fan blew air into the pack to enhance the convective heat transfer. The ambient temperature ($T_{amb}$) and fan status ($F$) were also measured for each sub-group. All the measurements were sampled at 1 Hz. During cell balancing, a passive circuit discharged the cells to the lowest SoC within the series string through a shunt resistance of 100 Ω.
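The pack bookkeeping above can be checked with a few lines of Python. The helper name `equivalent_cell` and the 90 A pack current are hypothetical; the cell counts and the 37 Ah capacity come from the description.

```python
def equivalent_cell(cell_capacity_ah, n_parallel, pack_current_a):
    """Lump n_parallel cells into one equivalent cell.

    Returns (equivalent capacity in Ah, current carried by each physical cell in A).
    """
    return cell_capacity_ah * n_parallel, pack_current_a / n_parallel


n_series, n_parallel, cells_per_group = 275, 3, 11
total_cells = n_series * n_parallel             # physical cells in the pack
n_groups = n_series // cells_per_group          # groups of equivalent cells
capacity_ah, per_cell_a = equivalent_cell(37.0, n_parallel, pack_current_a=90.0)
```

With these numbers the 275 series-connected equivalent cells split evenly into the 25 groups of 11, and each equivalent cell has a capacity of 111 Ah.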

4.2. Validation of Synthetic Anomalous Data

The model parameters were the batch least-square estimates, $R_0$, $R_1$, $C_1$, $Q$, $a$, and $b$. The anomaly model parameters were the anomaly type, anomaly magnitude ($\vartheta$), start time ($t_a$), and anomaly duration ($\Delta t_a$). The anomaly magnitude ranged from 0 (no anomaly) to 1 (the most severe anomaly considered). The ISC was modeled with $R_{sc} = \exp(9(1 - 0.6\vartheta)^2) - 1$ (see Figure 3b). The voltage dropout anomaly was modeled with $R_{sc} = \exp(9(1 - 0.6\vartheta)^2) - 1$ (see Figure 3d). The air–flow anomaly was modeled with $b = (1 - \vartheta)\hat{b}$, where $\hat{b}$ is the nominal value of the thermal inertia coefficient (see Figure 3e). Unlike the other anomalies, loose voltage and temperature sense leads were injected for a fixed duration, as shown in Figure 3a,c, respectively. Figure 4 shows that the maximum voltage and temperature deviations as functions of the anomaly magnitude for all five anomalies were similar.
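The mapping from anomaly magnitude to model parameters can be sketched as below, reading the short-circuit expression as $R_{sc} = \exp(9(1 - 0.6\vartheta)^2) - 1$ with $R_{sc}$ in ohms. The function names and the nominal coefficient value are hypothetical.

```python
import math


def short_circuit_resistance(magnitude):
    """R_sc in ohms for an anomaly magnitude between 0 and 1."""
    return math.exp(9.0 * (1.0 - 0.6 * magnitude) ** 2) - 1.0


def airflow_coefficient(magnitude, b_nominal):
    """Scaled thermal coefficient b = (1 - magnitude) * b_nominal."""
    return (1.0 - magnitude) * b_nominal


r_mild = short_circuit_resistance(0.47)     # magnitude quoted for cell balancing
r_none = short_circuit_resistance(0.0)      # no anomaly: very large resistance
r_severe = short_circuit_resistance(1.0)    # most severe case considered
b_blocked = airflow_coefficient(1.0, b_nominal=-1.0e-3)
```

Under this reading, a magnitude of 0.47 gives a resistance of roughly 100 Ω, which agrees with the shunt resistance used for balancing, and the resistance falls from several kilo-ohms at zero magnitude to a few ohms at a magnitude of 1.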

4.2.1. Air–Flow Anomaly Affecting a Single Cell’s Temperature

Both anomaly detection algorithms were trained with 24 h of data and tested on a different 24 h of data. Figure 5 shows an example of anomaly detection on a cell group with a mild ($\vartheta = 0.2$) air–flow anomaly injected into cell 3 at $t_a = 8.33$ h, with a duration of $\Delta t_a = 11.67$ h. The parameters obtained from the training process in the PCA method are reported in Table 1. The nominal current is shown in Figure 5a. The voltage and temperature of the cells were tightly clustered, as shown in Figure 5b,c, respectively, before the anomaly was injected. The anomalous MBR of the cell 3 temperature was smaller than its nominal MBR, as shown in Figure 5d. Figure 5e shows that $C^+_{\Delta T_3}$ and $C^-_{\Delta T_3}$ did not cross their thresholds. Thus, the direct method failed to detect this mild anomaly. Figure 5f shows the anomaly being detected using the PCA method, as the temperature anomaly score $C_T^+$ crossed the threshold around 33 min after the anomaly injection. $C_T^+$ increased gradually, indicating a persistent anomaly. Figure 5g shows the corresponding anomalous cell being located with 90.9% accuracy. Figure 6 shows the anomaly score increasing nonlinearly with the anomaly magnitude. Thus, larger anomalies were substantially easier to detect.

4.2.2. Air–Flow Anomaly Affecting Two Cells’ Temperatures

To evaluate the ability of the proposed algorithm in the case of simultaneous multiple anomalous signals, the PCA method was tested on synthetic anomalous data, with a mild air–flow anomaly ($\vartheta = 0.2$) initiated at $t_a = 8.33$ h, leading to anomalous temperatures in cells 3 and 4 for 11.67 h, as shown in Figure 7. The nominal experimental data are the same as in Section 4.2.1. Even though cells 3 and 4 had temperature anomalies, all the cell temperatures were tightly packed, as shown in Figure 7a, and thus undetectable by the direct method. The deviations from the nominal behavior were small and are visualized by comparing the MBRs in the anomalous and nominal cases in Figure 7b. The PCA method detected the anomaly within 29 min (see Figure 7c) and traced the anomalous cell accurately as either cell 3 or 4 for more than 92.7% of the time. It should be noted that the detection time, in this case, was 4 min lower than that reported in Section 4.2.1, where only cell 3 was anomalous. As evident by the detection times in both cases, the occurrence of multiple anomalous signals made them easier for the PCA method to detect because the cumulative change in the cell-to-cell relationship was amplified. Thus, the PCA method can detect anomalies in the case of multiple anomalous cells in the early stage and accurately trace the cell that has the most anomalous deviation. However, this approach failed to detect anomalies where all the cells within the cell group showed identical anomaly signatures, but this situation is highly unlikely in a real battery pack. For example, the voltage signals in the case of a module ESC tracked each other, but the temperature anomaly signatures were non-identical.

4.3. Experimental ESC Testing

4.3.1. Testing on Single-Cell ESC

Cell balancing in the battery electric locomotive involves connecting a 100 Ω shunt resistor across the cells’ terminals. This is a mild ($\vartheta = 0.47$) ESC or micro-short circuit [14]. One single cell-balancing event and 13 module balancing events were used to evaluate the PCA method for ESC detection. Figure 8 shows the single-cell ESC fault initiated at 50 min, causing cell 10’s voltage (dashed line) to drop while the current was zero. The PCA method on the voltage data detected the ESC within 255 min. The algorithm did not detect anomalies in the temperature data. The anomalous cell 10 was accurately traced. Even though the example in Figure 8 shows the application of the PCA method to a zero-current operation, this method did not use the current signals and detected the ESC, even when the current was non-zero because the residual of the shorted cell voltage behaved differently compared to the nominal cell-to-cell relationship.

4.3.2. Statistical Testing on Module ESC

During module balancing, all 11 cells experienced ESCs. The pack current, cell voltage, and cell temperature are shown in Figure 9a–c, respectively. Figure 9d shows that the temperature PCA detected the anomaly within 16.3 min. The voltage PCA, however, did not detect the anomaly because all the cells were balancing, as discussed earlier. In this example, even though the temperature variations were unnoticeable, as seen in Figure 9c, due to the cell-to-cell variation in the thermal dynamics, multiple anomalous temperatures were present. Statistical testing on 13 different module balancing events showed that the temperature PCA detected the fault within 13.5 min, on average, with an FNR of 2.3%. The voltage PCA, however, was ineffective with a 99% FNR. This experimentally validates the ability of the PCA method to detect anomalies in the case of multiple anomalous signals.

4.4. Statistical Testing Using Synthetic Anomalous Data

To evaluate the FPR, 24 h of nominal experimental data from twenty-five cell groups of 11 cells each were processed using both methods. The direct method and PCA method showed low average FPRs of 1.9% and 2.9%, respectively. To explore the overall performance of the proposed anomaly detection algorithms, we tested the performance on families of synthetic anomalous data. Each anomaly was tested on all 25 cell groups, with magnitudes varying from 0.1 to 1 in steps of 0.1. Figure 10 shows the DT, RT, FNR, and MAR results for the ISC. The PCA method detected ISC anomalies quicker (lower DT) and more accurately (lower FNR) than the direct method. The PCA method missed fewer anomalies overall than the direct method and detected all anomalies with $\vartheta \geq 0.4$. As the anomaly spanned until the end, there was no RT in this example.
Table 2 summarizes the average DT, RT, FNR, and MAR for the five anomalies. The PCA method performed much better than the direct method in the DT, FNR, and MAR. Relative to the direct method, the PCA method improved the DT, MAR, and FNR by 56%, 60%, and 42%, respectively. The air–flow anomalies were most accurately predicted, with the lowest FNR and MAR. Loose temperature and voltage sense lead anomalies were quickly detected (low DT), as the residuals contained sudden changes. In this study, the RT is relevant only for loose voltage and temperature sense leads because the other anomalies will never recover on their own in a practical situation. Generally speaking, the RT of the PCA method was longer than that of the direct method due to the history effect in the CUSUM control chart. Figure 11a shows the MAR for the PCA method versus the anomaly magnitude. Voltage anomalies with deviations greater than 4 mV and temperature anomalies with deviations greater than 0.15 °C were detected with a zero MAR. Both methods showed similar detection rates for $\Delta V \geq 26$ mV and $\Delta T \geq 2.4$ °C, respectively. Figure 11b shows the TTR versus $\vartheta$ for the PCA method. The tracing accuracy increased with the increasing anomaly magnitude. Anomalous cells were correctly traced with more than 95% accuracy for voltage and temperature deviations greater than 7 mV and 0.3 °C, respectively.
The voltage and temperature data used in this work were collected with highly sensitive sensors with noise of less than 0.4 mV and 0.03 °C, respectively. Other sensing systems with lower sensitivities would not be capable of detecting 4 mV and 0.15 °C deviations because the thresholds would be increased to prevent false positives through the training of the CUSUM control chart on the noisy data. It is, therefore, expected that the sensitivity of voltage and temperature anomaly detection depends strongly on the quality of the sensors and data acquisition system.

4.5. Retraining after Balancing Events

Balancing events occur periodically to equalize the SoC of cells in the string. Balancing is an ESC event that changes the cell-to-cell voltage relationship. The first two principal components (PCs) of the voltage and temperature before, during, and after the balancing events, are compared in Figure 12 and Figure 13, respectively. Both the voltage and temperature PCs before and during balancing did not match. Figure 12 also shows that the cell-to-cell voltage relationships before and after balancing are different. Thus, the voltage PCA needs to be retrained after balancing to adapt to the new nominal characteristics and avoid false positives. However, Figure 13 shows that the temperature PCs were similar before and after balancing. This was expected because balancing only changes the relative SoC of the cells, not the electrothermal cell characteristics. Thus, the temperature PCA does not need retraining after balancing events.

5. Conclusions

This paper shows that mean-based voltage and temperature residuals for a group of similar cells can effectively detect electrical and thermal anomalies in battery systems. Mean-based residuals convert real-time voltage and temperature measurements to stationary data. These residuals are filtered and CUSUM thresholded to detect anomalies in the direct method. Thus, the direct method detects anomalies where the temperature and/or voltage data deviate significantly from the mean. In the PCA method, PCA is used to reconstruct the normalized residuals, which are then scalarized using the RMSE as an additional step, giving the added ability to detect anomalies where the temperature and/or voltage data deviate from cell to cell. Both methods require nominal training data to establish the normalization constants, thresholds, and left singular matrix (for the PCA method). Both methods detect and trace synthetic internal short circuits, air–flow constrictions, and loose and broken sensor connections. False-positive rates are low (<3%) and can be reduced via increased thresholds but with an increase in missed detections. Overall, the PCA method outperforms the direct method by 40–60% and is able to detect all anomalies with voltage and temperature deviations greater than 4 mV and 0.15 °C, respectively. Experimental ESC anomalies associated with balancing are detected within 14 min, relying on the temperature residuals for module-level events. Voltage PCA retraining is required after cell-balancing events.

Author Contributions

Conceptualization, K.B., A.K., J.B., J.P., N.B. and C.D.R.; methodology, K.B. and C.D.R.; software, K.B.; validation, K.B.; formal analysis, K.B.; investigation, J.P. and N.B.; resources, C.D.R.; data curation, J.P. and N.B.; writing—original draft preparation, K.B.; writing—review and editing, C.D.R.; visualization, K.B.; supervision, A.K., J.B., J.P., N.B. and C.D.R.; project administration, A.K., J.B., J.P., N.B. and C.D.R.; funding acquisition, J.B. and C.D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Wabtec Corporation.

Acknowledgments

This work was supported and funded by the Wabtec Corporation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Feng, X.; Ouyang, M.; Liu, X.; Lu, L.; Xia, Y.; He, X. Thermal runaway mechanism of lithium ion battery for electric vehicles: A review. Energy Storage Mater. 2018, 10, 246–267. [Google Scholar] [CrossRef]
  2. Qian, K.; Li, Y.; He, Y.B.; Liu, D.; Zheng, Y.; Luo, D.; Li, B.; Kang, F. Abuse tolerance behavior of layered oxide-based Li-ion battery during overcharge and over-discharge. RSC Adv. 2016, 6, 76897–76904. [Google Scholar] [CrossRef]
  3. Marcicki, J.; Onori, S.; Rizzoni, G. Nonlinear Fault Detection and Isolation for a Lithium-Ion Battery Management System. Dyn. Syst. Control Conf. 2010, 44175, 607–614. [Google Scholar]
  4. Seo, M.; Goh, T.; Park, M.; Koo, G.; Kim, S. Detection of Internal Short Circuit in Lithium Ion Battery Using Model-Based Switching Model Method. Energies 2017, 10, 76. [Google Scholar] [CrossRef]
  5. Feng, X.; Weng, C.; Ouyang, M.; Sun, J. Online internal short circuit detection for a large format lithium ion battery. Appl. Energy 2016, 161, 168–180. [Google Scholar] [CrossRef] [Green Version]
  6. Dey, S.; Biron, Z.A.; Tatipamula, S.; Das, N.; Mohon, S.; Ayalew, B.; Pisu, P. Model-based real-time thermal fault diagnosis of Lithium-ion batteries. Control Eng. Pract. 2016, 56, 37–48. [Google Scholar] [CrossRef] [Green Version]
  7. Ouyang, M.; Zhang, M.; Feng, X.; Lu, L.; Li, J.; He, X.; Zheng, Y. Internal short circuit detection for battery pack using equivalent parameter and consistency method. J. Power Sources 2015, 294, 272–283. [Google Scholar] [CrossRef]
  8. Chen, Z.; Xiong, R.; Tian, J.; Shang, X.; Lu, J. Model-based fault diagnosis approach on external short circuit of lithium-ion battery used in electric vehicles. Appl. Energy 2016, 184, 365–374. [Google Scholar] [CrossRef]
  9. Liu, Z.; He, H. Sensor fault detection and isolation for a lithium-ion battery pack in electric vehicles using adaptive extended Kalman filter. Appl. Energy 2017, 185, 2033–2044. [Google Scholar] [CrossRef]
  10. Liu, Z.; Ahmed, Q.; Zhang, J.; Rizzoni, G.; He, H. Structural analysis based sensors fault detection and isolation of cylindrical lithium-ion batteries in automotive applications. Control Eng. Pract. 2016, 52, 46–58. [Google Scholar] [CrossRef] [Green Version]
  11. Chen, Z.; Zheng, C.; Lin, T.; Yang, Q. Multifault Diagnosis of Li-Ion Battery Pack Based on Hybrid System. IEEE Trans. Transp. Electrif. 2022, 8, 1769–1784. [Google Scholar] [CrossRef]
  12. Wang, Y.; Tian, J.; Chen, Z.; Liu, X. Model based insulation fault diagnosis for lithium-ion battery pack in electric vehicles. Meas. J. Int. Meas. Confed. 2019, 131, 443–451. [Google Scholar] [CrossRef]
  13. Lin, T.; Chen, Z.; Zheng, C.; Huang, D.; Zhou, S. Fault Diagnosis of Lithium-Ion Battery Pack Based on Hybrid System and Dual Extended Kalman Filter Algorithm. IEEE Trans. Transp. Electrif. 2021, 7, 26–36. [Google Scholar] [CrossRef]
  14. Gao, W.; Zheng, Y.; Ouyang, M.; Li, J.; Lai, X.; Hu, X. Micro-Short-Circuit Diagnosis for Series-Connected Lithium-Ion Battery Packs Using Mean-Difference Model. IEEE Trans. Ind. Electron. 2019, 66, 2132–2142. [Google Scholar] [CrossRef]
  15. Khalid, H.M.; Ahmed, Q.; Peng, J.C.H. Health Monitoring of Li-Ion Battery Systems: A Median Expectation Diagnosis Approach (MEDA). IEEE Trans. Transp. Electrif. 2015, 1, 94–105. [Google Scholar] [CrossRef]
  16. Cheng, Y.; D’Arpino, M.; Rizzoni, G. Optimal Sensor Placement for Multi-Fault Detection and Isolation in Lithium-Ion Battery Pack. IEEE Trans. Transp. Electrif. 2021, 8, 4687–4707. [Google Scholar] [CrossRef]
  17. Plett, G.L. Efficient battery pack state estimation using bar-delta filtering. In Proceedings of the EVS24 International Battery, Hybrid and Fuel Cell Electric Vehicle Symposium, Stavanger, Norway, 13–16 May 2009; pp. 1–8. [Google Scholar]
  18. Liu, Z.; He, H. Model-based Sensor Fault Diagnosis of a Lithium-ion Battery in Electric Vehicles. Energies 2015, 8, 6509–6527. [Google Scholar] [CrossRef] [Green Version]
  19. Xiong, R.; Yu, Q.; Shen, W.; Lin, C.; Sun, F. A Sensor Fault Diagnosis Method for a Lithium-Ion Battery Pack in Electric Vehicles. IEEE Trans. Power Electron. 2019, 34, 9709–9718. [Google Scholar] [CrossRef]
  20. Lin, T.; Chen, Z.; Zhou, S. Voltage-correlation based multi-fault diagnosis of lithium-ion battery packs considering inconsistency. J. Clean. Prod. 2022, 336, 130358. [Google Scholar] [CrossRef]
  21. Xia, B.; Shang, Y.; Nguyen, T.; Mi, C. A correlation based fault detection method for short circuits in battery packs. J. Power Sources 2017, 337, 1–10. [Google Scholar] [CrossRef] [Green Version]
  22. Li, X.; Wang, Z. A novel fault diagnosis method for lithium-Ion battery packs of electric vehicles. Measurement 2018, 116, 402–411. [Google Scholar] [CrossRef]
  23. Hu, X.; Zhang, K.; Liu, K.; Lin, X.; Dey, S.; Onori, S. Advanced Fault Diagnosis for Lithium-Ion Battery Systems: A Review of Fault Mechanisms, Fault Features, and Diagnosis Procedures. IEEE Ind. Electron. Mag. 2020, 14, 65–91. [Google Scholar] [CrossRef]
  24. Zheng, Y.; Han, X.; Lu, L.; Li, J.; Ouyang, M. Lithium ion battery pack power fade fault identification based on Shannon entropy in electric vehicles. J. Power Sources 2013, 223, 136–146. [Google Scholar] [CrossRef]
  25. Yao, L.; Wang, Z.; Ma, J. Fault detection of the connection of lithium-ion power batteries based on entropy for electric vehicles. J. Power Sources 2015, 293, 548–561. [Google Scholar] [CrossRef]
  26. Wang, Z.; Hong, J.; Liu, P.; Zhang, L. Voltage fault diagnosis and prognosis of battery systems based on entropy and Z-score for electric vehicles. Appl. Energy 2017, 196, 289–302. [Google Scholar] [CrossRef]
  27. Sun, Z.; Wang, Z.; Chen, Y.; Liu, P.; Wang, S.; Zhang, Z.; Dorrell, D.G. Modified Relative Entropy-Based Lithium-Ion Battery Pack Online Short-Circuit Detection for Electric Vehicle. IEEE Trans. Transp. Electrif. 2022, 8, 1710–1723. [Google Scholar] [CrossRef]
  28. Hong, J.; Wang, Z.; Liu, P. Big-Data-Based Thermal Runaway Prognosis of Battery Systems for Electric Vehicles. Energies 2017, 10, 919. [Google Scholar] [CrossRef] [Green Version]
  29. Samanta, A.; Chowdhuri, S.; Williamson, S.S. Machine Learning-Based Data-Driven Fault Detection/Diagnosis of Lithium-Ion Battery: A Critical Review. Electronics 2021, 10, 1309. [Google Scholar] [CrossRef]
  30. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58. [Google Scholar] [CrossRef]
  31. Zhao, Y.; Liu, P.; Wang, Z.; Zhang, L.; Hong, J. Fault and defect diagnosis of battery for electric vehicles based on big data analysis methods. Appl. Energy 2017, 207, 354–362. [Google Scholar] [CrossRef]
  32. Xue, Q.; Li, G.; Zhang, Y.; Shen, S.; Chen, Z.; Liu, Y. Fault diagnosis and abnormality detection of lithium-ion battery packs based on statistical distribution. J. Power Sources 2021, 482. [Google Scholar] [CrossRef]
  33. Yao, L.; Fang, Z.; Xiao, Y.; Hou, J.; Fu, Z. An Intelligent Fault Diagnosis Method for Lithium Battery Systems Based on Grid Search Support Vector Machine. Energy 2021, 214, 118866. [Google Scholar] [CrossRef]
  34. Naha, A.; Khandelwal, A.; Agarwal, S.; Tagade, P.; Hariharan, K.S.; Kaushik, A.; Yadu, A.; Kolake, S.M.; Han, S.; Oh, B. Internal short circuit detection in Li-ion batteries using supervised machine learning. Sci. Rep. 2020, 10, 1301. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Yang, R.; Xiong, R.; He, H.; Chen, Z. A fractional-order model-based battery external short circuit fault diagnosis approach for all-climate electric vehicles application. J. Clean. Prod. 2018, 187, 950–959. [Google Scholar] [CrossRef]
  36. Sánchez-Fernández, A.; Baldán, F.J.; Sainz-Palmero, G.I.; Benítez, J.M.; Fuente, M.J. Fault detection based on time series modeling and multivariate statistical process control. Chemom. Intell. Lab. Syst. 2018, 182, 57–69. [Google Scholar] [CrossRef]
  37. Nawaz, M.; Maulud, A.S.; Zabiri, H.; Taqvi, S.A.A.; Idris, A. Improved process monitoring using the CUSUM and EWMA-based multiscale PCA fault detection framework. Chin. J. Chem. Eng. 2021, 29, 253–265. [Google Scholar] [CrossRef]
  38. Wang, S.; Cui, J. Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method. Appl. Energy 2005, 82, 197–213. [Google Scholar] [CrossRef]
  39. Schmid, M.; Kneidinger, H.G.; Endisch, C. Data-Driven Fault Diagnosis in Battery Systems Through Cross-Cell Monitoring. IEEE Sens. J. 2021, 21, 1829–1837. [Google Scholar] [CrossRef]
  40. Schmid, M.; Kleiner, J.; Endisch, C. Early detection of Internal Short Circuits in series-connected battery packs based on nonlinear process monitoring. J. Energy Storage 2022, 48, 103732. [Google Scholar] [CrossRef]
  41. Bhaskar, K.; Kumar, A.; Bunce, J.; Pressman, J.; Burkell, N.; Rahn, C.D. Detecting synthetic anomalies using median-based residuals in lithium-ion cell groups. In Proceedings of the 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 5277–5281. [Google Scholar]
  42. Camacho, J.; Picó, J.; Ferrer, A. Data understanding with PCA: Structural and Variance Information plots. Chemom. Intell. Lab. Syst. 2010, 100, 48–56. [Google Scholar] [CrossRef]
  43. Garcia-Alvarez, D.; Fuente, M.J.; Sainz, G. Design of residuals in a model-based Fault Detection and Isolation system using Statistical Process Control techniques. In Proceedings of the ETFA2011, Toulouse, France, 5–9 September 2011; pp. 1–7. [Google Scholar]
  44. Rahn, C.; Wang, C. Battery Systems Engineering; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  45. Plett, G. Battery Management Systems, Volume II: Equivalent-Circuit Methods; Artech: Morristown, NJ, USA, 2015. [Google Scholar]
  46. Xu, J.; Wang, H.; Shi, H.; Mei, X. Multi-scale short circuit resistance estimation method for series connected battery strings. Energy 2020, 202, 117647. [Google Scholar] [CrossRef]
  47. Lai, X.; Zheng, Y.; Zhou, L.; Gao, W. Electrical behavior of overdischarge-induced internal short circuit in lithium-ion cells. Electrochim. Acta 2018, 278, 245–254. [Google Scholar] [CrossRef]
  48. Feng, X.; He, X.; Lu, L.; Ouyang, M. Analysis on the Fault Features for Internal Short Circuit Detection Using an Electrochemical-Thermal Coupled Model. J. Electrochem. Soc. 2018, 165, A155–A167. [Google Scholar] [CrossRef]
Figure 1. Block diagram of proposed PCA-based anomaly detection algorithm (PCA Method).
Figure 2. Flow chart for synthetic data generation for an air–flow anomaly.
Figure 3. Examples of nominal (dashed) and synthetic anomalous (solid) voltage (b,e) and temperature (c,f) data associated with (1) loose voltage sense lead, (2) ISC, (3) loose temperature sense lead, (4) voltage dropout, and (5) air–flow anomaly, with the input current (a,d).
Figure 4. Variation in maximum deviation of temperature and voltage with anomaly magnitude; solid lines represent temperature deviations (left axis) and dashed/dotted lines represent voltage deviations: ◯ ISC, ▽ air flow, ☐ loose temperature sense lead, ◊ loose voltage sense lead, and △ voltage dropout.
Figure 5. Anomaly detection using PCA (PCA method) and baseline (direct method) approaches for air–flow anomaly (ϑ = 0.2), with anomaly initiation (dotted) at 8.33 h: (a) Input current profile; (b) Voltage of 11 cells and mean voltage (dashed); (c) Temperature of 11 cells and mean temperature (dashed); (d) Nominal (dashed) and anomalous (solid) temperature MBR of cell 3; (e) Direct method temperature $C^{+}$ and $C^{-}$ of cell 3 and threshold (dashed); (f) Temperature anomaly score from PCA method with threshold (dashed); and (g) PCA method tracing of anomalous cell.
Figure 6. Variation in temperature anomaly score with anomaly magnitude for air–flow anomaly with anomaly initiation (dotted) at 8.33 h: Threshold (dashed), ϑ = 0.2 (solid), ϑ = 0.4 (dash-dotted), ϑ = 0.6 (dotted), ϑ = 0.8 (thick solid), ϑ = 1 (thick dash-dotted).
Figure 7. Detection of air–flow anomaly (ϑ = 0.2), leading to anomalous temperatures in cells 3 and 4, with anomaly initiation (dotted) at 8.33 h, using PCA method: (a) Temperature of 11 cells and mean temperature (dashed); (b) Nominal (dashed) and anomalous (solid) temperature MBR of cells 3 and 4; (c) Temperature anomaly score from PCA method with threshold (dashed); and (d) PCA method tracing of anomalous cell.
Figure 8. Detection using PCA method on experimental data with ESC in cell 10 initiated at 50 min (vertical, dotted): (a) Pack current; (b) Voltages of 11 cells (cell 10, dashed); (c) Temperatures of 11 cells (cell 10, dashed); (d) $C_V^{+}$ with its threshold (dashed).
Figure 9. Detection using PCA method on experimental data with ESC in all cells initiated at 50 min (vertical, dotted): (a) Pack current; (b) Voltage of 11 cells; (c) Temperature of 11 cells; (d) $C_T^{+}$ with its threshold (dashed).
Figure 10. Variation in performance indices of direct method (☐) and PCA method (◊) with anomaly magnitude for ISC.
Figure 11. PCA method: (a) missed anomaly rate versus anomaly magnitude; (b) true tracing rate versus anomaly magnitude: ◯ ISC, ▽ air flow, ☐ loose temperature sense lead, ◊ loose voltage sense lead, and △ voltage dropout.
Figure 12. First two voltage principal components before balancing △ (solid), during balancing ◯ (dotted), and after balancing ◊ (dashed).
Figure 13. First two temperature principal components before balancing △ (solid), during balancing ◯ (dotted), and after balancing ◊ (dashed).
Table 1. Outputs from training process in PCA method.

| Voltage PCA | Value | Temperature PCA | Value |
|---|---|---|---|
| $p_V$ | 7 | $p_T$ | 4 |
| $\sigma_V^{r}$ [V] | 0.0018 | $\sigma_T^{r}$ [°C] | 0.3205 |
| $\mu_{c,V}$ | 0.0423 | $\mu_{c,T}$ | 0.0353 |
| $\sigma_{c,V}$ | 0.0366 | $\sigma_{c,T}$ | 0.0082 |
| $V_{\mathrm{threshold}}$ | 0.1830 | $T_{\mathrm{threshold}}$ | 0.0410 |
Table 2. Average performance indices from statistical testing.
Anomaly TypeISCAir FlowLoose Temperature Sense LeadLoose Voltage Sense LeadVoltage Dropout
MethodDirectPCADirectPCADirectPCADirectPCADirectPCA
DT [min]28010231246 0.75 0.72 166320252
FNR [%]4728262461636284942
MAR [%]338190461432263517
RT [min]----47257424552--