Feature Extraction, Ageing Modelling and Information Analysis of a Large-Scale Battery Ageing Experiment

de Oliveira, Jose Genario; Dhingra, Vipul; Hametner, Christoph

doi:10.3390/en14175295


by Jose Genario de Oliveira, Jr. ¹,*, Vipul Dhingra ² and Christoph Hametner ¹

¹ Christian Doppler Laboratory for Innovative Control and Monitoring of Automotive Powertrain Systems, TU Wien, 1010 Vienna, Austria

² AVL List GmbH, 8010 Graz, Austria

* Author to whom correspondence should be addressed.

Energies 2021, 14(17), 5295; https://doi.org/10.3390/en14175295

Submission received: 23 July 2021 / Revised: 13 August 2021 / Accepted: 23 August 2021 / Published: 26 August 2021

(This article belongs to the Section D2: Electrochem: Batteries, Fuel Cells, Capacitors)


Abstract: Large-scale testing of newly developed Li-ion cells is associated with high costs for the interested parties, and ideally, testing time should be kept to a minimum. In this work, an ageing model was developed and trained with real data from a large-scale testing experiment in order to determine how much testing time and data would actually have been needed to achieve similar model generalisation performance on previously unseen data. A linear regression model was used, and the feature engineering, extraction and selection steps are shown herein, alongside accurate prediction results for the majority of the accelerated ageing experiments. Information analysis was performed to achieve the desired data reduction, obtaining similar model properties with a fifth of the number of cells and half of the testing time. The proposed ageing model uses features commonly found in the literature, and the structure is simple enough for the training to be performed online in an EV. It has good generalisation capabilities. Lastly, the data reduction approach used here is model-independent, allowing a similar methodology to be used with different modelling assumptions.

Keywords: battery ageing; battery modelling; capacity fade estimation; feature engineering


1. Introduction

The usage of appliances that rely on Li-ion storage technology to function has skyrocketed in recent years, with examples ranging from mobile phones to large-scale energy storage and electric/hybrid vehicles. Due to that, there is an ever-growing necessity to study the causes and consequences of degradation processes associated with Li-ion batteries. From a safety standpoint, battery cells can degrade in ways that lead to, for example, thermal runaways that can pose a serious risk to users, and it is paramount that the ageing processes that drive the cells into these states are well understood. Additionally, considering that currently it is estimated that the battery represents a significant portion of the total costs in an electric vehicle [1], models that correctly predict the capacity fade and are used to develop operating strategies that minimise cell degradation are useful in order to reduce total ownership costs of electric vehicles (EV) and hybrid electric vehicles (HEV), as shown in [2].

There are two important research areas associated with battery state of health (SoH). The first is the SoH estimation, which tries to estimate the current battery degradation during normal vehicle operation. Examples of such approaches can be seen in [3,4,5]. The other area is the SoH prediction, which predicts how the battery will degrade over time, depending on how it was used. There are several use cases when it comes to ageing prediction itself. The most common goal is to predict the remaining useful life (RUL) of the battery at a given state and presumed load. In this work, the main target was to predict the capacity fading trajectory, yielding additional insights on what happens between the current battery state and the end of life. For that purpose, an ageing prediction model that maps the future inputs to some predicted capacity and/or internal resistance change was required. In general, these models can have different structures and focuses, being divided usually into model-based, data-driven and hybrid approaches. An excellent overview on the topic is given in [6]. The main advantage of the mechanistically focused approaches is that they are able to explain how the ageing process occurs in operating regimes not necessarily covered in the training set or for other battery chemistries, given enough knowledge on the physical parameters and underlying ageing mechanisms. However, often the electrochemical side-reactions are complex; they are usually based on the solutions of partial differential equations; they are usually hard to parametrise correctly and time-consuming to simulate. A thorough analysis on several ageing phenomena that normally occur in a Li-ion battery cell was done in [7], showing that the intensity of each ageing process differs according to chemistry, and often these different degradation effects are modelled separately, with the integration of those mechanisms also posing a significant challenge. 
An example of an approach where the degradation effects are modelled individually can be seen in [8]. A first principles model was developed, and the authors investigated the loss of active material due to the formation of a film over the surface of the negative electrode when the cell was in charge mode. Reference [9] simplifies a pseudo-2D electrochemical degradation model in order to improve computational speed while retaining an accuracy similar to that of a first principles model, with the final model lying somewhere in between the two approaches. However, the parametrisation of such a model is done with measurements of the cells kept at constant storage and cycling conditions. On the other hand, data-driven models are normally restricted to a specific cell chemistry and use curve-fitting tools to analyse the influences of effects such as time, number of cycles, temperature and voltage on ageing. They are also usually divided into cycling and calendar ageing. An example of such an approach can be seen in [10], which uses an Arrhenius-type function to model the capacity fade over the ampere-hour ($Ah$) throughput. In [11], a capacity fade model due to cycling and calendar ageing, restricted to constant experimental conditions, is investigated. A common limitation of some of the approaches mentioned in the literature is that extending these models to a more general framework that considers changes in ageing factors, such as average C-rate or temperature, is often not straightforward. One example of a more general approach can be seen in [12]. The authors used a Gaussian process model to predict the capacity fade with an input feature extraction procedure directly from the input signal. Additionally, with new technologies associated with Li-ion cells being developed constantly, there is a pressing need to test and develop models quickly, not just for these new cells, but for their ageing processes as well. One vital step of this testing procedure is the ability to make an informed decision, often based on past experience, of how much testing is required to incorporate these new cells into existing methodologies and perform these tests accordingly. Specifically concerning ageing models, these tests are quite time-consuming, even when accelerated ageing tests are conducted, and it is a challenge to determine how much and for how long to test.

The main contributions of this work help to answer the question: how much and for how long should we test new Li-ion cells in order to obtain an acceptable ageing prediction model? This was done in a qualitative fashion, using the developed model, showing the trade-off between validation performance and quantity of data used. This analysis was based on information theory; thus, it can be extended to other models as well. The other main contribution was the ageing modelling approach itself. It aims to build an incremental capacity model that extracts specific features from arbitrary known future loads in the form of current $I$, voltage $V$ and temperature $T$ signals, and then maps those to a capacity change $ΔQ$ associated with this interval. This is done in such a fashion that the model training could be performed online and with limited hardware, such as in a battery management system (BMS), as shown in Figure 1. In order to parametrise the ageing model, a dataset of roughly two hundred LiNiMnCoO₂ (NMC) cells was used. They were subjected to a wide array of different accelerated ageing profiles.

This paper is divided into the following sections. In Section 2, the dataset used is presented alongside the available measurements from one of the cells. Section 3 explains how the features used in the models were extracted from the measurements. Section 4 presents the model structure considered for the ageing model and the feature selection. Section 5 presents the validation results of the ageing model compared against the dataset, Section 6 contains an information-based analysis using the model developed in the previous sections and Section 7 is the conclusion.

2. Battery Ageing Data

2.1. Dataset Description and Testing Equipment

A dataset consisting of about two hundred NMC cells (18650 format) was used in this work. The nominal capacity of the cells was 330 mAh, and they were aged under diverse operating conditions, for a period of approximately two years. The experiments were conducted at two different test sites. The first used a battery tester from ARBIN, containing six channels with a current range of ±300 A (5 V). The other test site used a DIGATRON lithium cell tester with access to four circuits that were able to source ±400 A from 0–6 V. For each test in this dataset, the available measurements are current $I$, temperature $T$ and voltage $V$. More information on the cells can be seen in Table 1.

2.2. Testing Overview

The design of experiments for the tests falls within two categories: the calendar and the accelerated ageing experiments. For the calendar ageing experiments, the two main impact factors defined previously were the storage SoC, which varied from 5 to 95%, and the temperature, which ranged from −10 to 60 °C. For all tests, reference cycles were carried out at different temperature levels and were repeated from time to time in order to extract different cell parameters over time. A reference cycle profile at 25 °C is shown in Figure 2. In order to cause the accelerated ageing behaviour, load cycles were repeated exhaustively. The main impact factors for the accelerated ageing tests are shown in Table 2. They were temperature (T), constant charging current (CC), peak discharging current (PDC), average SoC (SOC) and delta DoD (dDoD). The load cycles were generated by charging the cell with a predefined constant current and discharging it in a way that the remaining impact factors were met, which was then repeated for a predefined time.

Examples of these load cycles are depicted in Figure 3 and Figure 4. Additionally, there were variable resting times between load and reference cycles that differed from cell to cell and from test to test.

2.3. Capacity Estimation and Initial Analysis

The capacity was estimated from Coulomb counting between the end of a CC–CV charge (constant current, constant voltage at $V_{max}$) and the end of a CC–CV discharge at $V_{min}$. This can be seen in the beginning of the voltage plot in Figure 2. The voltages $V_{min}$ and $V_{max}$ for the CV parts are usually defined by the manufacturer; here they were 3.354 and 4.085 V, corresponding to 15% and 95% SoC at 25 °C. Since there is a dependency of the battery terminal voltage on the temperature, the discharge capacity estimated in this fashion is also temperature-dependent. The reference temperature for the capacity estimation was assumed to be 25 °C. Thus, only the capacity values estimated at this temperature were used for the ageing analysis. This was important in order to decouple the effects of the temperature when estimating the discharge capacity after ageing. Table 3 shows the average ageing conditions for some selected cells in the dataset; e.g., one can have an idea of the temperature influences from cells 1, 4 and 33 when comparing the capacity losses of these cells. A similar analysis can be done for the other average features shown here, such as the total Ah influence when looking at cells 5 and 17.
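The Coulomb counting step described above can be illustrated with a minimal sketch. The function name, the sample indices marking the end of the CC–CV charge and discharge, and the units (seconds and amperes) are assumptions made for this illustration; they are not taken from the experiment itself.

```python
import numpy as np

def discharge_capacity_ah(t_s, current_a, i_start, i_end):
    """Charge (Ah) moved between two sample indices by Coulomb counting.

    Assumptions: t_s is in seconds, current_a in amperes, and i_start / i_end
    mark the end of the CC-CV charge and the end of the CC-CV discharge.
    """
    t = np.asarray(t_s, dtype=float)[i_start:i_end + 1]
    i = np.asarray(current_a, dtype=float)[i_start:i_end + 1]
    # Trapezoidal integration of the current, written out explicitly.
    charge_as = np.sum(0.5 * (i[1:] + i[:-1]) * np.diff(t))
    return abs(charge_as) / 3600.0  # ampere-seconds to ampere-hours

# Synthetic example: a constant 0.33 A discharge lasting one hour.
t = np.arange(0.0, 3601.0, 1.0)
current = np.full_like(t, -0.33)
capacity = discharge_capacity_ah(t, current, 0, len(t) - 1)
print(f"estimated capacity: {capacity:.3f} Ah")
```

In practice the two indices would be detected from the voltage and current signals of the reference cycle, and only reference cycles run at 25 °C would be used.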

3. Model Based Analysis

One of the goals of this work was to be able to develop a model that predicts the capacity loss based on measurement data from a cell. In order to do that, one of the key questions that needs to be answered is, what causes a cell to age? Another is, how can we extract such information from the available measurements from the dataset? As mentioned earlier, extensive work has been done in this area from an electrochemical point of view. The main challenge associated with this approach is how to observe such effects without having to conduct a thorough inspection of the battery. Ideally, this information should be seen from the current, voltage and temperature data, which are the measurements that would be usually available online in a battery management system (BMS). Hence, a data-driven approach backed up by expert knowledge was pursued. The following equation is used to depict the incremental capacity loss associated with one interval:

\begin{matrix} Δ Q_{i} = f (x_{i} | θ) \end{matrix}

(1)

where $θ$ is the model parameter vector and $x_{i}$ is the feature vector extracted from the $I_{i}$, $V_{i}$ and $T_{i}$ signals. The subscript $i$ denotes that the measurements were taken at the time interval $i$, between an arbitrary initial time instant $t_{ini}$ and the final time $t_{end}$. These features are mostly what we expect to be relevant, based on different Li-ion cell ageing processes. It is also relevant to point out the difference between the incremental capacity loss $ΔQ_{i}$ and the capacity itself $Q_{i}$, which are related as

\begin{matrix} Q_{i} = Q_{i - 1} - Δ Q_{i} . \end{matrix}

(2)
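As a small illustration of Equation (2), the capacity trajectory follows from the initial capacity by accumulating the incremental losses. The numerical values below are hypothetical and chosen only for the example.

```python
import numpy as np

def capacity_trajectory(q0, delta_q):
    """Apply Q_i = Q_{i-1} - dQ_i, starting from the initial capacity q0."""
    return q0 - np.cumsum(np.asarray(delta_q, dtype=float))

# Hypothetical incremental losses in mAh (illustrative values only).
q = capacity_trajectory(330.0, [2.0, 1.5, 1.0])
print(q)
```

Because the increments are accumulated, an error in a single predicted $ΔQ_{i}$ shifts every later value of $Q$ by the same amount, which is why errors on $ΔQ$ and on $Q$ carry different information.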

From the measurements, the main features that drive the ageing of the cells are both the time interval $Δt$, defined as

\begin{matrix} Δ t = t_{e n d} - t_{i n i}, \end{matrix}

(3)

with $Δt$ being the duration of the cycle, and the absolute amount of charge transferred to and from the cell, in $Ah$, resulting in

\begin{matrix} Δ A h = \frac{1}{3600} \int_{t_{i n i}}^{t_{e n d}} | I (t) | d t . \end{matrix}

(4)
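Equations (3) and (4) can be sketched for sampled signals as follows. This is a minimal illustration that assumes time stamps in seconds, current in amperes and trapezoidal integration; the function name is hypothetical.

```python
import numpy as np

def interval_features(t_s, current_a):
    """Return (dt in seconds, absolute charge throughput in Ah) for one interval."""
    t = np.asarray(t_s, dtype=float)
    abs_i = np.abs(np.asarray(current_a, dtype=float))
    dt = t[-1] - t[0]                                       # Equation (3)
    d_ah = np.sum(0.5 * (abs_i[1:] + abs_i[:-1]) * np.diff(t)) / 3600.0  # Equation (4)
    return dt, d_ah

# Synthetic load: 1 A charge for half an hour, then 1 A discharge for half an hour.
t = np.arange(0.0, 3601.0, 1.0)
current = np.where(t < 1800.0, 1.0, -1.0)
dt, d_ah = interval_features(t, current)
print(dt, d_ah)
```

The net charge of this synthetic load is close to zero, while the absolute throughput is 1 Ah, which is the quantity that Equation (4) is meant to capture.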

Another feature, which is more complex, arises from the reconstruction of the state of charge (SoC) from the current signal via standard coulomb counting, such that

\begin{matrix} S o C (t) = \frac{1}{C_{n o m}} \int_{t_{i n i}}^{t} I (t) d t + S o C (t_{i n i}), \end{matrix}

(5)

where $C_{nom}$ is the cell capacity at the beginning of the load profile, $t_{ini}$ is the initial time, $t_{end}$ is the final time, $I$ is the current signal and $SoC(t_{ini})$ is the initial value of the $SoC$ in the considered interval. While it would be possible to use an observer to reconstruct the $SoC$, as presented in [13,14], the results obtained using standard Coulomb counting were considered sufficient for the scope of this work. The reconstructed $SoC$ obtained from this procedure was then analysed with a rainflow counting algorithm, as suggested in [15], in order to break down the $SoC$ profile into $N$ elementary cycles, each with a different amplitude $ΔDoD_{i}$; this is depicted in Figure 5. The $DoD$, or depth of discharge, is defined here as $DoD = 1 - SoC$, and this nomenclature will be used in order to maintain conformity with other works. This allows for the extraction of more features, such as

\begin{matrix} Δ D o D = \frac{1}{N} \sum_{i = 1}^{N} Δ D o D_{i}, \end{matrix}

(6)

namely, the average $ΔDoD$ and the frequency of cycles associated with this procedure:

\begin{matrix} ω_{Δ D o D} = \frac{N}{t_{e n d} - t_{i n i}} . \end{matrix}

(7)
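The chain from Equation (5) to Equation (7) can be sketched as below. The rainflow routine is a generic three-point implementation in the style of ASTM E1049 and is not necessarily the variant used by the authors. Several choices are assumptions of this sketch: positive current is taken as charging, the current is in amperes and the capacity in Ah (hence the factor 1/3600), each cycle is described by its peak-to-valley range, and half cycles are weighted by 0.5 when computing $N$ and the average.

```python
import numpy as np

def reconstruct_soc(t_s, current_a, c_nom_ah, soc0):
    """Coulomb counting as in Equation (5); positive current is assumed to charge."""
    t = np.asarray(t_s, dtype=float)
    i = np.asarray(current_a, dtype=float)
    charge_as = np.concatenate(([0.0], np.cumsum(0.5 * (i[1:] + i[:-1]) * np.diff(t))))
    return soc0 + charge_as / 3600.0 / c_nom_ah

def reversals(x):
    """Keep only the turning points (local extrema) and the two end points."""
    pts = [x[0]]
    for v in x[1:]:
        if v == pts[-1]:
            continue                      # skip plateaus
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (v - pts[-1]) > 0:
            pts[-1] = v                   # same direction: extend the current ramp
        else:
            pts.append(v)
    return pts

def rainflow(x):
    """Three-point rainflow counting; returns a list of (range, count) pairs."""
    stack, cycles = [], []
    for p in reversals(x):
        stack.append(p)
        while len(stack) >= 3:
            x_rng = abs(stack[-1] - stack[-2])
            y_rng = abs(stack[-2] - stack[-3])
            if x_rng < y_rng:
                break
            if len(stack) == 3:           # range contains the start point: half cycle
                cycles.append((y_rng, 0.5))
                stack.pop(0)
            else:                         # closed cycle: drop its two points
                cycles.append((y_rng, 1.0))
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    for a, b in zip(stack[:-1], stack[1:]):
        cycles.append((abs(b - a), 0.5))  # residual half cycles
    return cycles

def dod_features(soc, duration_s):
    """Average cycle depth, Equation (6), and cycle frequency, Equation (7)."""
    cycles = rainflow(soc)
    n = sum(count for _, count in cycles)
    mean_depth = sum(rng * count for rng, count in cycles) / n
    return mean_depth, n / duration_s

# Synthetic SoC profile: two full swings between 50% and 90% within two hours.
mean_depth, omega = dod_features([0.5, 0.9, 0.5, 0.9, 0.5], 7200.0)
print(mean_depth, omega)
```

Since $DoD = 1 - SoC$, the cycle ranges of the two signals are identical, so the counting can be run on either. The same `rainflow` function could be applied to the current signal to obtain the analogous current features.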

The rainflow analysis was also done for the current signal $I$, resulting in additional features $ΔI$ and $ω_{ΔI}$ defined analogously to $ΔDoD$ and $ω_{ΔDoD}$. Additional features proportional to the power dissipation in the cell were also considered, resulting in

\begin{matrix} \bar{I^{2}} = \frac{1}{t_{e n d} - t_{i n i}} \int_{t_{i n i}}^{t_{e n d}} I {(t)}^{2} d t \end{matrix}

(8)

and

\begin{matrix} \sum I^{2} = (t_{e n d} - t_{i n i}) \bar{I^{2}} . \end{matrix}

(9)
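The two current-square features of Equations (8) and (9) can be computed from sampled data as in the following sketch, which assumes time in seconds, current in amperes and trapezoidal integration. The function name is hypothetical.

```python
import numpy as np

def current_square_features(t_s, current_a):
    """Return (time-averaged I^2, time-integrated I^2) for one interval."""
    t = np.asarray(t_s, dtype=float)
    i_sq = np.asarray(current_a, dtype=float) ** 2
    integral = np.sum(0.5 * (i_sq[1:] + i_sq[:-1]) * np.diff(t))
    mean_sq = integral / (t[-1] - t[0])   # Equation (8)
    return mean_sq, integral              # Equation (9) equals the integral itself

# Synthetic example: a constant 2 A current for 100 s.
t = np.linspace(0.0, 100.0, 101)
mean_sq, sum_sq = current_square_features(t, np.full_like(t, 2.0))
print(mean_sq, sum_sq)
```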

The complete list of features that were considered for the model can be seen in Table 4. For this work, the main drive behind this feature extraction procedure was that we wanted to have an underlying physical motivation behind the inclusion of a given feature. This resulted in the elimination of features commonly used in standard time-series analysis that would have been hard to interpret and/or that are arbitrary, such as the second coefficient of a Fourier decomposition of the voltage signal with a given sampling time. The latter is significantly harder to interpret than, e.g., the average temperature in a given cycle. Note that at this step there was no consideration of whether a feature is important for capacity fade prediction or not; this will be investigated in more detail in the next section.

4. Model Structure Selection

With the list of features considered for the ageing model presented in the last section, the question about how to combine and select such features remains. These topics are discussed in the next subsections, both regarding model structure and which features to use.

4.1. Structure Considerations

In order to retain some degree of interpretability, certain considerations were kept in mind concerning the structure of the model:

Given an arbitrary $Δ Q$ with elapsed time $Δ t$ and charge throughput $Δ A h$ , if we split the interval into parts such that:

$\begin{matrix} Δ t & = Δ t_{1} + Δ t_{2}, \end{matrix}$

(10)

$\begin{matrix} Δ A h & = Δ A h_{1} + Δ A h_{2} \end{matrix}$

(11)

then

$\begin{matrix} Δ Q = Δ Q_{1} + Δ Q_{2} \end{matrix}$

(12)

must hold for any positive $Δ t_{1}$ , $Δ t_{2}$ , $Δ A h_{1}$ , $Δ A h_{2}$ .
If $Δ t = 0$ and $Δ A h = 0$, then $Δ Q = 0$. Note that it is not possible that $Δ A h \neq 0$ and $Δ t = 0$.
In [7], the ageing effects were described together with accelerating factors. One possible way to take that into consideration is that these accelerating factors should be linked to a main ageing feature in a regression structure.
Ideally, the model’s structure should be kept simple in order to, if necessary, perform the parametrisation in an online fashion, with limited hardware, such as in a BMS.

The intuitive reason behind the first requirement is that, given any arbitrary load, the sum of the capacity losses due to each partial load obtained by splitting the original load into smaller pieces should be the same as the overall capacity loss. The second requirement constrains the model in a way that if no time has passed, there should be no capacity loss, which is also intuitive. However, these requirements restrict the possible model structures for the ageing data. From Table 4, it is easy to verify that there are some features that are intrinsically associated with the rate of decay of the capacity, such as $T$, $V$, $SoC$, $ΔDoD$, $ΔI$, $ω_{ΔDoD}$, $ω_{ΔI}$, $I_{ch}$, $I_{dis}$, $\bar{I^{2}}$ and $N$. These features are modelled as accelerating factors for the elapsed time $Δt$ and charge throughput $ΔAh$, in a similar fashion to [15]. An initial generic structure of a model based on the previous discussion is

\begin{matrix} Δ Q_{i} = f_{1} (t_{i n i}, Δ t) g_{1} (v_{i}) + f_{2} (A h_{i n i}, Δ A h) g_{2} (v_{i}), \end{matrix}

(13)

with $v_{i}$ being the feature vector associated with the rates, as described previously for a given interval $i$, and $f_{1}$, $f_{2}$, $g_{1}$ and $g_{2}$ being functions of their respective arguments. Note that by doing this, the model is split into one part associated with calendric ageing and another associated with cycling. It is also interesting to point out that some features shown in Table 4 are linked to calendric ageing and/or cycling. The temperature is often considered as an Arrhenius-type accelerating factor that is independent from the rest. This consideration will be somewhat relaxed by considering a second-order polynomial relationship, linear in the parameters, to model the temperature effect. By taking into account these structural considerations, the accelerating factor functions $g_{1}$ and $g_{2}$ were chosen as

\begin{matrix} g_{1} = c o n v ([1 T T^{2}], [1 S o C V]) θ_{g_{1}}, \end{matrix}

(14)

\begin{matrix} g_{2} = c o n v ([1 T T^{2}], [1 S o C V Δ D o D Δ I ω_{Δ D o D} \dots \\ \dots ω_{Δ I} I_{c h} I_{d i s} \bar{I^{2}} \sum I^{2} N]) θ_{g_{2}}, \end{matrix}

(15)

where $conv(u, v)$ denotes the convolution between two vectors $u$ and $v$, modelling the interactions between an accelerating factor function of temperature and the other features. The parameter vectors associated, respectively, with $g_{1}$ and $g_{2}$ are $θ_{g_{1}}$ and $θ_{g_{2}}$. The functions $f_{1}(t_{ini}, Δt)$ and $f_{2}(Ah_{ini}, ΔAh)$ are taken as

\begin{matrix} f_{1} = {(t_{i n i} + Δ t)}^{p} - t_{i n i}^{p}, \end{matrix}

(16)

\begin{matrix} f_{2} = {(A h_{i n i} + Δ A h)}^{q} - A h_{i n i}^{q} . \end{matrix}

(17)
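The form of Equations (16) and (17) satisfies the first two structural requirements by construction, because the differences telescope when an interval is split. The short check below illustrates this numerically. It assumes that the accelerating factor is the same in both sub-intervals, uses the exponent value reported in Section 4.2, and the helper name `f_power` is hypothetical.

```python
import math

def f_power(x_ini, dx, exponent):
    """Shared form of Equations (16) and (17): (x_ini + dx)^e - x_ini^e."""
    return (x_ini + dx) ** exponent - x_ini ** exponent

p = 0.5514  # exponent reported in Section 4.2

# Split an interval of length 50 starting at 100 into parts of length 20 and 30.
whole = f_power(100.0, 50.0, p)
parts = f_power(100.0, 20.0, p) + f_power(120.0, 30.0, p)
print(whole, parts)

# A zero-length interval contributes nothing.
zero = f_power(100.0, 0.0, p)
```

The two printed values agree up to floating-point rounding, and `zero` is exactly 0.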

The underlying reason for this choice was to have the functions $f_{1}$ and $f_{2}$ model behaviours similar to mixed kinetic-diffusion processes, as seen in [15]. This assumption is flexible enough to represent both linear and accelerated behaviour during the cell beginning of life, which in general can be modelled by such processes. The parameters $p$ and $q$ are regressed in conjunction with the feature selection procedure that will be presented next.

4.2. Feature Selection

When selecting the features relevant for the model, all possible subsets of features were enumerated. For each combination, the parameters p and q were found by minimising the root mean squared error (RMSE) for the capacity loss over the overall training set. The approach selected here to ensure that the model generalisation performance would be good was to select the subset of features that minimised the k-fold cross-validation error defined in [16] and given by

\begin{matrix} C V (\hat{f}) = \frac{1}{N} \sum_{i = 1}^{N} L (y_{i}, {\hat{f}}^{- k (i)} (x_{i})), \end{matrix}

(18)

with the $K$ folds being selected randomly, ${\hat{f}}^{-k(i)}$ being the function fitted with the fold $k(i)$ that contains the $i$-th data point removed, and $K$ being the number of folds. Additionally, the RMSE and NRMSE on $Q$ are defined as

\begin{matrix} R M S E_{Q} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(Q_{i} - {\hat{Q}}_{i})}^{2}}, \end{matrix}

(19)

\begin{matrix} N R M S E_{Q} = \frac{1}{Q_{n o m}} R M S E_{Q}, \end{matrix}

(20)

with $N$ being the number of data points, $Q_{i}$ the $i$-th measurement of $Q$, ${\hat{Q}}_{i}$ the $i$-th prediction of $Q$ and $Q_{nom}$ the nominal value of $Q$ across the dataset, 330 mAh. The metrics $RMSE_{ΔQ}$ and $NRMSE_{ΔQ}$ are defined analogously to Equations (19) and (20), with the normalisation still being done with respect to the nominal capacity. In general, there are well-known feature selection approaches available in the literature, such as filter-based methods, as seen in [17], and wrapper-based methods, as exemplified in [18], that would be able to solve the task at hand. However, given the specialised restricted structure of the model presented here, the extremely fast model training and the limited number of features, all possible combinations of features for $g_{1}$ (with and without $SoC$ and/or $V$, for example) and $g_{2}$ were tested. With two optional features in $g_{1}$ and eleven in $g_{2}$, this results in $2^{13} = 8192$ combinations. Additionally, for each combination, the optimal parameters $p$ and $q$ were found by solving a minimisation problem with the 10-fold cross-validation MSE as the objective function. The model which gave the best performance was found by eliminating the average SoC associated with the cycling part $g_{2}$ of the model and is given by

\begin{matrix} g_{2} = c o n v ([1 T T^{2}], [1 V Δ D o D Δ I ω_{Δ D o D} \dots \\ \dots ω_{Δ I} I_{c h} I_{d i s} \bar{I^{2}} \sum I^{2} N]) θ_{g_{2}}, \end{matrix}

(21)

with the remaining Equations (14), (16) and (17) being unchanged. The values for $p$ and $q$ were also regressed, yielding $p = 0.5514$ and $q = 0.5178$. Note that with $p$ and $q$ fixed, the full problem can be written in a standard linear regression form with respect to the parameters

\begin{matrix} \hat{Y} = X θ, \end{matrix}

(22)

with

\begin{matrix} X & = [\begin{matrix} f_{1} (t_{i n i 1}, Δ t_{1}) g_{1}^{*} (v_{1}) & f_{2} (A h_{i n i 1}, Δ A h_{1}) g_{2}^{*} (v_{1}) \\ ⋮ & ⋮ \\ f_{1} (t_{i n i N}, Δ t_{N}) g_{1}^{*} (v_{N}) & f_{2} (A h_{i n i N}, Δ A h_{N}) g_{2}^{*} (v_{N}) \end{matrix}], \end{matrix}

(23)

\begin{matrix} θ & = [\begin{matrix} θ_{g_{1}} \\ θ_{g_{2}} \end{matrix}], \end{matrix}

(24)

\begin{matrix} Y & = [\begin{matrix} Δ Q_{1} \\ ⋮ \\ Δ Q_{N} \end{matrix}], \end{matrix}

(25)

where

\begin{matrix} g_{1} & = g_{1}^{*} θ_{g_{1}}, \end{matrix}

(26)

\begin{matrix} g_{2} & = g_{2}^{*} θ_{g_{2}} \end{matrix}

(27)

and N denotes the number of datapoints.
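The exhaustive feature selection of Section 4.2 can be sketched on synthetic data as follows. This is an illustration under several assumptions rather than the procedure used in the paper: ordinary least squares stands in for the model fit, the loss $L$ in Equation (18) is taken as the squared error, the outer optimisation over $p$ and $q$ is omitted, and only four optional regressors are used so that the example runs quickly. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

def kfold_cv_mse(X, y, n_folds=10, seed=0):
    """K-fold cross-validation error of Equation (18) with a squared-error loss."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(len(y)) % n_folds   # random fold label k(i) per sample
    sq_err = np.empty(len(y))
    for k in range(n_folds):
        held_out = fold_of == k
        # Fit with fold k removed, then predict the held-out samples.
        theta, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
        sq_err[held_out] = (y[held_out] - X[held_out] @ theta) ** 2
    return sq_err.mean()

def best_subset(X_always, X_optional, y, n_folds=10):
    """Try every subset of the optional columns and keep the lowest CV error."""
    n_opt = X_optional.shape[1]
    best_score, best_cols = np.inf, ()
    for r in range(n_opt + 1):
        for cols in itertools.combinations(range(n_opt), r):
            X = np.hstack([X_always, X_optional[:, list(cols)]])
            score = kfold_cv_mse(X, y, n_folds)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols

# Synthetic data: only optional columns 0 and 2 actually influence the target.
rng = np.random.default_rng(1)
n = 200
X_always = np.ones((n, 1))
X_optional = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X_optional[:, 0] - 3.0 * X_optional[:, 2] + 0.01 * rng.normal(size=n)

cv_mse, chosen = best_subset(X_always, X_optional, y)
print(chosen, cv_mse)
```

With four optional columns the search visits $2^{4} = 16$ subsets. The same loop over thirteen optional features gives the 8192 combinations mentioned in the text, which remains tractable because each fit is a small linear problem.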

5. Model Training and Validation

In order to validate the modelling approach, the dataset was split into two parts, one only for training and the other only for validation. This division was made in a way to keep the training and validation sets similar to each other, including in terms of total capacity loss. To assess the model quality, there are two key measures that are important when it comes to ageing prediction: the error regarding $ΔQ$, the incremental capacity loss between two consecutive points, and the error regarding $Q$, the actual cell capacity. In other words, the model needs to perform well in both these metrics in order for the model quality to be considered good. It was found heuristically that, due to the possibly high number of features, the best validation results simultaneously on $ΔQ$ and $Q$ were obtained when, instead of minimising the $ℓ_{2}$ norm of the error, the regularised $ℓ_{1}$ norm was minimised. The cost function is written as

\begin{matrix} J_{1} = \| Y - X θ \|_{1} + λ \| θ \|_{1}, \end{matrix}

(28)

where $λ$ is a free regularisation parameter, chosen here in order to minimise the 10-fold cross-validation error in the training set. The problem of minimising the cost shown in the equation above can be rewritten as a linear program (LP), as seen in [19].
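One standard way to write the cost of Equation (28) as a linear program is to split both the parameters and the residuals into non-negative parts. The sketch below follows that textbook reformulation and solves it with `scipy.optimize.linprog`; the choice of solver and the function name are assumptions of this example, and the formulation in [19] may differ in detail.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y, lam):
    """Minimise ||y - X theta||_1 + lam * ||theta||_1 as a linear program."""
    n, m = X.shape
    # Decision vector: [theta_plus (m), theta_minus (m), e_plus (n), e_minus (n)],
    # all non-negative, with theta = theta_plus - theta_minus and
    # residual = e_plus - e_minus.
    cost = np.concatenate([lam * np.ones(2 * m), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m] - res.x[m:2 * m]

# Synthetic, noise-free example: with lam = 0 the true parameters are recovered.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true
theta_hat = l1_regression(X, y, lam=0.0)
print(theta_hat)
```

Increasing `lam` shrinks the parameter estimates towards zero, and a value could then be picked by cross-validation on the training set.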

Figure 6 shows the model’s prediction as a solid line and the data as crosses for the capacities of four different cells in the validation set. Some important average ageing features for each cell are shown next to the four plots. The upper left plot depicts a scenario where the cell was aged with fairly mild accelerated ageing tests, and the effect of temperature is seen when comparing it to the upper right plot, where the cycling conditions were on average not as harsh as in the first, but the temperature was significantly higher: the cell reached the same capacity loss in roughly two thirds of the time and with less than half the number of equivalent cycles. The model was able to explain what happened in a satisfactory fashion. The $NRMSE_{Q}$ values for both of these cells indicate that the prediction is very good compared to the average cell of the dataset. Again, it is worth emphasising that the average values were used for the analysis and not for the ageing model prediction. This is noticeable from the lower left plot, which shows a slightly higher total capacity loss than the upper right plot, despite having almost double the average $ΔDoD$, triple the number of cycles and almost double the total testing time. The $NRMSE_{Q}$ value for this cell is significantly higher than the one from the upper left plot; this is due to an offset. The model was still very good, as indicated by the $NRMSE_{ΔQ}$ metric, especially if this model were combined with a state of health observer. The lower right plot shows the model’s prediction for rather severe average ageing conditions, with a high temperature, a high rate of cycling and a high $ΔDoD$. Almost the same total capacity loss occurred as in the other cells in a third of the testing time. If one just looks at the $NRMSE_{Q}$, it would seem that the model prediction errors of the lower cells are similar, but that is not true, simply because the $NRMSE_{Q}$ does not give the full picture of the prediction quality.

Figure 7 shows the histogram of the performance on the validation set. There was a substantial number of cells for which the performance was very good, an intermediate segment with average results and some outlier cells, which brought the average of the $NRMSE_{Q}$ up by a significant margin. Given that the tests performed on the cells varied substantially, this could indicate that there were some underlying ageing effects that were well explained by the model, e.g., SEI growth, whereas others, such as current collector degradation, were simply not covered; while such an effect is not a significant phenomenon for most cells, it degrades the cells in which it is present rather quickly. Additionally, there might have been a correlation between model prediction quality and temperature, simply because these more extreme degradation effects are usually associated with harsh operating conditions at very high or very low temperatures. The accelerating fade towards the end of life that is usually present in some Li-ion cells is not seen in the results simply because the vast majority of the cells did not experience it; thus, it was not significantly present in the dataset. The performance metrics for $ΔQ$ and $Q$, for both training and validation sets, are shown in Table 5.

A small detail that might go unnoticed is that usually the only signal that is known a priori for a future load profile is the current. In order to correctly estimate the voltage signal, an accurate cell model is needed, often with parameters changing over time due to ageing. This might be problematic if we use features to predict ageing that in turn require processing of an unknown voltage signal. Table 6 shows the results for training and validation performance without considering features associated with the voltage signal, i.e., omitting the average voltage $V$ as a feature. The table shows that the performance of the model on both the training and validation sets decreased only minimally.

A similar argument could be made with respect to the future temperature measurements. However, since the temperature is often controlled in a BMS, and the seasonal average temperatures are available, it is not unreasonable to assume that at least the average temperature of the cell should be known with acceptable precision.

6. Information Analysis

Usually the process to obtain the testing data required to parametrise such ageing models is quite time-consuming and expensive. It becomes paramount then to answer a key question:

How much data and testing time are needed in order to parametrise the model?

These are usually the goals of "Design of Experiments" approaches, and they are performed before the testing phase. However, since the dataset was available here beforehand, the question that will actually be answered here instead is:

How much of the dataset and how much testing time would have been needed to sufficiently parametrise the model?

The striking difference between these two questions is that for the first, the tests for each cell would have been designed in such a way as to maximise the information, and for the second, the testing was already done. Techniques based on the concept of Fisher information will be used next to analyse the data.

6.1. Fisher Information

The Fisher information matrix, defined in [20] as

\begin{matrix} I (θ) = E \left\{ - \frac{d^{2} ℓ (θ)}{d θ^{2}} \right\}, \end{matrix}

(29)

where $ℓ(θ)$ denotes the log-likelihood function, $θ$ the parameter vector and $E\{\cdot\}$ the expected value, quantifies the amount of information contained in a dataset and is directly linked to the uncertainty in an estimated parameter vector $\hat{θ}$. In the special case where the model is linear in the parameters, i.e., when it can be written as

\begin{matrix} Y = X θ + ϵ, \end{matrix}

(30)

where $ϵ$ is the measurement error following a normal distribution with variance $σ^{2}$, the Fisher information simplifies to

\begin{matrix} I = \frac{X^{T} X}{σ^{2}}, \end{matrix}

(31)

which is independent of the parameter vector $θ$.

In Design of Experiments, a design is said to be efficient when it maximises the information contained in the experiments. Since the Fisher information is a matrix, different scalar measures of it can be maximised or minimised for this purpose, the so-called optimality criteria. Among these (e.g., A-, D- and E-optimality), the D-optimality criterion was chosen because of the results it gave and its good numerical properties: it requires no matrix inversion, which is not the case for every optimality criterion. A design is said to be D-optimal when it maximises the determinant

\begin{matrix} D = | X^{T} X | . \end{matrix}

(32)

This can be interpreted as making the information matrix “big.” The matrix $X$ for the ageing model analysis is derived in Equation (23); hence, it depends only on features extracted from the measurements and, as mentioned earlier, not on the parameters, so it is completely independent of the model training step.
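
For completeness, the step from Equation (30) to Equation (31) can be written out as follows. This is a sketch under the stated assumptions of independent, zero-mean Gaussian errors with known variance $σ^{2}$; the symbol $n$ (the number of rows of $X$) and the partial-derivative notation are introduced here for illustration only.

```latex
\begin{aligned}
ℓ(θ) &= -\frac{n}{2}\ln\left(2πσ^{2}\right) - \frac{1}{2σ^{2}}\,(Y - Xθ)^{T}(Y - Xθ), \\
\frac{\partial ℓ(θ)}{\partial θ} &= \frac{1}{σ^{2}}\,X^{T}(Y - Xθ), \\
\frac{\partial^{2} ℓ(θ)}{\partial θ\,\partial θ^{T}} &= -\frac{1}{σ^{2}}\,X^{T}X, \\
I &= E\left\{-\frac{\partial^{2} ℓ(θ)}{\partial θ\,\partial θ^{T}}\right\} = \frac{X^{T}X}{σ^{2}}.
\end{aligned}
```

Because the second derivative contains neither $Y$ nor $θ$, the expectation is trivial. Under the same assumptions the covariance of the least-squares estimate is $σ^{2}(X^{T}X)^{-1} = I^{-1}$, so a larger determinant $|X^{T}X|$ corresponds to a smaller confidence region for $\hat{θ}$.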

6.2. Optimal Cell Selection

Based on the discussion above, a good heuristic for selecting the minimum number of cells needed to parametrise the model is to find the optimal cell combination, i.e., the one that maximises the information when k cells are selected for model training and the remaining cells are left out of the training set.
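
As a minimal illustration of this heuristic, the sketch below evaluates the logarithm of the D-criterion for a set of cells and searches all k-cell combinations exhaustively. The function names, the dictionary layout (one regressor block per cell) and the synthetic data are assumptions made for this example; an exhaustive search is only feasible for a small number of cells.

```python
from itertools import combinations

import numpy as np

def log_d_criterion(X_by_cell, selected):
    """Return log|X^T X| for the stacked regressor rows of the selected cells."""
    X = np.vstack([X_by_cell[c] for c in selected])
    sign, logdet = np.linalg.slogdet(X.T @ X)
    # A non-positive determinant means the subset cannot identify all parameters
    return logdet if sign > 0 else -np.inf

def best_subset(X_by_cell, k):
    """Exhaustively search all k-cell combinations for the most informative one."""
    return max(
        combinations(sorted(X_by_cell), k),
        key=lambda subset: log_d_criterion(X_by_cell, subset),
    )

rng = np.random.default_rng(1)
# Hypothetical data: 6 cells, each with 10 intervals and 3 features,
# generated with different excitation levels per cell
X_by_cell = {c: (0.5 + c) * rng.normal(size=(10, 3)) for c in range(6)}

chosen = best_subset(X_by_cell, k=2)
print("selected cells:", chosen)
print("log-determinant:", log_d_criterion(X_by_cell, chosen))
```

The log-determinant is used instead of the determinant itself to avoid numerical overflow when many rows are stacked.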

The data were originally split into a validation and a training set, from which training subsets were selected. In each iteration, the cell whose removal led to the smallest loss of information was removed from the training set, as shown in Figure 8. A comparison between the validation RMSE of the capacity when the training subsets were selected at random versus using D-optimality is depicted in Figure 9. The ageing model parameters were estimated based on the partitions of the training set with the k best cells. In this case, selecting the correct 20 cells in the dataset already gave 95% of the validation RMSE that would be achieved by using the complete training set. Although the information ratio and the validation performance ratio are not the same, the former made it possible to obtain a good estimate of how many different cells this dataset would need in order to reach a similar ageing modelling performance.

Another important point is that these results depend on the model structure and on the features that were extracted from the data. With another ageing modelling approach, the guideline on how to conduct the analysis remains valid, but the numerical results do not. For example, with another ageing model, 60 cells instead of 20 might have been needed to achieve the same RMSE ratio between the partial and the full dataset.
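
The iterative removal described above can be sketched as a greedy backward elimination. This is an illustrative sketch and not the code used for the results in this work; the function names and the synthetic per-cell regressor blocks are hypothetical.

```python
import numpy as np

def log_det_info(X_by_cell, cells):
    """Return log|X^T X| for the stacked regressor rows of the given cells."""
    X = np.vstack([X_by_cell[c] for c in cells])
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def backward_elimination(X_by_cell, k):
    """Remove, one at a time, the cell whose removal costs the least information."""
    remaining = sorted(X_by_cell)
    history = [(list(remaining), log_det_info(X_by_cell, remaining))]
    while len(remaining) > k:
        # Drop the cell that leaves the largest log-determinant behind
        drop = max(
            remaining,
            key=lambda c: log_det_info(X_by_cell, [r for r in remaining if r != c]),
        )
        remaining.remove(drop)
        history.append((list(remaining), log_det_info(X_by_cell, remaining)))
    return remaining, history

rng = np.random.default_rng(2)
# Hypothetical data: 8 cells, each with 10 intervals and 3 features
X_by_cell = {c: (0.5 + 0.3 * c) * rng.normal(size=(10, 3)) for c in range(8)}

kept, history = backward_elimination(X_by_cell, k=3)
for cells, value in history:
    print(len(cells), "cells, log-determinant:", round(float(value), 3))
```

The log-determinant can only stay equal or decrease as cells are removed, so the recorded history gives the kind of information curve that can be normalised and compared against random selections.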

6.3. Impact of the Testing Time

The 40 best D-optimal cells were selected and the training data were split into intervals corresponding to the total amounts of time available for testing (2 to 20 months). Figure 10 shows the validation RMSE as more data became available, until the full testing time was used. Different cell selections were also investigated to verify that, even with the full testing time, a wrong cell selection did not let the validation RMSE converge to the lower bound, which was computed using information-based selection. As expected, there was a trade-off between the amount of testing time available and the relative error, so the required testing time was decided by defining a maximum acceptable relative error: in this case, a 10% relative error corresponds to 10 months of testing and 3% to 15 months.
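
A minimal sketch of this procedure is given below. It assumes that each training row carries a recording time in months, that the model is fitted by ordinary least squares, and that the relative error is measured against the validation RMSE obtained with the full testing time; these choices, the function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def validation_rmse(X_tr, y_tr, X_va, y_va):
    """Fit least squares on the training rows and return the validation RMSE."""
    theta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((X_va @ theta - y_va) ** 2)))

def rmse_vs_testing_time(X_tr, y_tr, t_tr, X_va, y_va, horizons):
    """Validation RMSE when only training rows recorded up to each horizon are used."""
    return {
        h: validation_rmse(X_tr[t_tr <= h], y_tr[t_tr <= h], X_va, y_va)
        for h in horizons
    }

def shortest_acceptable(rmse_by_horizon, max_rel_error):
    """Smallest horizon whose RMSE is within max_rel_error of the full-time RMSE."""
    reference = rmse_by_horizon[max(rmse_by_horizon)]
    for h in sorted(rmse_by_horizon):
        if rmse_by_horizon[h] / reference - 1.0 <= max_rel_error:
            return h

rng = np.random.default_rng(3)
theta_true = np.array([0.6, -0.3, 0.2])  # hypothetical parameters
X_tr, X_va = rng.normal(size=(400, 3)), rng.normal(size=(100, 3))
y_tr = X_tr @ theta_true + 0.05 * rng.normal(size=400)
y_va = X_va @ theta_true + 0.05 * rng.normal(size=100)
t_tr = rng.uniform(0.0, 20.0, size=400)  # hypothetical recording time in months

horizons = range(2, 21, 2)
rmse_by_horizon = rmse_vs_testing_time(X_tr, y_tr, t_tr, X_va, y_va, horizons)
print("10% tolerance:", shortest_acceptable(rmse_by_horizon, 0.10), "months")
print(" 3% tolerance:", shortest_acceptable(rmse_by_horizon, 0.03), "months")
```

A tighter tolerance can never yield a shorter testing time than a looser one, which is the trade-off referred to above.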

7. Conclusions

The methodology presented here starts with an analysis of the dataset, the so-called feature engineering steps, which often depend on expert knowledge to extract the correct features, followed by a feature selection procedure and model validation. The model was then used to help answer two key questions: How much testing is needed to parametrise such an ageing model? Which tests proved to be more important in doing so? In practice, the results provided insight into which cells from the dataset were more important, and the same holds for the load test types. The key difference from a Design of Experiments approach is that here the dataset had already been obtained; the key assumption is that some of these results extend to other cell types and chemistries, thereby providing a guideline or good starting point on where to look when designing new tests and defining how many cells should be tested.

Further research on this topic could try a more sophisticated ageing modelling approach: while the approach presented here works for the vast majority of the cells in the dataset, some ageing phenomena are not explainable with the current methods. On the other hand, this would also imply an information matrix that possibly depends on the model parameters, posing an additional challenge. Finally, it is important to point out that the methodology presented here assumes, in the general case, that future signals of voltage, temperature and SoC are available, when in reality only the future current is exactly known a priori. This issue can be addressed by using electrical and thermal models of the cell to obtain such signals if they are really needed.

Author Contributions

Conceptualisation, V.D., C.H. and J.G.d.O.J.; methodology, J.G.d.O.J. and C.H.; software, J.G.d.O.J.; validation, J.G.d.O.J.; formal analysis, J.G.d.O.J. and C.H.; investigation, J.G.d.O.J., C.H. and V.D.; resources, V.D. and C.H.; data curation, J.G.d.O.J. and V.D.; writing—original draft preparation, J.G.d.O.J.; writing—review and editing, J.G.d.O.J. and C.H.; visualisation, J.G.d.O.J. and C.H.; supervision, C.H.; project administration, C.H.; funding acquisition, V.D. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

The financial support by the Austrian Federal Ministry for Digital and Economic Affairs; the National Foundation for Research, Technology and Development; and the Christian Doppler Research Association is gratefully acknowledged.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data was obtained from AVL List GmbH, Graz, Austria.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lutsey, N.; Nicholas, M. Update on electric vehicle costs in the United States through 2030. Int. Counc. Clean Transp. 2019, 9.
2. Hoke, A.; Brissette, A.; Smith, K.; Pratt, A.; Maksimovic, D. Accounting for Lithium-Ion Battery Degradation in Electric Vehicle Charging Optimization. IEEE J. Emerg. Sel. Top. Power Electron. 2014, 2, 691–700.
3. Plett, G.L. Sigma-point Kalman filtering for battery management systems of LiPB-based HEV battery packs: Part 2: Simultaneous state and parameter estimation. J. Power Sources 2006, 161, 1369–1384.
4. Hametner, C.; Jakubek, S.; Prochazka, W. Data-Driven Design of a Cascaded Observer for Battery State of Health Estimation. IEEE Trans. Ind. Appl. 2018, 54, 6258–6266.
5. Chen, C.; Xiong, R.; Shen, W. A Lithium-Ion Battery-in-the-Loop Approach to Test and Validate Multiscale Dual H Infinity Filters for State-of-Charge and Capacity Estimation. IEEE Trans. Power Electron. 2018, 33, 332–342.
6. Hu, X.; Xu, L.; Lin, X.; Pecht, M. Battery Lifetime Prognostics. Joule 2020, 4, 310–346.
7. Vetter, J.; Novák, P.; Wagner, M.; Veit, C.; Moller, K.C.; Besenhard, J.; Winter, M.; Wohlfahrt-Mehrens, M.; Vogler, C.; Hammouche, A. Ageing mechanisms in lithium-ion batteries. J. Power Sources 2005, 147, 269–281.
8. Ramadass, P.; Haran, B.; Gomadam, P.M.; White, R.; Popov, B.N. Development of First Principles Capacity Fade Model for Li-Ion Cells. J. Electrochem. Soc. 2004, 151, A196–A203.
9. Jin, X.; Vora, A.; Hoshing, V.; Saha, T.; Shaver, G.; García, R.E.; Wasynczuk, O.; Varigonda, S. Physically-based reduced-order capacity loss model for graphite anodes in Li-ion battery cells. J. Power Sources 2017, 342, 750–761.
10. Wang, J.; Liu, P.; Hicks-Garner, J.; Sherman, E.; Soukiazian, S.; Verbrugge, M.; Tataria, H.; Musser, J.; Finamore, P. Cycle-life model for graphite-LiFePO4 cells. J. Power Sources 2011, 196, 3942–3948.
11. Schmalstieg, J.; Käbitz, S.; Ecker, M.; Sauer, D.U. A holistic aging model for Li(NiMnCo)O2 based 18650 lithium-ion batteries. J. Power Sources 2014, 257, 325–334.
12. Richardson, R.R.; Osborne, M.A.; Howey, D.A. Battery health prediction under generalized conditions using a Gaussian process transition model. J. Energy Storage 2019, 23, 320–328.
13. Wu, X.; Li, X.; Du, J. State of Charge Estimation of Lithium-Ion Batteries Over Wide Temperature Range Using Unscented Kalman Filter. IEEE Access 2018, 6, 41993–42003.
14. Hametner, C.; Jakubek, S. State of charge estimation for Lithium Ion cells: Design of experiments, nonlinear identification and fuzzy observer design. J. Power Sources 2013, 238, 413–421.
15. Santhanagopalan, S.; Smith, K.; Neubauer, J.; Kim, G.H.; Keyser, M.; Pesaran, A. Design and Analysis of Large Lithium-Ion Battery Systems, 1st ed.; Artech House: Norwood, MA, USA, 2015; Volume 4.
16. Friedman, H.; Tibshirani. The Elements of Statistical Learning, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2008.
17. Ferreira, A.J.; Figueiredo, M.A. Efficient feature selection filters for high-dimensional data. Pattern Recognit. Lett. 2012, 33, 1794–1804.
18. Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining, 1st ed.; Springer: New York City, NY, USA, 1998.
19. Boyd, S.; Vandenberghe, L. Convex Optimization, 7th ed.; Cambridge University Press: Cambridge, UK, 2009.
20. Davison, A.C. Statistical Models, 1st ed.; Cambridge University Press: Cambridge, UK, 2003.

Figure 1. A block diagram of the state estimation and ageing prediction structure.

Figure 2. Current, voltage and temperature profiles over time from a reference test cycle showing characterisation pulses at varying SoCs.

Figure 3. Current, voltage and temperature profiles of a section of a load cycle at high temperatures, displaying uneven charge and discharge behaviour.

Figure 4. A snapshot of current, voltage and temperature profiles on a section of a load cycle at low temperatures with similar charge and discharge pulses.

Figure 5. An example of feature extraction procedure ($ΔDoD$ and $ω_{ΔDoD}$) from the $SoC$ signal with a rainflow counting algorithm.

Figure 6. Predictions (solid) for cells under different average temperatures, delta DoD and numbers of equivalent cycles, with varying levels of accuracy.

Figure 7. Histogram of the validation performance, showing good validation for most cells and some unexplained ageing.

Figure 8. Normalised logarithm of D-optimality when optimal selection was done compared to random realisations.

Figure 9. RMSE of the validation set when D-optimal selection was done compared to random realisations.

Figure 10. Validation RMSE versus total testing time for the 40 best cells and different selection strategies.

Table 1. Additional information on the lithium-ion cells.

Cell Chemistry
Positive Electrode	NMC
Negative Electrode	Graphite
Nominal Capacity	330 mAh
Upper Voltage Limits
Constant Current	4.085 V
5 s Pulse	4.2 V
Safety Limit	4.5 V
Lower Voltage Limits
Constant Current	3.354 V
5 s Pulse	2.8 V
Safety Limit	2.5 V

Table 2. Impact factors for the accelerated ageing tests.

Variable	Minimum	Maximum
T	−20 °C	45 °C
CC	0.2 C	2.4 C
PDC	0.2 C	10 C
SOC	15%	80%
dDoD	2.5%	80%

Table 3. Multiple cells’ data. An overview of the average conditions of the ageing tests. The temperatures are in °C, the overall experimental durations are in days and the capacity fields are in mAh.

Cell Nr.	Temperature	Duration	Mean Ah	Total Ah	Initial Cap.	Cap. Loss
1	−9	540	1.33 $\times 10^{4}$	2.83 $\times 10^{6}$	315	33
4	8	546	1.27 $\times 10^{4}$	2.93 $\times 10^{6}$	314	40
5	8	310	9.83 $\times 10^{3}$	7.20 $\times 10^{6}$	329	115
17	8	395	1.47 $\times 10^{4}$	2.84 $\times 10^{6}$	321	38
33	41	546	1.20 $\times 10^{4}$	2.28 $\times 10^{6}$	319	69
46	−8	148	1.33 $\times 10^{4}$	3.62 $\times 10^{6}$	320	65
63	29	547	1.55 $\times 10^{4}$	4.06 $\times 10^{6}$	322	35
141	40	418	1.24 $\times 10^{4}$	3.50 $\times 10^{6}$	324	96
217	42	176	1.10 $\times 10^{4}$	9.62 $\times 10^{6}$	319	120

Table 4. Extracted features.

Feature	Description
$Δt$	Time interval
$ΔAh$	Charge in a given interval
T	Average temperature
$t_{ini}$	Absolute time at the beginning
$Ah_{ini}$	Absolute total charge at the beginning
$\bar{V}$	Average voltage
$\overline{SoC}$	Average $SoC$
$ΔDoD$	Average cycle magnitude extracted from $SoC$
$ΔI$	Same as above, but extracted from the current $I$
$ω_{ΔDoD}$	Cycling frequency extracted from $SoC$
$ω_{ΔI}$	Same as above, but extracted from the current $I$
$I_{ch}$	Average charging current
$I_{dis}$	Average discharging current
$\overline{I^{2}}$	Average squared current
$\sum I^{2}$	Sum of squared current
N	Number of cycles from $DoD$ analysis

Table 5. Training and validation performance.

Dataset	${NRMSE}_{Δ Q}$	${NRMSE}_{Q}$
Training Data	0.0087	0.0353
Validation Data	0.0084	0.0384

Table 6. Training and validation—no voltage signal.

Dataset	${NRMSE}_{Δ Q}$	${NRMSE}_{Q}$
Training Data	0.0088	0.0364
Validation Data	0.0086	0.0395

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

de Oliveira, J.G., Jr.; Dhingra, V.; Hametner, C. Feature Extraction, Ageing Modelling and Information Analysis of a Large-Scale Battery Ageing Experiment. Energies 2021, 14, 5295. https://doi.org/10.3390/en14175295
