Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures

Turrisi, Simone; Zappa, Emanuele; Cigada, Alfredo

doi:10.3390/s22062177

Open AccessArticle

Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures

by

Simone Turrisi

^*

,

Emanuele Zappa

and

Alfredo Cigada

Politecnico di Milano, Department of Mechanical Engineering, Via La Masa 1, 20156 Milano, Italy

^*

Author to whom correspondence should be addressed.

Sensors 2022, 22(6), 2177; https://doi.org/10.3390/s22062177

Submission received: 18 February 2022 / Revised: 4 March 2022 / Accepted: 9 March 2022 / Published: 10 March 2022

(This article belongs to the Topic Recent Advances in Structural Health Monitoring)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Due to the need for controlling many ageing and complex structures, structural health monitoring (SHM) has become increasingly common over the past few decades. However, one of the main limitations for the implementation of continuous monitoring systems in real-world structures is the effect that benign influences, such as environmental and operational variations (EOVs), have on damage sensitive features. These fluctuations may mask malign changes caused by structural damages, resulting in false structural condition assessment. When damage identification is implemented as novelty detection due to the lack of known damage states, outliers may be part of the data set as the result of the benign and malign factors mentioned above. Thanks to the developments in the field of robust outlier detection, the current paper presents a new data fusion method based on the use of cointegration and minimum covariance determinant estimator (MCD), which allows us to visualize and to classify outliers in SHM data, depending on their origin. To validate the effectiveness of this technique, the recent case study of the KW51 bridge has been considered, whose natural frequencies are subjected to variations due to both EOVs and a real structural change.

Keywords:

structural health monitoring; novelty detection; cointegration; robust outlier analysis; environmental and operational variations; sensor data fusion; natural frequencies; acceleration signals

1. Introduction

The main engineering disciplines associated with damage detection are undoubtedly structural health monitoring (SHM) and non-destructive evaluation (NDE). Even though the most recent NDE techniques have proven to be very effective in case of composite laminates or pipe inspection [1], SHM procedures offer many advantages when applied to the case of real-world civil structures. First, SHM is based on permanent sensors, while in NDE, the devices (i.e., eddy-current probe, X-ray generator, etc.) are usually brought to the desired point of the structure. Therefore, SHM can be conducted in real-time, or at least very frequently, in an automated fashion, while NDE is performed sporadically or on request. Second, the choice of SHM philosophy allows us to install a limited amount of inexpensive sensors to obtain reliable measurements; on the other hand, the local nature of the physical effects exploited in NDE requires us to install a dense layout of transducers to reach an adequate coverage area.

Dealing with data-based approaches to SHM, damage sensitive features are extracted from these measurements and machine learning methods are used to build a statistical data model [2]. In general, damage detection procedures exhibit better.

Performances within supervised learning scenarios, where class labels (i.e., damage levels and locations) are identified in advance. Unfortunately, for the majority of practical SHM applications it is not possible to obtain known damage state data, but only data from conditions assumed as “normal”. In this case, the SHM implementation relies on unsupervised learning methods, such as novelty detection, which only require data from the undamaged state (baseline) of the structure [3]. Then, a statistical model of the normal conditions is obtained based on the feature distributions and new data are tested for conformity to the model.

The idea of novelty detection using outlier analysis is not a new one and many SHM application examples can be found [4,5,6]. This well-established statistical method essentially seeks for the presence of outliers in the data, i.e., observations that appear inconsistent with the rest of the data set and therefore are believed to be generated by an alternate mechanism. A classic discordancy measure used in the previous studies to reveal the presence of outliers is the Mahalanobis squared distance (MSD) [7]. MSD indicates how far a generic point is from the cloud points center of mass, taking into account the shape of the cloud as well. However, the major limitation of such a distance is that it suffers from outliers “masking effect”. Given that MSD is characterized by sample mean and covariance, any outliers present in the baseline data set have a negative influence on the estimation of mean and covariance. This increases the occurrence of false negatives, since new observations associated with small distances could instead hide the presence of outliers. Within a SHM context, a potential issue is represented by the influence that environmental and operational variations (EOVs) have a structure features. This effect is particularly evident for civil infrastructures such as bridges, whose measured responses are constantly subjected to important variations induced by wind, humidity, traffic, human loading [8] and, above all, temperature [9,10,11,12]. Since changes due to these benign reasons typically manifest with the same order of magnitude of those due to structure degradations, they could appear as outliers in the data masking any potential indication of ongoing structural damage.

To solve the problem of inclusive outliers in the training data, robust statistical methods are considered in a wide range of researches, including SHM [13,14,15], to provide unbiased estimates of mean and covariance parameters computed from a smaller subset of data whose behaviour is assumed to be close to the true population values. Alternatively, data normalization techniques such as principal component analysis [9], autoencoders [16] or cointegration [17,18,19,20,21], are adopted to project the data into a different space to remove or at least reduce the effect of environmental and operational changes. Although previous methods proved their effectiveness in creating a normal condition training set clear from the influence of external factors, they are unable to distinguish which of the outliers indicated in the data are “benign” and which are “malign”. This distinction would represent a significant advancement for SHM practices, as it would make possible to identify external benign variations at an early stage, remove them from the analysis, so leaving the algorithm free to detect only real structural anomalies. Within SHM, the hierarchy of the damage identification process can be classified into four levels [22]: damage detection (level 1), damage localization (level 2), damage quantification (level 3) and damage prognosis (level 4). Different techniques are usually applied according to the desired SHM level. Recent advances in the SHM field offer methods, such as the Mode Shape Curvature or the Curvature Damage Factor [23], that can achieve up to level 4; however for civil structures it is usually accepted that the procedure is not optimized for such level of detail, due to the high complexity of the system which requires a trade-off between the level of the diagnostic system and the expense of training it adequately. Level 1 is also distinguished from the others, as it can be accomplished without the knowledge of the structure behaviour when it is damaged. This is why the goal of this paper is to propose a novel approach based on the use of cointegration and robust outlier statistics aiming at improving the damage detection strategies for real-world structures. Furthermore, as the techniques applied at the different levels mentioned above are found to be complementary, the possibility of having an efficient algorithm characterized by accurate detection and low computational complexity allows us to improve also the higher levels information of the SHM process.

Later in this paper it will be shown how, thanks to the mutual use of cointegration and robust outlier analysis, the underlying nature of outliers is revealed, allowing one to separate the effects due to environmental and operational factors out of those due to damage. Into details, cointegration is first used to create a linear combination of the original measured variables

u = β_{1} y_{1} + β_{2} y_{2} + \dots + β_{n} y_{n}

, where

u

is referred to as cointegration residual, in which all the effects of EOVs are deleted. Other recent successful implementations of cointegration to eliminate environmental and operational trends from damage sensitive features can be found in [24,25,26,27,28]. The expression of the cointegration residual can equivalently be reformulated as a linear relationship where a generic variable

y_{i}

is selected as response and the remaining variables becomes the regressors

x_{i}

, i.e.,

y_{i} = b_{1} x_{1} + b_{2} x_{2} + \dots + b_{n} x_{n} + u_{i}

, thus enabling to predict the behaviour of

y_{i}

. Since to distinguish between outliers and leverage points one has to consider the response

y_{i}

in parallel with the regressors

x_{i}

, the minimum covariance determinant (MCD) estimator is then adopted to compute the robust discordancy measures of the regressors. They indicate the distance of a certain observation belonging to the space of input features from a region associated with the bulk of the data. The combination of the two previous tools lead to a new diagnostic representation in which the cointegration residual is plotted against the robust distances calculated by applying the MCD estimator on the input variables of the model. This kind of visualization permits to automatically classify the observations according to the process which originates them.

To the best of authors’ knowledge, the issue of characterizing and separating the effects of multiple outliers within the context of SHM has only been presented by Dervilis et al. in [29]. This work illustrates the use of robust regression analysis and robust discordancy measures to demonstrate that environmental and operational conditions can manifest themselves differently, when compared to damaged conditions. With respect to this study, the use of cointegration used in this work in place of a simple linear regression offers important advantages, especially in the case of real-world structures. Here, the number of features increases and the external factors affecting these features are often not measurable or known a priori. Even in the case in which temperature is found to be the dominant factor, the existing relationships between temperature and structural response are hard to establish, since they require us to manage complex phenomena such as non-uniform solar irradiations, thermal inertia, changes in the material properties and boundary conditions. Therefore, if all the influencing factors are not properly included in the model, linear regression may produce unreliable predictions in the response. In addition, SHM data often exhibits a non-stationary behaviour, mainly induced by EOVs, which may cause the issue of spurious regressions. Cointegration represents a viable solution to overcome these limitations as it is a technique which does not strictly require the direct measurement of EOVs, but only a set of non-stationary time series representing the output of a given process.

The presence of actual damage for in-service structures is something that is hard to find. That is why damage detection strategies are often validated using a numerical model of the structure, where different damage scenarios are simulated to reproduce realistic damage conditions [30,31,32]. Alternatively, many works rely on the use of experimental data from the well-known Z24 bridge benchmark study [33,34,35], where long-term continuous monitoring took place during the year before demolition and progressive failure tests were performed. In order to increase the availability of validation cases related to different types of real structures and damages, the method proposed in this paper considers the experimental data from the monitoring campaign of the KW51 railway bridge in Leuven [36] over a period of 15 months, recently acquired by the research group from the Structural Mechanics Section of the KU Leuven University.

This data set represents a challenging situation, as it includes data subjected to the contemporary presence of both changing environmental conditions and different structural states, i.e., before, during and after an operation of retrofitting to resolve a construction error that emerged during inspections. Since structural modifications associated with the retrofit manifest, among others, with changes in natural frequencies having a similar variability of those due to environmental parameters [37], the effect of confounding influences is here rather critical. Furthermore, a very limited number of publications inherent to this case study are currently available. In [38], the authors developed an automatic procedure to identify natural frequencies and strain mode shapes of the bridge deck starting from the strain time histories measured by a dense network of fiber-optic Bragg Grating sensors. Using these modal data, the influence of temperature fluctuations and that of retrofitting are extensively investigated. In [39], indeed, natural frequencies are used to evaluate the performance of linear regression and robust principal component analysis as damage detection techniques. The ability to remove variations due to environmental changes as well as to identify actual changes in the structural behaviour introduced by retrofitting are discussed.

The layout of the present paper is organized as follows. First, an introduction to the theoretical background on robust outlier detection and cointegration are explained in Section 2 and Section 3, respectively. Then, Section 4 describes the proposed approach for outliers discrimination, based on the combined use of cointegration and minimum covariance determinant. After this, Section 5 presents the main characteristics of the KW51 bridge, which represents the case study analyzed in this work. Finally, in Section 6, the technique is tested on the experimental data from the KW51 bridge and the expected performances are evaluated on this real database.

2. Robust Outlier Detection

2.1. Minimum Covariance Determinant (MCD)

Outliers are observations that do not follow the pattern of the majority of the data. For multivariate point clouds, outliers become difficult to detect because we can no longer rely on a visual perception of the problem. A classical method to quantify the presence of outliers is to compute the Mahalanobis Squared Distance (MSD), which is given by the following equation:

{MSD}_{i}^{2} = {(z_{i} - μ_{z})}^{T} Σ^{- 1} (z_{i} - μ_{z})

(1)

where

z_{i}

is the generic multivariate observation,

μ_{z}

designates the mean of the sample observations and

Σ

is the sample covariance matrix. The MSD indicates how far

z_{i}

is from the center of the distribution of the training data (location), taking into account the shape of the distribution as well (scatter).

The main disadvantage of the traditional MSD as expressed in Equation (1) is that it is sensitive to the “masking effect” of multiple outliers. Since the MSD is characterized by sample mean and covariance, any outlier present in the training data set will have a relevant influence on the estimation of mean and covariance; in this way, small distances could be assigned to new observations, resulting in increased chances for false negatives. This problem is frequent in many SHM applications, where the lack of a priori knowledge for the damaged conditions does not allow to exclude the outliers from the analysis at a preliminary stage. Consequently, the use of more robust methods for the estimation of mean and covariance in presence of inclusive outliers is of significant interest.

The pioneering work in this area was proposed in [40], in which a robust estimation of the covariance matrix was provided by iteratively removing subsets of data with the highest MSDs, until the covariance converged to within a desired tolerance. That estimator, however, has a breakdown value which is strictly related to the data dimensionality. The breakdown value represents the smallest fraction of observations that can significantly bias the estimated parameter from its true population value. Some years later, Rousseeuw [41] proposed the minimum volume ellipsoid (MVE) method, which attempts to find an ellipse of minimum volume that encloses a subset of data, under the assumption of normally distributed multivariate features. If MVE is found, then outliers can be detected as points outside the boundaries of the ellipsoid. Although it has a 50% breakdown value, it is computationally not so efficient and, therefore, was replaced by a superior estimator called minimum covariance determinant (MCD) [42], which has the same breakdown value, but provides the possibility to implement more efficient algorithms for its computation [43,44]. In the present paper, the FAST-MCD algorithm is adopted, whose implementation details can be found in [45]. The code used in the analysis for the computation of the MCD estimator was provided by a MATLAB library of robust statistical methods called LIBRA [46]. Readers can find an explanation of the main steps of the FAST-MCD technique in the following.

Suppose that a multivariate dataset is stored in an

n \times p

matrix

[Z] = {(z_{1}, \dots, z_{n})}^{T}

where

z_{i} = {(z_{i 1}, \dots, z_{i p})}^{T}

is the

i th

observation, so

n

stands for the number of observations and

p

for the number of variables. Robust estimates for the location μ_MCD and the scatter

[Σ_{MCD}]

of

[Z]

can be obtained by the computation of the MCD. The goal of the MCD method is to select

n / 2 < h \leq n

observations whose classical covariance matrix has the lowest possible determinant, i.e., the lowest possible variance of the data. The raw MCD location estimate is then the average of these

h

points, whereas the raw MCD estimate of scatter is their covariance matrix, multiplied by a consistency factor. Since the MCD estimates can resist up to

(n - h)

outliers, the breakdown value

h

, or equivalently the ratio

α = h / n

, determine the robustness of the estimator. When a large number of outliers is expected,

h

should be chosen close to

α n

with

α = 0.5

. Otherwise, an intermediate value for

h

, such as

0.75 n

, is recommended to obtain a higher finite-sample efficiency. When

p / n

is small, some data points can become co-planar, making the outlier detection problem hardly feasible. Therefore, it is recommended that when

n / p > 5

to choose

α = 0.5

. Finally, the MCD estimates of location and scatter is affine equivariant, meaning that they stay the same under affine transformations in the data. This makes them immune to different variables scales and rotations.

The exact MCD estimator is hard to compute, as it requires an exhaustive investigation of all

h

-subsets out of

n

. The authors of [45] developed the FAST-MCD algorithm, based on a Concentration step (C-step) which avoids such a complete enumeration. The main computation steps are described here. Given

[Z] = {(z_{1}, \dots, z_{n})}^{T}

, let

H_{1} \subset (1, \dots, p)

be a subset of size

h

. Then, μ₁ and

Σ_{1}

represent the mean and the covariance matrix of the data in

H_{1}

. If

|H_{1}| \neq 0

, the relative distances are calculated as:

d_{1} {(i)}^{2} = {(z_{i} - μ_{1})}^{T} Σ_{1}^{- 1} (z_{i} - μ_{1})

(2)

Then,

H_{2}

is chosen such that

\{d_{i} (i); i \in H_{2}\} = \{{(d_{1})}_{1 : n}, \dots, {(d_{1})}_{h : n}\}

where

{(d_{1})}_{1 : n} \leq {(d_{1})}_{2 : n} \dots \leq {(d_{1})}_{n : n}

are the ordered distances and compute μ₂ and

Σ_{2}

based on

H_{2}

. Then,

|Σ_{2} |\leq| Σ_{2}|

with equality if and only if μ₂ = μ₁ and

Σ_{2} = Σ_{1}

. If

| Σ_{1} | > 0

, the C-step easily yields a new h-subset with lower covariance determinant than

Σ_{1}

, i.e., more “concentrated” than

Σ_{1}

. C-steps can be iterated until a stopping criterion is satisfied. The stopping criterion is when

|Σ_{n e w}| = 0

or when

|Σ_{n e w} |=| Σ_{o l d}|

. For the first case, since the covariance matrix is singular, the new candidate subset is rejected and the previous subset is restored as an optimal solution. The sequence of determinants obtained in this way must converge in a finite number of steps because one has only a finite number of

h

-subsets. However, the final calculation of

|Σ_{n e w}|

could not converge to the global minimum of the MCD objective function. Therefore, an approximation of the MCD solution is obtained by randomly selecting several subsets

H_{1}

as initial guesses, applying the C-steps to each and, in the end, keeping the one with the lowest determinant.

2.2. Threshold Estimation Based on Extreme Value Statistics

Dealing with unsupervised methods, where data from a damaged state are not used to help with the damage detection process, the selection of an appropriate threshold to discriminate between normal and not normal behaviors of a structure is difficult to determine. Many research studies make use of statistical process control (SPC) for unsupervised damage detection to monitor such parameters as sample mean and standard deviation [47,48]. SPC is based on the assumption that the distribution of multivariate data is Gaussian, with the MSD subsequently approximated by a chi-squared distribution with

p

degrees of freedom, where

p

is the dimension of the feature vector. Therefore, the selection of control limits is the same as the quantile of the underlying distribution associated with the desired confidence level. However, this assumption of normality may result in an improper description in the behaviour of the extreme values of the data, which are associated with the tails of the distribution. As the problem of damage identification specifically focuses attention on these tails, an approach based on Monte Carlo simulation and extreme value statistics [13] is adopted in this paper for a proper threshold setting. The procedure to calculate the threshold is described in the following:

A $p \times n$ matrix is created ( $p$ is the number of dimensions and $n$ the number of observations), where each $p$ -dimensional observation is generated from a normal distribution, having zero mean and unit standard deviation.
The desired discordancy measure (i.e., MSD or MCD) is calculated for all the observations, where mean and covariance are estimated depending on the selected classical or robust methods. The largest value for each matrix is stored.
The process is repeated for a large number of iterations in order to create a vector of extreme distances. Then, all the values are sorted in decreasing order. The threshold value depends on the choice of the critical values $α$ . In the following analysis, $α$ is set equal to 5 per cent, giving a 95 per cent confidence limit.

3. Cointegration Basics

3.1. Order of Integration and Unit Root Tests

The theoretical foundations behind the concept of cointegration are based on a clear understanding of the ideas of stationarity and non-stationarity. A stationary time series is one whose statistical moments are not dependent on time. In case this assumption does not apply, the process is said to be nonstationary. To quantify the degree of non-stationarity of the time series, the order of integration is introduced. More specifically, if a nonstationary time series

y_{t}

becomes stationary after differencing it

d

times, then

y_{t}

is said to be integrated of order

d

, denoted as

y_{t} ~ I (d)

. Just to give an example,

I (1)

defines a series which becomes stationary after differencing it once, while

I (0)

indicates a stationary series. Different statistical methods have been proposed to determine the order of integration of a time series. One of the most popular is the Augmented Dickey–Fuller (ADF) test [49], which involves fitting the data to the following regression model:

∆ y_{t} = c + β t + ρ y_{t - 1} + \sum_{j = 1}^{p} b_{j} ∆ y_{t - j} + ε_{t}

(3)

where

∆ y_{t} = y_{t} - y_{t - 1}

,

p

is the number of lagged difference terms to correct for higher-order correlations,

ε_{t}

is the model residual,

c

and

β_{t}

represent a constant and a linear trend terms which can be eventually added depending on the complexity of the model. The ADF procedure provides an estimate of the parameters of Equation (3) using the least-squares methods and then tests the null hypothesis of

ρ = 0

. The test statistic on

ρ

is calculated as:

t_{ρ} = \frac{\hat{ρ}}{s (ρ)}

(4)

where

\hat{ρ}

is the estimate of parameter

ρ

and

s (ρ)

is the standard error of the estimated parameter. Because the t-statistic does not follow one of the typical distributions (Gaussian, t-Student, etc.), it should be compared with the critical values of its asymptotic distribution, whose values are tabulated and provided by Dickey and Fuller, depending on the desired significance level

α

. If

|t_{ρ}| < | t_{c r, α} |

, the null hypothesis is accepted, meaning that

y_{t}

has a unit root and is

I (1)

. If the null is rejected, the test is repeated for

∆ y_{t}

and if the hypothesis is then accepted,

y_{t}

is an

I (2)

series. If rejected, the procedure is repeated until the integrated order of the process is established.

In this work, the ADF test is used at first to assess the degree of non-stationarity of the input variables. Then, it is adopted after cointegration to verify that the residual obtained from the regression model becomes stationary. This means that cointegration successfully lowered down the order of integration of the original variables, i.e., removes the common trends which caused signals to be nonstationary.

3.2. Linear Cointegration

As previously stated, two or more nonstationary series are cointegrated if a linear combination of them can be found to be stationary. Let

Y_{t} = (y_{1 t}, y_{2 t}, \dots, y_{n t})

be a multivariate nonstationary,

I (1)

time series. The series are cointegrated if there exists a vector

β = (β_{1}, β_{2}, \dots, β_{n})

such that:

u_{t} = β^{T} Y_{t} = β_{1} y_{1 t} + β_{2} y_{2 t} + \dots + β_{n} y_{n t}

(5)

is a stationary univariate time series. Here, the vector β is referred to as cointegrating vector and the linear combination

β^{T} Y_{t}

describes the long-run equilibrium relationship between the time series. Usually for a multivariate series such as

Y_{t}

there could be more than one possible cointegrating relationship and a number of general approaches are introduced to estimate the cointegrating vectors. Among them, the Johansen procedure represents an efficient maximum likelihood estimator [50], which seeks a white Gaussian residual from the cointegrating relation. This method offers the possibility to estimate multiple cointegrating vectors at the same time and to produce a test statistic to determine the number of cointegrating vectors. An alternative approach to estimate the cointegrating relations is the Engle–Granger (EG) two-step procedure [51]. The first step of this method provides to consider a set of measured variables which have a regression model equation of the form:

y_{1 t} = b_{2} y_{2 t} + \dots + b_{n} y_{n t} + ε_{t}

(6)

where the time series have the same order of integration and the cointegrating vector

b = (1, - b_{2}, \dots, - b_{n})

is unknown and has to be estimated using ordinary least-squares methods. The regressed variable

y_{1 t}

can be selected arbitrarily from the data set, without loss of generality. The estimated regression parameters are then used to form an error correction model (ECM) [52], which describes the long-run relationships between variables. It is not necessary here to discuss about the nature of the ECM, since a simplified implementation of the EG procedure is sufficient for SHM purposes.

The second step then verifies if the obtained cointegration residual

ε_{t}

is stationary using the ADF test. If

ε_{t}

is stationary, there is cointegration within the series, i.e., the regression equation has captured all the long-term relationships between variables. In this case, a control chart can be directly used to monitor the behaviour of the residual. On the contrary, if

ε_{t}

is nonstationary, there is no cointegration within the series.

In this work, the Engle–Granger procedure is preferred to the Johansen approach to test for cointegration because it offers the advantage of having a highly consistent and more intuitive estimator of coefficients, which have already been demonstrated to fit well in case of SHM applications [33,52].

3.3. Main Steps to Run Cointegration within SHM Applications

This section resumes the main steps adopted in this paper to estimate the cointegrating relationships over a given set of monitored variables. The Engle–Granger two-step procedure described above is implemented as a core technique to test for cointegration, under the assumption that linear dependencies exist between variables. The overall method can be extended to any kind of SHM applications having similar characteristics.

Select a set of suitable monitored variables belonging to the same process and sharing common trends.
Run the ADF test on the variables to determine the order of integration (this should be the same for all the variables).
Split the original data set in two parts, one for training and one for testing. Training data are used to estimate the regression model, while test data are used to check for variations in system behaviour. To obtain a reliable model, training data should not include any damaged conditions, but should include a comprehensive time span and resolution of healthy data under different environmental and operational conditions.
Run the ADF test on the model residual to assess its stationarity. If this situation is verified, the linear cointegrating relationship is successfully established and common trends are removed. Then, the cointegration residual represents a good indicator of the health status of the structure and can be used for damage detection purposes.

4. Proposed Hybrid Approach for Outlier Discrimination

4.1. Identify Leverage Points in Regression

In linear regression, observations are classified as

(x_{i}, y_{i})

, where

x_{i}

is the

p

-dimensional vector of predictors and

y_{i}

is the one-dimensional vector of the response. When

x_{i}

is far away from the majority of the distribution of the predictors, that point is identified as a leverage point. Therefore, the definition of a leverage point only depends on the behaviour of

x_{i}

, without considering the response

y_{i}

. If both

x_{i}

and

y_{i}

lie far from the region associated with the bulk of the data, this observation becomes a bad leverage point. This point can be very critical, since it attracts or even tilts the classical least square regression. If

(x_{i}, y_{i})

is still far, but follows the linear relation, it is called a good leverage point, since it improves the precision of the regression estimates. Therefore, in the end, to distinguish between good and bad leverage points, it is necessary to consider both

x_{i}

and

y_{i}

and to know the linear relationship established by the majority of the data.

To facilitate the comprehension of the terminology just introduced, Figure 1 shows the different categories of points that may be found in case of simple linear regression. The majority of the data are regular observations and are indicated by (a). Data belonging to (b) and (d) deviate from the linear pattern and thus are called regression outliers, but points in (c) are not. Both (c) and (d) indicates clusters of leverage points, since their

x_{i}

values are outlying. As a result, points in (c) are good leverage points, while points in (d) are bad leverage points. Finally, observations in (b) can be categorized as vertical outliers, because they are regression outliers but not leverage points.

4.2. Description of the Residual Outlier Map

The main results for the case study analyzed in the next section are presented using a residual outlier map similar to that introduced in [29,53]. In these works, the map displays the least trimmed squares (LTS) residuals [53] versus the robust distances calculated using the MCD estimator on the input variables and it is very useful to classify the observations between regular, good and bad leverage points, according to the adopted regression model. In this paper, residuals are established from cointegration analysis. However, this does not change the general interpretation of the results, as both the methods are based on the same theoretical principle, i.e., linear regression.

In more detail, the computation of robust distances reveals leverage points, but cannot distinguish between good and bad ones, because

y_{i}

is not considered. At the same time, residuals identify regression outliers

y_{i}

without telling which of them are leverage points. The residual outlier map combines the previous information to provide a powerful visualization in which the inherent nature of the data is revealed.

An illustrative example of the residual outlier map is reported in Figure 2. The two horizontal red lines represent the threshold for the regression residual and depend on its statistical distribution, while the vertical red line is the threshold for the MCD distances. These threshold values divide the plot into six regions categorized with different labels, according to the behaviour assumed by the observations described above.

5. Application to SHM: The Railway Bridge KW51

The case study presented in this study is the KW51 bridge, a steel railway bridge that crosses the Leuven–Mechenel canal close to the city of Leuven, Belgium (Figure 3a). Experimental data used in this work have been acquired by the research group from the Structural Mechanics Section of the KU Leuven University and are available online as a benchmark case study.

The KW51 is a steel single-span arch bridge of bowstring type, with a length of 117 m and a width of 12 m. The two-track deck is suspended from the arch with thirty-two inclined braces and it is made of two main girders stiffened by thirty-three transverse beams. The bridge is supported by four neoprene bearings at its extremities, which directly sit on two concrete abutments.

The KW51 has been monitored since 2 October 2018. In the period from 15 May to 27 September 2019, the bridge was retrofitted to resolve a construction error that emerged during inspections. At the beginning of the retrofitting, the scaffolding was installed on the bridge, adding mass to the structure and, therefore, modifying its modal properties. Then, the retrofitting intervention took place, which consisted of strengthening the bolted connections of the braces with the deck and the arch by means of welded steel plates (Figure 3b).

The monitoring system of the KW51 bridge comprises a dense network of sensors which provides measurements of acceleration on the bridge deck and the arches, strain on the bridge deck and on the diagonals connecting the bridge deck with the arches, strain on the rails, displacement at the bearings, and a set of relevant environmental parameters such as temperature, relative humidity and wind speed.

Acceleration measurements represent the quantities of interest for this study. The acceleration of the bridge deck and the arches are acquired by means of 12 uniaxial accelerometers with a sensitivity of 1000 mV/g (model PCB393B04). The layout of the measurement network of the bridge is shown in Figure 4. Six accelerometers are installed on the bridge deck, where four sensors measure the acceleration in the vertical direction (z) and two sensors measure the acceleration in the lateral direction (y). The remaining six accelerometers are installed at the intersection between the arches and the diagonals, where they measure the accelerations in the lateral direction (y).

Acceleration data are acquired and collected every five minutes using a National Instrument (NI) data acquisition system, made of a NI DAQ 9178 chassis and three NI 9234 ICP modules, which account for signal conditioning and A/D conversion. A sampling frequency of 1651.6 Hz is selected.

Ambient vibration data have been used to perform an automated operational modal analysis (OMA) on an hourly basis. The result is the availability of the evolution of the modal parameters (natural frequencies, damping ratios and mode shapes) of the railway bridge over time. For a complete description about the measurement setup, data processing and the available data characteristics readers can refer to [36]. For the purposes of this paper, the four natural frequencies related to the vertical global modes of the bridge deck are considered, referred to as

f_{1}, f_{2}, f_{3}

and

f_{4}

in Figure 5a. Data samples relate to a period of 15 months, from 2 October 2018 to 15 January 2020. These set of frequencies are of particular interest since they are the only identifiable by OMA with a good success rate also during the retrofitting process, which starts at observation 2675. Missing data that are present in the original dataset are removed as a data pre-processing procedure.

The observations considered for the analysis are classified into five scenarios associated with five distinct structural conditions. To better understand the behaviour assumed by the natural frequencies during these periods, the upper panel of Figure 5b offers an in-depth view of the evolution over time of the second natural frequency

f_{2}

, where the different scenarios are highlighted with different colors. Air temperature readings from the same period are plotted in the lower panel.

Scenarios S1 and S2 refer to the normal conditions of the structure, where the bridge is exclusively influenced by environmental and operational parameters. It can be noted here that the variability of data points is mainly driven by the influence of temperature. In particular, two large peaks are clearly visible in S2 between samples 1404 and 1530. These fluctuations are also observed in the behaviour of other bridges [34,55] and they are highly related to very cold periods, where the temperature decrease below 0 °C causes a stiffness increase due to the freezing of the asphalt layer of the bridge deck.

The retrofitting period is divided into three main regions. The first region (S3) is where a slight decrease in the natural frequency value owing to the presence of the first part of the scaffolding (i.e., added mass) on the bridge is observed; however, its effect on the modification of the structural properties is negligible and EOVs still remain the major source of variation in natural frequencies. The second region (S4) is related instead to the evolution of the retrofitting activities, where all of the bolted connections between the diagonal braces and the deck and arch were strengthened, resulting in a gradual stiffness increase and therefore in a gradual increase in the natural frequencies. Finally, the third region (S5) belongs to the period after completing the retrofitting, where a permanent shift of the natural frequency values is observed with respect to the initial conditions. It should be stressed that, even when structural modifications due to retrofitting occur, natural frequencies still remains sensitive to the influence of temperature, as can be seen from the fluctuations that are present both during and after the retrofitting.

All the scenarios described above are resumed in Table 1, with the corresponding name and data points. Data before retrofitting (S1 and S2) are used for the training of the proposed algorithms in order to fully characterize the structure behaviour during normal operating conditions, while the remaining datasets are used for testing and are assumed to be a proof for structural modifications of different extent.

6. Results and Discussions

The univariate MSD and MCD distances are calculated for each of the four natural frequencies in order to highlight the presence of outliers in the data, as shown in Figure 6. For the computation of MCD estimates, a breakdown value

h = 0.75 n

has been selected, where

n

represents the number of observations in the training period. MSD and MCD threshold limits are calculated as described in Section 2.2, given a 95% confidence limit. First of all, one can observe that the Mahalanobis-squared distance is less sensitive to identify outliers. For the specific case of frequency

f_{4}

, the damaged condition has not even been identified. This is because observations related to the freezing period are (correctly) included in the computation of the metrics, since they belong to the normal conditions of the structure, but they biased the estimation of mean and covariance.

On the contrary, the minimum covariance determinant estimator, thanks to its major robustness to outliers, better reveals the presence of abnormal behaviors, resulting in a higher MCD index if compared to the MSD index. The large distances observed in the training period associated with the frequency fluctuations are due to very cold temperatures and represent suitable candidates to test the ability of the proposed combined approach in discriminating outliers due to environmental conditions out of those due to retrofitting. This aspect is investigated in detail in the next section.

Once outliers are revealed, further processing is required in order to transform the original variables into a new feature which is made insensitive to the external influences due to environmental and operational factors and, therefore, create a suitable normal condition training set. This is achieved in this work by establishing a regression model of natural frequencies, based on the concept of cointegration. The basic idea from the point of view of cointegration is that, once cointegration relationship between variables is built, the model error describes their long-term equilibrium. If cointegration relationship is estimated from data when the structure operates under normal conditions, it is expected that natural frequencies are influenced by the same set of external influences and, therefore, their combination produces a model error which is stationary.

The stationarity of residual in this case has a dual function. On one hand it permits us to remove common trends from the original data; on the other, it allows us to check for system variations induced by damage, which may manifest as a deviation from equilibrium conditions. Although the relationship between temperature and each of the natural frequencies of the bridge are found to be nonlinear in case of very low temperatures, relations between natural frequencies themselves within the same period are found to be linear. This is because the different dynamic responses of the structure are driven in a similar manner to the external influence of temperature.

For this reason, linear cointegration is applied in the following analysis. To satisfy cointegration assumptions, all variables included in the regression model must be

I (1)

. The order of integration for the time series of the four natural frequencies is determined using the ADF test. Results are resumed in Table 2 and show that all the series are nonstationary and integrated of order one at the 95% confidence level.

Then, the two-step Engle–Granger procedure is implemented. Because it is necessary to select one of the natural frequencies at a time to serve as regression response, there exist four possible models, namely,

f_{1} = h (f_{2}, f_{3}, f_{4}) + ε_{1}, f_{2} = h (f_{1}, f_{3}, f_{4}) + ε_{2}, f_{3} = h (f_{1}, f_{2}, f_{4}) + ε_{3}, f_{4} = h (f_{1}, f_{2}, f_{3}) + ε_{4}

, where

h

represents a function which describes the independent variable as a linear combination of the regressors and

ε_{i}

is the model error. Once regression coefficients are estimated, the ADF test is applied to the training residual and the obtained results are reported in Table 3. Since all the ADF test statistics are significantly smaller than the 5% critical value, it is possible to conclude that all residual series are stationary and, therefore, cointegration relationships exist between natural frequencies.

The prediction ability of all the four models are quite similar and tend to have a good fit to the experimental data. However, among all possible models, the one with a residual with the most negative ADF t-statistic, namely

f_{2} = h (f_{1}, f_{3}, f_{4})

, is selected as the candidate for the following analysis. Indeed, this model is the one able to create the most stationary combination of the original variables, i.e., it is the most effective in suppressing the original common trends.

Figure 7 shows the control chart of the residual time series obtained from cointegration. The confidence interval is set to 95%, which corresponds to the range μ ± 2σ, where μ and σ are the mean and the standard deviation of the residual calculated from the training data. As one can clearly see, the residual series is very sensitive to structural changes.

During the training period, when the structure is assumed to be in a normal condition, fitted data follow the measured data very well. This is reflected by the residual behavior, which remains stationary, any effect from temperature is effectively deleted and the large fluctuations of the frequency values due to low temperatures are well predicted too. Moving to the testing region, deviations between the model predictions and the experimental data occur. First, an average decrease in residual values between observations 2700 and 2800 is observed, probably related to the added masses of the scaffolding on the bridge, which causes a decrease in the natural frequency with respect to fitted data. Second, from sample 3059 onwards, retrofitting takes place, producing a gradual stiffening of the bridge connections. Natural frequencies increase as well, but with a different relationship with respect to those built during cointegration. As a result, the residual series becomes nonstationary with a sudden jump in magnitude, which strongly points to the occurrence of damage.

At this point, the residual outlier map can be used to classify the observations in the six different regions introduced in Section 4. The plot of the cointegration residual versus MCD robust distances are presented in Figure 8.

Specifically, most of data related to scenarios S1 and S2 are classified as regular (region 3 of the map), as they are characterized by small residual and small MCD distances. This is in agreement to the actual behaviour of the structure, i.e., healthy conditions. Similarly, observations from scenario S3 are associated with region 3, as the presence of the first part of the scaffolding causes only a slight variation in the modal properties of the bridge and thus an increase in the MCD input distances which is negligible, except for few points that appear as vertical outliers. In the latter case, this means that the cointegration relationship is no longer respected. Typically, vertical outliers indicate a model misspecification and thus involve a better comprehension of the main phenomena and a model reformulation. However, this is not the case, since the situation that originates vertical outliers is well-defined and related to an external known cause (temporarily added masses) which cannot be estimated and thus its influence on the modal properties cannot be assessed.

In the map, some points from scenario S2 (very cold temperatures) are attributed to a region of benign variations (region 2). Indeed, they represent good leverage points, as the MCD distances of regressors assume high values, but at the same time, the residual is within the control limits, indicating that the cointegration relationship established during normal conditions is still respected.

On the other hand, observations from scenarios S4 and S5 are mostly related to region 4, i.e., they are bad leverage points. This time, as well as input variables are characterized by high MCD distances, residual amplitude also increases since the equilibrium described by cointegration is violated due to the different behaviour induced by retrofitting. Once again, the residual outlier map is able to separate the observations based on the current state of the structure. Consistent with the development of the retrofitting activities, it is possible to observe that data points related to scenario S4 migrate from region 3 (healthy conditions/negligible changes) to region 4 (damage state/relevant structural modification), while data points after the completion of the retrofitting lie permanently within region 4.

7. Conclusions

In this paper, a novel approach based on the use of cointegration and robust outlier detection is described as a means of characterizing outliers in the data. When structures are operating in situ and outside a controlled environment, the measured features may change over time due to the influence of benign reasons, mainly environmental and operational factors that are unrelated to damage. Different outlier detection methods have been used to reveal the presence of anomalies with respect to the baseline conditions, and outliers are often interpreted as critical points to be discarded although they may contain relevant information. On the other hand, when groups of outliers are included in the analysis, they could bias model estimates and the subsequent performance. Consequently, it is of key importance to reveal the nature of outliers according to the underlying processes that generate them. Thanks to this analysis, one can detect benign variability in advance, creating normal conditions clear of external influences, thus enhancing the possibility to identify real damages.

This is exactly the central target of this work, which focuses on explaining that outliers due to environmental and operational variations appears in the data in a different way with respect to those due to damages or structural modifications. The proposed method has been applied on the experimental data of the KW51 railway bridge, which represents a realistic case study subjected to the contemporary influence of EOVs and a gradual structural modification due to retrofitting.

Natural frequencies associated with the vertical modes of the bridge represent the input features for the procedure. Observations are divided in five scenarios associated with different structural conditions. First, data samples from the period before retrofitting, where the structure is assumed to be healthy, are used as training data to establish a cointegration relationship where one frequency (predictor) is expressed as a linear combination of the others (regressors). Data samples acquired during and after the retrofitting are used instead for testing purposes. The stationarity of the training residual obtained after cointegration is assessed through a unit root test and confirms that EOVs trends are effectively removed; at the same time, deviations from stationary conditions are observed in the testing residual after retrofitting which strongly points to the occurrence of damage.

Then, the information from the cointegration residual is used in conjunction with the robust discordancy measures (MCD) of the regressors to create a residual outlier map, which allows us to distinguish the observations in six different regions based on the different structural conditions. Samples related to the healthy state of the structure are correctly classified as regular. Samples measured in correspondence to very cold temperatures are treated as good leverage points and, therefore, associated with a region of “benign” variations; indeed, in this case, even though the MCD distances of the regressors are far from the bulk of the distribution, the cointegration relationship established during normal conditions is still preserved. Finally, it is observed that retrofitting causes a gradual change of the modal properties of the structure which is different from that induced by EOVs. As a result, not only high MCD distances of the regressors are obtained, but also high residual amplitudes, which testify that the equilibrium conditions established by the cointegration relationship are violated. Again, this aspect has been properly captured by the residual outlier map, which identifies these observations as bad leverage points and locates them in a region of “malign” variations.

Future studies should be oriented towards the extension to nonlinear robust regression methods, so as to also include those features characterized by nonlinear relationships.

Author Contributions

Conceptualization, all authors; writing—review and editing, all authors; supervision, E.Z. and A.C.; methodology, S.T.; investigation, S.T.; software, S.T.; data curation, S.T.; visualization, S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge the Structural Mechanics Section of the KU Leuven University for providing access to the KW51 bridge data.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

As scientific texts uses different notations for vectors, matrices, statistical variables and abbreviations, the conventions applied in this paper are provided in the table below.

Symbol/Abbreviation	Description
MSD	Mahalnobis squared distance
MCD	Minimum covariance determinant
z_i	Multivariate observation
μ_z,Σ	Sample mean and covariance matrix
n	Number of samples
p	Number of features
[Z]	Multivariate feature matrix ( $n \times p$ )
h	Breakdown value ( $n / 2 < h \leq n$ )
H_i	Subset of $[Z]$ ( $h \times p$ )
μ_i,Σ_i	Sample mean and covariance for data in $H_{i}$
d_i(i)	MCD distance measure
ADF	Augmented Dickey–Fuller
y_t	Generic time series
y_t~I(d)	The time series $y_{t}$ is integrated of order $d$
t_ρ	ADF t-statistic
EG	Engle-Granger
b = (1, −b₂, …, −b_n)	Cointegrating vector
ε_t	Cointegration residual
x_i	$p$ -dimensional vector of predictors
y_i	One-dimensional vector of the response
f₁,f₂,f₃,f₄	Natural frequencies of the KW51 bridge deck (vertical modes)

References

Gardner, P.; Fuentes, R.; Dervilis, N.; Mineo, C.; Pierce, S.G.; Cross, E.J.; Worden, K. Machine learning at the interface of structural health monitoring and non-destructive evaluation. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2020, 378, 20190581. [Google Scholar] [CrossRef]
Hu, W.-H.; Tang, D.-H.; Teng, J.; Said, S.; Rohrmann, R.G. Structural Health Monitoring of a Prestressed Concrete Bridge Based on Statistical Pattern Recognition of Continuous Dynamic Measurements Over 14 Years. Sensors 2018, 18, 4117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Pimentel, M.A.F.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249. [Google Scholar] [CrossRef]
Worden, K.; Manson, G.; Fieller, N.R.J. Damage detection using outlier analysis. J. Sound Vib. 2000, 229, 647–667. [Google Scholar] [CrossRef]
Ulriksen, M.D.; Damkilde, L. Structural damage localization by outlier analysis of signal-processed mode shapes—Analytical and experimental validation. Mech. Syst. Signal Process. 2016, 68–69, 1–14. [Google Scholar] [CrossRef]
Worden, K. Structural fault detection using a novelty measure. J. Sound Vib. 1997, 201, 85–101. [Google Scholar] [CrossRef]
Mahalanobis, P.C. On the Generalized Distance in Statistics; National Institute of Science of India: Calcutta, India, 1936; pp. 49–55. [Google Scholar]
Turrisi, S.; Zappa, E.; Cigada, A.; Vivanco, M.R.; Avin, N.C. Effect of image acquisition and processing parameters on the estimation of crowd-induced dynamic loading on stadium grandstands. In Proceedings of the 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021. [Google Scholar]
Wang, X.; Gao, Q.; Liu, Y. Damage Detection of Bridges under Environmental Temperature Changes Using a Hybrid Method. Sensors 2020, 20, 3999. [Google Scholar] [CrossRef]
An, Y.; Chatzi, E.; Sim, S.; Laflamme, S.; Blachowski, B.; Ou, J. Recent progress and future trends on damage identification methods for bridge structures. Struct. Control Health Monit. 2019, 26, e2416. [Google Scholar] [CrossRef]
Cai, Y.; Zhang, K.; Ye, Z.; Liu, C.; Lu, K.; Wang, L. Influence of Temperature on the Natural Vibration Characteristics of Simply Supported Reinforced Concrete Beam. Sensors 2021, 21, 4242. [Google Scholar] [CrossRef]
Teng, J.; Tang, D.-H.; Zhang, X.; Hu, W.-H.; Said, S.; Rohrmann, R. Automated Modal Analysis for Tracking Structural Change during Construction and Operation Phases. Sensors 2019, 19, 927. [Google Scholar] [CrossRef] [Green Version]
Dervilis, N.; Cross, E.J.; Barthorpe, R.J.; Worden, K. Robust methods of inclusive outlier analysis for structural health monitoring. J. Sound Vib. 2014, 333, 5181–5195. [Google Scholar] [CrossRef]
Yeager, M.; Gregory, B.; Key, C.; Todd, M. On using robust Mahalanobis distance estimations for feature discrimination in a damage detection scenario. Struct. Health Monit. 2019, 18, 245–253. [Google Scholar] [CrossRef]
Fuentes, R.; Gardner, P.; Mineo, C.; Rogers, T.J.; Pierce, S.G.; Worden, K.; Dervilis, N.; Cross, E.J. Autonomous ultrasonic inspection using Bayesian optimisation and robust outlier analysis. Mech. Syst. Signal Process. 2020, 145, 106897. [Google Scholar] [CrossRef]
Rosafalco, L.; Manzoni, A.; Mariani, S.; Corigliano, A. An Autoencoder-Based Deep Learning Approach for Load Identification in Structural Dynamics. Sensors 2021, 21, 4207. [Google Scholar] [CrossRef]
Sohn, H. Effects of environmental and operational variability on structural health monitoring. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2007, 365, 539–560. [Google Scholar] [CrossRef]
Deraemaeker, A.; Worden, K. A comparison of linear approaches to filter out environmental effects in structural health monitoring. Mech. Syst. Signal Process. 2018, 105, 1–15. [Google Scholar] [CrossRef]
Yan, A.M.; Kerschen, G.; De Boe, P.; Golinval, J.C. Structural damage diagnosis under varying environmental conditions—Part I: A linear analysis. Mech. Syst. Signal Process. 2005, 19, 847–864. [Google Scholar] [CrossRef]
Svendsen, B.T.; Frøseth, G.T.; Øiseth, O.; Rønnquist, A. A data-based structural health monitoring approach for damage detection in steel bridges using experimental data. J. Civ. Struct. Health Monit. 2021, 12, 101–115. [Google Scholar] [CrossRef]
Cross, E.J.; Worden, K.; Chen, Q. Cointegration: A novel approach for the removal of environmental trends in structural health monitoring data. Proc. R. Soc. A Math. Phys. Eng. Sci. 2011, 467, 2712–2732. [Google Scholar] [CrossRef]
Rytter, A. Vibrational Based Inspection of Civil Engineering Structures. Ph.D. Thesis, Aalborg University, Aalborg, Denmark, 1993. [Google Scholar]
Frigui, F.; Faye, J.P.; Martin, C.; Dalverny, O.; Peres, F.; Judenherc, S. Global methodology for damage detection and localization in civil engineering structures. Eng. Struct. 2018, 171, 686–695. [Google Scholar] [CrossRef] [Green Version]
Turrisi, S.; Cigada, A.; Zappa, E. A cointegration-based approach for automatic anomalies detection in large-scale structures. Mech. Syst. Signal Process. 2022, 166, 108483. [Google Scholar] [CrossRef]
Salvetti, M.; Sbarufatti, C.; Cross, E.; Corbetta, M.; Worden, K.; Giglio, M. On the performance of a cointegration-based approach for novelty detection in realistic fatigue crack growth scenarios. Mech. Syst. Signal Process. 2019, 123, 84–101. [Google Scholar] [CrossRef] [Green Version]
Sousa Tomé, E.; Pimentel, M.; Figueiras, J. Damage detection under environmental and operational effects using cointegration analysis—Application to experimental data from a cable-stayed bridge. Mech. Syst. Signal Process. 2020, 135, 106386. [Google Scholar] [CrossRef]
Liang, Y.; Li, D.; Song, G.; Feng, Q. Frequency Co-integration-based damage detection for bridges under the influence of environmental temperature variation. Measurement 2018, 125, 163–175. [Google Scholar] [CrossRef]
Coletta, G.; Miraglia, G.; Pecorelli, M.; Ceravolo, R.; Cross, E.; Surace, C.; Worden, K. Use of the cointegration strategies to remove environmental effects from data acquired on historical buildings. Eng. Struct. 2019, 183, 1014–1026. [Google Scholar] [CrossRef]
Dervilis, N.; Worden, K.; Cross, E.J. On robust regression analysis as a means of exploring environmental and operational conditions for SHM data. J. Sound Vib. 2015, 347, 279–296. [Google Scholar] [CrossRef] [Green Version]
Magalhães, F.; Cunha, A.; Caetano, E. Vibration based structural health monitoring of an arch bridge: From automated OMA to damage detection. Mech. Syst. Signal Process. 2012, 28, 212–228. [Google Scholar] [CrossRef]
Pereira, S.; Magalhães, F.; Gomes, J.P.; Cunha, Á.; Lemos, J.V. Vibration-based damage detection of a concrete arch dam. Eng. Struct. 2021, 235, 112032. [Google Scholar] [CrossRef]
Johnson, E.A.; Lam, H.F.; Katafygiotis, L.S.; Beck, J.L. Phase I IASC-ASCE Structural Health Monitoring Benchmark Problem Using Simulated Data. J. Eng. Mech. 2004, 130, 3–15. [Google Scholar] [CrossRef]
Shi, H.; Worden, K.; Cross, E.J. A nonlinear cointegration approach with applications to structural health monitoring. J. Phys. Conf. Ser. 2016, 744. [Google Scholar] [CrossRef]
Peeters, B.; De Roeck, G. One-year monitoring of the Z24-bridge: Environmental effectsversus damage events. Earthq. Eng. Struct. Dyn. 2001, 30, 149–171. [Google Scholar] [CrossRef]
Langone, R.; Reynders, E.; Mehrkanoon, S.; Suykens, J.A.K. Automated structural health monitoring based on adaptive kernel spectral clustering. Mech. Syst. Signal Process. 2017, 90, 64–78. [Google Scholar] [CrossRef] [Green Version]
Maes, K.; Lombaert, G. Monitoring data for railway bridge KW51 in Leuven, Belgium, before, during, and after retrofitting. J. Bridge Eng. 2021, 26, 04721001. [Google Scholar] [CrossRef]
Turrisi, S.; Zappa, E.; Cigada, A.; Hötzer, T. Large structures natural frequencies estimation using a limited number of sensors. In Proceedings of the 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Dubrovnik, Croatia, 25–28 May 2020; pp. 1–6. [Google Scholar]
Anastasopoulos, D.; De Roeck, G.; Reynders, E.P.B. One-year operational modal analysis of a steel bridge from high-resolution macrostrain monitoring: Influence of temperature vs. retrofitting. Mech. Syst. Signal Process. 2021, 161, 107951. [Google Scholar] [CrossRef]
Maes, K.; Van Meerbeeck, L.; Reynders, E.P.B.; Lombaert, G. Validation of vibration-based structural health monitoring on retrofitted railway bridge KW51. Mech. Syst. Signal Process. 2022, 165, 108380. [Google Scholar] [CrossRef]
Gnanadesikan, R.; Kettenring, J.R. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics 1972, 28, 81. [Google Scholar] [CrossRef]
Rousseeuw, P.J. Least Median of Squares Regression. J. Am. Stat. Assoc. 1984, 79, 871–880. [Google Scholar] [CrossRef]
Rousseeuw, P.; Leroy, A. Robust Regression and Outlier Detection; John Wiley and Sons: New York, NY, USA, 2005; Volume 589. [Google Scholar]
Maronna, R.A.; Zamar, R.H. Robust Estimates of Location and Dispersion for High-Dimensional Datasets. Technometrics 2002, 44, 307–317. [Google Scholar] [CrossRef]
Maronna, R.A. Robust Estimators of Multivariate Location and Scatter. Ann. Stat. 1976, 4, 51–67. [Google Scholar] [CrossRef]
Rousseeuw, P.J.; Driessen, K.V. A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 1999, 41, 212–223. [Google Scholar] [CrossRef]
Verboven, S.; Hubert, M. LIBRA: A matlab library for robust analysis. Chemom. Intell. Lab. Syst. 2005, 75, 127–136. [Google Scholar] [CrossRef]
Srinivasa Rao, P.; Ratnam, C. Health monitoring of welded structures using statistical process control. Mech. Syst. Signal Process. 2012, 27, 683–695. [Google Scholar] [CrossRef]
Zapico-Valle, J.L.; García-Diéguez, M.; González-Martínez, M.P.; Worden, K. Experimental validation of a new statistical process control feature for damage detection. Mech. Syst. Signal Process. 2011, 25, 2513–2525. [Google Scholar] [CrossRef]
Said, S.E.; Dickey, D.A. Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 1984, 71, 599–607. [Google Scholar] [CrossRef]
Johansen, S. Statistical analysis of cointegration vectors. J. Econ. Dyn. Control 1988, 12, 231–254. [Google Scholar] [CrossRef]
Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation, and testing. Appl. Econom. 2015, 39, 106–135. [Google Scholar]
Li, F.; Wang, Z.; Liu, G. Towards an Error Correction Model for dam monitoring data analysis based on Cointegration Theory. Struct. Saf. 2013, 43, 12–20. [Google Scholar] [CrossRef]
Hubert, M.; Rousseeuw, P.J.; Van Aelst, S. High-Breakdown Robust Multivariate Methods. Stat. Sci. 2008, 23, 92–119. [Google Scholar] [CrossRef]
Anastasopoulos, D. Structural Health Monitoring Based on Operational Modal Analysis from Long Gauge Dynamic Strain Measurements. Ph.D. Thesis, KU Leuven, Leuven, Belgium, 2020. [Google Scholar]
Moser, P.; Moaveni, B. Environmental effects on the identified natural frequencies of the Dowling Hall Footbridge. Mech. Syst. Signal Process. 2011, 25, 2336–2357. [Google Scholar] [CrossRef]

Figure 1. Example of simple linear regression showing (a) regular observations, (b) vertical outliers, (c) good leverage points, (d) bad leverage points.

Figure 2. Graphical representation of residual outlier map.

Figure 3. (a) The KW51 bridge in Leuven and (b) Example of bolted connection between a brace and the arch of the bridge (Reprinted from ref. [54]).

Figure 4. Layout of the acceleration measurement setup of the KW51 bridge. The side view shows the sensors installed on the connections between the diagonals and the arches, while the top view illustrates the sensors installed on the bridge deck. (Adapted from ref. [54]).

Figure 5. (a) Time history (hourly based) of natural frequencies associated with the global vertical modes of the bridge deck. Vertical dashed lines indicate the frost period, while vertical solid line indicates the beginning of retrofitting interventions; (b) Upper panel: time history of the second natural frequency of the bridge deck, where the five analyzed scenarios are plotted in different colors; Lower panel: time history of air temperature (horizontal red line indicates air temperature equal to 0 °C).

Figure 6. Comparison between univariate MSD and MCD distances for the four natural frequencies.

Figure 7. Control chart of cointegration residual from model

f_{2} = h (f_{1}, f_{3}, f_{4})

. The upper and lower red lines represent the training residual mean plus or minus two times the standard deviation.

Figure 7. Control chart of cointegration residual from model

f_{2} = h (f_{1}, f_{3}, f_{4})

. The upper and lower red lines represent the training residual mean plus or minus two times the standard deviation.

Figure 8. Plot of cointegration residual versus MCD robust distance for model

f_{2} = h (f_{1}, f_{3}, f_{4})

. Numbers from 1 to 6 refer to the different regions of the residual outlier map described in Section 4.2.

Figure 8. Plot of cointegration residual versus MCD robust distance for model

f_{2} = h (f_{1}, f_{3}, f_{4})

. Numbers from 1 to 6 refer to the different regions of the residual outlier map described in Section 4.2.

Table 1. Description of the different analyzed scenarios.

Scenario	Description	Data Points
S1	Before retrofitting	1–1403; 1531–2674
S2	Before retrofitting (cold temperatures)	1404–1530
S3	Retrofitting: 1st stage	2675–3058
S4	Retrofitting: 2nd stage	3059–3755
S5	Post retrofitting	3756–4196

Table 2. ADF test results for the four natural frequencies series.

Variables	ADF t-Statistic	5% Critical Value	Stationarity?
$f_{1}$	0.041	−1.942	NO
$f_{2}$	0.030	−1.942	NO
$f_{3}$	−0.091	−1.942	NO
$f_{4}$	−0.072	−1.942	NO

Table 3. ADF test results for the training residuals.

Variables	ADF t-Statistic	5% Critical Value	Stationarity?
$ε_{1}$	−14.491	−1.942	Y
$ε_{2}$	−18.629	−1.942	Y
$ε_{3}$	−17.557	−1.942	Y
$ε_{4}$	−12.301	−1.942	Y

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Turrisi, S.; Zappa, E.; Cigada, A. Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures. Sensors 2022, 22, 2177. https://doi.org/10.3390/s22062177

AMA Style

Turrisi S, Zappa E, Cigada A. Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures. Sensors. 2022; 22(6):2177. https://doi.org/10.3390/s22062177

Chicago/Turabian Style

Turrisi, Simone, Emanuele Zappa, and Alfredo Cigada. 2022. "Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures" Sensors 22, no. 6: 2177. https://doi.org/10.3390/s22062177

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Combined Use of Cointegration Analysis and Robust Outlier Statistics to Improve Damage Detection in Real-World Structures

Abstract

1. Introduction

2. Robust Outlier Detection

2.1. Minimum Covariance Determinant (MCD)

2.2. Threshold Estimation Based on Extreme Value Statistics

3. Cointegration Basics

3.1. Order of Integration and Unit Root Tests

3.2. Linear Cointegration

3.3. Main Steps to Run Cointegration within SHM Applications

4. Proposed Hybrid Approach for Outlier Discrimination

4.1. Identify Leverage Points in Regression

4.2. Description of the Residual Outlier Map

5. Application to SHM: The Railway Bridge KW51

6. Results and Discussions

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Nomenclature

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI