3.1. Application Example in CMAPSS Dataset
The database chosen for the study is a variant of the Commercial Modular Aero-Propulsion System Simulation (CMAPSS), which is publicly available and recognized as one of the datasets most frequently used for benchmarking prediction algorithms. It was recently updated through joint work between NASA and ETH Zurich's Intelligent Maintenance Systems center, so that the sensor sampling rate has been increased to 1 Hz, making it suitable for studying models oriented toward large volumes of data.
The CMAPSS-2 [31] is composed of a set of synthetic run-to-failure (RTF) trajectories, that is, artificial degradation histories of nine turbofan engines produced by the simulator from real flight conditions, which are characterized by the scenario-descriptor variables: altitude, Mach number, throttle-resolver angle (TRA), and total temperature at the fan inlet. The dataset is divided into six units designated for training and another three for testing, whose operating conditions differ slightly from the others. In this study, only the training data from CMAPSS-2 were used, which does not compromise the feasibility study since the tested model is unsupervised and, therefore, uses only a part of the samples from each unit for training.
The inserted degradation pattern is of a continuous type and is divided into four states: the degradation condition at the beginning of the operation; the normal state; a transition zone between the normal and abnormal conditions; and an abnormal state. The simulation considers the alternating presence of failure modes in the main sub-components of the engine: fan, LPC, HPC, HPT, and LPT. Their deterioration is modeled by adjustments in flow capacity and efficiency. More information about the modeling can be found in Chao et al.'s work [32].
Figure 5 outlines the allocation of the main subsystems of a turbofan engine.
In this application example, the units have been subjected to high- and low-pressure turbine failure modes, with a random initial deterioration of about 10% of the health index implicit in the simulator.
Table 3 details the failure modes for each unit and provides additional information on the number of samples, the transition time to abnormality, and the end of life (teol) in cycles.
Figure 6 details the trajectory imposed on the flow and efficiency modifiers for the tested units.
This application example follows the framework with fixed sets of hyperparameters and a fixed neural network architecture, whose feature space is composed of 18 variables, which are the same condition-monitoring signals used by Chao et al. [32]. In addition, a detailed description of the CMAPSS simulator variables can be found in [31].
The autoencoder models are subject to a validation procedure that consists of two steps: the first one is to evaluate whether its performance (through an analysis of the metrics presented in
Section 2.4) surpasses that of a simplified baseline model, which does not use deep learning, and the second one is to compare it with alternatives presented in the literature that employ similar techniques and databases.
The baseline model is built from a simple regression-extrapolation procedure applied to the pre-processed original inputs of the database, following this sequence of steps: downsampling at a rate of one sample every 200 (without crossing the limits of operational cycles), then smoothing with a simple moving average of size 500, so that the samples of this model and of the one submitted for validation are comparable, and finally applying the methodology (see Figure 2) and evaluating performance with the metrics of Section 2.4.
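The baseline pre-processing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the DataFrame layout, and the `unit` column are assumptions; only the downsampling rate (one sample every 200, restricted to each unit) and the moving-average size (500) come from the text.

```python
import numpy as np
import pandas as pd

def preprocess_baseline(df, unit_col="unit", step=200, window=500):
    """Downsample each unit's signals and smooth with a simple moving average.

    Grouping by unit ensures that neither the downsampling stride nor the
    moving-average window crosses operational-cycle/unit boundaries, as the
    text requires. Column names and defaults are illustrative assumptions.
    """
    out = []
    for _, g in df.groupby(unit_col, sort=False):
        # Downsample within the unit (1 sample every `step`).
        g = g.iloc[::step]
        # Simple moving average over the downsampled signals.
        sig = g.drop(columns=[unit_col]).rolling(window, min_periods=1).mean()
        sig[unit_col] = g[unit_col].values
        out.append(sig)
    return pd.concat(out, ignore_index=True)
```

The per-unit grouping is the important design point: a global rolling window would blend samples from the end of one unit's life into the start of the next.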
3.2. Results from Application Example
The MSE loss convergence during the networks’ training progression is shown in
Figure 7 and the progression of useful life estimations over the course of the operation of the units is presented in
Figure 8 and
Figure 9. The time instant
,
x-axis, is normalized in relation to the total life (
) of the motors and is interpreted as a percentage (0–100%) of
or as normalized cycles. The y-axis indicates the predicted RUL at instant
t (also expressed as a percentage of
) and the orange dashed line, the real value of the
(that is,
at that instant. It is noted that the beginning of the forecast differs from the units since it is directly related to the abnormality detection capacity, which is made by a criterion similar to that used by Rosa et al. [
23], wherein there is a difference in a consecutive set of points of the maximum reconstruction error between the samples in NOC.
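A detection criterion of this kind can be sketched as below. This is an interpretation, not the exact rule of Rosa et al. [23]: the threshold (maximum reconstruction error over the NOC samples) follows the text, while the required run length of consecutive exceedances is an illustrative parameter the paper does not specify.

```python
import numpy as np

def first_detection(errors, noc_errors, run_length=5):
    """Return the index where abnormality is first flagged, or None.

    A unit is declared abnormal at the start of the first run of
    `run_length` consecutive reconstruction errors that all exceed the
    maximum error observed on the NOC (normal-operating-condition)
    samples. `run_length` is an assumed, illustrative value.
    """
    threshold = np.max(noc_errors)
    above = np.asarray(errors) > threshold
    count = 0
    for i, flag in enumerate(above):
        count = count + 1 if flag else 0
        if count >= run_length:
            return i - run_length + 1  # start of the consecutive run
    return None
```

Requiring several consecutive exceedances rather than a single one makes the detector robust to isolated noise spikes, which is why the first prediction time differs between units.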
For all the analyzed models, the time of the first prediction (tfpt) occurred after half of the degradation time of the engines. The range from 50% to 65% constitutes a region of instability in the forecasts, in which some remaining-life estimates exceed the value of teol (near 100%) or underestimate it (close to 1%). This is because the deterioration trends are incipient and have a low rate of change, which makes it difficult for the algorithm to decide which of the curves is the most appropriate, as some have a very similar fit condition. After 70% of teol, a stable convergence zone is formed, and the adherence of the projections to the real RUL curve gradually improves up to 100%, which is the desired behavior. Compared to the Baseline, the proposed models advance to a stable condition much earlier (~65% versus ~80%).
Of the three models tested, the Conv-1d showed the best result in terms of early convergence to the actual prognostic result for all units. It can be seen from
Figure 8 that it is the model with the most anticipated first average prediction time of all the units and adheres to the reference line of progression of the RUL in about 75% of the
teol. The MLP model visually manifested a behavior similar to the convolutional one and also presented a zone of forecast instability with high fractional RMSE, but with time-stamping metrics (tfpt, HT(5), and HT(20)) that are later than those of the convolutional model. The LSTM model did not show a concentrated region of large prediction errors like the previous two, but it did show sparse peaks of high errors for two or three cycles in units 2, 16, 15, and 5. Although it may seem that the LSTM provides more stable predictions, gaps in the forecasts may in fact occur, especially in the region of 60–75% teol, in which large-magnitude discrepancies are suppressed by the restriction of the algorithm to disregard RUL estimates if r(t) + t exceeds teol by more than 300 cycles.
For a moving average subsequence of
n = 500, it can be seen that the three autoencoder models outperform the Baseline, which starts to provide consistent forecasts only after 80 normalized cycles have elapsed. An increase in the time window of the moving average could have a positive impact, especially on the base model, as it benefits the most from signal attenuation in regions of instability. However, increasing n reduces the number of samples of each unit available for curve fitting in the prediction algorithm, so that the RUL of some units listed in
Table 3 could not be calculated.
The difference between the forecast and the actual value of the RUL, also expressed as a percentage of teol, is shown in
Figure 10 and
Figure 11. The blue dashed lines indicate 20% error limits in
Figure 10 and 5% error limits in
Figure 11, which is taken as a reference for calculating the prognostic horizon. The proposed method manages to keep the estimates within the error margin of ±20% teol, but it has difficulty meeting the goal of ±5% teol, with only a few units achieving this result even after 80% of the machine's life. There are two possible reasons for this behavior: the first, mentioned above, is the absence of a global tuning of the model, including the neural architecture, which is not at its optimal performance in terms of training with NOC samples; the second is the uncertainty regarding the choice of the error threshold for the prognosis, which can inflate the estimates above what was expected.
The summary of the results obtained for the values of the performance metrics is presented in
Table 4, while
Table A1 (in
Appendix A) presents all the results organized by unit. Both tables show the RMSE; the fractional RMSEs L1, L2, and L3; the time of the first prediction (tfpt); the metric of Equation (6) divided by the total number of estimates; the cumulative relative accuracy; and the prognostic horizons for 5% and 20% errors. It should be noted that L1, L2, and L3 stand for the RMSE fraction computed only over samples inside the first, second, and third thirds of the second half of the normalized teol, respectively.
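The fractional RMSEs can be illustrated with the short sketch below. The function name and interface are assumptions; the bin edges follow the paper's definition of L1, L2, and L3 as the first, second, and third thirds of the second half of the normalized life (i.e., [0.5, 0.667), [0.667, 0.833), and [0.833, 1.0]).

```python
import numpy as np

def fractional_rmse(t_norm, rul_true, rul_pred):
    """RMSE restricted to the three thirds of the second half of life.

    `t_norm` holds prediction instants normalized to [0, 1] by teol.
    Returns a dict {"L1": ..., "L2": ..., "L3": ...}; NaN where a bin
    received no predictions. Illustrative sketch, not the authors' code.
    """
    t = np.asarray(t_norm, dtype=float)
    err = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    edges = [0.5, 0.5 + 1 / 6, 0.5 + 2 / 6, 1.0]
    out = {}
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:]), start=1):
        # Last bin is closed on the right so t = 1.0 (end of life) is included.
        mask = (t >= lo) & (t <= hi) if k == 3 else (t >= lo) & (t < hi)
        out[f"L{k}"] = float(np.sqrt(np.mean(err[mask] ** 2))) if mask.any() else np.nan
    return out
```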
There is no expressive gain in the RMSE of the proposed model when compared to the Baseline (−15.64%,
Table 4) due to the rough projections made at the beginning of the degradation process. When these prognostic samples are disregarded, it is possible to notice a performance gain for this metric, which is expressive from the third third (L3) onward and improves with the proximity of teol, therefore quantitatively corroborating the notion that the proposed models advance to the convergence zone before the Baseline.
An inspection of Table 4 reveals that the global RMSE is lower for the Baseline model. The reasons are that the Baseline produced fewer and later estimates in comparison to the autoencoders, as can be seen in Figure 8 and Figure 9, and, closer to the end of life, the prediction errors tend to be smaller due to the presence of more information about the pronounced degradation. The models start to equate in performance as they approach the stable convergence zone, with only a slight divergence between the L3 values. Although the Baseline also has a lower L3, it should be noticed that it has performed fewer predictions even in that region (see Figure 9 and Figure 11). The prognostic horizon is certainly greater for the autoencoder-based solutions, highlighting the Conv-1d, which has the earliest tfpt, so a correlation with the prognostic horizon was already expected. The difference between tfpt and the prognostic horizon could be interpreted as a latency of the model or as an acceptable error margin.
Moreover, the other error evaluation metric (Equation (6)) differed from the RMSE's outcomes by showing similar values across all models. This fact is justifiable because, even though the models differed significantly in global accuracy, all of them displayed a greater tendency to overestimate predictions, which is penalized by this metric. CRA, in turn, follows the RMSE behavior, as they are almost analogous measurements when a linear weighting (Equation (10)) is taken.
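A cumulative relative accuracy of this kind can be sketched as follows. The exact weighting of the paper's Equation (10) is not reproduced here; the sketch uses the common definition RA(t) = 1 − |RUL_true(t) − RUL_pred(t)| / RUL_true(t) with weights growing linearly toward the end of life, which is an assumption consistent with the "linear weighting" mentioned in the text.

```python
import numpy as np

def cra(rul_true, rul_pred):
    """Cumulative relative accuracy with linear weights (illustrative).

    Computes the relative accuracy RA at each prediction instant and
    aggregates it as a weighted mean, with weights increasing linearly
    so that late-life predictions count more. The normalization choice
    is an assumption; the paper's Equation (10) may differ in detail.
    """
    rt = np.asarray(rul_true, dtype=float)
    rp = np.asarray(rul_pred, dtype=float)
    ra = 1.0 - np.abs(rt - rp) / rt          # relative accuracy per instant
    w = np.arange(1, len(ra) + 1, dtype=float)  # linear weights
    return float(np.sum(w * ra) / np.sum(w))
```

A perfect predictor yields CRA = 1, and overestimation or underestimation reduces it in proportion to the relative error, which is why CRA tracks the RMSE ranking under a linear weighting.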
Generally, the proposed autoencoder models are more stable than the Baseline model, detect abnormalities earlier, and enter the region of stable convergence earlier. They manage to meet the margin-of-error requirement below 20% of teol for at least a fourth of the unit's life but struggle to meet the 5% prognostic-horizon requirement.
Finally, the comparison with the literature is based on the publication by Chao et al. [
32], who also built deep learning models to estimate the RUL on the CMAPSS-2 basis. This comparison aims to verify if the framework exhibits coherent behavior for the predictions over time. It is made by the qualitative inspection of the prediction errors’ progression, see
Figure 10 and
Figure 11, which is also plotted by Chao et al. [
32] for the same three kinds of layers used in this study. Moreover, some performance measurements taken in this work are compared with the results obtained by the cited author. They are the RMSE and prognostic horizon.
The presented models could not overcome the data-oriented arrangements programmed by Chao et al. [
32], nor is this the intention, as they use supervised learning, thus mapping the channel signature throughout the degradation evolution and not only in the NOC. Even if it is not possible to exceed this author in performance, it is important to note the great proximity between the mean squared error values in the stable convergence zone (L3). There is a great similarity between the behavior of the operating-time forecasts plotted by this author and the one shown in this work. Greater uncertainty is also demonstrated at the beginning of the forecasting process, which gradually reduces until
teol. It is observed that the use of a supervised technique allows a
tfpt very close to the beginning of the unit's life and that the supervised method purely derived from the ANN can make inferences almost in real time after being trained, as each new sample is generated (without the computational cost of curve fitting). On the other hand, supervised learning techniques tend to be more specific to the application (the failure mode) and have a shorter lifespan, requiring retraining to adapt to changes in the operating equipment.
Therefore, the advantage of our approach is its capacity to be easily implemented in an industrial context, which has the particularity of an abundance of engineering-system data in NOC with few recorded faults. Furthermore, real scenarios often have lower-quality data labels or unlabeled data. Our framework is designed with this observation in mind, since there is no need to attribute labels or even to discriminate sets of abnormal samples. Another point is that we elaborate a complete framework that embraces detection and prognostic models, while Chao et al. [32] focus only on training models for RUL estimation, without addressing scalability. In the end, the proposed framework is more suitable for use in different industrial domains and has an extensive application range because it does not require physical information or intensive knowledge about the fault's nature and its signature in the sensors' readings.