1. Introduction
With the increasing scale and complexity of modern production, the probability of system faults also increases, and such faults may cause economic losses or even major safety incidents. Monitoring the production process is therefore particularly necessary. Owing to the wide application of distributed control systems (DCS), a large amount of industrial process data has been recorded [1,2], and data-driven process monitoring methods have developed rapidly with the application of computer and artificial intelligence technology. Data-driven monitoring methods extract the characteristics of historical data without detailed modeling of the internal mechanism of the process.
As a traditional data-driven method, multivariate statistical process monitoring (MSPM) has been widely used to monitor stationary processes. MSPM projects the high-dimensional samples onto a low-dimensional subspace and calculates monitoring statistics in that subspace to monitor the process operation status [3].
Among the methods based on multivariate statistics, principal component analysis (PCA) is one of the classical algorithms. The eigenvectors of the covariance matrix of the process data set are calculated to determine the directions of the reduced-dimension projection [4]. Partial least squares (PLS) is similar to PCA. The basic idea of PLS is to construct a small number of composite input and output variables that concentrate the variation contained in the original variables, and then to establish a linear regression model between them [5]. In both methods, the squared prediction error (SPE) and Hotelling's $T^2$ statistic are calculated to achieve process monitoring [6]. However, traditional MSPM usually assumes that the relationships among process variables are linear and that the variables are stationary. The dynamic and nonstationary characteristics of the process, that is, the autocorrelation and time-varying behavior of the variables, are not taken into account. Abnormal deviations at an early stage of a process fault can be buried in these nonstationary trends and cannot be detected in time [7]. Therefore, several new monitoring methods have been proposed to handle these complex process characteristics.
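As a reference for the two statistics named above, one common textbook formulation is sketched below. The notation is an assumption introduced here for illustration and is not taken from this paper: $x$ is a normalized sample with $m$ variables, $P \in \mathbb{R}^{m \times k}$ holds the $k$ retained loading vectors, and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues.

```latex
T^2 = x^{\mathsf T} P \Lambda^{-1} P^{\mathsf T} x,
\qquad
\mathrm{SPE} = \left\lVert \left(I - P P^{\mathsf T}\right) x \right\rVert^2
```

Under this formulation, $T^2$ measures variation inside the retained subspace and SPE measures the residual that the model does not explain; a fault is typically flagged when either statistic exceeds its control limit.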
To deal with the dynamic characteristics of the process, Ku proposed dynamic principal component analysis (DPCA) on the basis of PCA [8]. The original data are augmented with time-lagged variables to reflect the dynamic relationships among variables. However, Rato [9] pointed out that the principal components extracted by DPCA still retain strong autocorrelation, which results in reduced monitoring performance of the $T^2$ and SPE statistics. A common way to deal with a nonstationary process is to difference the nonstationary variables one or more times to obtain stationary variables and then establish the monitoring model. The autoregressive integrated moving average (ARIMA) model was first proposed by Box and Jenkins [10]. Its basic idea is to analyze the autocorrelation and partial autocorrelation functions of the differenced, stationary series, estimate the model parameters, and test the ability of the model to predict future values of the time series. However, the dynamic information of the process is lost after differencing, which makes the monitoring model less effective.
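The two preprocessing ideas in this paragraph, time-lag augmentation and differencing, can be illustrated with a minimal sketch. The function names `lagged_matrix` and `difference` are hypothetical helpers written for this illustration; they are not taken from the cited works.

```python
def lagged_matrix(rows, lags):
    """Augment each sample with its `lags` previous samples (DPCA-style).

    `rows` is a list of samples ordered in time; each sample is a list of
    variable values. The first `lags` samples are dropped because they do
    not have a full history.
    """
    if lags < 0:
        raise ValueError("lags must be non-negative")
    augmented_rows = []
    for t in range(lags, len(rows)):
        augmented = []
        for k in range(lags + 1):
            augmented.extend(rows[t - k])  # x(t), x(t-1), ..., x(t-lags)
        augmented_rows.append(augmented)
    return augmented_rows


def difference(series, order=1):
    """Apply first differencing `order` times to a univariate series."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


samples = [[1, 10], [2, 20], [3, 30]]
print(lagged_matrix(samples, lags=1))
print(difference([1, 4, 9, 16], order=2))
```

Each differencing pass shortens the series by one sample and removes one order of trend, which is why long-term level information is no longer available to a model built on the differenced data.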
Cointegration analysis (CA) was first proposed by Engle to deal with nonstationary economic variables [11]. Hendry proposed the error correction model (ECM) in 1978, through which nonstationary series are converted into stationary series without difference modeling. Granger established the relationship between cointegration and the ECM in 1981 [12]. In 1987, Engle and Granger integrated the vector autoregressive (VAR) model, the ECM, and cointegration theory to form the Granger representation theorem. This theory combines the advantages of short-term and long-term models in time series analysis and provides a better solution for modeling nonstationary time series. Owing to the internal physical and chemical mechanisms of modern industrial processes, long-term dynamic equilibrium relationships exist among the variables, and these can be handled by cointegration theory. Therefore, CA has been widely used in the industrial field in recent years. Chen applied CA to the industrial field for the first time and introduced a reduced-order model diagnosis method to isolate system faults. A simulation example on a fluid catalytic cracking unit (FCCU) showed that cointegration has good prospects for condition monitoring and fault diagnosis of engineering systems [13].
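For orientation, the error correction form that the Granger representation theorem associates with a cointegrated VAR is commonly written as follows. The symbols are assumptions introduced here for illustration: $x_t$ is the vector of nonstationary variables, $\Delta$ is the first difference, $\beta$ is the matrix of cointegration vectors, $\alpha$ holds the adjustment coefficients, $\Gamma_i$ are the short-term coefficient matrices, $p$ is the VAR order, and $\varepsilon_t$ is white noise.

```latex
\Delta x_t = \alpha \beta^{\mathsf T} x_{t-1}
           + \sum_{i=1}^{p-1} \Gamma_i \, \Delta x_{t-i}
           + \varepsilon_t
```

The term $\beta^{\mathsf T} x_{t-1}$ is the stationary long-term equilibrium error, while the differenced terms carry the short-term dynamics.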
Xu showed through examples that the monitoring strategy based on the traditional unit root test is insensitive to some system faults and that the reduced-order cointegration model method has limitations, and proposed using a unit root test with structural breaks together with the Gregory–Hansen cointegration test for system condition monitoring and fault diagnosis. The results showed that the deficiencies of the augmented Dickey–Fuller (ADF) test were remedied and that the variables containing fault information could be determined directly [14]. Yu proposed an adaptive monitoring scheme based on recursive CA to address the problem that, when the cointegration relationship changes, a previously built CA model can no longer accurately reflect the operation status of the nonstationary process. Three monitoring statistics were developed to reflect the operation status of the industrial process, and experimental results on two real industrial processes showed that the adaptive strategy based on recursive CA could effectively adapt to normal process changes without frequent model updating [15]. Wen [16] proposed a new monitoring index containing multiple-order moments to capture different statistical features of the stationary data set. The results showed that this index, built on cointegration analysis, can provide early alarms for abnormal conditions and can effectively distinguish normal changes from abnormalities.
In simple terms, a group of nonstationary time series is linearly cointegrated if some linear combination of the series is stationary. However, the relationships among most industrial process variables are nonlinear, and in that case the traditional linear cointegration model is not sensitive enough to some faults. This nonlinearity should be considered when establishing the monitoring model.
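The definition of linear cointegration can be made concrete with a small synthetic example. The sketch below is an illustration under stated assumptions, not data from this paper: two series are generated from one shared random walk, with the hypothetical coefficient 2 and noise levels chosen arbitrarily, so that the combination `y - 2x` cancels the common trend.

```python
import random
import statistics


def simulate_cointegrated_pair(n=2000, seed=0):
    """Return two nonstationary series driven by one shared random walk."""
    rng = random.Random(seed)
    trend = 0.0
    x, y = [], []
    for _ in range(n):
        trend += rng.gauss(0.0, 1.0)                 # common stochastic trend
        x.append(trend + rng.gauss(0.0, 0.5))        # x_t = trend + noise
        y.append(2.0 * trend + rng.gauss(0.0, 0.5))  # y_t = 2 * trend + noise
    return x, y


x, y = simulate_cointegrated_pair()

# The combination y - 2x cancels the shared trend and leaves only noise.
residual = [yi - 2.0 * xi for xi, yi in zip(x, y)]

var_x = statistics.pvariance(x)
var_residual = statistics.pvariance(residual)
print(f"sample variance of x:      {var_x:.2f}")
print(f"sample variance of y - 2x: {var_residual:.2f}")
```

The sample variance of `x` keeps growing with the length of the record because `x` wanders with the random walk, whereas the variance of the combination stays bounded. A formal check would apply a unit root test to the residual rather than compare variances.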
Alternating conditional expectation (ACE) was proposed by Breiman [17] and improved by Xue [18] by replacing the conditional expectation calculation on finite data sets with a data smoothing technique called the supersmoother. The basic principle of ACE is to transform the dependent variable and the independent variables so as to maximize the linear correlation between the transformed dependent variable and the transformed independent variables.
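The alternating principle described above can be sketched for the two-variable case. This is a simplified illustration, not the implementation used in this paper: a plain running mean over neighbouring samples stands in for the supersmoother, and the function names, window size, iteration count, and test data are all assumptions made for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def smooth(target, by, half_window=10):
    """Estimate E[target | by] with a running mean over neighbours in `by` order."""
    order = sorted(range(len(by)), key=by.__getitem__)
    fitted = [0.0] * len(by)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(len(order), rank + half_window + 1)
        window = [target[order[j]] for j in range(lo, hi)]
        fitted[idx] = sum(window) / len(window)
    return fitted


def ace_bivariate(x, y, iterations=20):
    """Alternate between updating phi(x) and theta(y) for a fixed number of passes."""
    theta = standardize(y)
    for _ in range(iterations):
        phi = smooth(theta, by=x)                # phi(x)   <- E[theta(y) | x]
        theta = standardize(smooth(phi, by=y))   # theta(y) <- E[phi(x) | y], rescaled
    phi = smooth(theta, by=x)
    return theta, phi


rng = random.Random(0)
x = [rng.uniform(0.0, 2.0) for _ in range(300)]
y = [math.exp(3.0 * xi + rng.gauss(0.0, 0.1)) for xi in x]  # strongly nonlinear link

theta, phi = ace_bivariate(x, y)
raw_corr = pearson(x, y)
transformed_corr = pearson(theta, phi)
print(f"correlation before transformation: {raw_corr:.3f}")
print(f"correlation after transformation:  {transformed_corr:.3f}")
```

Rescaling `theta` to unit variance on every pass prevents the iteration from collapsing to the trivial all-zero solution. In this toy case the transformed pair should be far closer to linearly related than the raw pair.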
Zhang [19] applied it to answer the following question: if two sequences are nonstationary and not cointegrated, under what conditions does a cointegration relationship exist between their nonlinear transformations? He pointed out that if a cointegration relationship is identified between the original sequences, it is no longer necessary to consider whether one exists between the transformed sequences; in such a case, the theory of linear cointegration performs well. Conversely, if there is no cointegration relationship between the original sequences, the existing linear cointegration theory is no longer applicable. In this case, establishing a structural model of the transformed sequences broadens the application scope of cointegration theory.
Based on the above discussion, a new monitoring strategy is proposed in this paper. The ACE algorithm is first used to transform the historical data so as to maximize the linear correlation between variables; CA is then used to analyze the cointegration relationship among the transformed variables and establish the cointegration model; finally, the monitoring statistics are calculated to monitor the process. The strategy is applied to a simulation case and to a catalytic reforming unit in a petrochemical company. The results show that this method can monitor nonstationary processes, find equilibrium relationships that the traditional cointegration method cannot find, and improve the sensitivity of the monitoring model.
The rest of this paper is organized as follows. The theories and methods used in this paper are introduced in Section 2. The detailed steps of the proposed method are introduced in Section 3. Three cases, including two simulation cases and a real industrial case, are used to verify the effectiveness of the proposed method in Section 4. Finally, the paper is concluded in Section 5.
3. The Proposed Monitoring Strategy
Nonstationary working conditions often exist in actual industrial production. Traditional multivariate statistical methods cannot distinguish the nonstationary trend of the variables from the trend caused by an abnormal process. With CA, the change in the long-term equilibrium relationship among nonstationary variables is monitored instead of the nonstationary variables themselves. However, the traditional linear cointegration model is not sensitive to some faults if there is a nonlinear relationship among certain process variables. The purpose of the ACE algorithm is to find a pair of nonlinear transformation functions that maximize the linear correlation of the transformed sequences. Therefore, establishing the structural model of the transformed sequences broadens the application range of cointegration theory.
A monitoring strategy based on ACE and CA is proposed in this article. The block diagram of the algorithm is shown in Figure 1.
3.1. Offline Modeling
Step 1: Perform the ADF test on the training data to identify the nonstationary variables, which are selected as the modeling variables.
Step 2: Select one of the variables as the target variable, generally one that is prone to abnormal changes, and treat the other variables as independent variables. A group of transformed data for the target variable and the independent variables is calculated through ACE. Then is replaced with the average of . The original variables are fitted to the transformed variables with polynomials to obtain the nonlinear transformation equations.
Step 3: Normalize each transformed variable with the following formula:
$$\tilde{x} = \frac{x - \bar{x}}{s}$$
where $x$ is a transformed variable, $\bar{x}$ is the average of $x$, and $s$ is its standard deviation. The number of cointegration vectors is obtained through the Johansen test, and the cointegration coefficient matrix is obtained from maximum likelihood estimation.
Step 4: Construct the monitoring statistics and control limit so that online data can be monitored.
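The normalization in Step 3 of offline modeling can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_scaler` and `normalize` are hypothetical helper names, the population standard deviation is used (the paper's choice between population and sample estimates is not stated here), and the toy training data are invented. The fitted parameters are returned so that the same values can be reused on online data.

```python
import statistics


def fit_scaler(training_rows):
    """Compute the per-variable mean and standard deviation of the training data."""
    columns = list(zip(*training_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population std: an assumption
    if any(s == 0 for s in stds):
        raise ValueError("a variable with zero variance cannot be normalized")
    return means, stds


def normalize(rows, means, stds):
    """Apply (value - mean) / std to every variable of every sample."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


training = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # toy transformed variables
means, stds = fit_scaler(training)
scaled = normalize(training, means, stds)
print(means, stds)
print(scaled)
```

Keeping the training mean and standard deviation separate from the scaling function makes it explicit that online samples are scaled with the offline parameters rather than with statistics recomputed from new data.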
3.2. Online Monitoring
Step 1: Select the variables determined by the training data as the model input variables.
Step 2: Transform the variables with the nonlinear transformation equations obtained in Step 2 of offline modeling.
Step 3: Normalize the transformed variables with the mean and standard deviation calculated in Step 3 of offline modeling, and project them onto the cointegration coefficient matrix obtained in the same step.
Step 4: Construct the monitoring statistics. When a monitoring statistic exceeds its control limit, the system triggers an alarm.
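The online steps can be sketched as below for a sample that has already been passed through the nonlinear transformation. Everything specific here is an assumption made for illustration: the function names, the cointegration vector, the residual variance, the control limit, and in particular the statistic itself, which is a simple sum of squared equilibrium residuals scaled by their training variances and is not necessarily the statistic used in this paper.

```python
def project(sample, coint_vectors):
    """Equilibrium residuals: one dot product per cointegration vector."""
    return [sum(b * v for b, v in zip(beta, sample)) for beta in coint_vectors]


def monitor(sample, means, stds, coint_vectors, residual_vars, control_limit):
    """Return the monitoring statistic and whether it raises an alarm."""
    # Normalize with the parameters estimated offline.
    normalized = [(v - m) / s for v, m, s in zip(sample, means, stds)]
    # Project onto the cointegration vectors estimated offline.
    residuals = project(normalized, coint_vectors)
    # Assumed statistic: squared residuals scaled by their training variances.
    statistic = sum(r * r / var for r, var in zip(residuals, residual_vars))
    return statistic, statistic > control_limit


# Hypothetical offline results for a two-variable process.
means, stds = [0.0, 0.0], [1.0, 1.0]
coint_vectors = [[1.0, -1.0]]   # the two variables move together in normal operation
residual_vars = [0.5]
control_limit = 6.63            # hypothetical limit

normal_stat, normal_alarm = monitor([1.0, 1.2], means, stds,
                                    coint_vectors, residual_vars, control_limit)
fault_stat, fault_alarm = monitor([1.0, 4.0], means, stds,
                                  coint_vectors, residual_vars, control_limit)
print(normal_stat, normal_alarm)
print(fault_stat, fault_alarm)
```

The first sample respects the equilibrium relationship and stays below the limit, while the second breaks it and raises an alarm, even though neither variable is compared against a fixed operating range.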
5. Conclusions
In this work, a nonstationary process monitoring strategy based on CA and the ACE algorithm is proposed. Traditional CA can extract only linear cointegration relationships between variables. As a nonparametric method, the ACE algorithm depends only on extremely weak distributional assumptions and can produce a variety of nonlinear transformations of the data, so that nonlinear characteristics of different forms can be described [24]. Thus, nonlinear cointegration relationships can be extracted by combining CA with ACE.
For nonlinear and nonstationary industrial data, the nonlinear transformation derived by ACE is first applied to the non-cointegrated series. These transformations converge gradually to an optimal transformation obtained through the nonparametric data smoothing technique, namely the optimal ACE transformation, which is similar to robust optimization [25,26]. If a long-term nonlinear relationship exists between these series, the corresponding long-term equilibrium relationship among the transformed series can be extracted by traditional CA; that is, the transformed series become cointegrated, and the nonlinear and nonstationary characteristics of the data can be extracted.
The proposed strategy is applied to multi-dimensional simulation data and industrial data. The results show that the strategy triggers an alarm in time when a fault occurs, whereas the traditional monitoring strategy based on cointegration theory produces a large number of false alarms, which indicates that the proposed strategy has a wider application range and higher sensitivity.