A Satellite Incipient Fault Detection Method Based on Decomposed Kullback–Leibler Divergence

Zhang, Ge; Yang, Qiong; Li, Guotong; Leng, Jiaxing; Yan, Mubiao

doi:10.3390/e23091194


by Ge Zhang ^1,2, Qiong Yang ^1, Guotong Li ^1,2,3,*, Jiaxing Leng ^1 and Mubiao Yan ^1,2

^1 Innovation Academy for Microsatellites of CAS, Shanghai 201203, China

^2 University of Chinese Academy of Sciences, Beijing 100049, China

^3 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China

^* Author to whom correspondence should be addressed.

Entropy 2021, 23(9), 1194; https://doi.org/10.3390/e23091194

Submission received: 30 July 2021 / Revised: 6 September 2021 / Accepted: 7 September 2021 / Published: 9 September 2021

(This article belongs to the Special Issue Fault Diagnosis Method Based on Information Theoretic: From Theory to Applications)


Abstract: Detection of faults at the incipient stage is critical to improving the availability and continuity of satellite services. The application of a local optimum projection vector and the Kullback–Leibler (KL) divergence can improve the detection rate of incipient faults. However, this suffers from the problem of high time complexity. We propose decomposing the KL divergence in the original optimization model and applying the property of the generalized Rayleigh quotient to reduce time complexity. Additionally, we establish two distribution models for subfunctions $F_{1} (w)$ and $F_{3} (w)$ to detect the slight anomalous behavior of the mean and covariance. The effectiveness of the proposed method was verified through a numerical simulation case and a real satellite fault case. The results demonstrate the advantages of low computational complexity and high sensitivity to incipient faults.

Keywords:

Kullback–Leibler (KL) divergence; fault detection; condition monitoring; incipient fault; generalized Rayleigh quotient (GRQ); optimum projection vector (PV)

1. Introduction

Due to the vigorous development of the space industry, the number of satellites in orbit has increased to meet various needs, such as navigation [1], communication [2], meteorology [3], and earth observation [4]. However, satellites are at risk of anomalies or failure because of high-energy particles in space, electrostatic discharge, and temperature cycling [5,6,7]. Because serious faults may develop from the continuous deterioration of incipient faults [8], timely and accurate detection of incipient faults reserves sufficient processing time for the satellite operation and maintenance system, which is of great significance for guaranteeing the availability and continuity of satellite services [9].

During the past three decades, the problem of satellite fault detection has been studied extensively [10,11,12,13]. In traditional satellite fault detection methods, such as threshold-based methods [14,15] and model-based methods [16,17], the thresholds or the models required for fault detection must be set manually. Therefore, the performance of these methods relies heavily on the experience of experts [18]. In recent years, data-driven fault detection methods have eliminated this heavy dependence on expert experience and become a popular research field [19,20,21,22]. These methods establish normal models based on normal historical satellite data, and then compare the online data with the normal models to assess whether the online data are faulty. However, the methods proposed in the existing literature are mainly applied to serious faults, and very little research and application relates to incipient faults of satellites. The amplitudes of incipient faults are small compared to system signals, usually ranging from 1% to 10% [23], so they are easily masked by normal system variations [24]. Therefore, satellite incipient fault detection is a daunting task [25].

Ji et al. [26] found that the introduction of smoothing technology can improve the detection rate of incipient faults. Jinane et al. [27] proposed an incipient fault detection method based on principal component analysis (PCA) and the KL divergence, but this method only considered the incipient faults in the principal component subspace. Chen et al. [23] proposed an improved method that monitors anomalous behaviors in principal and residual subspaces. Gautam et al. [28] presented a sensor incipient fault detection method based on a Kalman Filter and the KL divergence. Deng et al. [29] combined two-step localized kernel PCA with the KL divergence for nonlinear system incipient fault monitoring. Zhang et al. [30] proposed that the principal components obtained by PCA are not necessarily the optimum projection vector (PV) for detecting incipient faults. Furthermore, the problem of finding the optimum PV was modeled as an optimization model. Using local optimum PV in real time makes the method more sensitive to incipient faults, but it also raises the problem of high computational complexity. For this reason, this paper proposes a new incipient fault detection method with lower computational complexity by decomposing the KL divergence. The main contributions of this work are summarized as follows:

We analyzed the necessity and feasibility of decomposing the KL divergence in the optimization model.
We constructed two distribution models for subfunctions $F_{1} (w)$ and $F_{3} (w)$ .
The effectiveness of the proposed method was verified through a numerical case and a real satellite fault case.

This paper is organized as follows. The generalized Rayleigh quotient (GRQ) and original optimization model are introduced in Section 2. The fault detection method based on the decomposed KL divergence is presented in detail in Section 3. In Section 4, the proposed method is illustrated and analyzed through two cases. Finally, conclusions are given in Section 5.

2. Preliminary

In this section, we introduce the definition and property of the generalized Rayleigh quotient and point out the problem with the original optimization model.

2.1. Generalized Rayleigh Quotient (GRQ)

The GRQ is defined as follows [31]:

R (A, B, x) = \frac{x^{T} A x}{x^{T} B x}

(1)

where $x$ is a non-zero vector, $A$ is a symmetric matrix, and $B$ is a positive definite symmetric matrix. The GRQ has a critical property that the maximum value of $R (A, B, x)$ is equal to the maximum eigenvalue of matrix $B^{- 1} A$ [32]; that is, $R (A, B, x) \leq λ_{\max}$, where $λ_{\max}$ is the maximum eigenvalue of the matrix $B^{- 1} A$. In addition, the optimum vector $x$ which maximizes $R (A, B, x)$ is the eigenvector corresponding to the maximum eigenvalue [32].
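As a minimal sketch of this property (not part of the original paper), the Python snippet below computes the maximum of a GRQ with NumPy. The function names `grq` and `grq_max` are hypothetical, and the Cholesky-based reduction to a symmetric eigenproblem is an assumption made here for numerical convenience; it is one common way to obtain the eigenpairs of $B^{- 1} A$ and is not taken from the source.

```python
import numpy as np

def grq(A, B, x):
    """Generalized Rayleigh quotient R(A, B, x) from Equation (1)."""
    return (x @ A @ x) / (x @ B @ x)

def grq_max(A, B):
    """Return (maximum of R(A, B, x), a unit-norm maximizing vector x).

    A must be symmetric and B symmetric positive definite.
    """
    L = np.linalg.cholesky(B)           # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T               # symmetric, same eigenvalues as B^{-1} A
    vals, vecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    x = Linv.T @ vecs[:, -1]            # map the eigenvector back: x = L^{-T} v
    return vals[-1], x / np.linalg.norm(x)

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.5], [0.5, 1.0]])
lam_max, x_opt = grq_max(A, B)
print(lam_max, grq(A, B, x_opt))        # the two printed values should agree
```

No iteration is involved: a single eigendecomposition yields both the maximum value and the maximizing vector.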

The sum of two GRQs is defined as follows [33]:

R (A_{1}, B_{1}, A_{2}, B_{2}, x) = \frac{x^{T} A_{1} x}{x^{T} B_{1} x} + \frac{x^{T} A_{2} x}{x^{T} B_{2} x}

(2)

where $x$ is a non-zero vector, both $A_{1}$ and $A_{2}$ are symmetric matrices, and both $B_{1}$ and $B_{2}$ are positive definite symmetric matrices.

Because iteration is not required, the maximum value of a single GRQ can be obtained quickly by directly applying the property of the GRQ. By contrast, according to Reference [33], maximizing the sum of two GRQs is an NP-hard problem. Consequently, exact algorithms cannot solve large instances of this problem in practice, and approximate algorithms are necessary.

2.2. Original Optimization Model

Under the assumption that the data obey a multidimensional Gaussian distribution, and using the KL divergence to detect incipient faults, the problem of finding the optimum projection vector (PV) is modeled as follows [30]:

\left\{ \begin{matrix} \min_{w} - h (w) \\ \mathrm{s.t.} \; w^{T} w = 1; \forall w_{i}, - 1 \leq w_{i} \leq 1, i \in [1, m] \end{matrix} \right.

(3)

h (w) = \frac{1}{2} [\frac{w^{T} Σ_{y} w}{w^{T} Σ_{x} w} + \frac{w^{T} Σ_{x} w}{w^{T} Σ_{y} w} + {(Δ μ^{T} w)}^{2} (\frac{1}{w^{T} Σ_{x} w} + \frac{1}{w^{T} Σ_{y} w}) - 2]

(4)

In Equations (3) and (4), $w$ is a PV, and $h (w)$ is the KL divergence of the projections of normal historical data $X$ and online data $Y$. Both the normal historical data and the online data obey $m$-dimensional joint Gaussian distributions, $X \sim N (μ_{x}, Σ_{x})$ and $Y \sim N (μ_{y}, Σ_{y})$ [30]. Let $Δ μ = μ_{y} - μ_{x}$ and $Δ Σ = Σ_{y} - Σ_{x}$; the KL divergence $h (w)$ can be expressed as the sum of two GRQs, as shown in Equation (5):

\begin{array}{ll} h (w) & = \frac{1}{2} [\frac{w^{T} Σ_{y} w}{w^{T} Σ_{x} w} + \frac{w^{T} Σ_{x} w}{w^{T} Σ_{y} w} + {(Δ μ^{T} w)}^{2} (\frac{1}{w^{T} Σ_{x} w} + \frac{1}{w^{T} Σ_{y} w}) - 2] \\ & = \frac{1}{2} [\frac{w^{T} Σ_{y} w}{w^{T} Σ_{x} w} + \frac{w^{T} Δ μ Δ μ^{T} w}{w^{T} Σ_{x} w}] + \frac{1}{2} [\frac{w^{T} Σ_{x} w}{w^{T} Σ_{y} w} + \frac{w^{T} Δ μ Δ μ^{T} w}{w^{T} Σ_{y} w}] - 1 \\ & = \frac{1}{2} [\frac{w^{T} (Σ_{y} + Δ μ Δ μ^{T}) w}{w^{T} Σ_{x} w} + \frac{w^{T} (Σ_{x} + Δ μ Δ μ^{T}) w}{w^{T} Σ_{y} w}] - 1 \\ & = \frac{1}{2} (\frac{w^{T} A_{1} w}{w^{T} B_{1} w} + \frac{w^{T} A_{2} w}{w^{T} B_{2} w}) - 1 \end{array}

(5)

where $A_{1} = Σ_{y} + Δ μ Δ μ^{T}$, $A_{2} = Σ_{x} + Δ μ Δ μ^{T}$, $B_{1} = Σ_{x}$, and $B_{2} = Σ_{y}$. By the properties of covariance matrices, both $Σ_{x}$ and $Σ_{y}$ in Equation (5) are positive semidefinite symmetric matrices. This paper considers only the case in which both matrices are positive definite, so that the condition of the GRQ is satisfied. Because the coefficient 0.5 and the constant −1 do not change the maximizer, Equation (3) can be equivalently expressed as the maximization of the sum of two GRQs:

\left\{ \begin{matrix} \max_{w} \frac{w^{T} A_{1} w}{w^{T} B_{1} w} + \frac{w^{T} A_{2} w}{w^{T} B_{2} w} \\ \mathrm{s.t.} \; w^{T} w = 1; \forall w_{i}, - 1 \leq w_{i} \leq 1, i \in [1, m] \end{matrix} \right.

(6)

As stated in Section 2.1, the optimization problem in Equation (6) is NP-hard. Consequently, the optimization problem in Equation (3) is also NP-hard. In Reference [30], an off-the-shelf optimization tool (the fmincon function in MATLAB) is used to solve the problem. However, this approach obtains only a local optimum PV, rather than the global optimum PV. Additionally, as the number of monitored variables increases, the time cost of the iterations becomes more prominent. Therefore, this study aims to find an approximate algorithm with lower time complexity.

3. Incipient Fault-Detection Method Based on Decomposed KL Divergence

In this section, we propose the idea of decomposing the KL divergence and build two distribution models to detect incipient faults.

3.1. Decomposed KL Divergence

As stated in Section 2.1, the maximum value of a single GRQ can be quickly obtained by applying the property of the GRQ. Therefore, this paper attempts to decompose $h (w)$ to reduce time complexity. Specifically, we decompose $h (w)$ into the sum of multiple GRQs, and then calculate the maximum value and the optimum PV of each GRQ. Following this idea, the KL divergence $h (w)$ can be decomposed into the sum of four GRQs, as expressed in Equations (7)–(11):

h (w) = \frac{1}{2} (F_{1} (w) + F_{2} (w) + F_{3} (w) + F_{4} (w)) - 1

(7)

F_{1} (w) = \frac{w^{T} Σ_{y} w}{w^{T} Σ_{x} w}

(8)

F_{2} (w) = \frac{w^{T} Σ_{x} w}{w^{T} Σ_{y} w}

(9)

F_{3} (w) = \frac{w^{T} Δ μ Δ μ^{T} w}{w^{T} Σ_{x} w}

(10)

F_{4} (w) = \frac{w^{T} Δ μ Δ μ^{T} w}{w^{T} Σ_{y} w}

(11)

where $F_{1} (w)$, $F_{2} (w)$, $F_{3} (w)$, and $F_{4} (w)$ are collectively referred to as the subfunctions of $h (w)$. In Equations (8)–(11), both $Σ_{x}$ and $Σ_{y}$ are positive definite symmetric matrices, so that each subfunction of $h (w)$ satisfies the form of the GRQ. Therefore, we can obtain the maximum value and optimum PV of each subfunction using the property of the GRQ.
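The decomposition in Equations (7)–(11) can be checked numerically. The following Python sketch is an illustration only: the function names and the `random_spd` helper are hypothetical, and the covariance matrices and vectors are random placeholders rather than data from the paper.

```python
import numpy as np

def kl_projected(w, d_mu, cov_x, cov_y):
    """h(w) as written in Equation (4); d_mu stands for mu_y - mu_x."""
    sx = w @ cov_x @ w
    sy = w @ cov_y @ w
    return 0.5 * (sy / sx + sx / sy + (d_mu @ w) ** 2 * (1 / sx + 1 / sy) - 2)

def subfunctions(w, d_mu, cov_x, cov_y):
    """F1, F2, F3, F4 as written in Equations (8)-(11)."""
    sx = w @ cov_x @ w
    sy = w @ cov_y @ w
    num = w @ np.outer(d_mu, d_mu) @ w
    return sy / sx, sx / sy, num / sx, num / sy

def random_spd(rng, m):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    M = rng.standard_normal((m, m))
    return M @ M.T + m * np.eye(m)

rng = np.random.default_rng(0)
m = 5
cov_x, cov_y = random_spd(rng, m), random_spd(rng, m)
d_mu, w = rng.standard_normal(m), rng.standard_normal(m)

lhs = kl_projected(w, d_mu, cov_x, cov_y)
rhs = 0.5 * sum(subfunctions(w, d_mu, cov_x, cov_y)) - 1   # Equation (7)
print(abs(lhs - rhs))   # should be at the level of floating-point rounding
```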

Clearly, the maximization of each subfunction may not maximize the original function. For instance, we can find a PV $w_{1}$ that maximizes $F_{1} (w)$, but $w_{1}$ does not necessarily maximize $h (w)$. In this case, what is the point of decomposing $h (w)$? According to Reference [30], the ultimate goal of maximizing $h (w)$ is to determine the PV $w$ that is most sensitive to the incipient fault; that is, our ultimate goal is to detect the incipient fault. From the aspect of fault detection, although the PV obtained by maximizing a subfunction may not be optimal for the original function, the PV has its own value if it can detect the fault and can be obtained quickly.

Which subfunctions of $h (w)$ are effective and can be solved quickly? After analysis, the two subfunctions $F_{1} (w)$ and $F_{3} (w)$ are selected. According to Equation (8) and the property of the GRQ, the maximum value of $F_{1} (w)$ is the maximum eigenvalue of the matrix $Σ_{x}^{- 1} Σ_{y}$. Furthermore, the optimum PV of $F_{1} (w)$ is the eigenvector corresponding to the maximum eigenvalue. Similarly, according to Equation (10) and the property of the GRQ, the optimum PV of $F_{3} (w)$ is the eigenvector corresponding to the maximum eigenvalue of the matrix $Σ_{x}^{- 1} Δ μ Δ μ^{T}$. Let the optimum PVs of $F_{1} (w)$ and $F_{3} (w)$ be $w_{F 1}$ and $w_{F 3}$, respectively.
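A hedged sketch of how the two optimum PVs might be computed in Python is given below. The function name `optimum_pvs` is hypothetical. For $w_{F 3}$ the sketch uses a shortcut that is not stated in the source: because $Δ μ Δ μ^{T}$ has rank one, the only non-zero eigenvalue of $Σ_{x}^{- 1} Δ μ Δ μ^{T}$ is $Δ μ^{T} Σ_{x}^{- 1} Δ μ$, with eigenvector $Σ_{x}^{- 1} Δ μ$, so a linear solve replaces the eigendecomposition.

```python
import numpy as np

def _unit(v):
    """Scale v to unit length (left unchanged if it is the zero vector)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def optimum_pvs(d_mu, cov_x, cov_y):
    """Return (w_F1, max F1, w_F3, max F3) for d_mu = mu_y - mu_x.

    cov_x must be symmetric positive definite.
    """
    # w_F1: leading eigenvector of cov_x^{-1} cov_y, via a symmetric reduction.
    Linv = np.linalg.inv(np.linalg.cholesky(cov_x))    # cov_x = L L^T
    vals, vecs = np.linalg.eigh(Linv @ cov_y @ Linv.T)
    w_f1 = _unit(Linv.T @ vecs[:, -1])
    # w_F3: rank-one numerator, so the maximizer is cov_x^{-1} d_mu (assumed shortcut).
    w_f3 = np.linalg.solve(cov_x, d_mu)
    f3_max = d_mu @ w_f3                               # d_mu^T cov_x^{-1} d_mu
    return w_f1, vals[-1], _unit(w_f3), f3_max
```

Both vectors are returned with unit norm, which matches the constraint $w^{T} w = 1$ in Equation (3); the value of a GRQ does not depend on the scale of $w$.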

3.2. Construction of Fault Detection Models

The optimum PVs $w_{F 1}$ and $w_{F 3}$ only provide two optimal perspectives of observation, from which the online data and the normal historical data can be most easily distinguished. We still lack measurement indices to test whether a fault has occurred in the online data $Y$. This section uses $F_{1} (w)$ and $F_{3} (w)$ as the deviation measurement indices. Due to noise, both $F_{1} (w)$ and $F_{3} (w)$ fluctuate within their normal ranges when there is no fault in $Y$. However, $F_{1} (w)$ or $F_{3} (w)$ falls outside its normal range when a fault occurs in $Y$.

The normal ranges of $F_{1} (w)$ and $F_{3} (w)$ are the key to fault detection. To obtain them, we assume that the normal historical data $X$ and the online data $Y$ obey two $m$-dimensional joint Gaussian distributions, $X \sim N (μ_{x}, Σ_{x})$ and $Y \sim N (μ_{y}, Σ_{y})$, respectively. Denote the projections of $X$ and $Y$ onto the vector $w_{F 1}$ as $p_{F 1}$ and $q_{F 1}$, respectively. According to the property of the $m$-dimensional joint Gaussian distribution, $p_{F 1}$ and $q_{F 1}$ obey the one-dimensional Gaussian distributions $p_{F 1} \sim N (w_{F 1}^{T} μ_{x}, w_{F 1}^{T} Σ_{x} w_{F 1})$ and $q_{F 1} \sim N (w_{F 1}^{T} μ_{y}, w_{F 1}^{T} Σ_{y} w_{F 1})$, respectively [34]. The relationship among $F_{1} (w)$, $w_{F 1}$, $Σ_{x}$, and $Σ_{y}$ is presented in Equation (12):

F_{1} (w) = \frac{w_{F 1}^{T} Σ_{y} w_{F 1}}{w_{F 1}^{T} Σ_{x} w_{F 1}}

(12)

Denote the projections of $X$ and $Y$ onto the vector $w_{F 3}$ as $p_{F 3}$ and $q_{F 3}$, respectively. Similarly, according to the property of the $m$-dimensional joint Gaussian distribution, $p_{F 3}$ and $q_{F 3}$ obey the one-dimensional Gaussian distributions $p_{F 3} \sim N (w_{F 3}^{T} μ_{x}, w_{F 3}^{T} Σ_{x} w_{F 3})$ and $q_{F 3} \sim N (w_{F 3}^{T} μ_{y}, w_{F 3}^{T} Σ_{y} w_{F 3})$, respectively [34]. The relationship among $F_{3} (w)$, $w_{F 3}$, $Σ_{x}$, and $Δ μ$ is presented in Equation (13):

F_{3} (w) = \frac{w_{F 3}^{T} Δ μ Δ μ^{T} w_{F 3}}{w_{F 3}^{T} Σ_{x} w_{F 3}}

(13)

Because the normal historical data $X$ are obtained before fault detection, and the optimum PVs $w_{F 1}$ and $w_{F 3}$ are obtainable from Section 3.1, it can be considered that $Σ_{x}$, $μ_{x}$, $w_{F 1}$, and $w_{F 3}$ in Equations (12) and (13) are known and invariable. By contrast, the mean offset vector $Δ μ$ and the covariance matrix $Σ_{y}$ related to $Y$ are unknown and variable. Because $Σ_{x}$, $w_{F 1}$, and $w_{F 3}$ are known, we can write $w_{F 1}^{T} Σ_{x} w_{F 1} = c_{F 1}$ and $w_{F 3}^{T} Σ_{x} w_{F 3} = c_{F 3}$, where both $c_{F 1}$ and $c_{F 3}$ are constants. Hence, Equations (14) and (15) can be obtained:

F_{1} (w) = \frac{w_{F 1}^{T} Σ_{y} w_{F 1}}{c_{F 1}}

(14)

F_{3} (w) = \frac{w_{F 3}^{T} Δ μ Δ μ^{T} w_{F 3}}{c_{F 3}}

(15)

To obtain the normal ranges of $F_{1} (w)$ and $F_{3} (w)$, it is supposed that the fault-free online data $Y$ are obtained by sampling the joint Gaussian distribution obeyed by $X$. Because $p_{F 1}$ and $q_{F 1}$ are the projections of $X$ and $Y$ onto the vector $w_{F 1}$, respectively, we can consider that $q_{F 1}$ is obtained by sampling the one-dimensional Gaussian distribution obeyed by $p_{F 1}$. Similarly, we can consider that $q_{F 3}$ is obtained by sampling the one-dimensional Gaussian distribution obeyed by $p_{F 3}$.

Assume that $f$ obeys a one-dimensional Gaussian distribution $N (μ, σ^{2})$. Let $g$ denote the sample set of $f$, $\bar{μ}$ denote the sample mean of $g$, $S^{2}$ denote the sample variance of $g$, and $n_{1}$ denote the number of samples in $g$. Thus, $\bar{μ}$ satisfies [35]:

\bar{μ} \sim N (μ, \frac{σ^{2}}{n_{1}})

(16)

$S^{2}$ satisfies [35]:

\frac{(n_{1} - 1) S^{2}}{σ^{2}} \sim χ^{2} (n_{1} - 1)

(17)

Let $f = p_{F 1}$ and $g = q_{F 1}$; then the variances of $p_{F 1}$ and $q_{F 1}$ are substituted into Equation (17). We can obtain:

\frac{(n_{1} - 1) w_{F 1}^{T} Σ_{y} w_{F 1}}{w_{F 1}^{T} Σ_{x} w_{F 1}} \sim χ^{2} (n_{1} - 1)

(18)

Because $w_{F 1}^{T} Σ_{x} w_{F 1} = c_{F 1}$, we can obtain:

\frac{(n_{1} - 1) w_{F 1}^{T} Σ_{y} w_{F 1}}{c_{F 1}} \sim χ^{2} (n_{1} - 1)

(19)

Comparing Equation (14) with Equation (19), we can obtain:

(n_{1} - 1) F_{1} (w) \sim χ^{2} (n_{1} - 1)

(20)

Therefore, the subfunction $F_{1} (w)$ multiplied by the constant $n_{1} - 1$ obeys a chi-square distribution with $n_{1} - 1$ degrees of freedom when there is no fault in $Y$.

Let $f = p_{F 3}$ and $g = q_{F 3}$; then the mean and variance of $q_{F 3}$ and the mean of $p_{F 3}$ are substituted into Equation (16). We can obtain:

w_{F 3}^{T} μ_{y} \sim N (w_{F 3}^{T} μ_{x}, \frac{w_{F 3}^{T} Σ_{x} w_{F 3}}{n_{1}})

(21)

Because $μ_{x}$, $Σ_{x}$, and $w_{F 3}$ are all known, we can write $w_{F 3}^{T} μ_{x} = c_{3}$, where $c_{3}$ is a constant. According to the property of the one-dimensional Gaussian distribution, $w_{F 3}^{T} μ_{y} - c_{3}$ still obeys a one-dimensional Gaussian distribution, as shown in Equation (22):

w_{F 3}^{T} μ_{y} - w_{F 3}^{T} μ_{x} = w_{F 3}^{T} μ_{y} - c_{3} \sim N (0, \frac{w_{F 3}^{T} Σ_{x} w_{F 3}}{n_{1}})

(22)

Since $Δ μ = μ_{y} - μ_{x}$ and $w_{F 3}^{T} Σ_{x} w_{F 3} = c_{F 3}$, we can obtain:

w_{F 3}^{T} Δ μ_{k} \sim N (0, \frac{c_{F 3}}{n_{1}})

(23)

Normalizing $w_{F 3}^{T} Δ μ_{k}$, we can obtain:

\sqrt{\frac{n_{1}}{c_{F 3}}} w_{F 3}^{T} Δ μ_{k} \sim N (0, 1)

(24)

Furthermore, we can obtain Equation (25) from the relationship between the standard normal distribution and the chi-square distribution:

\frac{n_{1} w_{F 3}^{T} Δ μ_{k} Δ μ_{k}^{T} w_{F 3}}{c_{F 3}} \sim χ^{2} (1)

(25)

Comparing Equation (15) with Equation (25), we can obtain:

n_{1} F_{3} (w) \sim χ^{2} (1)

(26)

Therefore, the subfunction $F_{3} (w)$ multiplied by the constant $n_{1}$ obeys a chi-square distribution with one degree of freedom when there is no fault in $Y$.
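The sampling results in Equations (16) and (17), on which Equations (20) and (26) rest, can be illustrated with a small Monte Carlo experiment. The Python sketch below is an assumption-laden illustration, not the authors' code: it draws fault-free one-dimensional windows for a fixed projection direction, and the function name, the number of trials, and the seed are all hypothetical choices.

```python
import numpy as np

def simulate_fault_free_statistics(n1=300, trials=5000, mu=0.0, sigma=1.0, seed=0):
    """Draw `trials` fault-free windows of length n1 from N(mu, sigma^2).

    Returns, for every window,
      var_stat  = (n1 - 1) * S^2 / sigma^2               (left side of Equation (17))
      mean_stat = n1 * (sample mean - mu)^2 / sigma^2    (squared, normalized Equation (16))
    """
    rng = np.random.default_rng(seed)
    q = rng.normal(mu, sigma, size=(trials, n1))
    var_stat = (n1 - 1) * q.var(axis=1, ddof=1) / sigma**2
    mean_stat = n1 * (q.mean(axis=1) - mu) ** 2 / sigma**2
    return var_stat, mean_stat

var_stat, mean_stat = simulate_fault_free_statistics()
# A chi-square variable with k degrees of freedom has mean k and variance 2k,
# so these empirical moments can be compared with k = n1 - 1 and k = 1.
print(var_stat.mean(), var_stat.var())
print(mean_stat.mean())
```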

In summary, $(n_{1} - 1) F_{1} (w)$ and $n_{1} F_{3} (w)$ obey chi-square distributions with $n_{1} - 1$ and one degree of freedom, respectively. Thus, the chi-square test is applicable to verify whether a fault occurs in $Y$. Given a significance level $α$, the fault detection thresholds of $(n_{1} - 1) F_{1} (w)$ and $n_{1} F_{3} (w)$ are obtainable from the chi-square test. Denote the fault detection thresholds of $(n_{1} - 1) F_{1} (w)$ and $n_{1} F_{3} (w)$ as $ε_{F 1}$ and $ε_{F 3}$, respectively. In this case, two fault detection models are established as follows:

\left\{ \begin{matrix} H_{0} : F_{1} (w) \leq \frac{ε_{F 1}}{n_{1} - 1}, & \text{fault-free} \\ H_{1} : F_{1} (w) > \frac{ε_{F 1}}{n_{1} - 1}, & \text{faulty} \end{matrix} \right.

(27)

\left\{ \begin{matrix} H_{0} : F_{3} (w) \leq \frac{ε_{F 3}}{n_{1}}, & \text{fault-free} \\ H_{1} : F_{3} (w) > \frac{ε_{F 3}}{n_{1}}, & \text{faulty} \end{matrix} \right.

(28)
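As a rough illustration of how the thresholds in Equations (27) and (28) might be computed, the Python sketch below uses only the standard library. All function names are hypothetical. The one-degree-of-freedom quantile is exact, because a chi-square variable with one degree of freedom is the square of a standard normal variable; for $n_{1} - 1$ degrees of freedom the sketch substitutes the Wilson–Hilferty approximation, which is an assumption made here for self-containment and is not the procedure described in the source. A statistics library with an exact chi-square quantile function could be used instead.

```python
from statistics import NormalDist

def chi2_quantile_1df(p):
    """Exact p-quantile of the chi-square distribution with 1 degree of freedom."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

def chi2_quantile_wh(p, k):
    """Approximate p-quantile for k degrees of freedom (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * c ** 0.5) ** 3

def detection_thresholds(n1, alpha_f1, alpha_f3):
    """Return the right-hand sides of Equations (27) and (28)."""
    eps_f1 = chi2_quantile_wh(1.0 - alpha_f1, n1 - 1)   # approximate eps_F1
    eps_f3 = chi2_quantile_1df(1.0 - alpha_f3)          # exact eps_F3
    return eps_f1 / (n1 - 1), eps_f3 / n1

def is_faulty(f1, f3, thr_f1, thr_f3):
    """H1 is accepted if either subfunction exceeds its threshold."""
    return f1 > thr_f1 or f3 > thr_f3

thr_f1, thr_f3 = detection_thresholds(n1=300, alpha_f1=0.0005, alpha_f3=0.001)
print(thr_f1, thr_f3)
```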

The reason for selecting the subfunctions $F_{1} (w)$ and $F_{3} (w)$ is the coverage of detectable faults. It can be seen from Equation (4) that $h (w)$ is a function of $w$, $Σ_{x}$, $Σ_{y}$, and $Δ μ$. Because the normal historical data $X$ and the PV $w$ are determined, both $w$ and $Σ_{x}$ are known, whereas $Δ μ$ and $Δ Σ$, which are related to the online data, are unknown. Thus, $h (w)$ is a function of $Δ μ$ and $Δ Σ$.

Due to noise, both $Δ μ$ and $Δ Σ$ fluctuate within their normal ranges. However, $Δ μ$ or $Δ Σ$ falls outside its acceptable range when the online data are faulty. Because $h (w)$ is a function of $Δ μ$ and $Δ Σ$, an abnormal change in $Δ μ$ or $Δ Σ$ pushes $h (w)$ outside its acceptable range. Therefore, an abnormal change in $Δ μ$ or $Δ Σ$ can be detected by $h (w)$. It can be seen from Equation (8) and $Δ Σ = Σ_{y} - Σ_{x}$ that $F_{1} (w)$ is a function of $Δ Σ$; thus, a fault caused by an abnormal change in $Δ Σ$ can be detected by $F_{1} (w)$. Similarly, from Equation (10), a fault caused by an abnormal change in $Δ μ$ can be detected by $F_{3} (w)$. Therefore, the combination of $F_{1} (w)$ and $F_{3} (w)$ can cover the majority of faults that can be detected by $h (w)$.

Why are the other two subfunctions $F_{2} (w)$ and $F_{4} (w)$ not chosen to detect faults? Comparing Equation (8) with Equation (9), $F_{1} (w)$ and $F_{2} (w)$ are reciprocals of each other. Therefore, we can detect an abnormal change in $Δ Σ$ by taking either of them. The expressions of $F_{3} (w)$ and $F_{4} (w)$ differ only in the denominator. Experimental verification showed that the fault detection ability of $F_{4} (w)$ is similar to that of $F_{3} (w)$. Thus, only one of $F_{3} (w)$ and $F_{4} (w)$ needs to be selected to detect an abnormal change in $Δ μ$.

3.3. Overall Fault Detection Process

We intend to use sliding windows to extract and monitor the online data in real time. Let the online data extracted by the $k$th sliding window be $Y_{k}$. The pseudocode and the flow chart of the proposed method are shown as follows:

Step 1: Z-score normalization is performed for each parameter of the normal historical data $X$, and $\bar{X}$ is obtained.
Step 2: The online data $Y_{k}$ are extracted by a sliding window with a length of $n_{1}$.
Step 3: The online data $Y_{k}$ are normalized by Z-score to obtain ${\bar{Y}}_{k}$.
Step 4: Two optimum PVs $w_{F 1}$ and $w_{F 3}$ between $\bar{X}$ and ${\bar{Y}}_{k}$ are obtained by using the property of the GRQ, as stated in Section 3.1.
Step 5: Two fault detection thresholds $ε_{F 1}$ and $ε_{F 3}$ are set by using the chi-square test with a significance level $α$.
Step 6: Equations (12) and (13) are used to calculate the actual values $F_{1} (w)$ and $F_{3} (w)$ of $\bar{X}$ and ${\bar{Y}}_{k}$.
Step 7: The potential existence of a fault in $Y_{k}$ is tested according to Equations (27) and (28). If at least one of the two fault detection models detects a fault, the online data $Y_{k}$ are considered faulty. Otherwise, $Y_{k}$ is normal. Let $k = k + 1$; the online data of the next sliding window $Y_{k}$ are tested by repeating steps 2 to 7.
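The seven steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the paper reports MATLAB experiments). The function names are hypothetical. The sketch assumes non-overlapping windows, assumes that $Y_{k}$ is normalized with the mean and standard deviation of $X$ (the source does not say which statistics are used), computes the maximum of $F_{3} (w)$ with the closed form $Δ μ^{T} Σ_{x}^{- 1} Δ μ$, and approximates the chi-square quantile for $n_{1} - 1$ degrees of freedom with the Wilson–Hilferty formula.

```python
import numpy as np
from statistics import NormalDist

def _chi2_q(p, k):
    """Chi-square quantile: exact for k = 1, Wilson-Hilferty approximation otherwise."""
    if k == 1:
        return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + NormalDist().inv_cdf(p) * c ** 0.5) ** 3

def detect(X, Y, n1=300, alpha_f1=0.0005, alpha_f3=0.001):
    """Return one boolean flag per non-overlapping window of Y (True = faulty)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xn = (X - mean) / std                                # step 1
    mu_x, cov_x = Xn.mean(axis=0), np.cov(Xn, rowvar=False)
    Linv = np.linalg.inv(np.linalg.cholesky(cov_x))      # cov_x = L L^T
    thr_f1 = _chi2_q(1.0 - alpha_f1, n1 - 1) / (n1 - 1)  # step 5, Equation (27)
    thr_f3 = _chi2_q(1.0 - alpha_f3, 1) / n1             # step 5, Equation (28)
    flags = []
    for start in range(0, len(Y) - n1 + 1, n1):          # step 2
        Yn = (Y[start:start + n1] - mean) / std          # step 3 (assumed: X statistics)
        d_mu = Yn.mean(axis=0) - mu_x
        cov_y = np.cov(Yn, rowvar=False)
        # Steps 4 and 6: the maxima of F1 and F3 are their values at w_F1 and w_F3.
        f1 = np.linalg.eigvalsh(Linv @ cov_y @ Linv.T)[-1]
        f3 = d_mu @ np.linalg.solve(cov_x, d_mu)
        flags.append(f1 > thr_f1 or f3 > thr_f3)         # step 7
    return np.array(flags)
```

A possible usage, with synthetic data and a deliberately large offset so that the effect is easy to see:

```python
rng = np.random.default_rng(1)
X = rng.normal(size=(6000, 5))
Y = rng.normal(size=(6000, 5))
Y[3000:, 0] += 1.0            # hypothetical offset fault on the first variable
print(detect(X, Y))
```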

As can be seen from Figure 1, for each sliding window $Y_{k}$, we can use the property of the GRQ to obtain the optimum PVs $w_{F 1}$ and $w_{F 3}$ between $X$ and $Y_{k}$. Because the online data $Y_{k}$ may vary from window to window, $w_{F 1}$ and $w_{F 3}$ may not be the same for each window; that is, the optimum PVs adapt to the online data in real time, which makes the proposed method more adaptable to potential faults.

We suppose that the system model includes $n$ monitored variables and the length of the sliding windows is $n_{1}$. The computation cost of Z-score normalization for $Y_{k}$ is $O (n n_{1})$. The computation costs of obtaining the mean vector and the covariance matrix of ${\bar{Y}}_{k}$ are $O (n_{1})$ and $O (n^{2} n_{1})$, respectively. The computation cost of obtaining the inverse matrix of $Σ_{x}$ is $O (n^{3})$. The computation cost of obtaining $Σ_{x}^{- 1} Σ_{y}$ is $O (n^{3})$. Similarly, the computation cost of obtaining $Σ_{x}^{- 1} Δ μ Δ μ^{T}$ is $O (n^{3})$. The computation cost of obtaining both the maximum eigenvalue and the eigenvector of $Σ_{x}^{- 1} Σ_{y}$ and $Σ_{x}^{- 1} Δ μ Δ μ^{T}$ is $O (n^{3})$. Combining all the computation cost parts above, we can get the overall computation cost of obtaining two optimum projection vectors for each window as $O (n^{3})$.

4. Results and Analysis

In this section, we use a numerical case and a real satellite fault case to assess the effectiveness of the proposed method.

4.1. Numerical Case

In this subsection, a numerical simulation case, which includes three incipient faults, is provided to verify the correctness and effectiveness of the proposed method. The system model is as shown in Equation (29):

\begin{array}{l} x_{1} = s_{1} + s_{2} + f_{1} + e_{1} \\ x_{2} = s_{1} - s_{5} + e_{2} \\ x_{3} = (1 + f_{3}) (s_{2} - s_{3}) + e_{3} \\ x_{4} = s_{1} - (1 + f_{2}) s_{4} + e_{4} \\ x_{5} = s_{1} + s_{3} + (1 + f_{2}) s_{4} + e_{5} \end{array}

(29)

In Equation (29), ${[x_{1}, x_{2}, x_{3}, x_{4}, x_{5}]}^{T}$ are five monitored variables, ${[s_{1}, s_{2}, s_{3}, s_{4}, s_{5}]}^{T}$ are five signal sources, ${[e_{1}, e_{2}, e_{3}, e_{4}, e_{5}]}^{T}$ are five noise sources, and ${[f_{1}, f_{2}, f_{3}]}^{T}$ are three incipient fault sources. All the signal sources and the noise sources are independent of each other and obey the standard normal distribution $N (0, 1)$.
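For readers who want to experiment with the system model in Equation (29), a minimal Python generator is sketched below. The function name `simulate` and the `noise_scale` parameter are hypothetical: the source states that the noise sources are standard normal and that a signal-to-noise ratio is set, but it does not say how the noise is scaled, so `noise_scale` is only an assumed knob and its default value is arbitrary.

```python
import numpy as np

def simulate(n, f=(0.0, 0.0, 0.0), noise_scale=0.1, seed=0):
    """Generate n samples of [x1, ..., x5] following Equation (29).

    f = (f1, f2, f3) are the fault sources; noise_scale multiplies the
    standard normal noise sources (an assumption, see the text above).
    """
    rng = np.random.default_rng(seed)
    s1, s2, s3, s4, s5 = rng.standard_normal((n, 5)).T   # signal sources
    e = noise_scale * rng.standard_normal((n, 5))        # noise sources
    f1, f2, f3 = f
    x = np.column_stack([
        s1 + s2 + f1,                # x1: offset fault f1
        s1 - s5,                     # x2: not affected by any fault
        (1 + f3) * (s2 - s3),        # x3: gain fault f3
        s1 - (1 + f2) * s4,          # x4: gain fault f2
        s1 + s3 + (1 + f2) * s4,     # x5: gain fault f2
    ])
    return x + e

normal = simulate(300)
faulty = simulate(300, f=(0.09, 0.0, 0.0))   # only f1 active
print((faulty - normal).mean(axis=0))        # only the first entry should be non-zero
```

Because both calls share the same seed, the two data sets differ only through the injected fault, which makes the effect of each fault source easy to isolate.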

The experimental parameters of the numerical case were set as follows. There were 60,000 normal historical samples and 60,000 online samples. The values of the fault sources before and after injecting faults were ${[0, 0, 0]}^{T}$ and ${[0.09, 0.20, 0.09]}^{T}$, respectively. All the incipient faults were injected at sample 30,001 and did not occur simultaneously. The fault types of $f_{1}$, $f_{2}$, and $f_{3}$ were offset fault, gain fault, and gain fault, respectively. Both the length and the interval of the sliding windows were 300 samples for all data in the experiment. A total of 200 windows were obtained from the online data after applying the sliding windows. The first 100 of the 200 windows were normal windows, whereas the last 100 were fault windows. The default signal-to-noise ratio (SNR) was set to 20 dB [30]. The simulation hardware platform was a desktop computer (CPU: Intel Core i5-10400, RAM: DDR4/2666/16G) and the software was MATLAB 2019b.

The compared fault-detection methods included PCA with the $T^{2}$ statistic [36] (PCA + $T^{2}$), PCA with the squared prediction error statistic [36] (PCA + SPE), PCA with the KL divergence [23] (PCA + KLD), and the method based on the local optimum PV and the KL divergence [30] (LOPVKLD). Because directly monitoring the original variables performs poorly, the PCA + $T^{2}$ and PCA + SPE methods in this experiment monitored the means and variances of the original variables. The principal subspace was selected with a cumulative variance contribution of more than 90%. The confidence levels for the PCA + $T^{2}$ method and the PCA + SPE method were both set at 0.95. The significance levels for the PCA + KLD method and the LOPVKLD method were 0.05 and 0.01, respectively. The significance levels of the subfunctions $F_{1} (w)$ and $F_{3} (w)$ proposed in this paper were 0.0005 and 0.001, respectively. Three indexes were chosen for evaluating the fault detection results: fault detection rate (FDR), false alarm rate (FAR), and the time consumption of finding the optimum PV for each window (time consumption). For conciseness, for the PCA + KLD method only the result of the principal component most sensitive to the fault is presented; the other, relatively poor results are not displayed.

The detection results of the five fault-detection methods for the incipient fault $f_{1}$ are shown in Figure 2. As can be seen from Figure 2, both the PCA + $T^{2}$ method and the PCA + SPE method failed to detect $f_{1}$ because most of the fault windows were still within the detection threshold. Conversely, both the PCA + KLD method and the LOPVKLD method successfully detected $f_{1}$. As stated in Section 3.2, the subfunctions $F_{1} (w)$ and $F_{3} (w)$ can detect faults that cause an abnormal change in $Δ Σ$ and $Δ μ$, respectively. Because $f_{1}$ is an offset fault that can cause an abnormal change in $Δ μ$, the fault $f_{1}$ can be successfully detected by the subfunction $F_{3} (w)$ rather than the subfunction $F_{1} (w)$.

The detection results of the five fault-detection methods for the incipient fault $f_{2}$ are presented in Figure 3. As shown, the PCA + SPE method still fails to detect $f_{2}$. Both the PCA + $T^{2}$ method and the PCA + KLD method have relatively poor detection results for $f_{2}$. Due to the application of the local optimum PV, the LOPVKLD method has a better detection result for $f_{2}$. Because $f_{2}$ is a gain fault that can cause an abnormal change in $Δ Σ$, $f_{2}$ can be successfully detected by the subfunction $F_{1} (w)$ rather than the subfunction $F_{3} (w)$.

As can be seen from Figure 4, three fault-detection methods (PCA + $T^{2}$, PCA + SPE, and PCA + KLD) are ineffective in detecting the fault $f_{3}$, because most of the result values of these methods are still under the detection threshold. It can be seen from Figure 4d,f that the LOPVKLD method and the subfunction $F_{1} (w)$ are effective at detecting $f_{3}$. As $f_{3}$ is a gain fault, the subfunction $F_{3} (w)$ fails to detect $f_{3}$.

Considering the randomness of the signal sources and the noise sources in the numerical case, we simulated the three incipient faults 100 times and then derived the average of the fault detection results, as presented in Table 1.

It can be seen from Reference [30] that the PCA + $T^{2}$ and the PCA + SPE methods are ineffective in detecting incipient faults when the original variables are monitored. As can be seen from Table 1, the fault detection rates of these two methods increase, particularly the fault detection rate for $f_{2}$. The reason for the improvement in these two methods is that the extraction of the means and variances of the variables can be considered as smoothing the variables. Although the means and variances of the variables are monitored, the detection results of these two methods are inferior to those of the PCA + KLD method. Due to the usage of constant PVs, the PCA + KLD method is effective at detecting $f_{1}$ and $f_{2}$, but has poor detection results for $f_{3}$.

Because it applies the local optimum PV, the LOPVKLD method is sensitive to all three incipient faults. However, as stated in Section 2.2, the LOPVKLD method has the disadvantage of high computational complexity. As Table 1 shows, the LOPVKLD method needs about 70 ms to obtain the optimum PV. By contrast, obtaining the optimum PV for each subfunction takes less than 25 μs, roughly three orders of magnitude faster than the LOPVKLD method. Because they do not need to find an optimum PV, the PCA + $T^{2}$, PCA + SPE, and PCA + KLD methods have lower computational complexity than the proposed method. However, their detection results are not as good as those of the proposed method, particularly for $f_{3}$. Because the subfunctions $F_{1}(w)$ and $F_{3}(w)$ detect faults caused by abnormal changes in $\Delta\Sigma$ and $\Delta\mu$, respectively, the three faults $f_{1}$, $f_{2}$, and $f_{3}$ are successfully detected by $F_{3}(w)$, $F_{1}(w)$, and $F_{1}(w)$, respectively.

The sensitivity of the proposed method to incipient faults can be explained from the perspective of the optimum PV. The projection can be regarded as a weighted sum, as presented in Equation (30):

w^{T} X = w_{1} x_{1} + w_{2} x_{2} + \dots + w_{5} x_{5}

(30)

In Equation (30), $w$ is an optimum PV, which can be regarded as a vector of weight coefficients, and $X$ is the vector of the five monitored variables. For presentation purposes, all the optimum PVs in the numerical case were normalized (the modulus of each vector was set to 1) and their absolute values were taken. The optimum PVs obtained using the LOPVKLD method, the subfunction $F_{1}(w)$, and the subfunction $F_{3}(w)$ before and after insertion of the faults $f_{1}$ and $f_{3}$ are shown in Figure 5a–f, respectively. In each subfigure of Figure 5, the first 100 windows are normal windows, whereas the last 100 windows are fault windows.
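The weighted-sum view of Equation (30) and the normalization used for plotting the PVs can be illustrated with a minimal sketch. The function names `normalize_abs` and `project` and all numeric values are hypothetical; they are not taken from the paper.

```python
import math

def normalize_abs(w):
    """Scale a projection vector to unit modulus, then take absolute values."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("projection vector must be non-zero")
    return [abs(v) / norm for v in w]

def project(w, x):
    """Weighted sum w^T x of one sample of the monitored variables."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))

# Made-up PV and sample with five monitored variables.
w = [3.0, 0.0, -4.0, 0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

w_plot = normalize_abs(w)  # weights as they would be displayed
y = project(w, x)          # 3*1 + (-4)*3 = -9.0
```

A larger entry in `w_plot` means the corresponding variable contributes more to the projected signal, which is the sense in which a PV "enlarges" a faulty variable.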

Enlarging the weights of the faulty variables makes the fault easier to expose and improves the detection ability. It can be seen from Equation (29) that the fault $f_{1}$ was added to the variable $x_{1}$. As Figure 5a–c shows, both the LOPVKLD method and the subfunction $F_{3}(w)$ enlarged the weight of the faulty variable $x_{1}$ after the fault $f_{1}$ occurred. As shown in Figure 5d–f, because the faulty variable of the fault $f_{3}$ is $x_{3}$, both the LOPVKLD method and the subfunction $F_{1}(w)$ enlarged the weight of the faulty variable $x_{3}$ after the fault $f_{3}$ occurred. In addition, because no iteration is needed, the computational complexity of the proposed method is lower than that of the LOPVKLD method. In summary, the proposed method not only retains the high sensitivity to incipient faults but also alleviates the disadvantage of high computational complexity.
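To illustrate why no iteration is needed, the sketch below maximizes a generic generalized Rayleigh quotient $w^{T} A w / w^{T} B w$ in closed form through a symmetric eigendecomposition. It is only a generic illustration: the matrices `A` and `B` are hypothetical placeholders, the matrices that actually appear in $F_{1}(w)$ and $F_{3}(w)$ are defined in Section 3 and are not reproduced here, and NumPy is assumed to be available.

```python
import numpy as np

def grq_maximizer(A, B):
    """Maximize (w^T A w) / (w^T B w) for symmetric A and positive-definite B.

    Returns the unit-norm maximizer w and the maximum value of the quotient.
    """
    L = np.linalg.cholesky(B)                       # B = L L^T
    # C = L^{-1} A L^{-T} is symmetric and has the generalized eigenvalues.
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)
    C = 0.5 * (C + C.T)                             # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    v = eigvecs[:, -1]                              # eigenvector of the largest one
    w = np.linalg.solve(L.T, v)                     # map back: w = L^{-T} v
    return w / np.linalg.norm(w), eigvals[-1]

# Hypothetical 2x2 example.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.5], [0.5, 1.0]])
w_opt, q_max = grq_maximizer(A, B)
```

The cost is dominated by one Cholesky factorization and one eigendecomposition of a matrix whose size equals the number of monitored variables, so it does not grow with the number of search steps as an iterative optimizer would.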

4.2. Real Satellite Fault Case

On 16 March 2021, key telemetry parameters of a satellite payload fluctuated abnormally. Figure 6 shows the fluctuation of a telemetry parameter related to the fault. In this case, the fault developed through three stages. In the first stage, the variance of the telemetry parameter increased slightly; this stage lasted around 50 days. As the fault deteriorated further, the mean and variance of the telemetry parameter fluctuated significantly in the second stage, which lasted around 70 days. In the third stage, the mean and variance of the telemetry parameter deviated seriously from the normal fluctuation range. Because the current fault detection system adopts a threshold-based method, it cannot detect the fault until the third stage. If the fault had been detected at the beginning of the first stage, it would have been found about four months earlier. Thus, the research objective of this paper is to detect the incipient fault from the first stage.

In this study, a total of 13,066,123 samples were collected and arranged from the satellite measurement and control system, covering 7:35:34 on 15 November 2020 to 16:27:52 on 16 May 2021. Two telemetry parameters related to the fault were selected, as presented in Figure 7. For confidentiality reasons, the true telemetry parameter names are hidden. The sampling rate of the telemetry data in Figure 7 was 1 Hz. Due to the constraints of the satellite’s visible arc and the ground station measurement and control resources, some telemetry data were not transmitted; that is, the telemetry data were discontinuous in time.

As indicated in Figure 7b, the parameters show a periodicity, and the period is consistent with the satellite orbital period (46,468 s). For this reason, we took the satellite orbital period as the length of the sliding window, set the interval of the sliding window to 10,000, and retained the sliding windows comprising more than 40,000 samples as effective windows. A total of 524 effective windows were obtained from the first 6,246,451 samples. The first 100 effective windows were selected as the normal historical data, and the remaining 424 effective windows were used as the online data for testing. Among the 424 test windows, the first 72 were normal windows, whereas the last 352 were fault windows.
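The window extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: samples are identified by sorted timestamps in seconds, the sliding interval of 10,000 is read as a step in seconds, and the function name `effective_windows` is hypothetical.

```python
from bisect import bisect_left

WINDOW_S = 46_468     # orbital period used as the window length (s)
STEP_S = 10_000       # sliding interval, read here as seconds (assumption)
MIN_SAMPLES = 40_000  # a window is kept only above this sample count

def effective_windows(timestamps, window_s=WINDOW_S, step_s=STEP_S,
                      min_samples=MIN_SAMPLES):
    """Return (lo, hi) index pairs of windows with more than min_samples samples.

    timestamps must be sorted sample times in seconds; gaps are allowed.
    Each window covers the half-open time span [start, start + window_s).
    """
    windows = []
    if not timestamps:
        return windows
    start, last = timestamps[0], timestamps[-1]
    while start + window_s <= last + 1:
        lo = bisect_left(timestamps, start)
        hi = bisect_left(timestamps, start + window_s)
        if hi - lo > min_samples:   # drop windows thinned out by data gaps
            windows.append((lo, hi))
        start += step_s
    return windows

# Small made-up series with a gap between t = 30 and t = 59.
ts = list(range(0, 30)) + list(range(60, 100))
demo = effective_windows(ts, window_s=20, step_s=10, min_samples=15)
```

Counting samples per time span, instead of slicing a fixed number of rows, is one way to cope with telemetry that is discontinuous in time.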

Furthermore, it can be seen from Figure 7b that the telemetry parameters do not obey Gaussian distributions; thus, the fault detection threshold set by the chi-square test may not be appropriate, and the normal historical data must be used to assist in setting the threshold. As stated in Section 3.2, the subfunction $F_{1}(w)$ multiplied by the constant $n_{1} - 1$ obeys a chi-square distribution with $n_{1} - 1$ degrees of freedom. In this case, the length of the sliding window $n_{1}$ was 46,468. The degrees of freedom were sufficiently high that the subfunction $F_{1}(w)$ could be considered to obey a normal distribution; that is, the $3\sigma$ method could be used in this case to test whether there is a fault in $F_{1}(w)$.
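As a rough check of the normal approximation, the moments implied by the chi-square model can be worked out directly. This is a sketch under the Gaussian assumption of Section 3.2 only; in the real case the limits are estimated from the normal historical data, because the telemetry is not Gaussian.

```latex
(n_{1} - 1) F_{1}(w) \sim \chi^{2}(n_{1} - 1)
\;\Rightarrow\;
E[F_{1}(w)] = 1, \qquad \mathrm{Var}[F_{1}(w)] = \frac{2}{n_{1} - 1}

n_{1} = 46{,}468: \quad
\sqrt{\frac{2}{46{,}467}} \approx 6.56 \times 10^{-3}, \qquad
1 \pm 3 \times 6.56 \times 10^{-3} \approx 1 \pm 0.0197
```

This uses the standard facts that a chi-square variable with $k$ degrees of freedom has mean $k$ and variance $2k$ and is approximately normal for large $k$.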

Let $X$ be the normal historical data, which include the data of 100 normal windows. We denote the data of the $i$-th normal window by $X_{i}$. We set $X_{i}$ as the online data and then use the property of the GRQ to obtain the optimum PV $w_{F1_i}$ between $X$ and $X_{i}$. Letting $Y = X_{i}$ and $w = w_{F1_i}$, we obtain the value of $F_{1_i}(w)$ from Equation (8). Repeating this for all 100 normal windows yields a vector $F_{1_X}(w)$. The process of obtaining the vector $F_{1_X}(w)$ is shown in Figure 8.

Let $M_{1}$ and $S_{1}$ be the mean and the standard deviation of the vector $F_{1_X}(w)$, respectively. The fault-detection rule for the subfunction $F_{1}(w)$ is as follows:

\begin{cases} H_{0}: M_{1} - 3 S_{1} \leq F_{1}(w) \leq M_{1} + 3 S_{1}, & \text{fault-free} \\ H_{1}: F_{1}(w) < M_{1} - 3 S_{1} \ \text{or} \ F_{1}(w) > M_{1} + 3 S_{1}, & \text{faulty} \end{cases}

(31)
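The decision rule of Equation (31) can be sketched in a few lines. The function names and the sample values are hypothetical, and using the sample standard deviation for $S_{1}$ is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev

def three_sigma_limits(f1_history):
    """Lower and upper limits M1 - 3*S1 and M1 + 3*S1 from normal windows."""
    m1 = mean(f1_history)
    s1 = stdev(f1_history)  # sample standard deviation (assumption)
    return m1 - 3.0 * s1, m1 + 3.0 * s1

def is_faulty_f1(f1_value, limits):
    """H1 of Equation (31): the value falls outside the 3-sigma band."""
    lower, upper = limits
    return f1_value < lower or f1_value > upper

# Made-up F1 values standing in for the 100 normal windows.
history = [0.9, 1.0, 1.1, 1.0, 1.0]
limits = three_sigma_limits(history)
```

With these made-up values the band is roughly 0.79 to 1.21, so an online value of 1.0 is accepted and a value of 1.3 raises an alarm.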

It can be seen from Section 3.2 that the subfunction $F_{3}(w)$ multiplied by the constant $n_{1}$ obeys a chi-square distribution with one degree of freedom. Therefore, we refer to the method in Reference [30] to set the threshold. Let $F_{3_X}(w)$ be the vector of the 100 values of $F_{3}(w)$ obtained from the 100 normal windows. The process of obtaining the vector $F_{3_X}(w)$ is similar to that of the vector $F_{1_X}(w)$. The difference is that we use the property of the GRQ to obtain the optimum PV $w_{F3_i}$ and then obtain the value of $F_{3_i}(w)$ from Equation (10). Let $M_{3}$ be the mean of the vector $F_{3_X}(w)$. The fault-detection rule for the subfunction $F_{3}(w)$ is as follows:

\begin{cases} H_{0}: F_{3}(w) \leq M_{3} \chi_{\alpha}^{2}(1), & \text{fault-free} \\ H_{1}: F_{3}(w) > M_{3} \chi_{\alpha}^{2}(1), & \text{faulty} \end{cases}

(32)

where $\chi_{\alpha}^{2}(1)$ is the critical value of the chi-square distribution with one degree of freedom at a given significance level $\alpha$.
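The threshold of Equation (32) can be sketched with the Python standard library alone. The sketch relies on the fact that the square of a standard normal variable follows a chi-square distribution with one degree of freedom, so the critical value equals the squared two-sided normal quantile. The function names and sample values are hypothetical.

```python
from statistics import NormalDist, mean

def chi2_1dof_critical(alpha):
    """Upper-alpha critical value of the chi-square distribution with 1 dof."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    return z * z

def f3_threshold(f3_history, alpha=0.01):
    """Threshold M3 * chi2_alpha(1), with M3 the mean over normal windows."""
    return mean(f3_history) * chi2_1dof_critical(alpha)

def is_faulty_f3(f3_value, threshold):
    """H1 of Equation (32): the value exceeds the threshold."""
    return f3_value > threshold

# Made-up F3 values standing in for the 100 normal windows.
history_f3 = [1.0e-5, 2.0e-5, 3.0e-5]
thr = f3_threshold(history_f3, alpha=0.01)
```

Scaling the critical value by $M_{3}$ is consistent with the chi-square model: a chi-square variable with one degree of freedom has mean 1, so the mean of $F_{3}(w)$ over normal windows estimates the factor $1 / n_{1}$.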

In this real satellite fault case, both the PCA + $T^{2}$ and the PCA + SPE methods still monitored the means and variances of the telemetry parameters. The experimental parameters of these two methods were the same as those presented in Section 4.1. For both the PCA + KLD and LOPVKLD methods, significance levels of 0.05 and 0.01 were used. The threshold of $F_{1}(w)$ was set by the $3\sigma$ method, and the significance level of $F_{3}(w)$ was 0.01. The detection results and evaluation indexes of these five methods for the real satellite fault are shown in Figure 9 and Table 2, respectively.

It can be seen from Figure 9a that the PCA + $T^{2}$ method has a poor detection result for the real satellite fault, particularly for the fault windows between Nos. 100 and 200. Compared to Figure 9a, the detection result of the PCA + SPE method in Figure 9b is significantly improved. However, some fault windows around Nos. 250 to 300 are below the fault detection threshold. Figure 9c,d presents the detection results of the two principal components of the PCA + KLD method for the real satellite fault. In Figure 9c,d, the detection thresholds for significance levels of 0.05 and 0.01 are represented by the black dashed line and the magenta dashed line, respectively. Figure 9e,f illustrates the fault detection results of the LOPVKLD method with significance levels of 0.05 and 0.01, respectively. According to Figure 9c–f and Table 2, the fault detection rates of the PCA + KLD and LOPVKLD methods are higher than 95% at the significance level of 0.05. However, the false alarm rates of both methods are at least 25% at this significance level. At the significance level of 0.01, the false alarm rates of these two methods are around 12%, but the fault detection rates decrease by around 10 percentage points. By comparison, the fault detection and false alarm rates of the subfunction $F_{1}(w)$ are 100% and 0%, respectively. The false alarms of the proposed method come from the subfunction $F_{3}(w)$. It can be seen from Figure 9 and Table 2 that the false alarm rate of the proposed method is 13.89%. The effectiveness and superiority of the proposed method are further verified by the real satellite case.
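The two evaluation indexes can be computed from per-window alarm flags as sketched below. The usual definitions are assumed here (FDR as the share of fault windows that raise an alarm, FAR as the share of normal windows that raise an alarm); the function name and the flag sequences are hypothetical.

```python
def detection_rates(alarms, is_fault):
    """Return (FDR, FAR) in percent from per-window boolean flags.

    alarms[i]   is True if the detector raised an alarm on window i.
    is_fault[i] is True if window i is a fault window.
    """
    if len(alarms) != len(is_fault):
        raise ValueError("alarms and is_fault must have the same length")
    n_fault = sum(1 for f in is_fault if f)
    n_normal = len(is_fault) - n_fault
    if n_fault == 0 or n_normal == 0:
        raise ValueError("need at least one fault and one normal window")
    hits = sum(1 for a, f in zip(alarms, is_fault) if a and f)
    false_alarms = sum(1 for a, f in zip(alarms, is_fault) if a and not f)
    return 100.0 * hits / n_fault, 100.0 * false_alarms / n_normal

# Made-up flags: 72 normal windows with 10 alarms, 4 fault windows with 3 alarms.
is_fault = [False] * 72 + [True] * 4
alarms = [True] * 10 + [False] * 62 + [True] * 3 + [False] * 1
fdr, far = detection_rates(alarms, is_fault)
```

In this made-up example, 10 alarms among 72 normal windows give a FAR of about 13.89%, which is numerically the value listed for $F_{3}(w)$ in Table 2; the fault-window counts are purely illustrative.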

5. Conclusions

In this paper, we propose a new and fast method to detect incipient faults of satellites. We decompose the KL divergence and use the property of the generalized Rayleigh quotient to obtain the optimum projection vector. Under the assumption that the variables obey a multidimensional Gaussian distribution, the distributions of the subfunctions $F_{1}(w)$ and $F_{3}(w)$ are presented and verified. To address non-Gaussian satellite telemetry parameters, we use the normal historical data to assist in setting the threshold. The proposed method is a linear method. Future work may focus on developing a nonlinear fault-detection method.

Author Contributions

Conceptualization, G.Z. and G.L.; methodology, G.Z. and Q.Y.; software, G.Z.; validation, G.Z., Q.Y. and G.L.; formal analysis, G.Z.; investigation, G.Z.; resources, Q.Y., M.Y. and J.L.; data curation, Q.Y. and J.L.; writing—original draft preparation, G.Z.; writing—review and editing, G.L. and M.Y.; visualization, G.Z.; supervision, G.L.; project administration, G.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Beidou Navigation In-orbit Support System (grant number JKBDZGDH01) and National special support plan for high-level talents (grant number WRJH19DH01).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Abbreviation	Description
KL	Kullback–Leibler
PCA	principal component analysis
PV	projection vector
GRQ	generalized Rayleigh quotient
FDR	fault detection rate
FAR	false alarm rate

References

1. Yang, Y.; Mao, Y.; Sun, B. Basic performance and future developments of BeiDou global navigation satellite system. Satell. Navig. 2020, 1, 1.
2. Wang, J.; Chen, H.; Zhu, Z. Modeling research of satellite-to-ground quantum key distribution constellations. Acta Astronaut. 2021, 180, 470–481.
3. Chen, H.; Yong, B.; Shen, Y.; Liu, J.; Hong, Y.; Zhang, J. Comparison analysis of six purely satellite-derived global precipitation estimates. J. Hydrol. 2020, 581, 124376.
4. Burke, M.; Driscoll, A.; Lobell, D.B.; Ermon, S. Using satellite imagery to understand and promote sustainable development. Science 2021, 371, 6535.
5. Ezhilarasu, C.M.; Skaf, Z.; Jennions, I.K. The application of reasoning to aerospace Integrated Vehicle Health Management (IVHM): Challenges and opportunities. Prog. Aeronaut. Sci. 2019, 105, 60–73.
6. Tafazoli, M. A study of on-orbit spacecraft failures. Acta Astronaut. 2009, 64, 195–205.
7. Li, E.-H.; Li, Y.-Z.; Li, T.-T.; Li, J.-X.; Zhai, Z.-Z.; Li, T. Intelligent analysis algorithm for satellite health under time-varying and extremely high thermal loads. Entropy 2019, 21, 983.
8. Safaeipour, H.; Forouzanfar, M.; Casavola, A. A survey and classification of incipient fault diagnosis approaches. J. Process Control 2021, 97, 1–16.
9. Peng, Z.; Lu, Y.; Miller, A.; Zhao, T.; Johnson, C. Formal specification and quantitative analysis of a constellation of navigation satellites. Qual. Reliab. Eng. Int. 2016, 32, 345–361.
10. Cayrac, D.; Dubois, D.; Prade, H. Handling uncertainty with possibility theory and fuzzy sets in a satellite fault diagnosis application. IEEE Trans. Fuzzy Syst. 1996, 4, 251–269.
11. Chen, R.H.; Ng, H.K.; Speyer, J.L.; Guntur, L.S.; Carpenter, R. Health monitoring of a satellite system. J. Guid. Control. Dynam. 2006, 29, 593–605.
12. Schwabacher, M.; Oza, N.; Matthews, B. Unsupervised anomaly detection for liquid-fueled rocket propulsion health monitoring. J. Aeros. Comp. Inf. Com. 2009, 6, 464–482.
13. Pang, J.; Liu, D.; Peng, Y.; Peng, X. Collective anomalies detection for sensing series of spacecraft telemetry with the fusion of probability prediction and Markov chain model. Sensors 2019, 19, 722.
14. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
15. Verzola, I.; Donati, A.; Martinez, J.; Schubert, M.; Somodi, L. Project Sibyl: A Novelty Detection System for Human Spaceflight Operations. In Proceedings of the 14th International Conference on Space Operations, Daejeon, Korea, 16–20 May 2016; p. 2405.
16. Hayden, S.; Sweet, A.; Christa, S. Livingstone model-based diagnosis of Earth Observing One. In Proceedings of the AIAA 1st Intelligent Systems Technical Conference, Chicago, IL, USA, 20–22 September 2004; p. 6225.
17. Deb, S.; Pattipati, K.R.; Shrestha, R. QSI’s integrated diagnostics toolset. In Proceedings of the 1997 IEEE Autotestcon Proceedings AUTOTESTCON’97. IEEE Systems Readiness Technology Conference. Systems Readiness Supporting Global Needs and Awareness in the 21st Century, Anaheim, CA, USA, 22–25 September 1997; pp. 408–421.
18. Cheng, C.; Wang, J.; Chen, H.; Chen, Z.; Luo, H.; Xie, P. A review of intelligent fault diagnosis for high-speed trains: Qualitative approaches. Entropy 2021, 23, 1.
19. Muthusamy, V.; Kumar, K.D. A novel data-driven method for fault detection and isolation of control moment gyroscopes onboard satellites. Acta Astronaut. 2021, 180, 604–621.
20. Ibrahim, S.K.; Ahmed, A.; Zeidan, M.A.E.; Ziedan, I.E. Machine learning techniques for satellite fault diagnosis. Ain. Shams. Eng. J. 2020, 11, 45–56.
21. Pang, J.; Liu, D.; Peng, Y.; Peng, X. Anomaly detection based on uncertainty fusion for univariate monitoring series. Measurement 2017, 95, 280–292.
22. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395.
23. Chen, H.; Jiang, B.; Lu, N. An improved incipient fault detection method based on Kullback–Leibler divergence. ISA Trans. 2018, 79, 127–136.
24. Chen, H.; Jiang, B.; Lu, N.; Mao, Z. Deep PCA based real-time incipient fault detection and diagnosis methodology for electrical drive in high-speed trains. IEEE Trans. Veh. Technol. 2018, 67, 4819–4830.
25. Shang, J.; Chen, M.; Ji, H.; Zhou, D. Recursive transformed component statistical analysis for incipient fault detection. Automatica 2017, 80, 313–327.
26. Ji, H.; He, X.; Shang, J.; Zhou, D. Incipient fault detection with smoothing techniques in statistical process monitoring. Control Eng. Pract. 2017, 62, 11–21.
27. Harmouche, J.; Delpha, C.; Diallo, D. Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: Part I. Signal Process. 2014, 94, 278–287.
28. Gautam, S.; Tamboli, P.K.; Patankar, V.H.; Roy, K.; Duttagupta, S.P. Sensors Incipient Fault Detection and Isolation Using Kalman Filter and Kullback–Leibler Divergence. IEEE Trans. Nucl. Sci. 2019, 66, 782–794.
29. Deng, X.; Cai, P.; Cao, Y.; Wang, P. Two-step localized kernel principal component analysis based incipient fault diagnosis for nonlinear industrial processes. Ind. Eng. Chem. Res. 2020, 59, 5956–5968.
30. Zhang, G.; Yang, Q.; Li, G.; Leng, J.; Wang, L. A Satellite Incipient Fault Detection Method Based on Local Optimum Projection Vector and Kullback–Leibler Divergence. Appl. Sci. 2021, 11, 797.
31. Hart, P.E.; Stork, D.G.; Duda, R.O. Pattern Classification; Wiley: Hoboken, NJ, USA, 2000.
32. Watkins, D.S. Fundamentals of Matrix Computations, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2002.
33. Wang, L.-F.; Xia, Y. A linear-time algorithm for globally maximizing the sum of a generalized Rayleigh quotient and a quadratic form on the unit sphere. SIAM J. Optim. 2019, 29, 1844–1869.
34. Jolliffe, I. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002.
35. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Press: Pacific Grove, CA, USA, 2002.
36. Nassar, B.; Hussein, W.; Mokhtar, M. Space telemetry anomaly detection based on statistical PCA algorithm. In Proceedings of the International Journal of Electronics and Communication Engineering, Paris, France, 27–28 August 2015; pp. 637–645.

Figure 1. The flow chart of the proposed method.

Figure 2. The detection results of five fault-detection methods for the fault $f_{1}$. (a) The result of PCA + $T^{2}$ for $f_{1}$; (b) the result of PCA + SPE for $f_{1}$; (c) the result of PCA + KLD for $f_{1}$; (d) the result of LOPVKLD for $f_{1}$; (e) the result of $F_{1}(w)$ for $f_{1}$; (f) the result of $F_{3}(w)$ for $f_{1}$.

Figure 3. The detection results of five fault-detection methods for the fault $f_{2}$. (a) The result of PCA + $T^{2}$ for $f_{2}$; (b) the result of PCA + SPE for $f_{2}$; (c) the result of PCA + KLD for $f_{2}$; (d) the result of LOPVKLD for $f_{2}$; (e) the result of $F_{1}(w)$ for $f_{2}$; (f) the result of $F_{3}(w)$ for $f_{2}$.

Figure 4. The detection results of five fault-detection methods for the fault $f_{3}$. (a) The result of PCA + $T^{2}$ for $f_{3}$; (b) the result of PCA + SPE for $f_{3}$; (c) the result of PCA + KLD for $f_{3}$; (d) the result of LOPVKLD for $f_{3}$; (e) the result of $F_{1}(w)$ for $f_{3}$; (f) the result of $F_{3}(w)$ for $f_{3}$.

Figure 5. Comparison of the optimum PVs for different faults. (a) The optimum PVs of LOPVKLD for $f_{1}$; (b) the optimum PVs of $F_{1}(w)$ for $f_{1}$; (c) the optimum PVs of $F_{3}(w)$ for $f_{1}$; (d) the optimum PVs of LOPVKLD for $f_{3}$; (e) the optimum PVs of $F_{1}(w)$ for $f_{3}$; (f) the optimum PVs of $F_{3}(w)$ for $f_{3}$.

Figure 6. The phenomena of the fault parameter fluctuation.

Figure 7. The phenomena of the selected fault parameters. (a) All the data of the parameters; (b) the periodic phenomenon of the parameters.

Figure 8. The process of obtaining the vector $F_{1_X}(w)$.

Figure 9. The detection results of five methods for the real satellite fault. (a) The result of PCA + $T^{2}$ for the fault; (b) the result of PCA + SPE for the fault; (c) the result of the second principal component of PCA + KLD for the fault; (d) the result of the first principal component of PCA + KLD for the fault; (e) the result of LOPVKLD with the significance level of 0.05; (f) the result of LOPVKLD with the significance level of 0.01; (g) the result of $F_{1}(w)$ for the fault; (h) the result of $F_{3}(w)$ for the fault.

Table 1. Comparison of fault detection performance for the three incipient faults.

Faults	Evaluation Indexes	PCA + T²	PCA + SPE	PCA + KLD	LOPVKLD	Proposed Method: F₁(w)	Proposed Method: F₃(w)
$f_{1}$	FDR (%)	5.76	17.02	97.41	94.63	7.41	96.67
	FAR (%)	4.49	7.67	11.90	15.76	8.5	5.56
	Time consumption	0 (μs)	0 (μs)	0 (μs)	68.5 (ms)	18.42 (μs)	24.26 (μs)
$f_{2}$	FDR (%)	58.46	25.96	79.36	89.08	95.99	8.41
	FAR (%)	4.41	8.05	11.58	14.84	7.17	5.80
	Time consumption	0 (μs)	0 (μs)	0 (μs)	70.8 (ms)	18.20 (μs)	23.75 (μs)
$f_{3}$	FDR (%)	27.56	20.87	30.37	90.91	97.81	7.25
	FAR (%)	4.61	7.68	11.50	15.82	7.53	5.88
	Time consumption	0 (μs)	0 (μs)	0 (μs)	71.7 (ms)	18.31 (μs)	23.99 (μs)

Table 2. The evaluation indexes of five fault-detection methods for the real satellite fault.

Evaluation Indexes	PCA + T²	PCA + SPE	PCA + KLD (α = 0.05)	PCA + KLD (α = 0.01)	LOPVKLD (α = 0.05)	LOPVKLD (α = 0.01)	Proposed Method: F₁(w)	Proposed Method: F₃(w)
FDR (%)	63.46	83.85	97.16	85.65	95.17	85.51	100	32.95
FAR (%)	14.08	16.9	25	11.11	26.39	12.50	0	13.89

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, G.; Yang, Q.; Li, G.; Leng, J.; Yan, M. A Satellite Incipient Fault Detection Method Based on Decomposed Kullback–Leibler Divergence. Entropy 2021, 23, 1194. https://doi.org/10.3390/e23091194
