Article

A Text-Driven Aircraft Fault Diagnosis Model Based on a Word2vec and Priori-Knowledge Convolutional Neural Network

1
Technology Development Department, Avicas Generic Technology Co., LTD, Yangzhou 225000, China
2
School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Aerospace 2021, 8(4), 112; https://doi.org/10.3390/aerospace8040112
Submission received: 11 March 2021 / Revised: 7 April 2021 / Accepted: 12 April 2021 / Published: 14 April 2021
(This article belongs to the Special Issue Fault Detection and Prognostics in Aerospace Engineering)

Abstract

In the process of aircraft maintenance and support, a large amount of fault description text data is recorded. However, most of the existing fault diagnosis models are based on structured data, which means they are not suitable for unstructured data such as text. Therefore, a text-driven aircraft fault diagnosis model is proposed in this paper based on Word to Vector (Word2vec) and prior-knowledge Convolutional Neural Network (CNN). The fault text first enters Word2vec to perform text feature extraction, and the extracted text feature vectors are then input into the proposed prior-knowledge CNN to train the fault classifier. The prior-knowledge CNN introduces expert fault knowledge through Cloud Similarity Measurement (CSM) to improve the performance of the fault classifier. Validation experiments on five-year maintenance log data of a civil aircraft were carried out to successfully verify the effectiveness of the proposed model.

1. Introduction

Because an aircraft is an extremely complex system, faults often occur due to human error, material defects, manufacturing errors, operating environment fluctuations, etc. [1]. When these faults occur, maintainers usually first judge the fault type subjectively from experience and then decide which maintenance strategy to adopt. However, the aircraft system is too complex for the fault type to be judged accurately from subjective experience alone, especially by young and inexperienced maintainers. Therefore, scholars have been actively exploring how to judge the fault type objectively at the data level.
With the development of machine learning and sensor technology, data-driven fault diagnosis has advanced rapidly [2,3], and data-driven fault diagnosis models are increasingly being proposed. Nguyen et al. [4,5] proposed a magnitude order balance method to diagnose quadcopter actuator faults based on sensor data and developed an attitude fault-tolerant control based on a nonsingular fast terminal sliding mode and a neural network to compensate for the actuator fault. Gao et al. [6] proposed a novel artificial neural network model by fusing a Deep Belief Network (DBN) and a Quantum Inspired Neural Network (QINN) and injected four fault modes to construct an aircraft fuel system fault diagnosis model based on oil pressure data. Shen et al. [7] developed a novel hybrid multi-mode machine learning framework that exploits the inherent health information embedded in Input or Output (I/O) sensor data to monitor aircraft gas turbine engine health status, which effectively improved the accuracy of fault diagnosis.
Although these data-driven aircraft fault diagnosis models have performed well, they are mostly based on structured data. As unstructured data cannot be directly recognized by computers, aircraft fault diagnosis driven by unstructured data such as text and images has not been widely studied. However, in real life, most data tend to be unstructured or semi-structured [8]. In particular, over the life cycle of an aircraft, ample maintenance and support text is recorded in every fault maintenance activity. These aircraft fault texts usually record the abnormal working state, the fault phenomenon, and other aircraft fault knowledge, which can be used to judge the fault type. However, without effective processing technology, such aircraft fault description text is not utilized effectively, and a valuable resource is wasted.
Based on the above problems, the research objective of this paper is to develop an effective aircraft fault diagnosis model based on text data, so as to make full use of aircraft fault-description text data and improve the level of aircraft fault diagnosis. To achieve this objective, Word2vec is used as a text feature extraction algorithm to solve the problem that the computer cannot recognize text data directly, and a novel prior-knowledge CNN is proposed to construct a classifier that improves fault diagnosis accuracy. We carried out verification experiments on the five-year maintenance log data of a civil aircraft to verify the effectiveness of the proposed text-driven aircraft fault diagnosis model. The main contributions of this work are the construction of a novel text-driven aircraft fault diagnosis model and the proposal of a prior-knowledge CNN classifier, which introduces an expert fault knowledge base composed of historical fault text data judged by experts as prior knowledge.
The merits of our model are:
(1)
as a data-driven model, the proposed aircraft fault diagnosis model can automatically, quickly, and objectively judge which failure type the described failure belongs to once a failure-description text is entered;
(2)
Word2vec, a more efficient method, is used for text feature extraction instead of the traditional Term Frequency & Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA);
(3)
a novel prior-knowledge CNN is proposed that introduces expert fault knowledge to improve the accuracy of fault diagnosis.
The remainder of this paper is organized as follows:
  • Section 2 presents a literature review of text feature extraction and CNNs.
  • In Section 3, the proposed text-driven aircraft fault diagnosis model is first discussed and the three core parts of the model, including text data preprocessing, Word2vec text feature extraction, and the prior-knowledge CNN, are then explained in detail.
  • Section 4 describes the experiment and discusses the experimental results.
  • Section 5 provides conclusions.

2. Literature Review

2.1. Text Feature Extraction

Text feature extraction is used to transform text data into a structured format to solve the problem that computers cannot directly recognize unstructured data such as text [9]. At present, vector space models are widely used for the structured processing of text data [10]. TF-IDF [11] and LDA [12] are two typical vector space models. They are also widely used in text-driven fault diagnosis models. Rodrigues et al. [13] used TF-IDF and Multilayer Perceptron (MLP) to perform aircraft interior failure pattern recognition. Wang et al. [14] used LDA and a Support Vector Machine (SVM) to develop a fault diagnosis model for railway systems. TF-IDF and LDA are easy to operate and run efficiently. However, TF-IDF generates a word vector without considering the context and easily leads to dimension explosion [15]. Although LDA considers the context, it is an unsupervised algorithm, so its word vector generation is somewhat blind [16]. To address the shortcomings of TF-IDF and LDA, Zhou et al. [14] proposed a fusion feature extraction model called TI-LDA, based on TF-IDF and LDA, and applied it to text-driven aircraft fault diagnosis. TI-LDA not only considers context and word order, but also avoids ambiguity. However, TI-LDA still has the problem of dimension explosion. To solve the above problems, Mikolov et al. [17] proposed the Word2vec text feature extraction algorithm. Word2vec adopts a three-layer neural network that is trained either by inputting the context words to predict the current word or by inputting the current word to predict the context words. In this way, words are mapped into a low-dimensional vector space, which means Word2vec does not cause dimension explosion while still considering context and word order [18]. Therefore, Word2vec is widely used in the field of fault diagnosis and has made good progress. Chang et al. [19] applied the Word2vec moving distance model to obtain a failure occurrence sequence, which effectively improved the accuracy of fault diagnosis. Bai et al. [20] used Word2vec to extract features from power grid system alarm text, which were put into an ensemble classifier to perform power grid system fault diagnosis, and their experimental results show that the proposed model identifies faults well.

2.2. CNN

The CNN is a well-known deep learning framework inspired by the natural visual perception mechanism of living creatures [21]. Since LeCun et al. [22] published the seminal paper establishing the modern framework of the CNN in 1990, it has been used in image recognition [23], real-time object detection [24], time series prediction [25], etc. Since deep learning reshaped traditional fault diagnosis in the 2010s [26,27], the CNN, as a deep learning algorithm, has also been widely used in the field of fault diagnosis. Eren et al. [28] developed a generic real-time bearing fault diagnosis approach from raw time series sensor data based on a one-dimensional CNN classifier. Zhong et al. [29] investigated a transfer learning method based on a CNN and an SVM for gas turbine fault diagnosis under a small fault sample condition. Zhao et al. [30] proposed a normalized CNN for the rolling bearing diagnosis of different fault severities and orientations under scenarios of data imbalance and variable working conditions. Although these single CNN models have achieved certain results, CNNs that incorporate prior knowledge have been reported to be more effective. Ma et al. [31] encoded expert prior knowledge into Regional Convolutional Neural Networks (R-CNN), which effectively improved the accuracy of facial action unit recognition. Wei [32] built an end-to-end weak scratch model by embedding prior knowledge into an encoder-decoder CNN, which significantly improved the accuracy of the weak scratch inspection of optical components. These studies indicate that a prior-knowledge CNN can be more effective than a single CNN.

3. Methodology

To make full use of aircraft fault text, a novel text-driven aircraft fault diagnosis model is proposed based on the Word2vec text feature extraction algorithm and a prior-knowledge CNN classification algorithm. The construction process of the proposed aircraft fault diagnosis model is shown in Figure 1. First, the input aircraft fault text data is preprocessed, which includes eliminating repeated data, eliminating missing data, performing word segmentation, and removing stop words. Second, the preprocessed text data is mapped to the word vector space by Word2vec to obtain the aircraft fault text vector data. Finally, the aircraft fault text vector data enters the prior-knowledge CNN model to train the classifier. Given an aircraft fault description text as input, the trained prior-knowledge CNN classifier automatically outputs the corresponding fault type, realizing intelligent aircraft fault diagnosis. The three parts of the text-driven aircraft fault diagnosis model, namely text data preprocessing, Word2vec text feature extraction, and the prior-knowledge CNN, are described below.

3.1. Text Data Preprocessing

Text data preprocessing is quite different from structured data preprocessing. Text data requires not only normal preprocessing, such as eliminating repeated and missing data, but also the removal of stop words. Stop words mainly refer to emotional particles and punctuation marks in the text, which contribute nothing to the semantic expression. The existence of stop words not only leads to an artificially high dimension of the text feature vectors, but also interferes with the training of the classifier. Therefore, stop words must be removed from text data.
In addition, for texts in languages such as Chinese, which is the language of the data used in this paper, word segmentation is needed before removing the stop words. Chinese text has no explicit separator between words; it is a continuous string of Chinese characters. Word segmentation, the first step in Chinese text processing, splits the sentences in the text into words through certain rules and methods. Common word segmentation methods include dictionary-based methods, statistics-based methods, and rule-based statistical methods [33]. At present, dictionary-based tools such as Jieba are the most widely used and perform well in practice. The Jieba word segmentation tool is based on the Trie tree structure [34] and uses dynamic programming to find the maximum probability path to obtain the word segmentation results. It uses the Hidden Markov Model (HMM) [35] and the Viterbi [36] algorithm to identify unregistered words, and its disambiguation can be improved through user customization. Liu et al. [37] proposed a new approach to process unknown words in financial public opinions with Jieba. Yu et al. [38] proposed to explicitly display the central words of a movie through a combination of Jieba lexicons. For word recognition in Chinese log text, Jieba is currently the most effective tool.
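The preprocessing steps described above (dropping repeated and missing records, segmenting, removing stop words) can be sketched as follows. This is only an illustration under stated assumptions: `TOY_DICT`, `STOP_WORDS`, `segment`, and `preprocess` are hypothetical names, the dictionary frequencies are invented, and the segmenter is a simplified maximum-probability-path search over a toy dictionary rather than Jieba itself (there is no HMM step for unregistered words).

```python
import math

# Hypothetical toy dictionary (word -> frequency); not Jieba's real dictionary.
TOY_DICT = {"左": 5, "发动机": 20, "发动": 4, "机": 3,
            "滑油": 10, "压力": 15, "低": 8, "的": 50}
# Hypothetical stop-word list: one particle and two punctuation marks.
STOP_WORDS = {"的", "，", "。"}


def segment(sentence, dictionary=TOY_DICT):
    """Split a sentence into the word sequence with the highest total log-probability."""
    total = sum(dictionary.values())
    n = len(sentence)
    # best[i] = (best log-probability of sentence[i:], end index of its first word)
    best = [None] * (n + 1)
    best[n] = (0.0, n)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            # Unknown single characters get a pseudo-count of 1 so a path always exists.
            freq = dictionary.get(word, 1 if j == i + 1 else 0)
            if freq:
                candidates.append((math.log(freq / total) + best[j][0], j))
        best[i] = max(candidates)
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words


def preprocess(records):
    """Drop missing and repeated texts, segment the rest, and remove stop words."""
    seen, cleaned = set(), []
    for text in records:
        if not text or not text.strip():   # missing record
            continue
        text = text.strip()
        if text in seen:                   # repeated record
            continue
        seen.add(text)
        cleaned.append([w for w in segment(text) if w not in STOP_WORDS])
    return cleaned


# "Left engine oil pressure low" appears twice; one record is missing, one is empty.
logs = ["左发动机滑油压力低", None, "左发动机滑油压力低", "发动机的滑油压力低。", ""]
tokens = preprocess(logs)
```

In a real pipeline the `segment` function would presumably be replaced by the Jieba tool named in the text, with a domain stop-word list in place of the toy one.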

3.2. Text Feature Extraction Based on Word2vec

Since TF-IDF easily leads to dimension explosion and LDA tends to be ambiguous, Word2vec is used in this paper to perform text feature extraction. Word2vec is a neural network probabilistic language model proposed by Mikolov et al. [17] and is mainly used to transform text information from an unstructured form into a vectorized form [39]. Compared with the traditional high-dimensional TF-IDF word vector, the dimension of the Word2vec word vector is usually 100–300. A low word vector dimension greatly reduces computational complexity and the risk of dimension explosion. In addition, the Word2vec word vector is calculated according to the context and word order, which captures the semantic information of the text. As a result, Word2vec has been widely used and studied since its release. Based on the way the word vectors are trained, Word2vec can be divided into two models, the Continuous Skip-Gram Model (Skip-gram) and the Continuous Bag-of-Words Model (CBOW). Skip-gram inputs the current word to predict the surrounding words, while CBOW inputs the surrounding words to predict the current word. In comparison, the CBOW model is more effective in processing small corpora, while the Skip-gram model is more suitable for processing large corpora. The aircraft maintenance text log used in this paper is a typical small corpus, so CBOW is more suitable for text feature extraction in our study.
The core idea of the CBOW model is to input the set of $2c$ surrounding words $\mathrm{Context}(w) = \{\mathrm{Context}(w)_1, \mathrm{Context}(w)_2, \ldots, \mathrm{Context}(w)_{2c}\}$ to predict the current word $w$. Here $2c$ means taking $c$ words before and $c$ words after $w$, with $w$ as the center. As shown in Figure 2, CBOW is a three-layer neural network, including the input layer, projection layer, and output layer.
Input Layer: One-hot encoding vectors of the $2c$ words in $\mathrm{Context}(w)$, namely $v(\mathrm{Context}(w)) = \{v(\mathrm{Context}(w)_1), v(\mathrm{Context}(w)_2), \ldots, v(\mathrm{Context}(w)_{2c})\}$.
Projection Layer: The $2c$ vectors of the input layer are summed to obtain $X_w$, namely $X_w = \sum_{i=1}^{2c} v(\mathrm{Context}(w)_i)$.
Output Layer: The output layer corresponds to a binary tree. A Huffman tree is constructed with the words appearing in the corpus as leaf nodes and the number of times each word appears in the corpus as its weight. In the Huffman tree, there are $n$ leaf nodes ($n = |D|$), corresponding to the words in dictionary $D$, and $n - 1$ non-leaf nodes.
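A minimal sketch of the input and projection layers follows, assuming a tokenized sentence and a small ordered vocabulary. The helper names `context_windows`, `one_hot`, and `project` are hypothetical, and words near the sentence boundary simply receive fewer than 2c context words, which is an assumption the text does not spell out.

```python
def context_windows(tokens, c):
    """Yield (context, target) pairs, taking up to c words on each side of the target."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - c):i] + tokens[i + 1:i + 1 + c]
        yield context, target


def one_hot(word, vocab):
    """One-hot vector of `word` over the ordered vocabulary list `vocab`."""
    return [1.0 if v == word else 0.0 for v in vocab]


def project(context, vocab):
    """Projection layer: element-wise sum of the context word vectors (X_w)."""
    x_w = [0.0] * len(vocab)
    for word in context:
        for k, value in enumerate(one_hot(word, vocab)):
            x_w[k] += value
    return x_w


tokens = ["engine", "oil", "pressure", "low"]
vocab = sorted(set(tokens))          # ['engine', 'low', 'oil', 'pressure']
pairs = list(context_windows(tokens, c=1))
x_w = project(pairs[1][0], vocab)    # context of "oil" is ["engine", "pressure"]
```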
For the corpus $C$, the objective function of CBOW is usually the log-likelihood function shown in Equation (1); that is, the probability that the current word is $w$ given $\mathrm{Context}(w)$ is maximized.
$$\mathcal{L} = \sum_{w \in C} \log p(w \mid \mathrm{Context}(w)) \tag{1}$$
For any word $w$ in the dictionary $D$, there must be a unique path $P_w$ from the root node to the $w$ node in the Huffman tree. There are $|P_w| - 1$ branches on path $P_w$. If each branch is regarded as a binary classification, then each classification will produce a probability. Multiplying these probabilities gives the required $p(w \mid \mathrm{Context}(w))$. The stochastic gradient ascent algorithm is then used to maximize the objective function. Finally, the vector on the leaf nodes of the Huffman tree in the output layer is the final word vector of $w$.
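The path-probability idea can be illustrated with a small sketch. It builds a Huffman tree from word counts and multiplies one sigmoid-based binary decision per branch. The function names, the toy counts, the per-node parameter vectors `theta`, and the convention that branch 0 uses the sigmoid and branch 1 its complement are all assumptions made for illustration; training by stochastic gradient ascent is not shown.

```python
import heapq
import itertools
import math


def huffman_paths(counts):
    """Return {word: [(inner_node_id, bit), ...]} for each root-to-leaf path."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(freq, next(tie), {"word": w}) for w, freq in counts.items()]
    heapq.heapify(heap)
    node_ids = itertools.count()
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        parent = {"id": next(node_ids), "children": (left, right)}
        heapq.heappush(heap, (f0 + f1, next(tie), parent))
    paths = {}

    def walk(node, path):
        if "word" in node:
            paths[node["word"]] = path
            return
        for bit, child in enumerate(node["children"]):
            walk(child, path + [(node["id"], bit)])

    walk(heap[0][2], [])
    return paths


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def path_probability(path, x_w, theta):
    """p(w | Context(w)) as the product of one binary decision per branch."""
    prob = 1.0
    for node_id, bit in path:
        s = sigmoid(sum(t * x for t, x in zip(theta[node_id], x_w)))
        prob *= s if bit == 0 else 1.0 - s
    return prob


counts = {"engine": 5, "oil": 3, "pressure": 2, "low": 1}   # toy word counts
paths = huffman_paths(counts)
theta = {0: [0.1, -0.2], 1: [0.3, 0.4], 2: [-0.5, 0.2]}      # hypothetical parameters
x_w = [1.0, 2.0]                                             # hypothetical projection vector
total = sum(path_probability(p, x_w, theta) for p in paths.values())
```

Because every inner node splits its probability mass between its two children, the probabilities of all words sum to one, and frequent words sit on shorter paths.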

3.3. Prior-Knowledge CNN Based on Cloud Similarity Measurement (CSM)

A prior-knowledge CNN model is used to construct the classifier in this paper. Different from the traditional CNN model, the expert prior knowledge, which mainly refers to the expert fault knowledge base, is encoded into the prior-knowledge CNN model. Meanwhile, a similarity measure algorithm named Cloud Similarity Measurement (CSM) [40,41] is introduced to quantify the similarity between the text to be classified and the historical fault text in the expert fault knowledge base.

3.3.1. CNN Algorithm

This paper uses labeled maintenance log data, so a supervised learning algorithm is more suitable for the application scenario and data characteristics of this paper. Common supervised learning algorithms include the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and the Support Vector Machine (SVM). The CNN refers to those neural networks that use convolution operations in at least one layer of the network to replace general matrix multiplication operations. Its goal is to learn local neighborhood matching through nonlinear mapping to achieve data dimensionality reduction. In this way, the number of parameters to be learned is greatly reduced because the filter weights of a convolutional layer are shared. The CNN is therefore well suited to the high-dimensional characteristics of unstructured data. As a deep learning algorithm, the CNN has been successfully applied in fields such as natural language processing, image processing, and video processing. Jin et al. [42] used a deep convolutional neural network to solve inverse problems in imaging. Acharya et al. [43] proposed an algorithm for the automated detection and diagnosis of seizure using Electroencephalogram (EEG) signals with a convolutional neural network. Poria [44] presented the first deep learning approach to aspect extraction in opinion mining with a CNN.
The CNN is a feed-forward neural network, which is mainly based on three basic concepts: a local receptive field, weight sharing, and pooling. The local receptive field reduces the weight parameters that need to be trained by mapping each neuron to a local feature. Weight sharing ensures that all neurons in the same convolution kernel have the same weight, thereby greatly reducing the number of training parameters in the network. Pooling can reduce the scale of features and ensure the invariance of features. Therefore, a CNN can guarantee the robustness of input features in displacement, tilt, scaling, or other deformations.
A CNN consists of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer. From the point of view of data processing, the overall structure of a CNN can be divided into two parts: one is responsible for feature extraction, including the input layer, the convolutional layer, and the pooling layer; the other is responsible for data classification, including the fully connected layer and the output layer. The convolutional layer and the pooling layer are the feature extractors of the CNN and extract potential features from the original data. The fully connected layer is the CNN classifier, which uses the features obtained from the last pooling layer as input for classification. The CNN structure is shown in Figure 3.
Generally speaking, multiple convolutional layers can be included in the CNN structure. These convolutional layers perform local feature detection on the data of the previous layer (not necessarily the input layer) and store the detection results as a feature map. A convolutional layer usually has multiple different convolution functions (i.e., convolution kernels) to try to find different potential features in the input data.
Assuming that the input data of the convolution layer is a two-dimensional matrix, the output result of the convolution kernel can be obtained by Equation (2):
$$y_{ij} = \sigma\left(\sum_{r=1}^{F} \sum_{c=1}^{F} w_{rc}\, x_{(r + i \times S)(c + j \times S)} + b\right), \quad \left(0 \le i \le \frac{H - F}{S},\ 0 \le j \le \frac{W - F}{S}\right) \tag{2}$$
where $y_{ij}$ is the value of an output point in the feature map; $H$ and $W$ are the vertical and horizontal dimensions of the input data; $F$ represents the length and width of the convolution kernel; $S$ represents the step size of each move of the convolution kernel; $x_{(r + i \times S)(c + j \times S)}$ represents the value of the input data at the coordinate $(r + i \times S,\ c + j \times S)$; $b$ and $w_{rc}$ represent the offset and the weight at coordinates $(r, c)$, respectively; $\sigma$ represents any nonlinear activation function used for feature extraction.
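Equation (2) can be written out directly as nested loops. The sketch below is a plain-Python illustration, not the implementation used in the paper: the function name `conv2d` is hypothetical, indices are zero-based, and ReLU is chosen as the activation purely as an assumption, since the text allows any nonlinear function.

```python
def conv2d(x, w, b=0.0, stride=1, activation=lambda z: max(0.0, z)):
    """Apply one F x F kernel w to the 2D input x with the given stride."""
    H, W, F = len(x), len(x[0]), len(w)
    out_h = (H - F) // stride + 1     # i runs from 0 to (H - F) / S
    out_w = (W - F) // stride + 1     # j runs from 0 to (W - F) / S
    y = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = b
            for r in range(F):
                for c in range(F):
                    total += w[r][c] * x[r + i * stride][c + j * stride]
            y[i][j] = activation(total)
    return y


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d(x, kernel, b=1.0)           # stride 1 gives a 2 x 2 map

x4 = [[4 * r + c for c in range(4)] for r in range(4)]
strided = conv2d(x4, kernel, stride=2)           # stride 2 also gives a 2 x 2 map
```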
The pooling layer is usually located after the convolutional layer. It takes the output of the convolutional layer as its input and reduces the dimensionality of the feature data by performing regional aggregation on the feature map output by the convolutional layer. Maximum pooling selects the maximum value within the region covered by the pooling filter, and mean pooling calculates the average of all feature values in that region. Both can be written in the general form
$$L_P(x) = \left(\sum_{x(i,j) \in x} x(i,j)^{P}\right)^{1/P} \tag{3}$$
where $P$ represents the pooling parameter, $P = 1$ represents mean pooling (up to division by the number of values in the region), and $P = \infty$ represents maximum pooling. In effect, the pooling layer performs a second round of feature extraction, which reduces the complexity of the model while still retaining a large amount of original information.
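The two pooling modes and the role of the parameter P can be checked numerically with a short sketch. The names `pool2d` and `lp_norm` are hypothetical, the pooling regions are assumed to be non-overlapping squares, and absolute values are taken in `lp_norm` so that it also behaves for negative inputs, which the formula in the text does not address.

```python
def pool2d(feature_map, size, mode="max"):
    """Non-overlapping pooling over size x size regions of a 2D feature map."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for i in range(rows):
        row = []
        for j in range(cols):
            region = [feature_map[i * size + r][j * size + c]
                      for r in range(size) for c in range(size)]
            row.append(max(region) if mode == "max" else sum(region) / len(region))
        pooled.append(row)
    return pooled


def lp_norm(region, p):
    """General L_P aggregation of one pooling region."""
    return sum(abs(v) ** p for v in region) ** (1.0 / p)


fm = [[1, 3, 2, 4],
      [5, 6, 1, 0],
      [7, 2, 9, 8],
      [3, 4, 6, 5]]
max_pooled = pool2d(fm, 2, mode="max")
mean_pooled = pool2d(fm, 2, mode="mean")

region = [1, 2, 3, 4]
sum_like = lp_norm(region, 1)     # P = 1: the sum, i.e. the mean times the region size
max_like = lp_norm(region, 50)    # large P: approaches the maximum of the region
```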
The fully connected layer is similar to the multilayer perceptron, and the neurons between adjacent layers are interconnected in pairs. The fully connected layer integrates the local feature information extracted by the convolutional layer and the pooling layer and then generates classification features that can be processed by the output layer.
$$y^{l} = f\left(w^{l} \times y^{l-1} + b^{l}\right) \tag{4}$$
where $f(\cdot)$ represents the activation function of the fully connected layer; $y^{l}$ represents the output value of the $l$-th layer; $y^{l-1}$ represents both the input of the $l$-th layer and the output value of the $(l-1)$-th layer; $w^{l}$ and $b^{l}$, respectively, represent the weight and offset of the $l$-th layer.
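As a small worked instance of the fully connected layer, the sketch below computes one layer as a weighted sum plus offset followed by an activation. The function names and the toy weights are hypothetical, and the softmax applied at the end is only an assumed choice for turning the last layer's outputs into class probabilities; the text does not state which output function is used.

```python
import math


def dense(y_prev, weights, bias, activation=math.tanh):
    """One fully connected layer: activation(weights * y_prev + bias)."""
    return [activation(sum(w * y for w, y in zip(row, y_prev)) + b)
            for row, b in zip(weights, bias)]


def softmax(z):
    """Normalize scores into probabilities (assumed output function)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


features = [1.0, 2.0]                      # e.g. flattened pooling output
weights = [[0.5, -0.5], [1.0, 1.0]]        # toy weights, one row per neuron
bias = [0.0, -3.0]
linear_out = dense(features, weights, bias, activation=lambda z: z)
probs = softmax(dense(features, weights, bias))
```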

3.3.2. Text Similarity Measurement Based on CSM

This paper introduces the CSM to quantify the degree of similarity between the text to be classified and the historical fault text. The CSM algorithm comes from the cloud model and is used to describe the differences between different clouds. In data mining, the CSM algorithm can overcome the shortcomings of Euclidean distance, Dynamic Time Warping (DTW) distance, and classical method mode distance in the similarity measurement of two time series, so as to achieve better measurement accuracy. The CSM consists of two parts: a reverse cloud generation algorithm, which produces the cloud characteristic vector, and the angle cosine between cloud characteristic vectors.
For the input fault text description data $A_j = (a_1, a_2, \ldots, a_N)$ and fault text description data $B_k = (b_1, b_2, \ldots, b_M)$, where $N$ and $M$ are the data lengths of $A_j$ and $B_k$, the calculation process of the CSM algorithm is as follows (in Equations (5) to (10), the sums run over the $n$ elements of the sequence being characterized):
(1)
Calculate the expected value of $A_j$:
$$\bar{A} = \frac{1}{n} \sum_{j=1}^{n} A_j \tag{5}$$
First-order absolute central moment:
$$\dot{A} = \frac{1}{n} \sum_{j=1}^{n} \left| A_j - \bar{A} \right| \tag{6}$$
Sample variance:
$$S^2 = \frac{1}{n - 1} \sum_{j=1}^{n} \left| A_j - \bar{A} \right|^2 \tag{7}$$
(2)
Calculate the expected value $E_A$ of the cloud model:
$$E_A = \bar{A} \tag{8}$$
(3)
Calculate the characteristic entropy of $A_j$:
$$E_n = \sqrt{\frac{\pi}{2}} \cdot \frac{1}{n} \sum_{j=1}^{n} \left| A_j - E_A \right| \tag{9}$$
(4)
Calculate the hyper entropy of $A_j$:
$$H_e = \sqrt{S^2 - E_n^2} \tag{10}$$
where $E_A$, $E_n$, and $H_e$ are used to describe the overall characteristics of $A_j$. The cloud vector of $A_j$ is then $\upsilon_j = (E_{A_j}, E_{n_j}, H_{e_j})$. Similarly, the cloud vector of another data set $B_k$ is $\upsilon_k = (E_{B_k}, E_{n_k}, H_{e_k})$. The cosine of the angle between the two cloud vectors is taken as the similarity of the two sequences:
$$sim_{jk} = \cos(\upsilon_j, \upsilon_k) = \frac{\upsilon_j \cdot \upsilon_k}{\lVert \upsilon_j \rVert \, \lVert \upsilon_k \rVert} \tag{11}$$
It can be seen in Equation (11) that $sim_{jj} = sim_{kk} = 1$; that is, the similarity between a cloud vector and itself is 1. At the same time, $sim_{jk} = sim_{kj}$; that is, the similarity is symmetric.
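The CSM computation can be sketched in a few lines: each sequence is summarized by its expected value, entropy, and hyper entropy, and two sequences are compared by the cosine of the angle between these three-component vectors. The function names and the toy sequences are hypothetical, and clamping the term under the square root at zero is an added assumption to keep the hyper entropy real when the squared entropy exceeds the sample variance.

```python
import math


def cloud_vector(seq):
    """Return (expected value, entropy, hyper entropy) of a numeric sequence."""
    n = len(seq)
    ex = sum(seq) / n                                     # expected value
    first_moment = sum(abs(a - ex) for a in seq) / n      # first-order absolute central moment
    s2 = sum((a - ex) ** 2 for a in seq) / (n - 1)        # sample variance
    en = math.sqrt(math.pi / 2.0) * first_moment          # entropy
    he = math.sqrt(max(s2 - en ** 2, 0.0))                # hyper entropy, clamped (assumption)
    return ex, en, he


def csm_similarity(seq_a, seq_b):
    """Cosine of the angle between the cloud vectors of two sequences."""
    va, vb = cloud_vector(seq_a), cloud_vector(seq_b)
    dot = sum(p * q for p, q in zip(va, vb))
    norm = math.sqrt(sum(p * p for p in va)) * math.sqrt(sum(q * q for q in vb))
    return dot / norm


a = [0.2, 0.4, 0.1, 0.9, 0.5]     # toy text vectors; lengths may differ
b = [0.3, 0.1, 0.8, 0.6]
sim_ab = csm_similarity(a, b)
sim_ba = csm_similarity(b, a)
sim_aa = csm_similarity(a, a)
```

Because each sequence is reduced to a fixed-length cloud vector first, the two inputs do not need to have the same length.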

3.3.3. Construction of Prior-Knowledge CNN

Based on the CNN and CSM, this paper proposes a prior-knowledge CNN model to construct the classifier. Its core principle is to use expert prior knowledge to correct the prediction results of the CNN. The prediction for a text is corrected when the classification accuracy of the CNN is lower than the maximum CSM similarity between the text and the expert knowledge base. Therefore, the realization of the prior-knowledge CNN generally includes three parts: training the CNN classifier, calculating the CSM text similarity, and correcting the prediction results. The specific structure of the prior-knowledge CNN model is shown in Figure 4, which mainly includes the following steps:
(1)
First, the text data set $D$ is divided into a training set $D_S$ and a test set $D_T$ according to a certain proportion.
(2)
Second, the training set $D_S$ enters the CNN to train the initial CNN classifier, and the test set $D_T$ enters the initial CNN classifier to test its classification accuracy $Acc$.
(3)
Third, any fault text vector $i$ in the test set $D_T$ is put into the initial CNN classifier to obtain the initial predicted fault type $F_{C_i}$. $Acc$ and $F_{C_i}$ make up the tuple $(Acc, F_{C_i})$.
(4)
Fourth, the similarity between each fault text vector in the expert fault knowledge base $E$ and the fault text vector $i$ to be classified is calculated to obtain the similarity set $S_i = \{Sim_{1i}, Sim_{2i}, \ldots, Sim_{mi}\}$ ($m = |E|$). The maximum value of set $S_i$ is taken to obtain $Sim_{ji} = \max(S_i)$ ($j \in [1, m]$). $Sim_{ji}$ and $F_{S_j}$, the fault type of the most similar knowledge base text $j$, make up the tuple $(Sim_{ji}, F_{S_j})$.
(5)
Fifth, the operation shown in Equation (12) is performed on $(Acc, F_{C_i})$ and $(Sim_{ji}, F_{S_j})$ to obtain the final fault type $F_i$ corresponding to the fault text vector $i$.
$$F_i = \begin{cases} F_{C_i}, & Acc \ge Sim_{ji} \\ F_{S_j}, & Acc < Sim_{ji} \end{cases} \tag{12}$$
Finally, Steps (3), (4), and (5) are performed for each text in the test set to complete the correction of the initial CNN classifier.
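The correction rule in Steps (3) to (5) reduces to a comparison between the CNN accuracy and the largest similarity. A minimal sketch, with a hypothetical function name and invented toy values for the accuracy, similarities, and labels:

```python
def correct_prediction(cnn_label, acc, similarities, kb_labels):
    """Return the final fault type for one text.

    cnn_label    -- fault type predicted by the initial CNN classifier
    acc          -- classification accuracy of the initial CNN on the test set
    similarities -- CSM similarity to each text in the expert knowledge base
    kb_labels    -- fault type of each knowledge base text, in the same order
    """
    best = max(range(len(similarities)), key=similarities.__getitem__)
    if acc >= similarities[best]:
        return cnn_label          # keep the CNN prediction
    return kb_labels[best]        # adopt the label of the most similar expert text


# Toy values: the CNN says type 1, but one expert text of type 6 is very similar.
corrected = correct_prediction(1, acc=0.90, similarities=[0.42, 0.95, 0.61],
                               kb_labels=[3, 6, 6])
kept = correct_prediction(1, acc=0.90, similarities=[0.42, 0.85, 0.61],
                          kb_labels=[3, 6, 6])
```

Note that the rule compares a data-set-level quantity (the accuracy) with a per-text quantity (the similarity), so a more accurate initial CNN is overridden less often.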

4. Experiments and Result Analysis

To verify the effectiveness of the proposed aircraft fault diagnosis model, verification experiments were carried out on a real aircraft fault text data set, which is comprised of five-year maintenance log data of Chinese text from a civil aircraft. After data cleaning, more than 50,000 aircraft fault texts were obtained, some of which are shown in Table 1. The second column in the table records the contents of the aircraft fault description text, and the third column records the fault type corresponding to the aircraft fault description text. For the aircraft fault text data set used in this study, a total of 10 fault types are involved. To facilitate the follow-up processing, we coded the 10 fault types as follows: sensor fault (0), circuit fault (1), equipment ablation (2), resistance fault (3), mechanical fault (4), equipment aging (5), lamp fault (6), indicator fault (7), computer fault (8), and switch fault (9). According to the proposed aircraft fault diagnosis model construction process, the validation experiment mainly includes text data preprocessing, Word2vec text feature extraction, and construction of the prior-knowledge CNN classifier.
1. Text data preprocessing
As the five-year maintenance log data is comprised of Chinese text, word segmentation needs to be performed, and stop words need to be removed before further processing. Therefore, we first used Jieba to segment the Chinese fault description text and then removed the stop words in the fault description text. The text data obtained after the above preprocessing operation is shown in Table 2. Compared with the original data in Table 1, the stop words in the Chinese fault description text have been removed, and separators have been added between words.
2. Word2vec text feature extraction
Since the computer cannot directly process the text, it is necessary to perform text feature extraction to transform the text data into a structured format after text preprocessing. Word2vec is used to extract the text features, and the results are shown in Table 3. It can be seen that the aircraft fault text is mapped to a 100-dimensional vector space.
3. Constructing the prior-knowledge CNN classifier
As mentioned above, the construction of the prior-knowledge CNN mainly includes three parts: training the CNN classifier, calculating the CSM text similarity, and correcting the prediction results. Therefore, we first put the text vector data extracted by Word2vec into the CNN for training and tested the trained CNN with the test set to obtain $Acc = 0.9623$. The CSM similarity between the fault text vectors in the test set and the expert fault knowledge base was then calculated, and the similarity values (between 0 and 1) are shown in Table 4. Finally, the predicted values of the test set were corrected by comparing the CNN classification accuracy $Acc$ with the maximum similarity $Sim_{ji}$. Taking the No.2 text in the test set as an example, the CSM similarity values between the No.2 text and the 10 fault types in the expert knowledge base are 0.8154, 0.6126, 0.2278, 0.7386, 0.6260, 0.8790, 0.9900, 0.4981, 0.5860, and 0.6609. The maximum is 0.9900. As 0.9900 is greater than the CNN accuracy $Acc$, the fault type of the No.2 text is corrected to lamp fault (6). The above operations were performed on each text in the test set to complete the construction of the prior-knowledge CNN classifier.
To verify the superiority of the proposed aircraft fault diagnosis model, our model based on Word2vec and the prior-knowledge CNN was compared with Rodrigues’s [13] aircraft fault diagnosis model based on TF-IDF and MLP and with Wang’s [14] fault diagnosis model based on LDA and SVM. Seven control groups and an experimental group were designed. Common classification indicators including Accuracy ($Acc$), F1 Score ($F_1$), and Area Under Curve ($AUC$) were used to evaluate the performance of these classifiers. The results are shown in Table 5. All the classification indicators of the proposed model based on Word2vec and the prior-knowledge CNN are very high and better than those of the control groups, which supports the superiority of the aircraft fault diagnosis model proposed in this paper. Comparing the experimental results of Groups C, D, and E shows that Word2vec improves the performance of the classifier compared with TF-IDF and LDA. Comparing the experimental results of Groups E, F, G, and H shows that the proposed prior-knowledge CNN is better than MLP, SVM, and CNN on $Acc$, $F_1$, and $AUC$.
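For reference, the three indicators can be computed from predictions as sketched below. The function names and the toy labels are hypothetical. Averaging the per-class F1 scores (macro averaging) and computing AUC for one class against the rest are assumptions, since the text does not state how the multi-class values were aggregated.

```python
def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted fault type matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed aggregation)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(labels, scores):
    """Probability that a positive sample is scored above a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 2, 2]       # toy fault types
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
auc = binary_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
```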
To study the effect of the expert fault knowledge base on the diagnosis of different fault types, we compared the confusion matrices and ROC curves of the initial CNN classifier and the prior-knowledge CNN classifier for each fault type, as shown in Figure 5. The diagnosis accuracy of the prior-knowledge CNN classifier is higher than that of the initial CNN classifier for every fault type except mechanical fault (4). The largest improvement is obtained for switch fault (9). This suggests that the switch fault (9) knowledge in the expert fault knowledge base is relatively complete, while the mechanical fault (4) knowledge needs to be supplemented. A high-quality expert fault knowledge base is therefore the key to further improving the performance of the proposed aircraft fault diagnosis model based on Word2vec and the prior-knowledge CNN.
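A per-fault-type comparison of this kind can be derived from the two confusion matrices. The sketch below is illustrative only: the label lists are toy values, not the study's data, and "diagnosis accuracy" per fault type is assumed to mean the fraction of texts of that type that are classified correctly (the diagonal entry divided by the row sum).

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts texts of true type t predicted as type p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal entry over row sum for each class (None for empty rows)."""
    return [row[i] / sum(row) if sum(row) else None
            for i, row in enumerate(matrix)]


# Toy labels for three fault types (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
pred_cnn = [0, 1, 1, 2, 2, 2]    # hypothetical initial CNN output
pred_prior = [0, 0, 1, 2, 2, 2]  # hypothetical output after correction

acc_cnn = per_class_accuracy(confusion_matrix(y_true, pred_cnn, 3))
acc_prior = per_class_accuracy(confusion_matrix(y_true, pred_prior, 3))

# Per-class gain from the correction step; a gain of zero or less flags
# fault types whose expert knowledge may need to be supplemented.
gains = [b - a for a, b in zip(acc_cnn, acc_prior)]
print(acc_cnn, acc_prior, gains)
```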

5. Conclusions

Because effective technical means are lacking, much aircraft fault description text goes unused. Therefore, a text-driven fault diagnosis model was developed in this study based on Word2vec, a CNN, and CSM. Word2vec is used to perform text feature extraction, while the CNN and CSM are used to build the prior-knowledge CNN classifier. The main contribution of the proposed prior-knowledge CNN is that it encodes expert fault knowledge into the classifier through the CSM similarity between the text to be classified and the historical fault text in the expert fault knowledge base, which improves the accuracy of aircraft fault diagnosis. From the experimental results on five-year maintenance log data, composed of Chinese text from a civil aircraft, we can draw the following conclusions:
(1) The proposed aircraft fault diagnosis model based on Word2vec and the prior-knowledge CNN reached 0.9742, 0.9740, and 0.9844 in $Acc$, $F_1$, and $AUC$, respectively. The accuracy is more than 97%, so the fault type can be accurately judged according to the fault description text by this model.
(2) For this study, Word2vec is a more effective text feature extraction method compared with TF-IDF and LDA and it can improve the performance of the classifier.
(3) The CNN classifier is better than the MLP classifier and the SVM classifier for the performance indicators of $Acc$, $F_1$, and $AUC$. Introducing expert fault knowledge to the CNN by CSM can further improve the accuracy of fault diagnosis.
(4) A high-quality expert fault knowledge base is the key to further improving the performance of the prior-knowledge CNN classifier.
Compared with similar work [13,14], this study makes the following contributions:
(1) A new text-driven aircraft fault diagnosis framework based on Word2vec and the prior-knowledge CNN is proposed, and it achieves higher fault diagnosis accuracy than the previous text-driven fault diagnosis frameworks.
(2) To further improve the accuracy of fault diagnosis, Word2vec is used to extract text features instead of the traditional TF-IDF and LDA methods.
(3) A novel prior-knowledge CNN is proposed by fusing a CNN and CSM, which improves on the performance of the plain CNN classifier and clearly outperforms the traditional MLP and SVM classifiers.
(4) The text-driven aircraft fault diagnosis model developed in this paper can process not only English text but also Chinese text.
In summary, the text-driven fault diagnosis model based on Word2vec and the prior-knowledge CNN proposed in this paper can accurately judge the fault type from the aircraft fault description text, which enables fuller mining and application of maintenance log data and provides support for aircraft maintenance. In future work, structured and unstructured data could be fused for fault diagnosis, so that the cause of a fault can be located at the data level and the specific fault mechanism can be explained at the mechanism level.

Author Contributions

Conceptualization, S.Z.; Data curation, X.J. and C.W.; Formal analysis, B.C.; Funding acquisition, S.Z. and W.H.; Investigation, C.W. and W.H.; Methodology, Z.X., B.C., S.Z. and W.C.; Project administration, W.C.; Resources, S.Z.; Software, Z.X. and X.J.; Supervision, W.C.; Validation, Z.X. and C.W.; Visualization, X.J. and W.H.; Writing—original draft, B.C.; Writing—review and editing, Z.X. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 71971013 & 71871003) and the Fundamental Research Funds for the Central Universities (YWF-20-BJ-J-943). The study was also sponsored by the Graduate Student Education & Development Foundation of Beihang University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dhillon, B.S.; Liu, Y. Human error in maintenance: A review. J. Qual. Maint. Eng. 2006, 12, 21–36.
2. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.
3. Salfner, F.; Lenk, M.; Malek, M. A survey of online failure prediction methods. ACM Comput. Surv. (CSUR) 2010, 42, 1–42.
4. Nguyen, N.P.; Huynh, T.T.; Do, X.P.; Mung, N.X.; Hong, S.K. Robust Fault Estimation Using the Intermediate Observer: Application to the Quadcopter. Sensors 2020, 20, 4917.
5. Nguyen, N.P.; Mung, N.X.; Thanh Ha, L.N.N.; Huynh, T.T.; Hong, S.K. Finite-Time Attitude Fault Tolerant Control of Quadcopter System via Neural Networks. Mathematics 2020, 8, 1541.
6. Gao, Z.; Ma, C.; Song, D.; Liu, Y. Deep quantum inspired neural network with application to aircraft fuel system fault diagnosis. Neurocomputing 2017, 238, 13–23.
7. Shen, Y.; Khorasani, K. Hybrid multi-mode machine learning-based fault diagnosis strategies with application to aircraft gas turbine engines. Neural Netw. 2020, 130, 126–142.
8. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K. Mining knowledge from natural language texts using fuzzy associated concept mapping. Inform. Process. Manag. 2008, 44, 1707–1719.
9. Liang, H.; Sun, X.; Sun, Y.; Gao, Y. Text feature extraction based on deep learning: A review. Eurasip. J. Wirel. Comm. 2017, 2017, 211.
10. Zhou, S.; Xu, X.; Liu, Y.; Chang, R.; Xiao, Y. Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization with Clustering Analysis. IEEE Access 2019, 7, 107247–107258.
11. Sparck Jones, K. A Statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972, 28, 11–21.
12. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn Res. 2003, 3, 993–1022.
13. Rodrigues, R.S.; Balestrassi, P.P.; Paiva, A.P.; Garcia-Diaz, A.; Pontes, F.J. Aircraft interior failure pattern recognition utilizing text mining and neural networks. J. Intell. Inf. Syst. 2012, 38, 741–766.
14. Wang, F.; Xu, T.; Tang, T.; Zhou, M.; Wang, H. Bilevel Feature Extraction-Based Text Mining for Fault Diagnosis of Railway Systems. IEEE T Intell. Transp. 2017, 18, 49–58.
15. Zhou, S.; Chen, B.; Zhang, Y.; Liu, H.; Xiao, Y.; Pan, X. A Feature Extraction Method Based on Feature Fusion and its Application in the Text-Driven Failure Diagnosis Field. Int. J. Interact. Multimed. Artif. Intell. 2020, 6, 121–130.
16. Kim, D.; Seo, D.; Cho, S.; Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inform. Sci. 2019, 477, 15–29.
17. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
18. Jatnika, D.; Bijaksana, M.A.; Suryani, A.A. Word2Vec Model Analysis for Semantic Similarities in English Words. Procedia Comput. Sci. 2019, 157, 160–167.
19. Chang, W.; Xu, Z.; You, M.; Zhou, S.; Xiao, Y.; Cheng, Y. A Bayesian Failure Prediction Network Based on Text Sequence Mining and Clustering. Entropy 2018, 12, 923.
20. Bai, Z.; Sun, G.; Zang, H.; Zhang, M.; Shen, P.; Liu, Y.; Wei, Z. Identification Technology of Grid Monitoring Alarm Event Based on Natural Language Processing and Deep Learning in China. Energies 2019, 17, 3258.
21. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recogn. 2018, 77, 354–377.
22. LeCun, Y.; Boser, B.E.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.E.; Jackel, L.D. Handwritten digit recognition with a back-propagation network. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, 26–29 November 1990; pp. 396–404.
23. Shang, R.; He, J.; Wang, J.; Xu, K.; Jiao, L.; Stolkin, R. Dense connection and depthwise separable convolution based CNN for polarimetric SAR image classification. Knowl. Based Syst. 2020, 194, 105542.
24. Wu, M.; Yue, H.; Wang, J.; Huang, Y.; Liu, M.; Jiang, Y.; Ke, C.; Zeng, C. Object detection based on RGC mask R-CNN. IET Image Process. 2020, 14, 1502–1508.
25. Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360.
26. Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Pr. 2020, 138, 106587.
27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
28. Eren, L.; Ince, T.; Kiranyaz, S. A Generic Intelligent Bearing Fault Diagnosis System Using Compact Adaptive 1D CNN Classifier. J. Signal Process. Syst. 2019, 91, 179–189.
29. Zhong, S.; Fu, S.; Lin, L. A novel gas turbine fault diagnosis method based on transfer learning with CNN. Measurement 2019, 137, 435–453.
30. Hao, B.; Zhang, X.; Li, H.; Yang, Z. Intelligent fault diagnosis of rolling bearings based on normalized CNN considering data imbalance and variable working conditions. Knowl. Based Syst. 2020, 199, 105971.
31. Ma, C.; Chen, L.; Yong, J. AU R-CNN: Encoding expert prior knowledge into R-CNN for action unit detection. Neurocomputing 2019, 355, 35–47.
32. Hou, W.; Tao, X.; De, X. Combining Prior Knowledge with CNN for Weak Scratch Inspection of Optical Components. IEEE T Instrum. Meas. 2021, 70, 1–11.
33. Zhao, H.; Cai, D.; Huang, C.; Kit, C. Chinese word segmentation: Another decade review (2007–2017). arXiv 2019, arXiv:1901.06079.
34. Krishnaraj, N.; Elhoseny, M.; Lydia, E.L.; Shankar, K.; ALDabbas, O. An efficient radix trie-based semantic visual indexing model for large-scale image retrieval in cloud environment. Softw. Pract. Exp. 2021, 51, 489–502.
35. Manogaran, G.; Vijayakumar, V.; Varatharajan, R.; Malarvizhi Kumar, P.; Sundarasekar, R.; Hsu, C. Machine Learning Based Big Data Processing Framework for Cancer Diagnosis Using Hidden Markov Model and GM Clustering. Wirel. Pers. Commun. 2018, 102, 2099–2116.
36. Shlezinger, N.; Farsad, N.; Eldar, Y.C.; Goldsmith, A.J. ViterbiNet: A Deep Learning Based Viterbi Algorithm for Symbol Detection. IEEE T Wirel. Commun. 2020, 19, 3319–3331.
37. Liu, K.; Ergu, D.; Cai, Y.; Gong, B.; Sheng, J. A New Approach to Process the Unknown Words in Financial Public Opinion. Procedia Comput. Sci. 2019, 162, 523–531.
38. Yu, Q.; Zhou, J.; Gong, W. A Lightweight Sentiment Analysis Method. ZTE Commun. 2019, 17, 2.
39. Zhang, D.; Xu, H.; Su, Z.; Xu, Y. Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 2015, 42, 1857–1863.
40. Han, L.; Li, C.; Shen, L.; Becerra Villanueva, J.A. Application in Feature Extraction of AE Signal for Rolling Bearing in EEMD and Cloud Similarity Measurement. Shock Vib. 2015, 2015, 752078.
41. Zhou, S.; Qian, S.; Chang, W.; Xiao, Y.; Cheng, Y. A Novel Bearing Multi-Fault Diagnosis Approach Based on Weighted Permutation Entropy and an Improved SVM Ensemble Classifier. Sensors 2018, 18, 1934.
42. Jin, K.H.; McCann, M.T.; Froustey, E.; Unser, M. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE T Image Process. 2017, 26, 4509–4522.
43. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adeli, H. Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput. Biol. Med. 2018, 100, 270–278.
44. Poria, S.; Cambria, E.; Gelbukh, A. Aspect extraction for opinion mining with a deep convolutional neural network. Knowl. Based Syst. 2016, 108, 42–49.
Figure 1. The Proposed Aircraft Fault Diagnosis Model Structure.
Figure 2. Continuous Bag-of-Words (CBOW) Probabilistic Graphical Model.
Figure 3. Convolutional Neural Network (CNN) structure.
Figure 4. The prior-knowledge CNN model structure.
Figure 5. Diagnosis effect comparison of CNN and prior-knowledge CNN for different fault types.
Table 1. Examples of aircraft fault text.

| Text Number | Content | Fault Type |
|---|---|---|
| 1 | Compressor blade was broken and the rotor was stuck. | Mechanical fault (4) |
| 2 | The booster switch cannot be closed, resulting in a broken motor shaft. | Switch fault (9) |
| 3 | Low output voltage due to resistance fault. | Resistance fault (3) |
| 4 | The vibration meter amplifier of Engine 4 did not indicate, the light did not work, and there was an internal fault. | Indicator fault (7) |
| 5 | Oil pipe aging led to oil leakage of Engine 3’s hydraulic oil inlet pipe. | Equipment aging (5) |
Table 2. Examples of aircraft fault text after preprocessing.

| Text Number | Text Preprocessing Result |
|---|---|
| 1 | Compressor/bladed/broken/rotor/stuck |
| 2 | booster switch/cannot/closed/resulting in/motor shaft/broken |
| 3 | Low output voltage/due to/resistance fault |
| 4 | vibration meter/amplifier/Engine 4/did not/indicate, light/did not work/there was/internal fault |
| 5 | Oil pipe/aging/led to/oil/leakage/Engine 3/hydraulic/oil inlet pipe |
Table 3. Aircraft fault text feature vector extracted by Word to Vector (Word2vec). Columns give the vector dimension (1 to 100; intermediate dimensions omitted).

| Number | 1 | 2 | 3 | 4 | … | 98 | 99 | 100 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0224 | 0.1750 | 0.1249 | 0.1361 | … | 0.0854 | 0.0536 | 0.0307 |
| 2 | 0.0123 | 0.1364 | 0.0933 | 0.1007 | … | 0.0560 | 0.0345 | 0.0208 |
| 3 | 0.0166 | 0.1335 | 0.0940 | 0.0993 | … | 0.0601 | 0.0372 | 0.0183 |
| 4 | 0.0090 | 0.1133 | 0.0750 | 0.0776 | … | 0.0497 | 0.0262 | 0.0221 |
| 5 | 0.0080 | 0.1236 | 0.0874 | 0.0948 | … | 0.0505 | 0.0313 | 0.0263 |
Table 4. CSM similarity between test set and expert fault knowledge base. Columns give the fault type (0 to 9).

| Number | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.9523 | 0.8645 | 0.8748 | 0.9845 | 0.8412 | 0.6312 | 0.7412 | 0.8936 | 0.8512 | 0.9621 |
| 2 | 0.8154 | 0.6126 | 0.2278 | 0.7386 | 0.6260 | 0.8790 | 0.9900 | 0.4981 | 0.5860 | 0.6609 |
| 3 | 0.9889 | 0.5277 | 0.9009 | 0.7298 | 0.0005 | 0.4795 | 0.5747 | 0.6664 | 0.8908 | 0.8654 |
| 4 | 0.8013 | 0.8452 | 0.0835 | 0.9823 | 0.9283 | 0.8449 | 0.6352 | 0.2819 | 0.2055 | 0.0170 |
Table 5. Comparison table of the classifier evaluation results.

| Group ID | Method | $Acc$ | $F_1$ | $AUC$ |
|---|---|---|---|---|
| A | TF-IDF + MLP | 0.8325 | 0.8169 | 0.8187 |
| B | LDA + SVM | 0.8946 | 0.8721 | 0.8825 |
| C | TF-IDF + CNN | 0.8735 | 0.8224 | 0.8465 |
| D | LDA + CNN | 0.9364 | 0.9105 | 0.9476 |
| E | Word2vec + CNN | 0.9623 | 0.9647 | 0.9587 |
| F | Word2vec + MLP | 0.8568 | 0.8678 | 0.8628 |
| G | Word2vec + SVM | 0.9251 | 0.9168 | 0.9176 |
| H | Word2vec + Priori-knowledge CNN | 0.9742 | 0.9740 | 0.9844 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Xu, Z.; Chen, B.; Zhou, S.; Chang, W.; Ji, X.; Wei, C.; Hou, W. A Text-Driven Aircraft Fault Diagnosis Model Based on a Word2vec and Priori-Knowledge Convolutional Neural Network. Aerospace 2021, 8, 112. https://doi.org/10.3390/aerospace8040112
