Article

A Novel Method for Fault Diagnosis of Bearings with Small and Imbalanced Data Based on Generative Adversarial Networks

1
School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
2
Beijing Rail Transit Electrical Engineering Technology Research Center, Beijing 100044, China
3
School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
4
Center of Safety Technology, National Railway Administration of the People’s Republic of China, Beijing 100160, China
5
Bogie Technology Center, CRRC Tangshan Locomotive and Rolling Stock Co., Ltd., Tangshan 064000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(14), 7346; https://doi.org/10.3390/app12147346
Submission received: 14 June 2022 / Revised: 15 July 2022 / Accepted: 20 July 2022 / Published: 21 July 2022

Abstract

Data-driven intelligent fault diagnosis methods for rolling bearings place strict requirements on the number and balance of fault samples. However, in practical engineering scenarios, mechanical equipment is usually in a normal state, and small and imbalanced (S & I) fault samples are common, which seriously reduces the accuracy and stability of the fault diagnosis model. To solve this problem, an auxiliary classifier generative adversarial network with spectral normalization (ACGAN-SN) is proposed in this paper. First, a generation module based on deconvolution layers is built to generate fake data from Gaussian noise. Second, to enhance the training stability of the model, the data label information is used to impose label constraints on the generated fake data under the basic GAN framework. Spectral normalization constraints are imposed on each layer of the discriminator network to satisfy the Lipschitz continuity condition and thereby avoid vanishing or exploding gradients. Finally, based on the generated data and the original S & I dataset, seven bearing fault datasets are constructed, and the prediction results of the bidirectional long short-term memory (BiLSTM) model are verified. The results show that the data generated by ACGAN-SN can significantly improve the performance of the fault diagnosis model under S & I fault samples.

1. Introduction

As a key component of rotating machinery, rolling bearings can fix objects and carry loads [1,2,3]. Once a fault occurs on the bearing, it will cause the quality of the produced products to decline, and the entire mechanical equipment will be shut down or even paralyzed, which will bring immeasurable consequences and economic losses [4]. Therefore, fault diagnosis of rolling bearings is the focus and difficulty of current research [5,6,7].
Because the vibration signal contains a large number of impulse shocks, which indirectly reflect the bearing fault condition, scholars have developed many intelligent fault diagnosis methods based on vibration signals [8,9]. Among them, deep learning, as a successful data-driven method, can automatically extract fault features from vibration signals and realize fault pattern recognition. Compared with traditional feature extraction and pattern recognition methods, deep learning greatly reduces the requirements for professional knowledge and can also handle fault diagnosis tasks under big data. For example, Zhang et al. [10] utilized the feature extraction ability of a convolutional neural network to realize fault diagnosis of rolling bearings under variable working conditions. The authors of [11] proposed an evaluation method for bearing performance degradation based on a long short-term memory network (LSTM), which accurately predicted the remaining bearing life. Li et al. [12] extracted non-Euclidean spatial features of vibration signals based on a graph convolutional neural network, which expanded the research ideas of fault diagnosis.
Although the abovementioned intelligent diagnosis methods have achieved great success, these methods have a strong dependence on fault data. That is, a large amount of data and labels are necessary to train the model, which is beneficial for fitting nonlinear fault classification functions. Unfortunately, this contradicts the actual situation in industrial scenarios: mechanical equipment is generally in normal working conditions, there are sufficient normal data for research, and rolling bearing failure data are very sparse. In the field of fault diagnosis, S & I data [7] refer to the number ratio between normal data and fault data not equal to 1, and the data volume of the fault samples is very small. S & I data reduce the performance of deep learning models and have attracted the attention of many scholars [13,14].
At present, the analysis methods of S & I data problems can be divided into two categories: algorithm-based and data-based methods. From an algorithmic point of view, researchers are committed to improving the model structure and classification loss function to reduce the interference of S & I data on classification performance. For example, He et al. [15] proposed a nonlinear support tensor machine containing a dynamic penalty factor (DC-NSTM), which optimizes the support vector machine classifier by using the dynamic penalty factor and is applied to the fault diagnosis of unbalanced data of rotating machinery. Wu et al. analyzed the fault diagnosis of weighted loss [16] and focal loss [17] under the imbalanced data of a high-speed train, and the experimental results show that the weighted loss is helpful for the improvement of the diagnosis performance.
From the data point of view, the imbalance of the sample data is reduced by augmenting or enhancing the original data to improve the classification performance of the model. This approach is represented by the synthetic minority oversampling technique (SMOTE) [18]. SMOTE performs random interpolation sampling in the minority sample data to generate data within the minority sample data distribution. The data distribution generated by this method is not the same as the test data distribution and cannot fit the real data distribution. Yi et al. [19] proposed a minority clustering SMOTE algorithm for the problem of unbalanced data synthesis in order to achieve oversampling of the minority data, and the algorithm has been validated on wind turbine blade failures. In addition, He et al. [20] proposed the adaptive synthetic sampling technique (ADASYN), which improves the learning of the data distribution by enforcing larger synthetic weights on the minority class samples. Although the above methods can synthesize valid data, they are based on self-replication or the interpolation of samples and do not take into account the distribution and label information of the test data.
The generative adversarial network (GAN) [21] is also a data-level synthesis algorithm. Its core idea is an adversarial game: Gaussian noise is input into the generator to generate fake data, and the discriminator is used to distinguish real data from fake data. When the generator and discriminator reach a Nash equilibrium, the generated data distribution is considered to be consistent with the true distribution. Different from the methods described above, GAN has no requirement regarding the minimum number of samples and can be used flexibly to generate images and enhance resolution [22]. However, it is difficult for the basic GAN structure to reach the Nash equilibrium. To this end, Radford et al. [23] proposed a deep convolutional generative adversarial network (DCGAN), which greatly improves the model training stability. Additionally, the Wasserstein and Gradient Penalized-based GAN model (WGAN-GP) developed by Arjovsky et al. [24] can address the vanishing gradient problem while improving the training performance. Odena et al. [25] proposed ACGAN using label information, which can synthesize data for multiple labels at the same time, overcoming the limitation that an unlabeled GAN model can only synthesize a single class of data. These models greatly improve the possibility of GAN application in engineering and provide new ideas for rolling bearing fault data synthesis and fault diagnosis tasks under S & I data.
In recent years, there have been many valuable works on GAN-based fault data synthesis. Bui et al. [26] studied the automatic enhancement technology of mechanical data based on vibration signals and GANs and realized high-precision fault diagnosis. Shao et al. [13] utilized the basic structure of ACGAN and convolutional layers to train the generator and augment few-shot fault data. Li et al. [27] combined the Wasserstein distance and gradient penalty to improve ACGAN and proposed ACWGAN-GP, which was verified on a bearing dataset. The data synthesis and fault diagnosis of an induction motor under unbalanced data have been realized by using the DCGAN structure [28]. Although current GAN-based bearing fault data synthesis research has made good progress, under S & I data, there are still problems of unstable model training and vanishing or exploding gradients. More importantly, the data synthesis methods mentioned above have not been studied under extremely limited sample data, and the number of samples for each fault type in most studies is no less than five. This paper explores the problem of fault diagnosis under imbalanced data and insufficient data volume, assuming that there are enough normal data and only one labeled sample for each fault type. The spectral normalization technique [29] can impose constraints on the parameters of the model to achieve the Lipschitz continuity condition, thereby enhancing the training stability of the model.
Therefore, this paper proposes a novel bearing fault diagnosis method based on an auxiliary classifier generative adversarial network with spectral normalization (ACGAN-SN) under S & I data. This method can synthesize data from extremely unbalanced samples and use the synthesized data to train the model and realize fault diagnosis. The main contributions of this paper are summarized as follows:
  • We propose a new GAN model, named ACGAN-SN, by introducing the SN technique. This model improves the training stability of the GAN model and can generate high-quality data for any given label, which provides a new idea for fault diagnosis under S & I data.
  • In the fault diagnosis stage, seven fault datasets were made based on the original S & I data and synthetic data, and a bidirectional LSTM model was proposed to fit the nonlinear fault classification function to realize the classification of the fault data.
  • To fully test the performance of the proposed method, we adopted seven classification models and four data synthesis methods for comparative research, and selected three data synthesis quality indicators to quantitatively describe the data synthesis ability of ACGAN-SN.
The rest of the paper is organized as follows: Section 2 formulates the fault diagnosis problem. Section 3 describes the methods used in this paper, including the ACGAN and ACGAN-SN model structures, as well as the model training and testing process. Section 4 presents the fault diagnosis results based on ACGAN-SN data synthesis and BiLSTM classification and the performance comparison study of the proposed method. The final section summarizes the work.

2. Problem Formulation

The intelligent fault diagnosis model is based on data analysis, and a large amount of data helps to improve its diagnostic accuracy. However, intelligent fault diagnosis in engineering scenarios is a typical small and unbalanced data problem [7]. In this situation, the model is prone to overfitting or underfitting, which leads to a significant reduction in the accuracy of fault diagnosis. Therefore, how to realize intelligent fault diagnosis under S & I data is an urgent problem to be solved. The fault diagnosis problem to be solved in this paper is defined as follows: In the training phase, the data in the normal state are sufficient, while the data in the faulty state are extremely limited, and the samples of the normal data and the faulty data are unbalanced. The dataset in the testing phase is well balanced. In the experiments in Section 4, the number of data samples for each fault state in the training set is only 1.

3. Methods

In this section, we introduce the basic structure of the ACGAN model. Then, the framework of the proposed fault diagnosis method is explained, including the data preprocessing module, data synthesis module, fault diagnosis module, and model training and testing procedures.

3.1. Auxiliary Classifier Generative Adversarial Networks

Based on the idea of an adversarial game, Goodfellow et al. [21] pioneered the GAN model. The model consists of a generator (G) and a discriminator (D). The input of the generator is the noise vector $z$, and its output is the synthetic fake data $X_{Synthetic} = G(z)$. The input of the discriminator is real data $X_{Real}$ and synthetic data, and its output is a true/false label. As shown in Figure 1a, the training of the GAN model is divided into two stages. Train the generator: the parameters of the discriminator are fixed, and the synthetic data output by the generator are passed through the discriminator to obtain the true/false discrimination loss. Train the discriminator: the generator parameters are fixed, and the synthetic data and real data are passed through the discriminator to obtain the true/false discrimination loss. The two losses are optimized through the backpropagation algorithm, and the overall loss function is as follows:
$$L(G, D) = \mathbb{E}_{s \sim P_{data}}[\log D(s)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{1}$$
where $s \sim P_{data}$ is the discriminator input vector and $P_{data}$ represents the real data distribution; $z \sim P_z$ is the generator input vector and $P_z$ represents the noise distribution; and $\mathbb{E}(\cdot)$ is the mathematical expectation. The optimization process of the model can also be written as a minimax problem:
$$\text{Goal} = \arg\min_{G}\max_{D} L(G, D) \tag{2}$$
Formula (2) can be understood as follows: the trained GAN model consists of the discriminator D that maximizes the loss function $L(G, D)$ and the generator G that minimizes it.
ACGAN is an improvement of GAN, which utilizes the introduction of label information to add an auxiliary classifier to the discriminator, thereby improving the training stability of the model and generating higher quality data samples, as shown in Figure 1b. Another advantage of ACGAN is that it only needs one trained model to synthesize the data of any label, reducing the time to repeatedly train the model for different label data.
Specifically, the generator of ACGAN can be expressed as $X_{Synthetic} = G(z, y)$, where $y$ is a label vector, and the discriminator judges not only the authenticity of the data but also its category. During the training process, both the discriminator and the generator have two-part loss functions, where $L_{T\text{-}F}$ is the true/false loss function and $L_{class}$ is the label loss function.
$$L_{T\text{-}F} = \mathbb{E}_{s \sim P_{data}}[\log D(s)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{3}$$
$$L_{class} = \mathbb{E}_{s \sim P_{data}}[\log P(Y = y \mid S_{Real})] + \mathbb{E}_{z \sim P_z}[\log P(Y = y \mid S_{Synthetic})] \tag{4}$$
The training process of the model is similar to that of GAN. First, the parameters of the discriminator are fixed, the generator is trained, and the loss function is $L_{T\text{-}F} - L_{class}$. Second, the generator parameters are fixed, the discriminator is trained, and the loss function is $L_{T\text{-}F} + L_{class}$.
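The two loss terms can be estimated from a batch of discriminator outputs. The following is a minimal NumPy sketch, not the authors' implementation: the function name `acgan_losses`, the toy probabilities, and the reading of the generator objective as the difference of the two terms are assumptions made for illustration.

```python
import numpy as np

def acgan_losses(d_real, d_fake, p_class_real, p_class_fake, y_real, y_fake, eps=1e-12):
    """Batch estimates of the true/false loss and the label loss.

    d_real, d_fake: discriminator probabilities D(s) and D(G(z)) in (0, 1).
    p_class_*: class-probability rows from the auxiliary classifier.
    y_*: integer labels of the real and synthetic samples.
    """
    l_tf = np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))
    # Log-probability that the classifier assigns to the correct label.
    ll_real = np.log(p_class_real[np.arange(len(y_real)), y_real] + eps)
    ll_fake = np.log(p_class_fake[np.arange(len(y_fake)), y_fake] + eps)
    l_class = np.mean(ll_real) + np.mean(ll_fake)
    return l_tf, l_class

# Toy batch (hypothetical values): two real and two synthetic samples, three classes.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])
p_real = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p_fake = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
y_real = np.array([0, 1])
y_fake = np.array([0, 2])

l_tf, l_class = acgan_losses(d_real, d_fake, p_real, p_fake, y_real, y_fake)
generator_objective = l_tf - l_class       # generator step, discriminator fixed
discriminator_objective = l_tf + l_class   # discriminator step, generator fixed
```

Under these definitions, one plausible convention is that the discriminator ascends its objective while the generator descends its own; the text does not state the direction explicitly, so this should be checked against the actual training code.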

3.2. Overall Framework

In this section, we present the overall framework of the proposed method, including the following three modules: the FFT-based data preprocessing module, ACGAN-SN-based data generation module, and the BiLSTM-based fault diagnosis module.

3.2.1. Data Preprocessing Module

The original time-domain vibration signal is easily disturbed by background noise, which affects feature extraction. The spectral lines in the spectrum are often used as an important basis for fault feature extraction [8]. For this reason, we use the FFT to preprocess the original data, converting the time-domain signal into a frequency-domain signal to provide reliable data for the subsequent data generation module and fault diagnosis module. First, the discrete Fourier transform (DFT) of the signal $x_n = \{x_0, \ldots, x_{N-1}\}$ is as follows:
$$X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N} \tag{5}$$
where $X_k$ is the result of the DFT. To reduce the calculation time of Formula (5), the sum can be split into its even-indexed and odd-indexed terms, which gives Formulas (6) and (7):
$$X_k = \sum_{m=0}^{N/2-1} x_{2m} e^{-i 2\pi k (2m) / N} + \sum_{m=0}^{N/2-1} x_{2m+1} e^{-i 2\pi k (2m+1) / N} \tag{6}$$
$$X_k = \sum_{m=0}^{N/2-1} x_{2m} e^{-i 2\pi k m / (N/2)} + e^{-i 2\pi k / N} \sum_{m=0}^{N/2-1} x_{2m+1} e^{-i 2\pi k m / (N/2)} \tag{7}$$
where $e^{-i 2\pi / N}$ is a primitive N-th root of 1 [26]. In Formula (7), the first sum is the half-length DFT of the even-indexed points, and the second sum is the half-length DFT of the odd-indexed points. Evaluated directly for all $N$ values of $k$, this single split does not yet reduce the computational cost, which consists of $2 \times [(N/2) \times N]$ operations, for a total of $N^2$. The saving comes from the fact that both half-length DFTs are periodic in $k$ with period $N/2$, so each needs to be computed for only $N/2$ values of $k$, and from applying the same split recursively, which reduces the cost to the order of $N \log N$. By executing Equations (5)–(7) on the original signal, the frequency-domain signal of the original signal can be obtained.
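The even/odd split can be applied recursively until only single-point transforms remain. The sketch below is a textbook radix-2 decimation-in-time FFT written with NumPy and checked against `numpy.fft.fft`; the function name `fft_radix2` and the random test signal are illustrative assumptions, and the paper does not say which FFT implementation was used.

```python
import numpy as np

def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n == 1:
        return x
    even = fft_radix2(x[0::2])   # half-length DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # half-length DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    # Both half-length DFTs repeat with period n/2, so each is reused
    # for the lower and the upper half of the output.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(0)
signal = rng.standard_normal(2048)   # same length as one sample in this paper
spectrum = fft_radix2(signal)
max_error = np.max(np.abs(spectrum - np.fft.fft(signal)))
```

In practice a library routine such as `numpy.fft.fft` would normally be preferred; the recursion is shown only to make the structure of the split explicit.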

3.2.2. Data Generation Module

The data generation module is the ACGAN-SN model proposed in this paper, which is an improvement on the ACGAN model in Section 3.1. Considering the feature extraction capability of the CNN model, the discriminator is built from convolution layers and the generator from deconvolution layers. The input of the discriminator is a 32 × 32 matrix obtained by reshaping the 1024-point frequency-domain data. Four convolution layers are then used to extract 256 feature maps, each of size 2 × 2. The output is flattened into one-dimensional feature data and then input into two fully connected layers: one with an output size of 1, which is used for true and false discrimination, and another with an output size of 10, which is used for label prediction.
The structure of the generator is symmetrical with the discriminator as a whole. First, the 100-dimensional noise vector is multiplied by the one-dimensional label vector to obtain the 100-dimensional input vector. Then, after full connection and reshape transformation, a 256 × 2 × 2 three-dimensional matrix is obtained. Four deconvolution operations are performed on the matrix to obtain a three-dimensional matrix of 1 × 32 × 32. Finally, the reshape operation is used to obtain a one-dimensional vector of 1 × 1024, which is the generated data. The specific parameters of the generator and discriminator are shown in Table 1.
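The layer sizes quoted above can be checked with the standard output-size formulas for strided and transposed convolutions. The sketch below assumes a kernel of 4, a stride of 2, and a padding of 1 for every layer; these values are hypothetical, chosen only because they halve or double the spatial size, and the actual settings are those of Table 1.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel

# Discriminator: 32 x 32 input reduced by four convolution layers.
disc_sizes = [32]
for _ in range(4):
    disc_sizes.append(conv_out(disc_sizes[-1]))

# Generator: 2 x 2 feature maps enlarged by four deconvolution layers.
gen_sizes = [2]
for _ in range(4):
    gen_sizes.append(deconv_out(gen_sizes[-1]))

# 256 feature maps of size 2 x 2 are flattened before the two dense heads.
flattened_features = 256 * disc_sizes[-1] ** 2
```

With these assumed settings the discriminator path is 32, 16, 8, 4, 2 and the generator path is its mirror image, which matches the symmetric structure described in the text.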
To make the model stable for training, the gradient penalty term [30] is often added to the GAN discriminator. However, the gradient penalty cannot constrain the data in the whole structural space, which leads to a shift between the space of the synthetic data and that of the real data and prevents the GAN model from being trained stably. The computational cost of the gradient penalty is also a drawback. SN is a parameter regularization technique, and the literature [29] has proven that SN can make the discriminator satisfy the Lipschitz constraint, thereby improving the training stability of the model and greatly reducing the computation time compared with the gradient penalty. Assuming that $\theta$ is the weight parameter matrix of the model, SN can be expressed by Formula (8):
$$\theta_l = \frac{\theta}{\|\theta\|_2} = \frac{\theta}{\sqrt{\lambda_{\max}(\theta^{T}\theta)}} \tag{8}$$
where $\|\theta\|_2$ is the spectral norm of $\theta$, $\lambda_{\max}(\theta^{T}\theta)$ is the largest eigenvalue of $\theta^{T}\theta$, and $\theta_l$ is the normalized weight parameter of the l-th layer of the network. Based on SN technology, this paper imposes regularization constraints on each layer of the neural network of the discriminator to realize a stable training process, as shown in Figure 2. Once the model is trained, the generator can be used to synthesize a large amount of fake data. To evaluate the quality of the data, we used three quantitative indicators to analyze the data similarity: the Pearson correlation coefficient (PCC), cosine similarity (CS), and Wasserstein distance. Through these similarity measures, the model training process is regulated to obtain the best model parameters.
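Spectral normalization divides a weight matrix by its largest singular value, which is usually estimated by power iteration rather than by a full decomposition. The sketch below is an illustrative NumPy version under stated assumptions: the function name `spectral_normalize`, the 64 × 32 random matrix, and the iteration count are not taken from the paper.

```python
import numpy as np

def spectral_normalize(weight, n_iter=100, seed=0):
    """Divide a weight matrix by an estimate of its spectral norm.

    The largest singular value is estimated by power iteration on
    weight^T weight, alternating between left and right vectors.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(weight.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = weight @ v
        u /= np.linalg.norm(u)
        v = weight.T @ u
        v /= np.linalg.norm(v)
    sigma = float(u @ weight @ v)   # estimate of the spectral norm
    return weight / sigma, sigma

rng = np.random.default_rng(1)
theta = rng.standard_normal((64, 32))       # hypothetical layer weights
theta_sn, sigma = spectral_normalize(theta)
exact_sigma = np.linalg.norm(theta, 2)      # largest singular value, for comparison
norm_after = np.linalg.norm(theta_sn, 2)    # should be close to 1
```

Deep learning frameworks typically run only a small number of power-iteration steps per training update and carry the vectors over between updates; the many iterations used here simply make the standalone check accurate.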

3.2.3. Fault Diagnosis Module

Based on the synthetic data in Section 3.2.2, we used mixed data (including real data and synthetic data) to train BiLSTM, and then validated the trained model on the test set.
LSTM is a special kind of recurrent neural network (RNN) that is proposed to solve the problem of the long-range dependence of RNNs. For a given one-dimensional sequence $x = (x_1, x_2, \ldots, x_n)$, the RNN model can be built using Formulas (9) and (10).
$$h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \tag{9}$$
$$y_t = W_{hy} h_t + b_y \tag{10}$$
where $h_t$ is the hidden layer sequence, $y_t$ represents the output layer sequence at time $t$, $W_{xh}$ is the weight matrix between the input layer and the hidden layer, $b_h$ is the bias of the hidden layer, and $f$ represents the activation function.
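The recurrence in Formulas (9) and (10) can be unrolled over a sequence in a few lines. The sketch below is illustrative only: it assumes a hyperbolic tangent for the activation function, and the sizes and random weights are placeholders rather than trained parameters.

```python
import numpy as np

def rnn_forward(x_seq, w_xh, w_hh, w_hy, b_h, b_y):
    """Unroll the hidden-state and output recurrences with f = tanh (assumed)."""
    h = np.zeros(w_hh.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(w_xh @ x_t + w_hh @ h + b_h)   # hidden state update
        outputs.append(w_hy @ h + b_y)             # output at time t
    return np.stack(outputs), h

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_steps = 32, 64, 10, 5     # hypothetical sizes
x_seq = rng.standard_normal((n_steps, n_in))
y_seq, h_last = rnn_forward(
    x_seq,
    0.1 * rng.standard_normal((n_hidden, n_in)),
    0.1 * rng.standard_normal((n_hidden, n_hidden)),
    0.1 * rng.standard_normal((n_out, n_hidden)),
    np.zeros(n_hidden),
    np.zeros(n_out),
)
```

Because the same hidden state is overwritten at every step, information from early inputs has to survive many repeated transformations, which is the long-range dependence problem that LSTM is designed to address.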
To solve the problem that RNNs cannot handle long sequences, LSTM improves on the RNN structure. The model structure diagram is shown in Figure 3. Its calculation method is as follows:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{11}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{12}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tau(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{13}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{14}$$
$$h_t = o_t \odot \tau(c_t) \tag{15}$$
where $i_t$, $f_t$, $c_t$, and $o_t$ represent the input gate, forget gate, memory cell state, and output gate, respectively; $\sigma$ and $\tau$ are the sigmoid and hyperbolic tangent activation functions, respectively; and $\odot$ denotes element-wise multiplication.
On the basis of LSTM, BiLSTM realizes context-based judgment by combining two LSTM layers that process the sequence in opposite directions, so that the model has the ability to understand the overall data [31].
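The LSTM update and a bidirectional pass can be sketched directly in NumPy. This is an illustration under assumptions, not the model of Table 2: the cell-to-gate weights are treated as diagonal (element-wise), the bidirectional layer is read as one forward and one backward LSTM whose hidden states are concatenated, and all helper names and random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hidden, seed=0):
    """Random placeholder weights for one LSTM direction."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "ifco":
        p[f"w_x{gate}"] = 0.1 * rng.standard_normal((n_hidden, n_in))
        p[f"w_h{gate}"] = 0.1 * rng.standard_normal((n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    for gate in "ifo":
        # Cell-to-gate weights, assumed diagonal and stored as vectors.
        p[f"w_c{gate}"] = 0.1 * rng.standard_normal(n_hidden)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with cell-to-gate connections."""
    i_t = sigmoid(p["w_xi"] @ x_t + p["w_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])
    f_t = sigmoid(p["w_xf"] @ x_t + p["w_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(p["w_xc"] @ x_t + p["w_hc"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["w_xo"] @ x_t + p["w_ho"] @ h_prev + p["w_co"] * c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def bilstm(x_seq, p_fwd, p_bwd, n_hidden):
    """Run one LSTM forward and one backward, then concatenate hidden states."""
    def run(seq, p):
        h, c = np.zeros(n_hidden), np.zeros(n_hidden)
        states = []
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, p)
            states.append(h)
        return np.stack(states)
    forward = run(x_seq, p_fwd)
    backward = run(x_seq[::-1], p_bwd)[::-1]   # re-align to the original time order
    return np.concatenate([forward, backward], axis=1)

# 32 time steps of 32 features (an assumed split of a 1024-point spectrum).
x_seq = np.random.default_rng(1).standard_normal((32, 32))
features = bilstm(x_seq, init_params(32, 64, seed=0), init_params(32, 64, seed=1), 64)
```

Each time step then carries 128 features, 64 from each direction, so the representation at any position reflects both earlier and later parts of the sequence.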
To make the one-dimensional data meet the input size of BiLSTM, two embedding layers and a one-dimensional flip layer are used to transform the signal dimension and then input it into the BiLSTM model. The input feature dimension of the model is 32, and the feature dimension of the hidden layer is 64. The number of recurrent neural network layers is 2, and the bidirectional LSTM is stacked, which is easy to implement in PyTorch. Then, two fully connected layers are used to output the model prediction results to classify the data. The structural parameter settings of the fault diagnosis module are shown in Table 2.

3.2.4. Training and Testing Procedure of the Proposed Fault Diagnosis Framework

Based on the data preprocessing, data generation, and fault diagnosis modules built above, the training and testing process of the proposed overall framework is shown in Figure 4. It consists of the following steps:
  • The raw vibration signal data are acquired and divided into an S & I dataset for training and a balanced dataset for testing.
  • FFT is performed on the two datasets to obtain the preprocessed frequency domain signal.
  • The ACGAN-SN model described in Section 3.2.2 is built, its weight parameters and hyperparameters are initialized, and the model is trained until the Nash equilibrium conditions are met. The trained model is used to synthesize data for the required labels.
  • Real data and synthetic data are mixed to form a variety of datasets with different imbalance ratios for the training of the fault diagnosis models.
  • The BiLSTM-based fault diagnosis model described in Section 3.2.3 is built and trained with the different datasets. The trained model is used to predict the fault state of the test set, and the results are compared with those of the other methods to evaluate the performance of the proposed fault diagnosis method.

4. Results

In this section, we design related experiments to validate the proposed method. First, the bearing dataset used in the experiment is introduced. Second, we use the proposed ACGAN-SN model to synthesize the bearing data with different fault states and adopt the PCC, CS, and Wasserstein distance to measure the quality of the synthetic data. Then, synthetic data are gradually added to the S & I dataset to form a failure dataset with seven imbalance ratios, and a bidirectional LSTM model is used to classify the data. Finally, we comparatively analyze the superiority of the proposed method in terms of data synthesis and classifier dimensions.

4.1. Dataset Introduction

We chose the rolling bearing fault dataset publicly available from the Electrical Engineering Laboratory of Case Western Reserve University. The test rig is composed of a motor, torque sensor/decoder, power test meter, and electronic controller. The tested bearing is the drive-end bearing, model SKF6205. The original vibration signal is collected by an acceleration sensor installed at the drive end. The sampling frequency is 12 kHz, the load is 2 hp, and the sampling time is 10 s. The parameters of the rolling bearing are shown in Table 3. There are four bearing health states and three fault diameters in the dataset. The health states include normal, roller failure, inner ring failure, and outer ring failure. The damage diameters are 0.007, 0.014, and 0.021 inches. The data are labeled as 10 states: Normal, Ball1, Ball2, Ball3, Inner1, Inner2, Inner3, Outer1, Outer2, and Outer3. Considering the FFT, 2048 points is selected as the sample length, and the sample set is made in an overlapping manner. In the training set, the normal data of the majority class contain 100 samples, and each fault class of the minority class has 10 samples. In the test set, there are 100 samples of each type of data. Figure 5 is the time-domain waveform diagram of the 10 types of data, and Table 4 shows a detailed description of the dataset.
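Overlapping sampling can be implemented as a sliding window over the recorded signal. In the sketch below, the sample length of 2048 points, the 12 kHz sampling frequency, and the 10 s duration come from the text, while the stride of 512 points, the function name `segment_signal`, and the sinusoidal placeholder signal are assumptions; the paper does not state the overlap.

```python
import numpy as np

def segment_signal(signal, length=2048, stride=512, n_samples=None):
    """Cut a 1-D signal into overlapping windows of `length` points."""
    starts = range(0, len(signal) - length + 1, stride)
    windows = np.stack([signal[s:s + length] for s in starts])
    return windows if n_samples is None else windows[:n_samples]

fs = 12_000                                # sampling frequency (12 kHz)
t = np.arange(10 * fs) / fs                # 10 s record, 120,000 points
record = np.sin(2 * np.pi * 160.0 * t)     # placeholder signal, not bearing data
all_windows = segment_signal(record)
samples = segment_signal(record, n_samples=100)
```

With a stride smaller than the window length, consecutive samples share most of their points, which is what allows 100 samples of 2048 points to be drawn from a single record.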

4.2. Sample Generation and Evaluation

In this section, we use the training set in Table 4 to train the ACGAN-SN model. During the training process, the hyperparameters are set as follows: 3000 epochs, a learning rate of 0.0002, a batch size of 10, and the Adam optimizer, with the coefficients used for computing the running averages of the gradient and its square set to 0.5 and 0.999, respectively.
First, the original time-domain signal is converted into a frequency-domain signal through FFT, the signal length is reduced from 2048 to 1024 points, and the result is reshaped into a 32 × 32 grayscale matrix, which is used as the input size of the discriminator and the output size of the generator. Then, the set parameters and data are used to train the ACGAN-SN. Figure 6 shows the loss curves of the discriminator and the generator during the training process, and Figure 7 shows the classification accuracy curves of the discriminator and the generator. From the loss results in Figure 6, it can be seen that the losses of the discriminator and generator begin to decrease after 250 epochs and remain steady between 1000 and 3000 epochs. On the whole, the model reaches the Nash equilibrium at approximately 1000 epochs, which represents the equilibrium state of the adversarial game between the generator and the discriminator. We believe that the generator at 3000 epochs can produce samples realistic enough to pass for real data, that is, it already has the ability to synthesize high-quality fault sample data. It can be seen from the accuracy curves in Figure 7 that the classification accuracy of the discriminator and generator fluctuates and rises during the training process, which corresponds to the trend in Figure 6. This indicates that the data synthesized by the generator at this time have characteristics similar to those of the original data, so that the classifier can distinguish the labels corresponding to the synthetic data and the real data.
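The preprocessing step described at the start of this subsection can be sketched as follows. The sketch assumes that the 1024 points are the first half of the amplitude spectrum (the second half of a real signal's spectrum is its mirror image) and that the result is scaled to the range 0–1; neither detail is stated in the paper, and the test tone is a placeholder rather than bearing data.

```python
import numpy as np

def to_spectrum_image(sample):
    """2048-point time signal -> 1024-point amplitude spectrum -> 32 x 32 matrix."""
    amplitude = np.abs(np.fft.fft(sample))[: len(sample) // 2]   # keep one half
    amplitude = amplitude / amplitude.max()                      # assumed scaling to [0, 1]
    return amplitude.reshape(32, 32)

t = np.arange(2048) / 12_000                  # 2048 points sampled at 12 kHz
sample = np.sin(2 * np.pi * 1500.0 * t)       # placeholder 1500 Hz tone
image = to_spectrum_image(sample)
peak_bin = int(np.argmax(image))              # flat index of the strongest spectral line
```

Each bin spans 12,000 / 2048 ≈ 5.86 Hz, so the 1500 Hz tone falls exactly on bin 256, which is row 8, column 0 of the 32 × 32 matrix.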
The Wasserstein distance is often used to measure the similarity between two distributions, so this paper uses this metric to indirectly observe the training process of the model. The Wasserstein distance curve is shown in Figure 8. We can see that the Wasserstein distance gradually tends to 0 from 1000 epochs onward. The 10 types of data synthesized at this time are compared with the real data, as shown in Figure 9, where the blue curve is the synthetic data and the orange curve is the real data. We can see that the synthetic data are very similar to the real data. In particular, some spectral lines representing fault features match the real values, which indicates that the synthetic data contain some of the fault frequencies and their harmonics, which play a key role in whether the signal is classified correctly. Therefore, we have good reason to believe that after 3000 epochs of training, the model training effect is ideal, and the data synthesized by the generator can provide data support for downstream classification tasks.
To quantitatively characterize the quality of the synthetic data, we introduce three similarity indicators as the basis for discrimination, including PCC, CS, and Wasserstein distance. PCC is the correlation between two sets of data, CS represents the cosine value of the angle between the two samples, and the Wasserstein distance represents the similarity between the two data distributions. A PCC greater than 0.5 represents a high similarity, and the larger the value of CS, the higher the similarity. The PCC and CS can take values in the range 0–1. Table 5 summarizes the corresponding values of the similarity index between synthetic data and real data. The minimum value of PCC is 0.7494, corresponding to the data of label 2, and the maximum value of 0.9530 corresponds to the data of label 7. A CS minimum value of 0.8142 corresponds to label 2, and a maximum value of 0.9609 corresponds to label 7. These two indicators are better than those in the literature [27]. As the Wasserstein distance is the distance between distributions, in this paper, the distribution distance is equivalent to the distance between two samples, and the value of the Wasserstein distance is 0.0530. Based on the above analysis, we can draw the following conclusions: the trained ACGAN-SN model can synthesize data that are highly similar to the real data.
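The three indicators can be computed for a pair of spectra with a few lines of NumPy. The sketch below uses random stand-in arrays rather than the paper's data, and it treats the values of each spectrum as an empirical sample when computing the one-dimensional Wasserstein distance, which is one possible reading of the sample-to-sample distance described above.

```python
import numpy as np

def pearson_cc(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
real = np.abs(rng.standard_normal(1024))             # stand-in for a real spectrum
synthetic = real + 0.1 * rng.standard_normal(1024)   # stand-in for a generated one

pcc = pearson_cc(real, synthetic)
cs = cosine_similarity(real, synthetic)
wd = wasserstein_1d(real, synthetic)
```

PCC and CS compare the two spectra point by point, whereas the Wasserstein distance compares the distributions of their values, so it is insensitive to where along the frequency axis a given amplitude occurs.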

4.3. Imbalanced Fault Diagnosis

In this section, we make seven training sets and one testing set based on the datasets in Table 4. A total of 100 normal samples and only one sample of each fault class are selected. This extremely limited sample set, named training dataset 1, is used to simulate real S & I failure data scenarios. On this basis, the trained generator is used to synthesize sample data, which are gradually added to dataset 1 so that the majority class and minority class ratios are 100:1, 25:1, 20:1, 10:1, 2:1, and 1:1, and training sets 2–7 are obtained. The specific details of each dataset are shown in Table 6.
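The number of synthetic samples needed per fault class follows from the target ratio. The sketch below works through the ratios listed in the text, assuming 100 normal samples and one real sample per fault class; the function name is hypothetical, and the exact composition of each dataset is given in Table 6 and may differ from this arithmetic.

```python
def synthetic_needed(n_normal, n_real_fault, ratio):
    """Synthetic samples per fault class so that normal:fault equals ratio:1."""
    target_fault = n_normal // ratio
    return max(target_fault - n_real_fault, 0)

ratios = [100, 25, 20, 10, 2, 1]     # majority:minority ratios listed in the text
plan = {f"{r}:1": synthetic_needed(100, 1, r) for r in ratios}
```

For example, a ratio of 25:1 corresponds to four samples per fault class, three of which would be synthetic, while the balanced 1:1 case requires 99 synthetic samples per fault class.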
To test the performance of the proposed method on the S & I dataset, different classification models are used for testing. Datasets 1–7 are used for training, and the trained models then classify the testing set. There are a total of seven different classification models, including a multilayer perceptron (MLP) with [1024, 128, 64] hidden layer neurons; a convolutional neural network with output channels [16, 32, 64, 128], for which the specific parameters can be found in [32]; LeNet; ResNet; and AlexNet. For the BiLSTM model in the proposed method, the parameter settings can be found in Section 3.
We use three indicators to measure the performance of the model, namely accuracy, F1-score (F1), and the average area under the receiver operating characteristic curve (AUC). The fault diagnosis results of the six trained classification models on the testing dataset are shown in Figure 10. We found that when dataset 1, composed only of real data, was used for training, all of the models were affected by the unbalanced samples and could not achieve their best diagnostic performance. This is because the fault samples in the dataset were too sparse: with only one sample per fault class, the model cannot learn the real data distribution. As synthetic data are progressively added to dataset 1 to form datasets 2–7, the training set becomes balanced at dataset 7. During this process, the diagnostic accuracy of MLP, CNN, ResNet, and BiLSTM is greatly improved. AlexNet’s diagnostic accuracy jumps from 10% to 99.50% from dataset 4, while LeNet only reaches 94.5% and 99.3%, respectively, in balanced dataset 7. The following inferences can be drawn:
  • LeNet and AlexNet have the worst performance in processing S & I data. The possible reason for this is that they cannot fit the data distribution of the test set from the S & I data. A diagnostic accuracy of 10% corresponds to correct predictions for the normal data only (one of ten equally sized test classes), which indicates that the imbalanced data do indeed affect the prediction results of the model.
  • The diagnosis accuracy of the MLP, CNN, ResNet, and BiLSTM models improves greatly from dataset 2 onward, which shows that the synthetic fault data promote the training process of the model and that the synthetic data are very similar to the real data.
  • We found that BiLSTM achieved an accuracy of 97.6% in dataset 2, and its diagnostic accuracy in datasets 2–7 was higher than that of the comparison methods, which shows that BiLSTM has a strong classification performance for the bearing frequency domain data. Although CNN and ResNet also perform very well, the calculation time of ResNet is too long. Considering the time cost and diagnosis accuracy, BiLSTM is used in this paper to perform fault diagnosis tasks on S & I data.
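The 10% accuracy floor noted above can be checked with a small worked example. The sketch assumes a hypothetical classifier that labels every test sample as normal (label 0) and a testing set of 100 samples for each of the 10 classes; macro averaging of the F1-score is also an assumption, since the averaging scheme is not stated here.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(y_true == y_pred))

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Balanced testing set: 100 samples for each of the 10 labels.
y_true = np.repeat(np.arange(10), 100)
# Hypothetical collapsed model: everything is predicted as "normal" (label 0).
y_pred = np.zeros_like(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, n_classes=10)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.4f}")
```

Accuracy alone hides how badly such a model treats the fault classes, which is one reason to report F1 and AUC alongside it.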
To demonstrate the superiority of ACGAN-SN in data synthesis, four methods are used for comparison in this paper: ACWGAN [33], ACGAN [13], RO-Sampling [34], and SMOTE [35]. The data synthesis methods all use the original data in Table 4 as the input, and the classification model adopts the MLP and the BiLSTM model proposed in this paper; that is, there are a total of 10 combined algorithms. We used these 10 algorithms to perform classification tasks on datasets 1–7; the MLP-based diagnosis results are shown in Figure 11a and the BiLSTM-based diagnosis results are shown in Figure 11b. We found that, compared with the original unbalanced dataset 1, the different data synthesis methods bring a certain improvement in diagnostic performance for the different classification models, but the diagnostic effects of ACWGAN and ACGAN are unstable. RO-Sampling, SMOTE, and ACGAN-SN have a positive effect on the diagnostic accuracy of the classification model. Notably, the proposed ACGAN-SN is of great help for improving the diagnostic performance of the MLP and BiLSTM classification models. Among the data synthesis algorithms compared, the combined algorithm based on ACGAN-SN and BiLSTM has the best diagnostic accuracy and is the most stable. This shows that the data synthesized by ACGAN-SN have the highest similarity with the original data.
To observe the internal operation process of the fault diagnosis model more intuitively, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize the feature distribution. Figure 12a–e show the t-SNE diagrams of the ACGAN-SN, ACWGAN, ACGAN, RO-Sampling, and SMOTE models with an imbalance ratio of 25:1, respectively. As can be seen from Figure 12, the proposed ACGAN-SN method separates the different fault data well, while the other four methods do not cluster the fault data well, which further supports the advantages of the proposed method.

5. Conclusions

To address the problem that, in practical engineering, S & I data reduce the accuracy of fault diagnosis and destabilize model training, this paper presents the ACGAN-SN data synthesis model. The method uses high-quality synthetic fault data to solve the fault diagnosis problem under S & I data. From the analysis of the experimental results, the following conclusions can be drawn:
  • By introducing spectral normalization into the ACGAN model, the training stability of the model can be effectively improved, and vanishing gradients and mode collapse can be avoided.
  • Compared with traditional SMOTE and RO-Sampling data synthesis algorithms, ACGAN-SN can synthesize high-quality fault sample data. The similarity between the synthetic data and real data can reach 95.84%.
  • The data synthesized by ACGAN-SN effectively improve the fault diagnosis accuracy under S & I data.
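As a rough illustration of the spectral normalization referred to in the conclusions, the sketch below estimates the largest singular value of a weight matrix by power iteration and rescales the matrix so that its spectral norm is approximately 1, in the spirit of [29]. It is a NumPy sketch under stated assumptions, not the ACGAN-SN implementation: the function names and the toy matrix are hypothetical, and many iterations are run here for clarity, whereas practical implementations typically run a single iteration per training step.

```python
import numpy as np

def spectral_norm(weight, n_iter=50, seed=0):
    """Estimate the largest singular value of a 2-D weight matrix by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = weight.T @ u          # right singular vector estimate
        v /= np.linalg.norm(v)
        u = weight @ v            # left singular vector estimate
        u /= np.linalg.norm(u)
    return float(u @ weight @ v)

def spectrally_normalize(weight, n_iter=50):
    """Rescale the weight so that its spectral norm is approximately 1."""
    return weight / spectral_norm(weight, n_iter)

# Toy weight matrix with singular values 2 and 1 (hypothetical).
W = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
print("estimated spectral norm:", sigma)
print("spectral norm after normalization:", np.linalg.norm(W_sn, 2))
```

Bounding the spectral norm of each layer's weights bounds the Lipschitz constant of the discriminator, which is the mechanism by which training is stabilized.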

Author Contributions

Conceptualization, Q.T. and F.L.; methodology, F.L.; software, F.L.; validation, Z.F. and Q.W.; formal analysis, Q.W., T.G. and J.C.; investigation, Z.F.; resources, G.A.; data curation, G.A.; writing—original draft preparation, F.L.; writing—review and editing, Q.T.; visualization, Z.F.; supervision, T.G. and J.C.; project administration, J.C. and T.G.; funding acquisition, Q.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Natural Science Foundation (grant no. L211010, 3212032) and the National Railway Administration (grant no. AJ2021-043). The authors wish to extend their sincere thanks for the support from the Beijing Municipal Science and Technology Commission of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Heras, I.; Aguirrebeitia, J.; Abasolo, M.; Coria, I.; Escanciano, I. Load distribution and friction torque in four-point contact slewing bearings considering manufacturing errors and ring flexibility. Mech. Mach. Theory 2019, 137, 23–36.
  2. Ambrożkiewicz, B.; Syta, A.; Gassner, A.; Georgiadis, A.; Litak, G.; Meier, N. The influence of the radial internal clearance on the dynamic response of self-aligning ball bearings. Mech. Syst. Signal Process. 2022, 171, 108954.
  3. Gao, S.; Chatterton, S.; Pennacchi, P.; Han, Q.; Chu, F. Skidding and cage whirling of angular contact ball bearings: Kinematic-hertzian contact-thermal-elasto-hydrodynamic model with thermal expansion and experimental validation. Mech. Syst. Signal Process. 2021, 166, 108427.
  4. Chen, H.; Jiang, B.; Ding, S.X.; Huang, B. Data-Driven Fault Diagnosis for Traction Systems in High-Speed Trains: A Survey, Challenges, and Perspectives. Mech. Syst. Signal Process. 2022, 23, 1700–1716.
  5. Lee, J.; Wu, F.; Zhao, W.; Ghaffari, M.; Liao, L.; Siegel, D. Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications. Mech. Syst. Signal Process. 2014, 42, 314–334.
  6. Li, W.; Huang, R.; Li, J.; Liao, Y.; Chen, Z.; He, G.; Yan, R.; Gryllias, K. A perspective survey on deep transfer learning for fault diagnosis in industrial scenarios: Theories, applications and challenges. Mech. Syst. Signal Process. 2022, 167, 108487.
  7. Zhang, T.; Chen, J.; Li, F.; Zhang, K.; Lv, H.; He, S.; Xu, E. Intelligent fault diagnosis of machines with small & imbalanced data: A state-of-the-art review and possible extensions. ISA Trans. 2022, 119, 152–171.
  8. Randall, R.B.; Antoni, J. Rolling element bearing diagnostics—A tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
  9. Georgoulas, G.; Loutas, T.; Stylios, C.D.; Kostopoulos, V. Bearing fault detection based on hybrid ensemble detector and empirical mode decomposition. Mech. Syst. Signal Process. 2013, 41, 510–525.
  10. Zhang, W.; Li, C.; Peng, G.; Chen, Y.; Zhang, Z. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 2018, 100, 439–453.
  11. Zhang, B.; Zhang, S.; Li, W. Bearing performance degradation assessment using long short-term memory recurrent network. Comput. Ind. 2018, 106, 14–29.
  12. Li, T.; Zhao, Z.; Sun, C.; Yan, R.; Chen, X. Multireceptive Field Graph Convolutional Networks for Machine Fault Diagnosis. IEEE Trans. Ind. Electron. 2021, 68, 12739–12749.
  13. Shao, S.; Wang, P.; Yan, R. Generative adversarial networks for data augmentation in machine fault diagnosis. Comput. Ind. 2019, 106, 85–93.
  14. Xing, S.; Lei, Y.; Yang, B.; Lu, N. Adaptive Knowledge Transfer by Continual Weighted Updating of Filter Kernels for Few-Shot Fault Diagnosis of Machines. IEEE Trans. Ind. Electron. 2021, 69, 1968–1976.
  15. He, Z.; Shao, H.; Cheng, J.; Zhao, X.; Yang, Y. Support tensor machine with dynamic penalty factors and its application to the fault diagnosis of rotating machinery with unbalanced data. Mech. Syst. Signal Process. 2019, 141, 106441.
  16. Patterson, J.; Gibson, A. Deep Learning: A Practitioner’s Approach; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017.
  17. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
  18. Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449.
  19. Yi, H.; Jiang, Q.; Yan, X.; Wang, B. Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique With Wind Turbine Fault Detection Application. IEEE Trans. Ind. Inform. 2020, 17, 5867–5875.
  20. Haibo, H.; Yang, B.; Garcia, E.A.; Shutao, L. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008.
  21. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 2, 2672–2680.
  22. Navidan, H.; Moshiri, P.F.; Nabati, M.; Shahbazian, R.; Ghorashi, S.A.; Shah-Mansouri, V.; Windridge, D. Generative Adversarial Networks (GANs) in networking: A comprehensive survey & evaluation. Comput. Netw. 2021, 194, 108149.
  23. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2016, arXiv:1511.06434.
  24. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved Training of Wasserstein GANs. arXiv 2017, arXiv:1704.00028.
  25. Odena, A.; Olah, C.; Shlens, J. Conditional Image Synthesis with Auxiliary Classifier GANs. arXiv 2016, arXiv:1610.09585.
  26. Bui, V.; Pham, T.; Nguyen, H.; Jang, Y. Data Augmentation Using Generative Adversarial Network for Automatic Machine Fault Detection Based on Vibration Signals. Appl. Sci. 2021, 11, 2166.
  27. Li, Z.; Zheng, T.; Wang, Y.; Cao, Z.; Guo, Z.; Fu, H. A Novel Method for Imbalanced Fault Diagnosis of Rotating Machinery Based on Generative Adversarial Networks. IEEE Trans. Instrum. Meas. 2020, 70, 1–17.
  28. Chang, H.-C.; Wang, Y.-C.; Shih, Y.-Y.; Kuo, C.-C. Fault Diagnosis of Induction Motors with Imbalanced Data Using Deep Convolutional Generative Adversarial Network. Appl. Sci. 2022, 12, 4080.
  29. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
  30. Miao, J.; Wang, J.; Zhang, D.; Miao, Q. Improved Generative Adversarial Network for Rotating Component Fault Diagnosis in Scenarios with Extremely Limited Data. IEEE Trans. Instrum. Meas. 2021, 71, 1–13.
  31. Sanagavarapu, S.; Sridhar, S.; Chitrakala, S. News Categorization using Hybrid BiLSTM-ANN Model with Feature Engineering. In Proceedings of the IEEE Annual Computing and Communication Workshop and Conference, Las Vegas, NV, USA, 27–30 January 2021.
  32. Zhao, Z.; Li, T.; Wu, J.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Deep learning algorithms for rotating machinery intelligent diagnosis: An open source benchmark study. ISA Trans. 2020, 107, 224–255.
  33. Zhang, P.; Wen, G.; Dong, S.; Lin, H.; Huang, X.; Tian, X.; Chen, X. A Novel Multiscale Lightweight Fault Diagnosis Model Based on the Idea of Adversarial Learning. IEEE Trans. Instrum. Meas. 2021, 70, 1–15.
  34. Zhang, H.; Li, M. RWO-Sampling: A random walk over-sampling approach to imbalanced data classification. Inf. Fusion 2014, 20, 99–116.
  35. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. arXiv 2011, arXiv:1106.1813.
Figure 1. Typical architecture of (a) GAN and (b) the auxiliary classifier GAN.
Figure 2. The framework of our proposed ACGAN-SN model for fault diagnosis with S & I data.
Figure 3. The structure of a LSTM cell.
Figure 4. The flow chart of training and testing for the proposed framework.
Figure 5. Time domain waveforms of the 10 kinds of samples in the bearing dataset.
Figure 6. The discrimination loss and generation loss curve of the ACGAN-SN model.
Figure 7. The discrimination loss and generation accuracy curve of the ACGAN-SN model.
Figure 8. The change curve of the Wasserstein distance.
Figure 9. Comparison of the frequency spectrum of the real and synthetic samples data of the bearing dataset.
Figure 10. Diagnosis results with different classification models after using ACGAN-SN.
Figure 11. Diagnosis results with different data augmentation approaches: (a) MLP-based model and (b) BiLSTM-based model.
Figure 12. The t-SNE results of (a) ACGAN-SN, (b) ACWGAN, (c) ACGAN, (d) RandomOS, and (e) SMOTE.
Table 1. The structure of the discriminator and generator.
| Module | Layer | Parameters setting | Output size |
|---|---|---|---|
| Discriminator | Input | Reshape: 1024 → 32 × 32 | 1 × 32 × 32 |
| Discriminator | Conv1 | Kernel size: 32 × 5 × 5, 8, 2, LeakyReLU, dropout | 32 × 16 × 16 |
| Discriminator | Conv2 | Kernel size: 64 × 5 × 5, 8, 2, LeakyReLU, dropout, BN | 64 × 8 × 8 |
| Discriminator | Conv3 | Kernel size: 128 × 5 × 5, 8, 2, LeakyReLU, dropout, BN | 128 × 4 × 4 |
| Discriminator | Conv4 | Kernel size: 256 × 5 × 5, 8, 2, LeakyReLU, dropout, BN | 256 × 2 × 2 |
| Discriminator | Output_1 | 1024 × 1, Sigmoid | 1 |
| Discriminator | Output_2 | 1024 × 10, Softmax | 10 |
| Generator | Input | Fully connection layer: 100 × 1024; Reshape: 1 × 1024 → 256 × 2 × 2 | 256 × 2 × 2 |
| Generator | Deconv1 | Kernel size: 128 × 5 × 5, 1, 1, ReLU, BN | 128 × 4 × 4 |
| Generator | Deconv2 | Kernel size: 64 × 5 × 5, 1, 0, ReLU, BN | 64 × 8 × 8 |
| Generator | Deconv3 | Kernel size: 32 × 5 × 5, 3, 5, ReLU, BN | 32 × 16 × 16 |
| Generator | Deconv4 | Kernel size: 1 × 5 × 5, 3, 9, Tanh | 1 × 32 × 32 |
| Generator | Output | Reshape: 32 × 32 → 1024 | 1 × 1024 |
Table 2. The structure of the fault diagnosis module.
| Layer | Parameters setting | Output size |
|---|---|---|
| Embedding1 | Conv1d: 16 × 3, 1, ReLU, BN, Maxpool1d: (2,2) | 1 × 1024 |
| Embedding2 | Conv1d: 32 × 3, 1, ReLU, BN, AdaptiveMaxpool1d: 25 | 16 × 512 |
| Transpose | Reshape: 16 × 512 → 32 × 25; data replacement between dimensions 1 and 2 | 25 × 32 |
| BiLSTM | 32 × 64 × 2, Tanh | 25 × 128 |
| Reshape | 25 × 128 → 1 × 3200 | 1 × 3200 |
| FC1 | 3200 × 256 | 1 × 256 |
| FC2 | 256 × 10, Softmax | 1 × 10 |
Table 3. The parameters of the rolling bearing.
| Bearing type | Pitch diameter | Ball diameter | Number of balls | Speed | Load |
|---|---|---|---|---|---|
| 6205-2RS JEM SKF | 39.04 mm | 7.94 mm | 9 | 1750 rpm | 2 hp |
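Table 4. The details of the bearing dataset.
| Group | Sample class | Damage diameter (inches) | Sample length | Training set | Testing set | Label |
|---|---|---|---|---|---|---|
| Majority | Normal | - | 2048 | 100 | 100 | 0 |
| Minority | Ball1 (B) | 0.007 | 2048 | 10 | 100 | 1 |
| Minority | Ball2 (B) | 0.014 | 2048 | 10 | 100 | 2 |
| Minority | Ball3 (B) | 0.021 | 2048 | 10 | 100 | 3 |
| Minority | Inner1 (I) | 0.007 | 2048 | 10 | 100 | 4 |
| Minority | Inner2 (I) | 0.014 | 2048 | 10 | 100 | 5 |
| Minority | Inner3 (I) | 0.021 | 2048 | 10 | 100 | 6 |
| Minority | Outer1 (O) | 0.007 | 2048 | 10 | 100 | 7 |
| Minority | Outer2 (O) | 0.014 | 2048 | 10 | 100 | 8 |
| Minority | Outer3 (O) | 0.021 | 2048 | 10 | 100 | 9 |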
Table 5. The similarity indicators between the real data and the synthetic data.
| Sample class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCC | 0.8225 | 0.8702 | 0.7494 | 0.8388 | 0.9314 | 0.8125 | 0.9506 | 0.9530 | 0.7957 | 0.8990 |
| CS | 0.8378 | 0.8962 | 0.8142 | 0.8700 | 0.9481 | 0.8721 | 0.9584 | 0.9609 | 0.8463 | 0.9232 |
| Wasserstein distance | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 | 0.0530 |
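Table 6. Composition of the bearing datasets for diagnosis.
| Quantity | Source | Dataset1 | Dataset2 | Dataset3 | Dataset4 | Dataset5 | Dataset6 | Dataset7 |
|---|---|---|---|---|---|---|---|---|
| Testing sample number of each class | Real | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Testing sample number of each class | Synthetic | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Training sample number of majority class | Real | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Training sample number of majority class | Synthetic | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
| Training sample number of each minority class | Real | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Training sample number of each minority class | Synthetic | 0 | 1 | 3 | 4 | 9 | 49 | 99 |
| Imbalance ratio (majority/minority) | - | 100:1 | 50:1 | 25:1 | 20:1 | 10:1 | 2:1 | 1:1 |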
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Tong, Q.; Lu, F.; Feng, Z.; Wan, Q.; An, G.; Cao, J.; Guo, T. A Novel Method for Fault Diagnosis of Bearings with Small and Imbalanced Data Based on Generative Adversarial Networks. Appl. Sci. 2022, 12, 7346. https://doi.org/10.3390/app12147346
