Article

Multi-Filter Clustering Fusion for Feature Selection in Rotating Machinery Fault Classification

1 School of Mechanical Engineering, Pusan National University, Busan 46241, Korea
2 Department of Mechanical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
3 Research Institute of Mechanical Technology, Pusan National University, Busan 46241, Korea
4 H&A Research Center, LG Electronics, Changwon 51554, Korea
* Author to whom correspondence should be addressed.
Sensors 2022, 22(6), 2192; https://doi.org/10.3390/s22062192
Submission received: 16 February 2022 / Revised: 8 March 2022 / Accepted: 9 March 2022 / Published: 11 March 2022
(This article belongs to the Topic Recent Advances in Structural Health Monitoring)

Abstract
In the fault classification process, filter methods that sequentially remove unnecessary features have long been studied. However, existing filter methods offer no guideline on which, and how many, features are needed. This study developed a multi-filter clustering fusion (MFCF) technique to select features effectively and efficiently. In the MFCF process, a multi-filter method combining existing filter methods is first applied for feature clustering, and key features are then automatically selected. The union of the key features is used to find all potentially important features, and an exhaustive search is used to obtain the combination of selected features that maximizes the accuracy of the classification model. In the rotating machinery examples, fault classification models using MFCF were generated to classify normal and abnormal conditions. The results demonstrate that classification models using MFCF provide good accuracy, efficiency, and robustness in the fault classification of rotating machinery.

1. Introduction

Rotating machinery plays a crucial role in the systems and processes of industrial applications, such as manufacturing systems, transportation, home appliances, and power systems [1,2]. Since rotating machinery generally operates continuously at high speeds and with high power [3], interruption of the related processes could threaten safety and result in massive economic loss [4,5]. Therefore, fault diagnosis of rotating machinery is essential to prevent critical failures that would cause a system to shut down.
The fault diagnosis of rotating machinery is performed by detecting outliers in the monitored data that may occur due to faults. Traditional fault detection methods have mainly used thresholds set based on domain knowledge. Recently, however, many fault detection methods have detected faults by learning from monitored data using machine learning/deep learning technology. Classifying normal or abnormal conditions is performed with binary classification models, and multi-class classification models are used to detect different combinations of faults. These fault classification models help people make decisions and predict the occurrence of more severe failures in parts or machines in advance.
The most detectable signs of failures in rotating machinery are vibrations and noise arising from abnormal conditions. However, since noise generated under abnormal conditions is often difficult to distinguish from noise generated in external environments, vibration data are more frequently used to diagnose failures in rotating machinery. In particular, accelerometers are frequently used to measure vibration data, and various fault diagnostic methods have been developed using vibration information [6,7]. Vibration data of rotating machinery measured over time are amplified at certain frequencies by the periodic operation of the machinery. Thus, the characteristics (features) of signals, in both time and frequency domains, can be important measurements to distinguish between normal and abnormal conditions of rotating machinery. Such features are used to create fault classification models that distinguish between normal and abnormal states or different failure modes.
Selection of appropriate features, which is key to successfully building classification models, has been studied in recent years [8,9]. When extracting features for use as input features for classification models, it is important to use features that are highly relevant to classification and to eliminate redundant or unnecessary features [10]. As the number of unnecessary features increases, learning a model from the data takes longer, the complexity of the model grows, and its accuracy can decrease. Therefore, research on effective and accurate feature selection needs to be carried out to improve the efficiency and accuracy of fault classification models [11,12].
Various feature selection methods have been proposed for fault diagnosis in rotating machinery. In particular, since the number and type of features derived from both time and frequency domains can vary, feature selection is very important for obtaining accurate classification models. Therefore, many studies have recently been conducted to optimize the combination of features with the highest classification accuracy in feature selection. An optimal combination of features has been derived using filter methods such as relief, chi-square, and information gain [13], or by using Pareto optimization [14] or a binary particle swarm optimization method [15] after using a filter method. A wrapper-based embedded method using a support vector machine (SVM) [16], and methods of deriving optimal features using the sensitivity of features [17] or a genetic algorithm [18], have also been proposed. Although these methods have been shown to improve classification accuracy, each has been applied only to specific classification problems, which is insufficient to demonstrate the universality and robustness of the proposed methods.
Some studies have proposed effective fault diagnosis methods with several components, by extracting several features based on deep learning models using multivariate sampling data [19]. However, because these methods use a backpropagation training process, they are time-consuming and can have unsatisfactory performance when dealing with high-dimensional data. In another study, a universal domain adaptation method was proposed to enhance the generalization ability of a data-driven model for fault diagnosis [20]. The fault diagnosis results of roller bearings showed that the proposed method yielded the best performance compared with other neural network methods. However, the study assumed that balanced data were available in the training process. Thus, it might not be applicable to unbalanced data, which often occur in real industry applications.
This study proposes a fault classification model for rotating machinery, combined with a notable feature selection method, as follows: (1) Multi-filter clustering fusion (MFCF) was developed to provide an adaptive threshold capable of determining the total number of relevant features through hierarchical clustering. (2) An exhaustive search of the wrapper method was used to find the best feature sets maximizing classification accuracy. (3) The performance of the proposed method was validated in four rotating machinery cases with different operating processes, fault modes, and numbers of datasets. (4) The selected features were used to train and test several classifiers, including the SVM, k-nearest neighbors (KNN), and multilayer perceptron (MLP), to ensure that the final selected features are compatible with all classifiers. Finally, the proposed method was shown to have high accuracy, robustness, efficiency, and generalizability in fault classification for rotating machinery through multi-domain feature extraction and multi-filter fusion.

2. Related Methods

2.1. Feature Selection Methods

Feature selection can be divided into filter, wrapper, hybrid, and embedded methods. Filter methods determine the ranks of features by evaluating close relationships or similarity of features, based on information theory and statistics [17]. They evaluate the relative importance of features, but there is no absolute criterion for selecting them [21], so it is difficult to distinguish between necessary and unnecessary features [22]. Users need to arbitrarily determine the number of features, or select features according to a user-specified percentage [10], making it difficult to clearly conclude that certain filter methods are superior to others [23]. Therefore, while filter methods can efficiently remove unnecessary features based on importance, there is no guideline for selecting important features.
The most commonly used filter methods include chi-square (CS), the extra trees classifier (ETC), and a correlation matrix (CM). The CS method ranks features based on an independence test of two events using χ² values. The ETC uses entropy values to measure the probability of the same class by aggregating the learning ensemble, and a CM measures the similarity between two features, with the final coefficient expressing the degree of linear correlation, as shown in Equations (1)–(3):
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}    (1)

where O_i is the observed feature data, and E_i is the expected feature data,

\text{Entropy } E = -\sum_{i=1}^{c} p_i \log_2 p_i    (2)

where c is the number of group labels, and p_i is the proportion of feature values associated with group i, and

r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}    (3)

where X_i and Y_i are the feature observation data, and \bar{X} and \bar{Y} are the mean values of the two features X and Y.
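The following is a minimal sketch of how the three filter scores could be computed with scikit-learn, assuming a feature matrix X and binary labels y; the function name and the decision to rank CM features by their correlation with the class label are illustrative assumptions, since the paper does not fix these details.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier

def filter_scores(X, y):
    """X: (n_samples, n_features) feature matrix; y: binary labels (0/1)."""
    # CS, Equation (1): chi-square requires non-negative inputs, so scale to [0, 1].
    cs_scores, _ = chi2(MinMaxScaler().fit_transform(X), y)
    # ETC, Equation (2): entropy-based importances from an extra-trees ensemble.
    etc = ExtraTreesClassifier(n_estimators=100, criterion="entropy", random_state=0)
    etc_scores = etc.fit(X, y).feature_importances_
    # CM, Equation (3): each feature scored here by its absolute Pearson correlation
    # with the class label (one common reading of the CM ranking; an assumption).
    cm_scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return cs_scores, etc_scores, cm_scores
```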
However, each method uses a different measure to evaluate feature importance, so the methods can yield different rankings of features. Therefore, for effective and robust feature selection, extracting and combining key features from each filter method with different characteristics becomes an important issue in finally deriving the best feature set.
The wrapper method determines the types and number of features based on the accuracy of classification models. All possible feature combinations are used as input features in the classification model, and the feature combination with the highest classification accuracy is chosen as the final feature set [4]. The wrapper method, unlike the filter method, provides an optimal combination of features, but requires a long computational time, because it generates a classification model for every combination of features [24,25]. Among the wrapper methods, an exhaustive search enables accurate and robust feature selection by simultaneously evaluating all combinations of features, instead of gradually adding or removing features. An exhaustive search is inefficient compared to the other methods if all existing features are used without removing unimportant ones. However, if the filter method is applied first, and the number of key features derived from it is small, then an exhaustive search can find optimum features more effectively and efficiently. The hybrid method combines filter and wrapper methods to remedy the shortcomings of each. For example, after unnecessary features are removed using the filter method, the wrapper method is applied to find the best feature set from only the reduced features, resulting in a significant reduction in computational time [26,27]. While hybrid methods reduce the computational time needed with wrapper methods, they still need to select the appropriate number of features in the filter process.

2.2. Classifiers

Various classifiers can be applied to build classification models for the diagnostic needs of rotating machinery [28,29,30]. MLP with a neural network structure, the SVM with a decision boundary, and the distance-based KNN model are widely used classifiers. Since classifiers may have very different performances, depending on their characteristics, this study attempts to verify performance through a combination of the proposed feature selection methods, with the above three representative classifiers.
The SVM solves linear and nonlinear classification problems by finding hyperplanes that maximize the distance between groups, learning from the training data with a chosen kernel type, such as linear, polynomial, or radial basis function [31,32]. The SVM classifier is formulated as follows:

f(x) = w^T x + b    (4)

where w is the weight vector, b is the bias for optimizing the hyperplane, and x is the input mapped by the kernel function. The weight vector w is obtained by solving the following minimization:

\text{Minimize } w^T w + C \sum_i \xi_i    (5)

where C is the penalty hyperparameter, and \xi_i is a slack variable for i = 1, 2, …, N, with N as the number of data samples.
KNN is a type of supervised learning that can be used for both classification and regression tasks. It performs classification by measuring the similarity (e.g., via distance functions) between data points [32]. Euclidean distance is often used as the distance metric, as follows:

\text{Dist}(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (6)

where x_i and y_i are the coordinate values of the two data points X and Y, and n is the dimension of the data points. KNN computes the distance between the query and all sample data, selects the k samples closest to the query, and then assigns the most frequent label among them.
MLP is a machine learning algorithm that works with feed-forward neural networks. It has a structure consisting of an input layer, multiple hidden layers, and an output layer. MLP is known for its ability to solve complex problems, owing to its outstanding performance in building classification models [33,34]. In simple terms, the MLP output function is expressed as

y = g(W^T x + b)    (7)

where x is the input variable in vector form, y is the output, g(·) is the activation function of the nodes, W is the weight matrix linking the input layer and hidden layer, and b is the bias vector of the hidden layer nodes. The sizes of the input layer, hidden layers, and output layer can be assigned according to the complexity of the problem.
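As a minimal sketch, the three classifiers of Equations (4)–(7) can be instantiated with scikit-learn as shown below; the synthetic data, the hyperparameter values, and the network size are placeholders, not the settings tuned in this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the extracted vibration features.
X, y = make_classification(n_samples=500, n_features=24, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0),             # hyperplane classifier, Eqs. (4)-(5)
    "KNN": KNeighborsClassifier(n_neighbors=5),  # Euclidean-distance voting, Eq. (6)
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16),
                         max_iter=1000,
                         random_state=0),        # feed-forward network, Eq. (7)
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))       # mean accuracy on held-out data
```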

3. Proposed Method

Each filter method described in Section 2.1 can select different features depending on the type of features and the characteristics of the data, so it is important to systematically and effectively select the most important features affecting classification performance. The proposed MFCF feature selection focuses on how to adaptively cut off unnecessary features from the candidate feature sets and find the best feature combination in an efficient and systematic way. For this, features are first extracted from the raw data in the time domain and, via the fast Fourier transform (FFT), in the frequency domain, as shown in Figure 1.
MFCF is used to extract candidate feature sets using multiple filter methods and feature clustering, and an exhaustive search is used to select the optimal feature set that maximizes classification accuracy. The selected features are used to generate fault classification models (such as SVM, KNN, and MLP), where the hyperparameters of the three models are optimized using a grid search. The performance of the proposed method is evaluated in terms of accuracy, efficiency, stability, and robustness. Accuracy and efficiency are evaluated using the percentage of correct predictions and the computational time, respectively. Stability is estimated from changes in both accuracy and efficiency when the method is applied to training and testing data. Robustness is measured through the variation of accuracy values under cross-validation.

3.1. Fusion Multi-Filter Feature Selection

Before MFCF is applied, all features from the time and frequency domains first need to be defined. In order to determine the statistical characteristics of the measured data in the time and frequency domains, 12 features were extracted from each domain, including absolute mean (abs_mean), peak-to-peak (ptp), kurtosis (kur), skewness (skew), root mean square (rms), etc.; 25%, 50%, and 75% are the 25th, 50th, and 75th percentile values, respectively. The 24 features were evaluated with CS, ETC, and CM methods. The numbering for the 72 features is in Table 1.
Referring to Table 1, the feature numbering can be expressed as F_{Comb} = \{F_0, F_1, \dots, F_N\} for k = 0, 1, …, 71, where k is the feature number and N is the total number. The set is then redefined as F_{Comb} = F_{CS} \cup F_{ETC} \cup F_{CM}, where F_{CS} = F_{f\_CS} \cup F_{t\_CS}, F_{ETC} = F_{f\_ETC} \cup F_{t\_ETC}, and F_{CM} = F_{f\_CM} \cup F_{t\_CM}.
Clustering of the 72 features should be performed to classify them into important features and unimportant features, to be used for classification based on the feature importance measures from each filter method. For this, all feature values are normalized, and the distances between two feature values are calculated using Euclidean distance for all features, as shown in Equation (8):
d_{ij} = d(F_i, F_j) = \sqrt{\sum_{k=0}^{N} (F_{i,k} - F_{j,k})^2}    (8)

where d_{ij} is the distance between feature i and feature j, and N denotes the amount of data, including all feature values. Using hierarchical clustering, the distances between features are repeatedly calculated, and features with small or large distances are combined into one of two clusters: selected features or removed features. Using the Euclidean distance between two features in Equation (8), a pairwise distance matrix used to find cluster A (selected features) and cluster B (removed features) can be defined as follows:

d_{AB} = \begin{pmatrix} 0 & d_{01} & \cdots & d_{0N} \\ d_{10} & 0 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ d_{N0} & \cdots & \cdots & 0 \end{pmatrix}    (9)

where d_{AB} is a proximity matrix measuring the distances between features. Features with short distances are clustered based on \min d(F_i, F_j), and the proximity matrix is then expressed as d_{AB} = \{d_A\} \cup \{d_B\}. In other words, features with high proximity values are clustered into one group, whereas features with low proximity values are clustered into the other. This agglomerative clustering is repeated by building a new matrix, until the last matrix consists of only two large clusters, separating one important feature group and one unimportant feature group.
Here, d_{AB} = \{d_A\} \cup \{d_B\} represents the final proximity result capable of building cluster A, which contains the feature set d_A with high-ranking values obtained by evaluating important features with each single-filter method. On the other hand, cluster B, containing the unimportant feature set (d_B), which is far from cluster A, is discarded.
This feature selection is unsupervised learning, in which the algorithm automatically searches for important features using Ward's method, computing the error sum of squares (ESS) and the loss associated with each cluster. The ESS is computed to measure the distance between the two clusters of important and unimportant features under multi-filter scoring, via what is called the linkage function. Ward's linkage function is known to be the most suitable method for quantifying a good grouping based on the variance of the clusters. The target of the linkage search is to minimize the increment of the ESS at each step, to find the minimum information loss. The algorithm works by fusing two clusters around the mean vector, and it then calculates the ESS for each cluster, namely the selected feature cluster and the discarded feature cluster. Equation (10) defines the ESS, and Equation (11) defines the linkage between clusters A and B, D(A,B):
\text{ESS}(d_A) = \sum_{i=1}^{T_a} \left( V_{a_i} - \frac{1}{T_a} \sum_{j=1}^{T_a} V_{a_j} \right)^2    (10)

D(A, B) = \text{ESS}(d_{AB}) - \left[ \text{ESS}(d_A) + \text{ESS}(d_B) \right]    (11)
where V_a is the value of each feature, and T_a is the number of data points in cluster A. Using the same formula as Equation (10), ESS(d_{AB}) and ESS(d_B) are calculated by renaming the variables, with V_b as a feature value in cluster B and V_{ab} as a feature value in the combined cluster resulting from cluster fusion. Cluster A may then contain duplicate features, because the same feature can be selected by more than one filter method. To optimize the combination of all potential features, after constructing the union of the features selected by each filter method, redundant features are removed, and the features are sorted according to importance:
\text{cluster\_A} = F_{CS\_new} \cup F_{ETC\_new} \cup F_{CM\_new}    (12)

F_{CS\_new} = \{ F_{CS_p} \mid p = 1, 2, 3, \dots, P; \ F_{CS_p} \in F_{CS} \}    (13)

F_{ETC\_new} = \{ F_{ETC_q} \mid q = 1, 2, 3, \dots, Q; \ F_{ETC_q} \in F_{ETC} \}    (14)

F_{CM\_new} = \{ F_{CM_r} \mid r = 1, 2, 3, \dots, R; \ F_{CM_r} \in F_{CM} \}    (15)
where F_{CS\_new}, F_{ETC\_new}, and F_{CM\_new} are feature sets consisting of P, Q, and R features selected using the CS, ETC, and CM filter methods, respectively. Multi-filter clustering fusion can then be defined as follows:
F_{fusion} = F_{CS\_new} \cup F_{ETC\_new} \cup F_{CM\_new}    (16)
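The clustering step could be sketched as follows with scipy, assuming each feature is represented by its normalized importance scores from the three filter methods (the function name and the score-matrix representation are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_features(score_matrix, feature_names):
    """score_matrix: (n_features, n_filters) numpy array of normalized scores."""
    # Ward's linkage merges the pair of clusters whose fusion gives the smallest
    # increase in ESS, Equations (10)-(11); distances are Euclidean, Equation (8).
    Z = linkage(score_matrix, method="ward")
    # Cut the dendrogram into the two top-level clusters (cluster A vs. cluster B).
    labels = fcluster(Z, t=2, criterion="maxclust")
    # Keep the cluster whose members score higher on average (the "selected" side).
    keep = max((1, 2), key=lambda c: score_matrix[labels == c].mean())
    return [name for name, lab in zip(feature_names, labels) if lab == keep]
```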

3.2. Exhaustive Search Application

The next step in the feature selection process is to derive, from the fusion feature set F_{fusion}, the combination of selected features that maximizes the accuracy of the classification model. The algorithm used to find the best combination from among all combinations of features is the exhaustive search used in the wrapper method. In this algorithm, each candidate combination has two to four features. The minimum number of features is set so that the classification model has at least two dimensions, and up to four features are used in consideration of the computational time, although a larger number of features could be used. The set of all feature combinations is defined in Equation (17), and the number of feature combinations is calculated using Equation (18):
Y_c = \{ y_l \mid l = 1, 2, 3, \dots, C; \ y_l \subseteq F_{fusion} \}    (17)

C = \sum_{s=2}^{S_s} \frac{m!}{(m-s)! \, s!}    (18)

where Y_c is the set of all feature combinations, C is the number of feature set combinations, m is the length of F_{fusion}, and s is the number of features combined into a target subset, with the maximum combination size being S_s = 4.
The process of determining the combination of these feature sets is evaluated with various classifiers, such as an SVM, KNN, and MLP. Normal and abnormal data are labeled as binary levels 0 and 1, respectively, and the accuracy of the classifiers is calculated as follows:
\text{Accuracy}(y, \hat{y}) = \frac{1}{S_{samples}} \sum_{l=0}^{S_{samples}-1} \mathbf{1}(\hat{y}_l = y_l)    (19)
where y is the measured label values, \hat{y} is the predicted label values, S_{samples} is the amount of data, and \mathbf{1}(\cdot) denotes the indicator function. The accuracy of the classification models using all feature combinations is tested, and the combinations are then sorted by accuracy. Equation (20) is used to obtain the best combination of features based on the highest accuracy value:
Y_{optimum} = \underset{Y_c}{\arg\max} \ \text{accuracy}(Y_c)    (20)
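A minimal sketch of this exhaustive search is shown below, assuming X is a pandas DataFrame of extracted features and fused_features holds the names from F_{fusion}; the function name and cross-validated scoring are illustrative choices:

```python
from itertools import combinations
from sklearn.model_selection import cross_val_score

def exhaustive_search(X, y, fused_features, clf, max_size=4):
    """X: DataFrame of features; fused_features: feature names from F_fusion."""
    best_subset, best_acc = None, 0.0
    for s in range(2, max_size + 1):              # subset sizes s = 2..S_s, Eq. (18)
        for subset in combinations(fused_features, s):
            acc = cross_val_score(clf, X[list(subset)], y,
                                  cv=10, scoring="accuracy").mean()   # Eq. (19)
            if acc > best_acc:                    # Eq. (20): keep the arg max
                best_subset, best_acc = subset, acc
    return best_subset, best_acc
```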
This study uses hyperparameter tuning to improve model accuracy, and 10-fold cross-validation to verify robustness. Grid search is the most representative tuning technique for computing optimum hyperparameter values. Since it does not require much time for a small search space, and simply enumerates combinations of hyperparameter settings, it is simple and easy to apply. Hyperparameter optimization stops when the objective function of the hyperparameters, such as accuracy, reaches its highest value. Then, 10-fold cross-validation is performed on the models generated with the optimized parameters from the grid search, repeated 10 times while changing the test dataset. The accuracy of the classification model is evaluated by calculating the average accuracy over the test datasets.
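As a sketch of this tuning and validation step, a grid search over a small hyperparameter space can be followed by 10-fold cross-validation; the grid shown and the X_train/y_train names are placeholders, not the settings used in the paper:

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # illustrative grid
grid = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
grid.fit(X_train, y_train)                   # X_train, y_train: selected features

# 10-fold cross-validation of the tuned model, changing the test fold each time.
scores = cross_val_score(grid.best_estimator_, X_train, y_train, cv=10)
print(scores.mean(), scores.std())           # average accuracy and its spread
```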

4. Case Studies

Data were collected from experiments on four examples of failures or faults that occur in different types of rotating machinery. Cases 1 and 2 contain experimental bearing data from NASA repositories, collected by the Intelligent Maintenance System (IMS) Center [35]. Cases 3 and 4 are vibration data collected from air conditioning compressors. The experiment settings in cases 1 through 4 are shown in Figure 2, Figure 3, Figure 4 and Figure 5, and details are described in Section 4.1 and Section 4.2.

4.1. Data Collection

Cases 1 and 2 are rotating machinery problems that occurred with Rexnord ZA-2115 double-row bearings installed on a shaft, as illustrated in Figure 2. The rotation speed of the shaft remained constant at 2000 rpm under a radial load of 6000 lbs. The bearings operated while being lubricated (i.e., not in a dry condition), and failure occurred after more than 100 million revolutions.
The bearing vibrations were measured using an accelerometer, recording 20,480 points at a sampling rate of 20 kHz. The data for cases 1 and 2 were recorded at 10-min intervals, and were measured 2155 times over five weeks. Cases 1 and 2 had different causes of bearing failure; case 1 had defects in the inner race, and case 2 had defects in the roller. Figure 2 shows an experiment schematic for cases 1 and 2.
Figure 3 and Figure 4 show the vibration data for cases 1 and 2, measured over a five-week period. The data points along the X-axis are the time index at 10-min intervals. Figure 3 shows that the failure in case 1 occurred near the end of operations, at the 1789th of the 2155 measurements. Therefore, 1789 measurements can be classified as normal, and 366 as abnormal. On the other hand, Figure 4 shows that the failure in case 2 occurred earlier than in case 1, at the 1434th measurement. Therefore, in case 2, there were 1434 normal measurements and 471 abnormal measurements. The normal and abnormal data were divided based on the history of the vibrations, as shown in Figure 3 and Figure 4, and on a threshold for kurtosis (a feature mainly used in fault classification), as sketched below. In addition, the data segmentation was verified against the reported times at which bearing abnormalities occurred [36,37].
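A hypothetical illustration of this kurtosis-based split is shown below; the function name and the threshold value are assumptions, since the actual cut-off indices (1789 and 1434) were determined from the vibration history and verified against [36,37]:

```python
import numpy as np
from scipy.stats import kurtosis

def label_records(records, threshold):
    """records: iterable of 1-D vibration measurements (e.g., 20,480 points each)."""
    # A record whose kurtosis exceeds the user-chosen threshold is marked abnormal (1).
    return np.array([1 if kurtosis(r) > threshold else 0 for r in records])
```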
In cases 3 and 4, vibration data collected from an air conditioner compressor were used to apply the proposed method to fault data with various characteristics. The faults in these two cases were caused by two different failure modes (a mechanical defect, and lack of refrigerant inside the compressor). The machine used in cases 3 and 4 was a twin rotary compressor with low vibration and a 180° phase difference when rotating the shaft. Figure 5 shows a schematic for cases 3 and 4.
The experiment was conducted in two different rooms, an outdoor unit and an indoor unit, to simulate actual conditions for using air conditioners. In cases 3 and 4, an accelerometer measured the vibrations in the compressors shown in Figure 5. Details of the experiment variables in cases 3 and 4 are shown in Table 2 and Table 3, respectively.
In case 3, both normal and abnormal compressors were operated, yielding six electric expansion valve (EEV) settings × three fan-speed settings × four frequency settings × two conditions = 144 measurements. To improve the accuracy of the classification model, the data were partitioned into 50 intervals of the compressor cycle, increasing the total to 7200 measurements [39]. In case 4, normal and abnormal data were collected at seven frequencies, with each setting repeated three times. A state in which the refrigerant is charged at the 100% level is considered normal, and a state in which the refrigerant level is 50–90% is considered abnormal. This reflects the refrigerant charge gradually declining due to continuous operation of the air conditioner. Accordingly, there are 39 normal measurements and 216 abnormal measurements.

4.2. Feature Extraction and Selection

The data in this study comprise the time domain and the frequency domain. The data measured in the time domain are acceleration. The data in the frequency domain were obtained by transforming the time domain data using the fast Fourier transform (FFT), as follows:
f_k = \sum_{r=0}^{L-1} x_r e^{-2\pi j k r / L}    (21)
where L is the length of the input time-domain sequence x_r, and k = 0, 1, …, L − 1. As described in Table 1, 12 features were extracted from each domain, for a total of 24 features. Thus, two input feature matrices were formed from the time-domain and frequency-domain features: F_t = \{F_1^t, \dots, F_{12}^t\}, F_f = \{F_1^f, \dots, F_{12}^f\}, and F = F_t \cup F_f.
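The extraction of the 24 features could look like the following sketch, assuming a 1-D acceleration segment x; the function name and the use of the FFT magnitude spectrum as the frequency-domain signal are illustrative assumptions:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(x):
    """x: 1-D acceleration segment; returns the 24 time/frequency features."""
    spectrum = np.abs(np.fft.fft(x))      # magnitude spectrum from Equation (21)
    feats = {}
    for name, sig in (("T", x), ("F", spectrum)):   # same 12 statistics per domain
        feats[f"abs_mean_{name}"] = np.mean(np.abs(sig))
        feats[f"ptp_{name}"] = np.ptp(sig)          # peak-to-peak
        feats[f"kur_{name}"] = kurtosis(sig)
        feats[f"skew_{name}"] = skew(sig)
        feats[f"rms_{name}"] = np.sqrt(np.mean(sig ** 2))
        feats[f"mean_{name}"] = np.mean(sig)
        feats[f"std_{name}"] = np.std(sig)
        feats[f"min_{name}"] = np.min(sig)
        feats[f"max_{name}"] = np.max(sig)
        for q in (25, 50, 75):                      # percentile features
            feats[f"{q}%_{name}"] = np.percentile(sig, q)
    return feats
```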
After normalizing using a min–max scaler method with the data in both domains, box plots of the data for each feature can be obtained for the four cases, as shown in Figure 6, where the intersection refers to the overlapped normal and abnormal data distributions. The intersection of normal and abnormal data distributions is used to show whether each feature sufficiently distinguishes between normal and abnormal data. The smaller the intersection area, the more easily the corresponding feature classifies normal and abnormal data; the larger the intersection area, the more difficult the classification is. Thus, the intersection area can be used as an initial estimate of whether the feature is easy or difficult to classify into normal and abnormal conditions.
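One way to estimate such an intersection area is the overlap of the two normalized histograms of a feature under normal and abnormal conditions; this is an illustrative reading of the metric, not the paper's exact computation, and the function name and bin count are assumptions:

```python
import numpy as np

def intersection_area(normal_vals, abnormal_vals, bins=50):
    """Overlap of the two feature distributions (0 = separable, 1 = identical)."""
    lo = min(normal_vals.min(), abnormal_vals.min())
    hi = max(normal_vals.max(), abnormal_vals.max())
    h_n, edges = np.histogram(normal_vals, bins=bins, range=(lo, hi), density=True)
    h_a, _ = np.histogram(abnormal_vals, bins=bins, range=(lo, hi), density=True)
    # Integrate the pointwise minimum of the two density estimates.
    return float(np.sum(np.minimum(h_n, h_a)) * (edges[1] - edges[0]))
```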
As seen in Figure 6, the intersection areas vary greatly, depending on the feature types and cases. Some features have a small intersection area, which means they can clearly distinguish between normal and abnormal data, while others are not useful for fault classification. Case 1 has the smallest intersection area, indicating that failure classification is easiest. However, a large number of key features can be selected, so the number of features needs to be reduced to improve fault classification accuracy and decrease the computational time. On the other hand, the distributions of normal and abnormal data for most features are not clearly distinguished in case 3, and their intersection areas are close to 1 in both time and frequency domains, making it very difficult to derive important features. Case 4 shows that some features in the time domain are valid, but most features in the frequency domain are invalid. The results from case 4 confirm that using multiple domains rather than a single domain helps improve the accuracy of fault classification. In summary, each case had a different number and type of features extracted, due to different causes of failure and the different data characteristics. Therefore, it is necessary to correctly select the type and number of features suitable in each case.
Figure 7 shows a dendrogram of the results from hierarchical clustering obtained for all features by applying MFCF in cases 1 through 4. The dendrogram represents the hierarchical relationship between the clusters, where the X-axis represents feature numbering (see Table 1) listed by importance, and the Y-axis represents the proximity of the Euclidean distance between two features. The features can mainly be clustered into two groups (orange lines and green lines). The orange lines include the main features with high importance and proximity, and the green lines include features that need to be deleted based on the three filtering methods. Cases 1 through 4 have data with different characteristics, so the types and numbers of selected features are different in all cases.
Before identical features are removed and sorted in the fusion step, the total numbers of features selected at the clustering stage by the three filter methods are 19, 14, 9, and 10 for cases 1, 2, 3, and 4, respectively. As expected from the results in Figure 6, case 1 contains the largest number of features classified as main features. On the other hand, cases 3 and 4 have fewer main features for classification, so the number of selected features is smaller than in cases 1 and 2. Hierarchical clustering allows users to easily derive valid features by dividing all features into necessary and unnecessary sets. However, the number of clustered features is still large, so it needs to be further reduced. Details of the feature reduction process at each stage (multi-filter clustering, fusion, and the proposed method) are shown in Table 4.
Table 4 shows the features selected in each case, with different subsets at each MFCF stage, where the three numbers in the last column of the final set indicate the numbers of features used in the SVM, KNN, and MLP, respectively. Since the data in each case are measured from different rotating machines with different failure modes, each case has different numbers and types of important features extracted from the different domains. For example, in case 1, mean_T was used as an input feature in the three classifiers. The acceleration time series data have a sine or cosine curve with almost the same amplitude, so mean_T tends to have a constant value. However, since the abnormal data differ from the average acceleration values of the normal data, mean_T can be an important feature for fault classification. In case 2, skew_T was used as an input feature, because skewness measures the asymmetry of the probability density function of the vibration signals. In case 3, kur_T, ptp_F, and min_T were used as common input features for the three classifiers, where kur_T and ptp_F indicate the degree of flatness of the probability density function near the center and the peak value of the signals, respectively. They are often used to measure the strength of signals due to failure of rotating machinery. Furthermore, min_T shows that the normal compressor condition had a lower minimum value of the acceleration response than the abnormal compressor condition. In case 3, the intersection areas for many of the features are high, i.e., there are few important features except the three shown in Figure 6, and those were used as input features in the fault classification models. Accordingly, the proposed method can be used most effectively in problems that are difficult to classify. The most frequently selected features in case 4 are mean_F and abs_mean_F. The amplitude of the vibration signal of rotating machinery is particularly useful for distinguishing between a normal state and an abnormal state in the frequency domain; therefore, mean_F and abs_mean_F were selected as common main features. After the training process to build the classifiers, combinations of two to four features were finally derived for all cases through the exhaustive search.
Using the finally selected features, fault classification models were generated using the SVM, KNN, and MLP. Table 5 shows the accuracy and computational times of the three single-filter methods and the proposed MFCF. The proposed method was compared with the three single-filter methods using their top three features, because this provides a fair comparison by selecting the most useful features while keeping the computational time under control. The accuracy of the proposed method was 1.0 (100%) for all classifiers in cases 1 and 4, and cases 2 and 3 had an average accuracy of 0.99 for all classifiers. In terms of efficiency, the proposed method consumed the least computational time, because the exhaustive search was performed only on the features selected through MFCF, and the randomness in feature selection was low. Conversely, CS returned the lowest accuracy and required the longest running time, even though it does not differ much from the other feature selection methods. In particular, the CS method yielded the lowest accuracy in case 4 (with the smallest number of samples), because it depends on sample size, and its time consumption differed drastically from the other methods, owing to the choice of classifier. KNN was the most computationally expensive classifier, due to the complexity of an algorithm that stores the training data, as well as the number of iterations needed to calculate the distances between feature values. Thus, the proposed method was the most efficient, and yet it had the highest classification accuracy.
This high accuracy and low computational time are highly advantageous for machine learning, especially when diagnosing failures in rotating machinery with many classification difficulties. The performance of the proposed method was validated by testing several cases with different characteristics, such as the number of datasets, the types of failures, the types of experimental objects, and the variables in the data collection, as described in the previous section.
To validate the classification model, 10-fold cross-validation was carried out to determine the general applicability of the proposed method. Figure 8 shows box plots of the accuracy results from CS, ETC, CM, and the proposed MFCF. Comparing the methods, CS generally had low accuracy, high variability, and results that varied depending on the classifier type. CM tended to be similar to CS, with accuracy varying according to the classifier. ETC often had higher accuracy than the other filter methods, but still had lower accuracy and higher variability than the proposed method. In contrast, the proposed method showed little variability, with accuracy close to 1.0 in cases 1 to 3, where classification is easy regardless of the classifier type. In case 4, the lack of data resulted in lower classification accuracy and higher variability than in the other cases, but the proposed method still showed the best accuracy in comparison with the other methods.

5. Conclusions

This study developed a hierarchical clustering method using multiple filters (called MFCF) to extract key features from the time and frequency domains, and to maximize classification accuracy by optimizing the number and type of features using an exhaustive-search-based wrapper method. MFCF enables robust, accurate, and efficient fault classification, regardless of the type of failure classification model, especially in the fault classification of rotating machinery involving complex failure modes and different data characteristics. To validate the proposed method, vibration data from rotating machinery with four different failure modes were used, and cross-validation results confirmed that it had the best classification performance compared to the other filter methods. Although the proposed method was used here for the problem of classifying normal and abnormal measurements, it will be applied in the future to problems involving multi-class classification and multi-domain features, to verify its general applicability to broad engineering applications. In addition, this study obtained vibration signals using only accelerometer sensors, but the proposed method will in the future be applied to extract features from data collected using various sensors, such as chemical and temperature sensors.

Author Contributions

Conceptualization, S.M. and Y.N.; data acquisition, S.M. and S.P.; methodology, S.M.; coding, S.M.; validation, S.M.; formal analysis, S.M. and Y.-J.K.; investigation, S.M.; resources, J.L. and S.C.; project administration, Y.N.; funding acquisition, Y.N.; writing, original draft preparation, S.M. and Y.N.; writing, review and editing, Y.N. and Y.-J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. 2018R1D1A1A02086093) and the Korea government (MSIT) (2020R1A5A8018822 and 2021R1A2C1013557), and by LG Electronics.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cheng, J.; Yang, Y.; Hu, N.; Cheng, Z.; Cheng, J. A noise reduction method based on adaptive weighted symplectic geometry decomposition and its application in early gear fault diagnosis. Mech. Syst. Signal Process. 2021, 149, 107351.
2. Li, X.; Yang, Y.; Hu, Z.; Cheng, J.; Cheng, J. Discriminative manifold random vector functional link neural network for rolling bearing fault diagnosis. Knowl.-Based Syst. 2021, 211, 106507.
3. Fuli, Y.L.; Jia, W.M.; Qi, Y. Centrifugal compressor fault diagnosis based on qualitative simulation and thermal parameters. Mech. Syst. Signal Process. 2016, 8, 259–273.
4. Li, X.; Yang, Y.; Pan, H.; Cheng, J.; Cheng, J. A novel deep stacking least squares support vector machine for rolling bearing fault diagnosis. Comput. Ind. 2019, 110, 36–47.
5. Yang, B.; Lei, Y.; Jia, F.; Xing, S. An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to locomotive bearings. Mech. Syst. Signal Process. 2019, 122, 692–706.
6. Li, X.; Yang, Y.; Shao, H.; Zhong, X.; Cheng, J.; Cheng, J. Symplectic weighted sparse support matrix machine for gear fault diagnosis. Measurement 2021, 168, 108392.
7. Gao, Y.; Yu, D. Total variation on horizontal visibility graph and its application to rolling bearing fault diagnosis. Mech. Mach. Theory 2020, 147, 103768.
8. Youwei, W.; Lizhou, F. Hybrid feature selection using component co-occurrence based feature relevance measurement. Expert Syst. Appl. 2018, 102, 83–99.
9. Uysal, A.K.; Gunal, S. A novel probabilistic feature selection method for text classification. Knowl.-Based Syst. 2021, 36, 226–235.
10. Dimitrios, E.; Avi, A. An evaluation of feature selection methods for environmental data. Ecol. Inform. 2021, 61, 101224.
11. Abdelhamid, N.; Thabtah, F.; Abdel-Jaber, H. Phishing detection: A recent intelligent machine learning comparison based on models content and features. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics: Security and Big Data (ISI), Beijing, China, 22–24 July 2017; pp. 72–77.
12. Kamalov, F.; Thabtah, F. A feature selection method based on ranked vector scores of features for classification. Ann. Data Sci. 2017, 4, 483–502.
13. Sánchez, R.; Lucero, P.; Vasquez, R.; Cerrada, M.; Macancela, J.; Cebrera, D. Feature ranking for multi-fault diagnosis of rotating machinery by using random forest and KNN. J. Intell. Fuzzy Syst. 2018, 34, 3463–3473.
14. Ziani, R.; Mahgoun, H.; Fedala, S.; Felkaoui, A. Feature selection scheme based on Pareto method for gearbox fault diagnosis. Signal Process. Appl. Rotating Mach. Diagn. 2017, 12, 1–15.
15. Zhang, X.; Zhang, Q.; Chen, M.; Sun, Y.; Qin, X.; Li, H. A two-stage feature selection and intelligent fault diagnosis method for rotating machinery using hybrid filter and wrapper method. Neurocomputing 2018, 275, 2426–2439.
16. Hui, K.; Ooi, C.; Lim, M.; Leong, M.; Al-Obaidi, S. An improved wrapper-based feature selection method for machinery fault diagnosis. PLoS ONE 2017, 12, e0189143.
17. Lu, N.; Zhang, G.; Xiao, Z.; Malik, O.P. Feature extraction based on adaptive multiwavelets and LTSA for rotating machinery fault diagnosis. Shock Vib. 2019, 2019, 1201084.
18. Cerrada, M.; Sánchez, M.R.V.; Cabrera, D.; Zurita, G.; Li, C. Multi-stage feature selection by using genetic algorithms for fault diagnosis in gearboxes based on vibration signal. Sensors 2015, 15, 23903–23926.
19. Huang, K.; Wu, S.; Li, F.; Yang, C.; Gui, W. Fault diagnosis of hydraulic systems based on deep learning model with multirate data samples. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13.
20. Zhang, W.; Li, X.; Ma, H.; Luo, Z.; Li, X. Universal domain adaptation in fault diagnostics with hybrid weighted deep adversarial learning. IEEE Trans. Ind. Inform. 2021, 17, 7957–7967.
21. Kang, L.C.; Choon, L.T.; Kok, S.W.; Kelvin, S.C.Y.; Wei, K.T. A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Inf. Sci. 2019, 484, 153–166.
22. Amir, E.; Adel, K.; David, Y. Data-driven fault detection and diagnosis for packaged rooftop units using statistical machine learning classification methods. Energy Build. 2020, 225, 110318.
23. Andrea, B.; Xudong, S.; Bernd, B.; Jörg, R.; Michel, L. Benchmark for filter methods for feature selection in high-dimensional classification data. Comput. Stat. Data Anal. 2020, 143, 106839.
24. Liu, C.; Jiang, D.; Yang, W. Global geometric similarity scheme for feature selection in fault diagnosis. Expert Syst. Appl. 2014, 41, 3585–3595.
25. Cortizo, J.C.; Giraldez, I. Multi criteria wrapper improvements to naive bayes learning. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Burgos, Spain, 20–23 September 2006; pp. 419–427.
26. Wang, Y.; Feng, L.; Li, Y. Two-step based feature selection method for filtering redundant information. J. Intell. Fuzzy Syst. 2017, 33, 2059–2073.
27. Zorarpacı, E.; Özel, S.A. A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst. Appl. 2016, 62, 91–103.
28. Samanta, B. Gear fault detection using artificial neural network and support vector machines with genetic algorithms. Mech. Syst. Signal Process. 2004, 12, 625–644.
29. Rafiee, J.; Arvani, F.; Harifi, A.; Sadeghi, M.H. Intelligent condition monitoring of a gearbox using artificial neural network. Mech. Syst. Signal Process. 2007, 21, 1746–1754.
30. Wu, J.D.; Hsu, C.C. Fault gear identification using vibration signal with discrete wavelet transform technique and fuzzy-logic inference. Expert Syst. Appl. 2009, 36, 3785–3794.
31. Dong, X.; Wang, C.; Si, W. ECG beat classification via deterministic learning. Neurocomputing 2017, 240, 112.
32. Jha, C.K.; Kolekar, M.H. Cardiac arrhythmia classification using tunable Q-wavelet transform based features and support vector machine classifier. Biomed. Signal Process. Control 2020, 59, 101875.
33. Sun, K.; Wu, X.; Xue, J.; Ma, F. Development of a new multi-layer perceptron based soft sensor for SO2 emissions in power plant. J. Process Control 2019, 84, 182–191.
34. Wang, Z.Y.; Lu, C.; Zhou, B. Fault diagnosis for rotary machinery with selective ensemble neural networks. Mech. Syst. Signal Process. 2018, 113, 112–130.
35. IMS Bearings Dataset. 2021. Available online: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ (accessed on 1 January 2021).
36. Ali, J.B.; Saidi, L.; Mouelihi, A.; Chebel-Morello, B.; Fnaiech, F. Linear feature selection and classification using PNN and SFAM neural networks for a nearly online diagnosis of bearing naturally progressing degradations. Eng. Appl. Artif. Intell. 2015, 42, 67–81.
37. Duong, B.P.; Khan, S.A.; Shon, D.; Im, K.; Park, J.; Lim, D.S.; Jang, B.; Kim, J.M. A reliable health indicator for fault prognosis of bearings. Sensors 2018, 18, 3740.
38. Mochammad, S.; Kang, Y.-J.; Noh, Y.; Park, S.; Ahn, B. Stable hybrid feature selection method for compressor fault diagnosis. IEEE Access 2021, 9, 97415–97429.
39. Kim, S.; Noh, Y.; Kang, Y.-J.; Park, S.; Ahn, B. Fault classification model based on time domain feature extraction of vibration data. J. Comp. Struct. Eng. Inst. Korea 2021, 34, 25–33.
Figure 1. Flow chart of the proposed method.
Figure 2. Schematic of experimental setup in cases 1 and 2.
Figure 3. Raw data for case 1.
Figure 4. Raw data for case 2.
Figure 5. Schematic of the experiment setup in case 3 and case 4 [38].
Figure 6. Distribution and intersection areas of normalized feature values. (a) Time domain features for case 1; (b) Time domain features for case 2; (c) Time domain features for case 3; (d) Time domain features for case 4; (e) Frequency domain features for case 1; (f) Frequency domain features for case 2; (g) Frequency domain features for case 3; (h) Frequency domain features for case 4.
Figure 7. Multi-filter clustering. (a) case 1; (b) case 2; (c) case 3; (d) case 4.
Figure 8. Ten-fold cross validation. (a) case 1; (b) case 2; (c) case 3; (d) case 4.
Table 1. Feature numbering.

Frequency Domain Features | F_f_CS | F_f_ETC | F_f_CM | Time Domain Features | F_t_CS | F_t_ETC | F_t_CM
abs_mean_F | 0 | 24 | 48 | abs_mean_T | 12 | 36 | 60
peak_m_F | 1 | 25 | 49 | peak_m_T | 13 | 37 | 61
kur_F | 2 | 26 | 50 | kur_T | 14 | 38 | 62
skew_F | 3 | 27 | 51 | skew_T | 15 | 39 | 63
rms_F | 4 | 28 | 52 | rms_T | 16 | 40 | 64
mean_F | 5 | 29 | 53 | mean_T | 17 | 41 | 65
std_F | 6 | 30 | 54 | std_T | 18 | 42 | 66
min_F | 7 | 31 | 55 | min_T | 19 | 43 | 67
25%_F | 8 | 32 | 56 | 25%_T | 20 | 44 | 68
50%_F | 9 | 33 | 57 | 50%_T | 21 | 45 | 69
75%_F | 10 | 34 | 58 | 75%_T | 22 | 46 | 70
max_F | 11 | 35 | 59 | max_T | 23 | 47 | 71
Table 2. Experiment variables in case 3.

Conditions | EEV | Fan Speed (rev/min) | Frequency (Hz)
Cooling | 60, 120, 180, 240, 300, 360 | 350, 500, 700 | 20, 30, 40, 50
Heating | 60, 120, 180, 240, 300, 360 | 350, 500, 700 | 20, 30, 40, 50
Table 3. Experiment variables in case 4.

Conditions | Refrigerant (%) | Frequency (Hz)
Normal | 100 | 30~90
Abnormal | 50~90 | 30~90
Table 4. Selected features at each stage.

Cases | Stages | Features | No. of Features
Case 1 | Multi-filter clustering | {25%_T} ∪ {max_F, mean_T} ∪ {ptp_F, 75%_T, abs_mean_F, rms_F, mean_F, std_F, 25%_F, 50%_F, 75%_F, max_F, abs_mean_T, rms_T, std_T, 25%_T, 75%_T, skew_F} | 19
Case 1 | Fusion | 25%_T, max_F, mean_T, ptp_F, 75%_T, abs_mean_F, rms_F, mean_F, std_F, 25%_F, 50%_F, 75%_F, abs_mean_T, rms_T, std_T, 75%_T, skew_F | 17
Case 1 | Final set | SVM: rms_T, 75%_T; KNN: mean_T, 75%_T, mean_F; MLP: mean_T, 75%_T, ptp_F | 2, 3, 3
Case 2 | Multi-filter clustering | {kur_T} ∪ {kur_F, skew_F, mean_T} ∪ {kur_F, skew_F, mean_T, ptp_F, std_F, max_F, kur_T, skew_T, min_T, 50%_T} | 14
Case 2 | Fusion | kur_T, kur_F, skew_F, mean_T, ptp_F, std_F, max_F, skew_T, min_T, 50%_T | 10
Case 2 | Final set | SVM: skew_F, std_F, kur_T, skew_T; KNN: kur_T, skew_F, std_F, skew_T; MLP: kur_F, max_F, skew_T, mean_T | 4, 4, 4
Case 3 | Multi-filter clustering | {ptp_T, kur_T} ∪ {kur_T, min_T} ∪ {ptp_F, ptp_T, kur_T, min_T, max_T} | 9
Case 3 | Fusion | ptp_T, kur_T, min_T, ptp_F, max_T | 5
Case 3 | Final set | SVM: ptp_T, kur_T, ptp_F, min_T; KNN: ptp_T, kur_T, ptp_F, min_T; MLP: kur_T, ptp_F, min_T, max_T | 4, 4, 4
Case 4 | Multi-filter clustering | {75%_F} ∪ {75%_F, 50%_F, rms_F, abs_mean_F, std_F} ∪ {75%_F, rms_F, mean_F, 50%_T} | 10
Case 4 | Fusion | 75%_F, 50%_F, rms_F, abs_mean_F, mean_F, std_F, 50%_T | 7
Case 4 | Final set | SVM: abs_mean_F, rms_F, mean_F; KNN: abs_mean_F, mean_F, std_F; MLP: 75%_F, abs_mean_F, mean_F, std_F | 3, 3, 4
Table 5. Accuracy and execution times with the test data. Each case reports results for SVM / KNN / MLP.

Accuracy:
Methods | Case 1 | Case 2 | Case 3 | Case 4 | Avg.
CS | 0.93 / 0.99 / 0.93 | 0.91 / 0.95 / 0.92 | 0.97 / 0.99 / 0.96 | 0.76 / 0.94 / 0.95 | 0.93
ETC | 0.98 / 0.99 / 0.96 | 0.88 / 0.95 / 0.93 | 0.98 / 0.99 / 0.97 | 0.86 / 0.99 / 0.96 | 0.95
CM | 0.93 / 0.98 / 0.93 | 0.93 / 0.97 / 0.98 | 0.98 / 0.99 / 0.98 | 0.94 / 0.98 / 0.96 | 0.96
MFCF | 1.0 / 1.0 / 1.0 | 0.99 / 1.0 / 0.99 | 0.99 / 0.99 / 0.99 | 1.0 / 1.0 / 1.0 | 0.99

Efficiency (sec.):
Methods | Case 1 | Case 2 | Case 3 | Case 4 | Avg.
CS | 3.52 / 114.9 / 54.2 | 6.75 / 104.2 / 47.62 | 90.2 / 659.5 / 80.94 | 0.43 / 0.31 / 1.4 | 100.3
ETC | 3.01 / 115.4 / 23.9 | 5.23 / 103.6 / 52.7 | 84.1 / 678.9 / 70.89 | 0.43 / 0.81 / 3.4 | 98.5
CM | 3.5 / 118.6 / 43.3 | 4.99 / 100.9 / 36.6 | 86.5 / 680.8 / 69.27 | 0.43 / 0.5 / 9.1 | 98.7
MFCF | 2.4 / 113.2 / 13.8 | 4.02 / 100.1 / 39.6 | 83.0 / 655.3 / 81.63 | 0.43 / 0.01 / 0.0 | 94.4
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Mochammad, S.; Noh, Y.; Kang, Y.-J.; Park, S.; Lee, J.; Chin, S. Multi-Filter Clustering Fusion for Feature Selection in Rotating Machinery Fault Classification. Sensors 2022, 22, 2192. https://doi.org/10.3390/s22062192
