1. Introduction
Rolling bearings are among the most widely used mechanical components in rotating machinery and have diverse applications in the medical, aerospace, and railway fields [1,2,3]. Bearing failures account for 30% of rotating machinery failures, according to published research [4]. Ensuring the normal and safe operation of rolling bearings thus demands condition monitoring and fault diagnosis [5]. When a rolling bearing fails, its vibration signal exhibits regular impulses, which analysts often examine to identify the specific type of failure [6,7,8]. However, the collected vibration signals are noisy, which makes it difficult to extract fault features from the impulses and remains an area of active research in fault diagnosis.
Several approaches have been proposed for processing vibration signals, including empirical mode decomposition (EMD) [9,10,11], ensemble empirical mode decomposition (EEMD) [12,13,14], and local mean decomposition (LMD) [15,16,17]. Nonetheless, these decomposition-based approaches cannot fully resolve mode mixing and the endpoint effect. Dragomiretskiy et al. [18] introduced an adaptive signal analysis approach named variational mode decomposition (VMD), which decomposes a signal by formulating and solving a variational problem and is strongly resistant to noise. Owing to its ability to handle mode mixing and the endpoint effect effectively, VMD has been applied in multiple fields, including generator anomaly detection, structural health monitoring, and bearing fault diagnosis [19,20,21]. Li and colleagues [22] devised a fault diagnosis method for rolling bearings using VMD and an improved kernel extreme learning machine, and demonstrated that VMD addressed the mode mixing problem and offered better computational efficiency than EMD and LMD.
However, VMD requires preset parameters, including the decomposition number (K) and the penalty factor (α), which significantly affect the final outcome. Several studies have proposed methods for identifying the optimal parameter combination. For instance, Jiang et al. [19] established central frequency mode decomposition (CFMD), based on VMD, which eases the selection of initial parameters in traditional VMD, provided that the range of bandwidth parameters is preset. Wang et al. [23] compared the center frequencies of the modal components obtained with different parameter combinations; however, this method had limited adaptability. Tang et al. [24] used the particle swarm optimization (PSO) algorithm to optimize VMD and demonstrated its ability to extract early fault features from bearing vibrations. Zhang et al. [25] suggested a parameter-adaptive VMD strategy based on the grasshopper optimization algorithm (GOA) and validated its efficiency in analyzing the vibration signals of real rotating machinery. Moreover, Gu et al. [26] adopted the grey wolf optimizer (GWO), which was superior to fixed-parameter VMD and maximum weighted kurtosis optimization VMD in selecting the optimal parameter combination. These studies show that optimization algorithms with strong search abilities can achieve better adaptive selection of VMD parameters. Recently, the moth-flame optimization (MFO) algorithm [27] has achieved excellent performance in solving engineering optimization problems [28,29]. Sivalingam et al. [30] compared MFO with other optimization algorithms and found that MFO performed best. However, in the original MFO algorithm each moth searches only in the region around its own flame, which increases the risk of falling into a local optimum and slows convergence. Therefore, the original MFO algorithm requires improvement. In addition, although VMD can effectively solve the mode mixing problem, false intrinsic mode functions (IMFs) still appear among the decomposed components and affect the accuracy of fault classification for rolling bearings [31]. Therefore, it is necessary to screen the decomposed IMFs.
In addition to feature extraction, the swift and accurate identification of fault types is pivotal for fault diagnosis. The extreme learning machine (ELM) replaces the gradient descent algorithm with random assignment of network parameters and enhances the generalization ability of traditional classification networks, making it effective for fault diagnosis [32]. Jiang et al. [33] proposed a fault diagnosis model based on multiscale weighted permutation entropy (MWPE) and ELM, which demonstrated superior recognition accuracy and speed compared with various multiscale feature extraction methods combined with BPNNs and SVMs. Similarly, Lan et al. [34] diagnosed slipper abrasion faults using ELM and demonstrated that the classification performance of ELM is superior to that of BP and SVM. Liang et al. [35] proposed the online sequential ELM (OSELM) to resolve the long training time of ELM on large training sets. This algorithm obviates the need to retrain on historical data, which facilitates rapid diagnosis. Sahani et al. used VMD and OSELM for real-time detection and classification of power quality events and obtained high classification accuracy and robustness [36]. However, the randomly generated input weights and hidden-layer biases in the OSELM algorithm have been found to limit its prediction accuracy and robustness. Hence, this paper adopts the ensemble DE-OSELM method proposed by Zhou et al. [37] to realize fault classification and diagnosis.
This paper presents a novel approach for diagnosing rolling bearing faults using optimized VMD and DE-OSELM. First, the MFO algorithm is improved with a dynamic adaptive weight factor and a crossover operator. Compared with the original MFO algorithm and other intelligent optimization algorithms, the improved MFO algorithm has higher global optimization ability and search accuracy, and it addresses the sub-optimal performance and limited convergence accuracy of multi-objective optimization algorithms. Second, the improved MFO is used to optimize VMD, which removes the dependence of the VMD parameters on manual settings and enables their adaptive selection. Third, since the vibration characteristics of the signal cannot be fully described by a single index, the method introduces a new evaluation index, the effective weighted correlation sparsity index, to remove false modal components and screen the IMFs obtained through VMD. The energy features of the effective IMFs are then extracted as feature vectors. Finally, to improve classification accuracy, the energy feature matrix is normalized and used to train DE-OSELM to identify the fault type.
2. Basic Principle
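VMD is an adaptive signal processing technique that relies on the formulation and solution of a variational problem [38]. Essentially, the VMD algorithm partitions the input signal into $K$ IMFs while minimizing the sum of their estimated bandwidths. The algorithm operates under the constraint that the sum of the IMF components equals the original signal. The corresponding constrained variational problem can be expressed as:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $\{u_k\}$ are the decomposed IMF components, and $\{\omega_k\}$ are the central frequencies of the IMF components.
The problem is solved by introducing a penalty factor $\alpha$ and a Lagrange multiplier operator $\lambda(t)$ into the model, thereby transforming it into an unconstrained variational problem.
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
where $f(t)$ is the original signal.
The steps for solving Equation (2) are as follows:
Step 1: Initialize the parameters $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and set $n = 0$;
Step 2: Let $n = n + 1$, and update $\hat{u}_k$, $\omega_k$, and $\hat{\lambda}$ iteratively according to Equations (3)–(5):
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha (\omega - \omega_k)^2} \tag{3}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega} \tag{4}$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$
where $n$ is the number of iterations, $\hat{\ }$ denotes the Fourier transform, and $\tau$ is the noise tolerance.
Step 3: Repeat Step 2 until the components satisfy Equation (6), at which point the cycle ends and $K$ IMFs are obtained.
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{6}$$
where $\varepsilon$ is the discriminant accuracy.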
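The iterative updates of Step 2 and the stopping rule of Equation (6) can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard VMD forms of the mode update, the centre-frequency update, and the convergence test. The function names, the representation of each spectrum as a list of complex samples on non-negative frequencies, and the default tolerance are assumptions made for this example and are not taken from the paper.

```python
def update_mode(f_hat, other_modes, lam_hat, freqs, omega_k, alpha):
    """Update one mode spectrum (assumed standard VMD form of the mode update)."""
    updated = []
    for idx, omega in enumerate(freqs):
        # Signal spectrum minus the other modes, plus half the multiplier spectrum.
        residual = f_hat[idx] - sum(u[idx] for u in other_modes) + lam_hat[idx] / 2
        updated.append(residual / (1 + 2 * alpha * (omega - omega_k) ** 2))
    return updated


def update_center_frequency(u_hat, freqs):
    """Power-weighted mean frequency of one mode (discrete version of the integral ratio)."""
    power = [abs(u) ** 2 for u in u_hat]
    # Assumes the mode spectrum is not identically zero.
    return sum(w * p for w, p in zip(freqs, power)) / sum(power)


def has_converged(prev_modes, new_modes, eps=1e-7):
    """Relative change summed over all modes, compared with the tolerance eps.

    The default eps is a hypothetical value chosen for illustration only.
    """
    total = 0.0
    for u_old, u_new in zip(prev_modes, new_modes):
        old_energy = sum(abs(b) ** 2 for b in u_old)
        if old_energy == 0:
            return False  # a zero spectrum gives no reference energy yet
        diff_energy = sum(abs(a - b) ** 2 for a, b in zip(u_new, u_old))
        total += diff_energy / old_energy
    return total < eps
```

In a full decomposition these three functions would be called for every mode inside the Step 2 loop, followed by the multiplier update; the Fourier transforms of the signal are left out of the sketch.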
Proper parameter selection is a prerequisite for VMD signal decomposition: the decomposition number $K$ and the penalty factor $\alpha$ exert a major influence on the decomposition result, whereas the noise tolerance $\tau$ and the discriminant accuracy $\varepsilon$ play a less prominent role [39]. Hence, achieving the optimal VMD decomposition requires identifying suitable values of $K$ and $\alpha$.
3. MFO Algorithm and Its Improvement
3.1. MFO Algorithm
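Inspired by the natural phenomenon of moths flying toward flames, the MFO algorithm assumes that the moth population flies around the flame population along logarithmic spiral curves. If a new position is found to be better than the original flame position, the flame position is updated. The matrix $M$ represents the initial moth positions, and the matrix $OM$ represents the initial moth fitness values, as given below.
$$M = \begin{bmatrix} m_{1,1} & m_{1,2} & \cdots & m_{1,d} \\ m_{2,1} & m_{2,2} & \cdots & m_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n,1} & m_{n,2} & \cdots & m_{n,d} \end{bmatrix} \tag{7}$$
where $M_i = [m_{i,1}, m_{i,2}, \ldots, m_{i,d}]$ is the position of the $i$-th moth in the solution space of the moth population $M$; $n$ is the population number; and $d$ is the dimension of the solution space.
$$OM = \begin{bmatrix} OM_1 & OM_2 & \cdots & OM_n \end{bmatrix}^{T} \tag{8}$$
where $OM_i$ is the fitness value of the $i$-th moth.
Each moth has a unique flame corresponding to it. The flame represents the local optimal solution found by each moth in the search process. The flame position is represented by the matrix $F$, and the matrix $OF$ represents the flame fitness value.
$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1} & f_{n,2} & \cdots & f_{n,d} \end{bmatrix} \tag{9}$$
where $F_i = [f_{i,1}, f_{i,2}, \ldots, f_{i,d}]$ is the position of the $i$-th flame in the solution space of the flame population $F$; $n$ is the number of flames in the population; and $d$ is the dimension of the solution space.
$$OF = \begin{bmatrix} OF_1 & OF_2 & \cdots & OF_n \end{bmatrix}^{T} \tag{10}$$
where $OF_i$ is the fitness value of the $i$-th flame.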
Owing to phototaxis, moth $M_i$ moves towards its corresponding flame $F_j$ along a logarithmic spiral curve. The movement formula is defined as Equation (11), and the matrix $S$ represents the updated position of the moth.
$$S(M_i, F_j) = D_i \, e^{bt} \cos(2\pi t) + F_j, \qquad D_i = \left| F_j - M_i \right| \tag{11}$$
where $M_i$ is the $i$-th moth; $F_j$ is the $j$-th flame; $D_i$ is the distance between the $i$-th moth and the $j$-th flame; $b$ is a constant related to the shape of the spiral function; and $t$ is a random number in the interval [−1, 1].
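As a rough illustration of the spiral movement in Equation (11), the sketch below updates one moth position towards one flame, assuming the standard MFO logarithmic-spiral form. The function name `spiral_update`, the default shape constant `b=1.0`, and the choice to draw a single random `t` per call (rather than one per dimension) are assumptions made for this example.

```python
import math
import random


def spiral_update(moth, flame, b=1.0, t=None, rng=random):
    """Move a moth towards a flame along a logarithmic spiral.

    moth, flame: lists of coordinates with equal length.
    b: spiral shape constant (hypothetical default).
    t: value in [-1, 1]; drawn uniformly at random when not supplied.
    """
    if t is None:
        t = rng.uniform(-1.0, 1.0)
    spiral = math.exp(b * t) * math.cos(2 * math.pi * t)
    # Per-dimension distance to the flame, scaled by the spiral term,
    # then offset by the flame position.
    return [abs(f - m) * spiral + f for m, f in zip(moth, flame)]
```

With `t = 0` the spiral term equals 1, so the new position is the flame plus the moth-flame distance; values of `t` near ±0.25 and ±0.75 make the cosine vanish and place the moth almost exactly on the flame.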
The algorithm adaptively reduces the number of flames according to Equation (12), which improves its efficiency and ensures that the moth population converges towards the optimal flame.
$$\mathrm{flame}_{\mathrm{no}} = \mathrm{round}\left( N - l \cdot \frac{N - 1}{T} \right) \tag{12}$$
where $\mathrm{flame}_{\mathrm{no}}$ is the number of current flames; $N$ is the number of flames in the original flame population; $l$ is the number of current iterations; and $T$ is the maximum number of iterations.
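The adaptive reduction of the flame count in Equation (12) can be illustrated as follows, assuming the standard MFO rule in which the count falls linearly from the original number of flames to 1. The function and argument names are chosen for this example only.

```python
def flame_count(iteration, max_iterations, n_flames):
    """Number of flames kept at a given iteration.

    Falls linearly from n_flames at iteration 0 to 1 at max_iterations.
    Python's round() resolves exact .5 ties to the nearest even integer,
    which may differ by one from other rounding conventions.
    """
    return round(n_flames - iteration * (n_flames - 1) / max_iterations)
```

For a population of 30 flames and 1000 iterations, the count starts at 30 and ends at 1, so late iterations concentrate all moths on the single best flame.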
In the MFO algorithm, once the moth and flame populations have been initialized, each moth searches mainly in the region around its corresponding flame, so if that flame lies in a local optimum, the moth follows it there. The original MFO therefore has difficulty jumping out of local optima, which leads to relatively low convergence accuracy and slow convergence speed.
3.2. The Improved MFO Algorithm
To address the tendency of the original MFO algorithm to become trapped in local optima, along with its low convergence accuracy and slow convergence speed, the algorithm is modified with both a dynamic adaptive weight factor and a crossover operator to improve its global optimization ability, accuracy, and efficiency.
In the original MFO algorithm, the moth population flies around the flame population along logarithmic spiral curves, and the flames are not fully utilized, which makes the algorithm prone to falling into local optima. To improve its global optimization ability and convergence speed, a dynamic adaptive weight factor is introduced into the position updating strategy of the moths. The weight factor is formulated in Equation (13) in terms of the current iteration number and the maximum iteration number. The updated moth position combined with the adaptive weight factor is given in Equation (14).
The weight factor decreases gradually from 1 to 0 as the iterations proceed, which enlarges the search scope during the initial stages. Consequently, the search and global optimization abilities, accuracy, and efficiency of the algorithm improve significantly.
To help the algorithm escape local optima, the positions of the first $m$ flames are perturbed by applying the crossover operator from the genetic algorithm. The main idea is to recombine, a set number of times, each dimension of the first $m$ flames in the flame matrix with the corresponding dimension of other flames to form a new flame. If the fitness value of the new flame is better than that of the original flame, the original flame is replaced. Perturbing the best flames in this way keeps the diversity of the flame population relatively high and gives it a certain probability of jumping out of a local optimum.
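One possible reading of this crossover step is sketched below. The source describes the operator only in words, so several details are assumptions made for illustration: each dimension is swapped with probability 0.5, the donor flame is chosen uniformly at random, the objective is minimized, and the names `crossover_flames`, `m`, and `repeats` are hypothetical.

```python
import random


def crossover_flames(flames, fitness, objective, m, repeats, rng=random):
    """Perturb the first m flames by per-dimension crossover with other flames.

    flames: list of positions, assumed sorted best first (minimization).
    fitness: objective values matching flames.
    Returns new (flames, fitness) lists; the inputs are not modified.
    """
    flames = [list(f) for f in flames]
    fitness = list(fitness)
    m = min(m, len(flames))
    for i in range(m):
        for _ in range(repeats):
            candidate = list(flames[i])
            for dim in range(len(candidate)):
                donor = rng.randrange(len(flames))
                # Assumed rule: copy the donor's value with probability 0.5.
                if donor != i and rng.random() < 0.5:
                    candidate[dim] = flames[donor][dim]
            value = objective(candidate)
            # Greedy acceptance: keep the new flame only if it is better.
            if value < fitness[i]:
                flames[i], fitness[i] = candidate, value
    return flames, fitness
```

Because a candidate replaces a flame only when it improves the fitness, the step can never make any of the first `m` flames worse, while the random recombination injects diversity.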
The process of the improved MFO algorithm is shown in Figure 1.
3.3. Verification of the Algorithm
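To verify the effectiveness and superiority of the improved MFO algorithm, four commonly used functions were selected for the test, and the dimension $d$ of each tested function is 30. The tested functions are given as follows.
(1) Schwefel’s Problem 1.2 function
$$f_1(x) = \sum_{i=1}^{d} \left( \sum_{j=1}^{i} x_j \right)^2$$
where $x_i$ lies within the search range, and the optimal value of this function is 0.
(2) Schwefel’s Problem 2.22 function
$$f_2(x) = \sum_{i=1}^{d} \left| x_i \right| + \prod_{i=1}^{d} \left| x_i \right|$$
where $x_i$ lies within the search range, and the optimal value of this function is 0.
(3) Sum Squares function
$$f_3(x) = \sum_{i=1}^{d} i \, x_i^2$$
where $x_i$ lies within the search range, and the optimal value of this function is 0.
(4) Ackley function
$$f_4(x) = -20 \exp\left( -0.2 \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2} \right) - \exp\left( \frac{1}{d} \sum_{i=1}^{d} \cos(2\pi x_i) \right) + 20 + e$$
where $x_i$ lies within the search range, and the optimal value of this function is 0.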
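For readers who want to reproduce the benchmark, the four test functions can be written directly in Python using their standard textbook definitions. The function names are chosen for this sketch, and the search ranges are left to the caller because they are not restated here.

```python
import math


def schwefel_1_2(x):
    """Schwefel's Problem 1.2: sum of squared prefix sums."""
    total, prefix = 0.0, 0.0
    for value in x:
        prefix += value
        total += prefix ** 2
    return total


def schwefel_2_22(x):
    """Schwefel's Problem 2.22: sum plus product of absolute values."""
    abs_values = [abs(v) for v in x]
    return sum(abs_values) + math.prod(abs_values)


def sum_squares(x):
    """Sum Squares: squares weighted by their 1-based index."""
    return sum(i * v ** 2 for i, v in enumerate(x, start=1))


def ackley(x):
    """Ackley function with the usual constants 20, 0.2, and 2*pi."""
    d = len(x)
    mean_sq = sum(v ** 2 for v in x) / d
    mean_cos = sum(math.cos(2 * math.pi * v) for v in x) / d
    return -20 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20 + math.e
```

All four functions reach their minimum of 0 at the origin, which gives a quick sanity check before running any optimizer on them.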
In the verification, the moth population consists of 30 individuals, with a maximum of 1000 iterations. The two parameters of the crossover operator, namely the number of front flames $m$ and the number of recombinations, are set to 15 and 5, respectively, which means that the first 15 flame positions after sorting are disordered five times. Each test function was run 20 times, and the results are compared with those obtained from the original MFO, GWO, and PSO optimization algorithms. A summary of the comparison results is shown in Table 1.
The iterative optimization convergence curves of MFO, GWO, and PSO are shown in Figure 2. Additionally, Table 1 shows that the improved MFO algorithm exhibits superior optimization capability compared with the three other algorithms. It achieves the highest optimization accuracy on all four test functions and is able to find the global optimal values. These findings indicate that the improvement to the algorithm was effective.