The paper proposes a classification algorithm to isolate multiple faults. The proposed approach was tested using vibration signals. The isolation of multiple faults is a topic of interest to the diagnostic community, mainly if it can be achieved using only samples with single faults.

In my opinion, the technical clarity of the algorithm description must be improved before the publication (see questions in detailed remarks). The paper also needs language and formatting corrections (see minor remarks).

Detailed remarks:

In my opinion, multiple faults is a more common term than simultaneous faults.
ELM – how are a and b obtained?
Eq. (11) – according to eq. (7) alpha is a vector (so there is no diagonal)
Hierarchical classification: how exactly are second-level classifiers trained? On what data? How do they differ from the first-level classifiers?
Line 204-205 “each classifier is trained by training samples of the ith and jth fault” do you mean samples with both faults present? Please elaborate.
Line 208: pij = 1 – pij – why?
Line 212: is it conditional probability on the occurrence of fault j? If yes, why?
Eq. (19) – why >, not >=? What about the states with only one fault present?
Eq. (20) – “nij represents the amount of samples with the ith and jth fault mode” – with both faults simultaneously?
Line 322 – segments from one sample should be put only in one dataset (train/val/test) to avoid data leakage.
Line 624: “the entire data set is reassigned” – how?

Minor remarks:

Language problems: line 28, line 74-75 unfinished sentence, line 91 un-known, line 407: “tabel”, table 3: “category”, fig. 9 “disgnosis”
Math expressions need formatting (e.g. line 113)
Eq. (18) matrix is not the best label.
Fig. 10 and fig. 11 – there is no need for 3D plots here.
Reference 3 - capitalization

Author Response

Reply to reviewer1’s comments

Manuscript Number: applsci-2133450 (Research Article)

Title: Simultaneous fault diagnosis based on hierarchical multi-label classification and sparse Bayesian extreme learning machine

First, I thank the editor and the reviewer for their patient guidance and professional suggestions, from which I have benefited a great deal. I also appreciate the reviewer's support for and affirmation of the research field. The replies to the comments follow:

Detailed remarks:

In my opinion, multiple faults is a more common term than simultaneous faults.

Reply: Thank you for the professional suggestion. As the reviewer says, multiple fault diagnosis and simultaneous fault diagnosis are both commonly used terms. In my previous studies, I mostly used simultaneous fault diagnosis. The term also appears in the literature, for example:

[1] Yunpu Wu, Weidong Jin, A novel method for simultaneous-fault diagnosis based on between-class learning, Measurement 172 (2021)

[2] Pengfei Liang, Chao Deng, Single and simultaneous fault diagnosis of gearbox via a semi-supervised and high-accuracy adversarial learning framework, Knowledge-Based Systems 198 (2020)

[3] Yanghui Tan, Jundong Zhang, Hui Tian, Multi-label classification for simultaneous fault diagnosis of marine machinery: A comparative study, Ocean Engineering 239 (2021)

[4] Samira Zare, Moosa Ayati, Simultaneous fault diagnosis of wind turbine using multichannel convolutional neural networks, ISA Transactions 108 (2021), 230-239.

If the reviewer agrees, I hope to be allowed to continue using this term.

ELM – how are a and b obtained?

Reply: Thank you for the kind reminder. I had omitted the related description. ELM learning has three steps: (1) randomly generate the input weights a and biases b of the L hidden-layer nodes; (2) calculate the hidden-layer feature mapping matrix H; (3) calculate the output weights β from the Moore-Penrose generalized inverse H⁺ of H. I have added this description to the manuscript.
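The three steps listed in this reply can be sketched in a few lines. This is only an illustrative sketch, not the manuscript's implementation: the sigmoid activation, the standard-normal initialization, the toy data and the names `elm_train` and `elm_predict` are all assumptions made for the example.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    # Step 1: randomly generate input weights a and biases b of the hidden nodes.
    a = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    # Step 2: hidden-layer feature mapping matrix H (sigmoid activation assumed).
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    # Step 3: output weights beta from the Moore-Penrose pseudoinverse of H.
    beta = np.linalg.pinv(H) @ T
    return a, b, beta

def elm_predict(X, a, b, beta):
    # Reuse the fixed random hidden layer, then apply the learned output weights.
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))      # hypothetical feature vectors
T = (X[:, 0] > 0).astype(float)       # hypothetical binary targets
a, b, beta = elm_train(X, T, n_hidden=60, rng=rng)
pred = elm_predict(X, a, b, beta)
```

Only `beta` is learned; `a` and `b` stay at their random values, which is what makes the training a single linear solve.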

Eq. (11) – according to eq. (7) alpha is a vector (so there is no diagonal)

Reply: In Section 2.2, A is a diagonal matrix generated from the hyper-parameter α. It constrains the weights to a zero-mean distribution with small variance, so that only a small fraction of the weights are non-zero, which makes the network sparse.
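The relation between a hyper-parameter vector and the diagonal matrix A can be illustrated as follows. The sketch assumes a zero-mean Gaussian prior on each output weight with variance equal to the reciprocal of its hyper-parameter, which is the usual sparse Bayesian setup but is not spelled out in this reply; the numeric values are made up.

```python
import numpy as np

# Hypothetical hyper-parameter vector alpha, one entry per output weight.
alpha = np.array([0.5, 2.0, 1e6])

# A is the diagonal matrix built from alpha, so A[k, k] == alpha[k].
A = np.diag(alpha)

# Assumed prior: weight k is zero-mean Gaussian with variance 1 / alpha[k].
# A very large alpha[k] pins that weight near zero, which gives sparsity.
prior_variance = 1.0 / alpha
```

So alpha itself is a vector, and the "diagonal" refers to the matrix A constructed from it.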

Hierarchical classification: how exactly are second-level classifiers trained? On what data? How do they differ from the first-level classifiers?

Reply: The manuscript proposes a hierarchical multi-label classification strategy. At the first level, each classifier focuses on identifying one fault type. For a diagnostic problem with m fault types, there are m classifiers [C1, C2, …, Cm], in which Ci recognizes the ith fault type. At the second level, each first-level classifier is divided into several sub-classifiers that can be trained on single-fault samples. For example, the ith first-level classifier Ci is expressed as [Ci1, Ci2, …, Cim] (j≠i), a set of m-1 binary classifiers. Sub-classifier Cij is trained on single-fault samples of the ith and jth fault types. Its output pij is the probability that an unknown sample belongs to the ith rather than the jth fault type.

Based on an information fusion strategy, the probabilistic outputs of the second-level sub-classifiers are fused to obtain the output of the corresponding first-level classifier. Considering the positive correlation between the number of samples of a single fault type and its occurrence probability, the training-set sample size is used as the fusion weight, and the fused result is generated using formula (20).

Line 204-205 “each classifier is trained by training samples of the ith and jth fault” do you mean samples with both faults present? Please elaborate.

Reply: In multiple fault diagnosis, there may be some correlation between any two single fault modes. The traditional SBELM can only solve binary classification problems, and the pairwise scheme is intended to account for the correlation between any two single fault modes. At the second level, each sub-classifier SBELMij distinguishes the ith fault mode from the jth. In other words, the training data of SBELMij are samples of two single fault modes, not samples in which multiple faults occur simultaneously. So SBELMij is trained on single-fault samples belonging to the ith fault mode and single-fault samples belonging to the jth fault mode.

Line 208: pij = 1 – pij – why?

Reply: The manuscript states: “Due to the complementarity between SBELMij and SBELMji, that is pij = 1 – pji”. SBELMij determines the probability that a failure belongs to class i rather than class j. To train SBELMij, samples of the ith fault mode are treated as positive samples and samples of the jth fault mode as negative samples. Likewise, to train SBELMji, samples of the jth fault mode are treated as positive and samples of the ith fault mode as negative. SBELMij and SBELMji are therefore complementary.
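The label swap described in this reply can be checked numerically with a stand-in classifier. The sketch below uses plain logistic regression trained by gradient descent, not SBELM, purely because it is short; the data, the function names and the hyper-parameters are hypothetical. For this stand-in, swapping the positive and negative classes negates the learned weights, so the two outputs sum to 1 up to rounding.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_pairwise(X_pos, X_neg, steps=200, lr=0.1):
    """Stand-in probabilistic binary classifier (logistic regression, not SBELM)."""
    X = np.vstack([X_pos, X_neg])
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Gradient step on the mean logistic loss.
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def predict_proba(w, x):
    # Probability that x belongs to the class that was labelled positive.
    return sigmoid(np.append(x, 1.0) @ w)

rng = np.random.default_rng(0)
X_i = rng.normal(loc=+1.0, size=(30, 2))   # hypothetical single-fault samples, mode i
X_j = rng.normal(loc=-1.0, size=(30, 2))   # hypothetical single-fault samples, mode j

w_ij = train_pairwise(X_i, X_j)   # mode i positive, mode j negative
w_ji = train_pairwise(X_j, X_i)   # roles swapped

x_new = np.array([0.3, -0.2])
p_ij = predict_proba(w_ij, x_new)
p_ji = predict_proba(w_ji, x_new)
```

Because of this symmetry only one classifier per unordered pair needs to be trained in practice; whether the same exact symmetry holds for SBELM depends on its training procedure and is not shown here.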

Line 212: is it conditional probability on the occurrence of fault j? If yes, why?

Reply: SBELM produces a probabilistic output by solving for the predictive distribution of a sample x of unknown category, p(t | x, β'). A fault classification algorithm based on SBELM can therefore provide the conditional probability of each fault mode, and this probabilistic output is used to solve the classification problem. A probabilistic classifier is essential for solving the fault diagnosis problem.

Using a pruning strategy, SBELM applies Bayesian learning, with hyper-parameters associated with the output weights β, to prune some hidden-layer nodes and thereby obtain a sparse representation of the hidden layer. For an unknown sample x, SBELMij predicts the probability that x belongs to category t_i, which is the conditional probability of category t_i.

Eq. (19) – why >, not >=? What about the states with only one fault present?

Reply: Thank you for the professional guidance. In this manuscript, the output probability vector p = [p₁, ..., p_m] of the proposed C-SBELM-based model is obtained by fusing the probability outputs of the individual sub-classifiers SBELM_ij. Decision-level fusion combines the outputs p_ij (j=1,2,…,m, j≠i) of the sub-classifiers SBELM_ij into the global classification result p_i, which reflects the occurrence probability of the ith fault type, that is, the output probability of the classifier C-SBELM_i. For each single fault type, (m-1) sub-classifiers SBELM_ij (j=1,2,…,m, j≠i) are constructed to identify the ith fault type.

There are several available methods for the pairwise strategy, which are, however, unsuitable for simultaneous fault diagnosis because of the constraint Σpi = 1. After considering the reviewer's question, the team re-examined the issue, and it is more appropriate to correct the text as follows: for simultaneous fault diagnosis, Σpi is not necessarily equal to 1.

Eq. (20) – “nij represents the amount of samples with the ith and jth fault mode” – with both faults simultaneously?

Reply: To take the pairwise correlation between a pair of fault modes into account, nij is the number of single-fault samples with either the ith or the jth fault mode, not the number of simultaneous-fault samples. I did not describe this clearly. The text is corrected as follows: in which nij represents the number of samples with either the ith or the jth fault mode that are used to train classifier SBELMij.
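A sample-size-weighted fusion of the pairwise outputs can be sketched as below. The exact form of formula (20) is in the manuscript and is not reproduced in this reply, so the weighted average used here is an assumption based on the description "the sample size of training set is taken as weight coefficient of fusion"; the function name `fuse_pairwise` and all numbers are made up.

```python
import numpy as np

def fuse_pairwise(P, N):
    """Fuse pairwise outputs: P[i, j] = p_ij and N[i, j] = n_ij for j != i."""
    m = P.shape[0]
    off_diag = ~np.eye(m, dtype=bool)            # exclude the j == i entries
    weights = np.where(off_diag, N, 0.0)
    weighted = np.where(off_diag, N * P, 0.0)
    # Assumed fusion rule: p_i = sum_j n_ij * p_ij / sum_j n_ij over j != i.
    return weighted.sum(axis=1) / weights.sum(axis=1)

# Hypothetical outputs of the sub-classifiers for one unknown sample (m = 3).
P = np.array([[0.0, 0.9, 0.8],
              [0.1, 0.0, 0.6],
              [0.2, 0.4, 0.0]])
# Hypothetical numbers of single-fault training samples per pair.
N = np.array([[0.0, 200.0, 300.0],
              [200.0, 0.0, 100.0],
              [300.0, 100.0, 0.0]])

p = fuse_pairwise(P, N)
```

In this made-up example the fused values are 0.84, about 0.267 and 0.25, which do not sum to 1; each p_i would then be compared with its own decision threshold.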

Line 322 – segments from one sample should be put only in one dataset (train/val/test) to avoid data leakage.

Reply: Thank you for the kind reminder. I did not describe this clearly. The text is corrected as follows: the remaining single-failure samples and simultaneous-failure samples are divided into two subsets to generate the validation set and the testing set.
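The reviewer's point about keeping all segments of one sample in a single subset can be illustrated with a split that assigns whole samples, not individual segments. This is a generic sketch, not the procedure used in the manuscript; the function name `split_by_sample`, the 60/20/20 fractions and the toy sample identifiers are assumptions.

```python
import numpy as np

def split_by_sample(sample_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return segment indices per subset, keeping each sample's segments together."""
    rng = np.random.default_rng(seed)
    unique_ids = rng.permutation(np.unique(sample_ids))
    n_train = int(fractions[0] * len(unique_ids))
    n_val = int(fractions[1] * len(unique_ids))
    groups = {
        "train": unique_ids[:n_train],
        "val": unique_ids[n_train:n_train + n_val],
        "test": unique_ids[n_train + n_val:],
    }
    # Map each subset's sample ids back to the indices of their segments.
    return {name: np.flatnonzero(np.isin(sample_ids, ids))
            for name, ids in groups.items()}

# Hypothetical data: 10 recorded samples, each cut into 5 segments.
sample_ids = np.repeat(np.arange(10), 5)
splits = split_by_sample(sample_ids)
```

Splitting on `sample_ids` instead of on segment indices guarantees that no recording contributes segments to more than one subset.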

Line 624: “the entire data set is reassigned” – how?

Reply: Thank you for the kind reminder. To further verify the effectiveness of the proposed intelligent fault diagnosis model, the model is run for 100 trials. In each trial, the entire data set is reassigned to obtain a new training set, threshold-determination set and testing set, so a training sample in one trial may be a testing sample in the next. The text is corrected as follows: In each trial, the entire data set is shuffled and reassigned to obtain a different training set, threshold-determination set and testing set.
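The per-trial reassignment described in this reply can be sketched as a fresh random permutation in every trial. The set sizes, the function name `reassign` and the seed are hypothetical; the reply only states that there are 100 trials and three sets.

```python
import numpy as np

def reassign(n_samples, n_train, n_threshold, rng):
    """Shuffle all indices and cut them into training, threshold and testing sets."""
    order = rng.permutation(n_samples)
    train = order[:n_train]
    threshold = order[n_train:n_train + n_threshold]
    test = order[n_train + n_threshold:]
    return train, threshold, test

rng = np.random.default_rng(0)
# 100 trials, each with its own shuffled assignment of the same 100 indices.
trials = [reassign(n_samples=100, n_train=60, n_threshold=20, rng=rng)
          for _ in range(100)]
```

Within a trial the three sets are disjoint, while across trials the same index can move between sets, which matches the statement that a training sample in one trial may be a testing sample in the next.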

Minor remarks:

Language problems: line 28, line 74-75 unfinished sentence, line 91 un-known, line 407: “tabel”, table 3: “category”, fig. 9 “disgnosis”

Reply: Thanks for the reviewer’s kind reminder. I have corrected the above issues and examined the paper throughout.

Math expressions need formatting (e.g. line 113)

Reply: Thanks for the reviewer’s kind reminder. I have corrected the above issues and examined other expressions.

Eq. (18) matrix is not the best label.

Reply: Thanks for the reviewer’s kind reminder. I have revised some of the descriptions and details in the manuscript.

Fig. 10 and fig. 11 – there is no need for 3D plots here.

Reply: Thanks for the reviewer’s kind reminder.

Reference 3 – capitalization

Reply: Thanks for the reviewer’s kind reminder. I have corrected the above issues.

Additionally, some details of the proposed methods were not clearly described in the manuscript and there were some spelling mistakes; I have revised the descriptions, the English language and the style of the manuscript.

Finally, I thank the editor and the reviewer again for helping to improve the overall quality of this paper. The reviewer's comments and suggestions are very instructive, and I have benefited a great deal from them.

Best wishes!

Qing Ye

2023/2

Author Response File: Author Response.docx

Reviewer 2 Report

The paper is well drafted.

The method is not novel, but its use in the suggested application domain is innovative.

The performance of the classifier is good, as the number of samples is quite low. A sufficient number of samples needs to be collected and the experiments repeated with the suggested model to assess its performance.

Author Response

Reply to reviewer2’s comments

The following are the replies to comments:

The paper is well drafted.

The method is not novel, but its use in the suggested application domain is innovative.

The performance of the classifier is good, as the number of samples is quite low. A sufficient number of samples needs to be collected and the experiments repeated with the suggested model to assess its performance.

Reply: Thank you for the professional advice. In this research, all experiments are carried out on a real dataset collected from an actual test bench through destructive operations in the production workshop. The data collected on the main reducer test bench contain 500 normal samples, 3000 single-failure samples and 2500 simultaneous-failure samples. The relevant literature also shows that many mechanical fault diagnosis studies use datasets no larger than the one in this paper.

As the reviewer says, the number of samples is limited. Under our current experimental conditions, it is hard to collect large numbers of simultaneous-fault samples. When data augmentation is used to artificially enlarge the dataset, the performance improvement is not considerable and it is difficult to exceed the current best accuracy, while the model execution time increases, which is unsuitable for real-time diagnosis. It is possible that, due to the limited size of the model, excessive data may lead to over-fitting. However, following the reviewer's suggestion, we have decided to further adapt and optimize the model in the next stage to meet the needs of larger datasets.

Additionally, some details of the proposed methods were not clearly described in the manuscript and there were some spelling mistakes; I have revised some of the descriptions and details in the manuscript.

Best wishes!

Qing Ye

2023/2

Author Response File: Author Response.docx

Reviewer 3 Report

The subject matter is interesting and worth publishing in AS. However, I have a few minor comments. The quality of the drawings is poor, and it is difficult to read what is on the axes in the reviewed file. This requires correction in the final version of the article: in Fig. 4 it is difficult to make out what is in the drawings; in Fig. 6 a fragment of the text has been cut off; in Fig. 12 the descriptions on the axes are too small. Figure captions are written inconsistently (Fig. X.): spaces and dots are missing. My comments are minor and do not affect the positive assessment of the work.

Author Response

Reply to reviewer3’s comments

The following are the replies to comments:

Reply: Thank you for the professional advice. I have corrected the above issues, including the clarity of the figures, the figure captions and the missing text.

Best wishes!

Qing Ye

2023/2

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Most of my questions were answered, and the description was clarified.
Formatting and language were also improved.
I do not have any further comments.