Article
Peer-Review Record

Improved Ensemble Learning for Wind Turbine Main Bearing Fault Diagnosis

Appl. Sci. 2021, 11(16), 7523; https://doi.org/10.3390/app11167523
by Mattia Beretta 1,2, Yolanda Vidal 3,4, Jose Sepulveda 2, Olga Porro 2 and Jordi Cusidó 2,5,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 6 July 2021 / Revised: 6 August 2021 / Accepted: 12 August 2021 / Published: 17 August 2021
(This article belongs to the Special Issue Boosting Wind Power Integration)

Round 1

Reviewer 1 Report

The manuscript proposed an Ensemble Learning-based method for the monitoring objective of wind turbine bearing. The proposed method seems to be interesting, but the authors should clarify some points to improve it.

 

The sentence “it is an unsupervised approach, thus it does not need historical fault data to be constructed;” - line 104. So, what is the role of historical SCADA in the proposed ANN training?

 

Concerning Normal behavior model establishment based on ANN with the output being the temperature of the low-speed shaft, why the chosen input variables need to consist of temperatures of components surrounding at time t-1? What is the advantage of using ANN rather than calculating the output value directly from the temperatures of components surrounding based on heat transfer? Furthermore, the complexity of the ANN (the number of neurons in layers) seems not to be appropriate with the amount of training data samples (over 1 year, the sampling rate is 10-min period).

 

The proposed monitoring method for the objective of the main bearing depends on many other component states. Is it still reliable when these components also get faulty?

 

Regarding the formula in step 3 of algorithm 1, if assuming that all values of i_a,w (also for i_n,w) are equal after a rolling window size time, then the value of i_e,wt will always be 1 > DT (consistently predict as require maintenance). Please explain about these cases.

 

The final output is a binary decision for the main bearing maintenance (requiring maintenance or not) really depending on the decision threshold (DT). Is there any fundamental basis to determine this important value?

Author Response

The manuscript proposed an Ensemble Learning-based method for the monitoring objective of wind turbine bearing. The proposed method seems to be interesting, but the authors should clarify some points to improve it.

Author’s reply: The authors thank the reviewer for the positive general comments and will try to answer the points that need to be clarified.

The sentence “it is an unsupervised approach, thus it does not need historical fault data to be constructed;” - line 104. So, what is the role of historical SCADA in the proposed ANN training?

Author’s reply: The proposed methodology does not require fault data, but, as stated by the reviewer, it requires historical SCADA data. The authors agree that the phrase “it is an unsupervised approach, thus it does not need historical fault data to be constructed;” was not clear in this sense. In the revised manuscript, the aforementioned phrase has been replaced by the following one:

i) it is an unsupervised approach, thus there is no need that the specific studied fault happened in the past to train the proposed models.

Concerning Normal behavior model establishment based on ANN with the output being the temperature of the low-speed shaft, why the chosen input variables need to consist of temperatures of components surrounding at time t-1? What is the advantage of using ANN rather than calculating the output value directly from the temperatures of components surrounding based on heat transfer? Furthermore, the complexity of the ANN (the number of neurons in layers) seems not to be appropriate with the amount of training data samples (over 1 year, the sampling rate is 10-min period).

Author’s reply: The authors thank the reviewer for these valuable comments. Clearly, the initial manuscript did not comprehensively explain the details of the proposed ANN.

First, regarding the question “why the chosen input variables need to consist of temperatures of components surrounding at time t-1?”, the short answer is that it is not necessary. Other time lags (involving not only t and t-1 but also t-2, t-3, t-4, ...) could be studied in future work. However, increasing the number of lags increases the number of inputs and the complexity of the model, which is why only t and t-1 were used in this work.
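
The trade-off between the number of lags and the number of inputs can be illustrated with a minimal sketch. The signal names (`gearbox_temp`, `ambient_temp`), the sample values, and the helper `make_lagged_rows` are hypothetical; this is not the authors' preprocessing code, only an illustration of how each additional lag adds one input per signal.

```python
from typing import Dict, List


def make_lagged_rows(signals: Dict[str, List[float]], n_lags: int = 1) -> List[Dict[str, float]]:
    """Build one input row per time step t from values at t, t-1, ..., t-n_lags.

    The first n_lags time steps are dropped because their history is incomplete.
    """
    length = min(len(values) for values in signals.values())
    rows = []
    for t in range(n_lags, length):
        row = {}
        for name, values in signals.items():
            for lag in range(n_lags + 1):
                key = name if lag == 0 else f"{name}_t-{lag}"
                row[key] = values[t - lag]
        rows.append(row)
    return rows


# Hypothetical 10-minute SCADA temperatures (degrees Celsius).
signals = {
    "gearbox_temp": [60.0, 61.0, 63.0, 62.0],
    "ambient_temp": [15.0, 15.5, 16.0, 16.0],
}

rows = make_lagged_rows(signals, n_lags=1)
print(rows[0])
# The number of inputs per row is signals * (n_lags + 1).
print(len(rows[0]), len(make_lagged_rows(signals, n_lags=3)[0]))
```

With two signals, one lag yields 4 inputs per row and three lags yield 8, which is the growth in model inputs the reply refers to.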

Second, regarding the question “What is the advantage of using ANN rather than calculating the output value directly from the temperatures of components surrounding based on heat transfer?”, the main reasons are that heat transfer methods are rather complicated, depend heavily on the specific characteristics of the components, and are not easily scalable. Thus, this work adopts a data-based strategy that avoids the need for explicit physical modeling of the turbine components.

Finally, regarding the complexity of the ANN, recall that the Bayesian selection of the regularization parameters (used in this work) provides insight into the effective number of parameters actually used by the ANN (which is extremely useful to design the size of the network). The following phrase has been added to the revised manuscript to clarify this issue:

In this work a value of 1058 effective number of parameters is obtained from a total of 1153 parameters in the proposed network (number of weights and biases), thus the complexity of the ANN is appropriate for the used training dataset.
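
As a rough plausibility check of the figures quoted above, the following sketch relates the parameter counts to the number of 10-minute samples in one year. The sample count assumes one full, gap-free year of data for a single turbine, which the source does not state; the actual training set size after filtering may differ.

```python
# Figures quoted in the reply.
effective_params = 1058
total_params = 1153

# Assumption: one full, gap-free year of 10-minute SCADA samples for one turbine.
samples_per_day = 24 * 60 // 10            # 144 samples per day
samples_per_year = 365 * samples_per_day   # 52,560 samples

effective_fraction = effective_params / total_params
samples_per_param = samples_per_year / total_params

print(f"effective fraction of parameters: {effective_fraction:.3f}")
print(f"samples per parameter (one turbine-year): {samples_per_param:.1f}")
```

Under this assumption, roughly 92% of the parameters are effectively used, and there are about 46 samples per parameter for a single turbine-year.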

The proposed monitoring method for the objective of the main bearing depends on many other component states. Is it still reliable when these components also get faulty?

Author’s reply: On the one hand, the ANN strategy depends on several inputs, and when a related component has a fault, this will lead to a higher value of the normal behavior model indicator. This indicates an abnormal behavior, but one that is not strictly related to the main bearing. On the other hand, the anomaly detection strategy focuses on very few variables (rotor temperature, ambient temperature and main bearing), thus when the anomaly indicator is high the fault is highly related to the main bearing. In summary, the isolation forest adds reliability to the ensemble in the sense that when an alarm is triggered, it is more likely to be related to the main bearing fault.

Regarding the formula in step 3 of algorithm 1, if assuming that all values of i_a,w (also for i_n,w) are equal after a rolling window size time, then the value of i_e,wt will always be 1 > DT (consistently predict as require maintenance). Please explain about these cases.

Author’s reply: Thanks to the reviewer’s comment, we realized that there was an error in the algorithm definition. In the new version of the manuscript, at step 2 of the algorithm, the calculation of x_{ensemble} is defined as the sum of the percentiles obtained by a given turbine along the period of analysis for the normality and anomaly indicators.

The value of x_{max-ensemble} has also been updated, as the previous definition was misleading and partially incorrect. The maximum value is calculated as the maximum theoretical value that a turbine can reach during the period of observation. Therefore, it has been redefined as follows: x_{max-ensemble} = p (x_{max-normality} + x_{max-anomaly}). Given that we are dealing with percentiles, i.e. the maximum value that x_{anomaly} and x_{normality} can assume is 1, the theoretical maximum is equal to 2p, where p is the length of the observation period.
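
The revised definition can be sketched as follows. The per-period percentile values and the threshold are made up, and dividing the ensemble sum by its theoretical maximum 2p before comparing with the decision threshold DT is an assumption inferred from the reviewer's remark that the indicator can reach 1; the revised manuscript should be consulted for the exact rule.

```python
from typing import Sequence


def ensemble_indicator(normality_pct: Sequence[float], anomaly_pct: Sequence[float]) -> float:
    """Sum the per-period percentiles of both indicators and scale by the maximum 2p."""
    if len(normality_pct) != len(anomaly_pct):
        raise ValueError("both indicators must cover the same observation period")
    p = len(normality_pct)  # length of the observation period
    if p == 0:
        raise ValueError("the observation period must not be empty")
    x_ensemble = sum(normality_pct) + sum(anomaly_pct)
    x_max_ensemble = p * (1.0 + 1.0)  # each percentile is at most 1, so the maximum is 2p
    return x_ensemble / x_max_ensemble


# Hypothetical percentiles of one turbine over p = 4 periods.
normality = [0.9, 1.0, 0.8, 1.0]
anomaly = [0.7, 0.9, 1.0, 1.0]

DT = 0.85  # hypothetical decision threshold
score = ensemble_indicator(normality, anomaly)
needs_maintenance = score > DT
print(round(score, 4), needs_maintenance)
```

A turbine reaches the value 1 only if it holds the top percentile for both indicators in every period, which is why the handling of ties discussed next matters.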

Finally, to address the problem arising from ties between turbines, a ranking scheme that assigns the minimum value to the tying group is used. An example of the various ranking schemes that were considered prior to choosing the minimum is provided below.

Given the following set of values, in which turbines are sharing the same value,

wt    indicator
01    0.5
02    0.5
03    0.5
04    0.5

the following ranking schemes are studied:

1. ‘min’: lowest rank in the group
2. ‘max’: highest rank in the group
3. ‘mean’: average rank of the group
4. ‘first’: ranks assigned in the order they appear in the array
5. ‘dense’: like ‘min’, but rank always increases by 1 between groups

obtaining the following results:

wt    indicator    rankMin    rankMax    rankMean    rankFirst    rankDense
01    0.5          0.25       1.0        0.625       0.25         1.0
02    0.5          0.25       1.0        0.625       0.50         1.0
03    0.5          0.25       1.0        0.625       0.75         1.0
04    0.5          0.25       1.0        0.625       1.00         1.0

Taking into consideration the characteristics of the main bearing, i.e. a low failure rate that makes it unlikely for multiple turbines to fail at the same time, the chosen ranking method is the “minimum” scheme. In fact, this scheme assigns low values in case of ties, making it unlikely to raise alarms when turbines are tied.

The following sentence has been added to the revised manuscript to clearly state the ranking scheme to use when assigning the score to turbines:

Possible ties between turbines are managed using a ranking scheme that assigns to the tying turbines the lowest rank of the group.
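
The tie-handling schemes compared in this reply can be sketched in plain Python. The function `pct_rank` is a hypothetical helper written for illustration, not the authors' implementation; scaling ‘dense’ ranks by the number of distinct values (rather than by the number of turbines) is an assumption chosen because it reproduces the rankDense values reported in the reply.

```python
from typing import List


def pct_rank(values: List[float], method: str = "min") -> List[float]:
    """Rank values in ascending order and scale the ranks to (0, 1].

    Ties are resolved by `method`: 'min', 'max', 'mean', 'first' or 'dense'.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable: keeps input order in ties
    distinct = sorted(set(values))
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        smaller = sum(v < values[i] for v in values)  # items strictly below this value
        equal = values.count(values[i])               # size of the tie group
        if method == "min":
            rank = smaller + 1
        elif method == "max":
            rank = smaller + equal
        elif method == "mean":
            rank = smaller + (equal + 1) / 2
        elif method == "first":
            rank = position
        elif method == "dense":
            rank = distinct.index(values[i]) + 1
        else:
            raise ValueError(f"unknown method: {method}")
        ranks[i] = rank
    # Assumption: 'dense' ranks are scaled by the number of distinct values, others by n.
    divisor = len(distinct) if method == "dense" else n
    return [r / divisor for r in ranks]


indicator = [0.5, 0.5, 0.5, 0.5]  # four turbines sharing the same value
for method in ("min", "max", "mean", "first", "dense"):
    print(method, pct_rank(indicator, method))
```

With ‘min’, all four tied turbines receive 0.25, so a fleet-wide tie cannot push any turbine toward a high decision threshold, whereas ‘max’ and ‘dense’ would assign every tied turbine the top score.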

The final output is a binary decision for the main bearing maintenance (requiring maintenance or not) really depending on the decision threshold (DT). Is there any fundamental basis to determine this important value?

Author’s reply: As stated by the reviewer, the outcome of the described methodology is a binary decision, thus the choice of the decision threshold (DT) is crucial. The choice of this parameter is influenced by various factors, including the cost of scheduling a maintenance task, the cost of the component to substitute, the cost of missed production, etc. The original manuscript introduces the problem of setting an appropriate DT around line 399, where it is mentioned that higher values of DT generally result in better performance, as most of the false positives are filtered out. Moreover, Figure 7 and Table 6 provide a complete overview of how the choice of DT affects the number of FP, FN, TP, and TN. As not enough information was available to set up a complete economic analysis and optimization of DT, we decided to investigate the relation between performance indicators and DT. Nonetheless, optimizing DT by taking into consideration costs and other operative constraints is a very important line of investigation that should be pursued in future research. A summary of these considerations is provided in the conclusion at lines 443-446 of the original manuscript.
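
The kind of relation between DT and the confusion counts described in this reply can be illustrated with a minimal sketch. The scores, the ground-truth labels, and the candidate thresholds are hypothetical and are not taken from Figure 7 or Table 6 of the manuscript.

```python
from typing import Dict, List


def confusion_counts(scores: List[float], faulty: List[bool], dt: float) -> Dict[str, int]:
    """Count TP, FP, FN and TN when an alarm is raised for score > dt."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, is_faulty in zip(scores, faulty):
        alarm = score > dt
        if alarm and is_faulty:
            counts["TP"] += 1
        elif alarm and not is_faulty:
            counts["FP"] += 1
        elif not alarm and is_faulty:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical ensemble scores and ground truth for six turbines.
scores = [0.95, 0.88, 0.80, 0.60, 0.40, 0.20]
faulty = [True, False, True, False, False, False]

for dt in (0.5, 0.7, 0.9):
    print(dt, confusion_counts(scores, faulty, dt))
```

In this toy data, raising DT from 0.5 to 0.9 removes both false positives but also turns one true positive into a false negative, which is the trade-off an economic optimization of DT would have to weigh.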

Finally, we would like to thank the reviewer for the valuable feedback and the time taken to review the paper. We truly understand the concerns of the reviewer. However, we hope that after the response to the reviewer’s comments together with the changes made to improve the revised manuscript, he/she could accept it for publication.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

In this very interesting scientific paper, an ensemble method for main bearing fault diagnosis has been proposed, implemented, and validated on a real under-production wind park composed of 18 wind turbines.
The paper presents the available SCADA data and work order logs used in this work. Next, it states the proposed ensemble methodology for wind turbine main bearing fault diagnosis, including a comprehensive description of the single models and their indicators. Then it presents the results and their discussion to interpret and describe the significance of the ensemble method in comparison to the single models by themselves. Finally, conclusions are drawn, and future work is proposed.

The research design is appropriate. The methods and results are adequately described. The results are clearly presented. The conclusions are supported by the results.

The reference to Table 6 is far from the table (two pages away).

Plagiarisms were detected in chapter 3.1.2., chapter 4.3 without citations.

Minor spell check is required.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

This manuscript is interesting, concise, and clearly written. The introduction presents relevant literature as it relates to the research area.

The method is appropriate and clearly described, but further clarification of the rationale for using these methods in WT fault prediction is needed.

Data acquisition and usage are clearly expressed, and compelling results have been presented and discussed clearly. The conclusion has also highlighted the outcome of the study and recommendations to wind park operators. Specific comments to be addressed by the author include:

1. Clarify how the work order and operation data are used effectively in this study.

2. p.2, line 50-51: the statement should be made clearer.

3. p.2, line 60: State why unsupervised approaches are preferred for SCADA prediction maintenance.

4. p. 5, line 194: Please change the word "arise" to "raise".

5. p. 8, line 279: An in-text citation should be placed directly after the name Liu et al.

6. p. 10, line 354-355: Rewrite the statement to clearly show that the criteria referred to are what is discussed afterwards.  

Some recommendations to the author to be included in the introduction:

Discuss the types of bearings used in wind turbines, their lifespan, the failures they are likely to encounter, and how these can be captured.

Describe the major function of the bearing in the wind turbine.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The paper has been improved. 
