Condition Forecasting of a Power Transformer Based on an Online Monitor with EL-CSO-ANN

Fan, Jingmin; Shao, Huidong; Cao, Yunfei; Feng, Lutao; Chen, Jianpei; Meng, Anbo; Yin, Hao

doi:10.3390/en15228587

by Jingmin Fan ¹, Huidong Shao ², Yunfei Cao ¹, Lutao Feng ¹, Jianpei Chen ¹, Anbo Meng ¹ and Hao Yin ¹,*

¹ School of Automation, Guangdong University of Technology, Guangzhou 510012, China

² Guangdong Tianlian Electric Power Design Co., Ltd., Guangzhou 510700, China

* Author to whom correspondence should be addressed.

Energies 2022, 15(22), 8587; https://doi.org/10.3390/en15228587

Submission received: 18 September 2022 / Revised: 28 October 2022 / Accepted: 8 November 2022 / Published: 16 November 2022

(This article belongs to the Special Issue Power System Fault Diagnosis and Maintenance)

Abstract:

Power transformers are vital to the power grid and discovering the latent faults in advance is helpful for avoiding serious problems. This study addressed the problem of forecasting and diagnosing the faults of power transformers with small dissolved gas analysis (DGA) data samples that arise from faults in transformers with low occurrence rates. First, an online monitor that was developed in our previous work was applied to obtain the DGA data. Second, the ensemble learning (EL) of a bagging algorithm with bootstrap resampling was used to deal with small training samples. Finally, a criss-cross-optimized neural network (i.e., CSO-NN) was applied to the short-term prediction of the DGA data, based on which the transformer status could be forecasted. The case studies showed that the proposed EL-CSO-NN algorithm integrated into the monitor was capable of achieving satisfactory classification and prediction accuracy for transformer fault forecasting.

Keywords:

online DGA monitor; ensemble learning; CSO-NN; fault diagnosis; faults prediction

1. Introduction

It is well-known that an oil-immersed transformer is the most expensive component in the power grid and the failure of a transformer will result in a widespread blackout and incalculable economic losses [1,2]. Hence, it is important to recognize the condition of the transformer and perform the proper maintenance before its failure [3]. Over the past decades, DGA has been largely used for incipient transformer fault identification [4]. However, these offline DGA tests are not always effective due to sudden transformer faults.

In recent years, with the development of sensor and computation technology, online monitoring and its corresponding artificial intelligence (AI) algorithms, such as the support vector machine (SVM) [5,6], the back propagation neural network (BPNN) [7,8] and their improved variants, have been applied to the status evaluation of power transformers. Although AI algorithms can establish a complex and nonlinear relationship between transformer faults and the feature gases, a large number of samples are needed for the training process [9] to improve the diagnosis accuracy. In practice, however, the fault occurrence rate of an individual transformer is so low that only a few historical samples can be collected by an online monitor, which makes the training process difficult. Furthermore, most of the existing AI algorithms are only utilized for fault classification, whereas predicting the condition of a power transformer is more important, since discovering latent faults in advance is more helpful for avoiding serious transformer problems.

To address diagnosis with small samples and transformer status forecasting, a new hybrid model combining the advantages of EL and CSO-NN, i.e., EL-CSO-NN, was proposed in this study for transformer condition evaluation and prediction, based on the online DGA monitoring system that was developed in our previous work [10]. The criss-cross optimization (CSO) algorithm, a widely used heuristic optimization algorithm, was proposed in [11] by Meng et al. For example, Xiangang Peng et al. applied CSO to address an ODGA problem regarding interacting operators [12]. In this study, the CSO algorithm was used to optimize the neural network that predicts the DGA data, from which transformer faults are forecasted.

The contributions of this work are as follows: (1) ensemble learning (EL) via the bagging algorithm [13] was applied to transformer fault diagnosis for the first time to deal with small samples that usually have an unbalanced distribution, and (2) on the basis of our newly developed CSO algorithm [14], the CSO-optimized artificial neural network (i.e., CSO-NN), which circumvents the tendency of the backpropagation (BP) algorithm to get stuck in local minima, was utilized for DGA data prediction for the first time to forecast the condition of power transformers. In the new model, EL was used for transformer fault classification, while the CSO-NN was used to forecast the DGA series data. To validate the effectiveness of the hybrid model, verification was performed, and it was found that EL was superior to other methods in terms of diagnosis accuracy when the training data set was smaller than the test data set. Combined with the CSO-NN prediction method, the forecasting results showed that the proposed EL-CSO-NN model had advantages in terms of fault type classification and power transformer condition prediction. All of these are elaborated on in this paper.

2. Framework of the Online DGA System

Figure 1 shows the framework of the online DGA system, which focuses on dissolved gas measurement and is used to monitor the transformer status. The whole online DGA monitoring system contains the following units: oil sampling, gas extraction, gas separation, gas detection, data sampling, data analysis and fault diagnosis. A controlling unit is used to control and connect the other parts. Each component involved is discussed in this section.

As the gases are generated and dissolved in the transformer oil, the feature gases are extracted from the oil for GC analysis. The gas extraction module comprises the integration of oil sampling and gas extraction units. The feature gases are mixed after the extraction is done. Then, chromatographic separation is performed on the mixed gas through the separation unit.

After separation, the target gases need to be detected. The SOFC detector designed in Ref. [15] was adopted for the online monitor. Although the SOFC sensor is mostly used as an oxygen sensor, it also shows sensitivities to H₂, C₂H₄, C₂H₆, C₂H₂ and CO since they easily react with oxygen. Furthermore, the SOFC sensor only needs nitrogen as a carrier gas. Therefore, the SOFC has huge advantages of high sensitivity and low cost.

Within the online DGA monitoring system, the data analysis and fault prediction unit based on the EL-CSO-NN was implemented. Hence, based on the results of the measurement, the transformer status and fault type can be automatically evaluated. It makes it possible to distinguish and predict the fault types and give suggestions to the maintainers for further actions. In the following sections of this paper, the EL-CSO-NN algorithms for transformer fault diagnosis and forecasting are detailed.

3. Ensemble Learning for Transformer Fault Diagnosis

3.1. Bagging Algorithm

Ensemble learning [13] with bagging is a machine learning method in which many classifiers are integrated to obtain better performance than a single one. Several models are trained, and their outputs on the test samples are averaged; this model combination and averaging strategy is expected to make more accurate predictions than any individual model. There are two types of ensemble learning methods, i.e., heterogeneous and homogeneous methods, which are distinguished according to the type of base classifiers. In heterogeneous EL, various types of base classifiers are integrated. In contrast, in homogeneous EL, all of the base classifiers are of the same type, although their parameters are distinct from each other. The base classifiers can be selected from decision trees [16], artificial neural networks, k-nearest neighbors, etc.

Two conditions must hold for the ensemble to achieve a better performance. On one hand, the classification error of each base classifier should be less than 50%; otherwise, the final error rate of the results will be increased. On the other hand, the base classifiers should differ noticeably from each other: if their outputs are similar, the ensemble results would not be improved compared with the decisions made by a single classifier. This technique is effective because the variances of these algorithms on the same test data set are different. The variance of a high-variance algorithm can be reduced using the general procedure of bootstrap aggregation (bagging), for which a classification and regression tree (CART) [16] is the most frequently used base learner. A CART uses the Gini index minimization criterion for feature selection and takes the feature with the smallest Gini index as the splitting attribute. Based on this, each internal node is split into two child nodes by using a binary recursive splitting method, which forms a binary tree with a simple structure.
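The Gini-based splitting rule described above can be illustrated with a minimal sketch. It assumes a single numeric feature and the standard Gini impurity 1 − Σ p_k²; the function names `gini` and `best_split` and the toy values are hypothetical and are not taken from the original implementation. The code is written for Python 3.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between two equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical normalized gas fractions with two fault labels.
threshold, impurity = best_split([0.1, 0.2, 0.8, 0.9], ["PD", "PD", "T2", "T2"])
```

A full CART repeats this search over all features at every node and recurses on the two child nodes.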

In this study, transformer fault diagnosis was essentially treated as a classification problem. Hence, bagging of the CART algorithm was suitable, and it worked as follows: the samples from the prepared DGA data set were resampled to create different sub-samples, and a CART model was trained on each sub-sample. Then, for a new test data set, the diagnosis was performed with each model. The transformer faults were classified into five types: according to the energy level, discharge with high energy (high-intensity arcing), discharge with low energy and partial discharge (low-intensity discharge); and according to the temperature level, low- and medium-temperature thermal fault (<700 °C) and high-temperature thermal fault (>700 °C). All of the outputs from the trained CART models were collected, with the most frequent class being taken as the output class.

When bagging decision trees, overfitting of the training data by an individual tree is less of a concern. Considering the efficiency and to maintain the variability of the base classifiers, the generated trees were not pruned. Each classification tree kept branching until the overall impurity became optimal. Therefore, in the absence of pruning, the trees usually grew very deep and had high variance as well as low bias. The advantage of bagging decision trees is that only a few parameters are needed, i.e., the numbers of samples and trees. As the number of classifiers increases, training takes longer and the efficiency is reduced. However, it is worth mentioning that the ensemble does not overfit the training data, i.e., it does not perform excellently on the training set but poorly on the test set. Bagging can be applied to both classification and regression problems.
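The resample, train and vote procedure can be sketched as follows. To keep the example short and free of external libraries, the base learner is a nearest-centroid classifier used as a stand-in, not a CART; the function names, the toy records and the default settings are all hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, size, rng):
    """Draw `size` records from `data` with replacement."""
    return [rng.choice(data) for _ in range(size)]


def majority_vote(predictions):
    """Return the most frequent label among the base classifiers' outputs."""
    return Counter(predictions).most_common(1)[0][0]


def train_nearest_centroid(samples):
    """Stand-in base learner (NOT a CART): one centroid per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def bagging_fit(data, n_estimators, sample_rate, seed=0):
    """Train one base learner per bootstrap sub-sample."""
    rng = random.Random(seed)
    size = max(1, int(sample_rate * len(data)))
    return [
        train_nearest_centroid(bootstrap_sample(data, size, rng))
        for _ in range(n_estimators)
    ]


def bagging_predict(models, x):
    """Fuse the base learners' labels by simple (unweighted) voting."""
    return majority_vote([model(x) for model in models])


# Hypothetical two-feature toy records: (features, label).
toy = [((0.8 + 0.01 * k, 0.2), "PD") for k in range(10)]
toy += [((0.2, 0.8 + 0.01 * k), "T2") for k in range(10)]
models = bagging_fit(toy, n_estimators=25, sample_rate=0.75)
print(bagging_predict(models, (0.9, 0.1)))
```

Swapping `train_nearest_centroid` for an unpruned decision-tree learner would bring the sketch closer to the bagged CART described in the text.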

3.2. Structure of Fault Diagnosis

The structure of ensemble learning with the bagging algorithm for transformer diagnosis based on DGA is shown in Figure 2. A series of classifiers (called base classifiers) were trained via ensemble learning on the DGA samples, and their output classification results were fused to obtain a better result when the classified transformer had a fault. The generalization capability of the ensemble classifier for fault classification could be improved by utilizing the decisions of these base classifiers.

Obtaining a series of diverse base classifiers and integrating their output results are essential to ensemble learning. To construct an ensemble classifier in this study, a series of training subsets originating from the DGA training set (i.e., 90 sets, 45% of the whole data set) were used to train different classifiers. The base classifier used was a decision tree, and the number of trees was set to 175. For each base estimator, 75% of the samples were drawn randomly from the training data set. That is to say, the sub-training sets in Figure 2 were extracted from the training data set. Hence, there were 175 sub-sample sets, and 175 base classifiers were obtained via training.

After training, each classifier could output a label when used for the fault classification of a test sample (the test set had 110 records, 55% of the whole data set). The final diagnostic result was determined through voting among all of the classifiers involved; the available schemes include simple, weighted and Bayesian voting [17]. The transformer fault diagnosis system designed in this study builds each classifier by resampling the data set with replacement, which is called bootstrapping [18].

Bootstrapping is a powerful technique for assessing the accuracy of a parameter in situations where the samples are limited. Because each base classifier is trained on a different training subset, each training process produces a different base classifier, and the final result is obtained via simple voting.
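As a minimal sketch of how bootstrapping assesses a parameter from limited samples, the following estimates the mean of a small sample and the variance of that estimate by resampling with replacement. The function name `bootstrap_mean` and the input values are hypothetical; the default of 1000 repetitions is chosen only for illustration.

```python
import random
import statistics


def bootstrap_mean(values, repetitions=1000, seed=0):
    """Estimate the mean and the variance of the mean by resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(repetitions):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.mean(means), statistics.variance(means)


# Hypothetical normalized gas fractions from six records of one fault type.
fractions = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
estimate, spread = bootstrap_mean(fractions)
```

Each resample has the same size as the original sample, so some records appear more than once and others are left out, which is also what makes the bagged sub-training sets differ from each other.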

4. DGA Data Prediction with CSO-ANN

DGA is a technique that is used worldwide for incipient thermoelectric fault detection of the power transformer. The trace gases dissolved in the transformer insulation oil and analyzed using a chromatographic method mainly contain H₂, CO, CO₂, CH₄, C₂H₂, C₂H₄, C₂H₆, etc. Generally, the first five or six gases are enough for most of the transformer diagnosis methods, e.g., IEC or Rogers ratios [19] and decision trees [20]. Presently, most of the transformer fault diagnosis algorithms are based on historical DGA data obtained using offline measuring methods with a relatively long time interval. This means that the diagnosis is usually performed when the faults have already taken place, i.e., breakdown maintenance is done after the event. Under the principle that “prevention is better than cure”, it is more significant to forecast transformer faults in advance to prevent failure.

DGA data forecasting with a time label is essentially a time-series prediction problem, the difference being that a vector with five or six dimensions should be predicted. Recently, various prediction techniques were developed, such as the grey model [21], SVM [22] and artificial neural network (ANN) [23]. Grey theory uses an exponential law to establish the time-series data model, which is suitable for monotonically decreasing or increasing processes. However, fitting the DGA data with an exponential law may lead to errors because of the complexity of transformer faults and the corresponding gas content in oil. SVM utilizes the structural risk minimization principle instead of empirical risk minimization, which gives it excellent generalization ability on small samples for binary classification problems. Although the SVM has been extended to multi-class classification and regression problems [24], its practicability and effectiveness vary with the specific application.

When enough training samples are available, an artificial neural network, e.g., a BPNN, is one of the most suitable methods for nonlinear prediction. Theoretically, it can be used for fitting any nonlinear function [25]. Conventionally, the gradient descent algorithm (GDA) is the most commonly used one for standard BP network training. However, GDA trains a BP network slowly and converges easily to a local minimum, and these weaknesses largely affect the prediction accuracy. To overcome these limitations of the ANN, a training method called CSO, which was proposed in [11] by Meng et al., was adopted in this study for DGA series data prediction. The CSO algorithm is a heuristic optimization algorithm that has been exploited in wind speed and energy predictions [14]. It offers excellent global optimization as well as rapid convergence, which gives it an obvious advantage over PSO [26] or the GA [27]. Two search operators called horizontal crossover (HC) and vertical crossover (VC) play an important role in ensuring the global search ability of CSO. The procedures of CSO are as follows [28]:

(1) Population initialization

Suppose that X is a randomly generated matrix with D columns and M rows; it represents the D-dimensional population that consists of M individuals.

(2) Horizontal crossover operation

A random permutation of the integers from 1 to M is generated, and the individuals in X are matched into M/2 pairs accordingly. Suppose X(i) and X(j) are a pair of parents. The next generation of moderation solutions is reproduced using Equation (1) [26]:

\left\{ \begin{matrix} {MS}_{hc}(i, d) = r_{1} \cdot X(i, d) + (1 - r_{1}) \cdot X(j, d) + c_{1} \cdot (X(i, d) - X(j, d)) \\ {MS}_{hc}(j, d) = r_{2} \cdot X(j, d) + (1 - r_{2}) \cdot X(i, d) + c_{2} \cdot (X(j, d) - X(i, d)) \end{matrix} \right.

(1)

where r₁ and r₂ are random values in [0, 1], and c₁ and c₂ are random values in [−1, 1] with uniform distribution. MS_hc(i) and MS_hc(j) are the offspring of X(i) and X(j), respectively.
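Equation (1) can be sketched in a few lines of Python. The sketch assumes that fresh random coefficients are drawn for every dimension and that an unpaired individual (when M is odd) is copied unchanged; neither detail is stated in the text. The name `horizontal_crossover` is hypothetical.

```python
import random


def horizontal_crossover(X, rng):
    """Return the moderation solutions MS_hc for population X (list of M rows, D columns)."""
    M, D = len(X), len(X[0])
    order = list(range(M))
    rng.shuffle(order)            # random pairing of the individuals
    MS = [row[:] for row in X]    # an unpaired individual stays as it is
    for p in range(M // 2):
        i, j = order[2 * p], order[2 * p + 1]
        for d in range(D):
            r1, r2 = rng.random(), rng.random()
            c1, c2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            MS[i][d] = r1 * X[i][d] + (1 - r1) * X[j][d] + c1 * (X[i][d] - X[j][d])
            MS[j][d] = r2 * X[j][d] + (1 - r2) * X[i][d] + c2 * (X[j][d] - X[i][d])
    return MS


population = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [4.0, 0.0]]
offspring = horizontal_crossover(population, random.Random(0))
```

The first two terms place each offspring between its parents, while the c-weighted term lets it move beyond them, so the search is not confined to the region spanned by the current pair.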

(3) Vertical crossover operation

By performing the VC operation on the d₁ and d₂ dimensions of the individual X(i), the offspring MS_vc(i) can be generated using Equation (2):

{MS}_{vc} (i, d) = r \cdot X (i, d_{1}) + (1 - r) \cdot X (i, d_{2})

(2)

The HC and VC search approaches were shown to be effective methods with excellent global search ability. The role of VC is important because this crossover helps dimensions that are trapped in local minima to escape from them.
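A minimal sketch of the vertical crossover in Equation (2) is given below. Several details are assumptions that the text does not specify: the dimensions are paired at random, only dimension d₁ of each pair receives the offspring value, r is drawn uniformly from [0, 1] for every individual, and no normalization of the dimensions is performed. The name `vertical_crossover` is hypothetical.

```python
import random


def vertical_crossover(X, rng):
    """Return the moderation solutions MS_vc for population X (list of M rows, D columns)."""
    M, D = len(X), len(X[0])
    dims = list(range(D))
    rng.shuffle(dims)             # random pairing of the dimensions
    MS = [row[:] for row in X]
    for p in range(D // 2):
        d1, d2 = dims[2 * p], dims[2 * p + 1]
        for i in range(M):
            r = rng.random()      # assumed uniform in [0, 1]
            # Only d1 is overwritten (assumption); d2 keeps its parent value.
            MS[i][d1] = r * X[i][d1] + (1 - r) * X[i][d2]
    return MS


population = [[0.0, 1.0], [2.0, 4.0]]
offspring = vertical_crossover(population, random.Random(0))
```

Because the operator mixes two dimensions of the same individual, a dimension that has stagnated can inherit information from another one without disturbing the rest of the solution.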

In this study, the weights and thresholds of the ANN for DGA data prediction were optimized using CSO, which helps to improve the prediction accuracy. Moreover, the quadratic cost function [28] was used. Each individual of the population consists of the weights and thresholds of the network, which are stored as one row of matrix X; the number of rows of X is the population size. At each iteration, the horizontal crossover operation is performed according to Equation (1) to obtain the moderation solutions MS_hc. In CSO, new solutions must be evaluated before entering the next generation of the population. Hence, the fitness values of MS_hc are calculated and compared with those of the parent population. Similarly, the fitness values of MS_vc are calculated after performing the vertical crossover operation on the updated population according to Equation (2). An offspring survives only when it outperforms its parent, so only the best solutions are maintained in the population during the iteration. Once the iteration is completed, the solution with the best fitness is output as the weights and threshold values of the ANN.
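The iteration described above (crossover, evaluation, competition with the parents) can be sketched as a small optimizer. For brevity, the example fits a single linear neuron with one weight and one threshold to hypothetical data using a quadratic cost, rather than a full ANN; the population size, iteration count, initialization range and all names are illustrative assumptions.

```python
import random


def cso_minimize(cost, dim, pop_size=20, iterations=200, seed=0):
    """Minimize `cost` with horizontal and vertical crossover plus competition."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fit = [cost(x) for x in X]
    history = [min(fit)]

    def compete(candidates):
        # A moderation solution replaces its parent only if its cost is lower.
        for i, cand in enumerate(candidates):
            f = cost(cand)
            if f < fit[i]:
                X[i], fit[i] = cand, f

    for _ in range(iterations):
        # Horizontal crossover, Equation (1).
        order = list(range(pop_size))
        rng.shuffle(order)
        ms = [row[:] for row in X]
        for p in range(pop_size // 2):
            i, j = order[2 * p], order[2 * p + 1]
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                c1, c2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
                ms[i][d] = r1 * X[i][d] + (1 - r1) * X[j][d] + c1 * (X[i][d] - X[j][d])
                ms[j][d] = r2 * X[j][d] + (1 - r2) * X[i][d] + c2 * (X[j][d] - X[i][d])
        compete(ms)

        # Vertical crossover, Equation (2), on the updated population.
        dims = list(range(dim))
        rng.shuffle(dims)
        ms = [row[:] for row in X]
        for p in range(dim // 2):
            d1, d2 = dims[2 * p], dims[2 * p + 1]
            for i in range(pop_size):
                r = rng.random()
                ms[i][d1] = r * X[i][d1] + (1 - r) * X[i][d2]
        compete(ms)
        history.append(min(fit))

    best = min(range(pop_size), key=lambda i: fit[i])
    return X[best], fit[best], history


# Hypothetical training pairs generated from y = 2x + 1.
data = [(x / 10.0, 2.0 * x / 10.0 + 1.0) for x in range(10)]


def quadratic_cost(params):
    """Quadratic cost of a single linear neuron with weight w and threshold b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / (2.0 * len(data))


best, best_cost, history = cso_minimize(quadratic_cost, dim=2, pop_size=10, iterations=50, seed=1)
```

Because an offspring only replaces its parent when it has a lower cost, the best cost recorded in `history` can never increase from one iteration to the next.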

5. Experiment and Analysis

5.1. Characterizing Vector and Distribution of Sample Data

The volume concentrations of dissolved gases in oil differ because the capacity and voltage level differ from one transformer to another. To eliminate this difference, the input DGA data should be normalized. In this work, the relative proportion of each gas among the five characteristic gases was adopted as the input vector, as shown in Equation (3):

X_{i} = \frac{X_{i}^{*}}{\sum_{j = 1}^{5} X_{j}^{*}}

(3)

In Equation (3), X_i represents the percentage of volume concentration of each gas. The sample distribution of the DGA data set is shown in Table 1. The DGA data set used in this study was collected from [29] and consists of 200 DGA records.
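The normalization in Equation (3) amounts to dividing each gas concentration by the sum of the five concentrations. A minimal sketch, with a hypothetical function name and hypothetical input values (the gas order and units are placeholders):

```python
def normalize_dga(concentrations):
    """Return each gas's share of the total concentration, as in Equation (3)."""
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("total gas concentration must be positive")
    return [c / total for c in concentrations]


# Hypothetical raw concentrations of five characteristic gases.
shares = normalize_dga([10.0, 20.0, 30.0, 20.0, 20.0])
```

The resulting shares sum to one, so records from transformers with very different absolute gas levels become directly comparable.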

As shown in Table 1, the faults of power transformers can be divided into five types: discharge with low energy (D1); discharge with high energy (D2); thermal fault with a low temperature, i.e., less than 700 °C (T1); thermal fault with a high temperature, i.e., higher than 700 °C (T2); and partial discharge (PD). The collected DGA data set was divided into a training data set (with 90 records, 45% of the whole data set) and a testing data set (with 110 records, 55% of the whole data set).

In most studies, the training set is larger than the test set. However, the failure rate of an individual transformer is rather low in practice, so few fault samples can be collected, especially for power transformers with the same voltage level. Hence, to simulate this practical situation, and different from other existing works [9], the training data set was smaller than the testing data set, which also helped to improve the training efficiency. The specific numbers regarding the data set (i.e., the sample distribution) are summarized in Table 1. Correspondingly, the means and variances of the DGA sample data evaluated using bootstrapping with 1000 repetitions are presented in Table 2, which shows that the contents of the fault feature gases dissolved in the transformer oil differed between internal faults. When low-energy-discharge faults occurred, e.g., partial discharges, hydrogen and ethane tended to be the main components, while in a high-energy discharge, e.g., arcing, the acetylene concentration was higher than that of the other gases. When medium- or low-temperature (<700 °C) thermal faults occurred, the hydrocarbon gases (CH₄, C₂H₆, C₂H₄) tended to be higher than the other gases.

To observe the distribution of the DGA samples, by using principal component analysis (PCA) [30], the five dimensions of the DGA data set were reduced to two and the data distribution was observed clearly, as shown in Figure 3. The labels 1, 2, 3, 4 and 5 represent a discharge with high energy, discharge with low energy, partial discharge, low-temperature thermal fault (<700 °C) and high-temperature thermal fault (>700 °C), respectively. From the principal component analysis, it was observed that the high-energy faults of DGA samples were scattered among the low-energy samples, which means that they could not be classified easily. Furthermore, several high-temperature thermal fault (<700 °C) data points were distributed within the low-energy-discharge area, which made them difficult to distinguish. From the distribution of the DGA data sample, the thermal faults and the discharge DGA data were expected to be classified easily.

5.2. Fault Diagnosis and Comparison

To validate the effectiveness of the algorithm proposed in this study, five algorithms, i.e., a single decision tree [16], random forest [31], AdaBoost classifier [32], gradient boosting tree [33] and SVM [5], were implemented for comparison. To achieve better classification accuracy, each classification algorithm used the grid-searching method [34] to find the optimal parameters. A procedure similar to k-fold cross-validation was used to observe the stability of the model. Because the ratio of training to testing samples was 4.5:5.5, the samples were split randomly k times (k = 5 in this study) according to this ratio. The training was conducted k times, and then the average accuracy of the model was calculated. The base estimators of the bagging, random forest, AdaBoost and gradient boosting tree algorithms were decision trees, whose numbers were set to 175, 188, 180 and 170, respectively. For the EL of bagging, the max feature and sample leaf parameters were set to 1. The max sampling rate of every estimator was set to 0.75, and the bootstrap sampling method was used. The learning rates of AdaBoost and the gradient boosting classifier were set to 0.1. There were 110 sets of DGA samples in the testing group, and the output results (confusion matrices) [35] of the six algorithms, i.e., bagging and the five comparison algorithms, are listed in Tables 3–8.
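The repeated random splitting at a 4.5:5.5 ratio can be sketched as follows. The sketch assumes plain (non-stratified) shuffling of record indices, which the text does not specify; the function name and the seed are hypothetical.

```python
import random


def repeated_random_splits(n_records, train_fraction=0.45, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_records))
    splits = []
    for _ in range(k):
        indices = list(range(n_records))
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits


splits = repeated_random_splits(200)
```

Averaging the test accuracy over the k splits then gives a stability estimate that depends less on any single partition of the 200 records.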

Overall, as shown in Tables 3–8, the classification accuracies of the six algorithms on the test samples were as follows: bagging (92.7%), random forest (91.8%), decision tree (86.4%), AdaBoost (86.4%), gradient boosting tree (86.4%) and SVM (43.6%). On the training samples, the classification accuracy of all these algorithms was 100%, except for SVM (56.7%) and AdaBoost (86.4%). These results indicated that the bagging and random forest algorithms outperformed a single decision tree, AdaBoost, a gradient boosting tree and an SVM. An SVM is known for its ability to solve problems with small samples [24]; it maps the DGA samples into a higher-dimensional space using a kernel function before performing the classification. This was the reason why the SVM was chosen as the comparison reference, while other AI algorithms that need large training samples, such as BPNN and DBN, were not selected; these algorithms perform poorly when the training samples are fewer than the testing samples. Ensemble learning with voting over different base classifiers was therefore appropriate for transformer fault diagnosis, for which the training samples are usually small. Moreover, the ensemble learning of bagging tends to achieve higher classification accuracy.

From Table 3 and Table 4, it can be seen that the performances of bagging and the random forest were similar. Both used decision trees as base classifiers; the difference was their splitting strategy. Ideally, compared with bagged decision trees, random forests should provide an improvement, because the random forest changes the subtrees’ learning algorithm so that the predictions from all of the base estimators are less correlated. However, the experimental results showed that bagging (92.7%) performed slightly better than the random forest (91.8%). Compared with bagging, the random forest classified two transformer cases wrongly, i.e., a discharge with high energy was diagnosed as a high-temperature thermal fault (>700 °C), and a high-temperature thermal fault (>700 °C) was diagnosed as a low-temperature thermal fault (<700 °C). This indicated that the bagging decision tree was more capable of distinguishing a high-temperature fault from other fault types. Generally, the temperature at the fault position of a transformer cannot change suddenly, which means that a thermal fault may develop from a low-temperature fault (<700 °C) into a high-temperature fault (>700 °C). For instance, during the online monitoring process, some DGA data may be collected at 690 °C, while others may be collected at 710 °C, which are close to each other. Thus, the dissolved gas concentrations of these two fault types are probably similar, which indicates that the features that discriminate these two types of DGA samples are not evident. Similarly, high-energy discharges develop from low-energy discharges, and when the DGA samples are located near the critical value, they are clearly difficult to distinguish.

To distinguish them effectively, more DGA training samples are needed to extract the potential imperceptible features. This is the reason why a conventional approach (e.g., SVM) with few training samples is more likely to fail to distinguish these two pairs of fault types directly, as presented in Table 8. However, with few training samples, e.g., 15 sets of low-temperature thermal faults (<700 °C) and 18 sets of high-temperature thermal faults (>700 °C), the ensemble learning of the bagging decision tree could handle this problem without mistakes. The diagnosis accuracies of the decision tree (85.4%), AdaBoost (86.4%) and gradient boosting tree (86.4%) were close to each other because they had the same ability to distinguish partial discharge and thermal faults (<700 °C) from other transformer fault types, i.e., 21 cases of both fault types were discriminated.

Furthermore, it can be noticed in Tables 3–8 that almost all methods involved in this comparative study rarely mistook a partial discharge for another fault type, except for the SVM, which lacked enough training samples. This meant that the dissolved gas concentration was distinctive when the transformer oil was experiencing a partial discharge, which enabled the EL methods to learn this fault type better.

The overall accuracies of bagging and the other methods are summarized in Table 9. Except for AdaBoost and SVM, the training classification accuracies of the decision-tree-based methods were 100%. The accuracies of bagging and the random forest were close to each other and the highest (>90%). The performances of the decision tree, gradient boosting tree and AdaBoost were almost the same (86.4%). SVM had the poorest performance under the small training sample conditions (each group only had about 20 sets).
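For readers who want to reproduce an overall accuracy from a confusion matrix, it is the sum of the diagonal (correctly classified cases) divided by the total number of cases. The sketch below uses a small made-up three-class matrix; it is not one of the tables of this study, and the function name is hypothetical.

```python
def overall_accuracy(confusion):
    """Accuracy = correctly classified cases (diagonal) / all cases."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up matrix: rows are true classes, columns are predicted classes.
example = [
    [8, 1, 0],
    [0, 9, 1],
    [1, 0, 10],
]
accuracy = overall_accuracy(example)
```

The off-diagonal entries show which fault types are confused with each other, which the single accuracy figure hides.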

5.3. DGA Data Prediction and Fault Diagnosis

Nowadays, offline and online monitoring are used simultaneously in an electric power company. With the development of DGA online monitoring technology, the prediction of the gases dissolved in transformer oil according to historical data can be performed. Therefore, the potential faults of a power transformer and its development trend can be detected as early as possible. The breakdown of a transformer could be prevented and the economic loss could be greatly reduced.

To verify the effectiveness of the CSO-NN for one-step DGA series data prediction, 200 data points obtained by the online monitoring system developed by Fan et al. [36] with the same monitoring interval (1 d) were used to construct the data set. Six points were used as the input vector, and the seventh point (one-step prediction) was taken as the output. That is to say, the gas content of the seventh day was predicted according to the data of the previous six days. Hence, the number of input neurons was set to 6, and the number of output neurons was set to 1. To construct the training and testing data sets, the 1st–175th data points were used as the training data, and the 176th–200th data points were used as the testing data. For the forecasting accuracy evaluation, three evaluation indexes were utilized in this study: mean absolute percentage error (MAPE), mean absolute error (MAE) and root-mean-squared error (RMSE) [14].
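The six-in, one-out sample construction can be sketched with a sliding window. The sketch assumes that each of the 25 test targets (points 176 to 200) is predicted from the six measured points immediately before it, so the first test windows reach back into the training range; this is an assumption, as is the placeholder series. The name `make_windows` is hypothetical.

```python
def make_windows(series, n_inputs=6):
    """Return (inputs, targets): each target is the point that follows n_inputs points."""
    count = len(series) - n_inputs
    inputs = [series[t:t + n_inputs] for t in range(count)]
    targets = [series[t + n_inputs] for t in range(count)]
    return inputs, targets


# Placeholder daily series of 200 values standing in for one gas concentration.
series = [float(day) for day in range(1, 201)]
inputs, targets = make_windows(series)

# Split by the position of the target: points 7..175 train, points 176..200 test.
n_train = 175 - 6
train_inputs, train_targets = inputs[:n_train], targets[:n_train]
test_inputs, test_targets = inputs[n_train:], targets[n_train:]
```

With 200 points and a window of six, the series yields 194 input/output groups in total.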

\left\{ \begin{matrix} MAPE = \frac{1}{N} \sum_{h = 1}^{N} \left| \frac{y_{h} - \hat{y}_{h}}{y_{h}} \right| \times 100\% \\ MAE = \frac{1}{N} \sum_{h = 1}^{N} \left| y_{h} - \hat{y}_{h} \right| \times 100\% \\ RMSE = \sqrt{\frac{1}{N} \sum_{h = 1}^{N} {(y_{h} - \hat{y}_{h})}^{2}} \end{matrix} \right.

(4)

where y_h and ŷ_h represent the measured and predicted values of the gas concentrations, respectively, and N is the length of the data. All simulations were performed in Python 2.7 with PyCharm on a PC with an Intel Core i5-6200U CPU (2.3 GHz), 8 GB of memory and the Windows 10 operating system. After training and testing, the comparisons of the predicted and actual measured DGA data for six gases with 25 points are shown in Figures 4–6. It can be seen that the trend of the DGA data can be traced precisely. As shown in Figures 4–6, due to the circulation flow of the transformer oil, the DGA data series did not always increase monotonically.
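The three indexes of Equation (4) can be computed as in the minimal sketch below. Note one deliberate difference: MAE is returned here in the units of the data, without the percentage scaling that appears in Equation (4). The function names and the sample values are hypothetical.

```python
import math


def mape(measured, predicted):
    """Mean absolute percentage error, in percent; measured values must be nonzero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(measured, predicted)) / len(measured)


def mae(measured, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(y - p) for y, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root-mean-squared error, in the units of the data."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / len(measured))


# Hypothetical measured and predicted gas concentrations.
measured = [100.0, 200.0]
predicted = [110.0, 190.0]
errors = (mape(measured, predicted), mae(measured, predicted), rmse(measured, predicted))
```

MAPE is scale-free and therefore convenient for comparing gases whose concentrations differ by orders of magnitude, while MAE and RMSE stay in concentration units and RMSE penalizes large individual errors more strongly.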

The actual fluctuation curve of the gas data was convex or concave, as shown in Figure 5 (C₂H₆). Since the DGA data series fluctuated greatly, the prediction error increased correspondingly, no matter what kind of prediction model was utilized. An ANN implements the principle of empirical risk minimization [37] and has superior nonlinear mapping ability, but a traditional ANN suffers from slow convergence and easily falls into a local minimum, which significantly affects the prediction accuracy when applied to DGA series data prediction. Owing to its excellent global optimization and rapid convergence, CSO can be used for ANN training for DGA data forecasting. Although the CSO algorithm is a heuristic optimization algorithm that is exploited in wind speed and energy prediction [14], it can also be used in other data series predictions with short-term intervals. To compare the proposed method with the PSO-SVM, GM and ANN models implemented in [37], the prediction results are presented in Table 10.

Transformer failures often develop gradually, e.g., from a partial discharge to a high-energy discharge. Furthermore, most faults are not of a single type. Although studies on the prediction of DGA data already exist, it is not possible to determine a single most appropriate candidate because forecasting models are generally specific to the site situation. Therefore, combinations of several methods have shown advantages [14]. As presented in Table 10, the MAPE values of all six gases were less than 4%, with a maximum of 3.512% (H₂); the CSO-NN was superior to the PSO-SVM and GM models in [37] in terms of the MAPE of H₂, CH₄, C₂H₄ and C₂H₆. These results demonstrated that the CSO-NN forecasts were more sensitive and accurate at identifying the variation trend of the DGA data. The MAE and RMSE are not compared because these data are not given in [37].

To evaluate the effectiveness of the CSO-NN for DGA data prediction, another indicator was used in this study in addition to the prediction accuracy, i.e., the fault diagnosis accuracy, which was obtained by applying the proposed bagging algorithm to the predicted DGA data. Data prediction alone only forecasts the trend of each gas concentration, so fault diagnosis based on the predicted data is also important. Hence, to verify the performance of the CSO-NN model for fault prediction, a 220 kV transformer was monitored using the online monitoring device from 11 October 2018 to 15 October 2018, and five DGA records at one-day intervals were obtained. The historical series data were restructured into a training set of 194 groups, each of which contained six input data points and one output data point. Five groups were used as the test data set. The forecasted results are shown in Table 11.
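The restructuring of a gas series into groups of six inputs and one output can be illustrated with a sliding window. The sketch below assumes consecutive, overlapping windows with a stride of one sample; the text only states the group size, so the stride, the function name `make_windows` and the sample series are assumptions.

```python
def make_windows(series, n_inputs=6):
    """Split a series into (inputs, target) groups for one-step prediction.

    Each group holds `n_inputs` consecutive readings as inputs and the
    next reading as the target (assumed stride of one sample).
    """
    groups = []
    for start in range(len(series) - n_inputs):
        inputs = series[start:start + n_inputs]
        target = series[start + n_inputs]
        groups.append((inputs, target))
    return groups


# Hypothetical series of ten readings (not data from the paper).
series = [60.1, 60.8, 61.5, 61.2, 62.0, 62.9, 63.4, 63.1, 64.0, 64.5]
groups = make_windows(series)

print(len(groups))   # 4 groups from 10 readings
print(groups[0])     # first six readings and the seventh as the target
```

Under this assumption, a series of length L yields L - 6 groups, so the earliest groups can serve as the training set and the most recent ones as the test set.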

As shown in Table 11, the diagnosis results obtained from the predicted data with the bagging algorithm were consistent with those obtained from the measured data, indicating that the developed CSO-NN effectively forecasted the trend of the DGA data during the fault process. During the maintenance of this transformer, an insulation breakdown between winding layers was found, which confirmed the effectiveness of the proposed EL-CSO-NN diagnosis and forecasting algorithm.
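The diagnosis step, bagging with bootstrap resampling followed by a majority vote, can be sketched as follows. This is a simplified illustration under stated assumptions and not the authors' implementation: the base learner is a nearest-centroid classifier chosen only to keep the example dependency-free, all function names are hypothetical, and the training vectors are made-up gas percentages for two fault classes.

```python
import random
from collections import Counter


def fit_centroids(samples, labels):
    """Base learner (an assumption): one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def bagging_fit(samples, labels, n_learners=25, seed=0):
    """Train one base learner per bootstrap resample."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_centroids([samples[i] for i in idx],
                                    [labels[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical gas percentage vectors (H2, CH4, C2H6, C2H4, C2H2).
X = [
    [0.70, 0.08, 0.16, 0.05, 0.01], [0.72, 0.07, 0.15, 0.04, 0.02],
    [0.68, 0.09, 0.17, 0.05, 0.01], [0.74, 0.08, 0.14, 0.03, 0.01],
    [0.13, 0.22, 0.10, 0.52, 0.03], [0.12, 0.24, 0.09, 0.53, 0.02],
    [0.15, 0.21, 0.11, 0.50, 0.03], [0.14, 0.20, 0.10, 0.54, 0.02],
]
y = ["PD"] * 4 + ["T2"] * 4

models = bagging_fit(X, y)
print(bagging_predict(models, [0.69, 0.08, 0.16, 0.06, 0.01]))
print(bagging_predict(models, [0.14, 0.23, 0.10, 0.51, 0.02]))
```

In the forecasting setting, the vector passed to `bagging_predict` would be built from the gas concentrations predicted by the CSO-NN rather than from measured values.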

6. Conclusions

Because transformer faults have a low occurrence rate, the DGA fault samples collected using the online DGA monitor are usually limited, which makes most AI algorithms unsuitable because they demand large training samples. To address this problem, the EL-CSO-NN was proposed and applied to the classification of transformer faults with small training samples and the short-term prediction of DGA data. Based on a small sample database, the following conclusions were obtained:

The EL of the bagging algorithm was proposed and applied to transformer fault diagnosis for the first time. The experiment showed that this method can overcome the limitations that conventional DGA-based AI methods face because transformer fault samples are few and unevenly distributed.
For the condition prediction of power transformers, the CSO-NN circumvented a limitation of an ANN trained with the BP learning algorithm, i.e., easily becoming stuck in local minima. The CSO-NN was utilized for DGA series data prediction and significantly improved the prediction ability of the ANN on the DGA dataset.
The data predicted by the CSO-NN served as input samples to the EL of bagging for diagnosis. The experimental results showed that the EL-CSO-NN model could predict transformer faults effectively, which helps to schedule transformer maintenance in advance and to prevent failures from worsening.

Author Contributions

Conceptualization, J.F. and H.S.; methodology, A.M. and H.Y.; software, L.F. and J.C.; validation, J.F., H.S., Y.C., L.F., J.C., A.M. and H.Y.; formal analysis, H.S.; investigation, Y.C.; resources, A.M.; data curation, J.C.; writing (original draft preparation), Y.C.; writing (review and editing), Y.C.; visualization, L.F.; supervision, J.F.; project administration, J.F.; funding acquisition, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62073084 and 61876040. The APC was funded by grant number 62073084.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Emara, M.M.; Peppas, G.D.; Gonos, I.F. Two Graphical Shapes Based on DGA for Power Transformer Fault Types Discrimination. IEEE Trans. Dielectr. Electr. Insul. 2021, 28, 981–987.
2. Wang, L.; Littler, T.; Liu, X. Gaussian Process Multi-Class Classification for Transformer Fault Diagnosis Using Dissolved Gas Analysis. IEEE Trans. Dielectr. Electr. Insul. 2021, 28, 1703–1712.
3. Duval, M. A review of faults detectable by gas-in-oil analysis in transformers. IEEE Electr. Insul. Mag. 2002, 18, 8–17.
4. Rao, U.M.; Fofana, I.; Rajesh, K.N.V.P.S.; Picher, P. Identification and Application of Machine Learning Algorithms for Transformer Dissolved Gas Analysis. IEEE Trans. Dielectr. Electr. Insul. 2021, 28, 1828–1835.
5. Li, J.; Zhang, Q.; Wang, K.; Wang, J.; Zhou, T.; Zhang, Y. Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1198–1206.
6. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
7. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences; Harvard University: Boston, MA, USA, 1974.
8. Naresh, R.; Sharma, V.; Vashisth, M. An Integrated Neural Fuzzy Approach for Fault Diagnosis of Transformers. IEEE Trans. Power Deliv. 2008, 23, 2017–2024.
9. Dai, J.; Song, H.; Sheng, G.; Jiang, X. Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2828–2835.
10. Fan, J.; Fu, C.; Yin, H.; Wang, Y.; Jiang, Q. Power transformer condition assessment based on online monitor with SOFC chromatographic detector. Int. J. Electr. Power Energy Syst. 2019, 118, 105805.
11. Meng, A.; Mei, P.; Yin, H.; Peng, X.; Guo, Z. Crisscross optimization algorithm for solving combined heat and power economic dispatch problem. Energy Convers. Manag. 2015, 105, 1303–1317.
12. Peng, X.; Lin, L.; Zheng, W.; Liu, Y. Crisscross Optimization Algorithm and Monte Carlo Simulation for Solving Optimal Distributed Generation Allocation Problem. Energies 2015, 8, 13641–13659.
13. Wang, Y.; Zhu, K.; Sun, M.; Deng, Y. An Ensemble Learning Approach for Fault Diagnosis in Self-Organizing Heterogeneous Networks. IEEE Access 2019, 7, 125662–125675.
14. Meng, A.; Ge, J.; Yin, H.; Chen, S. Wind speed forecasting based on wavelet packet decomposition and artificial neural networks trained by crisscross optimization algorithm. Energy Convers. Manag. 2016, 114, 75–88.
15. Fan, J.; Wang, F.; Sun, Q.; Bin, F.; Ding, J.; Ye, H. SOFC detector for portable gas chromatography: High-sensitivity detection of dissolved gases in transformer oil. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2854–2863.
16. Suarez, A.; Lutsko, J. Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21, 1297–1311.
17. Duggan, J.; Martinelli, C. A Bayesian Model of Voting in Juries. Games Econ. Behav. 2001, 37, 259–294.
18. Zoubir, A.; Boashash, B. The bootstrap and its application in signal processing. IEEE Signal Process. Mag. 1998, 15, 56–76.
19. IEEE guide for the Interpretation of Gases Generated in Oil-Immersed Transformers. In IEEE Std C57.104-2008; IEEE: Piscataway, NJ, USA, 2009.
20. Senoussaoui, M.E.A.; Brahami, M.; Fofana, I. Combining and comparing various machine-learning algorithms to improve dissolved gas analysis interpretation. IET Gener. Transm. Distrib. 2018, 12, 3673–3679.
21. Wang, M.; Hung, C. Novel grey model for the prediction of trend of dissolved gases in oil-filled power apparatus. Electr. Power Syst. Res. 2003, 67, 53–58.
22. Kari, T.; Gao, W.; Zhao, D.; Abiderexiti, K.; Mo, W.; Wang, Y.; Luan, L. Hybrid feature selection approach for power transformer fault diagnosis based on support vector machine and genetic algorithm. IET Gener. Transm. Distrib. 2018, 12, 5672–5680.
23. Al-Janabi, S.; Rawat, S.; Patel, A.; Al-Shourbaji, I. Design and evaluation of a hybrid system for detection and prediction of faults in electrical transformers. Int. J. Electr. Power Energy Syst. 2015, 67, 324–335.
24. Zheng, H.; Zhang, Y.; Liu, J.; Wei, H.; Zhao, J.; Liao, R. A novel model based on wavelet LS-SVM integrated improved PSO algorithm for forecasting of dissolved gas contents in power transformers. Electr. Power Syst. Res. 2018, 155, 196–205.
25. Zhang, L.; Wu, K.; Zhong, Y.; Li, P. A new sub-pixel mapping algorithm based on a BP neural network with an observation model. Neurocomputing 2008, 71, 2046–2054.
26. Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the IEEE International Conference on Neural Networks, IV, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948.
27. Lorenzo, B.; Glisic, S. Optimal Routing and Traffic Scheduling for Multihop Cellular Networks Using Genetic Algorithm. IEEE Trans. Mob. Comput. 2012, 12, 2274–2288.
28. Meng, A.-B.; Chen, Y.-C.; Yin, H.; Chen, S.-Z. Crisscross optimization algorithm and its application. Knowledge-Based Syst. 2014, 67, 218–229.
29. Fan, J.; Wang, F.; Sun, Q.; Bin, F.; Liang, F.; Xiao, X. Hybrid RVM–ANFIS algorithm for transformer fault diagnosis. IET Gener. Transm. Distrib. 2017, 11, 3637–3643.
30. Licciardi, G.; Avezzano, R.G.; Del Frate, F.; Schiavon, G.; Chanussot, J. A novel approach to polarimetric SAR data processing based on Nonlinear PCA. Pattern Recognit. 2014, 47, 1953–1967.
31. Chai, Z.; Zhao, C. Multiclass Oblique Random Forests With Dual-Incremental Learning Capacity. IEEE Trans. Neural Networks Learn. Syst. 2020, 31, 5192–5203.
32. Wu, S.; Nagahashi, H. Parameterized AdaBoost: Introducing a Parameter to Speed Up the Training of Real AdaBoost. IEEE Signal Process. Lett. 2014, 21, 687–691.
33. Zhang, Z.; Jung, C. GBDT-MO: Gradient-Boosted Decision Trees for Multiple Outputs. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 3156–3167.
34. Ma, H.; Ekanayake, C.; Saha, T. Power transformer fault diagnosis under measurement originated uncertainties. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1982–1990.
35. Ohsaki, M.; Wang, P.; Matsuda, K.; Katagiri, S.; Watanabe, H.; Ralescu, A. Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification. IEEE Trans. Knowl. Data Eng. 2017, 29, 1806–1819.
36. Fan, J.; Wang, F.; Sun, Q.; Bin, F.; Ye, H.; Liu, Y. An Online Monitoring System for Oil Immersed Power Transformer Based on SnO₂ GC Detector with a New Quantification Approach. IEEE Sensors J. 2017, 17, 6662–6671.
37. Fei, S.-W.; Wang, M.-J.; Miao, Y.-B.; Tu, J.; Liu, C.-L. Particle swarm optimization-based support vector machine for forecasting dissolved gases content in power transformer oil. Energy Convers. Manag. 2009, 50, 1604–1609.

Figure 1. Framework of the DGA monitoring system.

Figure 2. Structure of EL for transformer diagnosis based on DGA.

Figure 3. PCA of sample data (200 sets, in percent).

Figure 4. The predicted vs. measured DGA data of H₂ and CH₄.

Figure 5. The predicted vs. measured DGA data of C₂H₄ and C₂H₆.

Figure 6. The predicted vs. measured DGA data of C₂H₂ and CO.

Table 1. Sample distribution of DGA data.

Fault Type	H₂		CH₄		C₂H₆		C₂H₄		C₂H₂
Fault Type	Training	Test	Training	Test	Training	Test	Training	Test	Training	Test
D1	18	22	18	22	18	22	18	22	18	22
D2	21	19	21	19	21	19	21	19	21	19
PD	18	22	18	22	18	22	18	22	18	22
T1	15	25	15	25	15	25	15	25	15	25
T2	18	22	18	22	18	22	18	22	18	22
Total sets	90	110	90	110	90	110	90	110	90	110

Table 2. Means ($μ$) and variances ($σ$) of DGA data evaluated using bootstrapping with 1000 repetitions (in percentages).

Fault Type	H₂		CH₄		C₂H₆		C₂H₄		C₂H₂
Fault Type	$μ$	$σ$	$μ$	$σ$	$μ$	$σ$	$μ$	$σ$	$μ$	$σ$
D1	0.32	0.22	0.16	0.07	0.15	0.14	0.13	0.08	0.23	0.13
D2	0.41	0.18	0.15	0.06	0.04	0.06	0.23	0.12	0.17	0.11
PD	0.71	0.25	0.08	0.07	0.16	0.19	0.04	0.09	0.01	0.01
T1	0.13	0.10	0.32	0.11	0.20	0.08	0.35	0.13	0	0.01
T2	0.13	0.12	0.22	0.08	0.10	0.08	0.52	0.17	0.03	0.10
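A bootstrap estimate of this kind can be sketched as follows. The caption does not spell out how the statistics were aggregated, so the sketch assumes one plausible reading: each of the 1000 resamples is drawn with replacement at the original sample size, and the mean and variance of the resamples are averaged. The function name and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_variance(values, repetitions=1000, seed=0):
    """Average the mean and variance over `repetitions` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    means, variances = [], []
    for _ in range(repetitions):
        # Draw n values with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
        variances.append(statistics.pvariance(resample))
    return statistics.mean(means), statistics.mean(variances)


# Hypothetical H2 shares for one fault class (not data from the paper).
h2_share = [0.62, 0.75, 0.81, 0.55, 0.70, 0.78, 0.66, 0.73]
mu, var = bootstrap_mean_variance(h2_share)
print(f"mean={mu:.3f}, variance={var:.4f}")
```

Resampling at the original sample size keeps each bootstrap replicate comparable to the small training set it is meant to represent.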

Table 3. Confusion matrix of the bagging output.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	17	1	1	0	0
Discharge with low energy	0	21	1	0	0
Partial discharge	0	0	22	0	0
Thermal fault (<700 °C)	0	1	0	22	2
Thermal fault (>700 °C)	0	0	1	1	20

Table 4. Confusion matrix of the random forest classifier.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	17	0	1	0	1
Discharge with low energy	0	21	1	0	0
Partial discharge	0	0	22	0	0
Thermal fault (<700 °C)	1	0	0	22	2
Thermal fault (>700 °C)	0	0	1	2	19

Table 5. Confusion matrix of the decision tree.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	14	0	1	0	4
Discharge with low energy	2	19	1	0	0
Partial discharge	1	0	21	0	0
Thermal fault (<700 °C)	0	1	0	21	3
Thermal fault (>700 °C)	0	1	1	2	18

Table 6. Confusion matrix of the AdaBoost classifier.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	15	0	1	0	3
Discharge with low energy	2	19	1	0	0
Partial discharge	1	0	21	0	0
Thermal fault (<700 °C)	0	1	0	21	3
Thermal fault (>700 °C)	0	1	2	1	19

Table 7. Confusion matrix of GBCT.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	15	1	2	0	1
Discharge with low energy	1	19	1	1	0
Partial discharge	0	1	21	0	0
Thermal fault (<700 °C)	1	1	0	21	2
Thermal fault (>700 °C)	0	0	2	1	19

Table 8. Confusion matrix of SVM.

Actual Fault Type	Discharge with High Energy	Discharge with Low Energy	Partial Discharge	Thermal Fault (<700 °C)	Thermal Fault (>700 °C)
Discharge with high energy	12	0	5	0	2
Discharge with low energy	20	0	2	0	0
Partial discharge	6	0	16	0	0
Thermal fault (<700 °C)	4	0	0	0	21
Thermal fault (>700 °C)	1	0	1	0	20

Table 9. Diagnosing accuracy of each model.

Method	EL of Bagging	Random Forest	Decision Tree	GBCT	AdaBoost	SVM
Training	100%	100%	100%	100%	100%	56.7%
Test	92.7%	91.8%	86.4%	86.4%	86.4%	43.6%
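The test accuracy in Table 9 is the share of correctly classified samples, i.e., the diagonal sum of the confusion matrix divided by the number of test samples. The short check below applies this to the bagging output in Table 3; the function name `accuracy` is hypothetical.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Confusion matrix of the bagging output (Table 3), rows = actual fault type.
bagging = [
    [17, 1, 1, 0, 0],
    [0, 21, 1, 0, 0],
    [0, 0, 22, 0, 0],
    [0, 1, 0, 22, 2],
    [0, 0, 1, 1, 20],
]

print(round(100 * accuracy(bagging), 1))  # 92.7 (102 of 110 test samples)
```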

Table 10. Results of the errors for the one-step predictions shown in Figure 4, Figure 5 and Figure 6.

Gas	H₂	C₂H₂	CH₄	C₂H₄	C₂H₆	CO
MAPE (%)	3.512	2.011	0.078	1.603	1.711	1.702
Ref. [37]	6.20	---	2.8306	7.5719	3.7727	---
MAE (ppm)	8.864	0.827	0.341	0.430	0.150	11.177
RMSE (ppm)	5.606	0.508	0.058	0.269	0.086	7.284

Table 11. Predicted and measured data and corresponding fault types.

No.	Data	H₂	CH₄	C₂H₄	C₂H₆	C₂H₂	Fault Type	Agreement
No. 1	Predicted data	21.15	5.58	1.46	5.09	0.01	Normal	Agree
No. 1	Measured data	39.60	6.10	1.40	5.00	0.00	Normal	Agree
No. 2	Predicted data	61.80	7.79	1.86	5.94	0.30	D2	Agree
No. 2	Measured data	64.50	8.00	1.60	5.80	0.30	D2	Agree
No. 3	Predicted data	66.89	7.88	1.89	6.10	0.29	D2	Agree
No. 3	Measured data	69.10	8.60	1.80	6.50	0.30	D2	Agree
No. 4	Predicted data	66.03	8.01	1.73	5.91	0.32	D2	Agree
No. 4	Measured data	68.90	8.20	1.80	6.10	0.30	D2	Agree
No. 5	Predicted data	69.08	8.48	1.91	6.29	0.31	D2	Agree
No. 5	Measured data	73.70	9.80	2.20	7.20	0.40	D2	Agree

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

