A Transformer Fault Diagnosis Model Based on Chemical Reaction Optimization and Twin Support Vector Machine

Yuan, Fang; Guo, Jiang; Xiao, Zhihuai; Zeng, Bing; Zhu, Wenqiang; Huang, Sixu

doi:10.3390/en12050960


by Fang Yuan ¹,², Jiang Guo ¹,²,*, Zhihuai Xiao ², Bing Zeng ¹,², Wenqiang Zhu ¹,² and Sixu Huang ¹,²

¹ Intelligent Power Equipment Technology Research Center, Wuhan University, Wuhan 430072, China

² College of Power & Mechanical Engineering, Wuhan University, Wuhan 430072, China

* Author to whom correspondence should be addressed.

Energies 2019, 12(5), 960; https://doi.org/10.3390/en12050960

Submission received: 2 January 2019 / Revised: 5 March 2019 / Accepted: 7 March 2019 / Published: 12 March 2019

(This article belongs to the Section A1: Smart Grids and Microgrids)


Abstract:

The condition monitoring and fault diagnosis of power transformers play a significant role in the safe, stable and reliable operation of the whole power system. Dissolved gas analysis (DGA) methods are widely used for fault diagnosis; however, their accuracy is limited by the selection of DGA features and by the performance of the fault diagnosis model. For example, the classical support vector machine (SVM) is easily affected by unbalanced training samples. This paper presents a transformer fault diagnosis model based on chemical reaction optimization and a twin support vector machine. Twin support vector machines (TWSVMs) are used as classifiers for solving problems involving unbalanced and insufficient samples. Restricted Boltzmann machines (RBMs) are used for data preprocessing to ensure the effective identification of feature parameters and improve the efficiency and accuracy of fault diagnosis. The chemical reaction optimization (CRO) algorithm is used to select the optimal TWSVM training parameters. The cross-validation (CV) method is used to ensure the reliability and generalization ability of the diagnostic model. Finally, the validity of the model is verified using real fault samples and random testing.

Keywords:

transformer; fault diagnosis; dissolved gas analysis; twin support vector machines; chemical reaction optimization algorithm; restricted Boltzmann machine

1. Introduction

With the development of smart grids in power system construction, improving the real-time diagnosis and analysis of the equipment involved represents an urgent technical challenge. The reliability of power transformers, which are critical core equipment in power transmission and distribution systems, dictates the safe and reliable performance of the whole electrical system. Thus, timely discovery of early potential transformer faults is very important, and condition monitoring and fault detection have attracted growing attention from both the domestic and foreign research communities [1,2,3].

The composition, content, and proportion of dissolved gas in the oil of a power transformer are closely related to the fault type and fault degree of the transformer, and they effectively reflect its operating state. Dissolved gas analysis (DGA) is the most convenient and effective method available for early potential fault diagnosis of oil-immersed transformers because it is an accurate and reliable way to find potential faults inside a transformer of this type. The improved/new three-ratio (IEC three-ratio) method and the Doernenburg method are effective methods for oil-immersed transformer fault diagnosis, being easily implemented and widely used. These traditional DGA methods mainly use operational experience and expert knowledge to establish "code-fault type" diagnosis rules based on the ratio (or content) of dissolved gas and the fault types, but they suffer some drawbacks in engineering applications, such as code boundaries that are excessively absolute and faults that are not completely covered by the defined codes [4,5].

With the development of artificial intelligence (AI), machine learning (ML), data mining (DM), and other theories and technologies, the data-driven intelligent model has been applied to transformer fault diagnosis. The use of the artificial neural network (ANN) [6,7], expert system, Bayes network, support vector machine (SVM) [8,9], random forest (RF), and other theories and methods have provided new ideas for transformer fault diagnosis. However, all AI diagnostic methods have some limitations. For example, ANN needs to be prevented from overfitting and is susceptible to local extrema, the completeness of an expert system knowledge base cannot be guaranteed, a Bayes network needs a large number of sample data, and matter-element theory has strict requirements on sample consistency and sample size.

The input parameters of AI fault diagnosis methods are mainly derived from the gas ratio or content of DGA methods recommended by the International Electrotechnical Commission (IEC) and Institute of Electrical and Electronics Engineers (IEEE), but some important features are neglected by the input parameters of these methods and the feature sets need to be improved. Normal and fault states of transformers cannot be distinguished by these methods, which limits their application in online monitoring and fault diagnosis. At the same time, in the actual work of operating and maintaining transformers, fault samples and data are difficult to collect and the types of fault are diverse, so it is difficult to assemble massive and complete fault samples. Meanwhile, the collected fault samples are often unbalanced. These conditions have limited the application of AI diagnostic methods in practical work.

SVM has a high generalization ability and avoids the overfitting, slow convergence, and susceptibility to local extrema that affect neural networks. Therefore, SVM is suitable for pattern recognition problems that are nonlinear, have local minima, or involve small samples, as well as for other practical problems. However, SVM is easily affected by unbalanced samples [10,11,12]. To overcome this disadvantage, researchers have introduced many modification methods, and the twin support vector machine (TWSVM) is the most commonly applied among them. TWSVM is used as the classifier to address the potential unbalanced sample issue associated with SVMs and to increase the SVM training speed. TWSVM can apply different penalty factors to the two categories of samples if the samples are not balanced.

In this study, we propose a transformer fault diagnosis model based on chemical reaction optimization and a TWSVM based on the status of the power industry and the characteristics of transformer fault diagnosis. A variety of DGA fault diagnosis feature parameters are fused as the input parameters of the model to solve the problems of too absolute coding boundary and incomplete coding coverage of faults. TWSVMs are used as classifiers for solving the problems of unbalanced and insufficient samples. The restricted Boltzmann machine (RBM) is used for data preprocessing to ensure the effective identification of feature parameters and improve the efficiency and accuracy of fault diagnosis. A chemical reaction optimization (CRO) algorithm is used to optimize TWSVM parameters (penalty factors and kernel parameter) to select the optimal training parameters. We used cross-validation (CV) to ensure the reliability and generalization ability of the diagnostic model. Finally, the validity of the model was verified using real fault samples and random testing.

The remainder of this paper is organized as follows: Section 2 outlines related work on transformer fault diagnosis techniques. Section 3 covers the fundamentals of the CRO-TWSVM model. The transformer fault diagnosis model based on CRO-TWSVM is presented in Section 4. Examples of the transformer fault diagnosis model are given in Section 5. Finally, conclusions are drawn and potential future work is discussed in Section 6.

2. Related Work

With the rapid development of computer technology and artificial intelligence (AI) theory, machine learning-based techniques and data-driven modeling methods, including artificial neural network (ANN) [13,14,15,16,17], fuzzy theory [18,19], expert system (EPS) [20], rough sets theory (RST) [21], and other intelligent diagnosis methods [22,23,24,25,26,27,28,29,30,31] such as random forest (RF), gradient boosting decision tree (GBDT), deep belief network (DBN), support vector machine (SVM) and evidential reasoning approach, have been introduced to the research field of transformer fault diagnosis based on the DGA approach. These intelligent methods make up for the deficiencies of the mentioned traditional DGA methods, and directly or indirectly improve the accuracy of transformer fault diagnosis, and provide a new train of thought for high-precision transformer fault diagnosis.

Guardado et al. [15] demonstrated that an IEC-based BP neural network can achieve a higher accuracy rate than other DGA methods in fault detection. Yang et al. [16] proposed a multi-level BP neural network fault diagnosis model for transformers. Wang et al. [17] presented a method for power transformer fault diagnosis based on a probabilistic neural network (PNN) and dissolved gas analysis; they used a hybrid evolutionary algorithm combining particle swarm optimization (PSO) and back propagation (BP) to optimize the parameters of the PNN. Illias et al. [7] proposed an improved hybrid modified evolutionary particle swarm optimizer (PSO) time varying acceleration coefficient-ANN for power transformer fault diagnosis. Rigatos et al. [23] proposed a neural modeling and local statistical approach for detecting incipient faults in power transformers, which can detect transformer failures at an early stage and consequently avoid critical conditions for the power grid. Furthermore, a random forest-based fault discrimination scheme [25] and a multi-layer perceptron (MLP) neural network-based decision scheme [30] have been proposed for fault diagnosis of power transformers. The authors of [18] proposed a deep belief network (DBN) approach to predict dissolved gas concentrations in transformers. They achieved good prediction accuracy, but their method ignores the relationship between the individual gas components.

Through the optimization of different ANN approaches, the above methods solve, to some extent, the problems that neural networks overfit easily, converge slowly, and easily fall into local minima, and they achieve good results; however, they presuppose the availability of a large number of samples. In the same way, other ML methods, such as RF and DBN, also need a large number of samples. However, in the actual work of transformer operation and maintenance, fault samples and data are difficult to collect and the types of fault are diverse, so it is difficult to form massive and complete fault samples.

SVM, proposed by Vapnik, is a machine learning method developed from statistical theory. SVM adopts the principle of structural risk minimization and is suitable for training and classification of small samples. The authors of [32,33,34,35,36,37,38] proposed support vector machine (SVM)-based intelligent fault classification approaches for power transformer DGA. Zhang et al. [37] developed an approach combining the wavelet technique with a least squares support vector machine (LS-SVM) based on particle swarm optimization (PSO) with mutation for forecasting dissolved gases in the power transformer. There are also many research efforts focusing on fault detection for other equipment and systems. Li et al. [38] presented an intelligent method for the fault diagnosis of power transformers based on selected gas ratios and SVM. They used a genetic algorithm (GA) to obtain the optimal dissolved gas ratios (ODGR), performing both the DGA ratio selection and the SVM parameter optimization. Liu et al. [35] presented a fault classification approach for power transformers using DGA and the SVM algorithm. They used training data to build a multi-layer SVM classifier, which performs well in identifying the transformer fault types.

The mentioned intelligent approaches have improved the conventional DGA-based transformer fault diagnosis methods, and directly or indirectly improved the accuracy of fault diagnosis for the power transformers. However, these studies tend to focus on model selection and optimization, and the problem of unbalanced samples in the case of small samples is not well solved. Meanwhile, there are deficiencies in the training methods and data preprocessing methods. These problems limit the practical application of AI algorithm or model in transformer fault diagnosis. Therefore, in this paper, the CRO-TWSVM model is proposed to solve these problems.

3. Fundaments of CRO-TWSVM Model

3.1. Restricted Boltzmann Machine

In 1986, on the basis of the Boltzmann machine, Smolensky proposed a stochastic neural network model named Restricted Boltzmann Machines (RBM) [39]. An RBM network is composed of a hidden layer and a visible layer. The visible layer is composed of visible units, which can be used to receive the input of transformer characteristic parameters, and the hidden layer is composed of hidden units, which can be used to extract the deep feature vectors. The units on the same layer are not connected, and the units on different layers are connected to each other.

The state of each RBM unit is stochastic: 1 or 0 indicates whether the unit is active or not, and the state is determined probabilistically from the energy of the network.

Suppose the RBM has n visible units and m hidden units. The bias weights of the visible layer are defined as a = {a₁,a₂,…,a_n}, the bias weights of the hidden layer are defined as b = {b₁,b₂,…,b_m}, and the weights of the connections between visible units and hidden units are defined as ω = {w₁₁,w₁₂,…,w_nm}, so that the RBM's parameters can be defined as θ = {ω,a,b}. Given these, the energy of a configuration of the RBM is defined as:

E (v, h | θ) = - \sum_{i = 1}^{n} a_{i} v_{i} - \sum_{j = 1}^{m} b_{j} h_{j} - \sum_{i = 1}^{n} \sum_{j = 1}^{m} w_{i j} v_{i} h_{j}

(1)

where: v_i is the state of the i-th visible unit, a_i is the bias weight of the i-th visible unit, v = {v₁,v₂,…,v_n} is the state of the visible layer, h_j is the state of the j-th hidden unit, b_j is the bias weight of the j-th hidden unit, h = {h₁,h₂,…,h_m} is the state of the hidden layer, and w_ij is the weight of the connection between the i-th visible unit and the j-th hidden unit.

The activation probabilities of the i-th visible unit and the j-th hidden unit can be obtained by the following equation:

\begin{array}{l} p (v_{i} = 1 | h) = σ (a_{i} + \sum_{j = 1}^{m} h_{j} w_{i j}) \\ p (h_{j} = 1 | v) = σ (b_{j} + \sum_{i = 1}^{n} v_{i} w_{i j}) \end{array}

(2)

where:

σ (x) = \frac{1}{1 + e^{- x}}

(3)

and the probability that the RBM is in the state (v,h) can be obtained by the following equation:

p (v, h | θ) = \frac{e^{- E (v, h | θ)}}{\sum_{v, h} e^{- E (v, h | θ)}}

(4)

Suppose the total number of training samples is Q, and the q-th sample is defined as v^q = {v₁^q, v₂^q, …, v_n^q}. The likelihood function of Equation (5) can be maximized through training, and the parameters of the RBM θ = {ω,a,b} can be obtained.

L (θ) = \prod_{q = 1}^{Q} P (v^{q})

(5)

By using the Contrastive Divergence (CD) algorithm proposed by Hinton, the parameters of the RBM can be updated as in Equation (6):

\begin{array}{l} w_{i j} (t + 1) = w_{i j} (t) + ε (\langle v_{i} h_{j} \rangle_{data} - \langle v_{i} h_{j} \rangle_{recon}) \\ a_{i} (t + 1) = a_{i} (t) + ε (\langle v_{i} \rangle_{data} - \langle v_{i} \rangle_{recon}) \\ b_{j} (t + 1) = b_{j} (t) + ε (\langle h_{j} \rangle_{data} - \langle h_{j} \rangle_{recon}) \end{array}

(6)

where ⟨·⟩_data is the expected value under the given data, ⟨·⟩_recon is the mathematical expectation under the reconstructed model, and ε is the learning rate for the parameters of the RBM.

Studies have shown that RBM can improve the recognition accuracy and training speed of classification model through data preprocessing and the RBM preprocessing method proposed for Finger Motion Estimation by Mousas [40] works well.

Based on the aforementioned process, RBM is used to preprocess the transformer diagnosis features. The DGA parameters of the transformer are set as the input elements of the visible layer, the hidden layer stands for a non-linearly transformed feature space, and the number of hidden units, a hyper-parameter, defines the dimensionality of the new feature space. Finally, the output vector preprocessed by RBM is used as the input of the TWSVM in the following section.
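As an illustration of Equations (2), (3) and (6), the following is a minimal NumPy sketch of one-step contrastive divergence (CD-1) and of the hidden-layer feature extraction. It is not the authors' implementation. The function names, the learning rate, the layer sizes, the random data, and the choice to feed real-valued inputs in [0, 1] directly to the visible layer are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    # Logistic function of Equation (3).
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, a, b, lr, rng):
    """One CD-1 update on a batch V of shape (batch, n); W has shape (n, m)."""
    # Positive phase: hidden activation probabilities given the data, Equation (2).
    ph_data = sigmoid(b + V @ W)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Negative phase: reconstruct the visible layer, then recompute hidden probabilities.
    pv_recon = sigmoid(a + h_sample @ W.T)
    ph_recon = sigmoid(b + pv_recon @ W)
    batch = V.shape[0]
    # Updates in the form of Equation (6), with expectations averaged over the batch.
    W = W + lr * (V.T @ ph_data - pv_recon.T @ ph_recon) / batch
    a = a + lr * (V - pv_recon).mean(axis=0)
    b = b + lr * (ph_data - ph_recon).mean(axis=0)
    return W, a, b

def rbm_features(V, W, b):
    """Hidden activation probabilities, used here as the transformed feature vector."""
    return sigmoid(b + V @ W)

rng = np.random.default_rng(0)
V = rng.random((20, 7))        # hypothetical data: 20 samples, 7 inputs in [0, 1]
n_hidden = 5                   # hypothetical hidden-layer size (a hyper-parameter)
W = 0.01 * rng.standard_normal((7, n_hidden))
a = np.zeros(7)
b = np.zeros(n_hidden)
for _ in range(50):
    W, a, b = cd1_step(V, W, a, b, lr=0.1, rng=rng)
features = rbm_features(V, W, b)
```

The `features` array has one row per sample and one column per hidden unit, which is the form in which it would be passed on to a classifier.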

3.2. Twin Support Vector Machine

SVM, proposed by Vapnik, is a machine learning method developed from statistical theory. SVM adopts the principle of structural risk minimization to classify data by constructing the optimal hyperplane. It can effectively solve small sample, non-linear, and high dimension classification problems, and has the advantages of fast calculation speed and strong generalization ability, so it has been widely examined and applied [9,33,34].

Given a set of data T = {(x_i, y_i)}, i = 1, …, m, where x_i ∈ Rⁿ denotes the input vectors, y_i ∈ {+1, -1} denotes the two output classes, and m is the number of samples, the following classification hyperplane can be constructed:

f (x) = \mathrm{sign} (ω \cdot x + b)

(7)

where ω denotes the weight vector and b denotes the bias term. ω and b are used to define the position of the separating hyperplane.

The problem of seeking the optimal classification hyperplane can be transformed into a constrained quadratic optimization problem by comprehensively considering structural risk minimization criteria, regularization terms, and fitting errors. The optimal hyperplane separating the data can be obtained by the following equation:

\min J (ω, ξ) = \frac{1}{2} {‖ ω ‖}^{2} + C \sum_{i = 1}^{m} ξ_{i}

(8)

where C is the penalty factor and ξ_i is the slack variable. Positive slack variables ξ_i are introduced to measure the distance between the margin and the vectors x_i that lie on the wrong side of the margin. The problem is solved with the Lagrange multiplier method. By introducing the Lagrange multipliers α_i, the decision function of the optimal hyperplane is:

f (x) = \mathrm{sign} (\sum_{i = 1}^{m} α_{i} y_{i} (x_{i} \cdot x) + b)

(9)

SVM can also be used for nonlinear classification by means of a kernel function. Using the nonlinear mapping function φ(·), the original data x are mapped into a high-dimensional feature space in which linear classification is possible. The nonlinear decision function is then:

f (x) = \mathrm{sign} (\sum_{i = 1}^{m} α_{i} y_{i} K (x_{i}, x) + b)

(10)

where K(x_i, x_j) = φ(x_i) · φ(x_j) is the kernel function.

TWSVMs generate two nonparallel classification hyperplanes by converting the single quadratic programming problem (QPP) of a classical SVM into two QPPs of smaller size.

The basic idea of TWSVM is to construct two nonparallel hyperplanes in an n-dimensional space:

\begin{array}{l} K (x^{'}, C^{'}) ω^{(1)} + b^{(1)} = 0 \\ K (x^{'}, C^{'}) ω^{(2)} + b^{(2)} = 0 \end{array}

(11)

Each hyperplane occurs such that the samples are as close as possible to the category to which they belong and as far as possible from the categories to which other samples belong. The TWSVM solves the following two quadratic optimization problems:

\begin{array}{l} \min_{ω^{(1)}, b^{(1)}, q} \frac{1}{2} {‖ K (A, C^{'}) ω^{(1)} + e_{1} b^{(1)} ‖}^{2} + c_{1} e_{2}^{'} q \\ s.t. \; - (K (B, C^{'}) ω^{(1)} + e_{2} b^{(1)}) + q \geq e_{2}, \; q \geq 0 \\ \min_{ω^{(2)}, b^{(2)}, q} \frac{1}{2} {‖ K (B, C^{'}) ω^{(2)} + e_{2} b^{(2)} ‖}^{2} + c_{2} e_{1}^{'} q \\ s.t. \; (K (A, C^{'}) ω^{(2)} + e_{1} b^{(2)}) + q \geq e_{1}, \; q \geq 0 \end{array}

(12)

where K represents the kernel function; A and B represent the m₁ positive category samples and the m₂ negative category samples, respectively; e₁ and e₂ are vectors of ones whose lengths match the number of rows of K(A, C') and K(B, C'), respectively; c₁ and c₂ are penalty factors; ω and b are the normal vector and the offset of each hyperplane, respectively; and q is the slack variable.

Each of the hyperplanes corresponds to a sample category. A sample is assigned to the category r whose hyperplane lies closest to it, with the decision function being:

| K (x^{'}, C^{'}) ω^{(r)} + b^{(r)} | = \min_{l = 1, 2} | K (x^{'}, C^{'}) ω^{(l)} + b^{(l)} |

(13)
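The decision rule of Equation (13) can be sketched as follows, assuming that the two hyperplanes have already been obtained by solving the QPPs of Equation (12). The Gaussian (RBF) kernel, the parameter name `gamma`, and the hand-picked hyperplane coefficients below are assumptions for illustration only; the source does not state the kernel in this section.

```python
import numpy as np

def rbf_kernel(X, C, gamma):
    """Kernel matrix K(X, C') with entries exp(-gamma * ||x - c||^2) (assumed kernel)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (C ** 2).sum(axis=1)[None, :] - 2.0 * X @ C.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def twsvm_predict(X, C, w1, b1, w2, b2, gamma):
    """Assign +1 when the sample is closer to hyperplane 1, otherwise -1."""
    K = rbf_kernel(X, C, gamma)
    d1 = np.abs(K @ w1 + b1)   # kernel-space distance measure to hyperplane 1
    d2 = np.abs(K @ w2 + b2)   # kernel-space distance measure to hyperplane 2
    return np.where(d1 <= d2, 1, -1)

# Hypothetical toy setup: two reference points and hand-picked coefficients
# so that surface 1 passes through (0, 0) and surface 2 through (4, 4).
C = np.array([[0.0, 0.0], [4.0, 4.0]])
w1, b1 = np.array([1.0, 0.0]), -1.0
w2, b2 = np.array([0.0, 1.0]), -1.0
X_new = np.array([[0.1, 0.0], [4.0, 3.9]])
pred = twsvm_predict(X_new, C, w1, b1, w2, b2, gamma=0.5)
```

In this toy setup the first sample lies next to the first surface and the second sample next to the second surface, so they receive the labels +1 and -1 respectively.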

For a binary classification problem, the computational complexity of a standard SVM is O(m³), where m is the number of samples. Supposing each category contains m/2 samples, solving the two smaller QPPs costs O(2 × (m/2)³) = O(m³/4). Thus, TWSVM has a complexity about 1/4 that of SVM. With the two penalty factors c₁ and c₂ built into TWSVM, it is also possible to apply different penalty factors to the two categories of samples, which addresses the loss of classification accuracy that a classical SVM typically suffers with unbalanced samples [10,11]. TWSVM is therefore employed as the classifier to address the unbalanced sample problem associated with SVMs and to increase the training speed.

3.3. Multi-Category Classification Algorithm

A standard SVM is a binary classifier. For multi-category classification problems, a combinational multi-category classification SVM is generally used; the problem is decomposed and reconstructed into multiple binary classification problems, which are then solved one-by-one.

Among commonly used multi-category classification algorithms are one vs. one, one vs. many, decision directed acyclic graph (DAG), and binary tree algorithms. A binary tree algorithm has the following benefits: (1) ability to fuse with a classification model of interest; (2) for k-category classification problems, only k-1 binary classifiers are needed, which is the least among the aforementioned algorithms and thus the least computation-intensive; (3) the samples required decrease from layer to layer, resulting in a quicker and more efficient training for a given number of layers; and (4) there are no inseparable zones [33,34,35]. We selected the binary tree algorithm to construct a hierarchical TWSVM decision model.

As shown in Figure 1, a classification problem is either a complete or a partial binary tree classification problem. The left figure illustrates a complete binary tree SVM (BT-SVM), wherein each decision node divides its categories into two sub-categories of equal size; the right figure shows a partial binary tree SVM (PBT-SVM), wherein each decision node singles out a category from the rest.
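The partial binary tree scheme can be sketched as a chain of k-1 binary decisions, each of which separates one category from the remaining ones. This is an illustrative sketch only: the function name, the labels, and the threshold-based stand-in deciders are hypothetical, and in the model described here each node would be a trained binary TWSVM.

```python
def pbt_predict(x, nodes, default_label):
    """Walk a partial binary tree.

    nodes is an ordered list of (label, decide) pairs, where decide(x)
    returns True when x belongs to `label` rather than to the rest.
    For k categories, k-1 nodes are enough: the last category is what
    remains after every node has answered False.
    """
    for label, decide in nodes:
        if decide(x):
            return label
    return default_label

# Hypothetical three-category example with two stand-in binary deciders.
nodes = [
    ("A", lambda x: x < 1.0),
    ("B", lambda x: x < 2.0),
]
results = [pbt_predict(x, nodes, default_label="C") for x in (0.5, 1.5, 2.5)]
```

Because a sample leaves the chain as soon as one node claims it, the nodes deeper in the tree only need to be trained on the categories that have not yet been singled out.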

3.4. Chemical Reaction Optimization Algorithm

Chemical reaction optimization (CRO) is a metaheuristic algorithm that has emerged in recent years. It is a swarm intelligence algorithm that simulates the molecular movement and energy conversion process in a chemical reaction. The molecular state in a container is unstable at the beginning of a reaction due to excessive energy in the molecules, and the molecules move toward the lowest possible energy state through intermolecular collisions and post-collision chemical reactions in order to achieve a stable state. The ultimate outcome of a chemical reaction is the reaction products, which are formed in a process characterized by a diminishing potential energy. CRO is therefore an optimization process in which the potential energy of the search system is minimized [41,42,43,44,45,46].

In recent years, metaheuristic algorithms such as GA, particle swarm optimization, and ant colony algorithms have found extensive applications in various domains. Relative to other swarm intelligence algorithms, CRO is good at finding a global solution and requires a shorter optimization time, and it thereby provides a new idea for solving optimization problems.

The basic operation units involved in CRO algorithm are composed of molecules (ω) and container walls (buffer), where the molecules possess both potential energy (PE) and kinetic energy (KE) whereas the container walls create the environment in which the reaction occurs. The molecular PE is the ultimate criterion for evaluating a chemical reaction and thus becomes the objective function of the question of interest while the KE is a quantized value for determining whether the system is able to initiate a molecular reaction. In a chemical reaction, there are four basic reaction operators: monomolecular collision, monomolecular decomposition, intermolecular collision, and molecule synthesis. The following paragraphs provide a brief description of the four basic reaction operators [47].

(1) Monomolecular collision

Monomolecular collision is a process in which the KE and PE of a molecule change due to the collision between the wall and the molecule. The energy change occurring in a molecular collision reaction process may be described by Equation (14):

K E_{ω^{'}} = (P E_{ω} - P E_{ω^{'}} + K E_{ω}) \times δ

(14)

where ω is the original molecule, ω' is the new molecule after the structural change, δ ∈ [KELossRate, 1] is a random number, KELossRate is a constant giving the upper limit (in percentage) of the KE loss in a monomolecular collision, and PE_ω = f(ω) is the molecular potential energy, with f(·) being the objective function of the problem considered. Monomolecular collision enables local searching in the problem space.
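The energy bookkeeping of Equation (14) can be sketched as follows. The feasibility check (the molecule must hold enough total energy to reach the new structure) and the transfer of the lost energy to the buffer follow the commonly described CRO formulation and are assumptions here, as are the function names, the toy objective, and the neighborhood move.

```python
import random

def on_wall_collision(omega, pe, ke, f, neighbor, ke_loss_rate, rng):
    """Monomolecular collision; returns (omega, pe, ke, energy_sent_to_buffer)."""
    new_omega = neighbor(omega, rng)      # small structural change (local search)
    new_pe = f(new_omega)                 # PE is the objective value of the molecule
    # Assumed feasibility check: total energy must cover the new potential energy.
    if pe + ke >= new_pe:
        delta = rng.uniform(ke_loss_rate, 1.0)
        surplus = pe - new_pe + ke
        new_ke = surplus * delta          # Equation (14)
        lost = surplus * (1.0 - delta)    # remainder goes to the buffer (assumption)
        return new_omega, new_pe, new_ke, lost
    return omega, pe, ke, 0.0             # infeasible: molecule is left unchanged

# Hypothetical toy problem: minimize f(x) = x^2 starting from x = 1.
rng = random.Random(0)
f = lambda x: x * x
neighbor = lambda x, r: x + r.uniform(-0.1, 0.1)
omega0, pe0, ke0 = 1.0, f(1.0), 5.0
omega1, pe1, ke1, lost = on_wall_collision(omega0, pe0, ke0, f, neighbor, 0.2, rng)
```

With this bookkeeping the sum of the new PE, the new KE, and the energy sent to the buffer equals the total energy the molecule held before the collision.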

(2) Monomolecular decomposition

Monomolecular decomposition is a reaction occurring when a molecule with higher KE collides with the wall and during which the molecule decomposes or breaks up into two new molecules. The energy difference E_dec existing before and after collision is transmitted to the two new molecules in a random manner; such energy change in this reaction process may be expressed by Equation (15):

\begin{array}{l} E_{dec} = (P E_{ω} + K E_{ω} + δ_{1} \times δ_{2} \times buffer) \\ - (P E_{ω_{1}^{'}} + P E_{ω_{2}^{'}}), \\ K E_{ω_{1}^{'}} = E_{dec} \times δ_{3}, \\ K E_{ω_{2}^{'}} = E_{dec} \times (1 - δ_{3}) \end{array}

(15)

where δ₁ and δ₂ are uniformly distributed in [0, 1] and δ₃ is a random value in [0, 1]. Compared to monomolecular collision, monomolecular decomposition is capable of local search in a larger scope.
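A minimal sketch of the energy split in Equation (15) is given below. It always draws the amount δ₁ × δ₂ × buffer from the buffer, exactly as the equation is written, and rejects the decomposition when the resulting E_dec is negative; that rejection rule, the function name, and the numeric values are assumptions for illustration.

```python
import random

def decomposition_energy(pe, ke, pe1_new, pe2_new, buffer, rng):
    """Return (ke1_new, ke2_new, new_buffer), or None if the reaction is infeasible."""
    d1, d2, d3 = rng.random(), rng.random(), rng.random()
    borrowed = d1 * d2 * buffer                        # energy drawn from the buffer
    e_dec = (pe + ke + borrowed) - (pe1_new + pe2_new) # Equation (15)
    if e_dec < 0:
        return None                                    # assumed rejection rule
    # E_dec is shared randomly between the two new molecules.
    return e_dec * d3, e_dec * (1.0 - d3), buffer - borrowed

rng = random.Random(0)
pe, ke, buffer = 2.0, 3.0, 10.0          # hypothetical energies
pe1_new, pe2_new = 1.5, 1.0              # hypothetical PE of the two new molecules
ke1_new, ke2_new, new_buffer = decomposition_energy(pe, ke, pe1_new, pe2_new, buffer, rng)
```

The total energy of the system (molecules plus buffer) is the same before and after the decomposition, which is the conservation property the operator is built around.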

(3) Intermolecular collision

Intermolecular collision refers to a process in which two molecules collide and exchange energy, producing two new molecules without any synthesis taking place. Each new molecule may be taken from the neighborhood of the corresponding original molecule and, due to the absence of a collision with the wall, no energy is lost; hence, the total energy does not change after the reaction. The two new molecules have a total KE of E_inter, which is distributed among them randomly; the energy change in this reaction process may be described using Equation (16):

\begin{array}{l} E_{inter} = (P E_{ω_{1}} + P E_{ω_{2}} + K E_{ω_{1}} + K E_{ω_{2}}) \\ - (P E_{ω_{1}^{'}} + P E_{ω_{2}^{'}}), \\ K E_{ω_{1}^{'}} = E_{inter} \times δ_{4}, \\ K E_{ω_{2}^{'}} = E_{inter} \times (1 - δ_{4}) \end{array}

(16)

where δ₄ is a random value in the range of [0, 1].

(4) Molecule synthesis

Molecule synthesis refers to the phenomenon where a new molecule is produced after the collision of two molecules. As with intermolecular collision, it is a process involving no wall collision and so the energy remains constant before and after collision. The energy change in this reaction may be described by Equation (17):

K E_{ω^{'}} = (P E_{ω_{1}} + P E_{ω_{2}} + K E_{ω_{1}} + K E_{ω_{2}}) - (P E_{ω^{'}})

(17)

Molecule synthesis reactions greatly augment the molecule diversity and the synthesized new molecules differ appreciably from their predecessors and typically possess higher molecular PE. Molecule synthesis reaction improves molecular search power in new regions, hence enhancing CRO’s global search performance.
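The energy rules of Equations (16) and (17) can be sketched in the same style. Neither operator touches the buffer, so the only check is that the remaining kinetic energy is non-negative; that check, the function names, and the numeric values are assumptions for illustration.

```python
import random

def inter_molecular_energy(pe1, ke1, pe2, ke2, pe1_new, pe2_new, rng):
    """Equation (16): return (ke1_new, ke2_new), or None if infeasible."""
    e_inter = (pe1 + pe2 + ke1 + ke2) - (pe1_new + pe2_new)
    if e_inter < 0:
        return None                       # assumed rejection rule
    d4 = rng.random()
    return e_inter * d4, e_inter * (1.0 - d4)

def synthesis_energy(pe1, ke1, pe2, ke2, pe_new):
    """Equation (17): return the KE of the synthesized molecule, or None if infeasible."""
    ke_new = (pe1 + pe2 + ke1 + ke2) - pe_new
    return ke_new if ke_new >= 0 else None

rng = random.Random(0)
# Hypothetical energies of two colliding molecules.
ke_pair = inter_molecular_energy(1.0, 2.0, 1.5, 0.5, 0.8, 1.2, rng)
ke_merged = synthesis_energy(1.0, 2.0, 1.5, 0.5, 2.0)
```

In both cases the total energy held by the molecules is unchanged, which mirrors the statement that no energy is lost when there is no wall collision.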

3.5. CRO-TWSVM Modeling Method

The CRO-TWSVM model uses the CRO algorithm to optimize TWSVM's penalty factors and kernel width, arriving at the optimum values through iterative optimization involving the four reaction operators. A globally optimum TWSVM fault diagnosis model is then generated by training with the training samples. The actual procedures are illustrated below, where Figure 2 depicts the flow chart of CRO-TWSVM modeling.

3.5.1. Pre-Processing

The original DGA samples were normalized to avoid the difference in the order of magnitude of the values of the input parameters. The contents of dissolved gas were converted into the relative content within the range of [0,1], which is conducive to reducing the mutual exclusion between gases. The normalization treatment is described in Equation (18):

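x_{i j}^{'} = x_{i j} / \sum_{j = 1}^{k} x_{i j}, \quad i = 1, \dots, n

(18)

where x_ij is the content of the j-th gas in the i-th sample, k is the number of gases, and n is the number of samples.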
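A short NumPy sketch of the normalization in Equation (18) follows. The function name, the guard for all-zero rows, and the example values are assumptions; the source does not discuss how samples with zero total gas content are handled.

```python
import numpy as np

def normalize_gas_contents(X):
    """Convert each row of gas contents to relative contents that sum to 1."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    # Assumed guard: leave all-zero rows as zeros instead of dividing by zero.
    totals = np.where(totals == 0.0, 1.0, totals)
    return X / totals

# Hypothetical gas contents for three samples and three gases.
X_raw = [[10.0, 30.0, 60.0],
         [0.0, 0.0, 0.0],
         [5.0, 5.0, 10.0]]
X_norm = normalize_gas_contents(X_raw)
```

Each non-zero row of `X_norm` lies in [0, 1] and sums to 1, so samples with very different absolute gas levels become directly comparable.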

The RBM method was used to preprocess the input vector of TWSVM and the feature space was transformed into an appropriate representation, which is conducive to the machine learning of the TWSVM classifier. After being processed by RBM, the transformer fault sample set can be described as a feature set {(y₁,l₁),(y₂,l₂),…,(y_n,l_n)}, where y_i ∈ R^d is the feature output of the i-th transformer fault sample, d is the feature dimension, n is the number of training samples, and l_i ∈ {±1} represents the output target (i = 1, 2, …, n).

3.5.2. Set the Objective Function

We set the objective function along with the initial values of parameters c₁, c₂ and q with regard to the actual problem. Cross-validation is used to eliminate the training bias caused by randomness and evaluate the performance of the training model and improve the stability and generalization ability of the model.

The average classification accuracy of k-fold cross-validation is taken as the objective function f to minimize the error of the trained TWSVM model. Considering the sample size and training efficiency, a five-fold cross-validation method was adopted:

f = \frac{1}{k} \sum_{i = 1}^{k} (\frac{l_{T}^{i}}{l^{i}} \times 100 \%)

(19)

where lⁱ is the number of samples in the i-th verification set, and l_T^i is the number of samples correctly classified in that verification set.
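Equation (19) can be sketched as a k-fold loop that averages the per-fold accuracy. The nearest-centroid classifier below is only a stand-in so that the example runs on its own; in the model described here the classifier would be a TWSVM trained with the candidate parameters. The function names, the fold shuffling, and the synthetic data are assumptions.

```python
import numpy as np

def cv_accuracy(X, y, fit_predict, k=5, seed=0):
    """Mean k-fold accuracy in percent, as in Equation (19)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[val])
        # l_T^i / l^i * 100%: share of correctly classified validation samples.
        scores.append(100.0 * np.mean(pred == y[val]))
    return float(np.mean(scores))

def nearest_centroid(X_train, y_train, X_val):
    """Stand-in classifier (not TWSVM): assign the label of the nearest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dist = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]

# Hypothetical, well-separated two-class data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
f_value = cv_accuracy(X, y, nearest_centroid, k=5)
```

Since CRO minimizes potential energy, an implementation would typically use a quantity such as 100 minus this accuracy as the PE of a molecule; that mapping is an assumption and is not spelled out in the text.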

3.5.3. Initialize the CRO Algorithm

In this step, the following quantities are determined: the initial number of molecules in the container (PopSize), the upper limit (in percentage) of KE loss due to wall-collision reaction (KELossRate), the factor determining the type of molecular reaction (MoleColl), the factor determining the type of monomolecular reaction (α), the factor determining the type of multimolecular reaction (β), and the maximum number of iterations (Iteration).

3.5.4. Compute the Initial PE and KE of Molecules

The initial PE of each molecule is computed from the objective function in Equation (19), and the KE of each molecule is set to a preset initial value.

3.5.5. Iteration and Optimization

In this step, iterative optimization of the molecules in the container is performed using the four basic reaction operators, with each iteration involving only one basic operator. An iteration is composed of three judgment processes, during which the reaction type, the monomolecular reaction type, and the intermolecular reaction type are determined. The first judgment is based on a random number t: if t > MoleColl, the reaction is monomolecular; otherwise, it is intermolecular. In the case of a monomolecular reaction, if NumHit – MinHit > α, then proceed with the monomolecular decomposition reaction; otherwise, proceed with the monomolecular collision, where NumHit is the number of collisions and MinHit is the minimum number of collisions. In the case of a multimolecular reaction, if KE < β, then proceed with the molecule synthesis reaction; otherwise, proceed with the intermolecular collision reaction.
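The three judgments described above can be sketched as a single selection function. The function name, the returned operator labels, and the use of one KE value for the multimolecular test are assumptions made for illustration.

```python
def choose_operator(t, mole_coll, num_hit, min_hit, alpha, ke, beta):
    """Pick one of the four CRO operators for the current iteration.

    t is a random number drawn for this iteration; the other arguments
    correspond to MoleColl, NumHit, MinHit, alpha, KE and beta in the text.
    """
    if t > mole_coll:
        # Monomolecular branch: decomposition or collision with the wall.
        if num_hit - min_hit > alpha:
            return "decomposition"
        return "monomolecular_collision"
    # Multimolecular branch: synthesis or intermolecular collision.
    if ke < beta:
        return "synthesis"
    return "intermolecular_collision"

# Hypothetical parameter values covering each branch.
picked = [
    choose_operator(0.9, 0.2, 50, 10, 30, 5.0, 1.0),
    choose_operator(0.9, 0.2, 20, 10, 30, 5.0, 1.0),
    choose_operator(0.1, 0.2, 20, 10, 30, 0.5, 1.0),
    choose_operator(0.1, 0.2, 20, 10, 30, 5.0, 1.0),
]
```

Keeping the selection logic separate from the operators themselves makes it easy to check that every iteration triggers exactly one operator.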

3.5.6. End the Algorithm

If the termination condition is satisfied, for example when the maximum number of iterations (Iteration) is reached, the optimization process ends.

3.5.7. Train the Diagnosis Model

The molecule with the minimum PE corresponds to the global optimum solution, which is associated with the kernel width and penalty factor of the optimized TWSVM. The width and factor values are then assigned to TWSVM, which is trained using the training sample to produce the transformer fault diagnosis model.

3.5.8. Testing

Finally, the model is tested for accuracy.

4. A Transformer Fault Diagnosis Model Based on CRO-TWSVM

4.1. Choice of Parameters

The fault diagnosis model parameters are derived from the input characteristic values of the fault diagnosis methods recommended by IEC and IEEE, which fall into two main categories: the input parameters of the IEC method, Rogers ratio method (RRM), and Doernenburg ratio method (DRM) are gas ratios, whereas the input parameters of the key gas method (KGM) and Duval triangle method (DTM) are gas contents.

The feature parameters of the above methods are extracted and repeated features are removed. The C₂H₂/C₂H₄ and CH₄/H₂ ratios discriminate well between thermal faults and discharge faults; the C₂H₄/C₂H₆ and C₂H₄/CH₄ ratios perform well in distinguishing between low-, medium-, and high-temperature thermal faults; the C₂H₂/C₂H₄ and C₂H₄/C₂H₆ ratios are suitable for distinguishing between medium-temperature and high-temperature thermal faults. The C₂H₂/C₂H₄ and C₂H₄/C₂H₆ ratios are also good indicators for distinguishing partial discharge, low-energy discharge, and high-energy discharge faults [4,17,22,48,49,50].

The contents of H₂ and C₂H₂ were added to the model as input parameters to judge the normal or abnormal state of the transformer. The content of C₂H₂, as a percentage of the total hydrocarbon, is an important indicator used to determine the degree of discharge and over-heating in the oil; therefore, the ratio of C₂H₂/TCG (total combustible gas) was included. Finally, combined with the advantages of feature parameter identification of various fault diagnosis methods, a DGA fusion analysis model was established as the input parameter of the support vector machine, as shown in Table 1.
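The fused input vector described above can be assembled from raw gas concentrations as in the following sketch. Several points are assumptions rather than statements from the paper: the function name `fusion_features`, the small constant used to avoid division by zero, and the definition of TCG as the sum of H₂, CH₄, C₂H₆, C₂H₄, C₂H₂, and CO.

```python
from typing import Dict

EPS = 1e-9  # assumed guard against division by zero; not from the paper


def fusion_features(gas: Dict[str, float]) -> Dict[str, float]:
    """Build the fused DGA inputs from gas concentrations (e.g. in µL/L)."""

    def ratio(num: str, den: str) -> float:
        return gas[num] / max(gas[den], EPS)

    # Assumption: total combustible gas is the sum of these six gases.
    tcg = sum(gas[g] for g in ("H2", "CH4", "C2H6", "C2H4", "C2H2", "CO"))
    return {
        "H2": gas["H2"],                        # normal vs. abnormal
        "C2H2": gas["C2H2"],                    # normal vs. abnormal
        "C2H2/C2H4": ratio("C2H2", "C2H4"),     # thermal vs. discharge
        "CH4/H2": ratio("CH4", "H2"),           # thermal vs. discharge
        "C2H2/TCG": gas["C2H2"] / max(tcg, EPS),
        "C2H4/C2H6": ratio("C2H4", "C2H6"),     # temperature range
        "C2H4/CH4": ratio("C2H4", "CH4"),       # temperature range
    }


# Hypothetical sample (illustration only, not data from the paper).
sample = {"H2": 100.0, "CH4": 50.0, "C2H6": 20.0,
          "C2H4": 40.0, "C2H2": 10.0, "CO": 180.0}
features = fusion_features(sample)
```

Keeping the features in a dictionary keyed by name makes it straightforward to hand each sub-classifier only the subset of parameters assigned to it.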

4.2. Model Construction

The transformer fault diagnosis model based on CRO-TWSVM is composed of TWSVMs, the fusion DGA parameters, and multiple classification structures of a partial binary tree (PBT), as shown in Figure 3.

Seven fault modes are identified if the normal state is included: normal (NF), low overheat (T1, below 300 ℃), medium overheat (T2, 300–700 ℃), high overheat (T3, over 700 ℃), partial discharge (PD), low-energy discharge (D1), and high-energy discharge (D2).

The model includes a total of six sub-classifiers. To improve training and diagnosis efficiencies, the input of each sub-classifier contains characteristic parameters that are most effective for identifying the faults for which the sub-classifier is designed.

In this model, six TWSVMs, eight characteristic parameters, and four layers of fault sample classification were established to identify seven patterns. In the first layer, TWSVM1 classifies the samples into normal samples (NF) and fault samples (TF and DF). In the second layer, TWSVM2 classifies the fault samples into discharge faults (DF) and overheat faults (TF). In the third layer, TWSVM3 and TWSVM4 are used: TWSVM3 classifies T1, T2, and T3, and TWSVM4 classifies PD, D1, and D2. TWSVM5 and TWSVM6 are used in the fourth layer for further fault classification: TWSVM5 separates T2 from T3, and TWSVM6 separates D1 from D2.
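The four-layer decision path can be sketched as a cascade of binary decisions. The sketch below is one reading of the structure described above and makes several assumptions: each sub-classifier is any callable returning +1 or −1, the sign conventions are arbitrary, TWSVM3 is taken to separate T1 from {T2, T3}, and TWSVM4 is taken to separate PD from {D1, D2}. For brevity the same feature vector is passed to every sub-classifier, whereas the paper assigns each one its own subset of parameters.

```python
from typing import Callable, Dict, Sequence

# Any callable mapping a feature vector to +1 or -1 (hypothetical interface).
BinaryClassifier = Callable[[Sequence[float]], int]


def diagnose(x: Sequence[float], clf: Dict[str, BinaryClassifier]) -> str:
    """Route one sample through the partial binary tree of six classifiers."""
    if clf["TWSVM1"](x) > 0:          # layer 1: normal vs. fault
        return "NF"
    if clf["TWSVM2"](x) > 0:          # layer 2: overheat (TF) vs. discharge (DF)
        if clf["TWSVM3"](x) > 0:      # layer 3: T1 vs. {T2, T3}
            return "T1"
        return "T2" if clf["TWSVM5"](x) > 0 else "T3"   # layer 4
    if clf["TWSVM4"](x) > 0:          # layer 3: PD vs. {D1, D2}
        return "PD"
    return "D1" if clf["TWSVM6"](x) > 0 else "D2"       # layer 4


# Usage with stub classifiers (illustration only): fault, overheat, not T1, T3.
stubs = {
    "TWSVM1": lambda x: -1,
    "TWSVM2": lambda x: +1,
    "TWSVM3": lambda x: -1,
    "TWSVM4": lambda x: +1,
    "TWSVM5": lambda x: -1,
    "TWSVM6": lambda x: +1,
}
label = diagnose([0.0], stubs)
```

A sample passes through at most four classifiers, which is why the partial binary tree needs fewer evaluations per diagnosis than a one-against-one scheme over seven classes.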

5. Examples of the Transformer Fault Diagnosis Model

5.1. Diagnosis Examples

The sample set used by the CRO-TWSVM model included 960 pieces of data. Part of the data were actual DGA data from 220 kV and 500 kV transformers provided by a provincial branch company of the State Grid, and the rest were raw DGA data taken from the literature [3,6,7,8,9,10,35,36,37,38,39]. A total of 457 pieces of data were extracted from the sample set, of which 345 were used as training samples and 112 as test samples. The PD and T3 samples were processed to create a controlled unbalanced situation. The distribution of samples is given in Table 2.

The CRO-TWSVM fault diagnosis model was created on the MATLAB simulation platform. Parameters used to train the model are given in Table 3.

The training and testing results of the model are summarized in Table 4. The transformer fault diagnosis fusion model presented in this paper achieves a diagnosis accuracy of 94.64% and a diagnosis time of 308 ms. TWSVM4 and TWSVM5 are affected by the unbalanced samples but still maintain high accuracy. The model therefore offers high diagnosis accuracy and good generalization capacity while consuming less time.

5.2. Random Test

As only one set of samples was tested in Section 5.1, the result may be influenced by chance and by subjective factors. To ensure the objectivity and persuasiveness of the test, a random test was adopted, and a different random seed was set in each test so that the training set and test set generated by each sampling were different. The IEC, PSO-TWSVM, CRO-SVM, BPNN, random forest (RF), gradient boosting decision tree (GBDT), and CRO-TWSVM models were selected for comparison. Because data preprocessing may affect the performance of the classifiers, a CRO-TWSVM model without pre-processing was also added for comparison.

The model was subjected to 10 random tests with 5-fold cross-validation. The test data are shown in Table 5, and the 95% confidence interval (CI) of the diagnostic accuracy was calculated for comparison. The diagnostic results are shown in Table 6.
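The paper does not state how the 95% CI was computed. As a sketch, assuming a two-sided Student-t interval on the mean of the ten testing accuracies in Table 5, the calculation looks as follows; the function name `t_confidence_interval` is hypothetical and the critical value is hard-coded to keep the example free of external dependencies.

```python
import math
import statistics

# Testing accuracies (%) of the ten random tests, copied from Table 5.
acc = [94.61, 94.57, 95.97, 88.46, 94.25, 95.95, 85.42, 87.97, 89.97, 94.33]

# Two-sided 95% Student-t critical value for 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262


def t_confidence_interval(values, t_crit):
    """Return (lower, mean, upper) of a t-based confidence interval."""
    mean = statistics.mean(values)
    # Half-width = t * s / sqrt(n), with s the sample standard deviation.
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean, mean + half_width


lower, mean, upper = t_confidence_interval(acc, T_CRIT_95_DF9)
print(f"{lower:.2f} {mean:.2f} {upper:.2f}")
```

Under this assumption the interval comes out at about 89.42 to 94.88 around a mean of 92.15, which agrees with the CRO-TWSVM row of Table 6. The half-width of about 2.73 coincides with the value listed in the SD column of that table.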

The comparison among the models shows the following:

(1) The minimum and maximum sample sizes of the 10 random tests were 241 and 667. The 95% CI of the diagnostic accuracy of CRO-TWSVM was 89.42–94.88% and the SD was 2.73, because the samples were picked at random and it is hard to keep them balanced. It is worth noting that the numbers of samples for random seeds 2, 4, 5, 9, and 10 were close, yet the test results were distributed between 89.97% and 94.33%, which further illustrates the impact of unbalanced samples on the test results.
(2) RF, GBDT, SVM, and TWSVM all show good generalization ability and robustness for transformer fault diagnosis. In terms of diagnostic accuracy, the order from low to high is RF, SVM, GBDT, and TWSVM. The 10 random samples are unbalanced to different degrees and are kept small; in this case RF does not give the best performance, because it needs more samples to maintain better accuracy. Meanwhile, in the case of unbalanced samples, TWSVM is less affected than SVM and GBDT and shows better generalization ability.
(3) The accuracy of the BP neural network was just 86.24%, because producing highly accurate results with the BP neural network diagnosis method requires a large amount of sample data.
(4) Compared with the model without pre-processing, the average diagnostic accuracy of the CRO-TWSVM model was improved by 6.92%.
(5) The average diagnostic accuracies of the CRO-TWSVM and PSO-TWSVM models were similar, but the convergence rate of CRO-TWSVM was improved by 12.31%.
(6) The test accuracy of the CRO-TWSVM model was 94.64% in Section 5.1, whereas in Section 5.2 the 95% CI of CRO-TWSVM ranged from 89.42% to 94.88%, with an average accuracy of 92.15%. This shows that the random test brings the samples closer to the actual situation and maintains the objectivity of the test results.

Therefore, the CRO-TWSVM model described in this paper performs better than the other AI algorithms compared in fault classification and generalization, and is more suitable for constructing a fault diagnosis model when few training samples are available.

6. Conclusions

This paper presented a transformer fault diagnosis model based on chemical reaction optimization and twin support vector machine. The following conclusions were reached following analysis and comparison of the diagnosis results produced by different models:

(1): The proposed model not only combines the binary tree structure and TWSVM, but also integrates a variety of DGA parameters. On the basis of maintaining high diagnostic accuracy, training and diagnostic efficiency can be improved, and the problems of small sample size and unbalanced sample in transformer fault diagnosis can be solved. Our model’s application effect is better than the traditional IEC and Doernenburg ratio method, and is more suitable for transformer fault diagnosis than other AI methods. The tests showed that the model can be used for real-time diagnosis and analysis of power system equipment.
(2): RBM was used for sample features optimization, and the random test method was adopted for test results optimization. The k-fold cross-validation method and the CRO method were used for hyperparameter tuning. These processes and optimization methods helped to considerably improve the generalization ability and robustness of the model. These methods are universal and can be used for reference in other models.

Our future work will focus on establishing a real-time fault diagnosis system to realize prediction and in-time diagnosis of transformer faults. The diagnosis system would decrease the working load of grid engineering and provide reasonable guidance for the management and maintenance of electrical equipment.

Author Contributions

F.Y. and J.G. designed the algorithm. F.Y. tested the example and wrote the manuscript. Z.X., B.Z., W.Z. and S.H. helped design the algorithm and debug the code.

Funding

This work was supported in part by the National Natural Science Foundation of China (51379160) and the State Grid Science and Technology Program of China.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China (Grant No. 51379160) and the State Grid Science and Technology Program of China.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

DGA	dissolved gas analysis
CRO	chemical reaction optimization
FCM	fuzzy c-means clustering
TWSVM	twin support vector machines
QPP	quadratic programming problem
PE	potential energy
KE	kinetic energy
K	kernel function
DAG	directed acyclic graph
PBT	partial binary tree
BT	binary tree
ω’	new molecule after structural change
w	normal vector
b	offset of the optimum hyperplane
PopSize	the initial number of molecules in the container
KELossRate	the upper limit of KE loss due to wall-collision reaction
MoleColl	the factor determining the type of molecular reaction
α	the factor determining the type of monomolecular reaction
β	the factor determining the type of multimolecular reaction
ω	original molecule

References

1. Abu-Siada, A.; Hmood, S. A new fuzzy logic approach to identify power transformer criticality using dissolved gas-in-oil analysis. Int. J. Electr. Power Energy Syst. 2015, 67, 401–408.
2. Liu, C.; Lin, T.; Yao, L.; Wang, S. Integrated power transformer diagnosis using hybrid fuzzy dissolved gas analysis. IEEJ Trans. Electr. Electron. Eng. 2015, 10, 689–698.
3. Wang, X.; Li, Q.; Yang, R.; Li, C.; Zhang, Y. Diagnosis of solid insulation deterioration for power transformers with dissolved gas analysis-based time series correlation. IET Sci. Meas. Technol. 2015, 9, 393–399.
4. Huang, Y.C.; Sun, H.C. Dissolved Gas Analysis of Mineral Oil for Power Transformer Fault Diagnosis Using Fuzzy Logic. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 974–981.
5. Yang, M.T.; Hu, L.S. Intelligent Fault Types Diagnostic System for Dissolved Gas Analysis of Oil-immersed Power Transformer. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 2317–2324.
6. Zhang, Y.; Ding, X.; Liu, Y.; Griffin, P.J. An artificial neural network approach to transformer fault diagnosis. IEEE Trans. Power Deliv. 1996, 11, 1836–1841.
7. Illias, H.A.; Chai, X.R.; Bakar, A.H.A. Hybrid modified evolutionary particle swarm optimisation-time varying acceleration coefficient-artificial neural network for power transformer fault diagnosis. Measurement 2016, 90, 94–102.
8. Souahlia, S.; Bacha, K.; Chaari, A. SVM-based decision for power transformers fault diagnosis using Rogers and Doernenburg ratios DGA. In Proceedings of the 10th International Multi-Conferences on Systems, Signals & Devices 2013 (SSD13), Hammamet, Tunisia, 18–21 March 2013; pp. 1–6.
9. Bacha, K.; Souahlia, S.; Gossa, M. Power transformer fault diagnosis based on dissolved gas analysis by support vector machine. Electr. Power Syst. Res. 2012, 83, 73–79.
10. Jayadeva; Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. 2007, 29, 905–910.
11. Qi, Z.; Tian, Y.; Shi, Y. Robust twin support vector machine for pattern classification. Pattern Recognit. 2013, 46, 305–316.
12. Luo, S.; Cheng, J.; HungLinh, A. Application of LCD-SVD technique and CRO-SVM method to fault diagnosis for roller bearing. Shock Vib. 2015, 2015, 847802.
13. Colorado, D.; Hernández, J.A.; Rivera, W.; Martinez, H.; Juárez, D. Optimal operation conditions for a single-stage heat transformer by means of an artificial neural network inverse. Appl. Energy 2011, 88, 1281–1290.
14. Bhalla, D.; Bansal, R.K.; Gupta, H.O. Function analysis based rule extraction from artificial neural networks for transformer incipient fault diagnosis. Int. J. Electr. Power Energy Syst. 2012, 43, 1196–1203.
15. Guardado, J.L.; Naredo, J.L.; Moreno, P.; Fuerte, C.R. A comparative study of neural network efficiency in power transformers diagnosis using dissolved gas analysis. IEEE Trans. Power Deliv. 2001, 16, 643–647.
16. Mohamed, E.; Abdelaziz, A.; Mostafa, A. A neural network-based scheme for fault diagnosis of power transformers. Electr. Power Syst. Res. 2005, 75, 29–39.
17. Wang, X.; Wang, T.; Wang, B. Hybrid PSO-BP based probabilistic neural network for power transformer fault diagnosis. In Proceedings of the 2008 Second International Symposium on Intelligent Information Technology Application, Shanghai, China, 20–22 December 2008; Volume 1, pp. 545–549.
18. Huang, Y.C.; Yang, H.T.; Huang, C.L. Developing a new transformer fault diagnosis system through evolutionary fuzzy logic. IEEE Trans. Power Deliv. 1997, 12, 761–767.
19. Youssef, O.A.S. Applications of fuzzy-logic-wavelet-based techniques for transformers inrush currents identification and power systems faults classification. In Proceedings of the IEEE PES Power Systems Conference and Exposition, New York, NY, USA, 10–13 October 2004; Volume 1, pp. 553–559.
20. Li, J.P.; Chen, X.J.; Wu, C.M. Application of comprehensive relational grade theory in expert system of transformer fault diagnosis. In Proceedings of the IEEE International Workshop on Intelligent Systems and Applications, Wuhan, China, 23–24 May 2009; pp. 1–4.
21. Tang, W.H.; Goulermas, J.Y.; Wu, Q.H.; Richardson, Z.J.; Fitch, J. A probabilistic classifier for transformer dissolved gas analysis with a particle swarm optimizer. IEEE Trans. Power Deliv. 2008, 23, 751–759.
22. Ghoneim, S.S.M.; Taha, I.B.M. A new approach of DGA interpretation technique for transformer fault diagnosis. Int. J. Electr. Power Energy Syst. 2016, 81, 265–274.
23. Rigatos, G.; Siano, P. Power transformers’ condition monitoring using neural modelling and the local statistical approach to fault diagnosis. Int. J. Electr. Power Energy Syst. 2016, 80, 150–159.
24. Prasanth Babu, B.; Surya Kalavathi, M.; Singh, B.P. Use of wavelet and neural network (BPFN) for transformer fault diagnosis. In Proceedings of the 2006 IEEE Conference on Electrical Insulation and Dielectric Phenomena, Kansas City, MO, USA, 15–18 October 2006; pp. 93–96.
25. Shah, A.M.; Bhalja, B.R. Fault discrimination scheme for power transformer using random forest technique. IET Gener. Transm. Distrib. 2015, 10, 1431–1439.
26. Shah, A.M.; Bhalja, B.R. Discrimination between internal faults and other disturbances in transformer using the support vector machine-based protection scheme. IEEE Trans. Power Deliv. 2013, 28, 1508–1515.
27. Zhang, Y.Y.; Liu, J.F.; Zheng, H.B.; Wei, H.; Liao, R.J. Study on quantitative correlations between the ageing condition of transformer cellulose insulation and the large time constant obtained from the extended Debye model. Energies 2017, 10, 1842.
28. Wu, L.Z.; Zhu, Y.L.; Yuan, J.S. Novel method for transformer faults integrated diagnosis based on Bayesian network classifier. Trans. China Electrotech. Soc. 2005, 20, 45–51.
29. Pandya, A.A.; Parekh, B.R. Interpretation of sweep frequency response analysis (SFRA) traces for the open circuit and short circuit winding fault damages of the power transformer. Int. J. Electr. Power Energy Syst. 2014, 62, 890–896.
30. Souahlia, S.; Bacha, K.; Chaari, A. MLP neural network-based decision for power transformers fault diagnosis using an improved combination of Rogers and Doernenburg ratios DGA. Int. J. Electr. Power Energy Syst. 2012, 43, 1346–1353.
31. Lu, C.; Wang, Z.; Zhou, B. Intelligent fault diagnosis of rolling bearing using hierarchical convolutional network based health state classification. Adv. Eng. Inform. 2017, 32, 139–151.
32. Iosifidis, A.; Gabbouj, M. Multi-class support vector machine classifiers using intrinsic and penalty graphs. Pattern Recognit. 2016, 55, 231–246.
33. Tomar, D.; Agarwal, S. Multi-class Twin Support Vector Machine for Pattern Classification. In Smart Innovation Systems and Technologies; Nagar, A., Mohapatra, D.P., Chaki, N., Eds.; Springer: Berlin, Germany, 2016; Volume 43, pp. 97–110.
34. Ding, M.; Yang, D.; Li, X. Fault diagnosis for wireless sensor by twin support vector machine. Math. Probl. Eng. 2013.
35. Liu, N.; Tan, K.X.; Gao, W.S. Fault diagnosis method for power transformers based on patterns of dissolved gases in the oil. J. Tsinghua Univ. 2003, 43, 301–303.
36. Ding, S.; Zhang, X.; Yu, J. Twin support vector machines based on fruit fly optimization algorithm. Int. J. Mach. Learn. Cybern. 2016, 7, 193–203.
37. Zhang, X.; Ding, S.; Sun, T. Multi-class LSTMSVM based on optimal directed acyclic graph and shuffled frog leaping algorithm. Int. J. Mach. Learn. Cybern. 2016, 7, 241–251.
38. Li, J.; Zhang, Q.; Wang, K.; Wang, J.; Zhou, T.; Zhang, Y. Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1198–1206.
39. Smolensky, P. Information Processing in Dynamical Systems: Foundations of Harmony Theory, Volume 1 of Parallel Distributed Processing; MIT Press: Cambridge, MA, USA, 1986.
40. Mousas, C.; Anagnostopoulos, C.-N. Learning Motion Features for Example-Based Finger Motion Estimation for Virtual Characters. 3D Res. 2017, 8, 25.
41. Bechikh, S.; Chaabani, A.; Said, L.B. An efficient chemical reaction optimization algorithm for multiobjective optimization. IEEE Trans. Cybern. 2015, 45, 2051–2064.
42. Eldos, T.; Khreishah, A. Maximally distant codes allocation using chemical reaction optimization with enhanced exploration. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 235–243.
43. Lam, A.Y.S.; Li, V.O.K. Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans. Evol. Comput. 2010, 14, 381–399.
44. Nayak, J.; Naik, B.; Behera, H.S. A novel chemical reaction optimization based higher order neural network (CRO-HONN) for nonlinear classification. Ain Shams Eng. J. 2015, 6, 1069–1091.
45. Alatas, B. ACROA: Artificial chemical reaction optimization algorithm for global optimization. Expert Syst. Appl. 2011, 38, 13170–13180.
46. Lam, A.Y.; Li, V.O.; James, J.Q. Real-coded chemical reaction optimization. IEEE Trans. Evol. Comput. 2012, 16, 339–353.
47. Rao, P.S.; Banka, H. Novel chemical reaction optimization based unequal clustering and routing algorithms for wireless sensor networks. Wirel. Netw. 2017, 23, 759–778.
48. Liu, Z.; Song, B.; Li, E.; Mao, Y.; Wang, G. Study of “code absence” in the IEC three-ratio method of dissolved gas analysis. IEEE Electr. Insul. Mag. 2015, 31, 6–12.
49. Cheng, L.; Yu, T. Dissolved Gas Analysis Principle-Based Intelligent Approaches to Fault Diagnosis and Decision Making for Large Oil-Immersed Power Transformers: A Survey. Energies 2018, 11, 913.
50. Chen, W.; Chen, X.; Peng, S.; Li, J. Canonical Correlation between Partial Discharges and Gas Formation in Transformer Oil Paper Insulation. Energies 2012, 5, 1081–1097.

Figure 1. Multiple classification structure. (left) Complete and (right) partial BT-SVM.

Figure 2. CRO-TWSVM design and training flowchart.

Figure 3. TWSVM neural network design and training flowchart.

Table 1. Fusion parameter model of DGA.

Serial Number	Gas Composition	Fault Identification
1	H₂	Normal and abnormal
1	C₂H₂	Normal and abnormal
2	C₂H₂/C₂H₄	Thermal fault or discharge fault
2	CH₄/H₂	Thermal fault or discharge fault
2	C₂H₂/TCG	Thermal fault or discharge fault
3	C₂H₄/C₂H₆	Low-, medium-, and high-temperature thermal fault
3	C₂H₄/CH₄	Low-, medium-, and high-temperature thermal fault
4	C₂H₂/C₂H₄	Medium- or high-temperature thermal fault
4	C₂H₄/C₂H₆	Medium- or high-temperature thermal fault
5	C₂H₂/C₂H₄	Partial discharge or low- and high-energy discharge
5	C₂H₄/CH₄	Partial discharge or low- and high-energy discharge
6	C₂H₄/C₂H₆	Low-energy or high-energy discharge

Table 2. Training sample distribution table.

Fault Code	Training Sample (N)	Testing Sample (N)
NF	69	23
PD	28	17
D1	57	12
D2	51	21
T1	55	17
T2	58	10
T3	27	12
Total	345	112

Table 3. Training parameters of CRO-TWSVM fault diagnosis model.

Training Parameter	Value
Number of input parameters	7
Number of classifiers	6
Range of c1	0.1–100
Range of c2	0.1–100
Range of q	0.1–30
Kernel function	RBF
Epochs	4000
Goal of MSE	0.05
Initial number of molecules (PopSize)	80
Upper limit of KE loss (KELossRate)	0.3
MoleColl	0.3
α	500
β	15
Iteration	7000
k-fold cross-validation	5
Number of hidden units	250

Table 4. Training results of CRO model with 5-fold cross-validation.

Model	Training Sample (N)	Test Sample (N)	Classification Accuracy (%)	C₁ (CRO)	C₂ (CRO)	q (CRO)	Average Time (ms)
TWSVM1	345	112	98.21	47.23	2.67	8.21	5121
TWSVM2	276	89	98.88	59.12	16.43	21.66	4581
TWSVM3	140	39	97.43	22.36	46.92	3.09	3025
TWSVM4	136	50	94.00	57.13	8.54	5.47	3247
TWSVM5	85	22	90.91	2.76	10.68	18.73	1988
TWSVM6	108	33	96.97	71.55	12.08	10.02	2267

Table 5. Ten random tests of CRO-TWSVM with 5-fold cross-validation.

Test	Samples (N)	Training Samples (N)	Training Accuracy (%)	No. Test Samples	Testing Accuracy (%)
Random seed1	427	320	95.86%	107	94.61%
Random seed2	368	276	94.57%	92	94.57%
Random seed3	521	391	96.74%	130	95.97%
Random seed4	312	234	90.60%	78	88.46%
Random seed5	348	261	94.25%	87	94.25%
Random seed6	667	500	96.15%	167	95.95%
Random seed7	295	221	87.68%	74	85.42%
Random seed8	241	181	87.41%	60	87.97%
Random seed9	369	277	95.75%	92	89.97%
Random seed10	335	251	93.53%	84	94.33%

Table 6. Comparison of 10 random tests.

Model	Average Testing Accuracy (%)	Upper Limit (%)	Lower Limit (%)	SD
PSO	90.84	93.8	87.88	2.96
SVM	88.71	92	85.42	3.29
BPNN	86.24	89.98	82.5	3.74
RF	86.83	90.18	83.48	3.35
GBDT	89.17	92.33	86.01	3.16
Without pre-processing	85.23	89.42	81.04	4.19
This paper	92.15	94.88	89.42	2.73

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

