Article

Generative Adversarial Network-Based Scheme for Diagnosing Faults in Cyber-Physical Power Systems

1 Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON N9B 3P4, Canada
2 School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
3 Center for Data Science, Coventry University, Coventry CV1 5FB, UK
* Author to whom correspondence should be addressed.
Sensors 2021, 21(15), 5173; https://doi.org/10.3390/s21155173
Submission received: 30 June 2021 / Revised: 25 July 2021 / Accepted: 27 July 2021 / Published: 30 July 2021
(This article belongs to the Special Issue Artificial Intelligence for Fault Diagnostics and Prognostics)

Abstract:
This paper presents a novel diagnostic framework for distributed power systems based on generative adversarial networks that generate artificial knockoffs in the power grid. The proposed framework uses the raw voltage, frequency, and phase-angle measurements collected from each bus in the cyber-physical power system. The collected measurements are first fed into a feature selection module, where multiple state-of-the-art techniques are used to extract the most informative features from the initial set of available features. The selected features are then passed to a knockoff generation module, where generative adversarial networks generate the corresponding knockoffs of the selected features. The generated knockoffs are finally fed into a classification module, in which two different classification models perform the fault diagnosis. Multiple experiments are designed to investigate the effect of noise, fault resistance value, and sampling rate on the performance of the proposed framework. Its effectiveness is validated through a comprehensive study on the IEEE 118-bus system.

1. Introduction

Complex cyber-physical power systems contain numerous elements such as generation units, bus bars, transmission lines, and loads, which are protected by circuit breakers and protective relays. When a fault occurs in an element of the system, the protective devices can generate a large number of alarms, which are sent to the supervisory control and data acquisition (SCADA) system [1]. However, it is often difficult to manage the received alarms within the SCADA system for the purpose of fault diagnosis in cyber-physical power systems. Furthermore, the large amount of measurement data collected by sparse measuring devices such as phasor measurement units (PMUs) in SCADA systems makes the successful implementation of diagnostic frameworks for cyber-physical power systems even more challenging [2]. Therefore, it is of paramount importance to develop an efficient diagnostic method that can cope with such large amounts of data [3].
Data-driven methods have been widely used for fault diagnosis [4]. One of the major challenges in the design of data-driven diagnostic frameworks is the extraction of the most informative features from the numerous features collected by the SCADA system [5]. This issue can be addressed by resorting to feature selection techniques [6]. Feature selection can be defined as the process of selecting an informative and relevant subset of the original features [7]. These techniques are generally divided into three major categories: filters, wrappers, and embedded techniques [8]. Filters make use of measures of mutual information, distance, dependency, and consistency in order to extract a set of reliable features [9]. Wrappers, in contrast, are built around a classification model and take the classification accuracy as the criterion for extracting the subset of features that leads to the best accuracy [10]. In embedded techniques, the learning and selection processes are combined and, in contrast to filter or wrapper techniques, cannot be performed separately [11]. Beyond feature selection, developing a diagnostic system that is independent of the type of the data distribution is also important for generalization purposes. To this end, generative adversarial networks (GANs) have been extensively used in the design of data-driven diagnostic frameworks due to their capability of recovering the true data distribution from random distributions [12].
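As a toy illustration of the filter category described above, the sketch below (not from the paper; the function name is ours) ranks features by absolute Pearson correlation with the label, one of the simplest filter criteria:

```python
import numpy as np

def filter_select(X, y, k):
    """Filter-style feature selection: rank features by absolute
    Pearson correlation with the label and keep the top k indices."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.maximum(denom, 1e-12)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)   # feature 2 drives the label
print(filter_select(X, y, 2))                    # feature 2 should rank first
```

Wrappers and embedded techniques differ only in where this scoring happens: a wrapper would replace `scores` with cross-validated classifier accuracy, and an embedded method would read the ranking off a trained model's own coefficients.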
GANs are known for their capability in dealing with unbalanced classification by generating realistic-looking data samples. They have also found application in data augmentation based on artificial data that are very similar to the original samples. Thanks to these advantages, GANs have been widely used in many applications, including fault diagnosis [13]. For instance, an auxiliary classifier GAN (ACGAN) has been proposed in [14] to construct an augmentation mechanism for fault diagnosis. A deep GAN model has been proposed in [15] to deal with imbalanced fault-diagnosis data. In [16], the authors have used infrared thermography and infrared image processing in order to generate a useful set of features to evaluate the working condition of angle grinders through a method called BCAoMID-F. The implementation of GANs in self-supervised [17], semi-supervised [18], and unsupervised [12] fault diagnosis schemes has also been extensively studied. In [19], the authors present a framework to generate knockoffs, which are random variables that mimic the correlation structure within the existing set of variables in a way that provides a mechanism for the accurate control of the false discovery rate. Even though this technique shows encouraging results, it only applies to Gaussian distributions. Thanks to the promising results of GANs, this work has recently been extended to a method called KnockoffGAN (kGAN), in which the knockoffs are generated without any assumption on the distribution of variables [20]. It has also been proposed that, by resorting to statistical measures based on the coefficients obtained through Lasso regression, one can perform the feature selection task based on the generated knockoffs. However, a poorly chosen statistic can yield unreliable results.
Therefore, this work aims to propose a diagnostic model based on knockoffs that benefits from their advantages while ensuring a reliable feature selection for fault diagnosis.
This work puts forward a novel framework for fault diagnosis in cyber-physical power systems. In contrast to model-based techniques that rely on an explicit model of the system [21], the proposed framework is data-driven and benefits from the knockoffs generated by means of the kGAN. We consider three different modules in the proposed framework to deal with the large amount of data collected from different spots in the system's network. Specifically, it is proposed to collect voltage, frequency, and phase-angle features from each bus in the system. After normalization, a feature selection module constructs a subset of the most informative features from the original set. In this regard, we make use of well-known techniques including infinite feature selection (InfFS) [22], mutual information feature selection (MutInfFS) [23], minimum redundancy maximum relevance feature selection (mRMR) [24], and relief feature selection (Relief) [25]. The kGAN technique is then run in the next module of the proposed diagnostic framework, where the selected features from the previous module serve as the basis for the generation of knockoffs. The generated knockoffs of the selected features are then fed into the classification module, where two different classification models, k-nearest neighbour (kNN) and support vector machine (SVM), are used to diagnose different types of faults. The main contribution of this work lies in the design of a novel framework that involves multiple modules including feature selection, kGAN, and classification. This design benefits the most from the advantages of generative adversarial networks in the extraction of knockoffs. The generated knockoffs are free of any assumption on the data distribution and can be generated in a way that controls the false discovery rate of the selected variables.
Compared with the scenario in which no knockoffs are generated from the selected features, the attained results demonstrate the superiority of the proposed framework for classification tasks. We implement three different types of faults on the IEEE 118-bus system and investigate the effect of noise, fault resistance value, and sampling rate on the performance of the proposed framework through a comprehensive analysis of the attained results.
The rest of this paper is organized as follows. We review the literature on GANs in Section 2. The generation of knockoffs is presented in Section 3. Simulation results and their analysis are presented in Section 4, and concluding remarks are given in Section 5.

2. Literature Review

It is well established that GANs consist of two models called the generator and the discriminator, which are typically implemented by neural networks. The generator aims to learn the distribution of true examples in order to generate new data samples. The discriminator, in turn, aims to distinguish the data examples produced by the generator from the true data examples [26]. GANs are built on generative algorithms, one of the two main categories of machine learning algorithms alongside discriminative algorithms. Generative algorithms make use of a fully probabilistic model of the observed data and can be divided into two classes: explicit density models and implicit density models. The former are based on a distribution of the data and are trained either on true examples of the distribution or by fitting the distribution parameters. Techniques based on maximum likelihood estimation, approximate inference [27], and Markov chains [28] are used in training explicit models. Implicit models, however, do not rely on direct estimation or fitting of the distribution parameters. Without any explicit hypothesis, these models generate data samples from a distribution in order to modify the existing model. Training is typically based on ancestral sampling [29].
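The two-player game described above can be summarized by the original GAN value function V(G, S) = E[log S(x)] + E[log(1 − S(G(z)))], which the discriminator maximizes and the generator minimizes. A minimal numpy sketch (illustrative only, with hand-picked toy discriminator outputs) evaluates this quantity:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN value function
    V = E[log D(x)] + E[log(1 - D(G(z)))], given the
    discriminator's outputs on real and generated batches."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Toy illustration: a discriminator that separates the batches well
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples (near 1)
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on generated samples (near 0)
print(gan_value(d_real, d_fake))
```

At the theoretical equilibrium, where the generator matches the data distribution and the discriminator outputs 1/2 everywhere, this value equals −2 log 2 ≈ −1.386.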
In this regard, different representative variants of GANs have recently been developed for different applications. For instance, InfoGAN [30], in contrast to the typical GAN that makes use of a single unstructured noise signal, decomposes the noise signal into two parts and derives a lower bound of the mutual information objective for efficient optimization. Some variants of InfoGAN, including causal InfoGAN [31] and semi-supervised InfoGAN (ss-InfoGAN) [32], have recently been developed. GANs have also been extended to the case in which extra conditions are imposed on the generator and discriminator models. This model is called the conditional GAN (cGAN) [33] and can generate data samples conditioned on class labels [34,35]. For image-to-image translation tasks, where the aim is to learn a mapping from an input image to an output image, cycle-consistent GANs (CycleGAN) have been developed to deal with the issue of unpaired data samples [36]. DualGAN [37] has the same structure as the CycleGAN; however, its loss function is based on the Wasserstein GAN (WGAN) [38]. In contrast to the original GAN, in which the discriminator performs a binary classification task, the discriminator in the WGAN performs a regression task in order to estimate the Wasserstein distance. This idea, however, requires the discriminator to be K-Lipschitz. In [39], a method called Wasserstein divergence (W-div) is proposed to relax the WGAN Lipschitz constraints, and it is used in WGAN-div to approximate the W-div via an optimization scheme. Like the WGAN, the loss-sensitive GAN (LS-GAN) also has Lipschitz constraints, where the given distribution is assumed to belong to a set of Lipschitz densities with compact support [40]. These GAN variants are trained with different training structures.
The original GAN is developed based on the multilayer perceptron (MLP). Specifically, the generator and discriminator are MLP models, which can only be used for small-sized datasets and generalize poorly to complex images [41]. The Laplacian GAN (LAPGAN) [42] has been proposed for higher-resolution images and makes use of a cascade of convolutional neural networks (CNNs) in a Laplacian pyramid framework. Within the general GAN framework, SinGAN [43] and InGAN [44] have also been proposed to learn a generative model from a single natural image. The next structure is the deep convolutional GAN (DCGAN), which, in contrast to the original GAN built on MLP models, is based on deep convolutional neural networks (DCNNs) [45]. The progressive GAN (PGGAN) [46] is another category of GAN models, in which the generator and discriminator models are grown progressively. The self-attention GAN (SAGAN) [47] is another structure, which utilizes spectral normalization for the generator and discriminator models so as to improve the training dynamics. BigGAN [48] is a recently developed structure similar to the SAGAN, but more scalable. Furthermore, StyleGAN [49] is known for its high-quality generator model for face image generation. Other structures based on autoencoders [50], encoders [51], multi-discriminator learning [52], multi-generator learning [53], and multi-GAN learning [54] have also recently been developed for GAN models.

3. Knockoff Generation

The general framework of the proposed method is illustrated in Figure 1. As can be observed from this figure, the proposed framework contains multiple modules including data acquisition, feature selection, kGAN, and decision making. We make use of sparse data measuring devices in order to collect voltage, frequency, and phase-angle measurements from each bus in the distributed power system. The collected data measurements are then fed into a feature selection module, in which multiple state-of-the-art techniques including InfFS, Relief, MutInfFS, and mRMR are implemented in order to extract the most informative features from the original set of features. The extracted features are then fed into the kGAN module, which takes the selected features as input and outputs a corresponding set of random variables called knockoffs. The generated knockoffs are finally fed into the decision-making module, where the kNN and SVM classification models are used in order to diagnose different types of faults.
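The module chain just described can be sketched as a pipeline skeleton. The three callables below are hypothetical stand-ins (our own, not the paper's code) for the feature selector, the kGAN knockoff generator, and the classifier; the array shapes merely mirror the 118-bus setting of Section 4:

```python
import numpy as np

def diagnose(raw, labels, select_features, generate_knockoffs, classifier):
    """Skeleton of the three-module pipeline: normalization,
    feature selection, knockoff generation, and decision making."""
    X = (raw - raw.mean(axis=0)) / (raw.std(axis=0) + 1e-12)  # normalization
    idx = select_features(X, labels)           # feature selection module
    knockoffs = generate_knockoffs(X[:, idx])  # kGAN module
    return classifier(knockoffs, labels)       # decision-making module

# Stand-in components, for illustration only
raw = np.random.default_rng(1).normal(size=(50, 354))  # 118 buses x 3 features
y = np.random.default_rng(2).integers(0, 70, size=50)  # 70 operational states
out = diagnose(raw, y,
               select_features=lambda X, y: np.arange(10),   # keep 10 features
               generate_knockoffs=lambda X: X + 0.01,        # dummy knockoffs
               classifier=lambda X, y: X.shape)              # report shape
print(out)  # (50, 10)
```

In the actual framework, the three lambdas would be replaced by one of the InfFS/MutInfFS/mRMR/Relief selectors, the trained kGAN generator, and a kNN or SVM model, respectively.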
Assume that the set of features is denoted by $D$ with dimension $d$. Suppose that the set of labels is denoted by $C$, and that $D = \{D_1, \dots, D_d\}$ and $C$ are random variables. Then, the concept of a null variable can be defined as follows [55].
Definition 1.
A variable $D_j$ is null if and only if $C$ is independent of $D_j$ conditional on $\{D_i : i \neq j\}$.
The set of all null variables is denoted by $K$. In order to select the set of most informative features while controlling the false discovery rate, suppose that the set of selected features is denoted by $\hat{X} \subseteq \{1, \dots, d\}$. The false discovery rate can then be defined as follows:
$$ \mathrm{FDR} = \mathbb{E}\left[ \frac{|\hat{X} \cap K|}{|\hat{X}|} \right]. $$
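The ratio inside the expectation, the false discovery proportion of a single selection, can be computed directly; a minimal sketch (the function name is ours):

```python
def empirical_fdr(selected, nulls):
    """False discovery proportion |selected ∩ nulls| / |selected| of one
    selection; the FDR is the expectation of this ratio over repeated
    selections."""
    selected, nulls = set(selected), set(nulls)
    if not selected:
        return 0.0  # convention: an empty selection makes no false discovery
    return len(selected & nulls) / len(selected)

# 5 selected features, 2 of which happen to be null -> proportion 0.4
print(empirical_fdr([1, 4, 7, 9, 12], [4, 9, 20, 33]))  # 0.4
```

In practice the null set $K$ is unknown; the point of the knockoff machinery below is to control this quantity without ever observing $K$.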
Based on the given notation, the definition of knockoffs can be given as follows [55].
Definition 2.
A knockoff for the variable $D$ is a random variable denoted by $\tilde{D}$ that satisfies the following constraints:
$$ (D, \tilde{D})_{\mathrm{swap}(X)} \overset{d}{=} (D, \tilde{D}), $$
$$ \tilde{D} \perp C \mid D, $$
where $X \subseteq \{1, \dots, d\}$, $(\cdot, \cdot)_{\mathrm{swap}(X)}$ denotes the vector obtained by swapping the $i$th component with the $(i+d)$th component for every $i \in X$, and $\overset{d}{=}$ denotes equality in distribution.
In order to make use of the generated knockoffs for feature selection, it is required to define a feature statistic $F_j$ that depends only on $D$, $\tilde{D}$, and $C$. This statistic is defined as $F_j = f_j([D, \tilde{D}], C)$ for a real-valued function $f_j$, which is required to satisfy the following flip-sign constraint:
$$ f_j\big([D, \tilde{D}]_{\mathrm{swap}(X)}, C\big) = \begin{cases} -f_j\big([D, \tilde{D}], C\big), & j \in X, \\ \phantom{-}f_j\big([D, \tilde{D}], C\big), & \text{otherwise.} \end{cases} $$
One way to construct such a statistic is to use the Lasso coefficients obtained by regressing on the augmented feature–knockoff set. Denoting the Lasso coefficients by $w_1, \dots, w_{2d}$, the Lasso coefficient difference is defined as follows:
$$ F_j = |w_j| - |w_{j+d}|. $$
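Given a fitted coefficient vector $w_1, \dots, w_{2d}$ (supplied by hand below rather than by an actual Lasso fit), the statistic is a one-liner:

```python
import numpy as np

def lasso_coefficient_difference(w):
    """Given Lasso coefficients w_1..w_2d fitted on the augmented
    [features, knockoffs] design, return F_j = |w_j| - |w_{j+d}|.
    A large positive F_j means the real feature beats its knockoff."""
    w = np.asarray(w, dtype=float)
    d = w.size // 2
    return np.abs(w[:d]) - np.abs(w[d:])

# d = 3: feature 1 clearly informative, feature 3 dominated by its knockoff
print(lasso_coefficient_difference([2.0, 0.1, 0.0, 0.2, 0.1, 0.5]))
```

The flip-sign property holds by construction here: swapping feature $j$ with its knockoff exchanges $w_j$ and $w_{j+d}$, which negates $F_j$ and leaves all other statistics unchanged.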
Then, based on the given statistic and the definition of knockoffs, the following theorem can be given for the sake of feature selection [55].
Theorem 1.
Suppose that $q \in [0, 1]$. Given the statistics $F_1, \dots, F_d$, define:
$$ \tau = \min\left\{ t > 0 : \frac{1 + |\{j : F_j \leq -t\}|}{\max\big(1, |\{j : F_j \geq t\}|\big)} \leq q \right\}. $$
Then, selecting the variables $\hat{X} = \{j : F_j \geq \tau\}$ controls the false discovery rate at level $q$.
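The threshold of Theorem 1 (written here in its knockoff+ form, with the $\max(1, \cdot)$ guard in the denominator) can be sketched as follows; the statistics are made up for illustration:

```python
import numpy as np

def knockoff_threshold(F, q):
    """Smallest t > 0 such that
    (1 + #{j: F_j <= -t}) / max(1, #{j: F_j >= t}) <= q;
    returns inf if no such t exists (then nothing is selected)."""
    F = np.asarray(F, dtype=float)
    for t in np.sort(np.abs(F[F != 0])):      # candidate thresholds
        ratio = (1 + np.sum(F <= -t)) / max(1, np.sum(F >= t))
        if ratio <= q:
            return t
    return np.inf

F = np.array([3.0, 2.5, 2.0, 1.5, -0.5, 0.2, -0.1, 1.0])
tau = knockoff_threshold(F, q=0.2)
selected = np.where(F >= tau)[0]
print(tau, selected)  # tau = 1.0, features 0, 1, 2, 3, 7 selected
```

Negative statistics act as an internal estimate of false discoveries: the more features whose knockoff beats them at a given level, the higher the threshold must rise before selection is allowed.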
In order to satisfy the constraints given in Definition 2, a modified GAN model, called the kGAN, is used to generate knockoffs without any assumption on the distribution of the data. The kGAN module is illustrated in Figure 1.
As can be observed from this figure, the kGAN module contains a generator network, denoted by $G$, which is a function $G(\cdot, \cdot; \xi) : \mathcal{D} \times [0,1]^c \to \mathcal{D}$ with parameters $\xi$ that takes a random realization of $D$ and random noise $n \sim U([0,1]^c)$ as input and outputs the set of knockoffs $\tilde{D}$.
The discriminator network is designed to enforce the first constraint of Definition 2. In this regard, the discriminator is defined to have a loss that is minimized only for distributions satisfying that constraint. To this end, the discriminator is denoted by $S$, which is a function $S(\cdot, \cdot; \psi) : \mathcal{D} \times \mathcal{D} \to [0,1]^d$ that takes the swapped sample–knockoff pair $(D, \tilde{D})_{\mathrm{swap}(X)}$ as input and outputs a vector in $[0,1]^d$, whose $i$th component denotes the estimated probability that $i \in X$. The loss of the discriminator can then be given as follows:
$$ L_S = \sum_{X \in \{0,1\}^d} \mathbb{E}_{D} \, \mathbb{E}_{\tilde{D}} \Big[ X \cdot \log S\big((D, \tilde{D})_{\mathrm{swap}(X)}\big) + (1 - X) \cdot \log\Big(1 - S\big((D, \tilde{D})_{\mathrm{swap}(X)}\big)\Big) \Big], $$
where '$\cdot$' denotes the dot product. In order to deal with the computational complexity of this loss function, it is suggested to use stochastic gradient descent over minibatches of uniformly sampled swap sets $X$. Furthermore, a hint vector $T$ is introduced, which is a random variable passed to the discriminator. Constructing the hint vector involves sampling a multivariate Bernoulli random variable $B$ whose components take the value 1 with probability 0.9. Then, setting $T_i = X_i$ if $B_i = 1$ and $T_i = 0.5$ if $B_i = 0$, the discriminator aims to predict only the components of $X$ for which $B_i = 0$. The final loss of the discriminator is then of the following form:
$$ L_S = \sum_{X \in \{0,1\}^d} \mathbb{E}_{D} \Big[ \mathbb{E}_{\tilde{D}} \Big[ \mathbb{E}_{T} \Big[ \big(X \odot (1 - B)\big) \cdot \log S\big((D, \tilde{D})_{\mathrm{swap}(X)}, T\big) + \big((1 - X) \odot (1 - B)\big) \cdot \log\Big(1 - S\big((D, \tilde{D})_{\mathrm{swap}(X)}, T\big)\Big) \Big] \Big] \Big], $$
where ⊙ is the element-wise product.
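The hint-vector construction can be sketched directly (assuming, as the text suggests, that the hint reveals the swap indicator $X_i$ wherever $B_i = 1$ and is uninformative elsewhere):

```python
import numpy as np

def sample_hint(X, p=0.9, rng=None):
    """Hint vector T: reveal X_i with probability p (B_i = 1),
    otherwise pass the uninformative value 0.5, so the discriminator
    only has to predict the components of X where B_i = 0."""
    rng = rng or np.random.default_rng()
    B = (rng.random(X.shape) < p).astype(float)   # Bernoulli(p) mask
    T = np.where(B == 1.0, X, 0.5)
    return B, T

X = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])  # toy swap indicator
B, T = sample_hint(X, rng=np.random.default_rng(0))
print(B, T)
```

This mirrors the hint mechanism of GAIN-style training: leaking most of the target keeps the discriminator's task well-posed while still forcing it to learn the swap structure on the hidden coordinates.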
In order to make the discriminator training more stable, a WGAN-style regularization term with critic $f$ is added to the loss. The corresponding critic loss takes the following form:
$$ L_f = \mathbb{E}\big[ f(D) \big] - \mathbb{E}\big[ f(\tilde{D}) \big] - \eta \, \mathbb{E}\Big[ \big( \| \nabla_{\hat{D}} f(\hat{D}) \|_2 - 1 \big)^2 \Big], $$
where $\hat{D} = \epsilon D + (1 - \epsilon) \tilde{D}$ with $\epsilon \sim U[0,1]$, and $\eta$ is a parameter to be tuned.
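To see how the gradient penalty behaves, the sketch below specializes the critic loss to a linear critic $f(x) = w \cdot x$, whose gradient is $w$ at every interpolate $\hat{D}$, so the penalty can be computed analytically. This is a deliberate simplification for illustration; the actual critic is a neural network and the gradient is obtained by automatic differentiation:

```python
import numpy as np

def critic_loss_linear(D, D_tilde, w, eta):
    """WGAN-style critic loss E[f(D)] - E[f(D~)] - eta*(||grad f|| - 1)^2
    for a linear critic f(x) = w.x: its gradient at any interpolate
    eps*D + (1-eps)*D~ is simply w, so the penalty is eta*(||w|| - 1)^2."""
    loss = np.mean(D @ w) - np.mean(D_tilde @ w)
    penalty = eta * (np.linalg.norm(w) - 1.0) ** 2
    return loss - penalty

rng = np.random.default_rng(3)
D = rng.normal(size=(64, 4))
D_tilde = D + 0.1 * rng.normal(size=(64, 4))
w = np.array([0.6, 0.0, 0.8, 0.0])   # ||w|| = 1, so the penalty vanishes
print(critic_loss_linear(D, D_tilde, w, eta=10.0))
```

A unit-norm critic incurs no penalty, while any critic whose gradient norm drifts from 1 is pulled back toward the 1-Lipschitz regime that the Wasserstein estimate requires.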
Finally, in order to generate knockoffs that are as independent as possible of the original features, mutual information neural estimation (MINE) [56] is used to minimize the mutual information between the set of features and their corresponding knockoffs. In this regard, the mutual information between each feature–knockoff pair is estimated by means of $d$ neural networks, denoted by $N^1, \dots, N^d$, with parameter sets $\theta_1, \dots, \theta_d$. Weighted by a trade-off parameter $\lambda$, the following estimation loss is added to the loss of the generator:
$$ L_P = \sum_{j=1}^{d} \left[ \sum_{i=1}^{n} N^{j}_{\theta_j}\!\left(D_j^{(i)}, \tilde{D}_j^{(i)}\right) - \log\!\left( \sum_{i=1}^{n} \exp\!\left( N^{j}_{\theta_j}\!\left(D_j^{(k(i))}, \tilde{D}_j^{(i)}\right) \right) \right) \right], $$
in which $k$ is assumed to be a permutation of $[n]$ and the superscript $(i)$ denotes the $i$th sample. Based on the discussion in this section, the overall loss of the proposed method is defined as follows:
$$ \min_{G} \left[ \max_{S} (L_S) + \lambda \max_{P} (L_P) + \mu \max_{f} (L_f) \right], $$
where μ is a parameter to be tuned.
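The MINE-style term can be illustrated with a fixed stand-in statistics network. A real implementation would train $N^j_{\theta_j}$ jointly with the generator; here $N(a, b) = ab/2$ is hard-coded purely for demonstration, and the permutation $k$ is realized by shuffling one argument:

```python
import numpy as np

def mine_lower_bound(x, x_tilde, N=lambda a, b: 0.5 * a * b, rng=None):
    """Donsker-Varadhan style estimate used by MINE for one
    feature/knockoff pair: E[N(x, x~)] - log E[exp(N(x_perm, x~))],
    where x_perm breaks the pairing. N is a stand-in statistics net."""
    rng = rng or np.random.default_rng()
    joint = np.mean(N(x, x_tilde))
    x_perm = rng.permutation(x)                      # marginal samples
    marginal = np.log(np.mean(np.exp(N(x_perm, x_tilde))))
    return joint - marginal

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
dependent = x + 0.1 * rng.normal(size=2000)          # strongly coupled pair
independent = rng.normal(size=2000)                  # unrelated pair
print(mine_lower_bound(x, dependent, rng=rng),
      mine_lower_bound(x, independent, rng=rng))
```

The estimate is visibly larger for the dependent pair, which is exactly the signal the generator's loss penalizes: minimizing this bound pushes each knockoff toward independence from its own feature while the discriminator preserves the joint swap symmetry.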

4. Simulation Results

In this section, we first introduce the IEEE 118-bus power system, then discuss the types of faults and the generated datasets, and finally present the results of the proposed diagnostic framework.
As mentioned earlier, we aim to diagnose different types of faults on the IEEE 118-bus system. This system contains 118 buses, 91 loads, and 19 generation buses. In this work, we simulate three different types of faults on this system: load loss (LL), generator outage (G), and generator ground (GG). Together with the normal operational state of the system, there are four types of states to be diagnosed. As for the 'LL'/'G' faults, we disconnect the corresponding load/generation unit from its bus for a short period of time. As for the 'GG' faults, we simulate a three-phase short-circuit fault between the generation units and ground. We simulate 31 'LL' faults by disconnecting each of the corresponding loads from its bus. In the same way, 19 'G' faults and 19 'GG' faults are simulated. By adding the normal operational state of the system to the above-mentioned simulated faults, there are 70 classes of operational states to be diagnosed. For each class or operational state, we collect 500 samples, from the sample at which the fault is injected into either loads or generators to the sample at which the fault is cleared. Furthermore, voltage, frequency, and phase-angle features are collected from each bus of the system. In Figure 2, the voltage, frequency, and phase-angle measurements collected from the first bus of the system in the presence of an LL fault on bus #1 are illustrated. The fault is injected at t = 1 s and the simulation period is set to five seconds. As there are 118 buses in the system and three types of features are collected from each of them, there is a total of 354 features to be used in the construction of the datasets.
In order to study the effect of fault resistance (FR), signal-to-noise ratio (SNR), and sampling rate (SR) on the performance of the proposed diagnostic framework, 12 different datasets have been created. In this regard, two different SR values are considered, namely 20 kHz and 10 kHz. The FR values are set to 1 Ω and 10 Ω, and the SNR values are chosen as 50 dB, 40 dB, and 30 dB. By combining the FR, SNR, and SR values, 12 datasets $\{A_1, \dots, A_{12}\}$ are generated.
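The 12 combinations can be enumerated with a simple Cartesian product. The labeling below is our assumption, chosen to be consistent with the dataset groupings reported later in this section ($A_1$–$A_6$ share FR = 1 Ω, $\{A_1, A_4, A_7, A_{10}\}$ share SNR = 50 dB, and $\{A_1, A_2, A_3, A_7, A_8, A_9\}$ share SR = 10 kHz):

```python
from itertools import product

FR_ohm = [1, 10]        # fault resistance
SR_kHz = [10, 20]       # sampling rate
SNR_dB = [50, 40, 30]   # signal-to-noise ratio

# A1..A12: outermost factor varies slowest (FR), innermost fastest (SNR)
datasets = {f"A{i + 1}": dict(FR=fr, SR=sr, SNR=snr)
            for i, (fr, sr, snr) in enumerate(product(FR_ohm, SR_kHz, SNR_dB))}
print(len(datasets))      # 2 * 2 * 3 = 12
print(datasets["A1"])     # {'FR': 1, 'SR': 10, 'SNR': 50}
```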
Following the given description in Section 1, we consider two different scenarios and compare them with a baseline. Our baseline is the case, in which the raw data measurements are directly and without any processing fed into the classification models. Furthermore, in order to investigate the effectiveness of the proposed framework, we compare it with a scenario, in which the raw data measurements are firstly fed into the feature selection module, and, then, the selected features are directly fed into the classification module [6]. This is the first scenario (‘S#1’). In the second scenario, which is the proposed diagnostic framework in this study, we propose to generate the knockoffs of the selected features by means of the kGAN module, and, then, set these knockoffs as inputs to the classification models. Therefore, in the second scenario (‘S#2’), the raw data measurements are firstly fed into the feature selection module, where the selected features are further processed by the kGAN module and the generated knockoffs are fed into the classification models. As for the feature selection module, we resort to four well-known feature selection techniques including InfFS, MutInfFS, mRMR, and Relief. In the feature selection module, in order to find the best number of features to be selected, we start with two features and increase the number of features up to the value, for which no significant performance improvement can be observed for each classification model. The performance of each classification model has been reported based on the F-Measure.
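For multi-class diagnosis, the reported F-Measure can be read as the macro-averaged F1 over the operational-state classes; this is one reasonable reading, since the paper does not spell out the averaging. A minimal numpy sketch:

```python
import numpy as np

def macro_f_measure(y_true, y_pred, n_classes):
    """Macro-averaged F-Measure: per-class F1 = 2*TP / (2*TP + FP + FN),
    averaged uniformly over classes."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(macro_f_measure(y_true, y_pred, 3))
```

With 10-fold cross-validation, this metric would be computed once per held-out fold and averaged over the folds of each dataset, yielding the 120 values per experiment mentioned below.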
We start with the kNN classification model, whose results are illustrated in Figure 3. It is worth noting that each classification model is validated through 10-fold cross-validation. As there are 12 datasets and a 10-fold cross-validation is performed, there are 120 F-Measure values for each experiment. In Figure 3, we report the results for all datasets for the baseline and the aforementioned feature selection techniques w.r.t. scenarios 'S#1' and 'S#2'. As can be observed from this figure, both scenarios successfully improve the results compared with the baseline. However, the second scenario (our proposed method) outperforms the first scenario [6] in all experiments, regardless of the type of the feature selection technique. As for the first scenario, the attained results denote that InfFS leads to the highest average F-Measure value, followed by mRMR, Relief, and MutInfFS. As for the second scenario, InfFS, MutInfFS, and mRMR show almost the same performance in terms of the average F-Measure value, while Relief performs worst. Another point worth mentioning is that the results for the second scenario show lower variation in the attained F-Measure values than the first scenario in the case of InfFS, Relief, and MutInfFS, demonstrating its robustness in dealing with different datasets. We summarize the results of the kNN classification model in Table 1 w.r.t. each dataset in order to examine the effect of noise, fault resistance, and sampling rate on the performance of this classifier.
As mentioned earlier, in the first scenario, InfFS leads to the highest average F-Measure, 0.7633, followed by mRMR (0.7460), Relief (0.7224), and MutInfFS (0.6935). As for the second scenario, the best performance is achieved by mRMR (0.8057), followed by MutInfFS (0.8055), InfFS (0.8030), and Relief (0.7743). In order to check the effect of FR on the performance of the proposed technique, we take the results of datasets $\{A_1, \dots, A_6\}$, for which the FR is 1 Ω, and compare them with those of datasets $\{A_7, \dots, A_{12}\}$, for which the FR is 10 Ω. For datasets with FR = 1 Ω, the average F-Measure for the second scenario is 0.8362, whereas it is 0.8331 for datasets with FR = 10 Ω. In the same vein, the average F-Measure values for the first scenario are 0.7888 and 0.7718, respectively. Therefore, the attained results denote that the performance of the proposed method is not significantly affected by the value of FR. In order to check the effect of noise, we regroup the datasets into three groups: $\{A_1, A_4, A_7, A_{10}\}$ with SNR = 50 dB, $\{A_2, A_5, A_8, A_{11}\}$ with SNR = 40 dB, and $\{A_3, A_6, A_9, A_{12}\}$ with SNR = 30 dB. For the second scenario, the average F-Measure values for these three groups are 0.8291, 0.8190, and 0.7433, respectively. In the case of the first scenario, the attained results are 0.7762, 0.7247, and 0.6931, respectively. The attained results for both scenarios denote the superiority of the proposed method (the second scenario) in dealing with noisy data measurements compared with the first scenario. Finally, we check the effect of SR on the proposed method by regrouping the given datasets into two groups: $\{A_1, A_2, A_3, A_7, A_8, A_9\}$, for which the sampling rate is 10 kHz, and $\{A_4, A_5, A_6, A_{10}, A_{11}, A_{12}\}$, for which the sampling rate is 20 kHz. The attained F-Measure values for the first scenario are 0.7167 and 0.7459 w.r.t. the aforementioned groups of datasets, respectively. The average F-Measure values for the second scenario are 0.7958 and 0.7985, respectively. The attained results, on the one hand, denote the superiority of the second scenario in comparison with the first scenario in dealing with datasets with different SR values. On the other hand, there are no significant changes for the second scenario when the SR decreases from 20 kHz to 10 kHz, denoting its robustness against sampling rate issues.
We repeat the same experiments for the SVM classification model. The attained results are presented in Figure 4. As can be observed, the baseline shows large variation in F-Measure across different datasets. It achieves almost the same average F-Measure as MutInfFS in the first scenario, but a lower average F-Measure than the second scenario. Comparing the first and second scenarios, the attained results, on the one hand, show the superiority of the second scenario for each feature selection technique. On the other hand, the second scenario leads to lower variation in F-Measure values when the InfFS, Relief, and MutInfFS techniques are applied. The attained results for the SVM classification model are summarized in Table 2.
The results collected in Table 2 denote that mRMR leads to the best performance in both scenarios, followed by InfFS, Relief, and MutInfFS in the first scenario, and by InfFS, MutInfFS, and Relief in the second scenario. In order to investigate the effect of the FR value, we regroup the datasets as before, where the FR value is 1 Ω for the first group and 10 Ω for the second group. The average F-Measure value for the first group is 0.8365 for the first scenario, whereas it is 0.7840 for the second group of datasets. In the same vein, the attained results for the second scenario are 0.8909 and 0.8627, respectively. The results of this experiment, on the one hand, denote that the second scenario outperforms the first scenario. On the other hand, they verify the more robust performance of the second scenario against changes in the FR value. In order to check the effect of noise on the performance of the proposed scheme, as was done for the kNN classification model, we regroup the datasets into three groups w.r.t. SNR = 50 dB, SNR = 40 dB, and SNR = 30 dB. For the first scenario, the average F-Measure values for these groups are 0.8335, 0.8044, and 0.7928, respectively. As for the second scenario, the attained results are 0.9022, 0.8949, and 0.8331, respectively. As can be observed from the attained average F-Measure values for both scenarios, the second scenario outperforms the first in dealing with noisy measurements. Finally, we regroup the datasets into two groups based on the SR values, where SR = 10 kHz for the first group and SR = 20 kHz for the second group. The average F-Measure values for these two groups are 0.8048 and 0.8157, respectively, for the first scenario, whereas the average values for the second scenario are 0.8756 and 0.8779, respectively. The attained results verify the superiority of the second scenario over the first scenario.
Following the presented results for the kNN and SVM classification models, some general remarks can be made. Overall, the SVM classification model has outperformed kNN when considering both scenarios plus the baseline, with an average F-Measure of 0.8304 for the SVM versus 0.7508 for kNN. The results of the second scenario suggest that mRMR shows the best performance in dealing with the aforementioned datasets for both classifiers, and its pairing with the SVM classification model yields the best combination for fault diagnosis in the IEEE 118-bus system. Furthermore, the results of both classification models verify the superiority of the proposed technique over the baseline and the first scenario.
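Selecting the best classifier/feature-selection pairing from the "Avg." rows of Tables 1 and 2 (second scenario) can be expressed as a simple argmax over the reported averages; the dictionary below transcribes those rows directly:

```python
# Average F-Measure per (classifier, feature-selection) pair in the second
# scenario, taken from the "Avg." rows of Tables 1 and 2.
avg_fm = {
    ("kNN", "InfFS"): 0.8030, ("kNN", "Relief"): 0.7743,
    ("kNN", "MutInfFS"): 0.8055, ("kNN", "mRMR"): 0.8057,
    ("SVM", "InfFS"): 0.8755, ("SVM", "Relief"): 0.8569,
    ("SVM", "MutInfFS"): 0.8685, ("SVM", "mRMR"): 0.9061,
}

best = max(avg_fm, key=avg_fm.get)
print(best)  # → ('SVM', 'mRMR'), consistent with the remarks above
```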
The main advantage of the proposed diagnostic scheme is that it is data-driven and, therefore, requires no knowledge of the explicit model of the system. Further to this, we have proposed the use of the kGAN module to generate a set of informative features from the selected measurements; this module can generate such features regardless of the distribution of the data. Moreover, the proposed diagnostic scheme can be easily extended to involve semi-supervised and unsupervised feature selection techniques in order to benefit from their advantages. The main drawback of the proposed framework is that it operates offline and cannot be used in a real-time implementation.
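To make the three-module flow concrete, the skeleton below mirrors the pipeline structure only and is not the authors' implementation: a crude correlation ranking stands in for the feature selection module (InfFS/Relief/MutInfFS/mRMR), and a per-column permutation stands in for the kGAN knockoff generator, which in the paper is a trained generative adversarial network. The function names and toy data are illustrative assumptions:

```python
import random

def select_features(X, y, k):
    """Keep the top-k features by absolute correlation with the label.
    Stand-in for the paper's feature selection module."""
    my = sum(y) / len(y)
    def score(j):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        cov = sum((c - mu) * (t - my) for c, t in zip(col, y))
        norm = (sum((c - mu) ** 2 for c in col) * sum((t - my) ** 2 for t in y)) ** 0.5
        return abs(cov / norm) if norm else 0.0
    keep = sorted(range(len(X[0])), key=score, reverse=True)[:k]
    return [[row[j] for j in keep] for row in X], keep

def permutation_knockoffs(X, rng):
    """Crude knockoff stand-in: shuffle each column independently, preserving
    marginals while breaking joint structure. The kGAN module instead learns
    this mapping adversarially."""
    cols = [list(col) for col in zip(*X)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]

# Toy run: feature 0 tracks the fault label, feature 1 is near-constant noise.
X = [[0.1, 5.0], [0.2, 4.9], [0.9, 5.1], [1.0, 5.0]]
y = [0, 0, 1, 1]
X_sel, kept = select_features(X, y, 1)
X_knock = permutation_knockoffs(X_sel, random.Random(0))
print(kept)  # → [0]: the informative feature is retained
```

The knockoffs `X_knock` would then join `X_sel` as inputs to the decision-making module (kNN or SVM), completing the flow described above.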

5. Conclusions

This work is devoted to the design of a novel diagnostic framework for distributed power systems. The proposed diagnostic framework comprises three modules, namely feature selection, kGAN, and decision making, for the sake of fault diagnosis. It makes use of the voltage, frequency, and phase angle measurements collected by means of sparse measuring devices attached to each bus of the power system. The collected data measurements are firstly fed into the feature selection module in order to find the most informative features. The selected features are then further processed by the kGAN module, where a technique based on GANs is used to generate the corresponding set of knockoffs of the selected features. The generated knockoffs are finally fed into the decision-making module, where two different classification models are utilized to diagnose different types of faults. A comprehensive comparative study has been provided to investigate the performance of the proposed method in dealing with noisy data measurements, datasets with high fault resistance values, and datasets with different sampling rate values. The attained results verify the applicability, effectiveness, and superiority of the proposed framework in comparison with an existing method from the literature. Verifying the results of this work on other large-scale power systems, making use of other state-of-the-art feature selection techniques and classification models, could be investigated in future work.

Author Contributions

Conceptualization, H.H. and R.R.-F.; methodology, H.H. and R.R.-F.; software, H.H.; validation, H.H.; formal analysis, H.H., R.R.-F., M.S. and V.P.; investigation, H.H., R.R.-F., M.S. and V.P.; resources, R.R.-F.; data curation, H.H.; writing—original draft preparation, H.H.; writing—review and editing, H.H., R.R.-F. and V.P.; visualization, H.H.; supervision, R.R.-F. and M.S.; project administration, R.R.-F. and M.S.; funding acquisition, R.R.-F., M.S. and V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The general framework of the proposed diagnostic method.
Figure 2. The collected voltage, frequency, and phase angle measurements following an LL fault on bus #1 at t = 1 second of the simulation.
Figure 3. The attained F-Measure values for the kNN classification model w.r.t. each feature selection technique.
Figure 4. The attained F-measure values by means of the SVM classification model w.r.t. each dataset.
Table 1. The attained F-Measure (FM) values by means of the kNN classification model w.r.t. each dataset (S#1 = first scenario, S#2 = second scenario).

Dataset | Baseline | InfFS S#1 | InfFS S#2 | Relief S#1 | Relief S#2 | MutInfFS S#1 | MutInfFS S#2 | mRMR S#1 | mRMR S#2
A1      | 0.7379   | 0.8527    | 0.8312    | 0.7718     | 0.7963     | 0.7479       | 0.8560       | 0.7758   | 0.8647
A2      | 0.6809   | 0.7847    | 0.8209    | 0.7386     | 0.7858     | 0.7111       | 0.8478       | 0.7468   | 0.8563
A3      | 0.6519   | 0.7149    | 0.7966    | 0.7169     | 0.6973     | 0.6962       | 0.7968       | 0.7279   | 0.7818
A4      | 0.7522   | 0.8828    | 0.8408    | 0.7928     | 0.8052     | 0.7695       | 0.8632       | 0.7956   | 0.8733
A5      | 0.7006   | 0.8020    | 0.8305    | 0.7587     | 0.7887     | 0.7329       | 0.8511       | 0.7599   | 0.8639
A6      | 0.6620   | 0.7418    | 0.7415    | 0.7388     | 0.7019     | 0.7113       | 0.7757       | 0.7432   | 0.7885
A7      | 0.6691   | 0.7754    | 0.8244    | 0.7217     | 0.8149     | 0.6571       | 0.8155       | 0.7658   | 0.8046
A8      | 0.5668   | 0.7274    | 0.8183    | 0.6705     | 0.8070     | 0.5905       | 0.8051       | 0.7157   | 0.7957
A9      | 0.4973   | 0.6535    | 0.7406    | 0.6371     | 0.7331     | 0.6180       | 0.7236       | 0.6828   | 0.7141
A10     | 0.6884   | 0.8125    | 0.8298    | 0.7544     | 0.8194     | 0.7515       | 0.8211       | 0.7910   | 0.8077
A11     | 0.5925   | 0.7317    | 0.8181    | 0.6981     | 0.8106     | 0.6880       | 0.8084       | 0.7391   | 0.7956
A12     | 0.5187   | 0.6799    | 0.7438    | 0.6691     | 0.7343     | 0.6485       | 0.7287       | 0.7090   | 0.7217
Avg.    | 0.6432   | 0.7633    | 0.8030    | 0.7224     | 0.7743     | 0.6935       | 0.8055       | 0.7460   | 0.8057
Table 2. The attained F-Measure (FM) values by means of the SVM classification model w.r.t. each dataset (S#1 = first scenario, S#2 = second scenario).

Dataset | Baseline | InfFS S#1 | InfFS S#2 | Relief S#1 | Relief S#2 | MutInfFS S#1 | MutInfFS S#2 | mRMR S#1 | mRMR S#2
A1      | 0.7970   | 0.9143    | 0.9156    | 0.8258     | 0.8936     | 0.7776       | 0.9159       | 0.8696   | 0.9413
A2      | 0.7407   | 0.8898    | 0.9060    | 0.8157     | 0.8877     | 0.7520       | 0.9061       | 0.8600   | 0.9344
A3      | 0.6799   | 0.8768    | 0.8414    | 0.7998     | 0.8084     | 0.7351       | 0.8480       | 0.8516   | 0.8840
A4      | 0.8265   | 0.8780    | 0.9091    | 0.8571     | 0.8973     | 0.8007       | 0.9195       | 0.8864   | 0.9407
A5      | 0.7749   | 0.8527    | 0.9003    | 0.8330     | 0.8901     | 0.7671       | 0.9090       | 0.8701   | 0.9361
A6      | 0.7211   | 0.8481    | 0.8418    | 0.8254     | 0.8219     | 0.7560       | 0.8456       | 0.8611   | 0.8867
A7      | 0.7341   | 0.8259    | 0.8886    | 0.7893     | 0.8694     | 0.7159       | 0.8697       | 0.8546   | 0.9156
A8      | 0.6706   | 0.7977    | 0.8839    | 0.7611     | 0.8640     | 0.6746       | 0.8867       | 0.8377   | 0.9050
A9      | 0.6187   | 0.7856    | 0.8170    | 0.7518     | 0.8000     | 0.6549       | 0.8019       | 0.8259   | 0.8513
A10     | 0.7697   | 0.8373    | 0.8934    | 0.8147     | 0.8766     | 0.7353       | 0.8742       | 0.8823   | 0.9157
A11     | 0.7143   | 0.8206    | 0.8865    | 0.7839     | 0.8668     | 0.6954       | 0.8656       | 0.8595   | 0.9085
A12     | 0.6611   | 0.8030    | 0.8221    | 0.7767     | 0.8054     | 0.6762       | 0.8000       | 0.8568   | 0.8543
Avg.    | 0.7257   | 0.8441    | 0.8755    | 0.8029     | 0.8569     | 0.7284       | 0.8685       | 0.8656   | 0.9061
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
