A Deep Learning Approach for the Automated Classification of Geomagnetically Induced Current Scalograms

Aksenovich, Tatyana; Selivanov, Vasiliy

doi:10.3390/app14020895


by Tatyana Aksenovich * and Vasiliy Selivanov

Northern Energetics Research Centre—Branch of the Federal Research Centre “Kola Science Centre of the Russian Academy of Sciences” (NERC KSC RAS), 184209 Apatity, Russia

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(2), 895; https://doi.org/10.3390/app14020895

Submission received: 5 November 2023 / Revised: 15 January 2024 / Accepted: 18 January 2024 / Published: 20 January 2024

(This article belongs to the Section Electrical, Electronics and Communications Engineering)


Abstract:

During geomagnetic storms, which result from the solar wind’s interaction with the Earth’s magnetosphere, geomagnetically induced currents (GICs) begin to flow in long, high-voltage electrical networks on the Earth’s surface. This causes a number of negative phenomena that affect the normal operation of the entire electric power system. To investigate the nature of the phenomenon and its effects on transformers, a GIC monitoring system was created in 2011. The system consists of devices installed in the neutrals of autotransformers at five substations of the Kola–Karelian power transit in northwestern Russia. Given the significant amount of data accumulated over 12 years of operating the GIC monitoring system, manual analysis has become impractical. To analyze the constantly growing volume of recorded data effectively, a method for the automatic classification of GICs in autotransformer neutrals is proposed. The method is based on a continuous wavelet transform of the neutral current data combined with a convolutional neural network (CNN) that classifies the resulting scalogram images. The classifier’s performance is evaluated using accuracy and binary cross-entropy loss metrics. After comparing four CNN architectures, the model that showed high GIC classification performance on the validation set was chosen as the final model. In addition to the main layers, the proposed CNN model includes pre-processing layers and a dropout layer.

Keywords:

geomagnetically induced currents; autotransformer; continuous wavelet transform; convolutional neural network; binary classification

1. Introduction

During geomagnetic storms, which are a result of solar wind’s interaction with the Earth’s magnetosphere, geomagnetically induced currents (GICs) begin to flow in the long, high-voltage electrical networks on the Earth’s surface [1,2,3,4,5]. One of the paths for their flow is the grounded neutrals of power transformers and autotransformers. Even a slight deviation of the neutral current from zero (tens of amperes) leads to half-cycle saturation of the electrical device core. This, in turn, causes a number of malfunctions in electrical networks. Examples of the detrimental effects of GICs are the generation of even and odd harmonics, an increase in reactive power demand, overheating occurring in individual parts of transformer windings and their construction elements, the generation of vibrations in the transformer core and windings, and the unwanted tripping of protective devices [6,7,8]. Normally the duration of GIC flow does not exceed a few minutes. But due to repeated GIC impacts on transformers, they may fail. There have been several cases in the past where GIC flow during strong geomagnetic storms has resulted in outages in high-voltage power transmission systems in Canada and Sweden [9,10].

More than 12 years ago, a system for registering GICs in electrical grids on the Kola Peninsula and in Karelia was created [11]. This monitoring system was originally designed for the European Risk from Geomagnetically Induced Currents (EURISGIC) project [12]. The Polar Geophysical Institute (Russian Academy of Sciences) and the Northern Energetics Research Centre (Kola Science Centre of the Russian Academy of Sciences) participated in the development and creation of devices for recording GICs. These devices are installed at five substations in northwestern Russia. Figure 1 shows the location of 330 kV transmission lines and part of the 110 kV transmission lines in the Murmansk region and Karelia, as well as substations where continuous recording of GICs is carried out.

At the VKH, TTN, LKH, and KND substations, current monitoring is carried out in the dead-grounded neutrals of 330 kV autotransformers, and in the neutral of the 110 kV transformer at the RVD substation. When assessing current values, it is necessary to take into account that the monitoring devices at the VKH, TTN, and LKH substations are installed in the neutral of one of two step-down autotransformers, in contrast to the KND and RVD substations with one autotransformer and one transformer, respectively. In this regard, the total current in the grid node at the VKH, TTN, and LKH substations equals twice the measured current.
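The doubling rule described above can be written as a small helper. This is only an illustrative sketch: the dictionary and function names are hypothetical, the 15 A reading is made up, and the rule simply restates the text (two units at VKH, TTN, and LKH; one unit at KND and RVD).

```python
# Number of (auto)transformers at each monitored node, as described in the text.
# The dictionary name and layout are illustrative, not part of the monitoring system.
UNITS_PER_NODE = {"VKH": 2, "TTN": 2, "LKH": 2, "KND": 1, "RVD": 1}


def node_current(substation: str, measured_amps: float) -> float:
    """Estimate the total neutral current in the grid node from one monitored neutral."""
    return UNITS_PER_NODE[substation] * measured_amps


# Hypothetical reading of 15 A in one neutral at VKH and at KND.
vkh_total = node_current("VKH", 15.0)
knd_total = node_current("KND", 15.0)
```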

The GIC monitoring system was created to solve the following problems:

The assessment of GICs’ impact on power autotransformers of 330 kV power lines;
Investigations into the correlation dependencies between GIC values and the level of geomagnetic activity;
The identification of line sections that are most affected by geomagnetic disturbances;
The development of measures to protect equipment from the GICs’ influence.

Table 1 provides information about the substations where the devices of the GIC registration system are installed, information about the duration of these devices’ operation, and the total size of files recorded at each substation. As can be seen from the numbers in the last column, the size of the data varies depending on the point of study. This is explained by the fact that part of the data is missing. The reasons for the missing files are hard drive failure; the shutdown of the device due to external interference; and problems with data transfer to the server associated with weak cell signal reception [13]. Since the start of the GIC monitoring system, more than 16,000 files have been created, containing data of currents in the autotransformers’ neutrals. The total size of these files was about 27 GB at the beginning of 2023. The constantly growing volume of recorded neutral current data cannot be analyzed manually within a reasonable amount of time.

The purpose of this study is to develop a convolutional neural network (CNN) model for the automatic analysis of the 12 years of data accumulated by the GIC monitoring system in northwestern Russia. To train the CNN model, a dataset containing 800 scalogram images was created. The dataset includes images of two classes: GICs and geomagnetically quiet hours. To select the most suitable neural network for solving the problem, the performance of four models with different configurations was compared.

The structure of this paper is as follows: Section 2 provides an overview of existing research that applies continuous wavelet transform (CWT) and CNNs to analyze nonstationary signals in power systems, including GICs. Section 3 is dedicated to the methodology and the created dataset. Section 3.1 introduces the basic concepts of CNNs, describes the layers required to build one, and explains the choice of performance metrics for binary classification. Section 3.2 presents a description of the CWT method. Section 3.3 details the dataset specially created for this study, consisting of two-dimensional images of wavelet spectra (scalograms) of the current in autotransformer neutrals. Section 4 presents the results of a performance comparison of four CNN models and a detailed description of the architecture of the chosen model. The conclusions are presented in Section 5.

2. Related Work

This section contains a review of recent studies that employed CWT and CNNs to analyze nonstationary signals in power systems, including GICs.

A machine learning approach for detecting GICs in power grids was proposed in [14]. The authors used a CNN-based architecture to classify scalogram and spectrogram images obtained via the CWT and short-time Fourier transform (STFT) of currents of single-phase current transformers, respectively. CWT and STFT are spectral analysis methods. The main advantage of CWT is that the wavelet function is scaled during signal analysis, while the sinusoid function has a fixed size during STFT. The proposed framework, consisting of a hybrid feature extractor using Gaussian CWT and STFT and a CNN, successfully achieved good performance, even under low-intensity GICs. The framework works very quickly (detection is achieved within 30 ms). Thus, it can be applied for online monitoring applications in power systems.

Study [15] proposed a deep learning method for fault diagnosis in dry-type transformers using vibration signals. To convert the transformer vibration signals to scalogram images, a CWT method was used, which allows for the extraction of fault features from different conditions. The proposed diagnosis method achieved the mean accuracy of 99.95%. The feature extraction and classification process took less than 7 s, which potentially provides a fast and accurate online diagnosis of transformer faults.

In article [16], deep learning algorithms were used to classify power quality disturbances. This study examined the application of several CNN models for voltage disturbance classification. The results obtained showed high accuracy for a CNN created from scratch and ResNet-50 (transfer learning). This study confirmed the suitability of using the CNN for voltage disturbance classification and the advantage of using scalograms to characterize these disturbances.

The studies listed above use CNN-based methods. This is due to the fact that this type of artificial neural network architecture has several advantages over other machine learning algorithms for image classification. Traditional image feature extraction algorithms focus more on manually setting specific image features, so these methods have a poor generalization ability and portability. In contrast, CNNs automatically detect the relevant features without any human supervision [17]. CNNs use a convolution technique that allows CNNs to reduce the amount of information that needs to be processed, making them faster and more efficient than other types of algorithms. CNNs can exploit the spatial and hierarchical structure of images by using filters that build more abstract and high-level representations as they go deeper into the network. Therefore, CNNs can capture the variability and diversity of the images, and also generalize well to new and unseen data. All this makes CNNs suitable for image classification.

3. Data and Methods

3.1. CNNs

To analyze data from devices of the GIC monitoring system, we use a CNN for the binary classification of scalograms obtained by applying a CWT to the neutral currents. CNNs were chosen because they are among the most popular deep learning algorithms, particularly for image classification tasks [18,19]. Classification assigns the items of a dataset, in our case images, to classes. CNNs are a class of feedforward neural networks. They are designed to automatically and adaptively learn spatial hierarchies of features, from minor features to global ones. This improves the generalization ability of such neural networks and makes them well suited to image classification.

3.1.1. Architecture of CNNs

The typical CNN structure consists of three types of layers, which are the convolutional layer, the pooling layer, and the fully connected layer. In addition to the above main layers of a CNN, it may include layers such as a non-linear activation layer, a classification layer, a dropout layer, and a pre-processing layer. The convolutional and non-linear layers are very commonly combined into one. Brief information about these layers is presented below.

The convolutional layer, as its name suggests, is the most significant component in CNN architecture and plays an important role in its operation. The main purpose of these layers is feature extraction: they learn feature representations of their input images. The convolutional layer consists of a set of convolutional filters called kernels. The kernel is a grid of discrete numbers or values called kernel weights. At the beginning of the CNN training process, random numbers are assigned as the weights of the kernel. The input image is convolved with filters to generate the feature map of the output. The kernel slides over the whole image horizontally and vertically. A dot product is determined between the input image and the kernel, where their corresponding values are simultaneously multiplied and then summed up to calculate a single scalar value. The whole process is repeated until no further sliding is possible. The calculated dot product values create the output feature map.
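The sliding dot product described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: the function name, the 4 × 4 image, and the 2 × 2 kernel are made up, there is no padding, and the stride is 1. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the patch by the kernel element-wise and sum to a single scalar.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map


# Illustrative values only.
image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)
```

A 4 × 4 input and a 2 × 2 kernel give a 3 × 3 feature map, because the kernel fits in three positions along each axis.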

Convolutional layers can significantly reduce model complexity when their output is optimized. The output is controlled by three hyperparameters: the depth (the number of filters), the stride (the step with which the kernel moves), and the amount of zero-padding added around the input.
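The effect of stride and zero-padding on the size of the output can be checked with the standard output-size formula. The sketch below is illustrative: the function name is hypothetical and the kernel sizes, strides, and paddings are example values, not those of the proposed model. Only the 224-pixel input size comes from this study.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a convolution output along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 224-pixel input, as in the scalogram dataset; the other values are examples.
no_padding = conv_output_size(224, kernel_size=3)                      # shrinks slightly
same_size = conv_output_size(224, kernel_size=3, padding=1)            # keeps the size
strided = conv_output_size(224, kernel_size=3, stride=2, padding=1)    # roughly halves it
```

The depth hyperparameter does not appear in the formula because it sets the number of output feature maps, not their spatial size.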

Non-linear activation functions allow for the extraction of non-linear features. The main purpose of all types of activation functions is to map the input to the output. The neuron’s input value is determined by computing the summation of weighted outputs from all neurons in the previous layer. In the first layer of the CNN, the neuron’s input is the numeric representation of the image. Thus, the activation function decides which neuron to fire with reference to a particular input by creating the corresponding output. These layers allow the CNN to learn more complex things. Non-linear activation layers are applied after all convolutional and fully connected layers, i.e., after layers with weights. The following types of activation functions are commonly used in CNNs: sigmoid, tanh, ReLU (rectified linear unit), and leaky and noisy ReLUs [20]. In this study, an ReLU is chosen as the activation function [21]. The choice of this activation function is explained by the fact that it has simple computation, good sparsity, a fast convergence speed, and solves the gradient dispersion problem caused by sigmoid and tanh functions [22].
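To make the comparison between ReLU and sigmoid concrete, the following sketch evaluates both functions and their derivatives at a large input. The function names are illustrative. The point is only that the sigmoid derivative becomes very small for large inputs, whereas the ReLU derivative stays at one for positive inputs.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, which shrinks towards 0 for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


large_input = 10.0
relu_slope = relu_grad(large_input)
sigmoid_slope = sigmoid_grad(large_input)
```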

The purpose of the pooling layer is to reduce the spatial resolution of the feature maps, which provides spatial invariance to input distortions and shifts. This layer discards part of the data, decreasing the complexity of the model and, in turn, reducing the chance of overfitting. Several pooling methods exist, including average pooling, min pooling, max pooling, tree pooling, gated pooling, and global average pooling [23]. This study uses max pooling, which calculates the maximum value in each patch of the feature map covered by the kernel, thereby highlighting the most prominent feature in the patch.
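Max pooling as described above can be sketched as follows. The 2 × 2 window with stride 2 is a common choice and is assumed here for illustration; the text does not state the pooling window of the proposed model. The function name and the sample values are also made up.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each `size` x `size` patch, moving by `stride`."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


# Illustrative 4 x 4 feature map.
fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 2],
      [3, 2, 1, 6]]
pooled = max_pool2d(fm)
```

Each output value summarizes four input values, so the spatial resolution is halved along both axes.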

The fully connected layers are located at the end of each CNN architecture. These layers acquired their name because of the approach of the same name, which is that each neuron in the layer is connected to all neurons of the preceding layer. The input of these layers is in the form of a vector, which is created from the feature maps after flattening. The fully connected layer is used as a CNN classifier: it takes input from feature extraction stages and globally analyzes the output of all previous layers. The result is a non-linear combination of selected features that are used to classify the data.

The final classification is performed on the last layer, which represents the output layer of the CNN. To calculate the class predicted error, loss functions are used in the output layer; the error value reveals the difference between the actual output and the predicted one. The predicted error is optimized through the CNN learning process. In the case of binary classification, the following loss functions are used: binary cross-entropy loss, Hinge loss, and squared Hinge loss. The proposed CNN uses the binary cross-entropy loss function, a detailed description of which is given in Section 3.1.2. Cross-entropy loss is one of the most commonly used loss functions for training classifiers. One of the properties of this function is to provide a clear signal for model updates, making it suitable for classification tasks.

One of the problems encountered when training neural networks on small datasets is overfitting [24]. Along with underfitting, this undesirable behavior is a main cause of poor algorithm performance. Overfitting occurs when a model performs well on training data but noticeably worse on test data. A model that generalizes poorly to new, previously unseen examples is of little practical use. To prevent overfitting, a regularization technique called dropout is often used in neural networks [25]. The key idea of the dropout layer is that randomly selected units of the neural network are ignored during training, i.e., they are “dropped out” randomly. This reduces the interdependent learning of neurons: since a neuron can no longer rely on the presence of other neurons, it is forced to learn more robust features. The dropout layer is placed before the layer to which we want to apply the dropout.
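A minimal sketch of dropout on a list of activations is given below. It assumes the "inverted" variant, in which the kept activations are scaled by 1 / (1 - rate) during training so that no rescaling is needed at inference. The function name and the explicit random generator argument are illustrative choices.

```python
import random


def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training (inverted dropout)."""
    if not training or rate == 0.0:
        # At inference time every unit is used and nothing is rescaled.
        return list(activations)
    keep = 1.0 - rate
    # Kept units are scaled up so the expected sum of activations is unchanged.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_out = dropout([1.0] * 1000, rate=0.5, training=True, rng=rng)
eval_out = dropout([1.0, 2.0], rate=0.5, training=False, rng=rng)
```

With a rate of 0.5, roughly half of the units are zeroed in each training pass and the rest are doubled, while at validation time the input passes through unchanged.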

The pre-processing layers are used in the early stages of a CNN and include data augmentation. Data augmentation is a set of methods used to artificially expand the size of the training dataset [26]. This technique is often used on small datasets, as it helps the model avoid overfitting and improves its performance. The essence of this method is to apply random (but realistic) transformations to existing images to create a set of new variants without altering their nature. Typical data augmentation approaches in image classification are cropping, reflection, translation, rotation, zoom, and contrast adjustment. The data augmentation layers are only active during model training.
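As a simple illustration of one augmentation transform, the sketch below applies a random horizontal reflection to an image stored as a list of rows. The function name, the probability parameter `p`, and the tiny 2 × 3 "image" are assumptions made for the example. Rotation and zoom would follow the same pattern of a random, label-preserving transform.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left to right with probability `p`; otherwise return a copy."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
always_flipped = random_horizontal_flip(tiny_image, random.Random(0), p=1.0)
never_flipped = random_horizontal_flip(tiny_image, random.Random(0), p=0.0)
```

For a scalogram, a horizontal reflection reverses the time axis while leaving the frequency axis untouched.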

3.1.2. Performance Metrics for Binary Classification

The choice of neural network performance metrics is an integral part of creating any model. The main purpose of metrics is to monitor and measure the performance of the model during its training and testing. They indicate weaknesses in the model and dataset, the elimination of which leads to an increase in the generalization ability of the final model. Depending on the objectives and specifics of the research, different metrics are used that are characteristic of a particular field of science. Frequently the choice of method of evaluating model quality is based on the experience of researchers in the same field, as well as on the experience of the researchers themselves. In the case of binary classification, metrics such as accuracy, precision, recall, F1 score, binary cross-entropy loss, receiver operator characteristic and area under the curve (ROC-AUC), and the Matthews correlation coefficient (MCC) are used to evaluate the performance of the neural network [27]. In this paper, the following metrics were chosen to evaluate the performance of the created CNN models: accuracy and the binary cross-entropy loss function. Both metrics are often used when debugging classifiers with two classes [28]. One of the necessary conditions for the correct use of accuracy as an evaluation metric is the presence of a balanced dataset. In this study, this condition is completely fulfilled.

Accuracy is the most widely used performance measure; it describes the overall prediction accuracy of a model across all classes. It is calculated as the ratio of the number of correct class predictions to the total number of predictions. In other words, the metric is the fraction of correct predictions of the trained model. Accuracy is measured as a percentage, ranging from 0% to 100%. A higher value of this metric corresponds to a better-performing classifier, while a lower one indicates more errors in class predictions.

For binary classification, accuracy can be defined in terms of positives and negatives according to the following equation [29]:

ACC = \frac{TP + TN}{TP + TN + FP + FN}, (1)

where TP (true positive) is the number of positive samples correctly classified as positive; TN (true negative) is the number of negative samples correctly classified as negative; FP (false positive) is the number of negative samples misclassified as positive; and FN (false negative) is the number of positive samples misclassified as negative.
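Equation (1) translates directly into code. The confusion-matrix counts below are hypothetical and were chosen only to match the size of the validation set used in this study (160 images). They show that a single misclassified image out of 160 corresponds to an accuracy of 99.375%, and two misclassified images to 98.75%.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy from confusion-matrix counts, as in Equation (1)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for a balanced 160-image validation set.
one_error = accuracy(tp=80, tn=79, fp=1, fn=0)
two_errors = accuracy(tp=79, tn=79, fp=1, fn=1)
no_errors = accuracy(tp=80, tn=80, fp=0, fn=0)
```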

The binary cross-entropy loss function is a machine learning metric that evaluates how well a classification model with two classes performs. The function takes two values, actual and predicted, and returns a measure of how close the predicted probability is to the actual value (0 or 1). Thus, the loss function rewards a correct prediction with a low loss and penalizes a wrong one with a high loss. In the ideal case, the value of binary cross-entropy loss is zero. In practice, however, a loss that reaches zero on the training data indicates overfitting: the model has learned, along with the necessary features, noise or random fluctuations in the data, which negatively affects its ability to generalize. Therefore, the main goal throughout the training of a CNN is to minimize the loss value, i.e., to minimize the difference between the predicted probability and the true label in the final model.

Binary cross-entropy is a special case of cross-entropy [30], and is determined by the following formula:

BCEloss = - \frac{1}{N} \sum_{i = 1}^{N} [y_{i} \log (p_{i}) + (1 - y_{i}) \log (1 - p_{i})], (2)

where N is the number of final layer outputs (1 for sigmoid); y_{i} is the binary indicator (0 or 1) denoting the class for the sample i; and p_{i} is the classifier’s predicted probability for that sample.
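Equation (2) can be evaluated with a few lines of Python. In this sketch the sum runs over a list of label and probability pairs. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero and is not part of Equation (2). The function name and sample values are illustrative.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip the probability so that log(0) never occurs (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])      # equals ln 2
confident_right = binary_cross_entropy([1, 0], [0.99, 0.01])  # small loss
confident_wrong = binary_cross_entropy([1, 0], [0.01, 0.99])  # large loss
```

The three cases show the behaviour described in the text: confident correct predictions give a loss close to zero, while confident wrong predictions are penalized heavily.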

3.2. CWT

CWT is a widely used spectral analysis method for nonstationary signals, in particular GICs [31,32,33,34]. In intelligent fault diagnosis for power systems, CWT is often used to extract features from transformer signals such as vibrations, currents, and voltages [14,15,16]. CWT provides an accurate representation of changes in the frequency characteristics of a signal over time. The essence of the method is to convolve the analyzed signal with scaled and translated versions of the analyzing function, which is called the mother wavelet. A wavelet is a mathematical function that most frequently has a wave-like form and whose amplitude tends to zero with distance from the origin. An important property of wavelets is time–frequency localization, which makes it possible to extract the frequency characteristics of a signal with good localization in time. The CWT yields wavelet coefficients whose values are directly proportional to the degree of similarity between the analyzed signal and the selected wavelet. The CWT of a discrete time series, x_{n}, is defined by the following equation [35]:

W_{n} (s) = \sum_{n^{'} = 0}^{N - 1} x_{n^{'}} ψ^{*} [\frac{(n^{'} - n) δ t}{s}], (3)

where n is the localized time index, s is the wavelet scaling factor, N is the number of points in the time series, ψ is the mother wavelet, * symbolizes complex conjugation, and δt is the sampling interval.
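Equation (3) can be evaluated directly as a sum, which is slow but makes the definition explicit. The sketch below assumes a complex Morlet mother wavelet with a dimensionless centre frequency of 6 and the usual relation between Morlet scale and equivalent Fourier period. These choices, the function names, and the synthetic sine-wave test signal are assumptions for illustration. The scalograms in this study were computed with PyWavelets, not with this code.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed dimensionless centre frequency of the Morlet wavelet


def morlet(eta: float) -> complex:
    """Complex Morlet mother wavelet evaluated at dimensionless time `eta`."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)


def cwt_coefficient(x, n: int, s: float, dt: float = 1.0) -> complex:
    """W_n(s) from Equation (3): sum over n' of x[n'] * conj(psi((n' - n) * dt / s))."""
    return sum(x[k] * morlet((k - n) * dt / s).conjugate() for k in range(len(x)))


def scale_for_period(period: float) -> float:
    """Morlet scale whose equivalent Fourier period equals `period`."""
    return period * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)


# Synthetic signal: a sine wave with a period of 20 samples, 200 samples long.
signal = [math.sin(2.0 * math.pi * k / 20.0) for k in range(200)]

# Wavelet power |W_n(s)|^2 at the middle of the record for two scales.
power_matched = abs(cwt_coefficient(signal, 100, scale_for_period(20.0))) ** 2
power_mismatched = abs(cwt_coefficient(signal, 100, scale_for_period(5.0))) ** 2
```

The power is large when the wavelet scale corresponds to the period of the signal and negligible when it does not, which is the similarity property described above.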

The graphical representation of a three-dimensional array of wavelet coefficients, W_{n} (s), is a scalogram: the x axis indicates time; the y axis represents frequency, which is inversely proportional to the wavelet scale (width); and colors indicate the wavelet coefficients: the greater the value, the more closely the wavelet matches the shape of the analyzed signal. The scalogram is calculated as the square of the modulus of the wavelet coefficients, {|W_{n} (s)|}^{2}. Two-dimensional images of wavelet spectra provide a visual representation of various phenomena in the researched signal. They are used in such areas of machine learning as classification and pattern recognition [16].

To compute scalograms for the dataset, this paper uses the open-source library PyWavelets (version 1.3.0) [36], which contains a number of CWT-compatible mother wavelets.

3.3. The GIC Scalogram Dataset

The quality of a dataset on which a CNN is trained has a strong impact on its classification performance. In order to be considered high-quality, a labeled dataset must satisfy two basic conditions. Firstly, the images must be directly related to the task. Secondly, the images in each class must be uniform.

To train the CNN, a dataset containing two-dimensional images of wavelet spectra of the current in autotransformer neutrals was created. The dataset consists of 800 RGB scalogram images with a size of 224 × 224 pixels. Since the goal of this study is to detect the presence or absence of GICs in the neutral current (binary classification), the dataset includes images of two classes: GICs and geomagnetically quiet hours (Figure 2). The high GIC values that are most dangerous for transformers have frequencies in the range from 1.5 mHz to 3.6 mHz [34], so the frequency range between 1.36 mHz and 5.42 mHz was selected for the scalograms in this study. Apart from GICs, the only neutral currents reflected in this frequency range are those caused by unbalanced load distribution among the three phases. Thus, a geomagnetically quiet hour means the absence of GICs and the presence of insignificant currents resulting from phase imbalance, which is why the problem is treated as binary classification in this study. The number of examples in the “GIC” class (400) equals the number of examples in the “geomagnetically quiet hour” class (400), which makes the dataset balanced. A balanced dataset makes training a classifier easier by helping to prevent the model from becoming biased towards the class with more examples, and potentially improves the accuracy of the resulting model.

Each image in the dataset is a scalogram of the neutral current, with a duration of 60 min and a frequency range from 1.36 mHz to 5.42 mHz. The choice of such a time interval for this study is based on several reasons. Firstly, files that contain neutral current data are transferred from GIC monitoring devices to the server every hour. In the future, the proposed CNN model will be able to analyze new measurements in real time. Secondly, GIC events have a duration of up to several hours. The CWT is applied to the current data with a sampling interval of 1 s. As a previous study has shown [34], a sampling rate of 1 Hz is sufficient for a reliable estimate of GIC values, including peak values, as well as for determining their time localization. A non-orthogonal Morlet wavelet is used as the mother wavelet. The benefits of using this wavelet function as the most suitable function for the analysis of GICs in autotransformer neutrals were also considered and proven in practice in the previous study [34]. The colors in the scalograms denote the values of wavelet coefficients (Figure 2) and characterize the degree of matching between the researched signal and the wavelet. The red color corresponds to the maximum value of the energy and the blue color corresponds to the minimum one.
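The numbers in this paragraph can be related to each other with simple arithmetic. The sketch below converts the frequency limits of the scalograms into periods and counts the samples in one image. The function and variable names are illustrative, and all input values come from the text.

```python
def period_seconds(freq_mhz: float) -> float:
    """Period in seconds of an oscillation whose frequency is given in millihertz."""
    return 1000.0 / freq_mhz


SAMPLING_INTERVAL_S = 1.0   # sampling interval of the neutral current data
WINDOW_MIN = 60             # duration of one scalogram

samples_per_image = int(WINDOW_MIN * 60 / SAMPLING_INTERVAL_S)
longest_period = period_seconds(1.36)    # lower frequency limit of the scalograms
shortest_period = period_seconds(5.42)   # upper frequency limit of the scalograms
cycles_of_slowest = WINDOW_MIN * 60 / longest_period
```

The frequency range therefore corresponds to periods of roughly 3 to 12 min, so a 60 min window contains almost five full cycles of the slowest component shown in a scalogram.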

We calculated the standard deviation (SD) of each scalogram image (Figure 2) to show that the standard deviation classification method is not applicable to our problem. SD characterizes the degree of deviation of data from the mean value and is defined as the square root of the variance [37]. To classify GIC and “geomagnetically quiet hour” scalograms using the SD, it is necessary to find an empirical threshold. But, as can be seen from the values in Figure 2, GIC scalograms can have both high and relatively low SD values, comparable to the SD values for “geomagnetically quiet hour” scalograms. This makes it difficult to find an accurate empirical threshold for classifying events, and may lead to incorrect class definition. This, in turn, potentially reduces the accuracy of this method.
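The difficulty with a threshold on the SD can be illustrated with a toy example. The pixel values, the threshold, and the function names below are invented for the illustration and are not taken from Figure 2. The population SD is assumed, since the text does not state which estimator was used.

```python
import statistics


def scalogram_sd(pixels) -> float:
    """Population standard deviation of flattened scalogram values."""
    return statistics.pstdev(pixels)


def classify_by_sd(pixels, threshold: float) -> str:
    """Naive rule: label the scalogram as GIC when its SD exceeds the threshold."""
    return "GIC" if scalogram_sd(pixels) > threshold else "quiet"


# Made-up flattened "scalograms": a weak GIC event and a slightly noisy quiet hour.
weak_gic = [2, 4, 4, 4, 5, 5, 7, 9]      # SD = 2.0
quiet_hour = [1, 3, 3, 3, 6, 6, 8, 10]   # SD is about 2.83

label_for_weak_gic = classify_by_sd(weak_gic, threshold=2.5)
label_for_quiet_hour = classify_by_sd(quiet_hour, threshold=2.5)
```

When a weak GIC scalogram has a lower SD than a quiet-hour scalogram, as in this toy case, no single threshold separates the two classes correctly.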

4. GIC Classification by CNNs

The main goal of this study is to create a binary classifier of scalograms of currents in an autotransformer neutral for analyzing data from the GIC monitoring system in northwestern Russia. For this purpose, a comparative analysis of four CNN models with different configurations was carried out. All models were built from scratch. The first model has the following architecture: four convolutional layers, four max pooling layers, a flatten layer, and two fully connected layers. The second model differs from the first in having a dropout layer after the fourth max pooling layer, and the third differs from the first in having pre-processing layers before the first convolutional layer. The fourth model contains all of the above layers in its architecture.

The models’ training and testing were conducted using an Intel(R) Core(TM) i5-11400 CPU (central processing unit) at 2.6 GHz. Other computer specifications used in the experiment were 16 GB RAM (random access memory) and the Microsoft Windows 10 Pro 64-bit OS (operating system). A well-known Keras library (version 2.12.0) was chosen as the deep learning framework [38]. Keras is the open-source high-level API (application programming interface), running on top of the TensorFlow platform for machine learning.

The CNN models were trained by employing 80% (640) of the scalogram images and validated using the other 20% (160). Accuracy and the binary cross-entropy loss function were used as the evaluation metrics. Training all models was carried out over 15 epochs. Table 2 shows the results of the models’ performance validation.
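An 80/20 split of the 800-image dataset can be sketched as follows. The file names are hypothetical, and the simple shuffled split with a fixed seed is an assumption: the text does not say how the images were assigned to the two subsets.

```python
import random


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle the samples reproducibly and split them into training and validation lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names: 400 "GIC" images (label 1) and 400 "quiet hour" images (label 0).
dataset = ([("gic_%03d.png" % i, 1) for i in range(400)]
           + [("quiet_%03d.png" % i, 0) for i in range(400)])
train, val = train_val_split(dataset)
```

A plain random split does not guarantee that both subsets stay exactly balanced; a stratified split would be needed for that.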

In general, all CNN models showed good results. The first CNN model achieved an accuracy of 99.37% with a binary cross-entropy loss of 0.0098. The second model classified the scalogram images with 99.37% accuracy and the lowest binary cross-entropy loss of 0.0092. The third model had the lowest accuracy, 98.75%, and the highest binary cross-entropy loss, 0.0344. The fourth model achieved the best accuracy, 100.00%, with a binary cross-entropy loss of 0.0115. The difference in total training time (15 epochs) between the fastest and slowest CNN models was only 16 s.

As a result of this comparison, the fourth model, which contains image augmentation layers and a dropout layer, was selected because it achieved the maximum accuracy. Also, the higher loss value of this model compared to the first model (which achieved the second highest accuracy among the considered CNN architectures) suggests that the CNN learned fewer wrong features associated with noise or random fluctuations in the data. Thus, this model will potentially be better at classifying previously unseen data. The selected CNN model also outperformed, in terms of accuracy, a framework for detecting GICs in power systems that uses a hybrid time–frequency analysis (STFT and CWT) combined with a CNN; that framework achieved 90.95% accuracy [14].

Figure 3 shows the learning curves for the selected model on the training and validation sets. The use of dropout along with an artificial increase in the size of the training dataset allows the model to avoid overfitting and achieve high performance. As seen in Figure 3, the validation accuracy of the selected model is higher than the training accuracy, and the validation loss is lower than the training loss. This is explained by the fact that regularization (dropout) and data augmentation are applied only during training, making classification of the training set more difficult. Since the dropout layer is not active during validation, the model uses all features, which makes it more robust, increases validation accuracy, and decreases validation loss. Data augmentation increases the complexity of the examples in the training set, resulting in a decrease in training accuracy and an increase in training loss compared to the values of these metrics during validation.

The architecture of the proposed CNN model for the scalogram classification of currents in autotransformer neutrals is illustrated in Figure 4. It consists of four convolutional layers, four max pooling layers, a dropout layer, a flatten layer, and two fully connected layers. Each convolutional layer and the first fully connected layer are followed by a ReLU activation function, which is used to provide efficient training. The kernel size is reduced in each subsequent convolutional layer, making it possible to extract local features from images at different filter scales. Since our goal is binary classification, the last layer contains a single neuron. The sigmoid function was chosen as the activation function for the output layer because it maps the output to a value between zero and one, which represents the probability. Binary cross-entropy, a loss function designed specifically for binary classifiers, is used to calculate the discrepancy between the predicted and actual values. The rate for the dropout layer is set to 0.5, which means that half of the units are randomly ignored during model training. The layers listed above are preceded by pre-processing layers, which generate additional training images by augmenting existing examples. The random augmentation transforms that were applied are reflection ('horizontal'), rotation (factor set to 0.01), and zoom (height factor set to (−0.001, −0.1), width factor set to (−0.01, −0.6)). Data augmentation was employed to improve the generalization of the proposed CNN model. The Adam (Adaptive Moment Estimation) learning algorithm [39] was used as the optimizer, with the learning rate set to 0.0001. Adam is a popular alternative to classical stochastic gradient descent because it adapts the step size of each parameter automatically and achieves good results quickly.
The learning rate is an important parameter for neural networks: it controls how much the weights are adjusted at each iteration in response to the loss function. Small values of this parameter make the tuning more precise and potentially reduce the training error.
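How the sigmoid output, the binary cross-entropy loss, and the Adam update fit together can be sketched on a single-weight toy model. This is an illustration under stated assumptions, not the authors' code: only the learning rate of 0.0001, the sigmoid output, and the binary cross-entropy loss come from the text, while the function names, the toy input, and the values of beta1, beta2, and eps (the defaults suggested in the Adam paper [39]) are assumptions.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Binary cross-entropy for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight (beta1, beta2, eps are assumed)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Hypothetical toy problem: one input feature x with true label y = 1.
x, y = 2.0, 1.0
w, m, v = 0.0, 0.0, 0.0
losses = []

for t in range(1, 201):
    p = sigmoid(w * x)                    # predicted probability of class 1
    losses.append(binary_cross_entropy(y, p))
    grad = (p - y) * x                    # d(loss)/dw for a sigmoid output
    w, m, v = adam_step(w, grad, m, v, t)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}, w: {w:.4f}")
```

With a learning rate this small, each step moves the weight by roughly 0.0001, so the loss decreases slowly but steadily, which is the trade-off described in the paragraph above.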

5. Conclusions

This paper proposes a method for GIC detection in the neutral currents of autotransformers based on CWT and CNNs. Although similar research has been performed in the past, this paper is the first to analyze neutral-current data from autotransformers at operating substations in northwestern Russia on the basis of a binary classification of current scalograms. To train the CNN model, a dataset containing 800 scalogram images of two classes, GICs and geomagnetically quiet hours, was created. After comparing the performance of four CNNs with different architectures, a model that showed high GIC classification performance on the validation set (100.00% accuracy and a binary cross-entropy loss of 0.0115) was chosen. The proposed CNN model, in addition to the main layers, includes pre-processing layers and a dropout layer. In further research, it is planned to analyze 12 years of data from the GIC monitoring system using the proposed CNN-based classifier in order to identify the line sections that are most affected by geomagnetic disturbances and to develop measures to protect equipment from the influence of GICs. To achieve the latter goal, it is planned to create and implement a program for detecting GICs in the neutrals of autotransformers in real time at 330 kV substations in the Murmansk region. This program will help substation personnel monitor the situation and take timely measures to protect equipment.

Author Contributions

Conceptualization, T.A. and V.S.; methodology, T.A. and V.S.; software, T.A.; formal analysis, T.A.; investigation, T.A. and V.S.; writing—original draft preparation, T.A.; writing—review and editing, V.S.; visualization, T.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation, grant number 22-29-00413.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw GIC data may be accessed at http://gic.en51.ru/ (accessed on 17 January 2024). The dataset created and analyzed during the current study is available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Boteler, D.H. Assessment of Geomagnetic Hazard to Power Systems in Canada. Nat. Hazards 2001, 23, 101–120.
2. Erinmez, I.A.; Kappenman, J.G.; Radasky, W.A. Management of the Geomagnetically Induced Current Risks on the National Grid Company’s Electric Power Transmission System. J. Atmos. Sol. Terr. Phys. 2002, 64, 743–756.
3. Pirjola, R. Effects of Space Weather on High-Latitude Ground Systems. Adv. Space Res. 2005, 36, 2231–2240.
4. Liu, C.M.; Liu, L.G.; Pirjola, R. Geomagnetically Induced Currents in the High-Voltage Power Grid in China. IEEE Trans. Power Deliv. 2009, 24, 2368–2374.
5. Marshall, R.A.; Dalzell, M.; Waters, C.L.; Goldthorpe, P.; Smith, E.A. Geomagnetically Induced Currents in the New Zealand Power Network. Space Weather 2012, 10, S08003.
6. Molinski, T.S. Why Utilities Respect Geomagnetically Induced Currents. J. Atmos. Sol. Terr. Phys. 2002, 64, 1765–1778.
7. Kappenman, J.G. Geomagnetic Disturbances and Impacts upon Power System Operation. In Electric Power Generation, Transmission, and Distribution: The Electric Power Engineering Handbook; CRC Press: Boca Raton, FL, USA, 2018; pp. 1–22. ISBN 9781315222424.
8. Rajput, V.N.; Boteler, D.H.; Rana, N.; Saiyed, M.; Anjana, S.; Shah, M. Insight into Impact of Geomagnetically Induced Currents on Power Systems: Overview, Challenges and Mitigation. Electr. Power Syst. Res. 2021, 192, 106927.
9. Pulkkinen, A.; Lindahl, S.; Viljanen, A.; Pirjola, R. Geomagnetic Storm of 29–31 October 2003: Geomagnetically Induced Currents and Their Relation to Problems in the Swedish High-Voltage Power Transmission System. Space Weather 2005, 3, S08C03.
10. Guillon, S.; Toner, P.; Gibson, L.; Boteler, D. A Colorful Blackout: The Havoc Caused by Auroral Electrojet Generated Magnetic Field Variations in 1989. IEEE Power Energy Mag. 2016, 14, 59–71.
11. Barannik, M.B.; Danilin, A.N.; Kat’kalov, Y.V.; Kolobov, V.V.; Sakharov, Y.A.; Selivanov, V.N. A System for Recording Geomagnetically Induced Currents in Neutrals of Power Autotransformers. Instrum. Exp. Technol. 2012, 55, 110–115.
12. Viljanen, A. European Project to Improve Models of Geomagnetically Induced Currents. Space Weather 2011, 9, S07007.
13. Selivanov, V.N.; Aksenovich, T.V.; Bilin, V.A.; Kolobov, V.V.; Sakharov, Y.A. Database of Geomagnetically Induced Currents in the Main Transmission Line “Northern Transit”. Sol. Terr. Phys. 2023, 9, 93–101.
14. Wang, S.; Dehghanian, P.; Li, L.; Wang, B. A Machine Learning Approach to Detection of Geomagnetically Induced Currents in Power Grids. IEEE Trans. Ind. Appl. 2020, 56, 1098–1106.
15. Li, C.; Chen, J.; Yang, C.; Yang, J.; Liu, Z.; Davari, P.; et al. Convolutional Neural Network-Based Transformer Fault Diagnosis Using Vibration Signals. Sensors 2023, 23, 4781.
16. Salles, R.S.; Ribeiro, P.F. The Use of Deep Learning and 2-D Wavelet Scalograms for Power Quality Disturbances Classification. Electr. Power Syst. Res. 2023, 214, 108834.
17. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712.
18. Rawat, W.; Wang, Z. Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Comput. 2017, 29, 2352–2449.
19. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artif. Intell. Rev. 2020, 53, 5455–5516.
20. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53.
21. Hahnloser, R.H.R.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital Selection and Analogue Amplification Coexist in a Cortex-Inspired Silicon Circuit. Nature 2000, 405, 947–951.
22. Bai, Y. RELU-Function and Derived Function Review. SHS Web Conf. 2022, 144, 02006.
23. Lee, C.Y.; Gallagher, P.W.; Tu, Z. Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, Cadiz, Spain, 9–11 May 2016; pp. 464–472.
24. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors. arXiv 2012, arXiv:1207.0580.
25. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
26. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
27. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv 2011, arXiv:abs/2010.1.
28. Canbek, G.; Taskaya Temizel, T.; Sagiroglu, S. PToPI: A Comprehensive Review, Analysis, and Knowledge Representation of Binary Classification Performance Measures/Metrics. SN Comput. Sci. 2023, 4, 13.
29. Lee, W.; Lee, D.; Lee, S.; Jun, K.; Kim, M.S. Deep-Learning-Based ADHD Classification Using Children’s Skeleton Data Acquired through the ADHD Screening Game. Sensors 2022, 23, 246.
30. Good, I.J. Some Terminology and Notation in Information Theory. Proc. IEE Part C Monogr. 1956, 103, 200.
31. Falayi, E.O.; Ogunmodimu, O.; Bolaji, O.S.; Ayanda, J.D.; Ojoniyi, O.S. Investigation of Geomagnetic Induced Current at High Latitude during the Storm-Time Variation. NRIAG J. Astron. Geophys. 2017, 6, 131–140.
32. Adhikari, B.; Sapkota, N.; Dahal, S.; Bhattarai, B.; Khanal, K.; Chapagain, N.P. Spectral Characteristic of Geomagnetically Induced Current during Geomagnetic Storms by Wavelet Techniques. J. Atmos. Sol. Terr. Phys. 2019, 192, 104777.
33. Xu, W.-H.; Xing, Z.-Y.; Balan, N.; Liang, L.-K.; Wang, Y.-L.; Zhang, Q.-H.; Sun, Z.-D.; Li, W.-B. Spectral Analysis of Geomagnetically Induced Current and Local Magnetic Field during the 17 March 2013 Geomagnetic Storm. Adv. Space Res. 2022, 69, 3417–3425.
34. Aksenovich, T.V.; Bilin, V.A.; Saharov, Y.A.; Selivanov, V.N. Wavelet Analysis of Geomagnetically Induced Currents during the Strong Geomagnetic Storms. Russ. J. Earth Sci. 2023, 22, 1–12.
35. Torrence, C.; Compo, G.P. A Practical Guide to Wavelet Analysis. Bull. Am. Meteorol. Soc. 1998, 79, 61–78.
36. Lee, G.R.; Gommers, R.; Waselewski, F.; Wohlfahrt, K.; O’Leary, A. PyWavelets: A Python Package for Wavelet Analysis. J. Open Source Softw. 2019, 4, 1237.
37. Whitley, E.; Ball, J. Statistics Review 1: Presenting and Summarising Data. Crit. Care 2002, 6, 66.
38. Chollet, F. Keras Documentation. Available online: https://keras.io (accessed on 31 October 2023).
39. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.

Figure 1. A map of substations equipped with GIC registration devices (magenta triangles) in northwestern Russia and schematically drawn transmission lines of the Kola–Karelian power transit (green lines). Red color denotes 110 kV transmission lines, in particular lines which are directed to the RVD substation.

Figure 2. Sample scalogram images taken from different days at different substations from the created dataset. SD means the standard deviation of wavelet coefficient values calculated for each scalogram.

Figure 3. The change in accuracy (a) and cross-entropy loss (b) versus epoch values during network training and validation.

Figure 4. Architecture of proposed CNN for GIC classification in autotransformer neutrals.

Table 1. Details about the GIC registration system in power lines in northwestern Russia.

Substation Code	Substation Name	Latitude (°N)	Longitude (°E)	Period of Registration	Volume of Recorded Data (GB)
VKH	Vykhodnoy	68.83	33.08	October 2011 until now	6.49
RVD	Revda	67.90	34.61	May 2011 until now	5.65
TTN	Titan	67.53	33.44	June 2010–December 2014	2.28
LKH	Loukhi	66.08	33.12	September 2011 until now	5.67
KND	Kondopoga	62.22	34.36	September 2011 until now	6.67

Table 2. Comparison of CNN models’ performance with different configurations on the validation set.

Number	CNN Model	Training Time (min:s)	Binary Cross-Entropy Loss	Accuracy (%)
1	Simple CNN	03:06	0.0098	99.37
2	CNN with dropout	03:08	0.0092	99.37
3	CNN with augmentation	03:19	0.0344	98.75
4	CNN with augmentation and dropout	03:22	0.0115	100.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
