Partial Discharge Localization through k-NN and SVM

Sekatane, Permit Mathuhu; Bokoro, Pitshou

doi:10.3390/en16217430

by Permit Mathuhu Sekatane * and Pitshou Bokoro

Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Johannesburg 2028, South Africa

* Author to whom correspondence should be addressed.

Energies 2023, 16(21), 7430; https://doi.org/10.3390/en16217430

Submission received: 13 September 2023 / Revised: 23 October 2023 / Accepted: 25 October 2023 / Published: 3 November 2023

(This article belongs to the Special Issue Advanced Technologies in Partial Discharge Detection and Fault Diagnosis)

Abstract

Power transformers are essential for the transmission and distribution of electricity, but they are prone to degradation caused by early faults. Partial Discharge (PD) is the most significant indicator of insulation breakdown in high-voltage apparatus. Dissolved Gas Analysis (DGA) is a commonly used technique for detecting and diagnosing PD. However, DGA data often contain missing values, which can significantly affect the accuracy of PD diagnosis. To mitigate this issue, this paper proposes using the k-Nearest Neighbors (kNN) technique to impute the missing values in the dataset. Further, it combines kNN with a Support Vector Machine (SVM) to detect the possibility of a PD source in the high-voltage apparatus. The approach was evaluated on a real-world DGA dataset and achieved high classification performance and discriminatory power for distinguishing between PD and non-PD instances. The effectiveness of the missing value imputation technique was also evaluated, and the proposed approach demonstrated improved accuracy and precision compared to methods without imputation. The proposed approach offers an up-to-date solution for PD analysis in power transformers using DGA data with missing values.

Keywords: machine learning; support vector machine; k-nearest neighbors; dissolved gas analysis

1. Introduction

Power transformers are a critical component of the power industry, enabling the efficient transmission and distribution of electrical power [1]. However, these transformers are subject to insulation degradation over time, which can result in economic losses and safety risks. Therefore, it is essential to monitor the condition of power transformers and detect potential faults early to prevent downtime and ensure safe operation [2,3,4]. Insulation degradation is a significant concern for power transformer reliability, and it is essential to take measures to prevent, detect, and mitigate its effects [5]. Regular maintenance, monitoring, and replacement can help transformers operate reliably and efficiently. Traditional methods used to monitor the state of power transformers include frequency domain spectroscopy (FDS), Dissolved Gas Analysis (DGA), tan delta (Td), and the dew point method (DP) [6]. These techniques have different advantages and disadvantages [5,6,7]. Among them, DGA stands out because it analyzes gases dissolved in transformer oil and provides valuable information about faults, such as moisture content and PD in the transformer [6]. However, the DGA data collected from power transformers often contain missing values due to sensor failure or other factors. Missing values in DGA data can negatively impact the accuracy of diagnosing PD and other faults in power transformers [8]. Therefore, there is a need for a practical approach to handle missing values in DGA data for accurate diagnosis.

DGA measures the concentration of various gases generated by the equipment’s insulation and other components, which can indicate potential problems [9]. The method is cost-effective because it can monitor high-voltage apparatus without requiring the equipment to be shut down or dismantled [5,6,7,8,9]. DGA is a non-intrusive method that does not require physical access to the equipment, making it safer and easier to perform [10]. It can detect faults early, which allows timely maintenance and repair of the equipment before severe damage occurs. The disadvantages of DGA include the interpretation of results, which can be complex and requires specialized knowledge and expertise. Furthermore, the results can be influenced by external factors such as temperature, pressure, and humidity, making interpretation more challenging. DGA requires the collection of oil samples, which can be time-consuming and may result in equipment downtime [11]. The method provides limited information about the location and severity of faults, so additional diagnostic tests may be required to pinpoint the source of the problem [12]. Hence, machine learning is proposed to model the collected data and give accurate information regarding the high-voltage apparatus.

Machine learning (ML) has emerged as a promising approach for monitoring the degradation of insulation [13]. This study focuses on diagnosing PD in power transformers using a combination of the kNN algorithm to impute missing data and SVM to model and predict PD events. Partial Discharge data collected from DGA often contain missing values for various reasons, including sensor malfunctions or communication errors. Addressing these missing values is crucial to ensure the reliability of the diagnostic model. The kNN algorithm is a non-parametric, instance-based machine learning technique that effectively handles missing data. It works by imputing missing values in a dataset based on the values of their k-nearest neighbors. In the context of PD diagnosis, this approach allows us to reconstruct a more complete dataset, which is essential for accurate modeling. SVM is a powerful supervised learning method widely used for classification and regression tasks. In diagnosing PD in power transformers, SVM can be employed to model the relationships between various features and the presence or severity of PD. Training an SVM model on the imputed dataset makes it possible to predict PD events with a high degree of accuracy.

2. Summary of Detecting and Monitoring PD in Power Transformers

PD detection in power transformers is essential for monitoring the transformer insulation condition and performance. PD refers to localized electrical discharges within a transformer’s insulation materials. Various factors, including insulation aging, contamination, and mechanical stresses, can cause these discharges. Detecting and monitoring PD in power transformers is crucial because it can indicate insulation problems that may lead to eventual transformer failure if not addressed. By detecting PD early, repair and maintenance can be used to prevent catastrophic failures and extend the transformer’s lifespan. There are numerous approaches used for PD detection in power transformers. Here are a few commonly employed techniques (Table 1):

The DGA dataset used in this paper profiles the power transformer’s condition and is used to predict faults such as PD and moisture. In [16,17,18,19,22], the authors investigated the degradation of power transformer insulation using the SVM technique and found that it has the most robust ability to learn from historical DGA data for fault prediction in high-voltage apparatus. However, those studies did not focus on the factors that determine how well the learning process performs and thereby increase accuracy. The most important of these is the quality of the training data, and the main issue affecting that quality is missing values in the dataset. It was shown in [23] that most real datasets, including DGA datasets, contain missing values. In [24,25], the researchers presented evidence that the quality of the dataset drastically affects prediction accuracy.

In most cases, samples with missing values are simply deleted, resulting in the loss of valuable data. It is vital to fill in the missing values by estimating them before training and testing on the data, which increases fault prediction accuracy [25]. The DGA dataset is broadly used in partial discharge diagnosis, as it contains measurements of dissolved gas levels in the liquid insulation. Certain gases in the oil can indicate partial discharge; thus, analyzing the DGA dataset can be a powerful tool for diagnosing partial discharge [17]. However, the DGA dataset is not immune to missing values, which make it challenging to extract meaningful information from the dataset and accurately diagnose partial discharge. To address this challenge, this study proposes integrating the SVM algorithm with the kNN algorithm to impute missing values in the dataset and improve the accuracy of partial discharge diagnosis [25,26,27]. The SVM algorithm is a popular machine learning method for classification tasks.

The kNN algorithm, in contrast, is a simple yet powerful algorithm that can impute missing values. Integrating SVM and kNN involves first using SVM to classify the data and identify the potential importance of the missing data [28]. Then, kNN is utilized to impute the missing values based on the identified candidate values. By combining these two algorithms, the resulting hybrid approach can improve the accuracy of partial discharge diagnosis compared to using either algorithm alone [29]. Table 2 summarizes the conventional ML-based PD diagnosis methods.

3. Sources of Insulation Degradation

3.1. Ageing Factor

As transformers age, their internal insulation materials can break down due to thermal stress, moisture, and other environmental factors [53,54]. The insulation materials can become brittle, shrink, or crack, leading to insulation breakdown or short circuits. The following are more detailed explanations of the aging factor degradation of power transformers:

Thermal Stress: Transformers are designed to operate at specific temperature ranges, and when they are exposed to high temperatures, it can cause the insulation materials to age and break down. The aging of insulation materials can lead to reduced dielectric strength, resulting in insulation failure and reduced performance.
Moisture: Moisture can infiltrate the transformer through leaks, condensation, or humidity, causing corrosion of internal components, insulation breakdown, and reduced performance. Over time, moisture can accumulate and lead to insulation failure, especially at high temperatures.
Chemical Degradation: Chemical degradation can occur due to various factors, such as exposure to pollutants, acids, and other chemicals. Chemical degradation can cause the insulation material to soften or become brittle, leading to insulation breakdown and reduced performance.
Electrical Stress: Electrical stress, such as voltage spikes, surges, or harmonics, can cause insulation breakdown and damage the transformer’s internal components. Electrical stress can weaken the insulation material, reducing dielectric strength and eventual insulation failure.
Mechanical Stress: Mechanical stress, such as vibrations and mechanical impacts, can damage the transformer’s internal components, leading to insulation breakdown and reduced performance.
Time: Over time, the materials used in transformers can undergo physical and chemical changes, leading to reduced performance and eventual failure. The rate of aging depends on the materials used, the operating conditions, and the maintenance practices.

3.2. Acoustic Factor

The acoustic factor refers to the noise produced by a power transformer and its relationship to the degradation of the transformer [47,48,49,55,56,57,58,59,60,61]. The acoustic element can be an indicator of the condition of a transformer, and changes in the noise level can be a sign of internal problems. The following is a more detailed explanation of the acoustic factor for the degradation of power transformers:

Core Laminations: The core laminations inside a transformer can vibrate due to magnetic forces, creating a humming or buzzing noise. Over time, the vibration can cause the laminations to loosen, resulting in increased noise levels and reduced transformer efficiency.
Winding Vibrations: Vibrations in the transformer windings can also create noise. The vibration can occur due to mechanical stress, high currents, or voltage fluctuations. The vibration can cause the winding conductors to rub against each other or the insulation, leading to insulation breakdown and reduced transformer performance.
Mechanical Stress: Mechanical stress can cause deformation or misalignment of the transformer’s components, increasing noise levels. Mechanical stress can occur due to uneven terrain, seismic activity, or nearby construction.
Insulation Degradation: Insulation degradation can also contribute to increased noise levels. The insulation materials can become brittle or crack over time due to thermal stress, moisture, and other factors. The degradation of insulation materials can lead to partial discharge, which creates noise.
Loose Components: Loose components, such as fasteners or bolts, can cause vibration and noise in the transformer. Over time, the vibration can cause damage to the transformer components and lead to reduced performance.

3.3. Tapping Factor

The tapping factor refers to the mechanism that allows the voltage ratio of a power transformer to be adjusted by varying the turns in the winding [50]. The tapping factor can also contribute to the degradation of the transformer, as it involves moving parts that can wear out or become damaged over time. The following is a more detailed explanation of the tapping factor for the degradation of power transformers:

Mechanical Stress: The tapping mechanism involves moving parts that can experience mechanical stress, leading to wear and tear over time. The mechanical stress can cause misalignment or damage to the components, leading to reduced performance and potential failure.
Corrosion: Corrosion can occur on the tapping mechanism due to exposure to moisture, chemicals, or other environmental factors. The corrosion can cause the mechanism to become stuck or seize, leading to reduced performance or complete failure.
Electrical Stress: Electrical stress can also affect the tapping mechanism, such as voltage spikes or surges. The electrical stress can cause insulation breakdown or arcing, leading to reduced performance and potential failure.
Age: The tapping mechanism can experience physical and chemical changes over time, leading to reduced performance and eventual failure. The rate of aging depends on the materials used, the operating conditions, and the maintenance practices.

4. Preparation for the Model and Construction

The insulation system in transformers plays a vital role in safeguarding their consistent operation. However, insulation systems can degrade over time because of various factors, including mechanical, thermal, and electrical stress. Machine learning algorithms were used in this paper to analyze data collected from power transformers in order to predict their health status, detect anomalies, and provide early warning of potential failures [62,63]. This can help manufacturers or customers schedule maintenance or replacement activities, reduce downtime, and improve the electrical grid’s reliability. Numerous ML algorithms can be utilized to diagnose power transformers. Support Vector Machines (SVM) are supervised learning models that can be utilized for classification and regression problems [62]. Convolutional Neural Networks (CNNs) are a class of deep learning models commonly used in image and audio processing [54,62,63,64]. Recurrent Neural Networks (RNNs) are a type of deep learning model intended to process sequential data [64]. k-Nearest Neighbors (kNN) is an instance-based learning method that can be utilized for classification and regression problems. This paper collects data from a power transformer using DGA, and SVM and kNN were chosen as compatible machine learning models because of their capability in classification and regression problems. DGA is the technique used for diagnosing and detecting faults in power transformers. Power transformers are typically filled with oil, which serves as a coolant and an insulator. When defects occur in the transformer, such as insulation breakdown or overheating, gases are produced and dissolve in the oil.

4.1. The Proposed Structure

Data are collected and prepared from DGA results sampled from a power transformer. These data are used for profiling the high-voltage apparatus with machine learning, a branch of artificial intelligence whose algorithms need a large amount of data to learn from. The interpretation of DGA results can be complex and requires specialized knowledge and expertise [64,65,66,67,68]. The method provides limited information about the location and severity of faults, so additional diagnostic tests may be required to pinpoint the source of the problem [69]. SVM is a popular algorithm for classification tasks, while kNN is a simple and effective algorithm for classification and regression. However, both algorithms are sensitive to missing values, which can negatively affect the accuracy of the diagnosis. Therefore, missing value imputation is crucial for improving the performance of these algorithms. This study proposes a hybrid approach that integrates SVM and kNN algorithms with missing value imputation to diagnose PD in high-voltage apparatus utilizing DGA data. The proposed technique consists of the following steps. The first step pre-processes the DGA data to handle missing values: the kNN algorithm estimates each missing value from the values of its nearest neighbors. This step aims to minimize the effect of missing values on the following steps.

The next step is to select the most relevant features for PD diagnosis. The last step is training and testing: the SVM algorithm trains a classification model using the selected features and the imputed missing values. Further, the effectiveness of the model is assessed using a leave-one-out cross-validation method. Figure 1 illustrates the construction of the model for classifying faults in a power transformer. The input consists of the raw dataset extracted from DGA, which contains missing values. The kNN algorithm is therefore used to estimate the missing values, and the estimates are filled in to complete the dataset. The complete dataset is then used for learning and predicting transformer faults with SVM.

The DGA dataset is the main input used to predict faults in power transformers. Figure 2 illustrates the execution of the Support Vector Machine (SVM) for classifying faults found in a power transformer from the DGA datasets. The input to this stage is the complete dataset produced by the previous estimation stage.

4.2. Estimation of Missing Values Utilizing kNN

The kNN algorithm is an effective and straightforward imputation technique. To impute the missing values of a sample, it uses the k samples most similar to that sample. A distance metric is used to determine the similarity of two samples. This study employs two well-known distance metrics to assess sample similarity.

City Block Distance (CB): This metric determines the separation between two points by summing the absolute differences between their coordinates, as in the following equation:

d_{s t} = \sum_{j = 1}^{n} |x_{s j} - y_{t j}|

(1)

Euclidean Distance (EU): This is the most widely used method for determining the separation of two objects. It is the square root of the sum of the squared differences between the two objects’ coordinates and is described by the following equation:

d_{s t} = \sqrt{\sum_{j = 1}^{n} {(x_{s j} - y_{t j})}^{2}}

(2)
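As an illustration of Equations (1) and (2), the following minimal sketch computes both distances for two samples. The function names and the gas concentration values are hypothetical and are not taken from the PER or MAT datasets.

```python
import math

def city_block(x, y):
    """Equation (1): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Equation (2): square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical gas concentrations (ppm) for two oil samples.
sample_s = [10.0, 4.0, 7.0]
sample_t = [13.0, 8.0, 7.0]

cb = city_block(sample_s, sample_t)  # 3 + 4 + 0 = 7.0
eu = euclidean(sample_s, sample_t)   # sqrt(9 + 16 + 0) = 5.0
```

For the same pair of samples, CB is never smaller than EU, so the two metrics can rank neighbors differently when the differences are spread unevenly across features.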

In general, kNN imputation proceeds in a few steps. First, choose k, the number of nearest neighbors to be selected. Second, compute the distance between the sample containing the missing value, X_{i}, and another sample, X_{q}, using a distance metric, for example:

d(X_{i}, X_{q}) = \sum_{j = 1}^{m} |x_{i j} - x_{q j}|

(3)

where m is the number of features in X_{i} and X_{q}, x_{i j} is the j th feature of sample X_{i}, and x_{q j} is the j th feature of sample X_{q}. Third, repeat the second step for every remaining sample in the dataset to obtain its distance to X_{i}. Fourth, sort the computed distances in ascending order over all X_{q} other than X_{i}, and take the top k samples of the sorted list as the k nearest neighbors of X_{i}. Lastly, let x_{i j} be a missing value in X_{i}; its imputed value is computed from

x_{i j} = \frac{1}{k} \sum_{l = 1}^{k} x_{l j}

(4)

where x_{l j} is the j th feature of the l th nearest neighbor and k is the number of nearest neighbors.
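The imputation steps above can be sketched as follows. This is a minimal illustration and not the MATLAB implementation used in the study. Two points are assumptions, since the text does not specify them: the distance is computed only over features observed in both samples, and only samples in which the target feature is observed are considered as neighbors. Missing values are represented by `None`, and the data are hypothetical.

```python
def city_block(x, y):
    # Assumption: use only the features observed in both samples.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return sum(abs(a - b) for a, b in pairs)

def knn_impute(samples, k, distance=city_block):
    """Return a copy of samples with each None replaced by the mean of
    that feature over the k nearest neighbors (Equation (4))."""
    imputed = [list(row) for row in samples]
    for i, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other samples with feature j observed.
            candidates = [q for idx, q in enumerate(samples)
                          if idx != i and q[j] is not None]
            # Sort distances in ascending order and keep the k closest.
            candidates.sort(key=lambda q: distance(row, q))
            nearest = candidates[:k]
            if not nearest:
                continue  # nothing to impute from; leave the gap
            imputed[i][j] = sum(q[j] for q in nearest) / len(nearest)
    return imputed

# Hypothetical data: the first sample is missing its third feature.
data = [
    [1.0, 2.0, None],
    [1.0, 3.0, 10.0],
    [2.0, 2.0, 20.0],
    [9.0, 9.0, 90.0],
]
filled = knn_impute(data, k=2)  # filled[0][2] is the mean of 10.0 and 20.0
```

Passing a Euclidean distance function as `distance` would give the EU variant of the same procedure.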

4.3. SVM Classification

Normalization: DGA datasets commonly contain attributes whose values span very different ranges. To prevent attributes with larger magnitudes from dominating the others, the data are pre-processed with min-max normalization. All features are scaled to [0, 1] as presented in Equation (5):

v^{'} = \frac{v - \min_{A}}{\max_{A} - \min_{A}}

(5)

where v is the original value, \min_{A} and \max_{A} are the minimum and maximum values of attribute A, and v′ is the normalized value.
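A minimal sketch of the min-max scaling in Equation (5), applied to one attribute, is given below. The function name and the concentration values are hypothetical, and mapping a constant attribute to zero is an assumption, since Equation (5) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(column):
    """Scale the values of one attribute to [0, 1] (Equation (5))."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical H2 concentrations (ppm).
h2_ppm = [20.0, 50.0, 120.0]
scaled = min_max_normalize(h2_ppm)  # minimum maps to 0.0, maximum to 1.0
```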

Testing and Training Data: The actual DGA dataset is divided into two sections because SVM is a supervised learning algorithm: a training set determines the plane or boundary separating the data from different classes, and a testing set validates the systems’ classifying accuracy. Lastly, it should be noted that both datasets maintain the same sample distribution across classes as the original dataset.

Model Selection: Because categorizing faults from the DGA dataset may involve non-linear class boundaries, the study selects kernels that allow SVM to solve non-linear classification problems. They are the polynomial, radial basis function (RBF), and sigmoid kernels, defined as follows:

Polynomial: K( $x_{i}, x_{j}$ ) = $(a x_{i}^{T} x_{j} + r)^{d}$ ;
Radial basis function: K( $x_{i}, x_{j}$ ) = $e^{- γ ‖x_{i} - x_{j}‖^{2}}$ ;
Sigmoid: K( $x_{i}, x_{j}$ ) = tanh ( $a x_{i}^{T} x_{j} + r$ ).
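The three kernels can be written out directly, as in the sketch below. The parameter values (a, r, d, γ) are hypothetical defaults chosen for illustration and are not the MATLAB defaults used in the study. The RBF kernel here uses the squared Euclidean distance in the exponent, which is the conventional form.

```python
import math

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def polynomial_kernel(x, y, a=1.0, r=1.0, d=2):
    # (a * x^T y + r)^d
    return (a * dot(x, y) + r) ** d

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2)
    squared_distance = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, a=1.0, r=0.0):
    # tanh(a * x^T y + r)
    return math.tanh(a * dot(x, y) + r)

# Hypothetical normalized feature vectors.
x_i = [1.0, 2.0]
x_j = [3.0, 4.0]

k_poly = polynomial_kernel(x_i, x_j)  # (11 + 1)^2 = 144.0
k_rbf = rbf_kernel(x_i, x_j)          # exp(-0.5 * 8) = exp(-4)
k_sig = sigmoid_kernel(x_i, x_j)      # tanh(11)
```

The RBF kernel equals 1 for identical samples and decays toward 0 as they move apart, which is why the choice of γ strongly affects how local the resulting decision boundary is.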

1-vs-1 SVM: SVM was initially developed for binary classification. DGA, however, is an example of a dataset containing multi-class labels. To address this multi-class issue, several SVM extensions have been developed, including DAGSVM, one-versus-all, one-versus-one, and multi-level SVM. This study uses the one-versus-one method to identify the various fault types in DGA datasets, following the suggestions made by a few researchers [70] regarding its advantages. Based on the “divide and conquer” principle, the one-versus-one technique divides the multi-class fault problem into c(c-1)/2 binary SVM classifiers, where c is the number of fault types in the dataset used for the experiment. The classifiers are created by learning from the training set and are then used to classify the faults in the testing set.
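The one-versus-one decomposition can be illustrated with the short sketch below, which enumerates the c(c-1)/2 class pairs. The fault labels are hypothetical. Combining the pairwise decisions by majority vote is the usual convention for this scheme and is an assumption here, since the text does not state how the binary outputs are combined.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_pairs(classes):
    """One binary classifier is trained for each unordered class pair."""
    return list(combinations(classes, 2))

def majority_vote(pairwise_winners):
    """Assumed combination rule: the class winning most pairs is predicted."""
    return Counter(pairwise_winners).most_common(1)[0][0]

# Hypothetical fault labels, c = 4.
fault_types = ["PD", "thermal", "arcing", "normal"]
pairs = one_vs_one_pairs(fault_types)  # 4 * 3 / 2 = 6 pairs

# Hypothetical winners of the six pairwise classifiers for one sample.
winners = ["PD", "PD", "PD", "arcing", "thermal", "normal"]
predicted = majority_vote(winners)
```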

5. Experimental Design and Results

This section presents the results of the proposed approach for diagnosing PD in power transformers using DGA data. To evaluate its effectiveness, experiments were performed on a real-world dataset of DGA signals obtained from power transformers. The proposed approach is compared with the SVM algorithm without missing value imputation and the kNN algorithm with imputed missing values. Four case studies are presented to evaluate the proposed method’s effectiveness further. The first case study examines the distance metrics on the PER dataset, while the second examines the distance metrics on the MAT dataset. The third case study involves a new instance with missing values, while the fourth involves a dataset with artificially introduced missing values.

5.1. Dataset

The study uses a DGA dataset sampled from power transformers manufactured in South Africa. The data have limitations, such as the percentage of missing values, which are estimated before the faults are classified. The data are grouped into two sets: the first is named PER, and the second is named MAT.

Table 3 displays the features of the datasets. As illustrated in Table 4, each DGA data sample consists of the concentrations of various dissolved gases extracted from the oil together with the corresponding fault type; red zeros represent the missing values in Table 4.

5.2. Experimental Setup

The kNN technique for missing value estimation only requires tuning two parameters: k and the distance metric. Given the small size of both datasets in Table 3, this study utilizes k = {1, 3, 5, 7, 9}. The two distance metrics introduced in Section 4.2 are compared. As a result, each incomplete dataset is completed using each distance metric and kNN for the various k values. After the missing values in the DGA datasets were imputed, the SVM was trained, and the resulting model was used to predict the different fault types. Accuracy is the metric used to gauge SVM effectiveness and is defined as follows:

Accuracy = \frac{n_{c}}{n} \times 100\%

(6)

where n is the total number of instances in a test set, and n_{c} is the number of instances with correctly predicted class labels. As mentioned in Section 4.3, SVM with a few different kernels was used. To estimate how accurately a predictive model will perform in practice, the study used 5-fold cross-validation. The dataset is split into five disjoint sets (folds); one fold is used for evaluation, while the other four folds are used for training. This process is repeated five times, with a different fold held out for evaluation each time. The 5-fold cross-validation was run 100 times, and the reported accuracy is the mean over all runs to reduce variability.
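The accuracy measure of Equation (6) and the fold construction can be sketched as follows. This is a minimal illustration with hypothetical labels and function names. The split shown is a plain shuffled split and is not stratified by class, which is a simplification.

```python
import random

def accuracy(y_true, y_pred):
    """Equation (6): percentage of correctly predicted class labels."""
    n_c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return n_c / len(y_true) * 100.0

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each fold is held out once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out
                 for j in fold]
        yield train, test

# Hypothetical true and predicted labels: 3 of 4 are correct.
acc = accuracy(["PD", "normal", "PD", "normal"],
               ["PD", "normal", "normal", "normal"])

# Five (train, test) splits over ten samples.
splits = list(k_fold_indices(10, k=5))
```

Repeating the procedure with different seeds and averaging the resulting accuracies corresponds to the repeated cross-validation described above.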

The missing values were imputed, and the faults were classified using the commercial software package MATLAB R2022b [23]. Different parameters are needed for each of the three SVM kernels, and their values impact the system’s performance. The default parameter values offered by MATLAB R2022b were utilized, as optimizing these parameters was not the aim of this study. The efficacy of the proposed method for diagnosing power transformer faults was validated using “before-and-after” testing, in which the accuracy of each kernel trained on the imputed datasets was compared to that obtained from the original incomplete datasets. The missing values were changed to zeros to enable MATLAB R2022b to perform the classification tasks, since it is among the software programs that cannot handle datasets with missing values without deleting them. Note, however, that a zero does not itself represent a missing value. This makes it easier to compare datasets that have been imputed using the kNN imputation method with datasets that have been zero-filled. Table 5 presents the dataset without the missing values.

5.3. Case Study 1: PER_Dataset

The performance of the SVM prediction model was tested and compared on the PER dataset imputed with the k values and distance metrics introduced in Section 5.2. The performance of the kernels over the range of k values was evaluated with CB and EU as the distance metric. Figure 3 shows the performance of SVM: with CB, the best result was obtained at k = 5, and with EU at k = 3. Furthermore, the CB distance metric recorded the highest accuracy compared to the EU results.

5.4. Case Study 2: MAT_Dataset

A similar analysis was performed on the MAT dataset. Figure 4 illustrates that the SVM performed best at k = 9 with CB and at k = 5 with EU. Furthermore, the EU distance metric recorded the highest accuracy compared to the CB results.

5.5. Case Study Discussion 1

A new instance with missing values was used in the first case study. The sample had missing values for four features, namely H₂, CH₄, CO, etc. The kNN algorithm was utilized to fill in the missing values, and the SVM classifier was used for PD diagnosis. The new instance was classified as PD-positive with a probability of 0.71, indicating a high probability of PD occurrence.

5.6. Case Study Discussion 2

In the second case study, the effectiveness of the proposed technique was assessed using a dataset that contained artificially introduced missing values. These missing values were created by randomly removing values from the original dataset. The proposed method was then used to impute the missing values, and SVM trained on the imputed values was used to assess the classification performance. The outcomes in Table 6 demonstrate that the proposed method attained the highest accuracy, precision, F1 score, and AUC among the three approaches.
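One way to introduce missing values artificially is sketched below. The fraction removed, the uniform choice of cells, and the function name are assumptions for illustration; the text does not state how many values were removed or how they were selected.

```python
import random

def mask_at_random(samples, fraction, seed=0):
    """Return a copy of samples with a given fraction of cells set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(samples)
             for j in range(len(row))]
    n_missing = round(fraction * len(cells))
    masked = [list(row) for row in samples]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked

# Hypothetical complete dataset: 4 samples, 3 gas features each.
complete = [
    [10.0, 4.0, 7.0],
    [13.0, 8.0, 7.0],
    [11.0, 5.0, 6.0],
    [40.0, 9.0, 2.0],
]
incomplete = mask_at_random(complete, fraction=0.25)  # 3 of 12 cells removed
```

Because the original values are retained in `complete`, the imputed values can later be compared against them directly.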

The performance improvement can be attributed to the effective handling of missing values by the kNN algorithm. Table 7 presents the comparison results for machine learning methods, in which SVM is the most effective method for diagnosing partial discharge in power transformers. The presence of missing values may slightly affect the overall performance of the methods, but SVM remains superior in either case.

5.7. Analysis

This section analyzes the results of the proposed approach for diagnosing PD using DGA data extracted from power transformers. The research compared the proposed method with the SVM algorithm without missing value imputation and the kNN algorithm with imputed missing values. The experiments were conducted on a real-world dataset of DGA signals obtained from power transformers. The best value of k, the number of nearest neighbors, depends on the particular dataset and the selected distance metric. When a dataset, like the MAT dataset, contains a significant quantity of missing values, the effectiveness of the kernel is improved by using larger values of k. For both datasets, the performance of two kernels is better for datasets imputed using EU than CB. The small dataset PER performs better with CB than with EU. For the large dataset MAT, the reverse is true. Comparing learning from datasets with missing values to learning from imputed datasets, the performance of the SVM kernels improves when the missing values in a DGA dataset are imputed. Figure 5 shows the performance of the SVM on the PER and MAT datasets before imputing the missing values.

The kernels’ performance is improved when missing values are imputed using kNN for DGA datasets with many missing values, such as the MAT dataset. EU is the ideal distance metric, and using both models together results in the highest DGA accuracy for datasets of various sizes and missing value percentages. Table 8 compares the efficiency of the proposed method with and without kNN imputation.

6. Conclusions

This paper proposed an approach for integrating SVM with kNN for handling missing values in diagnosing partial discharge (PD) in power transformers utilizing DGA data. The proposed method was evaluated using a publicly available dataset and equated to existing approaches. The results showed that the proposed approach achieved 75 percent high accuracy, precision, F1 score, and AUC compared to existing methods, demonstrating the effectiveness of the approach. Integrating kNN for missing value imputation and SVM for classification provided a practical approach for diagnosing PD in power transformers. The kNN algorithm effectively imputed missing values, and the SVM classifier effectively identified PD cases. The proposed method can potentially advance the diagnosis of PD in power transformers, which is crucial for improving the reliability and safety of power systems.

Furthermore, case studies were presented to evaluate the proposed approach’s effectiveness. The first and second case studies involved distance metrics, the third involved a new instance with missing values, and the fourth involved a dataset with artificially introduced missing values. The results of the case studies showed that the proposed approach effectively handled missing values and achieved high classification performance. Lastly, the proposed method of integrating SVM with kNN for missing value imputation and PD diagnosis using DGA data provides a promising approach for improving the analysis of PD in power transformers. The method can potentially improve the reliability and safety of power systems by accurately diagnosing PD cases. Future studies may explore the effectiveness of the proposed approach on larger datasets and with other machine learning algorithms.

Author Contributions

Conceptualization, P.M.S. and P.B.; methodology, P.M.S.; software, P.M.S.; validation, P.B.; formal analysis, P.M.S.; investigation, P.B.; resources, P.B.; data curation, P.M.S.; writing—original draft preparation, P.M.S.; writing—review and editing, P.M.S.; visualization, P.M.S.; supervision, P.B.; project administration, P.B.; funding acquisition, P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Johannesburg and received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bartley, W. Analysis of Transformer Failures. In Proceedings of the 67th Annual International Conference of Doble Clients, Paper 8N, Watertown, MA, USA, 27–31 March 2000.
Braunlich, R.; Hassig, M.; Fuhr, J.; Aschwanden, T. Assessment of insulation condition of large power transformers by on-site electrical diagnostic methods. In Proceedings of the Conference Record of the 2000 IEEE International Symposium on Electrical Insulation, Anaheim, CA, USA, 2–5 April 2000; pp. 368–372.
Wang, X.; Li, B.; Roman, H.; Russo, O.; Chin, K.; Farmer, K. Acousto-optical PD Detection for Transformers. IEEE Trans. Power Deliv. 2006, 21, 1068–1073.
Sharkawy, R.M.; Ibrahim, K.; Salama, M.M.A.; Bartnikas, R. Particle swarm optimization feature selection for the classification of conducting particles in transformer oil. IEEE Trans. Dielectr. Electr. Insul. 2011, 18, 1897–1907.
Rubio-Serrano, J.; Rojas-Moreno, M.V.; Posada, J.; Martínez-Tarifa, J.; Robles, J.; García-Souto, J. Electro-acoustic detection, identification and location of partial discharge sources in oil-paper insulation systems. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1569–1578.
Coenen, S.; Tenbohlen, S. Location of PD sources in power transformers by UHF and acoustic measurements. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1934–1940.
Li, J.; Jiang, T.; Wang, C.; Cheng, C. Optimization of UHF Hilbert Antenna for Partial Discharge Detection of Transformers. IEEE Trans. Antennas Propag. 2012, 60, 2536–2540.
Hooshmand, R.A.; Parastegari, M.; Yazdanpanah, M. Simultaneous location of two partial discharge sources in power transformers based on acoustic emission using the modified binary partial swarm optimization algorithm. IET Sci. Meas. Technol. 2012, 7, 119–127.
Zheng, S.; Li, C.; Tang, Z.; Chang, W.; He, M. Location of PDs inside transformer windings using UHF methods. IEEE Trans. Dielectr. Electr. Insul. 2014, 21, 386–393.
Sinaga, H.H.; Phung, B.; Blackburn, T.R. Recognition of single and multiple partial discharge sources in transformers based on ultra-high frequency signals. IET Gener. Transm. Distrib. 2014, 8, 160–169.
Cui, L.; Chen, W.; Xie, B.; Du, J.; Long, Z.; Li, Y. Characteristic information extraction and developing process recognizing method of surface discharge in oil immersed paper insulation. In Proceedings of the 2014 International Conference on High Voltage Engineering and Application (ICHVE), Poznan, Poland, 8–11 September 2014; pp. 3–6.
Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, University of Waikato, Waikato, New Zealand, 1999.
Duval, M.; Depabla, A. Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases. IEEE Electr. Insul. Mag. 2001, 17, 31–41.
Acuña, E.; Rodriguez, C. The Treatment of Missing Values and Its Effect on Classifier Accuracy. In Classification, Clustering, and Data Mining Applications; Banks, D., McMorris, F.R., Arabie, P., Gaul, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 639–647.
Liu, P.; Lei, L.; Wu, N. A Quantitative Study of the Effect of Missing Data in Classifiers. In Proceedings of the Fifth International Conference on Computer and Information Technology (CIT’05), Shanghai, China, 21–23 September 2005; pp. 28–33.
Ma, H.; Saha, T.K.; Ekanayake, C. Machine learning techniques for power transformer insulation diagnosis. In AUPEC 2011; IEEE: Brisbane, QLD, Australia, 2011; pp. 1–6.
Guo, C.; Dong, M.; Wu, Z. Fault Diagnosis of Power Transformers Based on Comprehensive Machine Learning of Dissolved Gas Analysis. In Proceedings of the 2019 IEEE 20th International Conference on Dielectric Liquids (ICDL), Roma, Italy, 23–27 June 2019; pp. 1–4.
Liang, Y.; Sun, X.; Liu, Q.; Bian, J.; Li, Y. Fault Diagnosis Model of Power Transformer Based on Combinatorial KFDA. In Proceedings of the 2008 International Conference on Condition Monitoring and Diagnosis, Beijing, China, 21–24 April 2008; pp. 956–959.
Rajesh, K.N.V.P.S.; Rao, U.M.; Fofana, I.; Rozga, P.; Paramane, A. Influence of Data Balancing on Transformer DGA Fault Classification with Machine Learning Algorithms. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 385–392.
Mehta, A.K.; Sharma, R.N.; Chauhan, S.; Saho, S. Transformer diagnostics under dissolved gas analysis using Support Vector Machine. In Proceedings of the 2013 International Conference on Power, Energy and Control (ICPEC), Dindigul, India, 6–8 February 2013; pp. 181–186.
Illias, H.A.; Choon, C.K.; Liang, W.Z.; Mokhlis, H.; Ariffin, A.M.; Yousof, M.F.M. Fault Identification in Power Transformers Using Dissolve Gas Analysis and Support Vector Machine. In Proceedings of the 2021 IEEE International Conference on the Properties and Applications of Dielectric Materials (ICPADM), Johor Bahru, Malaysia, 12–14 July 2021; pp. 33–36.
Raghuraman, R.; Darvishi, A. Detecting Transformer Fault Types from Dissolved Gas Analysis Data Using Machine Learning Techniques. In Proceedings of the 2022 IEEE 15th Dallas Circuit and System Conference (DCAS), Dallas, TX, USA, 17–19 June 2022; pp. 1–5.
Patil, S.D.; Dharme, M.; Patil, A.J.; Gautam Chakrawarthy, A.K.; Jarial, R.K.; Singh, A. DGA Based Ensemble learning and Random Forest Models for Condition Assessment of Transformers. In Proceedings of the 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 23–25 December 2022; pp. 1–6.
Rao, U.M.; Fofana, I.; Rajesh, K.N.V.P.S.; Picher, P. Identification and Application of Machine Learning Algorithms for Transformer Dissolved Gas Analysis. IEEE Trans. Dielectr. Electr. Insul. 2021, 28, 1828–1835.
Laayati, O.; Bouzi, M.; Chebak, A. Design of an oil immersed power transformer monitoring and self diagnostic system integrated in Smart Energy Management System. In Proceedings of the 2021 3rd Global Power, Energy and Communication Conference (GPECOM), Antalya, Turkey, 5–8 October 2021; pp. 240–245.
Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Sensing Incipient Faults in Power Transformers Using Bi-Directional Long Short-Term Memory Network. IEEE Sens. Lett. 2023, 7, 7000304.
Souza, F.R.; Ramachandran, B. Dissolved gas analysis to identify faults and improve reliability in transformers using support vector machines. In Proceedings of the 2016 Clemson University Power Systems Conference (PSC), Clemson, SC, USA, 8–11 March 2016; pp. 1–4.
Saha, C.; Baruah, N.; Nayak, S.K. Implementation of Self-Organizing Map and Logistic Regression in Dissolved Gas Analysis of Transformer oils. In Proceedings of the 2021 IEEE International Conference on the Properties and Applications of Dielectric Materials (ICPADM), Johor Bahru, Malaysia, 12–14 July 2021; pp. 131–134.
Rediansyah, D.; Prasojo, R.A.; Suwarno; Abu-Siada, A. Artificial Intelligence-Based Power Transformer Health Index for Handling Data Uncertainty. IEEE Access 2021, 9, 150637–150648.
Zhang, Z.; Huang, W.-H.; Xiao, D.-M.; Liu, Y.-L. Fault detection of power transformers using genetic programming method. In Proceedings of the 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), Shanghai, China, 26–29 August 2004; Volume 5, pp. 3018–3022.
Kumar, A.H.; Thind, B.S.; Reddy, C.C. Improving Reliability of Transformers based on DGA Analysis using Machine Learning Techniques. In Proceedings of the 2021 IEEE Conference on Electrical Insulation and Dielectric Phenomena (CEIDP), Vancouver, BC, Canada, 12–15 December 2021; pp. 151–154.
Ekanayake, C.; Ma, H.; Saha, T. Past experience and future developments toward the safeguarding of power transformers. In Proceedings of the International Conference on Electrical & Computer Engineering (ICECE 2010), Dhaka, Bangladesh, 18–20 December 2010; pp. 279–282.
Lopes, S.M.d.A.; Flauzino, R.A. A Novel Approach for Incipient Fault Diagnosis in Power Transformers by Artificial Neural Networks. In Proceedings of the 2021 IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), Espoo, Finland, 18–21 October 2021; pp. 1–5.
Tra, V.; Duong, B.-P.; Kim, J.-M. Improving diagnostic performance of a power transformer using an adaptive over-sampling method for imbalanced data. IEEE Trans. Dielectr. Electr. Insul. 2019, 26, 1325–1333.
Laayati, O.; El Hadraoui, H.; Bouzi, M.; Chebak, A. Smart Energy Management System: Oil Immersed Power Transformer Failure Prediction and Classification Techniques Based on DGA Data. In Proceedings of the 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Meknes, Morocco, 3–4 March 2022; pp. 1–6.
Boczar, T.; Cichon, A.; Borucki, S. Diagnostic expert system of transformer insulation systems using the acoustic emission method. IEEE Trans. Dielectr. Electr. Insul. 2014, 21, 854–865.
Bua-Nunez, I.; Posada-Roman, J.E.; Rubio-Serrano, J.; Garcia-Souto, J.A. Instrumentation System for Location of Partial Discharges Using Acoustic Detection with Piezoelectric Transducers and Optical Fiber Sensors. IEEE Trans. Instrum. Meas. 2014, 63, 1002–1013.
Chen, M.-K.; Chen, J.-M.; Cheng, C.-Y. Partial discharge detection by RF coil in 161 kV power transformer. IEEE Trans. Dielectr. Electr. Insul. 2014, 21, 1405–1414.
Harbaji, M.; Shaban, K.; El-Hag, A. Classification of common partial discharge types in oil-paper insulation system using acoustic signals. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1674–1683.
Mirzaei, H.R.; Akbari, A.; Gockenbach, E.; Miralikhani, K. Advancing new techniques for UHF PD detection and localization in the power transformers in the factory tests. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 448–455.
Sarkar, B.; Koley, C.; Roy, N.; Kumbhakar, P. Condition monitoring of high voltage transformers using Fiber Bragg Grating Sensor. Meas. J. Int. Meas. Confed. 2015, 74, 255–267.
Qi, Z.; Yi, Y.; Qiaohua, W.; Zhihao, W.; Zhe, L. Study on the Online Dissolved Gas Analysis Monitor based on the Photoacoustic Spectroscopy. In Proceedings of the 2012 IEEE International Conference on Condition Monitoring and Diagnosis (CMD), Bali, Indonesia, 23–27 September 2012; pp. 433–436.
Seo, J.; Ma, H.; Saha, T. Probabilistic wavelet transform for partial discharge measurement of transformer. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1105–1117.
Mitchell, S.D.; Siegel, M.; Beltle, M.; Tenbohlen, S. Discrimination of partial discharge sources in the UHF domain. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1068–1075.
Rostaminia, R.; Saniei, M.; Vakilian, M.; Mortazavi, S.S.; Parvin, V. Accurate power transformer PD pattern recognition via its model. IET Sci. Meas. Technol. 2016, 10, 745–753.
Jahangir, H.; Akbari, A.; Werle, P.; Szczechowski, J. Possibility of PD calibration on power transformers using UHF probes. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2968–2976.
Ghosh, R.; Chatterjee, B.; Dalai, S. A method for the localization of partial discharge sources using partial discharge pulse information from acoustic emissions. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 237–245.
Du, J.; Chen, W.; Cui, L.; Zhang, Z.; Tenbohlen, S. Investigation on the propagation characteristics of PD-induced electromagnetic waves in an actual 110 kV power transformer and its simulation results. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 1941–1948.
Wang, Y.-B.; Fan, Y.-H.; Qin, S.-R.; Chang, D.-G.; Shao, X.-J.; Mu, H.-B.; Zhang, G.-J. Partial discharge localisation methodology for power transformers based on improved acoustic propagation route search algorithm. IET Sci. Meas. Technol. 2018, 12, 1023–1030.
Fabian, J.; Neuwersch, M.; Sumereder, C.; Muhr, M.; Schwarz, R. State of the Art and Future Trends of Unconventional PD-Measurement at Power Transformers. J. Energy Power Eng. 2014, 8, 1093–1098.
Sinaga, H.H.; Phung, B.T.; Blackburn, T.R. Partial discharge localization in transformers using UHF detection method. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1891–1900.
Jahangir, H.; Akbari, A.; Werle, P.; Szczechowski, J. UHF PD measurements on power transformers-advantages and limitations. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 3933–3940.
Ariannik, M.; Azirani, M.A.; Werle, P.; Azirani, A.A. UHF Measurement in Power Transformers: An Algorithm to Optimize Accuracy of Arrival Time Detection and PD Localization. IEEE Trans. Power Deliv. 2019, 34, 1530–1539.
Azadifar, M.; Karami, H.; Wang, Z.; Rubinstein, M.; Rachidi, F.; Karami, H.; Ghasemi, A.; Gharehpetian, G.B. Partial Discharge Localization Using Electromagnetic Time Reversal: A Performance Analysis. IEEE Access 2020, 8, 147507–147515.
Wang, Y.B.; Chang, D.G.; Fan, Y.H.; Zhang, G.J.; Zhan, J.Y.; Shao, X.J.; He, W.L. Acoustic localization of partial discharge sources in power transformers using a particle-swarm-optimization-route-searching algorithm. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 3647–3656.
Qian, S.; Chen, H.; Xu, Y.; Su, L. High sensitivity detection of partial discharge acoustic emission within power transformer by sagnac fiber optic sensor. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 2313–2320.
Gao, C.; Wang, W.; Song, S.; Wang, S.; Yu, L.; Wang, Y. Localization of partial discharge in transformer oil using Fabry-Pérot optical fiber sensor array. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 2279–2286.
Si, W.; Fu, C.; Yuan, P. An Integrated Sensor with AE and UHF Methods for Partial Discharges Detection in Transformers Based on Oil Valve. IEEE Sens. Lett. 2019, 3, 2019–2021.
Ansari, M.A.; Martin, D.; Saha, T.K. Investigation of Distributed Moisture and Temperature Measurements in Transformers Using Fiber Optics Sensors. IEEE Trans. Power Deliv. 2019, 34, 1776–1784.
Zhu, M.-X.; Wang, Y.-B.; Chang, D.-G.; Zhang, G.-J.; Shao, X.-J.; Zhan, J.-Y.; Chen, J.-M. Discrimination of three or more partial discharge sources by multi-step clustering of cumulative energy features. IET Sci. Meas. Technol. 2019, 13, 149–159.
Yaacob, M.M.; Alsaedi, M.A.; Rashed, J.R.; Dakhil, A.M.; Atyah, S.F. Review on partial discharge detection techniques related to high voltage power equipment using different sensors. Photonic Sens. 2014, 4, 325–337.
Karami, H.; Rachidi, F.; Azadifar, M.; Rubinstein, M. An Acoustic Time Reversal Technique to Locate a Partial Discharge Source: Two-Dimensional Numerical Validation. IEEE Trans. Dielectr. Electr. Insul. 2020, 27, 2203–2205.
Karami, H.; Azadifar, M.; Mostajabi, A.; Rubinstein, M.; Karami, H.; Gharehpetian, G.B.; Rachidi, F. Partial Discharge Localization Using Time Reversal: Application to Power Transformers. Sensors 2020, 20, 1419.
Karami, H.; Azadifar, M.; Mostajabi, A.; Favrat, P.; Rubinstein, M.; Rachidi, F. Localization of Electromagnetic Interference Sources Using a Time-Reversal Cavity. IEEE Trans. Ind. Electron. 2021, 68, 654–662.
Karami, H.; Aviolat, F.Q.; Azadifar, M.; Rubinstein, M.; Rachidi, F. Partial discharge localization in power transformers using acoustic time reversal. Electr. Power Syst. Res. 2022, 206, 107801.
Do, T.-D.; Tuyet-Doan, V.-N.; Cho, Y.-S.; Sun, J.-H.; Kim, Y.-H. Convolutional-Neural-Network-Based Partial Discharge Diagnosis for Power Transformer Using UHF Sensor. IEEE Access 2020, 8, 207377–207388.
Saleh, B.; Yousef, A.M.; Abo-Elyousr, F.K.; Mohamed, M.; Abdelwahab, S.A.M.; Elnozahy, A. Performance Analysis of Maximum Power Point Tracking for Two Techniques with Direct Control of Photovoltaic Grid-Connected Systems. Energy Sources Part A Recover. Util. Environ. Eff. 2021, 44, 413–434.
Eid, M.A.E.; Elbaset, A.A.; Ibrahim, H.A.; Abdelwahab, S.A.M. Modelling, Simulation of MPPT Using Perturb and Observe and Incremental Conductance techniques For Stand-Alone PV Systems. In Proceedings of the 2019 21st International Middle East Power Systems Conference (MEPCON), Cairo, Egypt, 17–19 December 2019; pp. 429–434.
Eid, M.A.E.; Abdelwahab, S.A.M.; Ibrahim, H.A.; Alaboudy, A.H.K. Improving the Resiliency of a PV Standalone System Under Variable Solar Radiation and Load Profile. In Proceedings of the 2018 Twentieth International Middle East Power Systems Conference (MEPCON), Cairo, Egypt, 18–20 December 2018; pp. 570–576.
Dessouky, S.S.; Abdellatif, W.S.E.; Abdelwahab, S.A.M.; Ali, M.A. Maximum Power Point Tracking Achieved of DFIG-Based Wind Turbines Using Perturb and Observant Method. In Proceedings of the 2018 Twentieth International Middle East Power Systems Conference (MEPCON), Cairo, Egypt, 18–20 December 2018; pp. 1121–1125.

Figure 1. The proposed process for fault classification.

Figure 2. Support vector machine flowchart.

Figure 3. SVM performance on the PER dataset using both CB and EU.

Figure 4. SVM performance on the MAT dataset using both CB and EU.

Figure 5. SVM performance on the PER and MAT datasets before imputing the missing values.

Table 1. Benefits and drawbacks of various partial discharge detection methods in power transformers.

Technique	Advantages	Drawbacks
UHF technique [14,15,16]	Effective for online sensing applications with increased resistance to external noise. High sensitivity. Dependable and safe against induced current.	Calibration problem. Expensive. Unable to provide PD charge quantity.
Optical technique [17,18,19]	A wide range of physical and chemical parameters can be employed. Small size and lightweight. Extreme sensitivity and electromagnetic interference immunity.	For both solid and liquid insulation, no detection is practical. It is not possible to calibrate.
Acoustic technique [20]	Real-time results that are convincing. Device noise immunity for online PD detection. It is possible to localize.	Vulnerable to environmental noise. Low sensitivity.
Chemical technique [21]	Convincing laboratory recording of PD signals. Extreme sensitivity.	There is no connection between the dissolved gas concentration and the fault type. There is also no connection between the concentration of glucose and the severity of the dielectric breakdown.

Table 2. Conventional machine learning-based PD diagnosis approaches.

Ref.	Objective	Technique	Methodology	Performance
[29]	Classifier	UHF	BPNN + FFT	Up to 97%
[30]	Classifier	UHF	Statistical features + OS-ELM	91.5%
[31]	Detection	DGA	MLP + data augmentation + Non-code ratio	85% to 96% overall
[31]	Classifier	PRPD	SVM + LBP&HOG	99.3%
[32]	Classifier	acoustic	k-NN/SVM + DWT	SVM performs better
[33]	Classifier	UHF	Statistical features + SVM	99.14%
[34]	Classifier	HFCT	SVM + PSD-based features	Closer to perfect
[35]	Classifier	PRPD	SVM + PCA + Statistical features	82%
[36]	Classifier	UHF	RVM + EEMD-sample entropy	99%
[37]	Classifier	PRPD	FkNNC/BPNN/SVM + 2D PCA	96/94/98%
[38]	Classifier	acoustic	k-NN + PCA	94% (similar conditions) 90% (various conditions)
[39]	Classifier	acoustic	SVM + PSD	When there is variation in the dataset, SVM performs the worst.
[40,41,42]	Diagnosis	DGA	MLP-gas concentration	Not stated
[43,44,45]	Detection	DGA	DST + ANFIS	77% for PD detection
[46]	Classifier	PRPD	RF + Statistical features	98%
[47]	Classifier	PRPD	RF + Statistical features	94%
[48]	Classifier	impedance	LSSVM	70–74%
[49]	Detection	HFCT	ANN + Statistical features	67%
[50]	Separation	acoustic	BSS	Successful under experimental conditions
[51]	Detection	DGA	Duval’s gas values + Gaussian BN	96% for PD detection
[52]	Detection	impedance	Statistical features + KPLS	88%

Table 3. Features of the DGA datasets.

	PER	MAT
Number of samples	101	315
Number of dissolved gases	8	10
Number of fault types	7	7
Cases with missing values (%)	30	57
Missing values (%)	6	20

Table 4. Dataset with missing values.

CO	H₂	CO₂	C₂H₄	CH₄	C₂H₆	C₂H₂	Fault
190	11	2065	14	0	15	9	PD Or Arc
178	278	3040	1234	683	151	19
230	429	4071	1640	965	230	31
257	520	4159	1705	1037	233	25
13	0	244	0	4	0	0
35	3	541	18	11	1	1
52	19	781	60	32	5	2
70	0	1111	137	63	14	3
75	25	1233	90	46	8	3
96	48	1661	149	77	15	3
0	43	0	218	101	21	3
160	76	2661	299	0	31	0
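The two missing-value statistics reported in Table 3 (share of cases with at least one missing value, and share of missing cells) can be computed for any excerpt of the data. The sketch below does so for the twelve rows of Table 4. It assumes that a 0 marks a missing reading, an assumption based on the fact that every zero cell in Table 4 holds a non-zero value in Table 5, and it takes the third column to be CO₂. The variable names are illustrative.

```python
# Rows of Table 4 in column order CO, H2, CO2, C2H4, CH4, C2H6, C2H2.
# Assumption: a 0 marks a missing reading (those cells are non-zero in Table 5).
table4 = [
    [190, 11, 2065, 14, 0, 15, 9],
    [178, 278, 3040, 1234, 683, 151, 19],
    [230, 429, 4071, 1640, 965, 230, 31],
    [257, 520, 4159, 1705, 1037, 233, 25],
    [13, 0, 244, 0, 4, 0, 0],
    [35, 3, 541, 18, 11, 1, 1],
    [52, 19, 781, 60, 32, 5, 2],
    [70, 0, 1111, 137, 63, 14, 3],
    [75, 25, 1233, 90, 46, 8, 3],
    [96, 48, 1661, 149, 77, 15, 3],
    [0, 43, 0, 218, 101, 21, 3],
    [160, 76, 2661, 299, 0, 31, 0],
]

total_cells = sum(len(row) for row in table4)
missing_cells = sum(value == 0 for row in table4 for value in row)
rows_with_missing = sum(any(value == 0 for value in row) for row in table4)

# Percentage of missing cells and of cases with at least one missing cell.
cell_pct = 100 * missing_cells / total_cells
case_pct = 100 * rows_with_missing / len(table4)

print(f"missing cells: {missing_cells}/{total_cells} ({cell_pct:.1f}%)")
print(f"cases with missing values: {rows_with_missing}/{len(table4)} ({case_pct:.1f}%)")
```

These figures describe only the twelve-row excerpt shown here, so they are not expected to match the dataset-level percentages in Table 3.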

Table 5. Dataset without missing values.

H₂	CO	CO₂	CH₄	C₂H₆	C₂H₄	C₂H₂	Fault
11	190	2065	20	15	14	9	PD Or Arc
278	178	3040	683	151	1234	19
429	230	4071	965	230	1640	31
520	257	4159	1037	233	1705	25
76	13	244	4	16	115	10
3	35	541	11	1	18	1
19	52	781	32	5	60	2
150	70	1111	63	14	137	3
25	75	1233	46	8	90	3
48	96	1661	77	15	149	3
43	117	1749	101	21	218	3
76	160	2661	176	31	299	5

Table 6. Performance metrics for the dataset with artificially introduced missing values.

Approach	Accuracy	Precision	Recall	F1 Score	AUC
SVM without imputation	0.6024	0.5964	0.5985	0.5975	0.6787
kNN with imputation	0.7012	0.6983	0.7037	0.7010	0.7731
Proposed Approach	0.7547	0.7552	0.7519	0.7532	0.7115
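For reference, the threshold-based metrics in Table 6 can be derived from the confusion matrix of a binary PD versus non-PD classifier, as in the following sketch. The counts are hypothetical and are not taken from the paper. The averaging scheme behind the reported values is not stated, so this is only the standard single-class definition; AUC is omitted because it requires ranked classifier scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts, with PD treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 test samples (not from the paper).
metrics = binary_metrics(tp=30, fp=10, fn=10, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```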

Table 7. The comparison results for machine learning methods.

Method	Accuracy: without k-NN Imputation	Accuracy: with k-NN Imputation
Support Vector Machines (SVM)	0.60	0.75
Random Forest	0.46	0.57
Convolutional Neural Networks (CNN)	0.53	0.60
Decision Trees	0.48	0.63
Artificial Neural Networks (ANNs)	0.50	0.60
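The effect of imputation in Table 7 is easier to compare as an absolute accuracy gain per method. The short sketch below computes that gain from the tabulated values. The dictionary layout and the shortened method names are illustrative choices.

```python
# Accuracy from Table 7: (without k-NN imputation, with k-NN imputation).
table7 = {
    "SVM": (0.60, 0.75),
    "Random Forest": (0.46, 0.57),
    "CNN": (0.53, 0.60),
    "Decision Trees": (0.48, 0.63),
    "ANN": (0.50, 0.60),
}

# Absolute gain in accuracy, rounded to the precision of the table.
gains = {name: round(with_imp - without, 2)
         for name, (without, with_imp) in table7.items()}

# Method with the highest accuracy after imputation.
best = max(table7, key=lambda name: table7[name][1])

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
print("highest accuracy with imputation:", best)
```

On these values every method gains from imputation, with SVM and decision trees gaining the most (0.15 each) and SVM reaching the highest accuracy after imputation.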

Table 8. Performance comparison of the proposed approach with and without kNN imputation.

	With kNN Imputation	Without kNN Imputation
Accuracy	73.81%	53.17%
Precision	75.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

