Article

EEkNN: k-Nearest Neighbor Classifier with an Evidential Editing Procedure for Training Samples †

School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the 13th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty.
Electronics 2019, 8(5), 592; https://doi.org/10.3390/electronics8050592
Submission received: 23 April 2019 / Revised: 17 May 2019 / Accepted: 22 May 2019 / Published: 27 May 2019
(This article belongs to the Special Issue Fuzzy Systems and Data Mining)

Abstract

The k-nearest neighbor (kNN) rule is one of the most popular classification algorithms applied in many fields because it is very simple to understand and easy to design. However, one of the major problems encountered in using the kNN rule is that all of the training samples are considered equally important in the assignment of the class label to the query pattern. In this paper, an evidential editing version of the kNN rule is developed within the framework of belief function theory. The proposal is composed of two procedures. An evidential editing procedure is first proposed to reassign the original training samples with new labels represented by an evidential membership structure, which provides a general representation model regarding the class membership of the training samples. After editing, a classification procedure specifically designed for evidently edited training samples is developed in the belief function framework to handle the more general situation in which the edited training samples are assigned dependent evidential labels. Three synthetic datasets and six real datasets collected from various fields were used to evaluate the performance of the proposed method. The reported results show that the proposal achieves better performance than other considered kNN-based methods, especially for datasets with high imprecision ratios.

1. Introduction

Classification of patterns is an important area of research and practical application in a variety of fields, including biology [1], psychology [2], medicine [3], electronics [4], marketing [5], and military affairs [6]. In the past several decades, a wide variety of approaches have been developed for this task [7]. As a type of lazy learning algorithm, the k-nearest neighbor (kNN) rule introduced by Fix and Hodges [8] has been one of the most popular and successful pattern classification techniques due to its simplicity and effectiveness. The basic idea of the kNN rule is that patterns close in feature space are likely to belong to the same class. Though the kNN rule is suboptimal, it has been shown that, as k increases, its error rate approaches the optimal Bayes error rate asymptotically in the infinite-sample situation [9].
However, in practical cases with a finite number of samples, the classical kNN rule is not always the optimal way of utilizing the information contained in the neighborhood of query patterns, and therefore a large body of research over the past 60 years has focused on improving this rule [10,11,12,13,14,15]. One of the major concerns when using the kNN rule is that all of the training samples are considered equally important for assigning the class label of the query pattern. This limitation causes great difficulty for classification in regions where the samples from different classes overlap. Atypical samples in overlapping regions may be assigned as much weight as those that are truly representative of the clusters. Furthermore, it may be argued that heavily noisy training samples should not be given equal weight. To overcome this difficulty, many editing procedures have been proposed to preprocess the original training samples and then perform classification based on the edited training set [16,17,18,19,20,21,22,23,24,25,26,27,28,29].
Based on the structure of the edited labels, editing procedures can be divided into two groups: crisp editing and soft editing. The editing procedure was first developed by Wilson [17] to preprocess the training samples. In this procedure, a training sample x_i is classified using the kNN rule with the remainder of the training set and is then deleted from the original training set if its original label does not agree with the classification result. Many others followed Wilson's work and proposed variants [18,19,20,21,22]. One representative is the generalized editing procedure developed by Koplowitz and Brown [19], which aims to overcome the limitation of large numbers of samples being removed from the training set. In their work, instead of deleting all the conflicting samples as in Wilson's work, if a particular class (excluding the original class) has at least k′ representatives among the k nearest neighbors, with (k + 1)/2 ≤ k′ ≤ k, then x_i is relabeled with that majority class. Essentially, both Wilson's editing and its variants are crisp editing procedures, in which each edited sample is either removed or assigned to a single class. To overcome the weakness of the crisp editing methods, a fuzzy editing procedure was then proposed to reassign a fuzzy membership to each training sample x_i based on its k nearest neighbors [25]. Several different realizations of this fuzzy editing procedure have also been developed [26,27,28]. As a type of soft editing procedure, fuzzy editing makes it possible for each edited sample to be assigned to several classes with different fuzzy memberships, which provides more detailed information about the samples' membership than the crisp editing procedures.
In real-world classification problems, different types of uncertainty may coexist due to the environment or other interfering factors; e.g., fuzziness may coexist with imprecision. The fuzzy editing procedure, developed based on fuzzy set theory [30], cannot address imprecise or partial information effectively in the modeling and reasoning processes. In contrast, the belief function theory [31,32,33], also known as Dempster–Shafer theory or evidence theory, offers a well-founded and effective framework to represent and combine a variety of uncertain information. This theory has already been used in kNN-based classification [34,35,36,37,38,39]. In [34], an evidential version of the kNN rule, called EkNN, was proposed, introducing the ignorance class to model uncertainty. This classification method was further extended in [37] to deal with uncertainty using a rejection class and meta-classes. In [38], Dempster's rule of combination used in EkNN was replaced by a class of parametric combination rules. However, neither the EkNN method nor its extensions consider any editing procedure in the classification process. Recently, an editing procedure for multi-label classification was developed in [29] based on the belief function theory, but it is essentially a crisp editing procedure, as each edited sample is assigned just a single set of classes without considering membership degrees.
In this paper, an evidential editing version of the kNN classifier (EEkNN) is proposed based on the belief function theory. (A preliminary version of some of the ideas introduced here was presented in [40,41]; the present paper is a thoroughly revised and extended version of this work, with several new results.) The proposed EEkNN classifier is composed of two procedures: evidential editing for the original training samples and classification based on the evidently edited training samples. First, an evidential editing procedure is developed to reassign the original training samples with new labels represented by an evidential membership structure. Compared with the crisp label or the fuzzy membership, the evidential membership provides more expressiveness to represent the imprecision and uncertainty of samples in overlapping regions or with heavy noise. After the editing procedure, a kNN classification procedure specifically designed for evidently edited training samples is developed in the belief function framework. This classification procedure handles well the more general situation in which the edited training samples are assigned dependent evidential labels.
The rest of this paper is organized as follows. In Section 2, the basics of the belief function theory are recalled. Then, the evidential editing procedure is developed in Section 3. After that, the classification procedure is designed and realized based on the edited training samples in the belief function framework in Section 4. Section 5 provides several experiments to test the proposed method. Finally, Section 6 concludes the paper. To facilitate reading, Table 1 gives a list of the symbols used and their definitions.

2. Basics of the Belief Function Theory

In belief function theory [31,32,33], a problem domain is represented by a finite set Ω = {ω_1, ω_2, …, ω_M} of mutually exclusive and exhaustive hypotheses called the frame of discernment. A mass function expressing the belief committed to the elements of 2^Ω by a given source of evidence is a mapping m: 2^Ω → [0, 1], such that:
$$ m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in 2^{\Omega}} m(A) = 1. \tag{1} $$
Elements A ⊆ Ω having m(A) > 0 are called the focal sets of the mass function m. The mass function has several special cases that encode different types of information (a short code illustration follows the list). A mass function is said to be:
  • Bayesian, if all of its focal sets are singletons. In this case, the mass function just reduces to the classical probability distribution.
  • categorical, if the whole mass is allocated to one focal set A. This indicates that the truth lies in A with certainty.
  • certain, if the whole mass is allocated to a unique singleton. This indicates that we have complete knowledge about the truth.
  • vacuous, if the whole mass is allocated to Ω . This situation corresponds to complete ignorance.
  • simple, if it has at most two focal sets and one of them is Ω if it has two. It is usually denoted as A^w, where A is the focal set different from Ω and 1 − w is the confidence that the truth lies in A.
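As a minimal illustration of this representation (not part of the original paper), a mass function can be encoded in Python as a dictionary mapping focal sets (frozensets of class labels) to masses; the frame, variable, and helper names below are assumptions made only for the sketch.

```python
OMEGA = frozenset({"w1", "w2", "w3"})  # illustrative three-class frame

def is_valid(m, tol=1e-9):
    """m(empty set) = 0 and the masses sum to one."""
    return m.get(frozenset(), 0.0) == 0.0 and abs(sum(m.values()) - 1.0) < tol

def is_bayesian(m):
    """All focal sets are singletons."""
    return all(len(A) == 1 for A, v in m.items() if v > 0)

def is_categorical(m):
    """The whole mass is allocated to one focal set."""
    return sum(1 for v in m.values() if v > 0) == 1

def is_vacuous(m, omega=OMEGA):
    """The whole mass is allocated to Omega (complete ignorance)."""
    return m.get(omega, 0.0) == 1.0

def is_simple(m, omega=OMEGA):
    """At most two focal sets, one of which is Omega if there are two."""
    focal = [A for A, v in m.items() if v > 0]
    return len(focal) <= 2 and (len(focal) < 2 or omega in focal)

# The mass function m5 of Example 1 (Section 3): both imprecise and uncertain.
m5 = {frozenset({"w2"}): 0.1, frozenset({"w3"}): 0.2,
      frozenset({"w2", "w3"}): 0.4, OMEGA: 0.3}
print(is_valid(m5), is_bayesian(m5), is_simple(m5))  # True False False
```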
After representing the available pieces of evidence as mass functions, the next step is to combine these mass functions into a single one for decision making. Many combination rules have been developed. The differences among them mainly depend on two issues: the dependence and the conflict among the available pieces of evidence.
Dempster's rule is the most popular choice to combine several distinct pieces of evidence [31]. Its combination of two mass functions m_1 and m_2 defined on the same frame of discernment Ω is:
$$ (m_1 \oplus m_2)(A) = \begin{cases} 0, & A = \emptyset, \\ \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}, & A \in 2^{\Omega} \setminus \{\emptyset\}. \end{cases} \tag{2} $$
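A minimal sketch of Dempster's rule for the dictionary representation introduced above (illustrative code, not the authors' implementation):

```python
def dempster(m1, m2):
    """Dempster's rule of combination (Eq. (2)) for two mass functions given
    as dicts {frozenset: mass} on the same frame of discernment."""
    combined, conflict = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```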
To combine mass functions induced by nondistinct pieces of evidence, a cautious rule and, more generally, a family of parameterized t-norm based rules were proposed in [42]:
$$ m_1 \circledast_s m_2 = \bigoplus_{A \subset \Omega} A^{\,w_1(A)\, \top_s\, w_2(A)}, \tag{3} $$
where m_1 and m_2 are separable mass functions, such that $m_1 = \bigoplus_{A \subset \Omega} A^{w_1(A)}$ and $m_2 = \bigoplus_{A \subset \Omega} A^{w_2(A)}$. The operator $\top_s$ denotes Frank's parameterized family of t-norms:
$$ a \top_s b = \begin{cases} a \wedge b, & \text{if } s = 0, \\ a\,b, & \text{if } s = 1, \\ \log_s \left( 1 + \dfrac{(s^a - 1)(s^b - 1)}{s - 1} \right), & \text{otherwise}, \end{cases} \tag{4} $$
for all a, b ∈ [0, 1], with s being a positive parameter. When s = 0, the t-norm-based rule reduces to the cautious rule, and when s = 1, it reduces to Dempster's rule.
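As a small sketch (with assumed function names), the Frank t-norm and its use for combining two simple mass functions A^{w_1} and A^{w_2} that share the same focal set can be written as:

```python
import math

def frank_tnorm(a, b, s):
    """Frank's family of t-norms on [0, 1] (Eq. (4)):
    s = 0 gives the minimum, s = 1 gives the product."""
    if s == 0:
        return min(a, b)
    if s == 1:
        return a * b
    return math.log(1.0 + (s ** a - 1.0) * (s ** b - 1.0) / (s - 1.0), s)

def combine_simple(A, w1, w2, omega, s):
    """t-norm-based combination of two simple mass functions A^w1 and A^w2
    with the same focal set A: the result is the simple mass function
    A^(w1 T_s w2), i.e., m(A) = 1 - w and m(Omega) = w."""
    w = frank_tnorm(w1, w2, s)
    return {A: 1.0 - w, omega: w}
```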
For the above combination rules, it is assumed that the pieces of evidence to be combined are fully reliable. However, when this assumption fails, there may be large conflicts among the pieces of evidence, in which case the performance of the above combination rules degrades greatly. Dubois and Prade [43] proposed an alternative rule for combining conflicting pieces of evidence:
$$ (m_1 \odot m_2)(A) = \begin{cases} 0, & A = \emptyset, \\ \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C) + \sum_{\substack{B \cap C = \emptyset \\ B \cup C = A}} m_1(B)\, m_2(C), & A \in 2^{\Omega} \setminus \{\emptyset\}. \end{cases} \tag{5} $$
This rule boils down to Dempster’s rule when there is no conflict between the two combined pieces of evidence.
For decision making, Smets [33] proposed the pignistic transformation to transform a mass function into a probability function:
$$ BetP(A) = \sum_{B \subseteq \Omega} \frac{|A \cap B|}{|B|}\, m(B), \quad \forall A \in 2^{\Omega}, \tag{6} $$
where |X| denotes the cardinality of set X.
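For singleton hypotheses, which is all that the decision step later in the paper requires, the pignistic transformation can be sketched as follows; again, the dictionary representation and function names are assumptions of the sketch.

```python
def pignistic(m, omega):
    """Pignistic transformation (Eq. (6)) restricted to singletons:
    each focal set's mass is shared equally among its elements."""
    betp = {w: 0.0 for w in omega}
    for B, v in m.items():
        for w in B:
            betp[w] += v / len(B)
    return betp

def decide(m, omega):
    """Assign the class with the maximum pignistic probability."""
    betp = pignistic(m, omega)
    return max(betp, key=betp.get)
```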

3. Evidential Editing Procedure for Training Samples

Let us consider an M-class classification problem over a predefined set of classes Ω = {ω_1, …, ω_M}. Assuming that a set of N labeled training samples T = {(x_1, ω^(1)), …, (x_N, ω^(N))} with input vectors x_i ∈ R^P and class labels ω^(i) ∈ Ω is available, the editing procedure aims to generate a new edited training set T′, which is more powerful than the original one for classification. In this section, we develop an evidential editing procedure for training samples in the belief function framework. First, in Section 3.1, an evidential membership structure is introduced as a general representation model for class membership. Then, in Section 3.2, an evidential editing algorithm is proposed to edit the training samples based on the evidential membership structure.

3.1. Evidential Membership Structure

The purpose of the evidential editing procedure is to assign to each sample in the training set T a new soft label represented by an evidential membership structure as:
$$ T' = \{(x_1, m_1), (x_2, m_2), \ldots, (x_N, m_N)\}, \tag{7} $$
where m_i, i = 1, 2, …, N, are mass functions defined on the frame of discernment Ω.
The above evidential membership modeled by mass function m i provides a general representation model regarding the class membership of sample x i :
  • when m i is a Bayesian mass function, the evidential membership reduces to the fuzzy membership as a special case.
  • when m i is a categorical mass function, the evidential membership reduces to the crisp set of labels as defined in [29].
  • when m i is a certain mass function, the evidential membership reduces to the crisp label.
  • when m i is a vacuous mass function, the sample x i is useless for classification and can be considered as an outlier.
Example 1.
Let us consider a set of N = 5 samples T = { ( x 1 , m 1 ) , ( x 2 , m 2 ) , ( x 3 , m 3 ) , ( x 4 , m 4 ) , ( x 5 , m 5 ) } with evidential membership regarding a set of M = 3 classes Ω = { ω 1 , ω 2 , ω 3 } . Mass functions for each sample are given in Table 2. They illustrate various situations: the case of sample x 1 corresponds to the situation of probabilistic uncertainty ( m 1 is Bayesian), whereas the case of sample x 2 corresponds to the situation of imprecision ( m 2 is categorical); the class of sample x 3 is known with precision and certainty ( m 3 is certain), whereas the class of sample x 4 is completely unknown ( m 4 is vacuous); finally, the mass function m 5 models the general situation where the class of sample x 5 is both imprecise and uncertain.
As illustrated in the above example, the evidential membership is a powerful model to represent the imprecise and uncertain information existing in the training samples. In the following part, we will study how to edit each training sample with the evidential membership.

3.2. Evidential Editing Algorithm

For each training sample x_i, i = 1, 2, …, N, we denote the leave-one-out training set as T_i = T ∖ {(x_i, ω^(i))}. Now, we will show how the evidential editing procedure works for one training sample x_i based on the other samples contained in T_i. The evidence modeling method developed in [34] is used here to generate a mass function for each neighbor x_j regarding the class membership of x_i:
$$ \begin{aligned} m_i(\{\omega_q\} \mid x_j) &= \alpha\, \phi_q(d_{ij}), \\ m_i(\Omega \mid x_j) &= 1 - \alpha\, \phi_q(d_{ij}), \\ m_i(A \mid x_j) &= 0, \quad \forall A \in 2^{\Omega} \setminus \{\{\omega_q\}, \Omega\}, \end{aligned} \tag{8} $$
where d_{ij} = d(x_i, x_j), ω_q is the class label of x_j (i.e., ω^(j) = ω_q), and α is a parameter such that 0 < α < 1. A recommended value of α = 0.95 can be used to obtain good results on average, and a good choice for φ_q is:
$$ \phi_q(d) = \exp(-\gamma_q d^2), \tag{9} $$
where γ_q is a positive parameter associated with class ω_q; heuristically, it can be set to the inverse of the mean squared distance between training samples belonging to class ω_q.
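A small sketch of this evidence modeling step, under the assumption of Euclidean distance (the function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def gamma_heuristic(X, y):
    """gamma_q = inverse of the mean squared pairwise distance within class q."""
    gammas = {}
    for q in set(y):
        Xq = np.asarray([x for x, lab in zip(X, y) if lab == q], dtype=float)
        d2 = ((Xq[:, None, :] - Xq[None, :, :]) ** 2).sum(-1)
        n = len(Xq)
        gammas[q] = n * (n - 1) / d2.sum() if n > 1 and d2.sum() > 0 else 1.0
    return gammas

def neighbor_mass(x_i, x_j, label_j, gamma, omega, alpha=0.95):
    """Simple mass function provided by neighbor x_j about the class of x_i
    (Eqs. (8)-(9)): mass alpha * exp(-gamma_q * d^2) on {omega_q}, rest on Omega."""
    d2 = float(((np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)) ** 2).sum())
    support = alpha * np.exp(-gamma[label_j] * d2)
    return {frozenset({label_j}): support, omega: 1.0 - support}
```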
Based on the distance d(x_i, x_j), we first select the k_edit nearest neighbors of x_i in the training set T_i and construct the corresponding k_edit mass functions as described above. These k_edit mass functions are then combined into a resulting mass function m_i, synthesizing the final evidential membership regarding the class of x_i. Considering the different degrees of conflict among the constructed mass functions, we developed a hierarchical combination process that is carried out at two levels: intra-class combination and inter-class combination.
At the first level, we consider the combination of mass functions derived from the neighbors with the same class label. As all the mass functions to be combined support the same class, there is no conflict among them. Moreover, as the training samples are usually collected independently, the items of evidence from different neighbors are independent. In this case, Dempster's rule is a good choice for its effectiveness and simplicity. If we denote by Ψ_i^q the set of the k_edit nearest neighbors of x_i belonging to class ω_q and assume that Ψ_i^q is not empty, the intra-class combination of the mass functions derived from the neighbors with class label ω_q is given by:
$$ m_i(\cdot \mid \Psi_i^q) = \bigoplus_{x_j \in \Psi_i^q} m_i(\cdot \mid x_j). \tag{10} $$
As shown in Equation (8), all the mass functions to be combined are simple. Thanks to this particular structure, the computational burden of Dempster’s rule can be greatly reduced, and the above intra-class combination can be further formulated analytically as:
$$ \begin{aligned} m_i(\{\omega_q\} \mid \Psi_i^q) &= 1 - \prod_{x_j \in \Psi_i^q} m_i(\Omega \mid x_j), \\ m_i(\Omega \mid \Psi_i^q) &= \prod_{x_j \in \Psi_i^q} m_i(\Omega \mid x_j), \\ m_i(A \mid \Psi_i^q) &= 0, \quad \forall A \in 2^{\Omega} \setminus \{\{\omega_q\}, \Omega\}. \end{aligned} \tag{11} $$
If Ψ_i^q is an empty set, then m_i(· | Ψ_i^q) is simply the vacuous mass function, satisfying m_i(Ω | Ψ_i^q) = 1.
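Because every input is a simple mass function supporting the same class, the intra-class combination reduces to a product over the masses on Ω, as in this sketch (assumed names, building on the representation used earlier):

```python
import math

def intra_class_combine(neighbor_masses, omega_q, omega):
    """Dempster's combination of simple mass functions that all support the
    same class omega_q (Eq. (11)): multiply the masses committed to Omega."""
    prod_omega = math.prod(m[omega] for m in neighbor_masses)
    return {frozenset({omega_q}): 1.0 - prod_omega, omega: prod_omega}
```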
After the intra-class combination of the mass functions derived from the neighbors belonging to each class, at the second level, we combine these sub-combination results to get a global combination result as the final evidential membership regarding the class of x_i. As these sub-combination results support different classes, large conflicts may exist among them. In this case, Dubois–Prade's rule is a good alternative combination method. However, when the number of classes is large, applying Dubois–Prade's rule to all the sub-combination results will generate a great number of focal sets (as many as 2^M − 1), which results in excessive imprecision for the edited label. Therefore, at the inter-class combination level, if there is more than one mass function having non-zero mass on its supported class, we only combine the two having the largest masses:
$$ m_i = m_i(\cdot \mid \Psi_i^{q_1}) \odot m_i(\cdot \mid \Psi_i^{q_2}), \tag{12} $$
where $m_i(\{\omega_{q_1}\} \mid \Psi_i^{q_1}) \ge m_i(\{\omega_{q_2}\} \mid \Psi_i^{q_2}) \ge m_i(\{\omega_q\} \mid \Psi_i^q)$, q = 1, 2, …, M, q ≠ q_1, q ≠ q_2. Noting that the sub-combination results shown in Equation (11) are also simple mass functions, the above inter-class combination can be further formulated analytically as:
$$ \begin{aligned} m_i(\{\omega_{q_1}\}) &= m_i(\{\omega_{q_1}\} \mid \Psi_i^{q_1})\, m_i(\Omega \mid \Psi_i^{q_2}), \\ m_i(\{\omega_{q_2}\}) &= m_i(\{\omega_{q_2}\} \mid \Psi_i^{q_2})\, m_i(\Omega \mid \Psi_i^{q_1}), \\ m_i(\{\omega_{q_1}, \omega_{q_2}\}) &= m_i(\{\omega_{q_1}\} \mid \Psi_i^{q_1})\, m_i(\{\omega_{q_2}\} \mid \Psi_i^{q_2}), \\ m_i(\Omega) &= m_i(\Omega \mid \Psi_i^{q_1})\, m_i(\Omega \mid \Psi_i^{q_2}), \\ m_i(A) &= 0, \quad \forall A \in 2^{\Omega} \setminus \{\{\omega_{q_1}\}, \{\omega_{q_2}\}, \{\omega_{q_1}, \omega_{q_2}\}, \Omega\}. \end{aligned} \tag{13} $$
If there is only one mass function having non-zero mass on its supported class, then m_i is simply the same as m_i(· | Ψ_i^{q_1}). Algorithm 1 shows the pseudocode of the evidential editing algorithm.
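In code, the inter-class step is the Dubois–Prade combination of two simple mass functions with different focal classes; the conflicting mass is transferred to the union of the two classes (a sketch with assumed names):

```python
def inter_class_combine(m_a, q1, m_b, q2, omega):
    """Dubois-Prade combination of the two strongest intra-class results
    (Eq. (13)); m_a supports class q1 and m_b supports class q2."""
    a, b = m_a[frozenset({q1})], m_b[frozenset({q2})]
    return {
        frozenset({q1}): a * (1.0 - b),
        frozenset({q2}): (1.0 - a) * b,
        frozenset({q1, q2}): a * b,          # conflicting mass goes to the union
        omega: (1.0 - a) * (1.0 - b),
    }
```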
Algorithm 1 Evidential editing algorithm.
Require: the original training set T = {(x_1, ω^(1)), …, (x_N, ω^(N))} with x_i ∈ R^P and ω^(i) ∈ {ω_1, …, ω_M}; the number of nearest neighbors k_edit
1: Initialize T′ ← ∅;
2: for i = 1 to N do
3:   Find the k_edit nearest neighbors of x_i in T ∖ {(x_i, ω^(i))};
4:   Generate a mass function m_i(· | x_j) for each neighbor x_j using Equations (8) and (9);
5:   for q = 1 to M do
6:     Combine the mass functions derived from the neighbors belonging to class ω_q to get a sub-combination result m_i(· | Ψ_i^q) using Equation (11);
7:   end for
8:   Combine the sub-combination results to get a global combination result m_i using Equation (13);
9:   T′ ← T′ ∪ {(x_i, m_i)};
10: end for
11: return the edited training set T′
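A compact, self-contained Python sketch of Algorithm 1 follows. It assumes Euclidean distance and uses the closed forms of Equations (11) and (13) for the two-level combination; all names are illustrative and the code is not the authors' implementation.

```python
import math
import numpy as np

def evidential_edit(X, y, k_edit=9, alpha=0.95):
    """Replace each crisp label by an evidential label, returned as a dict
    mapping frozensets of classes to masses (sketch of Algorithm 1)."""
    X = np.asarray(X, dtype=float)
    y = list(y)
    classes = sorted(set(y))
    omega = frozenset(classes)

    # gamma_q: inverse of the mean squared within-class distance (heuristic of Eq. (9)).
    gamma = {}
    for q in classes:
        Xq = X[[j for j, lab in enumerate(y) if lab == q]]
        d2 = ((Xq[:, None, :] - Xq[None, :, :]) ** 2).sum(-1)
        n = len(Xq)
        gamma[q] = n * (n - 1) / d2.sum() if n > 1 and d2.sum() > 0 else 1.0

    edited = []
    for i in range(len(X)):
        # k_edit nearest neighbors of x_i, excluding x_i itself.
        d2_all = ((X - X[i]) ** 2).sum(-1)
        neighbors = [j for j in np.argsort(d2_all) if j != i][:k_edit]

        # Intra-class combination (Eq. (11)): product of the Omega masses per class.
        omega_prod = {}
        for j in neighbors:
            s = alpha * math.exp(-gamma[y[j]] * d2_all[j])
            omega_prod[y[j]] = omega_prod.get(y[j], 1.0) * (1.0 - s)
        support = {q: 1.0 - p for q, p in omega_prod.items()}

        top = sorted(support, key=support.get, reverse=True)
        if len(top) == 1:
            q1 = top[0]
            m_i = {frozenset({q1}): support[q1], omega: omega_prod[q1]}
        else:
            # Inter-class combination of the two strongest results (Eq. (13)).
            q1, q2 = top[0], top[1]
            a, b = support[q1], support[q2]
            m_i = {frozenset({q1}): a * (1.0 - b),
                   frozenset({q2}): (1.0 - a) * b,
                   frozenset({q1, q2}): a * b,
                   omega: (1.0 - a) * (1.0 - b)}
        edited.append((X[i], m_i))
    return edited
```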
Example 2.
Figure 1 illustrates a simplified three-class classification example in the two-dimensional plane. A total of thirteen training samples was collected, with x_1–x_5 belonging to class ω_1, x_6–x_9 belonging to class ω_2, and x_10–x_13 belonging to class ω_3. We consider the evidential editing process for sample x_1 based on the information from the other samples. In this example, the number of nearest neighbors k_edit was set to five. Based on the Euclidean distance, the five samples x_3, x_5, x_6, x_8, x_12 were selected, and the corresponding five mass functions were constructed using Equations (8) and (9) regarding the class membership of x_1:
$$ \begin{aligned} m_1(\{\omega_1\} \mid x_3) &= 0.751, & m_1(\Omega \mid x_3) &= 0.249, \\ m_1(\{\omega_1\} \mid x_5) &= 0.751, & m_1(\Omega \mid x_5) &= 0.249, \\ m_1(\{\omega_2\} \mid x_6) &= 0.751, & m_1(\Omega \mid x_6) &= 0.249, \\ m_1(\{\omega_2\} \mid x_8) &= 0.751, & m_1(\Omega \mid x_8) &= 0.249, \\ m_1(\{\omega_3\} \mid x_{12}) &= 0.428, & m_1(\Omega \mid x_{12}) &= 0.572. \end{aligned} $$
The above mass functions were then combined at two levels sequentially. At the intra-class combination level, we combined those mass functions derived from the neighbors with the same class label using Equation (11) and obtained the sub-combination results as:
$$ \begin{aligned} m_1(\{\omega_1\} \mid \{x_3, x_5\}) &= 0.938, & m_1(\Omega \mid \{x_3, x_5\}) &= 0.062, \\ m_1(\{\omega_2\} \mid \{x_6, x_8\}) &= 0.938, & m_1(\Omega \mid \{x_6, x_8\}) &= 0.062, \\ m_1(\{\omega_3\} \mid \{x_{12}\}) &= 0.428, & m_1(\Omega \mid \{x_{12}\}) &= 0.572. \end{aligned} $$
Next, at the second level, we combined the above sub-combination results to get a global one. In this step, only the two mass functions having the largest masses on their supported classes, i.e., m_1(· | {x_3, x_5}) and m_1(· | {x_6, x_8}), were combined using Equation (13) to get the final evidential membership regarding the class of x_1:
$$ m_1(\{\omega_1\}) = 0.058, \quad m_1(\{\omega_2\}) = 0.058, \quad m_1(\{\omega_1, \omega_2\}) = 0.880, \quad m_1(\Omega) = 0.004. $$
It can be seen that the focal set {ω_1, ω_2} obtained the largest mass. This indicates that the sample x_1 has a high chance of lying in the overlapping region of classes ω_1 and ω_2, which is consistent with the actual situation.
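These numbers can be checked directly from the closed forms in Equations (11) and (13); the following snippet (purely illustrative) reproduces them:

```python
# Intra-class combination (Eq. (11)) for the two neighbors supporting omega_1
# (the omega_2 pair is symmetric):
omega_mass = (1 - 0.751) * (1 - 0.751)     # 0.249 * 0.249 ~= 0.062
a = b = 1 - omega_mass                     # ~= 0.938 for both classes

# Inter-class combination (Eq. (13)) of the omega_1 and omega_2 results:
print(round(a * (1 - b), 3))        # m({omega_1})          -> 0.058
print(round((1 - a) * b, 3))        # m({omega_2})          -> 0.058
print(round(a * b, 3))              # m({omega_1, omega_2}) -> 0.88
print(round((1 - a) * (1 - b), 3))  # m(Omega)              -> 0.004
```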

4. kNN Classification with Evidently Edited Training Samples

After the evidential editing procedure developed in Section 3, the problem now turns into classifying a query pattern y ∈ R^P based on the edited training set T′. In this section, a classification procedure specifically designed for evidently edited training samples is developed in the belief function framework. This classification procedure is composed of the following two steps: evidence representation for the edited training samples and evidence combination for decision making.

4.1. Evidence Representation for the Edited Training Samples

Assume that the k nearest neighbors of the query pattern y have been selected from the edited training set. Generally, a training sample x_i is a very reliable piece of evidence for the classification of y if it is very close to y. In contrast, if x_i is far from y, then it is not reliable evidence. In the belief function community, the discounting operation proposed by Shafer [32] is a common tool to handle partially reliable evidence.
Denote as m_i the evidential label of the training sample x_i and as β_i the confidence degree of the class membership of y with respect to the training sample x_i. The evidence provided by x_i for the class membership of y is represented by a discounted mass function, obtained by discounting m_i at a rate 1 − β_i:
$$ \begin{aligned} {}^{\beta_i} m_i(A) &= \beta_i\, m_i(A), \quad \forall A \in 2^{\Omega} \setminus \{\Omega\}, \\ {}^{\beta_i} m_i(\Omega) &= \beta_i\, m_i(\Omega) + (1 - \beta_i). \end{aligned} \tag{14} $$
The confidence degree β_i is determined based on the distance d_i between x_i and y. Generally, a larger distance results in a smaller confidence degree, and therefore β_i should be a decreasing function of d_i. A decreasing function similar to Equation (9) is used here to define the confidence degree β_i ∈ (0, 1]:
$$ \beta_i = \exp(-\lambda_i d_i^2), \tag{15} $$
where λ_i is a positive parameter associated with the training sample x_i and is defined as:
$$ \lambda_i = \left( \sum_{A \in 2^{\Omega} \setminus \{\Omega\}} m_i(A)\, \bar{d}_A + m_i(\Omega)\, \bar{d} \right)^{-2}, \tag{16} $$
where d̄ is the mean distance among all training samples and d̄_A is the mean distance among training samples belonging to the class set A, A ∈ 2^Ω ∖ {Ω}.
Remark 1.
In calculating the confidence degree, the parameter λ_i is designed by extending the parameter γ_q in Equation (9) to the case of evidential labels. In Equation (16), if the label of the training sample x_i is crisp with ω_q, i.e., m_i({ω_q}) = 1 and m_i(A) = 0 for all A ∈ 2^Ω ∖ {{ω_q}}, then the parameter λ_i just reduces to γ_q as a special case.
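A sketch of the discounting step and the confidence degree is given below. Note that the exponent in Equation (16) is read here as an inverse square and that the mean-distance lookup table is an assumed helper; treat both as assumptions of the sketch.

```python
import math

def discount(m, beta, omega):
    """Shafer's discounting (Eq. (14)): scale all masses by beta and move
    the remaining 1 - beta onto Omega."""
    out = {A: beta * v for A, v in m.items() if A != omega}
    out[omega] = beta * m.get(omega, 0.0) + (1.0 - beta)
    return out

def confidence(m, d, mean_dist, mean_dist_by_set, omega):
    """Confidence degree beta_i = exp(-lambda_i * d^2) with lambda_i built
    from the evidential label m as in Eq. (16); mean_dist_by_set maps each
    focal set A != Omega to the mean distance d_bar_A among its samples."""
    avg = sum(v * mean_dist_by_set[A] for A, v in m.items() if A != omega)
    avg += m.get(omega, 0.0) * mean_dist
    lam = avg ** (-2)
    return math.exp(-lam * d * d)
```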

4.2. Evidence Combination for Decision Making

In this section, we will combine the k mass functions generated above into a single one in order to make a decision about the class of the query pattern y. The popular Dempster's rule of combination relies on the assumption that the items of evidence to be combined are independent. However, as illustrated in the following example, the k mass functions derived from different edited samples can no longer be regarded as fully independent.
Example 3.
Figure 2 illustrates the dependence among different edited training samples, where the training samples and the query pattern are denoted by distinct markers. In the evidential editing process, k_edit = 2 was assumed to search for the nearest neighbors, and in the classification process, the number of nearest neighbors k = 3 was assumed. We can see that x_1, x_2, and x_3 were the three nearest neighbors used for the classification of the query pattern y. In the evidential editing process, as the training sample x_4 was used to calculate the class membership of both x_1 and x_2, the edited training samples x_1 and x_2 were no longer independent. In contrast, the edited training sample x_3 was still independent of both x_1 and x_2, as they did not use common training samples in the evidential editing process. Therefore, the items of evidence from different edited training samples may be partially dependent.
To account for this partial dependence, we used the parameterized t-norm-based rule shown in Equation (3) to combine the generated k mass functions to get the final result for query pattern y as:
$$ m = {}^{\beta_{i_1}} m_{i_1} \circledast_s {}^{\beta_{i_2}} m_{i_2} \circledast_s \cdots \circledast_s {}^{\beta_{i_k}} m_{i_k}, \tag{17} $$
where k is the number of nearest neighbors, i_1, i_2, …, i_k are the indices of the k nearest neighbors of y in T′, and s is the Frank t-norms parameter defined in Equation (4). Different values of the parameter s result in a series of combination rules ranging from the cautious rule (s = 0) to Dempster's rule (s = 1). The selection of the parameter s depends on the potential dependence of the edited training samples: a smaller value should be assigned to s in the case of larger dependence. In practice, we can use cross-validation to search for the optimal t-norm-based rule.
In order to make a decision based on the combined mass function m, the pignistic probability BetP shown in Equation (6) is calculated. Finally, the query pattern y is assigned to the class with the maximum pignistic probability.
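A minimal decision-step sketch is shown below. Because applying the general t-norm-based rule to arbitrary (non-simple) mass functions requires their canonical decomposition into weights, the sketch only implements the s = 1 member of the family, which Section 2 notes coincides with Dempster's rule; it reuses the dempster() and pignistic() helpers sketched in Section 2, and all names are illustrative.

```python
from functools import reduce

def classify(discounted_masses, omega):
    """Combine the k discounted mass functions of the nearest edited
    neighbors (here with Dempster's rule, i.e., the s = 1 case of Eq. (17))
    and return the class with the maximum pignistic probability."""
    m = reduce(dempster, discounted_masses)   # dempster() from the Section 2 sketch
    betp = pignistic(m, omega)                # pignistic() from the Section 2 sketch
    return max(betp, key=betp.get)
```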

5. Experiments

The performance of the proposed kNN classifier with the evidential editing procedure (EEkNN) was evaluated using four different experiments. In the first experiment, the combination rules used in the classification process were evaluated under different dependence degrees of the edited samples. In the second experiment, the effects of the two main parameters k_edit and k in the editing and classification processes were analyzed. In the last two experiments, the performance of the EEkNN classifier was compared with that of other kNN-based methods, including the kNN classifier with the generalized editing procedure (GEkNN) [19], the kNN classifier with the fuzzy editing procedure (FEkNN) [25], and the evidential kNN classifier (EkNN) [34], using synthetic datasets and real datasets, respectively.

5.1. Evaluation of the Combination Rules

This experiment was designed to evaluate the combination rules used in the classification process of the EEkNN classifier. A two-dimensional three-class classification problem was considered. The following normal class-conditional distributions were assumed:
Class A: μ_A = (6, 6)^T, Σ_A = 4I;
Class B: μ_B = (14, 6)^T, Σ_B = 4I;
Class C: μ_C = (14, 14)^T, Σ_C = 4I.
A set of 150 training samples and a set of 3000 test samples were generated from the above distributions using equal prior probabilities. The average test classification rate over 30 independent trials was calculated. In the evidential editing process, k_edit = 3, 9, 15, 21 were selected, and in the classification process, values of k ranging from 1 to 25 were investigated. The t-norm-based rules (TR) with the parameter s ranging from 0 to 1 were evaluated (the cautious rule (CR) is retrieved when s = 0, and Dempster's rule (DR) is retrieved when s = 1).
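For reference, the synthetic data of this experiment can be generated as in the following sketch (the sampling routine and names are illustrative, not taken from the paper):

```python
import numpy as np

def generate_gaussian_data(n_per_class, means, cov_scale=4.0, rng=None):
    """Draw samples from normal class-conditional distributions with equal
    priors and covariance cov_scale * I (the setup of this experiment)."""
    rng = np.random.default_rng() if rng is None else rng
    X, y = [], []
    for label, mu in means.items():
        cov = cov_scale * np.eye(len(mu))
        X.append(rng.multivariate_normal(mu, cov, size=n_per_class))
        y += [label] * n_per_class
    return np.vstack(X), np.array(y)

means = {"A": (6.0, 6.0), "B": (14.0, 6.0), "C": (14.0, 14.0)}
X_train, y_train = generate_gaussian_data(50, means)     # 150 training samples
X_test, y_test = generate_gaussian_data(1000, means)     # 3000 test samples
```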
Figure 3 shows the classification accuracy for different combination rules. We note that the best combination rule varied with the value of k_edit. In other words, the k_edit value had a great influence on the dependence of the edited samples, and a larger k_edit value tended to result in larger dependence. For a specific classification problem, the selection of the best combination rule depends on the potential dependence of the edited samples, which in turn depends on the utilized k_edit value. Therefore, for the EEkNN classifier, the optimal t-norm-based rule should be searched for separately for each specific k_edit value.

5.2. Parameter Analysis

This experiment was designed to analyze the effect of the parameters k_edit and k on the proposed EEkNN classifier. The same training and test samples as in the previous experiment were used. The difference was that, in the evidential editing process, k_edit = 3, 6, 9, 12, 15, 18, 21, 24 were selected, and the optimal t-norm-based rule for each specific k_edit value was used to make the classification. The average classification accuracy over the 30 trials with values of k ranging from 1 to 25 was investigated.
From Figure 4, we can see that the classification performance improved clearly as the parameter k_edit increased within an interval ([3, 12] in this example). However, when k_edit exceeded an upper boundary (k_edit = 12 in this example), the classification performance no longer improved noticeably. In addition, when k_edit took small values, the classification performance improved as the parameter k increased. However, when k_edit exceeded the upper boundary, the parameter k had little effect on the classification performance.

5.3. Synthetic Data Test

This experiment was designed to compare the proposed EEkNN classifier with other kNN-based classifiers using synthetic datasets with different class imprecision ratios, defined as the number of imprecise samples divided by the total number of training samples. A training sample x_i is considered imprecise if a non-singleton set gets the largest mass after the evidential editing procedure. A two-dimensional four-class classification problem was considered, with the normal class-conditional distributions given below. For comparison, we changed the variance of each distribution to control the class imprecision ratio.
Case 1: Class A: μ_A = (0, 0)^T, Σ_A = I; Class B: μ_B = (5, 0)^T, Σ_B = I; Class C: μ_C = (0, 5)^T, Σ_C = I; Class D: μ_D = (5, 5)^T, Σ_D = I. Imprecision ratio ρ = 33%.
Case 2: Class A: μ_A = (0, 0)^T, Σ_A = 2I; Class B: μ_B = (5, 0)^T, Σ_B = 2I; Class C: μ_C = (0, 5)^T, Σ_C = 2I; Class D: μ_D = (5, 5)^T, Σ_D = 2I. Imprecision ratio ρ = 60%.
Case 3: Class A: μ_A = (0, 0)^T, Σ_A = 3I; Class B: μ_B = (5, 0)^T, Σ_B = 3I; Class C: μ_C = (0, 5)^T, Σ_C = 3I; Class D: μ_D = (5, 5)^T, Σ_D = 3I. Imprecision ratio ρ = 79%.
A training set of 200 samples and a test set of 4000 samples were generated from the above distributions using equal prior probabilities. For each case, 30 trials were performed with 30 independent training sets. The average classification accuracy and the corresponding 95% confidence interval were calculated. For each trial, the best values of the parameters k_edit and s in the EEkNN classifier were determined in the sets {3, 6, 9, 12, 15, 18, 21, 24} and {1, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 0}, respectively, by cross-validation. For all of the considered methods, values of k ranging from 1 to 25 were investigated.
Figure 5, Figure 6 and Figure 7 show the training set and the classification results for the cases with different imprecision ratios. From the left three subfigures, we can see that the three cases corresponded to slight, moderate, and severe class overlapping, respectively. The average classification accuracy rates of the different methods, as well as the corresponding 95% confidence intervals of the proposed one, are shown in the right three subfigures. It can be seen that, for all three considered cases, the proposed EEkNN classifier provided better classification accuracy than the other kNN-based ones, because in the proposed EEkNN classifier, the uncertainty of samples in overlapping regions can be well characterized thanks to the introduction of the evidential editing procedure. We also notice that the performance improvement was more significant for Case 3, where the samples from different classes overlapped severely. Furthermore, unlike the other kNN-based classifiers, the proposed one was less sensitive to the value of k, and it performed well even with a small value of k.

5.4. Real Data Test

This experiment was designed to compare the proposed EEkNN classifier with other kNN-based classifiers on several real-world classification problems from the well-known UCI Machine Learning Repository [44]. These datasets cover a variety of applications in fields including biology, medicine, phytology, and astronomy. The main characteristics of the six real datasets used in this experiment are summarized in Table 3, where "# Samples" is the number of samples in the dataset, "# Features" is the number of features, and "# Classes" is the number of classes. To assess the results, we considered the resampled paired test. A series of 30 trials was conducted. In each trial, the available samples were randomly divided into a training set and a test set of equal sizes. For each dataset, we calculated the average classification rate over the 30 trials and the corresponding 95% confidence interval. For the proposed EEkNN classifier, the best values of the parameters k_edit and s were determined with the same procedure as in the previous experiment. For all of the considered methods, values of k ranging from 1 to 25 were investigated.
Figure 8 shows the classification results of the different methods on the real datasets. It can be seen that, for most datasets, the EEkNN classifier provided better classification performance than the other kNN-based ones. The reason is that, in the proposed EEkNN classifier, the uncertainty of samples in overlapping regions or noisy patterns can be well characterized thanks to the introduction of the evidential editing procedure. In the GEkNN classifier, however, each uncertain sample is either removed or assigned to a single class at great risk. Though the FEkNN classifier reassigns a fuzzy membership to each uncertain sample, it cannot effectively address the imprecise information involved. In the original EkNN classifier, developed based on the belief function theory, the original training set is used directly for classification without any editing procedure. However, for the Glass dataset, the classification performances of the different methods were quite similar. The reason is that, for this dataset, the best classification performance was obtained when k took a small value, and under this circumstance, the evidential editing procedure could not improve the classification performance.

6. Conclusions

An evidential editing version of the kNN classifier (EEkNN) has been developed based on an evidential editing procedure that reassigns the original training samples with new labels represented by an evidential membership structure. Thanks to this procedure, noisy patterns or those situated in overlapping regions have less influence on the decisions. In addition, in the subsequent classification procedure, the parameterized t-norm-based rule was optimized to combine the k nearest neighbors of a query pattern by taking into account the potential dependence among them. Experiments based on both synthetic and real datasets have been carried out to evaluate the performance of the proposal. From the results reported in the last section, we can conclude that the proposed EEkNN classifier can achieve higher classification accuracy than the other considered kNN-based methods, especially for datasets with high imprecision ratios. Moreover, the proposed EEkNN classifier was not too sensitive to the value of k, and it could achieve quite good performance even with k = 1. This is an advantage in time- or space-critical applications, in which only a small value of k is permitted in the classification process.
The proposal can be potentially used in many classification applications where the available data are imperfect. For example, in brain–computer interface (BCI) systems [45], the electroencephalogram (EEG) signals may contain great uncertainties due to the varying brain dynamics and the presence of noise. The proposed EEkNN classifier can minimize the effect of these uncertainties with the introduction of the evidential editing procedure for the raw data.

Author Contributions

L.J. conceived of the idea and designed the methodology. X.G. wrote the paper. Q.P. provided the laboratory support and improved the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Nos. 61790552 and 61801386) and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2018JQ6043), the China Postdoctoral Science Foundation (Grant No. 2019M653743), and the Aerospace Science and Technology Foundation of China.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tran, T.T.; Choi, J.W.; Le, T.H.; Kim, J.W. A Comparative Study of Deep CNN in Forecasting and Classifying the Macronutrient Deficiencies on Development of Tomato Plant. Appl. Sci. 2019, 9, 1601. [Google Scholar] [CrossRef]
  2. Seo, Y.S.; Huh, J.H. Automatic emotion-based music classification for supporting intelligent IoT applications. Electronics 2019, 8, 164. [Google Scholar] [CrossRef]
  3. Iqbal, U.; Ying Wah, T.; Habib Ur Rehman, M.; Mastoi, Q. Usage of model driven environment for the classification of ECG features: A systematic review. IEEE Access 2018, 6, 23120–23136. [Google Scholar] [CrossRef]
  4. Wu, C.; Yue, J.; Wang, L.; Lyu, F. Detection and classification of recessive weakness in superbuck converter based on WPD-PCA and probabilistic neural network. Electronics 2019, 8, 290. [Google Scholar] [CrossRef]
  5. Donati, L.; Iotti, E.; Mordonini, G.; Prati, A. Fashion Product Classification through Deep Learning and Computer Vision. Appl. Sci. 2019, 9, 1385. [Google Scholar] [CrossRef]
  6. Jiao, L.; Denœux, T.; Pan, Q. A hybrid belief rule-based classification system based on uncertain training data and expert knowledge. IEEE Trans. Syst. Man Cybern. 2016, 46, 1711–1723. [Google Scholar] [CrossRef]
  7. Jain, A.K.; Duin, R.P.W.; Mao, J. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 4–37. [Google Scholar] [CrossRef]
  8. Fix, E.; Hodges, J. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties; Technical Report 4; USAF School of Aviation Medicine: Randolph Field, TX, USA, 1951. [Google Scholar]
  9. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  10. Dudani, S.A. The distance-weighted k-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 1976, 4, 325–327. [Google Scholar] [CrossRef]
  11. Jiao, L.; Pan, Q.; Feng, X. Multi-hypothesis nearest-neighbor classifier based on class-conditional weighted distance metric. Neurocomputing 2015, 151, 1468–1476. [Google Scholar] [CrossRef] [Green Version]
  12. Tang, B.; He, H. ENN: Extended nearest neighbor method for pattern recognition. IEEE Comput. Intell. Mag. 2015, 10, 52–60. [Google Scholar] [CrossRef]
  13. Yu, Z.; Chen, H.; Liu, J.; You, J.; Leung, H.; Han, G. Hybrid k-nearest neighbor classifier. IEEE Trans. Cybern. 2016, 46, 1263–1275. [Google Scholar] [CrossRef] [PubMed]
  14. Ma, H.; Gou, J.; Wang, X.; Ke, J.; Zeng, S. Sparse coefficient-based k-nearest neighbor classification. IEEE Access 2017, 5, 16618–16634. [Google Scholar] [CrossRef]
  15. Chatzigeorgakidis, G.; Karagiorgou, S.; Athanasiou, S.; Skiadopoulos, S. FML-kNN: Scalable machine learning on Big Data using k-nearest neighbor joins. J. Big Data 2018, 5, 1–27. [Google Scholar] [CrossRef]
  16. Devijver, P.; Kittler, J. Pattern Recognition: A Statistical Approach; Prentice Hall: Englewood Cliffs, NJ, USA, 1982. [Google Scholar]
  17. Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data sets. IEEE Trans. Syst. Man Cybern. 1972, 2, 408–421. [Google Scholar] [CrossRef]
  18. Tomek, I. An experiment with the edited nearest neighbor rule. IEEE Trans. Syst. Man Cybern. 1976, 6, 121–126. [Google Scholar] [CrossRef]
  19. Koplowitz, J.; Brown, T.A. On the relation of performance to editing in nearest neighbor rules. Pattern Recognit. 1981, 13, 251–255. [Google Scholar] [CrossRef]
  20. Kuncheva, L. Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognit. Lett. 1995, 16, 809–814. [Google Scholar] [CrossRef]
  21. Jiang, Y.; Zhou, Z. Editing training data for kNN classifiers with neural network ensemble. In Advances in Neural Networks; Yin, F., Wang, J., Guo, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 356–361. [Google Scholar]
  22. Chang, R.; Pei, Z.; Zhang, C. A modified editing k-nearest neighbor rule. J. Comput. 2011, 6, 1493–1500. [Google Scholar] [CrossRef]
  23. Triguero, I.; Derrac, J.; Garcia, S.; Herrera, F. A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 86–100. [Google Scholar] [CrossRef]
  24. Garcia, S.; Derrac, J.; Cano, J.; Herrera, F. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 417–435. [Google Scholar] [CrossRef]
  25. Keller, J.; Gray, M.; Givens, J. A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 1985, 15, 580–585. [Google Scholar] [CrossRef]
  26. Yang, M.; Chen, C. On the edited fuzzy k-nearest neighbor rule. IEEE Trans. Syst. Man Cybern. Part B Cybern. 1998, 28, 461–466. [Google Scholar] [CrossRef]
  27. Zhang, C.; Cheng, J.; Yi, L. A method based on the edited FKNN by the threshold value. J. Comput. 2013, 8, 1821–1825. [Google Scholar] [CrossRef]
  28. Liu, Z.; Pan, Q.; Dezert, J.; Mercier, G.; Liu, Y. Fuzzy-belief k-nearest neighbor classifier for uncertain data. In Proceedings of the 17th International Conference on Information Fusion, Salamanca, Spain, 7–10 July 2014; pp. 1–8. [Google Scholar]
  29. Kanj, S.; Abdallah, F.; Denœux, T.; Tout, K. Editing training data for multi-label classification with the k-nearest neighbor rule. Pattern Anal. Appl. 2015, 19, 145–161. [Google Scholar] [CrossRef]
  30. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef] [Green Version]
  31. Dempster, A. Upper and lower probabilities induced by multivalued mapping. Ann. Math. Stat. 1967, 38, 325–339. [Google Scholar] [CrossRef]
  32. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976. [Google Scholar]
  33. Smets, P. Decision making in the TBM: The necessity of the pignistic transformation. Int. J. Approx. Reason. 2005, 38, 133–147. [Google Scholar] [CrossRef]
  34. Denœux, T. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. Syst. Man Cybern. 1995, 25, 804–813. [Google Scholar] [CrossRef]
  35. Denœux, T.; Smets, P. Classification using belief functions relationship between case-based and model-based approaches. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2006, 36, 1395–1406. [Google Scholar] [CrossRef]
  36. Jiao, L.; Pan, Q.; Feng, X.; Yang, F. An evidential k-nearest neighbor classification method with weighted attributes. In Proceedings of the 16th International Conference on Information Fusion, Istanbul, Turkey, 9–12 July 2013; pp. 145–150. [Google Scholar]
  37. Liu, Z.; Pan, Q.; Dezert, J. A new belief-based k-nearest neighbor classification method. Pattern Recognit. 2013, 46, 834–844. [Google Scholar] [CrossRef]
  38. Su, Z.; Denœux, T.; Hao, Y.; Zhao, M. Evidential k-NN classification with enhanced performance via optimizing a class of parametric conjunctive t-rules. Knowl. Based Syst. 2018, 142, 7–16. [Google Scholar] [CrossRef]
  39. Jiao, L.; Geng, X.; Pan, Q. BPkNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory. IEEE Access 2019, 7, 48935–48947. [Google Scholar] [CrossRef]
  40. Jiao, L.; Denœux, T.; Pan, Q. Evidential editing k-nearest neighbor classifier. In Proceedings of the 13th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Compiègne, France, 15–17 July 2015; pp. 461–471. [Google Scholar]
  41. Jiao, L. Classification of Uncertain Data in the Framework of Belief Functions: Nearest-Neighbor-Based and Rule-Based Approaches. Ph.D. Thesis, Université de Technologie de Compiègne, Compiègne, France, 2015. [Google Scholar]
  42. Denœux, T. Conjunctive and disjunctive combination of belief functions induced by nondistinct bodies of evidence. Artif. Intell. 2008, 172, 234–264. [Google Scholar] [CrossRef] [Green Version]
  43. Dubois, D.; Prade, H. Representation and combination of uncertainty with belief functions and possibility measures. Comput. Intell. 1988, 4, 244–264. [Google Scholar] [CrossRef]
  44. Dua, D.; Karra Taniskidou, E. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 1 December 2017).
  45. Katona, J.; Kovari, A. Examining the learning efficiency by a brain computer interface system. Acta Polytech. Hung. 2018, 15, 251–280. [Google Scholar]
Figure 1. A simplified three-class classification example.
Figure 2. Illustration of dependence among edited training samples.
Figure 3. Classification results for different combination rules under different k_edit values with values of k ranging from 1 to 25.
Figure 4. Classification results of the EEkNN classifier for different values of k_edit and k.
Figure 5. Training set and classification results for Case 1 with imprecision ratio ρ = 33%.
Figure 6. Training set and classification results for Case 2 with imprecision ratio ρ = 60%.
Figure 7. Training set and classification results for Case 3 with imprecision ratio ρ = 79%.
Figure 8. Classification results of different methods for real datasets.
Table 1. List of symbols and definitions.
Symbol      Definition
kNN         k-nearest neighbor
EkNN        evidential k-nearest neighbor
EEkNN       evidential editing k-nearest neighbor
FEkNN       fuzzy editing k-nearest neighbor
GEkNN       generalized editing k-nearest neighbor
k           number of nearest neighbors in the classification process
k_edit      number of nearest neighbors in the editing process
m           mass function
s           Frank t-norms parameter
T           original training set
T′          edited training set
T_i         training set with x_i excluded
x           input feature vector
y           query pattern
ω           class label
Ω           frame of discernment
Table 2. Example of the evidential membership.
A             m_1(A)   m_2(A)   m_3(A)   m_4(A)   m_5(A)
∅             0        0        0        0        0
{ω_1}         0.2      0        0        0        0
{ω_2}         0.3      0        0        0        0.1
{ω_1, ω_2}    0        0        0        0        0
{ω_3}         0.5      0        1        0        0.2
{ω_1, ω_3}    0        0        0        0        0
{ω_2, ω_3}    0        1        0        0        0.4
Ω             0        0        0        1        0.3
Table 3. Description of the real datasets employed in the study.
Dataset       # Samples   # Features   # Classes
Diabetes      393         8            2
Glass         214         9            6
Ionosphere    351         34           2
Seeds         210         7            3
Transfusion   748         4            2
Yeast         1484        8            10

