Article

Feature Selection Combining Information Theory View and Algebraic View in the Neighborhood Decision System

1 College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
2 Engineering Technology Research Center for Computing Intelligence and Data Mining, Xinxiang 453007, China
* Author to whom correspondence should be addressed.
Entropy 2021, 23(6), 704; https://doi.org/10.3390/e23060704
Submission received: 29 April 2021 / Revised: 30 May 2021 / Accepted: 31 May 2021 / Published: 2 June 2021

Abstract

Feature selection is one of the core contents of rough set theory and its applications. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm that combines the information theory view and the algebraic view in the neighborhood decision system. First, the neighborhood relationship in the neighborhood rough set model is used to retain the classification information of continuous data, and several uncertainty measures based on neighborhood information entropy are studied. Second, to fully reflect the decision ability and classification performance of the neighborhood system, the neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which overcomes the limitation that most feature selection algorithms consider only the information theory definition or only the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm can effectively select the optimal feature subset, and that the selection result can maintain or improve the classification performance of the data set.

1. Introduction

Society has entered the era of network information, and the rapid development of computer and network information technology has caused data and information in various fields to grow rapidly. How to mine potential and valuable information from massive, disordered and strongly interfered data poses an unprecedented challenge to intelligent information processing, and has given rise to a new direction of artificial intelligence research: feature selection. Among the many approaches to feature selection, rough set theory is an effective way to deal with complex systems, because it requires no prior information beyond the data set itself [1].
Rough set theory was proposed by the Polish scientist Pawlak in 1982 to deal with uncertain, imprecise and fuzzy problems [1]. Its basic idea is to use equivalence relations to granulate the discrete sample space into a family of pairwise disjoint equivalence classes, thereby describing the knowledge and concepts in the sample space. Feature selection is one of the core contents of rough set theory and its applications. Rough set theory performs information granulation on the original data set, deletes redundant conditional attributes without reducing the classification ability of the data, and obtains a more concise description than the original data set [2,3]. However, classical rough set theory handles only discrete data well and cannot cope with the large amounts of continuous and mixed (continuous and discrete) data encountered in practical applications [4,5,6]. Even if discretization techniques are adopted [7], important information in the data is lost, which ultimately affects the selection result. For this reason, Wang et al. [8] proposed the k-nearest neighborhood rough set model. Chen et al. [9] explored granular structures, distances and measures in neighborhood systems. Yao [10] studied the relationship between the 1-step neighborhood system and rough set approximation. Building on this research, Hu et al. [11] proposed the neighborhood rough set model and successfully applied it to feature selection, classification and uncertainty reasoning for continuous and mixed data. As a data preprocessing method, feature selection based on the neighborhood rough set has been widely used in cancer classification [12], character recognition [13] and facial expression feature selection [14], and has good research value and application prospects.
Wong and Ziarko [15] proved that traditional feature selection is an NP-hard problem. Therefore, in research on feature selection algorithms, speeding up convergence to reduce time complexity has become a mainstream research direction [16]. Chen et al. [17] proposed a heuristic feature selection algorithm using a joint entropy measure. Jiang et al. [16] studied a feature selection accelerator based on supervised neighborhoods. Most of the above feature selection methods rely on monotonic evaluation functions [11]. However, feature selection algorithms that satisfy monotonicity have the problem that, when the classification performance of the original data set is poor, the value of the evaluation function is low and the final reduction effect is unsatisfactory [18]. To solve this problem, Li et al. [19] proposed a non-monotonic feature selection algorithm based on the decision rough set model. Sun et al. [18] designed a gene selection algorithm based on uncertainty measures of neighborhood entropy. Wang et al. [20] studied a greedy feature selection algorithm based on a non-monotonic conditional discrimination index.
Some existing uncertainty measures cannot objectively reflect changes in classification decision capability [21]. Sun et al. [18] argue that credibility and coverage reflect the classification ability of condition attributes relative to decision attributes, and that condition attributes with higher credibility and coverage are more important for the decision attribute. In addition, Tsumoto [22] emphasizes that credibility represents the sufficiency of a proposition while coverage describes its necessity. Therefore, this paper defines credibility and coverage in the neighborhood decision system, namely the neighborhood credibility and neighborhood coverage.
The information theory definition based on information entropy and the algebraic definition based on approximation precision are two forms of definition in classical rough set theory [23]. The information theory definition considers the influence of attributes on uncertain subsets, while the algebraic definition considers the influence of attributes on definable subsets [24,25]; the two measurement mechanisms are strongly complementary [26]. So far, most feature selection algorithms consider only the information theory definition or only the algebraic definition. For example, Hu et al. [11] proposed a hybrid feature selection algorithm based on neighborhood information entropy. Wang et al. [27,28] used the equivalence relation matrix to compute knowledge granularity, resolution and attribute importance from the algebraic view of rough sets. Sun et al. [2,29] studied feature selection methods based on entropy measures. Uncertainty measures based on neighborhood information entropy reflect the information theory view in the neighborhood decision system, while the neighborhood approximation precision belongs to the algebraic view [18].
Inspired by the above, this paper combines the information theory view and the algebraic view in the neighborhood decision system and proposes a heuristic non-monotonic feature selection algorithm. The experimental results on nine data sets of different scales show that the algorithm can effectively select the optimal feature subset, and the selection results can maintain or improve the classification performance of the data set.
In summary, the main contributions of this paper are as follows:
  • The credibility and coverage degrees can reflect the decision-making ability and the classification ability of conditional attributes with respect to the decision attribute [18]. In order to effectively analyze the uncertainty of knowledge in the neighborhood rough set, the credibility and coverage are introduced into the neighborhood decision system, and then the neighborhood credibility and neighborhood coverage are defined and introduced into neighborhood joint entropy.
  • Based on the proposed neighborhood joint entropy, some uncertainty measures of neighborhood information entropy are studied, and the relationship between the measures is derived, which is conducive to understanding the nature of knowledge uncertainty in neighborhood decision systems.
  • To construct a more comprehensive measurement mechanism and overcome the problem of poor selection results when the classification performance of the original data set is not good, the information theory view and algebraic view in the neighborhood decision system are combined to propose a heuristic non-monotonic feature selection algorithm.
Section 2 briefly introduces the basic concepts of the neighborhood rough set and information entropy measures. Section 3 studies the heuristic non-monotonic feature selection algorithm based on information theory view and algebraic view. Section 4 analyzes the experimental results on four low-dimensional data sets and five high-dimensional data sets. Section 5 summarizes the content of this paper.

2. Basic Concepts

In this part, we will briefly review the basic concepts of information entropy measures and the neighborhood rough set [2,30,31,32,33].

2.1. Information Entropy Measures

$DS = (U, C \cup D, V, f)$ is called a decision system, where $U = \{x_1, x_2, \ldots, x_k\}$ is the sample set, $C$ is the conditional attribute set, $D$ is the classification decision attribute, $V$ is the set of attribute values, and $f: U \times C \to V$ is a mapping function.
In the $DS$, if $B \subseteq C$ divides the sample set $U$ into $U/B = \{X_1, X_2, \ldots, X_K\}$, then the information entropy is defined as
$$H(B) = -\sum_{i=1}^{K} p(X_i) \log p(X_i), \qquad X_i \in U/B$$
where $p(X_i) = \frac{|X_i|}{|U|}$ represents the probability of $X_i$ in the sample set.
In the $DS$, if $B, Q \subseteq C$, $U/B = \{X_1, X_2, \ldots, X_K\}$ and $U/Q = \{Y_1, Y_2, \ldots, Y_L\}$, then the conditional information entropy of $Q$ relative to $B$ is defined as
$$H(Q \mid B) = -\sum_{i=1}^{K} p(X_i) \sum_{j=1}^{L} p(Y_j \mid X_i) \log p(Y_j \mid X_i)$$
where $X_i \in U/B$, $Y_j \in U/Q$ and $p(Y_j \mid X_i) = \frac{|Y_j \cap X_i|}{|X_i|}$.
In the $DS$, if $B, Q \subseteq C$, $U/B = \{X_1, X_2, \ldots, X_K\}$ and $U/Q = \{Y_1, Y_2, \ldots, Y_L\}$, then the joint information entropy of $Q$ and $B$ is defined as
$$H(Q, B) = -\sum_{i=1}^{K} \sum_{j=1}^{L} p(X_i \cap Y_j) \log p(X_i \cap Y_j)$$
where $X_i \in U/B$, $Y_j \in U/Q$ and $p(X_i \cap Y_j) = \frac{|X_i \cap Y_j|}{|U|}$.
Theorem 1.
Given the $DS$, if $B, Q \subseteq C$, $U/B = \{X_1, X_2, \ldots, X_K\}$ and $U/Q = \{Y_1, Y_2, \ldots, Y_L\}$, then $H(Q \mid B) = H(Q, B) - H(B)$.
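To make these definitions concrete, the sketch below (a minimal Python illustration, not part of the paper; the toy universe and the partitions `U_B` and `U_Q` are hypothetical) computes the three partition-based entropies and numerically checks Theorem 1.

```python
import math
from itertools import product

U = list(range(8))  # a toy universe of 8 samples (hypothetical)
# Partitions of U induced by attribute sets B and Q (hypothetical equivalence classes)
U_B = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
U_Q = [{0, 1}, {2, 3, 4, 5}, {6, 7}]

def H(partition, n):
    """Information entropy of a partition: H(B) = -sum p(X) log p(X)."""
    return -sum((len(X) / n) * math.log(len(X) / n) for X in partition)

def H_joint(P, Q, n):
    """Joint entropy: H(Q, B) = -sum_{i,j} p(X_i ∩ Y_j) log p(X_i ∩ Y_j)."""
    total = 0.0
    for X, Y in product(P, Q):
        p = len(X & Y) / n
        if p > 0:
            total -= p * math.log(p)
    return total

def H_cond(Q, P, n):
    """Conditional entropy: H(Q | B) = -sum_i p(X_i) sum_j p(Y_j|X_i) log p(Y_j|X_i)."""
    total = 0.0
    for X in P:
        for Y in Q:
            if len(X & Y) > 0:
                total -= (len(X & Y) / n) * math.log(len(X & Y) / len(X))
    return total

n = len(U)
# Theorem 1: H(Q | B) = H(Q, B) - H(B); the two printed values agree
print(H_cond(U_Q, U_B, n), H_joint(U_B, U_Q, n) - H(U_B, n))
```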

2.2. Neighborhood Rough Set

$NDS = (U, C, D, \delta)$ is called the neighborhood decision system, where $U$ is the sample set (the universe), $C$ is the conditional attribute set, $D$ is the decision attribute, and $\delta$ is the neighborhood radius.
In the $NDS$, if $B \subseteq C$, then the Minkowski distance between two samples $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jm})$ on $U$ is defined as
$$MD_B(x_i, x_j) = \left( \sum_{k=1}^{|B|} |x_{ik} - x_{jk}|^p \right)^{\frac{1}{p}}$$
Given the $NDS$ and the distance measurement function $MD$, if $B \subseteq C$, then the neighborhood information granule of $x_i \in U$ relative to $B$ is defined as
$$n_B^\delta(x_i) = \{x \in U \mid MD_B(x_i, x) \le \delta\}, \qquad \delta > 0$$
$n_B^\delta(x_i)$ represents the set of samples indistinguishable from $x_i$ under $B$.
In the $NDS$, if $U/D = \{Y_1, Y_2, \ldots, Y_L\}$, then the decision equivalence class of $x_i \in U$ is defined as
$$[x_i]_D = Y_j \ \text{such that} \ x_i \in Y_j, \qquad j = 1, 2, \ldots, L$$
In the $NDS$, if $B \subseteq C$ and $N_B$ is the neighborhood relation on $U$, then the neighborhood upper approximation $\overline{N_B}(X)$ and the neighborhood lower approximation $\underline{N_B}(X)$ of a sample set $X \subseteq U$ relative to $B$ are respectively defined as
$$\overline{N_B}(X) = \{x_i \in U \mid n_B^\delta(x_i) \cap X \ne \emptyset\}, \qquad i = 1, 2, \ldots, |U|$$
$$\underline{N_B}(X) = \{x_i \in U \mid n_B^\delta(x_i) \subseteq X\}, \qquad i = 1, 2, \ldots, |U|$$
In the $NDS$, if $B \subseteq C$, $U/D = \{Y_1, Y_2, \ldots, Y_L\}$ and $N_B$ is the neighborhood relation on $U$, then the neighborhood upper approximation $\overline{N_B}(D)$ and the neighborhood lower approximation $\underline{N_B}(D)$ of $D$ relative to $B$ are respectively defined as
$$\overline{N_B}(D) = \bigcup_{s=1}^{L} \overline{N_B}(Y_s)$$
$$\underline{N_B}(D) = \bigcup_{s=1}^{L} \underline{N_B}(Y_s)$$
In the $NDS$, if $B \subseteq C$, then the neighborhood approximation precision of a sample set $X \subseteq U$ relative to $B$ is defined as
$$P_B(X) = \frac{|\underline{N_B}(X)|}{|\overline{N_B}(X)|}$$
In the $NDS$, if $B \subseteq C$ and $U/D = \{Y_1, Y_2, \ldots, Y_L\}$, then the neighborhood approximation precision of $D$ relative to $B$ is defined as
$$P_B(D) = \frac{|\underline{N_B}(D)|}{|\overline{N_B}(D)|}$$
$P_B(D)$ describes the completeness of knowledge about a set; it considers the influence of attributes in the neighborhood decision system on definable subsets and corresponds to the algebraic view of the neighborhood decision system [18].
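As an illustration of these constructions (a minimal Python/NumPy sketch rather than the authors' implementation; the data matrix, labels and radius are hypothetical), the neighborhood granules, the lower and upper approximations of $D$ and the neighborhood approximation precision $P_B(D)$ can be computed as follows.

```python
import numpy as np

def neighborhood_granules(X, delta, p=2):
    """n_B^delta(x_i): samples within Minkowski distance delta of x_i on attributes B."""
    granules = []
    for i in range(X.shape[0]):
        d = np.sum(np.abs(X - X[i]) ** p, axis=1) ** (1.0 / p)
        granules.append(set(np.where(d <= delta)[0]))
    return granules

def approximate_precision(granules, y):
    """P_B(D) = |lower approximation of D| / |upper approximation of D|."""
    lower, upper = set(), set()
    for label in set(y):
        Y = {i for i, c in enumerate(y) if c == label}
        for i, g in enumerate(granules):
            if g <= Y:           # lower: granule entirely inside the decision class
                lower.add(i)
            if g & Y:            # upper: granule intersects the decision class
                upper.add(i)
    return len(lower) / len(upper)

# Hypothetical continuous data restricted to an attribute subset B, with binary decisions
X_B = np.array([[0.12, 0.61], [0.21, 0.14], [0.31, 0.26], [0.61, 0.23]])
y = np.array(["Y", "Y", "N", "N"])
granules = neighborhood_granules(X_B, delta=0.3)
print(granules, approximate_precision(granules, y))
```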

3. Feature Selection Algorithm Design

This part first defines the neighborhood credibility and neighborhood coverage. Second, some uncertainty measures of neighborhood information entropy are studied, and the relationship between the measures is derived. Then, using the information theory view and algebraic view in the neighborhood decision system, a heuristic non-monotonic feature selection algorithm is designed. The following introduces related concepts and their properties.

3.1. Neighborhood Credibility and Neighborhood Coverage

In the $NDS$, if $B \subseteq C$, $U/B = \{X_1, X_2, \ldots, X_K\}$ and $U/D = \{Y_1, Y_2, \ldots, Y_L\}$, then the credibility $\alpha_{ij}$ and coverage $\kappa_{ij}$ [18] are respectively defined as
$$\alpha_{ij} = \frac{|X_i \cap Y_j|}{|X_i|}$$
$$\kappa_{ij} = \frac{|X_i \cap Y_j|}{|Y_j|}$$
where $i = 1, 2, \ldots, K$ and $j = 1, 2, \ldots, L$. Credibility and coverage reflect the classification ability of condition attributes relative to decision attributes; condition attributes with higher credibility and coverage are more important for the decision attribute [22].
Definition 1.
In the $NDS$, if $B \subseteq C$, then the joint neighborhood information granule of $x_i \in U$ is defined as
$$n_{B,D}(x_i) = n_B^\delta(x_i) \cup [x_i]_D$$
$n_{B,D}(x_i)$ combines the neighborhood information granule $n_B^\delta(x_i)$ with the decision equivalence class $[x_i]_D$; it reflects the amount of class information more accurately when the classes within $n_B^\delta(x_i)$ are distributed differently, and the amount of class information provided is embodied in the number of elements of $n_{B,D}(x_i)$. Therefore, $n_{B,D}(x_i)$ can accurately reflect the decision information.
Definition 2.
In the $NDS$, if $B \subseteq C$, then the neighborhood credibility $n\alpha_i$ and neighborhood coverage $n\kappa_i$ of $x_i \in U$ are respectively defined as
$$n\alpha_i = \frac{|n_B^\delta(x_i) \cap [x_i]_D|}{|n_{B,D}(x_i)|}$$
$$n\kappa_i = \frac{|n_B^\delta(x_i) \cap [x_i]_D|}{|[x_i]_D|}$$
$n\alpha_i$ and $n\kappa_i$ respectively use the joint neighborhood information granule and the decision equivalence class to describe the credibility and coverage in the neighborhood decision system, making full use of the decision information provided by the decision system.
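A small sketch of Definitions 1 and 2 (illustrative Python; the granule and decision class used at the end are hypothetical) shows how the joint granule, the neighborhood credibility and the neighborhood coverage are obtained for a single sample.

```python
def joint_granule(nbr, dec):
    """n_{B,D}(x_i) = n_B^delta(x_i) ∪ [x_i]_D (Definition 1)."""
    return nbr | dec

def neighborhood_credibility(nbr, dec):
    """n_alpha_i = |n_B^delta(x_i) ∩ [x_i]_D| / |n_{B,D}(x_i)| (Definition 2)."""
    return len(nbr & dec) / len(joint_granule(nbr, dec))

def neighborhood_coverage(nbr, dec):
    """n_kappa_i = |n_B^delta(x_i) ∩ [x_i]_D| / |[x_i]_D| (Definition 2)."""
    return len(nbr & dec) / len(dec)

# Hypothetical neighborhood granule and decision class for one sample x_i
nbr = {0, 1, 2}      # n_B^delta(x_i)
dec = {0, 1}         # [x_i]_D
print(neighborhood_credibility(nbr, dec), neighborhood_coverage(nbr, dec))  # 2/3, 1.0
```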

3.2. Uncertainty Measures of Neighborhood Information Entropy

In the $NDS$, if $B \subseteq C$, then the neighborhood entropy [34] of $x_i \in U$ is defined as
$$H_\delta^{x_i}(B) = -\log \frac{|n_B^\delta(x_i)|}{|U|}$$
In the $NDS$, if $B \subseteq C$, then the average neighborhood entropy [34] is defined as
$$H_\delta(B) = \frac{1}{|U|} \sum_{i=1}^{|U|} H_\delta^{x_i}(B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i)|}{|U|}$$
Definition 3.
In the $NDS$, if $B \subseteq C$, then the new neighborhood entropy of $x_i \in U$ is defined as
$$H_\delta^{x_i}(B) = -\log \frac{|n_B^\delta(x_i)|}{|n_{B,D}(x_i)|}$$
Definition 4.
In the $NDS$, if $B \subseteq C$, then the new average neighborhood entropy is defined as
$$H_\delta(B) = \frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} H_\delta^{x_i}(B) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i)|}{|n_{B,D}(x_i)|}$$
The new average neighborhood entropy $H_\delta(B)$ introduces the joint neighborhood information granule and the neighborhood approximation precision into the neighborhood entropy, making full use of the decision information in the neighborhood decision system.
Definition 5.
In the $NDS$, if $B \subseteq C$, then the neighborhood conditional entropy of $D$ relative to $B$ is defined as
$$H_\delta(D \mid B) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_B^\delta(x_i)| \cdot |[x_i]_D|}$$
Definition 6.
In the $NDS$, if $B \subseteq C$, then the neighborhood joint entropy of $D$ and $B$ is defined as
$$H_\delta(D, B) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_{B,D}(x_i)| \cdot |[x_i]_D|}$$
Theorem 2.
Given the $NDS$, if $B \subseteq C$, then $H_\delta(D, B) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log (n\kappa_i \cdot n\alpha_i)$.
Proof of Theorem 2.
$$H_\delta(D, B) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_{B,D}(x_i)| \cdot |[x_i]_D|} = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \left( \frac{|n_B^\delta(x_i) \cap [x_i]_D|}{|n_{B,D}(x_i)|} \cdot \frac{|n_B^\delta(x_i) \cap [x_i]_D|}{|[x_i]_D|} \right) = -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log (n\alpha_i \cdot n\kappa_i)$$
From Theorem 2, we can see that the definition of neighborhood joint entropy can be derived from neighborhood credibility and neighborhood coverage.   □
Theorem 3.
Given the $NDS$, if $B \subseteq C$, then $H_\delta(D \mid B) = H_\delta(D, B) - H_\delta(B)$.
Proof of Theorem 3.
$$\begin{aligned} H_\delta(D, B) - H_\delta(B) &= -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_{B,D}(x_i)| \cdot |[x_i]_D|} + \frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i)|}{|n_{B,D}(x_i)|} \\ &= -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \left( \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_{B,D}(x_i)| \cdot |[x_i]_D|} \cdot \frac{|n_{B,D}(x_i)|}{|n_B^\delta(x_i)|} \right) \\ &= -\frac{P_B(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_B^\delta(x_i) \cap [x_i]_D|^2}{|n_B^\delta(x_i)| \cdot |[x_i]_D|} \end{aligned}$$
According to Definition 5, $H_\delta(D \mid B) = H_\delta(D, B) - H_\delta(B)$ holds.    □
Sun et al. [18] show that information entropy and its extensions belong to the view under the information theory definition, while the neighborhood approximation precision comes from the view under the algebraic definition. Therefore, Definitions 4–6 measure the uncertainty of knowledge in the neighborhood decision system from both the information theory view and the algebraic view.
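Putting the two views together, the sketch below (illustrative Python, not the authors' code) evaluates Definition 6: the algebraic term $P_B(D)$ scales an information-theoretic average over samples. The inputs `granules`, `dec_classes` and `P_BD` are assumed to come from helpers such as those sketched in Sections 2.2 and 3.1.

```python
import math

def neighborhood_joint_entropy(granules, dec_classes, P_BD, base=10):
    """H_delta(D, B) per Definition 6.

    granules[i]    -- n_B^delta(x_i) as a set of sample indices
    dec_classes[i] -- [x_i]_D as a set of sample indices
    P_BD           -- neighborhood approximation precision P_B(D)
    """
    n = len(granules)
    total = 0.0
    for nbr, dec in zip(granules, dec_classes):
        inter = len(nbr & dec)                    # always >= 1, since x_i is in both sets
        joint = len(nbr | dec)                    # |n_{B,D}(x_i)|
        ratio = inter ** 2 / (joint * len(dec))   # equals n_alpha_i * n_kappa_i (Theorem 2)
        total -= math.log(ratio, base)
    return P_BD / n * total
```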

3.3. Heuristic Non-Monotonic Feature Selection Algorithm Design

The feature selection algorithm that satisfies the monotonicity has the problem that the reduction effect is not good when the classification performance of the original data set is poor. Therefore, based on the uncertainty measures combining algebraic view and information theory view in Section 3.2, a heuristic non-monotonic feature selection algorithm is designed.
Theorem 4.
Given the $NDS$, if $B_1 \subseteq B_2 \subseteq C$, then $H_\delta(D, B)$ is non-monotonic.
Proof of Theorem 4.
Since $B_1 \subseteq B_2$, we know that $n_{B_1}^\delta(x_i) \supseteq n_{B_2}^\delta(x_i)$, so $n_{B_1}^\delta(x_i) \cap [x_i]_D \supseteq n_{B_2}^\delta(x_i) \cap [x_i]_D$, $n_{B_1}^\delta(x_i) \cup [x_i]_D \supseteq n_{B_2}^\delta(x_i) \cup [x_i]_D$ and $|n_{B_1,D}(x_i)| \ge |n_{B_2,D}(x_i)|$ from Equation (5). It follows that the numerical relationship between $\frac{|n_{B_1}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_1,D}(x_i)|}$ and $\frac{|n_{B_2}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_2,D}(x_i)|}$ is not determined, so the relationship between $-\frac{1}{|U|}\sum_{i=1}^{|U|}\log\frac{|n_{B_1}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_1,D}(x_i)| \cdot |[x_i]_D|}$ and $-\frac{1}{|U|}\sum_{i=1}^{|U|}\log\frac{|n_{B_2}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_2,D}(x_i)| \cdot |[x_i]_D|}$ is unknown. According to Equations (9), (10) and (12), we obtain $P_{B_1}(D) \le P_{B_2}(D)$, so the relationship between $-\frac{P_{B_1}(D)}{|U|}\sum_{i=1}^{|U|}\log\frac{|n_{B_1}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_1,D}(x_i)| \cdot |[x_i]_D|}$ and $-\frac{P_{B_2}(D)}{|U|}\sum_{i=1}^{|U|}\log\frac{|n_{B_2}^\delta(x_i) \cap [x_i]_D|^2}{|n_{B_2,D}(x_i)| \cdot |[x_i]_D|}$ is uncertain. According to Equation (23), Theorem 4 holds.    □
Definition 7.
In the $NDS$, if $B \subseteq C$ and attribute $b \in B$ satisfies $H_\delta(D, B) \le H_\delta(D, B - \{b\})$, then attribute $b$ is said to be redundant with respect to $D$; otherwise, attribute $b$ is indispensable for $D$. If $B$ satisfies the following conditions, then $B$ is called a feature subset of $C$:
(1) $H_\delta(D, B) \ge H_\delta(D, C)$;
(2) $H_\delta(D, B) > H_\delta(D, B - \{b\})$ for any $b \in B$.
Definition 8.
In the $NDS$, if $B \subseteq C$, then the importance of attribute $b \in C - B$ is defined as
$$Sig(b, B, D) = H_\delta(D, B \cup \{b\}) - H_\delta(D, B)$$
When $B = \emptyset$, $Sig(b, B, D) = H_\delta(D, \{b\})$. The larger $Sig(b, B, D)$ is, the more important $b$ is. From a numerical point of view, finding an optimal feature subset amounts to finding the $B$ that maximizes $H_\delta(D, B)$.
To accurately reflect the decision information and eliminate redundant features, a heuristic non-monotonic feature selection algorithm based on neighborhood joint entropy (BONJE) is designed. The implementation steps of this algorithm are shown in Algorithm 1.
Algorithm 1: BONJE Algorithm Steps.
Input: Given the $NDS$
Output: A feature subset $B$
1. Initialize $B = Agent = \emptyset$, $H_\delta(D, B) = 0$
2. While $Sig(b, B, D) > 0$ for some $b \in C - B$ do
3.  Let $H = 0$
4.  for any $b \in C - B$ do
5.    Calculate $H_\delta(D, B \cup \{b\})$
6.    if $H_\delta(D, B \cup \{b\}) > H$ then
7.      Let $Agent = B \cup \{b\}$ and $H = H_\delta(D, B \cup \{b\})$
8.    end if
9.  end for
10.  Let $B = Agent$
11. end while
12. return A feature subset B
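A compact sketch of the greedy loop in Algorithm 1 (illustrative Python; the callable `joint_entropy`, which evaluates $H_\delta(D, B)$ as in Definition 6, is an assumed input rather than part of the paper).

```python
def bonje(C, joint_entropy):
    """Greedy forward selection maximizing the neighborhood joint entropy H_delta(D, B).

    C             -- iterable of candidate attributes
    joint_entropy -- callable: attribute subset (frozenset) -> H_delta(D, B)
    """
    B = frozenset()
    current = 0.0                     # H_delta(D, emptyset) initialized to 0 (step 1)
    while True:
        # Pick the attribute whose addition gives the largest joint entropy (steps 3-9)
        best_b, best_H = None, current
        for b in set(C) - B:
            H_new = joint_entropy(B | {b})
            if H_new > best_H:
                best_b, best_H = b, H_new
        if best_b is None:            # Sig(b, B, D) <= 0 for every remaining b: stop
            return B
        B, current = B | {best_b}, best_H   # step 10: B = Agent
```

The loop stops as soon as no remaining attribute increases $H_\delta(D, B)$, which corresponds to the stopping condition illustrated in Example 1 below.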
To facilitate the understanding of the specific calculation steps of the algorithm, an example is given below.
Example 1.
An $NDS = (U, C, D, \delta)$ is given in Table 1, where $U = \{x_1, x_2, x_3, x_4\}$ is the universe, $C = \{a, b, c\}$ is the conditional attribute set, $D = \{d\}$ is the decision attribute, and the neighborhood radius parameter is $\delta = 0.3$.
Let the initial feature subset be $B = \emptyset$; the base of the logarithm is 10 and results are kept to three decimal places. In the Minkowski distance function of Section 2.2, $p = 2$ is used.
From Equation (6), we know that $[x_1]_D = \{x_1, x_2\}$, $[x_2]_D = \{x_1, x_2\}$, $[x_3]_D = \{x_3, x_4\}$, $[x_4]_D = \{x_3, x_4\}$.
When $B = \{a\}$, the distances between the samples are as follows: $MD_a(x_1, x_1) = 0 \le \delta$, $MD_a(x_1, x_2) = 0.09 \le \delta$, $MD_a(x_1, x_3) = 0.19 \le \delta$, $MD_a(x_1, x_4) = 0.49 > \delta$, $MD_a(x_2, x_3) = 0.1 \le \delta$, $MD_a(x_2, x_4) = 0.4 > \delta$, $MD_a(x_3, x_4) = 0.3 \le \delta$.
According to Equation (5), we obtain $n_a^\delta(x_1) = \{x_1, x_2, x_3\}$, $n_a^\delta(x_2) = \{x_1, x_2, x_3\}$, $n_a^\delta(x_3) = \{x_1, x_2, x_3, x_4\}$, $n_a^\delta(x_4) = \{x_3, x_4\}$.
From Equation (15) we know that $n_{a,D}(x_1) = n_a^\delta(x_1) \cup [x_1]_D = \{x_1, x_2, x_3\}$, $n_{a,D}(x_2) = n_a^\delta(x_2) \cup [x_2]_D = \{x_1, x_2, x_3\}$, $n_{a,D}(x_3) = n_a^\delta(x_3) \cup [x_3]_D = \{x_1, x_2, x_3, x_4\}$, $n_{a,D}(x_4) = n_a^\delta(x_4) \cup [x_4]_D = \{x_3, x_4\}$.
From Equations (9), (10) and (12), we obtain $\overline{N_a}(D) = \{x_1, x_2, x_3, x_4\}$, $\underline{N_a}(D) = \{x_4\}$ and $P_a(D) = \frac{|\underline{N_a}(D)|}{|\overline{N_a}(D)|} = \frac{1}{4}$, respectively.
According to Equation (23), we obtain
$$H_\delta(D, \{a\}) = -\frac{P_a(D)}{|U|} \sum_{i=1}^{|U|} \log \frac{|n_a^\delta(x_i) \cap [x_i]_D|^2}{|n_{a,D}(x_i)| \cdot |[x_i]_D|} = -\frac{1/4}{4} \left( \log \frac{2^2}{3 \times 2} + \log \frac{2^2}{3 \times 2} + \log \frac{2^2}{4 \times 2} + \log \frac{2^2}{2 \times 2} \right) = 0.041$$
Similarly, $H_\delta(D, \{b\}) = 0$, $H_\delta(D, \{c\}) = 0.116$, $H_\delta(D, \{a, b\}) = 0.195$, $H_\delta(D, \{a, c\}) = 0.345$, $H_\delta(D, \{b, c\}) = 0.116$, $H_\delta(D, \{a, b, c\}) = 0.345$.
From these results, $H_\delta(D, \{b\}) < H_\delta(D, \{a\}) < H_\delta(D, \{c\})$, so $c$ is added to $B$. Since $H_\delta(D, \{c\}) = H_\delta(D, \{b, c\}) < H_\delta(D, \{a, c\})$, $a$ is then added to $B$. Finally, $H_\delta(D, \{a, b, c\}) = H_\delta(D, \{a, c\})$ meets the stopping condition, so $B = \{a, c\}$ is the optimal feature subset.
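For reference, the following short script (an illustrative Python check, not the authors' code) reproduces $H_\delta(D, \{a\}) \approx 0.041$ directly from the Table 1 data.

```python
import math

# Table 1 data: attribute a and decision d
a = {"x1": 0.12, "x2": 0.21, "x3": 0.31, "x4": 0.61}
d = {"x1": "Y", "x2": "Y", "x3": "N", "x4": "N"}
delta, U = 0.3, list(a)

# Neighborhood granules under B = {a} (single-attribute Minkowski distance reduces to
# the absolute difference) and the decision equivalence classes
nbr = {x: {y for y in U if abs(a[x] - a[y]) <= delta} for x in U}
dec = {x: {y for y in U if d[y] == d[x]} for x in U}

# Neighborhood approximation precision P_a(D) from the lower/upper approximations of D
lower = {x for x in U if any(nbr[x] <= dec[y] for y in U)}
upper = {x for x in U if any(nbr[x] & dec[y] for y in U)}
P = len(lower) / len(upper)                                  # 1/4

# Neighborhood joint entropy H_delta(D, {a}) with a base-10 logarithm
H = -P / len(U) * sum(
    math.log10(len(nbr[x] & dec[x]) ** 2 / (len(nbr[x] | dec[x]) * len(dec[x])))
    for x in U
)
print(round(H, 3))  # 0.041
```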

4. Experiment and Analysis

This part uses the BONJE algorithm to select the appropriate neighborhood radius for different data sets and designs different comparative experiments to prove the efficiency of the BONJE algorithm in feature selection.

4.1. Experimental Data Introduction

To verify the efficiency of the BONJE algorithm in feature selection, this experiment selects nine data sets with different dimensions as the experimental objects, including 4 low-dimensional data sets (Wine, WDBC, WPBC, Ionosphere) and 5 high-dimensional data sets (Colon, SRBCT, DLBCL, Leukemia, Lung). The specific data of each data set is shown in Table 2.
The Wine, WDBC (Wisconsin Diagnostic Breast Cancer), WPBC (Wisconsin Prognostic Breast Cancer) and Ionosphere data sets are downloaded from https://archive.ics.uci.edu/ml/datasets.html (accessed on 31 May 2021). The Colon data set is downloaded from http://eps.upo.es/bigs/datasets.html (accessed on 31 May 2021). The SRBCT (Small Round Blue Cell Tumor), DLBCL (Diffuse Large B Cell Lymphoma) and Leukemia data sets are downloaded from http://www.gems-system.org (accessed on 31 May 2021). The Lung data set is downloaded from http://bioinformatics.rutgers.ed/Static/Supplemens/CompCancer/datasets (accessed on 31 May 2021).

4.2. Experimental Environment

The experiments in this paper are performed on a personal computer with Microsoft Windows 10 Professional Edition (64-bit), an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz (3192 MHz) and 16.00 GB RAM. The simulation experiments are implemented on the IntelliJ IDEA 2020.1.2 platform using Java version 1.8.0_144. The C4.5, SVM (support vector machine) and KNN (k-nearest neighbors) classifiers in the Weka software are used to verify the classification accuracy of the selected feature subsets, where the SVM uses PolyKernel as the kernel function and KNN sets K = 3. In order to reduce the generalization error, all three classifiers adopt ten-fold cross-validation to obtain the final classification accuracy.
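The evaluation protocol can be mirrored outside Weka. The sketch below (illustrative Python with scikit-learn, whereas the paper uses Weka's C4.5, PolyKernel SVM and KNN with K = 3; the `DecisionTreeClassifier` is a CART stand-in for C4.5, and the column indices in the commented usage line are hypothetical) estimates the ten-fold cross-validation accuracy of a selected feature subset.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier  # CART stand-in for Weka's C4.5

def evaluate_subset(X, y, selected, folds=10):
    """Ten-fold CV accuracy of KNN (k=3), a polynomial-kernel SVM and a decision tree
    on the feature columns listed in `selected`."""
    Xs = X[:, selected]
    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "SVM": SVC(kernel="poly"),
        "Tree": DecisionTreeClassifier(),
    }
    return {name: cross_val_score(clf, Xs, y, cv=folds).mean()
            for name, clf in classifiers.items()}

# Hypothetical usage with a data matrix X and labels y, e.g. the Wine subset of Table 4:
# print(evaluate_subset(X, y, selected=[10, 13, 8, 12, 1, 3, 4]))
```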

4.3. Neighborhood Radius Selection

Since the neighborhood radius affects the granularity of neighborhood information, and thus the neighborhood joint entropy, it is very important to choose a proper neighborhood radius. In order to unify the value of the neighborhood radius, eliminate differences in dimension and make each feature be treated equally by the classifier, this experiment first normalizes the data with $\frac{x - Min}{Max - Min}$, and then the neighborhood radius is varied over [0.05, 1] with a step of 0.05. The number of selected features and the average classification accuracy of the three classifiers under the different neighborhood radii are shown in Figure 1.
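A minimal sketch of this search procedure (illustrative Python/NumPy; `select_and_score` is an assumed wrapper that runs BONJE with a given radius and returns the selected subset and its average cross-validated accuracy).

```python
import numpy as np

def min_max_normalize(X):
    """Map each feature to [0, 1]: (x - Min) / (Max - Min)."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def radius_grid_search(X, y, select_and_score):
    """Try delta in [0.05, 1] with step 0.05 and record, for each radius,
    the number of selected features and the average classification accuracy."""
    results = {}
    for delta in np.arange(0.05, 1.0001, 0.05):
        delta = round(float(delta), 2)
        subset, acc = select_and_score(X, y, delta)
        results[delta] = (len(subset), acc)
    return results  # inspect feature counts and accuracies, as in Figure 1
```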
For Wine data set in Figure 1a, as the neighborhood radius value increases, the number of selected features increases sharply. The number of selected features is small when the neighborhood radius value is in the interval [0.05, 0.15] and the average classification accuracy reaches the highest when δ = 0.1 in this interval. Similar to Wine data set, the δ values of WDBC and WPBC data sets are set to 0.05 and 0.1, respectively. For Ionosphere data set in Figure 1d, the average classification accuracy is higher when the neighborhood radius value is in the interval [0.05, 0.2] and the number of selected features is the least when δ = 0.05 in this interval. For Colon data set in Figure 1e, the change trend of the average classification accuracy is obvious. The number of selected features is small, and the classification accuracy is higher when δ = 0.25. Similar to Colon data set, the δ values of SRBCT, DLBCL, Leukemia, and Lung data sets can be set to 0.15, 0.3, 0.3, and 0.45, respectively. Therefore, the neighborhood radius values of the 9 data sets should be within [0.05, 0.45].

4.4. Classification Results of the BONJE Algorithm

This part of the experiment compares the classification accuracy and the number of features between the original data and the feature subset selected by the BONJE algorithm. The comparison results are shown in Table 3, and the neighborhood radii selected for the different data sets are listed in the last column. In addition, the feature subsets selected by the BONJE algorithm for the different data sets are shown in Table 4. Please note that boldface indicates the better value in the comparison data.
From the comparison of average classification accuracy in Table 3, it can be seen that the average classification accuracy of the BONJE algorithm on the Wine, WDBC, and Ionosphere data sets is slightly lower than the original data by 0.2%, 0.2%, and 0.8%, respectively. The accuracy loss caused by the BONJE algorithm is controlled within 1%, which shows that the BONJE algorithm maintains the classification accuracy of the original data. The average classification accuracy of the BONJE algorithm on the WPBC, Colon, SRBCT, DLBCL, Leukemia, and Lung data sets is higher than the original data by 1.5%, 4.8%, 3.7%, 7.4%, 7.4%, 2.5%, respectively, which indicates that the BONJE algorithm eliminates many redundant features and improves the classification accuracy of the data set. From the comparison of feature number in Table 3, it can be seen that BONJE algorithm can delete redundant features without reducing the classification accuracy, especially in high-dimensional data sets. In summary, the BONJE algorithm can effectively select the optimal feature subset, and the feature selection result can maintain or improve the classification ability of the data set.

4.5. The Performance of BONJE Algorithm on Low-Dimensional Data Sets

This part of the experiment compares the BONJE algorithm with four other advanced feature selection algorithms in the low-dimensional data set from the perspective of the number of selected features and the classification accuracy of KNN and SVM classifiers. The four advanced feature selection algorithms are: (1) Classic Rough Set Algorithm (RS) [1], (2) Neighborhood Rough Set Algorithm (NRS) [40], (3) Covering Decision Algorithm (CDA) [41], (4) Maximum Decision Neighborhood Rough Set Algorithm (MDNRS) [35]. Table 5, Table 6 and Table 7 show the experimental results of five different feature selection algorithms.
Comprehensive analyses of Table 5, Table 6 and Table 7 show the following. For the Wine data set, the CDA algorithm selects the fewest features, but its KNN and SVM classification accuracy is far lower than that of the BONJE algorithm, by 23.4% and 31.8% respectively, which indicates that the CDA algorithm loses features carrying important information during selection. For the WDBC data set, although the BONJE algorithm selects more features than the other algorithms, its classification accuracy under both classifiers is higher. For the WPBC data set, the NRS and CDA algorithms select the fewest features, but their classification accuracy under both classifiers is lower than that of the BONJE algorithm. For the Ionosphere data set, the classification accuracy of the BONJE algorithm is relatively high compared with the other algorithms, and it selects fewer features than the RS algorithm. In general, the number of features selected by the BONJE algorithm is moderate, and it achieves the highest average classification accuracy under both classifiers, which shows that the BONJE algorithm has a stable reduction ability and can improve the classification accuracy of low-dimensional data sets.

4.6. The Performance of BONJE Algorithm on High-Dimensional Data Sets

This part of the experiment compares the BONJE algorithm with four other advanced entropy-based feature selection algorithms on different high-dimensional data sets. The four entropy-based feature selection algorithms are: (1) the mutual entropy-based attribute reduction algorithm (MEAR) [42], (2) the entropy gain-based gene selection algorithm (EGGS) [17], (3) the EGGS algorithm combined with the Fisher score (EGGS-FS) [29], (4) the feature selection algorithm with the Fisher score based on decision neighborhood entropy (FSDNE) [18]. Table 8, Table 9, Table 10, Table 11 and Table 12 show the experimental results of the five different entropy-based feature selection algorithms.
As shown in Table 8, the KNN classification accuracy and C4.5 classification accuracy of the BONJE algorithm are better than other algorithms. Although the SVM classification accuracy of the BONJE algorithm is slightly lower than that of the first-ranked MEAR algorithm by 0.9%, the average classification accuracy of the BONJE algorithm is much higher than the second-ranked FSDNE algorithm by 3.5%. In general, the BONJE algorithm has excellent performance on the Colon data set.
Table 9 shows that the KNN classification accuracy and C4.5 classification accuracy of the BONJE algorithm are better than other algorithms. Although the SVM classification accuracy of the BONJE algorithm is lower than that of the first-ranked FSDNE algorithm by 1.5%, the average classification accuracy of the BONJE algorithm is much higher than the second-ranked FSDNE algorithm by 4.2%. Therefore, BONJE has stable classification performance on the SRBCT data set.
According to the experimental results in Table 10, it can be clearly seen that the KNN classification accuracy, SVM classification accuracy and C4.5 classification accuracy of the BONJE algorithm are better than other algorithms. Compared with the BONJE algorithm, the MEAR and EGGS-FS algorithms select fewer features, but the average classification accuracy of the MEAR and EGGS-FS algorithms is much lower than the BONJE algorithm. Therefore, the BONJE algorithm can delete many redundant features on the DLBCL data set without reducing the data classification ability.
According to the results in Table 11, although the KNN classification accuracy of the BONJE algorithm is lower than that of the FSDNE algorithm, the SVM classification accuracy and C4.5 classification accuracy of the BONJE algorithm are as high as 95.8% and 94.4%, respectively. The average classification accuracy of the BONJE algorithm is 1.5% higher than that of the second-ranked FSDNE algorithm. Therefore, the BONJE algorithm can effectively select feature subsets on the Leukemia data set and improve the classification ability of the data set.
It can be seen from Table 12 that the number of features selected by the BONJE algorithm is relatively high compared with other algorithms, but the BONJE algorithm has the highest average classification accuracy. Therefore, the BONJE algorithm can effectively reduce noise and improve classification accuracy on the Lung data set.
Based on the above experimental results and analyses, the BONJE algorithm can effectively select feature subsets under high-dimensional data, and the feature selection results can improve the classification ability of the data set.

4.7. Comparison of BONJE Algorithm and Multiple Dimensionality Reduction Algorithms

To further verify the reduction performance and classification ability of the BONJE algorithm, this part of the experiment compares the BONJE algorithm with 10 other reduction algorithms from the perspective of the number of selected features and SVM classification accuracy on 3 representative tumor data sets (Colon, Leukemia, Lung). The ten different dimensionality reduction methods are: (1) the neighborhood rough set-based reduction algorithm (NRS) [35], (2) the feature selection algorithm with Fisher linear discriminant (FLD-NRS) [32], (3) the gene selection algorithm based on locally linear embedding (LLE-NRS) [43], (4) the Relief algorithm [44] combined with the NRS algorithm (Relief + NRS) [35], (5) the fuzzy backward feature algorithm (FBFE) [44], (6) the binary differential evolution algorithm (BDE) [2], (7) the sequential forward selection algorithm (SFS) [29], (8) the Spearman's rank correlation coefficient algorithm (SC2) [36], (9) the mutual information maximization algorithm (MIM) [2], (10) the feature selection algorithm with the Fisher score based on decision neighborhood entropy (FSDNE) [18]. Table 13 and Table 14 show the experimental results of the 11 dimensionality reduction algorithms.
According to the results in Table 13 and Table 14, the SVM classification accuracy of the BONJE and LLE-NRS algorithms on the Colon data set is the same and ranks second, but the number of features selected by the LLE-NRS algorithm is twice that of the BONJE algorithm. The SVM classification accuracy of the BONJE algorithm on the Colon data set is lower than that of the FLD-NRS algorithm, but on the Leukemia and Lung data sets it is much higher than that of the FLD-NRS algorithm, by 13% and 10.5% respectively, which shows that the classification performance of the BONJE algorithm is more stable. Although the BDE algorithm selects the fewest features on the Colon data set, its SVM classification accuracy is only 75%, which indicates that the BDE algorithm loses some important features in the process of selecting feature subsets. The SVM classification accuracy of the BONJE algorithm on the Leukemia data set is 0.1% lower than that of the first-ranked SFS algorithm, and the number of features selected by the BONJE algorithm is only one more than that of the SFS algorithm, so the two algorithms have similar performance on the Leukemia data set. Compared with the other algorithms, the number of features selected by the BONJE algorithm on the Lung data set is higher, but its SVM classification accuracy is the highest. In general, the BONJE algorithm is at a medium level in terms of the number of selected features and has the highest average SVM classification accuracy, which shows that the BONJE algorithm has stable dimensionality reduction performance and can select features carrying important classification information.

4.8. Statistical Analyses

To systematically explore the statistical significance of algorithm classification results, this part of the experiment introduces the Friedman statistic test [45] and Nemenyi test [46].
The calculation formula of Friedman statistic test is as follows:
$$\chi_F^2 = \frac{12N}{M(M+1)} \sum_{i=1}^{M} R_i^2 - 3N(M+1)$$
$$F_F = \frac{(N-1)\chi_F^2}{N(M-1) - \chi_F^2}$$
where $M$ is the number of algorithms, $N$ is the number of data sets, and $R_i$ represents the average ranking of the classification accuracy of the $i$-th algorithm over all data sets. $F_F$ follows an F-distribution with $M-1$ and $(M-1)(N-1)$ degrees of freedom.
If the null hypothesis, all algorithms have the same performance, is rejected, it means that the performance of the algorithms is significantly different. Then, the Nemenyi test is used as a post-hoc test for algorithm comparison. If the average ranking difference between the algorithms is greater than the critical distance C D , it means that the algorithm with a high average ranking is better than the algorithm with a low average ranking.
The calculation formula of the critical distance C D is as follows:
$$CD = q_\alpha \sqrt{\frac{M(M+1)}{6N}}$$
where $q_\alpha$ is the critical value of the test and $\alpha$ represents the significance level of the Bonferroni-Dunn test.
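These statistics are straightforward to compute; the sketch below (illustrative Python/NumPy, not the authors' code) takes an $N \times M$ array of per-data-set ranks and a critical value $q_\alpha$ looked up from a Nemenyi critical-values table.

```python
import math
import numpy as np

def friedman_stats(ranks):
    """ranks: N x M array, ranks[d, a] = rank of algorithm a on data set d.
    Returns the Friedman statistic chi_F^2 and the Iman-Davenport statistic F_F."""
    N, M = ranks.shape
    R = ranks.mean(axis=0)                                    # average rank per algorithm
    chi2 = 12 * N / (M * (M + 1)) * np.sum(R ** 2) - 3 * N * (M + 1)
    F = (N - 1) * chi2 / (N * (M - 1) - chi2)
    return chi2, F

def nemenyi_cd(M, N, q_alpha):
    """Critical distance CD = q_alpha * sqrt(M(M+1) / (6N))."""
    return q_alpha * math.sqrt(M * (M + 1) / (6 * N))
```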
According to the classification accuracy results of Table 6 and Table 7 on the low-dimensional data sets, the rankings of the five feature selection algorithms under the KNN and SVM classifiers are shown in Table 15 and Table 16, respectively. Please note that the content in parentheses in all tables is the classification accuracy under the corresponding classifier.
According to the algorithm rankings in Table 15 and Table 16, the two evaluation measurement values (the Friedman statistic $\chi_F^2$ and the Iman-Davenport statistic $F_F$) of the five feature selection algorithms under the KNN and SVM classifiers are shown in Table 17.
When the significance level is $\alpha = 0.1$, the critical value of the Friedman statistic test is $F(4, 12) = 2.480$. It can be seen from Table 17 that the $F_F$ values under the KNN and SVM classifiers are both greater than $F(4, 12)$, so the null hypothesis under the two classifiers is rejected. The Nemenyi test is then used as a post-hoc test to compare the algorithm performance, and the comparison results are shown in Figure 2. It is worth noting that the average ranking of each algorithm is plotted along the axis in the graph, with the best ranking on the left. In particular, when there are thick lines connecting algorithms, their classification capabilities are similar; otherwise, they are regarded as significantly different from each other [47].
It can be clearly seen from Figure 2 that the BONJE algorithm ranks first under both classifiers. Under the KNN classifier, the classification performance of the BONJE, MDNRS, RS and NRS algorithms is similar, and the BONJE algorithm is significantly better than the CDA algorithm. Under the SVM classifier, the classification performance of the BONJE, RS, CDA and MDNRS algorithms is similar, and the BONJE algorithm performs better than the NRS algorithm.
According to the classification accuracy results of Table 8, Table 9, Table 10, Table 11 and Table 12 on high-dimensional data sets, the rankings of the entropy-based feature selection algorithms under the KNN, C4.5 and SVM classifiers are shown in Table 18, Table 19 and Table 20, respectively.
According to the algorithm rankings in Table 18, Table 19 and Table 20, the two evaluation measurement values of the five entropy-based feature selection algorithms under the KNN, SVM, and C4.5 classifiers are shown in Table 21.
When the significance level is $\alpha = 0.1$, the critical value of the Friedman statistic test is $F(4, 16) = 2.333$, so the null hypothesis under the three classifiers is rejected. The Nemenyi test is used as a post-hoc test to compare the performance of the algorithms, and the comparison results are shown in Figure 3.
According to the results in Figure 3, the BONJE algorithm ranks best under all three classifiers. Under the KNN classifier, the classification performance of the BONJE, FSDNE and EGGS-FS algorithms is similar, and the BONJE algorithm is significantly better than the MEAR and EGGS algorithms. Under the SVM classifier, the classification performance of the BONJE, FSDNE and EGGS-FS algorithms is similar, and the BONJE algorithm performs better than the EGGS algorithm. Under the C4.5 classifier, the BONJE algorithm has better classification performance than the EGGS and EGGS-FS algorithms.
According to the classification accuracy results of Table 14 on three representative tumor data sets, the rankings of the 11 dimensionality reduction algorithms under the SVM classifier are shown in Table 22.
According to the rankings in Table 22, for the 11 dimensionality reduction algorithms under the SVM classifier, $\chi_F^2 = 17.0491$ and $F_F = 2.6329$. When the significance level is $\alpha = 0.1$, the critical value of the Friedman statistic test is $F(10, 20) = 1.9367$. Since $F_F = 2.6329$ is greater than $F(10, 20)$, the null hypothesis under the SVM classifier is rejected. The Nemenyi test is used as a post-hoc test to compare the algorithm performance, and the comparison result is shown in Figure 4.
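Substituting $M = 11$, $N = 3$ and $\chi_F^2 = 17.0491$ into the formula for $F_F$ confirms this value:
$$F_F = \frac{(N-1)\chi_F^2}{N(M-1)-\chi_F^2} = \frac{2 \times 17.0491}{3 \times 10 - 17.0491} = \frac{34.0982}{12.9509} \approx 2.6329 > F(10, 20) = 1.9367$$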
Figure 4 shows that the dimensionality reduction effect of the BONJE algorithm is significantly better than that of the NRS algorithm. In addition, the BONJE algorithm has the highest ranking, which shows that it has stable classification performance compared with the other algorithms.
In general, the classification results of the BONJE algorithm on different data sets are significantly better than those of the other algorithms, which shows from a statistical point of view that the classification performance of the BONJE algorithm is more stable and efficient.

5. Conclusions

Since the classification performance of many feature selection algorithms based on rough set theory and its extensions is not ideal, this paper proposes a feature selection algorithm combining the information theory view and the algebraic view in the neighborhood decision system to deal with redundant features and noise in data. First, several uncertainty measures of the neighborhood information entropy are studied to measure the uncertainty of knowledge in the neighborhood decision system. In addition, credibility and coverage are introduced into the neighborhood decision system, and the neighborhood credibility and neighborhood coverage are then defined and introduced into the neighborhood joint entropy. Finally, based on the information theory view and algebraic view in the neighborhood decision system, a heuristic non-monotonic feature selection algorithm is proposed. A series of comparative experiments and statistical analyses on four low-dimensional data sets and five high-dimensional data sets shows that the algorithm can effectively remove redundant features and select the optimal feature subset. Since the BONJE algorithm needs to frequently calculate the neighborhood information granules of all samples, it has a high time complexity when processing high-dimensional data. Moreover, the BONJE algorithm cannot completely balance the classification level of the selected feature subsets. In future work, it is necessary to study more effective search strategies and uncertainty evaluation criteria to reduce the time complexity and classification error of the algorithm.

Author Contributions

Conceptualization, J.X.; Methodology, K.Q.; Software, K.Q.; Formal analysis, J.Y. and M.Y.; Writing—original draft preparation, K.Q.; Writing—review and editing, J.Y. and M.Y.; Visualization, J.X. and M.Y.; Project administration, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant (61976082, 61976120, 62002103), and in part by the Key Scientific and Technological Projects of Henan Province under Grant 202102210165.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pawlak, Z. Rough sets and intelligent data analysis. Inf. Sci. 2002, 147, 1–12. [Google Scholar] [CrossRef] [Green Version]
  2. Sun, L.; Zhang, X.Y.; Xu, J.C.; Zhang, S.G. An Attribute Reduction Method Using Neighborhood Entropy Measures in Neighborhood Rough Sets. Entropy 2019, 21, 155. [Google Scholar] [CrossRef] [Green Version]
  3. Zhao, R.Y.; Zhang, H.; Li, C.L. Research on Discretization Model of Continuous Attributes of Rough Sets and Analysis of Main Points of Application. Comput. Eng. Appl. 2005, 41, 40–42. [Google Scholar]
  4. Shu, W.H.; Qian, W.B. Incremental feature selection for dynamic hybrid data using neighborhood rough set. Knowl. Based Syst. 2020. [Google Scholar] [CrossRef]
  5. Sun, L.; Wang, L.Y. Neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures in incomplete neighborhood decision systems. Knowl. Based Syst. 2020, 192, 105373.1–105373.17. [Google Scholar] [CrossRef]
  6. Wang, C.Z.; Huang, Y. Feature Selection Based on Neighborhood Self-Information. IEEE Trans. Cybern. 2020, 50, 4031–4042. [Google Scholar] [CrossRef]
  7. Miao, D.Q. Discretization of continuous attributes in rough set theory. Acta Autom. Sin. 2001, 27, 296–302. [Google Scholar]
  8. Wang, C.Z.; Shi, Y.P.; Fan, X.D.; Shao, M.W. Attribute reduction based on k-nearest neighborhood rough sets. Int. J. Approx. Reason. 2019, 106, 18–31. [Google Scholar] [CrossRef]
  9. Chen, Y.M.; Qin, N.; Li, W.; Xu, F.F. Granule structures, distances and measures in neighborhood systems. Knowl. Based Syst. 2019, 165, 268–281. [Google Scholar] [CrossRef]
  10. Yao, Y.Y. Relational interpretations of neighborhood operators and rough set approximation operators. Inf. Sci. 1998, 111, 239–259. [Google Scholar] [CrossRef] [Green Version]
  11. Hu, Q.H.; Yu, D.R.; Liu, J.F. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594. [Google Scholar] [CrossRef]
  12. Sun, L.; Wang, W.; Xu, J.C.; Zhang, S.G. Improved LLE and neighborhood rough sets-based gene selection using Lebesgue measure for cancer classification on gene expression data. J. Intell. Fuzzy Syst. 2019, 37, 5731–5742. [Google Scholar] [CrossRef]
  13. Sahlol, A.T.; Kim, S. Handwritten Arabic Optical Character Recognition Approach Based on Hybrid Whale Optimization Algorithm With Neighborhood Rough Set. IEEE Access 2020, 8, 23011–23021. [Google Scholar] [CrossRef]
  14. Feng, L.; Li, C.; Chen, L. Facial expression feature selection method based on neighborhood rough set and quantum genetic algorithm. J. Hefei Univ. Technol. 2013, 36, 39–42. [Google Scholar]
  15. Wong, S.K.M.; Ziarko, W. On optimal decision rules in decision tables. Bull. Pol. Acad. Sci. Math. 1985, 33, 693–696. [Google Scholar]
  16. Jiang, Z.H.; Liu, K.Y.; Yang, X.B.; Yu, H.L.; Fujitac, H.; Qian, Y.H. Accelerator for supervised neighborhood based attribute reduction. Int. J. Approx. Reason. 2020, 119, 122–150. [Google Scholar] [CrossRef]
  17. Chen, Y.M.; Zhang, Z.J.; Zheng, J.Z.; Ma, Y.; Xue, Y. Gene selection for tumor classification using neighborhood rough sets and entropy measures. J. Biomed. Inform. 2017, 67, 59–68. [Google Scholar] [CrossRef] [PubMed]
  18. Sun, L.; Zhang, X.Y.; Qian, Y.H.; Xu, J.C.; Zhang, S.G. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf. Sci. 2019, 502, 18–41. [Google Scholar] [CrossRef]
  19. Li, J.T.; Dong, W.P.; Meng, D.Y. Grouped gene selection of cancer via adaptive sparse group lasso based on conditional mutual information. IEEE-ACM Trans. Comput. Biol. Bioinform. 2018, 15, 2028–2038. [Google Scholar] [CrossRef]
  20. Wang, C.Z.; Hu, Q.H.; Wang, X.Z.; Chen, D.G.; Qian, Y.H.; Dong, Z. Feature selection based on neighborhood discrimination index. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2986–2999. [Google Scholar] [CrossRef]
  21. Wang, C.Z.; Huang, Y. Attribute reduction with fuzzy rough self-information measures. Inf. Sci. 2021, 549, 68–86. [Google Scholar] [CrossRef]
  22. Tsumoto, S. Accuracy and coverage in rough set rule induction. In Proceedings of the International Conference on Rough Sets and Current Trends in Computing, Malvern, PA, USA, 14–16 October 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 373–380. [Google Scholar]
  23. Xu, J.C.; Wang, Y. Feature genes selection based on fuzzy neighborhood conditional entropy. J. Intell. Fuzzy Syst. 2019, 36, 117–126. [Google Scholar] [CrossRef]
  24. Sun, L.; Wang, L.Y. Feature Selection Using Fuzzy Neighborhood Entropy-Based Uncertainty Measures for Fuzzy Neighborhood Multigranulation Rough Sets. IEEE Trans. Fuzzy Syst. 2021, 29, 19–33. [Google Scholar] [CrossRef]
  25. Sun, L.; Yin, T.Y. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems. Inf. Sci. 2020, 537, 401–424. [Google Scholar] [CrossRef]
  26. Sun, L.; Wang, L.Y. Feature selection using Lebesgue and entropy measures for incomplete neighborhood decision systems. Knowl. Based Syst. 2019, 186, 104942.1–104942.19. [Google Scholar] [CrossRef]
  27. Wang, L.; Ye, J. Matrix method of knowledge granularity calculation and its application in attribute reduction. Comput. Eng. Sci. 2013, 35, 97–102. [Google Scholar]
  28. Wang, L.; Li, T.R. A method of knowledge granularity calculation based on matrix. Pattern Recognit. Artif. Intell. 2013, 26, 447–453. [Google Scholar]
  29. Sun, L.; Zhang, X.Y. Joint neighborhood entropy-based gene selection method with fisher score for tumor classification. Appl. Intell. 2019, 49, 1245–1259. [Google Scholar] [CrossRef]
  30. Miao, D.Q.; Hu, G.R. A heuristic algorithm for knowledge reduction. J. Comput. Res. Dev. 1999, 36, 681–684. [Google Scholar]
  31. Wang, G.Y.; Yang, D.C. Decision table reduction based on conditional information entropy. Chin. J. Comput. 2002, 25, 759–766. [Google Scholar]
  32. Sun, L.; Zhang, X.Y.; Xu, J.C.; Wang, W.; Liu, R.N. A gene selection approach based on the fisher linear discriminant and the neighborhood rough set. Bioengineered 2018, 9, 144–151. [Google Scholar] [CrossRef] [Green Version]
  33. Aziz, R.; Verma, C.K.; Srivastava, N. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data. Genom. Data 2016, 8, 4–15. [Google Scholar] [CrossRef] [Green Version]
  34. Jiang, F.; Sui, Y.F.; Zhou, L. A relative decision entropy-based feature selection approach. Pattern Recognit. 2015, 48, 2151–2163. [Google Scholar] [CrossRef]
  35. Fan, X.D.; Zhao, W.D.; Wang, C.Z.; Huang, Y. Attribute reduction based on max-decision neighborhood rough set model. Knowl. Based Syst. 2018, 151, 16–23. [Google Scholar] [CrossRef]
  36. Xu, J.C.; Mu, H.Y.; Wang, Y.; Huang, F.Z. Feature genes selection using supervised locally linear embedding and correlation coefficient for microarray classification. Comput. Math. Med. 2018, 2018, 1–11. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  37. Tibshirani, R.; Hastie, T.; Narasimhan, B.; Chu, G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 2002, 99, 6567–6572. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Dong, H.B.; Li, T.; Ding, R.; Sun, J. A novel hybrid genetic algorithm with granular information for feature selection and optimization. Appl. Soft Comput. 2018, 65, 33–46. [Google Scholar] [CrossRef]
  39. Sun, S.Q.; Peng, Q.K.; Zhang, X.K. Global feature selection from microarray data using Lagrange multipliers. Knowl. Based Syst. 2016, 110, 267–274. [Google Scholar] [CrossRef]
  40. Yang, X.B.; Zhang, M.; Dou, H.L.; Yang, J.Y. Neighborhood systems-based rough sets in incomplete information system. Knowl. Based Syst. 2011, 24, 858–867. [Google Scholar] [CrossRef]
  41. Yang, J.; Liu, Y.L.; Feng, C.S.; Zhu, G.Q. Applying the Fisher score to identify Alzheimer’s disease-related genes. Genet. Mol. Res. 2016. [Google Scholar] [CrossRef]
  42. Xu, F.F.; Miao, D.Q.; Wei, L. Fuzzy-rough attribute reduction via mutual information with an application to cancer classification. Comput. Math. Appl. 2009, 57, 1010–1017. [Google Scholar] [CrossRef] [Green Version]
  43. Sun, L.; Xu, J.C.; Wang, W.; Yin, Y. Locally linear embedding and neighborhood rough set-based gene selection for gene expression data classification. Genet. Mol. Res. 2016. [Google Scholar] [CrossRef] [PubMed]
  44. Zhang, W.; Chen, J.J. Relief feature selection and parameter optimization for support vector machine based on mixed kernel function. J. Mater. Eng. Perform. 2018, 14, 280–289. [Google Scholar] [CrossRef]
  45. Dunn, Q.J. Multiple comparisons among means. J. Am. Stat. Assoc. 1961, 56, 52–64. [Google Scholar] [CrossRef]
  46. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  47. Lin, Y.J.; Li, Y.W.; Wang, C.X.; Chen, J.K. Attribute reduction for multi-label learning with fuzzy rough set. Knowl. Based Syst. 2018, 152, 51–61. [Google Scholar] [CrossRef]
Figure 1. The number of selected features and average classification accuracy of nine data sets in different neighborhood radii. (a) Wine. (b) WDBC. (c) WPBC. (d) Ionosphere. (e) Colon. (f) SRBCT. (g) DLBCL. (h) Leukemia1. (i) Lung.
Figure 2. The five feature selection algorithms use the Nemenyi test under the two classifiers to compare the classification performance. (a) KNN. (b) SVM.
Figure 3. The five entropy-based feature selection algorithms use the Nemenyi test under the three classifiers to compare the classification performance. (a) KNN. (b) SVM. (c) C4.5.
Figure 4. The 11 dimensionality reduction algorithms use the Nemenyi test under the SVM classifiers to compare the classification performance.
Table 1. $NDS$.
U | a | b | c | d
$x_1$ | 0.12 | 0.41 | 0.61 | Y
$x_2$ | 0.21 | 0.15 | 0.14 | Y
$x_3$ | 0.31 | 0.11 | 0.26 | N
$x_4$ | 0.61 | 0.13 | 0.23 | N
Table 2. Description of the nine data sets.
No. | Data Sets | Features | Samples | Classes | Reference
1 | Wine | 13 | 178 | 3 (59/71/48) | Fan et al. [35]
2 | WDBC | 30 | 569 | 2 (357/212) | Fan et al. [35]
3 | WPBC | 32 | 194 | 2 (46/148) | Fan et al. [35]
4 | Ionosphere | 34 | 351 | 2 (126/225) | Fan et al. [35]
5 | Colon | 2000 | 62 | 2 (22/40) | Xu et al. [36]
6 | SRBCT | 2308 | 63 | 4 (23/8/12/20) | Tibshirani et al. [37]
7 | DLBCL | 5469 | 77 | 2 (58/19) | Wang et al. [20]
8 | Leukemia | 7129 | 72 | 2 (47/25) | Dong et al. [38]
9 | Lung | 12533 | 181 | 2 (31/150) | Sun et al. [39]
Table 3. The classification results of the original data and the data processed by the BONJE algorithm.
Data Sets | Raw Features | Raw KNN | Raw SVM | Raw C4.5 | Raw AVE | BONJE Features | BONJE KNN | BONJE SVM | BONJE C4.5 | BONJE AVE | δ
Wine | 13 | 0.949 | 0.983 | 0.938 | 0.957 | 7 | 0.961 | 0.961 | 0.944 | 0.955 | 0.1
WDBC | 30 | 0.968 | 0.977 | 0.933 | 0.959 | 7 | 0.960 | 0.963 | 0.947 | 0.957 | 0.05
WPBC | 32 | 0.701 | 0.763 | 0.758 | 0.741 | 7 | 0.743 | 0.763 | 0.763 | 0.756 | 0.1
Ionosphere | 34 | 0.866 | 0.886 | 0.915 | 0.889 | 13 | 0.875 | 0.849 | 0.915 | 0.881 | 0.05
Colon | 2000 | 0.758 | 0.855 | 0.823 | 0.812 | 8 | 0.840 | 0.840 | 0.903 | 0.860 | 0.25
SRBCT | 2308 | 0.810 | 0.984 | 0.825 | 0.873 | 5 | 0.921 | 0.921 | 0.889 | 0.910 | 0.15
DLBCL | 5469 | 0.909 | 0.974 | 0.727 | 0.870 | 8 | 0.948 | 0.948 | 0.935 | 0.944 | 0.3
Leukemia | 7129 | 0.833 | 0.986 | 0.792 | 0.870 | 8 | 0.931 | 0.958 | 0.944 | 0.944 | 0.3
Lung | 12533 | 0.939 | 0.994 | 0.950 | 0.961 | 16 | 0.994 | 0.994 | 0.967 | 0.986 | 0.45
Table 4. Feature subsets selected on each data set by the BONJE algorithm.
Data Sets | Feature Subset
Wine | {10, 13, 8, 12, 1, 3, 4}
WDBC | {11, 22, 10, 29, 25, 21, 27}
WPBC | {27, 3, 22, 31, 12, 9, 11}
Ionosphere | {21, 11, 4, 29, 30, 5, 16, 34, 26, 27, 20, 19}
Colon | {1047, 1672, 29, 354, 1037, 11, 734, 625}
SRBCT | {1954, 2240, 879, 1716, 1207}
DLBCL | {856, 4656, 1698, 2651, 3627, 4410, 3139, 2618}
Leukemia | {758, 2267, 6041, 1234, 5503, 6209, 4184, 2295}
Lung | {3916, 5239, 2193, 3389, 8110, 8369, 11272, 2203, 3466, 610, 12262, 2139, 1521, 5858, 3975, 3334}
Table 5. The number of features selected by the five feature selection algorithms on the low-dimensional data sets.
Data Sets | RS | NRS | CDA | MDNRS | BONJE
Wine | 5 | 3 | 2 | 4 | 7
WDBC | 8 | 2 | 2 | 2 | 7
WPBC | 7 | 2 | 2 | 4 | 7
Ionosphere | 17 | 8 | 9 | 8 | 13
AVE | 9.25 | 3.75 | 3.75 | 4.5 | 8.5
Table 6. KNN classification accuracy of five feature selection algorithms on low-dimensional data sets.
Data Sets | RS | NRS | CDA | MDNRS | BONJE
Wine | 0.863 | 0.753 | 0.727 | 0.911 | 0.961
WDBC | 0.911 | 0.923 | 0.923 | 0.930 | 0.960
WPBC | 0.743 | 0.738 | 0.738 | 0.761 | 0.743
Ionosphere | 0.866 | 0.859 | 0.848 | 0.891 | 0.875
AVE | 0.846 | 0.818 | 0.809 | 0.873 | 0.885
Table 7. SVM classification accuracy of five feature selection algorithms on low-dimensional data sets.
Data Sets | RS | NRS | CDA | MDNRS | BONJE
Wine | 0.640 | 0.402 | 0.643 | 0.910 | 0.961
WDBC | 0.589 | 0.595 | 0.595 | 0.861 | 0.963
WPBC | 0.778 | 0.757 | 0.757 | 0.692 | 0.763
Ionosphere | 0.881 | 0.872 | 0.878 | 0.870 | 0.849
AVE | 0.722 | 0.657 | 0.718 | 0.833 | 0.884
Table 8. Experimental results of five entropy-based feature selection algorithms on the Colon data set.
Algorithms | Features | KNN | SVM | C4.5 | AVE
MEAR | 5 | 0.770 | 0.849 | 0.822 | 0.814
EGGS | 11 | 0.649 | 0.556 | 0.646 | 0.617
EGGS-FS | 2 | 0.702 | 0.621 | 0.672 | 0.665
FSDNE | 3 | 0.840 | 0.838 | 0.796 | 0.825
BONJE | 8 | 0.840 | 0.840 | 0.903 | 0.860
Table 9. Experimental results of five entropy-based feature selection algorithms on the SRBCT data set.
Algorithms | Features | KNN | SVM | C4.5 | AVE
MEAR | 1 | 0.389 | 0.364 | 0.365 | 0.373
EGGS | 12 | 0.575 | 0.703 | 0.513 | 0.597
EGGS-FS | 1 | 0.637 | 0.651 | 0.626 | 0.638
FSDNE | 9 | 0.846 | 0.936 | 0.821 | 0.868
BONJE | 5 | 0.921 | 0.921 | 0.889 | 0.910
Table 10. Experimental results of five entropy-based feature selection algorithms on the DLBCL data set.
Algorithms | Features | KNN | SVM | C4.5 | AVE
MEAR | 2 | 0.765 | 0.777 | 0.778 | 0.773
EGGS | 20 | 0.854 | 0.781 | 0.826 | 0.820
EGGS-FS | 3 | 0.870 | 0.841 | 0.801 | 0.837
FSDNE | 11 | 0.946 | 0.927 | 0.903 | 0.925
BONJE | 8 | 0.948 | 0.948 | 0.935 | 0.944
Table 11. Experimental results of five entropy-based feature selection algorithms on the Leukemia data set.
Algorithms | Features | KNN | SVM | C4.5 | AVE
MEAR | 3 | 0.928 | 0.920 | 0.934 | 0.927
EGGS | 8 | 0.629 | 0.802 | 0.733 | 0.721
EGGS-FS | 5 | 0.801 | 0.680 | 0.813 | 0.765
FSDNE | 9 | 0.952 | 0.929 | 0.905 | 0.929
BONJE | 8 | 0.931 | 0.958 | 0.944 | 0.944
Table 12. Experimental results of five entropy-based feature selection algorithms on the Lung data set.
Algorithms | Features | KNN | SVM | C4.5 | AVE
MEAR | 6 | 0.958 | 0.929 | 0.964 | 0.950
EGGS | 12 | 0.859 | 0.960 | 0.966 | 0.928
EGGS-FS | 6 | 0.979 | 0.990 | 0.955 | 0.975
FSDNE | 8 | 0.987 | 0.988 | 0.979 | 0.985
BONJE | 16 | 0.994 | 0.994 | 0.967 | 0.986
Table 13. The number of features selected by 11 dimensionality reduction algorithms.
Algorithms | Colon | Leukemia | Lung | AVE
NRS | 4 | 5 | 3 | 4
FLD-NRS | 6 | 6 | 3 | 5
LLE-NRS | 16 | 22 | 16 | 18
Relief+NRS | 9 | 17 | 23 | 16.33
FBFE | 35 | 30 | 80 | 48.33
BDE | 3 | 7 | 3 | 4.33
SFS | 19 | 7 | 3 | 9.67
SC2 | 4 | 5 | 3 | 4
MIM | 19 | 7 | 3 | 9.67
FSDNE | 3 | 9 | 8 | 6.67
BONJE | 8 | 8 | 16 | 10.67
Table 14. SVM classification accuracy of 11 dimensionality reduction algorithms.
Algorithms | Colon | Leukemia | Lung | AVE
NRS | 0.611 | 0.645 | 0.641 | 0.632
FLD-NRS | 0.880 | 0.828 | 0.889 | 0.866
LLE-NRS | 0.840 | 0.868 | 0.907 | 0.872
Relief+NRS | 0.564 | 0.563 | 0.919 | 0.682
FBFE | 0.833 | 0.912 | 0.852 | 0.866
BDE | 0.750 | 0.824 | 0.980 | 0.851
SFS | 0.521 | 0.959 | 0.833 | 0.771
SC2 | 0.805 | 0.852 | 0.806 | 0.821
MIM | 0.653 | 0.727 | 0.795 | 0.725
FSDNE | 0.828 | 0.928 | 0.988 | 0.915
BONJE | 0.840 | 0.958 | 0.994 | 0.931
Table 15. Classification accuracy ranking of five feature selection algorithms under KNN classifier.
Data Sets | RS | NRS | CDA | MDNRS | BONJE
Wine | 3(0.863) | 4(0.753) | 5(0.727) | 2(0.911) | 1(0.961)
WDBC | 5(0.911) | 3.5(0.923) | 3.5(0.923) | 2(0.930) | 1(0.960)
WPBC | 3(0.740) | 4.5(0.738) | 4.5(0.738) | 1(0.761) | 2(0.743)
Ionosphere | 3(0.866) | 4(0.859) | 5(0.848) | 1(0.891) | 2(0.875)
AVE | 3.5 | 4 | 4.5 | 1.5 | 1.5
Table 16. Classification accuracy ranking of five feature selection algorithms under SVM classifier.
Data Sets | RS | NRS | CDA | MDNRS | BONJE
Wine | 4(0.640) | 5(0.402) | 3(0.643) | 2(0.910) | 1(0.961)
WDBC | 3(0.598) | 4.5(0.595) | 4.5(0.595) | 2(0.861) | 1(0.963)
WPBC | 1(0.778) | 3.5(0.757) | 3.5(0.757) | 5(0.692) | 2(0.763)
Ionosphere | 1(0.881) | 4(0.832) | 3(0.848) | 5(0.830) | 2(0.849)
AVE | 2.25 | 4.25 | 3.5 | 3.5 | 1.5
Table 17. χ_F^2 and F_F of the five feature selection algorithms under the two classifiers.
Statistic | KNN | SVM
χ_F^2 | 12.8 | 7.8
F_F | 12 | 2.8537
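The entries of Table 17 follow directly from the average ranks in Tables 15 and 16: the Friedman statistic (the test of [46]) is χ_F^2 = 12N/(k(k+1)) · (Σ_j R_j^2 − k(k+1)^2/4), and the Iman–Davenport correction is F_F = (N−1)·χ_F^2 / (N(k−1) − χ_F^2), with k = 5 algorithms and N = 4 data sets. A minimal sketch of this computation (the function name is illustrative, not from the paper):

```python
def friedman_stats(avg_ranks, n_datasets):
    """Friedman chi-square and Iman-Davenport F_F from the average ranks of k algorithms."""
    k = len(avg_ranks)
    chi2_f = 12.0 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    f_f = (n_datasets - 1) * chi2_f / (n_datasets * (k - 1) - chi2_f)
    return chi2_f, f_f

# Average ranks from Table 15 (KNN) and Table 16 (SVM), N = 4 data sets:
print(friedman_stats([3.5, 4.0, 4.5, 1.5, 1.5], 4))    # -> (12.8, 12.0), as in Table 17
print(friedman_stats([2.25, 4.25, 3.5, 3.5, 1.5], 4))  # -> (7.8, 2.8537), as in Table 17
```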
Table 18. Classification accuracy ranking of five entropy-based feature selection algorithms under KNN classifier.
Data Sets | MEAR | EGGS | EGGS-FS | FSDNE | BONJE
Colon | 3(0.770) | 5(0.649) | 4(0.702) | 1.5(0.840) | 1.5(0.840)
SRBCT | 5(0.389) | 4(0.575) | 3(0.637) | 2(0.846) | 1(0.921)
DLBCL | 5(0.765) | 4(0.854) | 3(0.870) | 2(0.946) | 1(0.948)
Leukemia | 3(0.928) | 5(0.629) | 4(0.901) | 1(0.952) | 2(0.931)
Lung | 4(0.958) | 5(0.859) | 3(0.979) | 2(0.987) | 1(0.994)
AVE | 4 | 4.6 | 3.4 | 1.7 | 1.3
Table 19. Classification accuracy ranking of five entropy-based feature selection algorithms under SVM classifier.
Data Sets | MEAR | EGGS | EGGS-FS | FSDNE | BONJE
Colon | 1(0.849) | 5(0.556) | 4(0.621) | 3(0.838) | 2(0.840)
SRBCT | 5(0.364) | 3(0.703) | 4(0.651) | 1(0.936) | 2(0.921)
DLBCL | 5(0.777) | 4(0.781) | 3(0.841) | 2(0.927) | 1(0.948)
Leukemia | 3(0.920) | 4(0.802) | 5(0.680) | 2(0.929) | 1(0.958)
Lung | 5(0.929) | 4(0.960) | 3(0.990) | 2(0.988) | 1(0.994)
AVE | 3.8 | 4 | 3.8 | 2 | 1.4
Table 20. Classification accuracy ranking of five entropy-based feature selection algorithms under C4.5 classifier.
Data Sets | MEAR | EGGS | EGGS-FS | FSDNE | BONJE
Colon | 2(0.822) | 5(0.646) | 4(0.672) | 3(0.796) | 1(0.903)
SRBCT | 5(0.365) | 4(0.513) | 3(0.626) | 2(0.821) | 1(0.889)
DLBCL | 5(0.778) | 3(0.826) | 4(0.801) | 2(0.903) | 1(0.935)
Leukemia | 2(0.934) | 5(0.733) | 4(0.813) | 3(0.905) | 1(0.944)
Lung | 4(0.964) | 3(0.966) | 5(0.955) | 1(0.979) | 2(0.967)
AVE | 3.6 | 4 | 4 | 2.2 | 1.2
Table 21. χ_F^2 and F_F of the five entropy-based feature selection algorithms under the three classifiers.
Statistic | KNN | SVM | C4.5
χ_F^2 | 16.6 | 11.68 | 12.48
F_F | 19.5294 | 5.6154 | 6.6383
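The same two formulas, applied with N = 5 data sets to the average ranks in Tables 18–20 (for example, the KNN ranks 4, 4.6, 3.4, 1.7 and 1.3), reproduce the χ_F^2 and F_F values listed in Table 21.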
Table 22. Classification accuracy ranking of eleven dimensionality reduction algorithms under SVM classifier.
Algorithms | Colon | Leukemia | Lung | AVE
NRS | 9(0.611) | 10(0.645) | 11(0.641) | 10
FLD-NRS | 1(0.880) | 7(0.828) | 6(0.889) | 4.67
LLE-NRS | 2.5(0.840) | 5(0.868) | 5(0.907) | 4.17
Relief+NRS | 10(0.564) | 11(0.563) | 4(0.919) | 8.33
FBFE | 4(0.833) | 4(0.912) | 7(0.852) | 5
BDE | 7(0.750) | 8(0.824) | 3(0.980) | 6
SFS | 11(0.521) | 1(0.959) | 8(0.833) | 6.67
SC2 | 6(0.805) | 6(0.852) | 9(0.806) | 7
MIM | 8(0.653) | 9(0.727) | 10(0.795) | 9
FSDNE | 5(0.828) | 3(0.928) | 2(0.988) | 3.33
BONJE | 2.5(0.840) | 2(0.958) | 1(0.994) | 1.83