
A Fast Attribute Reduction Algorithm Based on a Positive Region Sort Ascending Decision Table

Linzi Yin 1 and Zhaohui Jiang 2,*
1 School of Physics and Electronics, Central South University, Changsha 410083, China
2 School of Automation, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(7), 1189; https://doi.org/10.3390/sym12071189
Submission received: 19 June 2020 / Revised: 13 July 2020 / Accepted: 14 July 2020 / Published: 17 July 2020
(This article belongs to the Section Computer)

Abstract

Attribute reduction is one of the challenging problems in rough set theory. To accomplish an efficient reduction algorithm, this paper analyzes the shortcomings of the traditional methods based on attribute significance and suggests a novel reduction approach in which the traditional attribute significance calculation is replaced by a special core attribute calculation. A decision table called the positive region sort ascending decision table (PR-SADT) is defined to optimize some key steps of the novel reduction method, including the special core attribute calculation and the positive region calculation. On this basis, a fast reduction algorithm is presented to obtain a complete positive region reduct. Experimental tests demonstrate that the novel reduction algorithm achieves markedly higher computational efficiency.

1. Introduction

Due to the development of data collection technology, ever more objects and attributes are stored. However, storing and processing all attributes can be computationally expensive and impractical [1]. To address this issue, it is necessary to omit attributes that do not seriously impact the resulting classification (recognition) error, cf. [2]. In rough set theory, an important method for solving this problem is referred to as attribute reduction [3].
Attribute reduction is one of the most important contributions of, and challenges in, rough set theory. It deletes redundant attributes to enhance the efficiency and accuracy of knowledge abstraction technologies, such as pattern recognition, data mining, knowledge discovery, and decision analysis [4,5,6,7,8,9]. In general, classical reduction methods are divided into three types: positive region reduction, boundary region reduction, and entropy-based reduction [10]. The positive region reduction method ignores the discernibility relationship between rough granules [11,12,13,14]. The second type ignores the discernibility relationship between rough granules with the same decision value sets [15]. The third type ignores the discernibility relationship of rough granules with the same information entropy [16,17,18]. By comparison, positive region reduction is the most popular and widely used, especially for dynamic data sets and big data [19,20,21].
At present, a very important challenge in attribute reduction is to design an efficient and complete algorithm. It should be noted that calculating all of the reducts is an NP (non-deterministic polynomial)-hard problem [22]. Therefore, most fast reduction algorithms adopt a heuristic construction to calculate a single reduct. A classical heuristic algorithm first calculates the entire core set and then iterates the following heuristic process until the algorithm terminates: calculate the attribute significances of all the attributes, select the attribute with the greatest significance, alter the object set or discernibility matrix, and proceed to the next iteration.
To accomplish an efficient heuristic reduction algorithm, many techniques have been developed over the last twenty years. In [23,24], the researchers calculate the entire core set by analyzing all of the object pairs, and the time complexity of the core set calculation is O(|U|^2|C|). By using the notion of information granules, several algorithms successfully reduce the object set from |U| to |U/C| and bring the time complexity of the core set calculation down to O(|C||U/C|^2) [25,26]. Xu et al. proposed a fast core set algorithm with a complexity of O(|C||U| + |C|^2|U/C|) [27]. At the same time, many formulas or methods have been proposed to calculate different types of attribute significance. Some classical formulas are designed based on the positive region [28,29,30], entropy [3,16,17,18], the discernibility ability of attributes [13,14,24,31,32], the relationship between attributes [33], etc. In addition, many researchers have proposed mixed formulas by combining rough set theory with other theories, such as fuzzy sets [12], ant colony optimization [23], granular computing [2,6,16,34], etc.
Although the efficiencies of traditional heuristic methods have been optimized by the existing techniques, some problems remain unresolved. First, the computation of attribute significance is inefficient. As a common feature, the attribute significance formula is evaluated (2|C| − |R| + 1) × |R|/2 times if the addition construction is adopted, or (|C| + |R| + 1) × (|C| − |R|)/2 times if the deletion construction is adopted. These repeated significance calculations consume considerable running time. Second, when many attributes have the same significance, one of them is generally selected at random. However, a different choice among equally significant attributes may make a great difference in classification accuracy [35].
To address these problems, this paper proposes a novel heuristic method. It applies a special core attribute calculation to replace the traditional attribute significance calculation. In detail, the new method only iterates the following heuristic process: calculate a single relative core attribute, alter the decision table, and proceed to the next iteration. The new method has a simple structure and three important features. First, it abandons the notion of attribute significance. Second, it calculates only a single core attribute in each heuristic iteration. Third, each condition attribute is checked at most once.
In order to realize the new method efficiently, some definitions and techniques are suggested. First, we define a positive region sort ascending decision table (PR-SADT), shown as Definition 1 and Algorithm 1. Next, a special core calculation algorithm is proposed (shown as Algorithms 2 and 3), which not only calculates a core attribute quickly but also deletes some redundant column data. In addition, the traditional positive region calculation algorithm is optimized based on the PR-SADT (shown as Algorithm 4). These techniques are essential to achieving the fast reduction algorithm given as Algorithm 5.
The remainder of this paper is structured as follows. Some basic concepts are briefly reviewed in Section 2, which include attribute reduction and the positive region. In Section 3, the positive region sort ascending decision table is defined, and some related properties are discussed. In Section 4, we propose the reduction algorithm based on PR-SADT and analyze the advantages of the novel algorithm. Section 5 presents some numerical experiments to validate the efficiency of the proposed algorithm. Finally, we conclude this paper and discuss the outlook for further work in Section 6.

2. Preliminaries

In rough set theory, data are represented in an information table where a set of objects is described by using a finite set of attributes. An information table S is represented as the following tuple:
S = (U, At, {Va | a ∈ At}, {Ia | a ∈ At})
where U is the universe of objects, At is a finite non-empty set of attributes, Va is the set of values of attribute a, and Ia: U → Va is an information function that maps each object of U to exactly one value in Va. As a special type, the information table S is also referred to as a decision table if At = C ∪ D, where C = {a1,a2,…,an} is the condition attribute set and D = {d} is the decision attribute set. The decision table is considered to be inconsistent if two objects with the same condition values have different decision values. For example, Table 1 is a classical inconsistent decision table.
Given a subset of attributes B ⊆ C, a symmetric indiscernibility relation IND(B) is defined as follows: IND(B) = {(x, y) ∈ U × U | ∀a ∈ B, Ia(x) = Ia(y)}. The equivalence class (or granule) of an object x with respect to C is [x]C = {y ∈ U | (x, y) ∈ IND(C)}. The collection of all of the granules with respect to C is referred to as a partition of the universe, which is described as U/C = {[x]C | x ∈ U}. [x]C is exact if it has exactly one decision value; otherwise, it is rough. The union of all of the exact granules with respect to C is referred to as the positive region.
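To make these notions concrete, the following minimal Python sketch (our illustration, not from the original paper) computes U/B and the positive region for a decision table stored as a list of rows, with the decision value in the last column of each row:

    def partition(table, attrs):
        # Group row indices into granules [x]_B by their values on the columns in attrs.
        blocks = {}
        for i, row in enumerate(table):
            blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
        return list(blocks.values())

    def positive_region(table, attrs):
        # Union of the exact granules: blocks whose objects share a single
        # decision value (the decision is the last column of each row).
        pos = []
        for block in partition(table, attrs):
            if len({table[i][-1] for i in block}) == 1:
                pos.extend(block)
        return sorted(pos)

For the 11-object table in Table 1, positive_region(table, range(5)) returns only the index of x11, the single exact granule.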
Given an information table S, an attribute set R is called a reduct if and only if it satisfies the following two conditions:
(1) IND(R) = IND(At);
(2) for any a ∈ R, IND(R − {a}) ≠ IND(At).
A reduct is a subset of attributes that is jointly sufficient and individually necessary to represent the same knowledge as the attribute set C [14]. In general, an information table has several reducts. The set of reducts is referred to as RED(S), and the intersection of all reducts is the core set, described as Core(S) = ∩RED(S). The core attributes are so important that they should be added into the result for the addition and addition–deletion construction methods and should not be deleted in the heuristic steps of the deletion construction method [36].

3. Positive Region Sort Ascending Decision Table and Its Properties

In this section, we define a sort ascending decision table (SADT) and a positive region SADT (PR-SADT) and investigate some of their important properties. These definitions and properties are essential for optimizing the novel attribute reduction algorithm.

3.1. SADT

In general, a data set can be arranged in two ways: sorted ascending or sorted descending. Both are effective for the algorithm proposed in this paper. For convenience, only ascending order is discussed here.
Definition 1.
A decision table S = (U, At = C ∪ {d}, {Va | a ∈ At}, {Ia | a ∈ At}) is referred to as a sort ascending decision table (SADT) if and only if it satisfies the following conditions:
1. ∀xi, xi+1 ∈ U: Ia1(xi) ≤ Ia1(xi+1);
2. ∀xi, xi+1 ∈ U: (xi, xi+1) ∈ IND(Bm) → Iam+1(xi) ≤ Iam+1(xi+1);
3. (xi, xi+1) ∈ IND(C) → Id(xi) ≤ Id(xi+1),
where Bm = {a1,a2,…,am}.
All of the objects in a SADT are sorted based on the ordered condition attribute set C. The default significance order is a1 > a2 > … > a|C|. In real applications, the order of the condition attributes can be adjusted based on prior knowledge. For example, if the test costs of the condition attributes are taken into account, one can place the cheaper attributes earlier in order to calculate a reduct with a lower cost.
A SADT is easily realized by standard sort algorithms [37], such as Bubble Sort, Selection Sort, Insertion Sort, Shell Sort, Merge Sort, Quick Sort, Heap Sort, Counting Sort, Bucket Sort, Radix Sort, etc. However, in order to obtain a fast reduction algorithm, only sort algorithms with a linear time complexity of O(|U||C|), such as Counting Sort, Bucket Sort, and the algorithm in [30], are suggested. Note that designing a fast sort algorithm is beyond the scope of this paper. In MATLAB, we suggest the sortrows function to construct a SADT. The code is listed as follows.
    [m, n] = size(S);
    SADT = sortrows(S, 1:n);
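For readers outside MATLAB, an equivalent construction can be sketched in Python; the sketch below assumes rows are stored as lists with the decision value last, and it relies on Python's built-in comparison sort rather than the linear-time sorts recommended above:

    def build_sadt(table):
        # Lexicographic sort on (a1, ..., a|C|, d), mirroring sortrows(S, 1:n).
        # Timsort costs O(n log n) comparisons; a counting/radix sort over the
        # columns would achieve the O(|U||C|) bound suggested in the text.
        return sorted([list(r) for r in table], key=tuple)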
Based on a SADT, one easily obtains the following properties.
Property 1.
Given an attribute set Bm = {a1,a2,…,am}. If (xi, xj) ∈ IND(Bm) and i < k < j, then (xi, xk) ∈ IND(Bm).
Property 2.
Let U/C = {X1, X2, …, XK} be the partition of a SADT. For any Xi ∈ U/C, we have Xi = {xp+1, xp+2, …, xp+q}, where p = Σ_{j=1}^{i−1} |Xj| and q = |Xi|.
These properties show that the objects in a granule with respect to C or Bm are physically adjacent. It is thus easy to identify repeated objects and U/C.

3.2. PR-SADT

Since only positive region reducts are discussed in this paper, a positive region SADT (PR-SADT) is defined to replace the SADT when the original decision table is inconsistent.
Definition 2.
Given a SADT S, a positive region sort ascending decision table (PR-SADT) Sp = (U, C∪{d},{Va},{Ia}) satisfies the following condition
for x ∈ U, Id(x) = dnew if |Id([x]C)| > 1,
where |·| denotes the cardinality of a set and dnew is a new decision value. In the related experiments in Section 5, we set dnew = max(Id(x)) + 1.
Definition 2 shows that the PR-SADT changes all of the rough granules in a SADT to the exact granules. Based on Definition 2, one easily obtains the following properties.
Property 3.
A SADT and its PR-SADT have the same number of granules.
Property 4.
If the original SADT is inconsistent, then there are repeated objects in the PR-SADT.
The repeated objects are not valuable for reduction algorithms based on the positive region. Instead, they increase the running time and the space requirements. Thus, it is necessary to delete the repeated objects. A fast algorithm for constructing a PR-SADT without repeated objects is described as Algorithm 1.
Algorithm 1. Construct a PR-SADT without repeated objects.
Input: a SADT;
Output: a PR-SADT without repeated objects.
1: Begin
2: For k = |U|:-1:2
3:  If ∀a∈C, Ia(xk−1) = Ia(xk)
4:   If Id(xk−1) ≠ Id(xk), then Id(xk−1) = dnew;
5:   Delete object xk;
6:  end
7: end
8: end
Algorithm 1 only compares adjacent objects. Its time complexity is O(|U||C|). In the following sections, we only discuss PR-SADTs calculated by Algorithm 1; that is, for convenience, a PR-SADT is assumed by default to be a decision table without any repeated objects.
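A minimal Python sketch of Algorithm 1 (our illustration; rows are lists with the decision value last, and dnew follows the max(Id(x)) + 1 convention of Section 5) is:

    def build_pr_sadt(sadt):
        # Bottom-up scan of adjacent objects: merge every run of objects with
        # equal condition values into one object, marked dnew if the run was rough.
        rows = [list(r) for r in sadt]
        d_new = max(r[-1] for r in rows) + 1
        for k in range(len(rows) - 1, 0, -1):
            if rows[k - 1][:-1] == rows[k][:-1]:      # same condition values
                if rows[k - 1][-1] != rows[k][-1]:    # an inconsistent pair
                    rows[k - 1][-1] = d_new           # mark the granule as rough
                del rows[k]                           # drop the repeated object
        return rows

Applied to Table 1, this yields exactly the five objects of Table 2.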
Example 1.
A decision table in [10] is listed to show the difference between a SADT and a PR-SADT calculated by Algorithm 1. The original data set is sorted in ascending order and is presented in Table 1.
Table 1 has 11 objects that are classified into five granules {x1,x2,x3}, {x4,x5}, {x6,x7}, {x8,x9,x10}, and {x11}, and only the last granule {x11} is exact. The corresponding PR-SADT calculated by Algorithm 1 is presented in Table 2.
The PR-SADT in Table 2 has five objects and a new decision value "3". It also has five granules but contains no rough granules and no repeated objects.

4. The Reduction Algorithm Based on PR-SADT

In this section, we first discuss in theory how to obtain a positive region reduct by using a PR-SADT. Next, two efficient subalgorithms are proposed. Finally, the complete reduction algorithm based on the PR-SADT is presented.

4.1. Positive Region Reduction Method Based on PR-SADT

A PR-SADT differs from the original decision table because it changes and deletes some objects. To obtain a positive region reduct of the original decision table, it is therefore necessary to describe the relationship between the two tables.
In general, a positive region reduction keeps the positive region of the target decision table unchanged. Although all of the granules or objects in the positive region are exact, the rough granules or objects cannot be ignored. In [10], we noted that a positive region reduction method corresponds to the following discernibility matrix M = (m(i,j)).
m(xi, xj) = {a ∈ C | Ia([xi]) ≠ Ia([xj])} if (Id([xi]) ≠ Id([xj])) ∧ (min(|Id([xi])|, |Id([xj])|) = 1), and m(xi, xj) = ∅ otherwise.
Matrix M illustrates the discernibility relationships corresponding to positive region reduction. To explain these relationships, we classified and listed them in Table 3.
For the original inconsistent decision table, it is necessary to analyze both the "type of granule pair" and the "decision value set" to judge whether a granule pair should be discerned.
If the original decision table is reformed to a PR-SADT by using Algorithm 1, all of the rough granules in the original decision table are changed to exact granules with the new decision value dnew. This means that the third type is changed to the first type. In a similar way, the fourth type of the granule pair is changed to the second type.
In conclusion, the discernibility relationships corresponding to positive region reduction in PR-SADT are described in Table 4.
It is worth noting that there are no rough granules or repeated objects in a PR-SADT calculated by Algorithm 1, so each granule in a PR-SADT contains exactly one object. Therefore, the object pair is used to judge the discernibility relationship for convenience. Table 4 contains only two cases, fewer than Table 3, and only the "decision value set" needs to be considered.
Based on Table 4, we give a new definition of the positive region reduct, which is described as follows.
Definition 3.
Let Sp be a PR-SADT without repeated objects. An attribute set R⊆C is called a positive region reduct if and only if R satisfies the following two conditions:
  •   ∀x, y ∈ U [Id(x) ≠ Id(y) → ∃a ∈ R, Ia(x) ≠ Ia(y)];
  •   ∀a ∈ R, ∃x, y ∈ U [Id(x) ≠ Id(y) ∧ (x, y) ∈ IND(R − {a})].
The first condition ensures the discernibility relationship corresponding to an unchanged positive region; that is, each object pair with different decision values in the PR-SADT must be discerned by R. The second condition means that each attribute in the reduct is necessary. Together, the conditions state that the attributes of R are jointly sufficient and individually necessary once a PR-SADT has been constructed.
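Definition 3 can be verified directly by a brute-force pairwise test. The following Python sketch (illustrative only; it is O(|U|^2|R|) and is not the fast path used by FPRA) checks both conditions for a candidate set R of column indices over a PR-SADT without repeated objects:

    def is_positive_region_reduct(rows, R):
        # Condition 1: R discerns every object pair with different decisions.
        # Condition 2: no attribute of R can be removed without losing that.
        def discerns_all_pairs(attrs):
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if rows[i][-1] != rows[j][-1] and \
                       all(rows[i][a] == rows[j][a] for a in attrs):
                        return False          # an undiscerned pair was found
            return True
        return discerns_all_pairs(R) and \
               all(not discerns_all_pairs(R - {a}) for a in R)

For the PR-SADT of Table 2, is_positive_region_reduct(rows, {1}) returns True; that is, {a2} alone is a positive region reduct of that table.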

4.2. Fast Core Attribute Calculation Based on PR-SADT

In this section, a special core attribute calculation algorithm is presented for the novel heuristic reduction method.
Theorem 1.
Let Sp be a PR-SADT without repeated objects and let an ∈ C be the last condition attribute. If an is a core attribute, then there exists xk ∈ U that satisfies the conditions (xk, xk+1) ∈ IND(B), Ian(xk) < Ian(xk+1), and Id(xk) ≠ Id(xk+1), where B = {a1,a2,…,an−1}.
Proof. 
In a consistent decision table, if an ∈ Core(S), then there exists [xi]B ∈ U/B with |Id([xi]B)| > 1. This means that there exist adjacent objects xk, xk+1 ∈ [xi]B such that Id(xk) ≠ Id(xk+1). Considering that Sp is a PR-SADT, we also have Ian(xk) < Ian(xk+1). □
Theorem 1 states three necessary conditions on the last condition attribute an. At the same time, the conditions (xk, xk+1) ∈ IND(B) and Ian(xk) < Ian(xk+1) mean that the attribute an is the unique attribute that discerns the object pair (xk, xk+1), and Id(xk) ≠ Id(xk+1) means that the object pair should be discerned according to Definition 3. Hence, the three conditions in Theorem 1 are also sufficient to check whether an is a core attribute. Based on the above conclusion, an algorithm is given as follows.
If flag = 1, then the last condition attribute is a core attribute. In the worst case, Algorithm 2 iterates through the data set and has a time complexity of O(|U||C|).
The output of Algorithm 2 has two possibilities. If the last condition attribute is not a core attribute (flag = 0), one can efficiently check a core attribute by applying Theorem 2, which is described as follows.
Algorithm 2. Check the last condition attribute an.
Input: a PR-SADT
Output: flag
1: Begin
2:  flag = 0;
3: for k = 1: |U|-1
4:  if (xk, xk+1) ∈ IND(B), Ian(xk) < Ian(xk+1), and Id(xk) ≠ Id(xk+1)
5:  flag = 1 and return
6: End
7: End
8: end
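A Python sketch of Algorithm 2 (our illustration, over a PR-SADT stored as sorted rows with the decision in the last column and an in the second-to-last column):

    def check_last_attribute(rows):
        # flag = 1 iff the last condition attribute an is a (relative) core
        # attribute, i.e., some adjacent pair satisfies Theorem 1.
        for k in range(len(rows) - 1):
            x, y = rows[k], rows[k + 1]
            if x[:-2] == y[:-2] and x[-2] < y[-2] and x[-1] != y[-1]:
                return 1      # an alone discerns a pair that must be discerned
        return 0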
Theorem 2.
Suppose S1 is the new decision table obtained when the last column of data of a PR-SADT Sp is deleted. If an ∉ Core(Sp), then RED(S1) ⊆ RED(Sp) and RED(S1) ≠ ∅.
Proof. 
Let RED(Sp) = R1 ∪ R2, where R2 is the set of reducts that include the last condition attribute an. Since an ∉ Core(Sp), R1 ≠ ∅. According to the relationship between Sp and S1, R1 = RED(S1). Thus, RED(S1) ⊆ RED(Sp) and RED(S1) ≠ ∅. □
Theorem 2 shows that the column of data corresponding to the last condition attribute is redundant for a heuristic reduction algorithm if an is not a core attribute. Namely, it is valid to obtain a reduct of the original decision table from S1 because RED(S1) ⊆ RED(Sp) and RED(S1) ≠ ∅. To reduce the running time of all of the remaining heuristic steps, it is necessary to delete the data of column an.
It is worth noting that it is impossible to obtain a reduct that includes an once the last column of data has been deleted. This shortcoming is acceptable because only one reduct is required in a heuristic reduction algorithm.
Algorithm 3 has several special features. First, it calculates only a single core attribute. Second, it deletes some redundant column data. Third, the output of Algorithm 3 is a relative core attribute. In other words, because some redundant column data has been deleted in Algorithm 3, the output is only guaranteed to be a core attribute of S1. Since RED(S1) ⊆ RED(Sp), we have Core(Sp) ⊆ Core(S1). Thus, the output may not be a core attribute of the original decision table Sp.
The time complexity depends on the number of redundant condition attributes. In the worst case, where the output is a1, the time complexity is O(|U||C|^2/2). A more exact analysis of the time complexity is given in Section 4.4.
Algorithm 3. The special core attribute calculation algorithm.
Input: a PR-SADT
Output: a core attribute
1: Step 1: check the last condition attribute by Algorithm 2.
2: Step 2: if flag = 0, then delete the data corresponding to the last condition attribute and jump to Step 1; otherwise, go to Step 3.
3: Step 3: output the last condition attribute.
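A Python sketch of Algorithm 3, reusing check_last_attribute from above (our illustration; it returns the trimmed table together with the index of the core attribute found, or -1 if no condition column remains):

    def relative_core_attribute(rows):
        # Repeatedly test the last condition column; while it is not a core
        # attribute, delete its redundant column data and test again.
        n_cond = len(rows[0]) - 1
        while n_cond > 0 and check_last_attribute(rows) == 0:
            rows = [r[:-2] + [r[-1]] for r in rows]   # drop the last condition column
            n_cond -= 1
        return rows, n_cond - 1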

4.3. Fast Positive Region Calculation Based on PR-SADT

In this section, a fast method based on PR-SADT is presented to calculate the positive region with respect to attribute set R.
Theorem 3.
Let attribute set R = {a1,a2,…,am} and U/R = {X1,X2,…,XK}. For Xi ∈ U/R, if Xi ∩ POSR(D) = ∅, then there exist adjacent objects xk, xk+1 ∈ Xi such that Id(xk) ≠ Id(xk+1).
Proof. 
In a PR-SADT, the objects in a granule with respect to attribute set R are adjacent. Suppose Xi = {xp+1, xp+2, …, xp+q}, where q = |Xi|. Since Xi ∩ POSR(D) = ∅, we have |Id(Xi)| > 1. Hence, there exist adjacent xk, xk+1 ∈ Xi such that Id(xk) ≠ Id(xk+1). □
Theorem 3 illustrates a simple way to identify the positive region with respect to R. The related algorithm is described as follows.
Algorithm 4 calculates the positive region with respect to R by scanning a PR-SADT once. The time complexity is O(|U||R|), where |R| ≤ |C|. By contrast, the time complexity of a classical positive region calculation is O(|U|^2|C|). The positive region calculation algorithm in [29] has a complexity of O(|U||C|^2). In [32], the complexity of calculating the positive region is O(|U||C|log|U|).
Algorithm 4. Calculate the positive region with respect to R in a PR-SADT.
Input: a PR-SADT, attribute set R = {a1,a2,…,am}.
Output: the positive region with respect to R.
1: Step 1: set the default values.
   PR = ∅, gra = {x1}, flag = 0.
2: Step 2: compare the adjacent object pairs.
  For i = 1: |U|−1
   gra = gra ∪ {xi+1} if (xi, xi+1) ∈ IND(R);//collect the objects of a granule;
   flag = 1 if (xi, xi+1) ∈ IND(R) and Id(xi) ≠ Id(xi+1);//the granule is rough if flag is 1;
   PR = PR ∪ gra if ∃a∈R, Ia(xi) ≠ Ia(xi+1) and flag == 0;//record the exact granule;
   gra = {xi+1}, flag = 0 if ∃a∈R, Ia(xi) ≠ Ia(xi+1);//prepare for the next granule;
  end
3: Step 3: record the last exact granule.
  If flag == 0
   PR = PR ∪ gra;//record the last exact granule
  end
4: Step 4: output PR.
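A single-pass Python sketch of Algorithm 4 (our illustration; m is the number of leading columns that make up R, and the return value is the list of row indices belonging to the positive region):

    def positive_region_sorted(rows, m):
        pos, granule, rough = [], [0], False
        for i in range(len(rows) - 1):
            if rows[i][:m] == rows[i + 1][:m]:     # same granule w.r.t. R
                granule.append(i + 1)
                rough = rough or rows[i][-1] != rows[i + 1][-1]
            else:                                  # granule boundary reached
                if not rough:
                    pos.extend(granule)            # record the exact granule
                granule, rough = [i + 1], False
        if not rough:
            pos.extend(granule)                    # record the last granule
        return pos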
Example 2.
According to Algorithm 4, the positive region of the PR-SADT in Table 2 is calculated by the following process.
Suppose R = {a1,a2}. In Step 1, PR = ∅, gra = {x1}, and flag = 0. In Step 2, these parameters are updated as shown in Figure 1.
In Step 3, the last object x5 is added to PR. Finally, the positive region PR = {x1, x2, x3, x4, x5} is output.

4.4. The Attribute Reduction Algorithm Based on PR-SADT

The fast positive region reduction algorithm based on the PR-SADT (FPRA) is presented as Algorithm 5, and the related flow chart is shown in Figure 2.
Algorithm 5. The fast positive region reduction algorithm based on PR-SADT (FPRA)
Input: a decision table S.
Output: a complete reduct.
1: Step 1. R = ∅. Sort the original decision table.
2: Step 2. Delete the repeated objects, and calculate a PR-SADT by Algorithm 1.
3: Step 3. Check the last condition attribute an by Algorithm 2. If it is a core attribute, then jump to step 5; else, step 4.
4: Step 4. Delete the last column data, and jump to step 3.
5: Step 5. R = R ∪ {ak}. Move the last column to the first position, and sort the decision table.
6: Step 6. Calculate the positive region with respect to R by Algorithm 4. Delete the positive region.
7: Step 7. If Sp is empty or all of its decision values equal dnew, then output the reduct R; otherwise, jump to Step 3.
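Putting the pieces together, the following Python sketch of the FPRA loop reuses build_sadt, build_pr_sadt, relative_core_attribute, and positive_region_sorted from the earlier sketches; the attrs bookkeeping of original column indices is our illustrative addition, and the edge-case handling of an optimized implementation is omitted:

    def fpra(table):
        attrs = list(range(len(table[0]) - 1))    # original condition column ids
        d_new = max(r[-1] for r in table) + 1
        rows = build_pr_sadt(build_sadt(table))   # Steps 1 and 2
        reduct = []
        while rows:
            rows, j = relative_core_attribute(rows)    # Steps 3 and 4
            if j < 0:
                break                                  # no core attribute remains
            del attrs[j + 1:]                          # columns deleted in Step 4
            attrs.insert(0, attrs.pop(j))              # Step 5: core column first
            rows = sorted([[r[j]] + r[:j] + [r[-1]] for r in rows], key=tuple)
            reduct.append(attrs[0])
            pos = set(positive_region_sorted(rows, len(reduct)))   # Step 6
            rows = [r for i, r in enumerate(rows) if i not in pos]
            if not rows or all(r[-1] == d_new for r in rows):      # Step 7
                break
        return sorted(reduct)

On the table of Example 1, this sketch returns the single-attribute reduct {a2}, whose positive region {x11} coincides with that of the full attribute set.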
Analysis on the completeness of FPRA:
FPRA satisfies two key features. First, it adopts the reduct construction by deletion. Second, each attribute in R is a core attribute with respect to the related heuristic steps. Thus, R is a complete reduct.
The detailed proof is as follows.
Consider any attribute ai ∈ R. According to Step 3 of FPRA, there is an object pair (xk, xk+1) that satisfies the conditions (xk, xk+1) ∈ IND(B), Iai(xk) < Iai(xk+1), and Id(xk) ≠ Id(xk+1), where B = Ri ∪ {a1,a2,…,ai−1} and Ri = {aj ∈ R | j > i}. This means that the object pair (xk, xk+1) cannot be discerned by B. At the same time, since R − {ai} ⊆ B, it follows that the object pair (xk, xk+1) cannot be discerned by R − {ai}. However, the object pair can be discerned by R according to Algorithm 5. Thus, attribute ai is essential for attribute set R.
In conclusion, the attributes of R are jointly sufficient and individually necessary for the original data set. Thus, R is a complete reduct.
Analysis on time complexity:
FPRA includes three subprocesses: the S1 process of constructing a PR-SADT (Steps 1 and 2), the S2 process of calculating a core attribute (Step 3 -> Step 4 -> Step 3), and the S3 process of updating the table (Step 5 -> Step 6).
Considering an original decision table, one can adopt the algorithms in [27,30] to construct a PR-SADT with a time complexity of O(|U||C|). However, the real running times of the algorithms in [27,30] depend on good programming style. In the experimental section, we apply the sortrows function to sort a decision table. Step 2 is accomplished by Algorithm 1, and its time complexity is O(|U||C|). Thus, the time complexity of the S1 subprocess is O(|U||C|).
In the subsequent steps, the sizes of the object set and the condition attribute set differ in each heuristic iteration. The S2 process (Step 3 -> Step 4) deletes some columns of the data set, and the S3 process (Step 5 -> Step 6) rearranges the PR-SADT and deletes the related positive regions (some rows of the data set). These two subprocesses reduce |U| and |C| and are highly effective in optimizing the time complexity of FPRA.
Let Ui and Ci represent the object set and condition attribute set of the ith heuristic iteration, respectively. We have U1 ⊇ U2 ⊇ … ⊇ Uk−1 and C1 ⊇ C2 ⊇ … ⊇ Ck−1, where k = |R| is the number of attributes in the reduct R, C1 = C, and |U1| = |U/C|.
Step 3 is performed by Algorithm 2, and its time complexity is O(|Ui||Ci|). Step 5 sorts the decision table, and its complexity in the ith heuristic iteration is O(|Ui||Ci|). The time complexity of Step 6 consists of two parts. One comes from Algorithm 4 and is O(i|Ui|), where i is the number of attributes of R in the ith heuristic iteration. The other part originates from deleting the positive region, and it also has a time complexity of O(i|Ui|).
In Algorithm 5, the S2 subprocess is performed |R| times and thus has a time complexity of O(Σ_{i=1}^{|R|} |Ui| Σ_{j=1}^{Qi} (|Ci| − j + 1)), where Qi = |Ci| − |Ci+1|. S3 is also performed |R| times, with a time complexity of O(Σ_{i=1}^{|R|} |Ui|(i + |Ci|)).
Finally, the total time complexity is O(|U||C| + Σ_{i=1}^{|R|} |Ui|(i + |Ci|) + Σ_{i=1}^{|R|} |Ui| Σ_{j=1}^{Qi} (|Ci| − j + 1)). In the best case, where R = {a|C|}, even a speed of O(|C||U|) is possible. In the worst case, where R = C, the time complexity is O(|U||C| + Σ_{i=1}^{|C|} |Ui|(i + |Ci|) + Σ_{i=1}^{|C|} |Ui||Ci|). Considering that R is the output of FPRA, the time complexity is treated as O(|U||C| + Σ_{i=1}^{|C|} |Ui|(i + 2|Ci|)). This is considerably less than those of the traditional algorithms, which have a time complexity of O(|U|^2|C|^2) [2,27]. To stress the advantage of FPRA, some excellent reduction algorithms are compared and listed in Table 5.
Obviously, the time complexity of FPRA is less than those of the algorithms in [2,38,39]. It is worth noting that the Ui of the algorithm in [1] differs from the Ui of FPRA. This means that it is hard to compare the efficiencies of the two algorithms (the algorithm in [1] and FPRA) by the time complexities in Table 5. The related experiments in Section 5 provide more direct evidence of the advantage of FPRA.
Analysis on the characteristics of FPRA:
To summarize, FPRA is complete and efficient. It has the following important features and advantages.
1.
FPRA is dependent on an efficient sort function.
FPRA just repeats a simple procedure: sort -> compare -> delete. Only the most efficient sort functions are considered in FPRA. Thus, comparison-based sort algorithms, such as Bubble Sort (O(n^2)), Shell Sort, Merge Sort, and Quick Sort (O(n log n)), are not suitable for FPRA because of the O(n log n) lower bound on comparison sorting. Instead, bucket-style sort algorithms are considered because their time complexities are below O(n log n). In fact, we do not pay special attention to how to design a sort function because many tools and software packages provide efficient sort functions. The sortrows function, or the Shuffle phase in MapReduce, is highly recommended.
2.
FPRA does not calculate any attribute significances.
Most traditional heuristic attribute reduction algorithms provide a simple or complex definition to calculate the attribute significances of all the condition attributes. No matter how simple the definition is, it is necessary to calculate and compare the significances of all the attributes and select the most significant one. This significance calculation is run (2|C| − |R| + 1) × |R|/2 times if the addition construction is adopted, or (|C| + |R| + 1) × (|C| − |R|)/2 times if the deletion construction is adopted. By comparison, the special core attribute calculation in FPRA is run at most |C| times.
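For example, with |C| = 60 and |R| = 8 (the Ipums.la.97 case in Table 7), the addition construction evaluates attribute significance (2 × 60 − 8 + 1) × 8/2 = 452 times, whereas the special core attribute calculation in FPRA runs at most 60 times.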
3.
The heuristic method of FPRA is more efficient and concise.
The traditional heuristic algorithms include two kinds of calculation: the entire core set calculation before the heuristic process and the attribute significance calculation within the heuristic iterations. By comparison, FPRA has only one kind of calculation: the core attribute calculation within the heuristic iterations. In detail, FPRA calculates a single core attribute in each heuristic iteration, while the traditional algorithms have to calculate the attribute significances of all the remaining condition attributes.
Besides, each condition attribute of FPRA is checked at most once. In the traditional heuristic algorithms, a condition attribute is checked (2|C| − |R| + 1) × |R|/(2|C|) or (|C| + |R| + 1) × (|C| − |R|)/(2|C|) times on average. Therefore, FPRA is more efficient and concise than the traditional heuristic algorithms.

5. Experimental Results

In this section, we evaluate the proposed approach (FPRA) on several data sets from the UCI (University of California, Irvine) Repository [1,38,40,41]. The related work includes a performance analysis and comparison tests. All of the experiments on FPRA were conducted on a PC with an Intel(R) CPU G645 at 2.9 GHz and 1.81 GB of memory.
Some data sets from the UCI Repository were used in the experiments as outlined in Table 6. There are some data sets with missing values, such as Mushroom and Breast-cancer-wisconsin. For uniform treatment of all data sets, we replaced the missing values with a new value that did not appear in the original data set. Some data sets, such as sensorless, were transformed into discrete data sets by a simple uniform discretization algorithm. Specifically, each of the related continuous columns was divided into 10 equal intervals.
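For reproducibility, a plausible reading of this preprocessing step in Python (our sketch; the paper does not specify the exact boundary handling) is:

    def discretize_equal_width(col, bins=10):
        # Divide a continuous column into `bins` equal-width intervals and
        # return the interval index (0 .. bins-1) of each value.
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0        # guard against constant columns
        return [min(int((v - lo) / width), bins - 1) for v in col]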

5.1. Performance Analysis

At present, the time complexities of fast reduction algorithms have gone beyond O(|U||C|^2) and entered the interval (O(|U||C|), O(|U||C|^2)). In order to illustrate their advantages in computational efficiency, many researchers have to apply inexact, sealed parameters, such as Ui, Ci, etc., to describe the time complexities of the proposed algorithms. These time complexities suffer from two disadvantages.
  • It is difficult to estimate the real running time from a time complexity that uses sealed parameters. For example, the time complexity of the fast reduction algorithm in [1] is O(|U||C| + Σ_{i=1}^{|C|} |Ui|(|C| − i + 1)), which is less than O(|U||C|^2). However, it contains |C| sealed parameters |U1|, |U2|, …, |U|C||, so the exact running time is hard to estimate.
  • It is difficult to compare the computational efficiencies of different reduction algorithms. First, the sealed parameters are influenced by the heuristic constructions and the real data sets, so they have different values for different algorithms. Second, the time complexities with these sealed parameters are always very complex, such as that of FSPA.
The proposed algorithm FPRA also faces the above problems. It has the sealed parameters U1, U2, …, U|R| and C1, C2, …, C|R|. It is very difficult to estimate the real efficiency from the theoretical time complexity of O(|U||C| + Σ_{i=1}^{|C|} |Ui|(i + 2|Ci|)). To resolve these problems, we suggest an approximate time complexity for FPRA that is simple and makes the real running time easy to estimate. The detailed procedure is described as follows.
We record the real running times of the three subprocesses of FPRA and analyze their features. On this basis, an experimental model of the time complexity is suggested.
Some classical UCI data sets are applied to test the related running times, and the experimental results are listed in Table 7, where T1, T2, and T3 are the running times of the three subprocesses S1, S2, and S3, respectively.
Notably, the covertype data set, with its 581,012 objects and 54 attributes, has seldom been reported on by the existing reduction algorithms. However, FPRA processed this data set in only 49.468 s. This indicates that FPRA is efficient relative to the existing reduction algorithms. Some ratios of the time consumption of the subprocesses are presented in Figure 3.
Figure 3 describes the ratios of |R|/|C|, T2/T, and T3/T, where the x-axis indexes the ten data sets in Table 7 and T1, T2, and T3 are the running times of the three subprocesses S1, S2, and S3, respectively. Some important conclusions are presented as follows.
  • The S3 subprocess consumed the most running time when |R|/|C| was large. For data sets (3,4,6,7), the ratios of |R|/|C| were 1, 0.75, 0.8125, and 0.8095, respectively. The related ratios of T3/T were 0.811, 0.6868, 0.8295, and 0.8894.
  • The S2 subprocess consumed the most running time when |R|/|C| was small. For data sets (8, 10), the ratios of |R|/|C| were 13.3% and 11.1%, respectively. The related ratios of T2/T were 44.36% and 79.34%.
  • The trend of T3/T was similar to that of |R|/|C|; the trend of T2/T was opposite to that of |R|/|C|.
The above features show that the real running time was influenced by |R| as well as |U| and |C|.
Next, we compared the time complexity of FPRA with O(|U||C|^2) and O(|U||C||R|). The related results are presented in Figure 4.
The time complexity of S1 is O(|U||C|). Suppose the real time complexity of FPRA is similar to O(k|U||C|); then k can be estimated by the ratio T/T1. In Figure 4, the ratios of T/T1 varied from 3.34 to 23.4, with an average value of 8.6. By comparison, the average value of |C| was 40.4. Obviously, the time complexity of FPRA was considerably less than O(|U||C|^2). The average value of |R| was 15.5, which was slightly more than the average ratio of T/T1.
As a result, the real time complexity of FPRA is similar to O(|U||C||R|).
To obtain more accurate experimental results, we constructed 60 data sets based on six original data sets: shuttle_all, sensorless, connect_4, ipums.la.97, ipums.la.99, and covertype. We divided each original data set into 10 parts of equal size. The first part was regarded as the first data set, the combination of the first and second parts was viewed as the second data set, and so on; the combination of all ten parts was viewed as the tenth data set.
The related ratios for the real running time of the 60 data sets are shown in Figure 5.
In the 60 data sets in Figure 5, S3 consumed the most running time (T3/T > 50%) when |R|/|C| > 40%. S2 consumed the most time (T2/T > 50%) when |R|/|C| < 20%. In all of the subfigures, it was easy to determine that the trend of T2/T was opposite to those of T3/T. These features show that the real running time had a tight relationship with |R|.
Next, we evaluated the real time complexity with the 60 data sets.
In the 60 data sets in Figure 6, the curves for |C| were higher than the other curves. This shows that the real time complexity of FPRA was considerably less than O(|U||C|^2). There were 46 data sets with |R| > T/T1; the other 14 data sets satisfied |R| < T/T1. The average value of |C| for the 60 data sets was 45.5. By comparison, the average values of T/T1 and |R| over the 60 data sets were 9.2252 and 15.3, respectively. In particular, for the shuttle_all, ipums97, and ipums99 data sets, the curves of |R| and T/T1 were very similar.
As a result, the real time complexity of FPRA could be evaluated as O(|U||C||R|), which was less than O(|U||C|2). It is noted that O(|U||C||R|) was an experimental result, not a theoretical result.

5.2. Comparison Experiments

To illustrate the advantage of FPRA, it was compared with some existing fast reduction algorithms that also calculate a positive region-based reduct.
In order to obtain fair and objective conclusions, all the running times of the compared algorithms were taken from the related literature; that is, they are the running times reported by the original researchers. At the same time, we used a similar PC and the same data sets to obtain the real running times of FPRA. This approach avoids the influence of individual programming habits and makes the conclusions objective.
Experiment 1.
FPRA was compared with the classical reduction algorithm and the optimized algorithm in [1]. The experimental results are listed in Table 8.
PR is a classical reduction algorithm based on a positive region, and FSPA-PR is an optimized reduction algorithm proposed in [29]. The running times of PR and FSPA-PR are recorded from the literature [1].
In Table 8, three reducts of FPRA were larger than those of PR, and two reducts (Backup_large.test and Letter-recognition) were smaller than those of PR. This is due to the different heuristic constructions: FPRA is based on reduct construction by deletion, while PR and FSPA-PR are based on reduct construction by addition. In [33], we noted that reduct construction by deletion has a strongly conservative property: as the price for obtaining a complete reduct, the construction by deletion is less effective at obtaining a minimal reduct.
In Table 8, FPRA clearly exhibited the best time efficiency on the nine datasets, and PR performed the worst. The ratios of running time on FPRA/PR varied from 0.09% to 12.8%. The other ratios of FPRA/FSPA-PR were from 0.12% to 17.6%. On average, for the nine data sets, the time consumption of FPRA was 0.14% of that of PR and 0.26% of that of FSPA-PR. The results show that the proposed algorithm FPRA was surprisingly efficient.
Experiment 2.
The proposed algorithm was also compared with algorithms in [38], and the results are shown in Table 9.
Algorithm ADM (Algorithm based on the discernibility matrix) is a classical reduction algorithm based on the discernibility matrix and discernibility function. Its complexity is O(|U|^2|C|^2). Algorithm OADM (optimized ADM) is an optimized fast reduction algorithm proposed in [38], which has a complexity of O(|C|^2|U|log|U|).
Table 9 shows that the running time of FPRA was considerably less than those of the compared algorithms. The ratios of the running time of FPRA/ADM were only from 0.03% to 1.09%, and the ratios of FPRA/OADM were from 3.18% to 22.79%. On average, over the five data sets, the time consumption of FPRA was 0.071% of that of Algorithm ADM and 4.72% of that of Algorithm OADM.
Experiment 3.
We compared FPRA with the reduction algorithm in [40], and the results are shown in Table 10.
Note that the running times of Q-ARA (Quick Assignment Reduction Algorithm) were reported in [40] and were measured on a similar PC. Table 10 shows that the running time of FPRA was considerably less than that of Q-ARA. The ratios of the running time of FPRA/Q-ARA were only from 0 to 14.56%. On average, over the 11 data sets, the time consumption of FPRA was 0.56% of that of Q-ARA.

6. Conclusions

In this paper, we proposed a unique heuristic method that applies a special core attribute calculation to replace the traditional attribute significance calculation. This method is concise, and each condition attribute is checked at most once.
The key to the proposed method is the sort function, and the surprising running efficiency of FPRA depends on the sortrows function. The T1 column of Table 7 lists the exact times for sorting the original data and constructing a PR-SADT.
The experimental analysis shows that the real time complexity of FPRA is less than O(|U||C|^2).
The proposed algorithm FPRA is also appropriate for big data reduction because it only uses two basic operations (sorting and comparison), while MapReduce (a model for big data) provides an efficient sort technology. This issue will be addressed in future work.

Author Contributions

Both authors, L.Y. and Z.J., contributed equally to this article. All authors have read and approved the final manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grants No. 61502538 and No. 61273185), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61321003), and the Innovation-Driven Plan of Central South University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Qian, Y.H.; Liang, J.Y.; Pedrycz, W.; Dang, C.Y. Positive approximation: An accelerator for attribute reduction in rough set theory. Artif. Intell. 2010, 174, 597–618. [Google Scholar] [CrossRef] [Green Version]
  2. Hu, Q.H.; Liu, J.F.; Yu, D.R. Mixed feature selection based on granulation and approximation. Knowl. Based Syst. 2008, 21, 294–304. [Google Scholar] [CrossRef]
  3. Wang, G.Y.; Ma, X.A.; Yu, H. Monotonic uncertainty measures for attribute reduction in probabilistic rough set model. Int. J. Approx. Reason. 2015, 59, 41–67. [Google Scholar] [CrossRef]
  4. Chang, S. A novel attribute reduction method based on rough sets and its application. Int. J. Adv. Comput. Technol. 2012, 4, 99–104. [Google Scholar]
  5. Hu, Q.H.; Yu, D.R.; Xie, Z.X. Neighborhood classifiers. Expert Syst. Appl. 2008, 34, 866–876. [Google Scholar] [CrossRef]
  6. Liang, J.; Wang, F.; Dang, C.; Qian, Y. An efficient rough feature selection algorithm with a multi-granulation view. Int. J. Approx. Reason. 2012, 53, 912–926. [Google Scholar] [CrossRef] [Green Version]
  7. Nie, S.Z.; Wang, Z.; Pujia, W.; Nie, Y.; Lu, P. Big data prediction of durations for online collective actions based on peak’s timing. Phys. A-Stat. Mech. Appl. 2018, 492, 138–154. [Google Scholar] [CrossRef]
  8. Skowron, A.; Jankowski, A.; Swiniarski, R. 30 Years of Rough Sets and Future Perspectives, Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–10. [Google Scholar]
  9. Zong, F.; Tian, Y.D.; He, Y.N.; Tang, J.J.; Lv, Y.Y. Trip destination prediction based on multi-day GPS data. Phys. A-Stat. Mech. Appl. 2019, 515, 258–269. [Google Scholar] [CrossRef]
  10. Yin, L.; Gui, W.; Yang, C.; Wang, X.; Ling, C.X. Core set analysis in inconsistent decision tables. Inf. Sci. 2013, 241, 138–147. [Google Scholar] [CrossRef]
  11. Deng, S.; Yue, D.; Fu, X.; Zhou, A.H. Security risk assessment of cyber physical power system based on rough set and gene expression programming. IEEE/CAA J. Autom. Sin. 2015, 2, 431–439. [Google Scholar]
  12. Hu, Q.; Xie, Z.; Yu, D. Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit. 2007, 40, 3509–3521. [Google Scholar] [CrossRef]
  13. Yang, M.; Yang, P. Algorithms based on general discernibility matrix for computation of a core and attribute reduction. Control Decis. 2008, 23, 1049–1054. [Google Scholar]
  14. Yao, Y.Y.; Zhao, Y. Discernibility matrix simplification for constructing attribute reducts. Inf. Sci. 2009, 179, 867–882. [Google Scholar] [CrossRef] [Green Version]
  15. Lu, Z.; Qin, Z.; Zhang, Y.; Fang, J. A fast selection approach based on rough set boundary regions. Pattern Recognit. Lett. 2014, 36, 81–88. [Google Scholar] [CrossRef]
  16. Qian, Y.; Liang, J. Combination entropy and combination granulation in rough set theory. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 2008, 16, 179–193. [Google Scholar] [CrossRef]
  17. Wang, C.; Ou, F.F. An attribute reduction algorithm based on conditional entropy and frequency of attributes. In Proceedings of the International Conference on Intelligent Computation Technology and Automation, Changsha, China, 20–22 October 2008; pp. 752–756. [Google Scholar]
  18. Wang, G.Y.; Zhao, J.; An, J.J.; Wu, Y. A comparative study of algebra viewpoint and information viewpoint in attribute reduction. Fundam. Inform. 2005, 68, 289–301. [Google Scholar]
  19. Qian, J.; Miao, D.; Zhang, Z.; Yue, X. Parallel attribute reduction algorithms using MapReduce. Inf. Sci. 2014, 279, 671–690. [Google Scholar] [CrossRef]
  20. Shu, W.H.; Qian, W.B. An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory. Data Knowl. Eng. 2015, 100, 116–132. [Google Scholar] [CrossRef]
  21. Yin, L.Z.; Yang, C.H.; Wang, X.L.; Gui, W.H. An incremental algorithm for attribute reduction based on labeled discernibility matrix. Acta Autom. Sin. 2014, 40, 397–403. [Google Scholar]
  22. Yao, Y.Y. Duality in rough set theory based on the square of opposition. Fundam. Inform. 2013, 127, 49–64. [Google Scholar] [CrossRef]
  23. Chen, Y.M.; Miao, D.Q.; Wang, R.Z. A rough set approach to feature selection based on ant colony optimization. Pattern Recognit. Lett. 2010, 31, 226–233. [Google Scholar] [CrossRef]
  24. Yang, P.; Li, J.; Huang, Y. An attribute reduction algorithm by rough set based on binary discernibility matrix. In Proceedings of the Fuzzy Systems and Knowledge Discovery, Jinan, China, 18–20 October 2008; pp. 276–280. [Google Scholar]
  25. Xu, Z.Y.; Yang, B.R.; Song, W. Quick computing core algorithm based on discernibility matrix. Comput. Eng. Appl. 2006, 42, 4–6. [Google Scholar]
  26. Yang, M.; Sun, Z.H. Improvement of discernibility matrix and the computation of a core. J. Fudan. Univ. 2004, 43, 865–868. [Google Scholar]
  27. Xu, Z.Y.; Shu, W.H.; Qian, W.B.; Yang, B.Y. Quick algorithm for computing core of the positive region based on order relation. Comput. Sci. 2010, 37, 208–211. [Google Scholar]
  28. Liu, S.H.; Sheng, Q.J.; Wu, B.; Shi, Z.; Hu, F. Research on efficient algorithms for rough set methods. Chin. J. Comput. 2003, 26, 524–529. [Google Scholar]
  29. Shen, J.; Lv, Y. A rapid algorithm for reduction based on positive region attribute significance. In Proceedings of the Electrical and Control Engineering (ICECE), 2010 International Conference on, Wuhan, China, 25–27 June 2010; pp. 4940–4943. [Google Scholar]
  30. Xu, Z.Y.; Liu, Z.P.; Yang, B.R.; Song, W. A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|^2|U/C|)). Chin. J. Comput. 2006, 29, 391–399. [Google Scholar]
  31. Zhang, J.; Zhang, X.Y.; Xu, W.H. Lower approximation reduction based on discernibility information tree in inconsistent ordered decision information systems. Symmetry 2018, 10, 696. [Google Scholar] [CrossRef] [Green Version]
  32. Zhao, Y.; Yao, Y.Y.; Luo, F. Data analysis based on discernibility and indiscernibility. Inf. Sci. 2007, 177, 4959–4976. [Google Scholar] [CrossRef]
  33. Yin, L.Z.; Yang, C.H.; Wang, X.L.; Gui, W.-H. Reduction method based on attribute repulsion matrix. Control Decis. 2013, 28, 434–438. [Google Scholar]
  34. Pedrycz, W. Granular computing for data analytics: A manifesto of human-centric computing. IEEE/CAA J. Autom. Sin. 2018, 5, 1025–1034. [Google Scholar]
  35. Qian, J.; Miao, D.; Zhang, Z.; Li, W. Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation. Int. J. Approx. Reason. 2011, 52, 212–230. [Google Scholar] [CrossRef] [Green Version]
  36. Yao, Y.Y.; Zhao, Y.; Wang, J. On reduct construction algorithms. In Rough Sets and Knowledge Technology; Springer: Berlin/Heidelberg, Germany, 2006; pp. 297–304. [Google Scholar]
  37. Jehad, A.; Rami, M. An enhancement of major sorting algorithms. Int. Arab J. Inf. Technol. 2010, 7, 55–62. [Google Scholar]
  38. Meng, Z.; Shi, Z. A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets. Inf. Sci. 2009, 179, 2774–2793. [Google Scholar] [CrossRef]
  39. Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C. An efficient accelerator for attribute reduction from incomplete data in rough set framework. Pattern Recognit. 2011, 44, 1658–1670. [Google Scholar] [CrossRef]
  40. Li, M.; Shang, C.; Feng, S.; Fan, J. Quick attribute reduction in inconsistent decision tables. Inf. Sci. 2014, 254, 155–180. [Google Scholar] [CrossRef]
  41. Song, M.; Wu, Y.F. Handbook of Research on Text and Web Mining Technologies; IGI Global: Hershey, PA, USA, 2009; Chapter XLIV; pp. 766–784. [Google Scholar]
Figure 1. The calculation process of Step 2 of Algorithm 4.
Figure 2. The flow chart of the fast positive region reduction algorithm based on PR-SADT (FPRA).
Figure 3. The ratios of |R|/|C|, T2/T, and T3/T.
Figure 4. Evaluation of T/T1 of FPRA.
Figure 5. The ratios of |R|/|C|, T2/T, and T3/T based on 60 data sets.
Figure 6. Evaluation of T/T1 based on 60 data sets.
Table 1. A classical inconsistent sort ascending decision table (SADT).

         a1   a2   a3   a4   a5   d
    x1    0    0    0    1    1   0
    x2    0    0    0    1    1   1
    x3    0    0    0    1    1   1
    x4    0    0    1    0    1   0
    x5    0    0    1    0    1   1
    x6    0    0    1    1    1   0
    x7    0    0    1    1    1   1
    x8    1    0    1    1    1   0
    x9    1    0    1    1    1   1
    x10   1    0    1    1    1   2
    x11   1    1    1    1    1   1
Table 2. Positive region (PR)-SADT corresponding to Table 1.

        a1   a2   a3   a4   a5   d
    x1   0    0    0    1    1   3
    x2   0    0    1    0    1   3
    x3   0    0    1    1    1   3
    x4   1    0    1    1    1   3
    x5   1    1    1    1    1   1
Table 3. The discernibility relationships corresponding to a positive region reduct.

        Type of Granule Pair              Decision Value Set     Discern
    1   Two exact granules                Id([x]C) = Id([y]C)    No
    2   Two exact granules                Id([x]C) ≠ Id([y]C)    Yes
    3   Two rough granules                any                    No
    4   Exact granule and rough granule   any                    Yes
Table 4. The discernibility relationships corresponding to positive region reduction in a PR-SADT.

        Type of Object Pair   Decision Value Set   Discern
    1   Two exact objects     Id(x) = Id(y)        No
    2   Two exact objects     Id(x) ≠ Id(y)        Yes
Table 5. Time complexity description.

    Algorithm            Time Complexity
    FPRA (this paper)    O(|U||C| + Σ_{i=1}^{|C|} |Ui|(i + 2|Ci|))
    FSPA in [1]          O(|U||C| + Σ_{i=1}^{|C|} |Ui|(|C| − i + 1))
    Algorithm in [38]    O(|C|^2|U|log|U|)
    IFSPA in [39]        O(|C|^3|U| + Σ_{i=1}^{|C|} ((|C| − i + 1)^2|Ui| + (|C| − i + 1)^3|Ui|))
    Algorithm in [2]     O(|U|^2|C|^2)
Table 6. Description of the data sets.

         Data Set                  Size |U|   Attributes |C|   Classes |Vd|
    1    Dermatology               358        34               6
    2    Backup_large.test         376        35               19
    3    Breast-cancer-wisconsin   683        9                2
    4    Tic-tac-toe               958        9                2
    5    Kr_vs_kp                  3196       36               2
    6    mushroom                  5644       22               2
    7    Ticdata2000               5822       85               2
    8    nursery                   12960      8                5
    9    Letter-recognition        20000      16               26
    10   Shuttle_all               58000      9                7
    11   sensorless                58509      48               11
    12   Connect-4                 67557      42               3
    13   Ipums.la.97               70187      60               10
    14   Ipums.la.99               88443      60               10
    15   covertype                 581012     54               7
Table 7. The time consumption of the subprocesses of FPRA.

         Data Set             |U|      |R|/|C|   T1 (s)   T2 (s)   T3 (s)   Total T (s)
    1    mushroom             5644     7/22      0.047    0.031    0.079    0.157
    2    Ticdata2000          5822     24/85     0.078    0.251    0.624    0.953
    3    nursery              12960    8/8       0.062    0        0.266    0.328
    4    Letter-recognition   20000    12/16     0.109    0.062    0.375    0.546
    5    Shuttle_all          58000    4/9       0.235    0.078    0.516    0.829
    6    sensorless           58509    39/48     0.828    0.592    6.907    8.327
    7    Connect_4            67557    34/42     0.719    1.143    14.967   16.829
    8    Ipums.la.97          70187    8/60      0.750    1.657    1.328    3.735
    9    Ipums.la.99          88443    13/60     0.906    2.311    2.424    5.641
    10   covertype            581012   6/54      4.140    39.25    6.078    49.468
Table 8. Comparison results with PR and FSPA-PR.

    Data Sets                 |U|     |C|   PR Time (s)   |R|   FSPA-PR Time (s)   |R|   FPRA Time (s)   |R|
    Dermatology               358     34    0.8438        10    0.4375             10    0.016           11
    Backup_large.test         376     35    0.6563        10    0.4219             10    0.016           9
    Breast-cancer-wisconsin   683     9     0.1250        4     0.0938             4     0.016           5
    Tic-tac-toe               958     9     0.3594        8     0.3125             8     0.031           8
    Kr_vs_kp                  3196    36    28.0313       29    21.5781            29    0.407           29
    Mushroom                  5644    22    24.875        3     20.4531            3     0.157           7
    Ticdata2000               5822    85    886.4531      24    296.375            24    0.953           24
    Letter-recognition        20000   16    282.6406      11    112.625            11    0.546           8
    Shuttle_all               58000   9     906.0625      4     712.25             4     0.829           4
Table 9. Comparison results with the fast algorithms in [38].

    Data Sets                 |U|     |C|   ADM Time (s)   OADM Time (s)   FPRA Time (s)
    Voting records            435     16    1.375          0.171           0.015
    Breast Cancer Wisconsin   683     9     2.437          0.093           0.016
    Tic-tac-toe               958     9     4              0.136           0.031
    Kr-vs-kp                  3196    36    79.719         6.169           0.407
    nursery                   12960   8     1032.25        10.312          0.328
Table 10. Comparison results with Q-ARA in [40].

    Data Sets                         Objects |U|   Attributes |C|   Classes |Vd|   Q-ARA Time (s)   FPRA Time (s)
    waveform                          5000          21               3              16.466           0.156
    Wine recognition                  178           13               3              0.182            0.016
    Statlog heart                     270           13               2              0.275            0.015
    Statlog project satellite image   6435          36               6              82.812           0.281
    Image segmentation                2310          19               7              2.180            0.047
    Pima indians diabets              768           8                2              0.103            0.015
    wdbc                              569           30               2              2.226            0.032
    wpbc                              198           34               2              1.328            0
    Sonar, mines vs. rocks            208           60               2              0.312            0.031
    Glass identification              214           9                7              0.118            0
    ionosphere                        351           34               2              2.211            0.015
