Article

Intuitionistic Fuzzy-Based Three-Way Label Enhancement for Multi-Label Classification

1 Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
2 China UnionPay Co., Ltd., Shanghai 201201, China
3 Postdoctoral Research Station of Computer Science and Technology, Fudan University, Shanghai 200433, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(11), 1847; https://doi.org/10.3390/math10111847
Submission received: 11 April 2022 / Revised: 12 May 2022 / Accepted: 24 May 2022 / Published: 27 May 2022
(This article belongs to the Special Issue Soft Computing and Uncertainty Learning with Applications)

Abstract

Multi-label classification deals with the determination of instance-label associations for unseen instances. Although many margin-based approaches have been carefully developed, the uncertain classifications of instances with small separation margins remain unresolved. The intuitionistic fuzzy set is an effective tool for characterizing uncertainty, yet it has not been examined for multi-label cases. This paper proposes a novel model called intuitionistic fuzzy three-way label enhancement (IFTWLE) for multi-label classification. IFTWLE combines label enhancement with an intuitionistic fuzzy set under the framework of three-way decisions. For unseen instances, we generate pseudo-labels for label uncertainty evaluation from a logical label-based model. An intuitionistic fuzzy set-based instance selection principle seamlessly bridges logical label learning and numerical label learning. The principle is developed hierarchically. At the label level, membership and non-membership functions are defined pair-wisely to measure the local uncertainty and to generate candidate uncertain instances. Moving up to the instance level, we select instances from the candidates for label enhancement, while the remaining instances are left unchanged. To the best of our knowledge, this is the first attempt to combine logical label learning and numerical label learning into a unified framework for minimizing classification uncertainty. Extensive experiments demonstrate that, with the selectively reconstructed label importance, IFTWLE is statistically superior to state-of-the-art multi-label classification algorithms in terms of classification accuracy. The computational complexity of the algorithm is $O(n^2mk)$, where $n$, $m$, and $k$ denote the number of unseen instances, the number of labels, and the average label-specific feature size, respectively.

1. Introduction

In multi-label settings [1,2,3], an instance is associated with multiple labels simultaneously. For example, a picture may be relevant to tags such as sky, ocean, and seagull; a book may cover topics such as sports and art. The goal of multi-label classification is to learn a mapping from the feature space to the label space, such that the label associations of unseen instances can be determined. It is widely applied in domains such as smart grid management [4], disease diagnosis [5], and image classification [6].
Traditionally, the multi-label classification schema builds upon logical labels and provides qualitative associations between instances and labels. For robustness, many researchers focus on extensions of linear or hyperplane-based models, and their work can be roughly categorized as problem transformation and algorithm adaptation, respectively. The former transforms multi-label classification into a collection of simplified learning scenarios, such as single-label classification; representative work involves binary relevance [7], random k-labelsets [8], and learning label-specific features (LLSF) [9]. Comparatively, the latter extends existing algorithms to simultaneously generate multiple outputs; representative work includes the multi-label forest (ML-Forest) [10] and the multi-label twin support vector machine (MLTSVM) [11]. However, both strategies suffer from the calibrated threshold problem. Intuitively, the uncertainty of a label association is larger if the output is close to the threshold and smaller otherwise. Hence, we need labels with stronger supervision to boost the model performance.
A numerical label, by contrast, quantifies how much information a label carries in describing an instance. For example, a facial expression can be interpreted as the combination of slight sadness, some anger, and a bit of disgust. The fitting of numerical labels is defined as label distribution learning [12]. Although numerical labels offer more discriminative information than logical labels, it is expensive to annotate all numerical labels manually. One feasible solution is to learn numerical labels from logical labels, also known as label enhancement [13,14,15,16]. However, existing studies apply label enhancement indiscriminately to all instances, regardless of their differences in uncertainty.
Three-way decisions [17], also known as the trisecting–acting–outcome (TAO) model [18,19,20], is an emerging decision theory for problem solving under uncertainty [21,22,23]. It originates from the semantic explanation of the three regions induced by rough sets and has become an active topic in the soft computing community [24,25,26,27]. The three sequential steps, trisecting, acting, and outcome evaluation, characterize the routine for handling uncertainty. Typically, the trisecting step divides the information into three non-overlapping regions, two of which are regarded as certain and the remaining one as uncertain; the acting step applies the positive/negative strategy to certain objects and the deferment strategy to uncertain objects; the outcome step evaluates the performance induced by trisecting and acting. Such routines may continue until the uncertainty is negligible and the classification performance is improved. Existing three-way-based multi-label classification methods [28,29,30,31,32] deal with label uncertainty through either ensembles of features or ensembles of algorithms, whereas the ensemble of logical and numerical labels remains untouched.

1.1. Motivation

The semantic uncertainty analysis of multi-label classification has limitations. The fuzzy set is effective in measuring the membership degree towards multiple labels [33,34,35], but it fails to consider the non-membership degree. As a generalization of the fuzzy set, an intuitionistic fuzzy number (IFN) is effective at quantifying the vagueness of a qualitative instance-label assignment [36], which offers heuristic information for selecting instances for label enhancement. This paper presents an intuitionistic fuzzy-based three-way label enhancement (IFTWLE) model. It implements three-way decisions by "trisecting" unseen instances from label-specific learning and by "acting" through label enhancement of uncertain instances. Inspired by empirical studies on single-label classification [37,38,39], we employ an intuitionistic fuzzy number to search for instances with uncertain classifications. Concretely speaking, IFTWLE applies an IFN at the label level and defines a pair of membership and non-membership functions for every unseen instance based on the generated pseudo-labels. The membership functions evaluate the weighted closeness of instances to the specified class, whereas the non-membership functions evaluate the possibility of instances belonging to the complementary class. The ultimately uncertain instances are determined via an aggregation function. Consequently, we preserve the predicted labels if the classifications are plausible and exploit the numerical labels otherwise.

1.2. Contribution

Compared with existing multi-label classification models, our contributions are as follows:
(1)
For the first time, cascade learning of the logical labels and numerical labels of multi-label data is presented and unified under the semantics of classification uncertainty. Determining the label associations of unseen instances broadens the application of three-way decision theory.
(2)
A novel instance selection principle in two stages is presented to integrate logical label learning with numerical label learning. In this way, instances that exhibit significant uncertainty across most labels are enhanced with better discrimination (regarding label importance).
(3)
This is the first attempt to employ an intuitionistic fuzzy set on multi-label classification. It addresses the issue of identifying potentially uncertain instances on each label without stipulating many hyperparameters.
(4)
The proposed IFTWLE has demonstrated effectiveness across many domains. The computational complexity is proportional to the square of the instance count and linear in the label count and the label-specific feature count.
The remainder of the paper is organized as follows. Section 2 reviews some preliminaries on label-specific learning, label enhancement, and the intuitionistic fuzzy set; Section 3 presents our proposed model for multi-label classification; experimental results are reported in Section 4; Section 5 concludes this work and identifies our future directions.

2. Preliminaries

In this section, we briefly review some preliminaries regarding label-specific feature learning, label enhancement, and the intuitionistic fuzzy set.

2.1. Label-Specific Feature Learning

Label-specific feature learning [9,40,41,42] assumes that each label has unique characteristics and can be described by a different feature subset. For computational simplicity, learning label-specific features (LLSF) [9] considers second-order label correlations (a.k.a. pairwise correlations, i.e., one label depends on at most one other label) and rests on three hypotheses:
(1)
Discrimination: the set of i-th label-specific feature should be most pertinent to the corresponding label ( l i ), and the included components should be different from other label-specific features.
(2)
Sparsity: the label-specific features should be sparse as compared to the feature space.
(3)
Shareability: the cardinality of common features of two label-specific features with strong label correlations should be larger than those with weak label correlations.
Based on the previous hypotheses, the objective function is formulated as:
$$\min_{W}\ \frac{1}{2}\|XW-Y\|_F^2+\frac{\delta}{2}\,\mathrm{tr}\!\left(RW^{\top}W\right)+\eta\|W\|_1\tag{1}$$
where $X$ and $Y$ represent the features and logical labels of the multi-label data, $W=[w_1,w_2,\ldots,w_i,\ldots,w_m]$, and the column $w_i$ represents the weight vector for label $l_i$, whose non-zero entries identify the label-specific features of $l_i$. $\|\cdot\|_F^2$ denotes the squared Frobenius norm. $R=[r_{ij}]$ is a matrix of second-order label relevance with $r_{ij}=1-c_{ij}$, where $c_{ij}$ measures the correlation between labels $l_i$ and $l_j$ and is computed with cosine similarity. $\mathrm{tr}(\cdot)$ denotes the matrix trace. The symbols $\delta$ and $\eta$ are balance factors. The inner product of $w_i$ and $w_j$ describes the correlation between labels $l_i$ and $l_j$ from the feature view: the stronger the correlation, the larger the inner product, and vice versa.
For the prediction of unseen instances, LLSF employs logistic regression and can be denoted as:
$$\hat{Y}=\mathrm{sgn}\!\left(XW\geq\tau\right)\tag{2}$$
where $\mathrm{sgn}(\cdot)$ returns 1 if the condition holds and 0 otherwise.
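To make the formulation concrete, the following is a minimal NumPy sketch of the LLSF objective value (Equation (1)) and the thresholded prediction rule (Equation (2)). The variable names mirror the symbols above; the toy data, the cosine-similarity construction of R, and the absence of an optimization loop are our own illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def llsf_objective(X, Y, W, R, delta, eta):
    """Value of the LLSF objective: 0.5*||XW - Y||_F^2 + (delta/2)*tr(R W^T W) + eta*||W||_1."""
    fit = 0.5 * np.linalg.norm(X @ W - Y, "fro") ** 2
    corr = 0.5 * delta * np.trace(R @ W.T @ W)
    sparsity = eta * np.abs(W).sum()
    return fit + corr + sparsity

def llsf_predict(X, W, tau=0.5):
    """Thresholded prediction of Equation (2): label c is assigned when the c-th output reaches tau."""
    return (X @ W >= tau).astype(int)

# toy usage: 5 instances, 4 features, 3 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Y = rng.integers(0, 2, size=(5, 3)).astype(float)
W = rng.normal(scale=0.1, size=(4, 3))
norms = np.linalg.norm(Y, axis=0, keepdims=True)   # per-label norms
C = (Y.T @ Y) / (norms.T @ norms + 1e-12)          # cosine similarity c_ij between labels
R = 1.0 - C                                        # r_ij = 1 - c_ij
print(llsf_objective(X, Y, W, R, delta=1.0, eta=0.1))
print(llsf_predict(X, W))
```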

2.2. Label Enhancement

Label enhancement assumes that each instance is intrinsically represented by real-valued labels and, thus, can be recovered from qualitative logical label ( Y ) to quantitative numerical label ( U ) via the instance-level or label-level smoothness [15,16,43,44]. The distribution of numerical labels describes the relative importance of different labels in describing a given instance.
To guarantee the effectiveness, three hypotheses are presented in label enhancement multi-label learning (LEMLL) [16].
(1)
Linear relevance: the mapping from the feature space to the numerical labels, $g:X\rightarrow U$, follows a linear model.
(2)
Label similarity: the values of the learnt numerical labels should approximate the original logical labels.
(3)
Topology consistency: the instances with similar features share similar numerical label values.
Based on the previous assumptions, the objective function is given as:
$$\begin{aligned}
\min_{\Theta,b,U}\ & \sum_{i=1}^{n}L_R(R_i)+\alpha\|\Theta\|_F^2+\beta\|U-Y\|_F^2+\gamma\,\mathrm{tr}\!\left(U^{\top}MU\right)\\
\text{s.t.}\ & R_i=\|\xi_i\|_2=\sqrt{\xi_i^{\top}\xi_i},\quad \xi_i=u_i-\Theta^{\top}\varphi(x_i)-b,\\
& L_R(R_i)=\begin{cases}0, & R_i<\varepsilon\\ R_i^2-2R_i\varepsilon+\varepsilon^2, & R_i\geq\varepsilon\end{cases}
\end{aligned}\tag{3}$$
where $\sum_{i=1}^{n}L_R(R_i)$ is the loss function from the feature space to the numerical labels, with the residual $R_i=\|\xi_i\|_2=\sqrt{\xi_i^{\top}\xi_i}$, where $\xi_i=u_i-\Theta^{\top}\varphi(x_i)-b$ and $\varphi(x_i)$ maps instance $x_i$ to a high-dimensional space $\mathbb{R}^{H}$; $\Theta$ and $b$ are the parameters of the linear regression model. $\|U-Y\|_F^2$ implements the label similarity assumption measured by the Frobenius norm (abbreviated as F). $\alpha$, $\beta$, and $\gamma$ are balance factors. $\mathrm{tr}(U^{\top}MU)=\|U-WU\|_F^2$ implements topology consistency, where $\mathrm{tr}(\cdot)$ is the matrix trace and $M=(I-W)^{\top}(I-W)$, with $I$ an identity matrix and $W$ the weight matrix of the fully connected graph $G=(V,E,W)$, which describes the closeness among arbitrary instances.
For predictions on unseen instances, LEMLL leverages a kernel logistic regression, which is denoted as:
$$\hat{Y}=\mathrm{sgn}\!\left(\Theta^{\top}\varphi(X)+b\geq\tau\right)\tag{4}$$
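As a quick illustration of the ε-insensitive part of the LEMLL objective, the sketch below evaluates the residual $R_i$ and the loss $L_R(R_i)$ for a single instance; the explicit arguments (u, phi_x, Theta, b) are hypothetical stand-ins for a trained model and are not taken from the original implementation.

```python
import numpy as np

def eps_insensitive_loss(u, phi_x, Theta, b, eps):
    """L_R from the LEMLL objective: zero inside the eps-tube, quadratic beyond it.

    u      : (m,)  numerical label vector of one instance
    phi_x  : (H,)  high-dimensional representation phi(x_i)
    Theta  : (H, m) regression weights; b : (m,) bias
    """
    xi = u - Theta.T @ phi_x - b        # residual vector xi_i
    R = np.linalg.norm(xi)              # R_i = ||xi_i||_2
    return 0.0 if R < eps else R**2 - 2 * R * eps + eps**2
```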

2.3. Intuitionistic Fuzzy Set

Suppose an arbitrary instance $x_i\in X$ has the corresponding label $y_i$; then, for a nonempty set $X$, the intuitionistic fuzzy set is defined as follows:
$$\tilde{A}=\{(x_i,\mu_{\tilde{A}}(x_i),\nu_{\tilde{A}}(x_i))\mid x_i\in X\}\tag{5}$$
where $\mu_{\tilde{A}}(x_i)$ is the membership of instance $x_i$, expressing the chance of $x_i$ belonging to a particular class $A$, and $\nu_{\tilde{A}}(x_i)$ is the non-membership of instance $x_i\in X$, representing the possibility that $x_i$ is not related to class $A$. Both $\mu$ and $\nu$ are functions mapping from $X$ to $[0,1]$, and the following two conditions are satisfied:
1.
$\mu_{\tilde{A}}(x_i)\in[0,1]$, $\nu_{\tilde{A}}(x_i)\in[0,1]$;
2.
$0\leq\mu_{\tilde{A}}(x_i)+\nu_{\tilde{A}}(x_i)\leq1$.
The hesitation of instance x i is defined as:
$$\pi_{\tilde{A}}(x_i)=1-\mu_{\tilde{A}}(x_i)-\nu_{\tilde{A}}(x_i)\tag{6}$$
which measures the hesitation of instance $x_i$ regarding class $A$. $\mu_{\tilde{A}}$ and $\nu_{\tilde{A}}$ construct the intuitionistic fuzzy number (IFN) $\alpha=(\mu_{\tilde{A}},\nu_{\tilde{A}})$, and the score function $S(\alpha)$ is used to compare two intuitionistic fuzzy numbers:
$$S(\alpha)=\mu_{\tilde{A}}-\nu_{\tilde{A}}\tag{7}$$
which allows comparing instance $(x_i,y_i,\mu_{\tilde{A}}(x_i),\nu_{\tilde{A}}(x_i))$ with instance $(x_j,y_j,\mu_{\tilde{A}}(x_j),\nu_{\tilde{A}}(x_j))$: through the values of $S(\alpha_i)$ and $S(\alpha_j)$, we can determine whether $x_i$ or $x_j$ is more likely to be associated with class $A$.
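The following minimal sketch wraps the above definitions (membership, non-membership, hesitation, and the score function $S(\alpha)$) into a small helper class; it is only an illustration of the comparison rule, not part of any cited implementation.

```python
from dataclasses import dataclass

@dataclass
class IFN:
    """An intuitionistic fuzzy number alpha = (mu, nu) with mu, nu in [0, 1] and mu + nu <= 1."""
    mu: float   # membership degree
    nu: float   # non-membership degree

    def __post_init__(self):
        assert 0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0 and self.mu + self.nu <= 1.0

    @property
    def hesitation(self) -> float:
        return 1.0 - self.mu - self.nu          # pi = 1 - mu - nu

    def score(self) -> float:
        return self.mu - self.nu                # S(alpha) = mu - nu

# the instance with the larger score is more likely to be associated with class A
alpha_i, alpha_j = IFN(0.7, 0.1), IFN(0.5, 0.3)
print(alpha_i.score() > alpha_j.score())        # True
```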

3. Proposed Model

3.1. Notations

For ease of reference, we present a nomenclature including the major notations and mathematical meanings in Table 1.

3.2. Problem Formulation

Let $X_1=\{x_1,x_2,\ldots,x_n\}$ denote a set of multi-label instances with the associated logical label information $Y_1=\{y_1,y_2,\ldots,y_n\}$, and let $X_2$ denote the unseen instances. The logical label of $x_i$ over the label set $\{l_1,\ldots,l_c,\ldots,l_m\}$ is denoted as $y_i=\{y_i^1,\ldots,y_i^c,\ldots,y_i^m\}$, where $y_i^c=1$ holds (positive class) if $x_i$ is associated with label $l_c$, and $y_i^c=0$ (negative class) otherwise. The numerical label of instance $x_i$ is denoted by $u_i=\{u_i^1,\ldots,u_i^c,\ldots,u_i^m\}\in[0,1]^m$, which satisfies $\sum_c u_i^c=1$. For an arbitrary label $l_c$, we denote the degrees of the membership function related to $l_c$ and the non-membership function unrelated to $l_c$ as $\mathrm{IFN}_c(x_i)=(\mu_c(x_i),\nu_c(x_i))$, where $\mu_c(x_i)\in[0,1]$, $\nu_c(x_i)\in[0,1]$, and $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$. The set of uncertain instances on the $c$-th label is denoted as $X_2(\mu_c,\nu_c)$. The set of uncertain instances on all labels is denoted as $X_2(\mu,\nu)$, and the set of certain instances on all labels is its complement, denoted as $\neg X_2(\mu,\nu)$ (i.e., $X_2=X_2(\mu,\nu)\cup\neg X_2(\mu,\nu)$). Our goal is to identify uncertain classifications from logical label-based learning and to improve the performance with numerical label-based learning.

3.3. Basic Idea

IFTWLE is an implementation of the TAO model for the unseen instances $X_2$ (see Figure 1). Firstly, trisecting is realized by a logical label-based model (denoted as $f_1(\cdot)$) together with an intuitionistic fuzzy number (denoted as $(\mu,\nu)$); an instance selection principle is developed to identify the uncertain instances. Secondly, acting is realized on all instances with different strategies: label enhancement (denoted as $f_2(\cdot)$) is applied to the uncertain instances, whereas the original classifications are adopted otherwise. Finally, we conduct outcome evaluations on the deduced classifications.
For a non-trivial solution, the three-way classification result of the predicted multi-label set (i.e., Y ^ 2 * ) on X 2 is defined as:
$$\hat{Y}_2^{*}=\begin{cases}f_1(\neg X_2(\mu,\nu)), & x\in\neg X_2(\mu,\nu)\\ f_2(X_2(\mu,\nu)), & x\in X_2(\mu,\nu)\end{cases}\tag{8}$$
where $f_2(X_2(\mu,\nu))$ refers to the predicted multi-label sets of the deferred instances with large label uncertainty degrees, and $f_1(\neg X_2(\mu,\nu))$ refers to the predicted multi-label sets of the certain instances with small label uncertainty degrees.
Figure 2 describes the pipeline of the instance selection principle. Taking a problem transformation view, we assign a membership function ($\mu_c(x_i)$) and a non-membership function ($\nu_c(x_i)$) to every unseen instance on an arbitrary label $l_c$, which are then combined to search for the candidate uncertain instances, denoted as $X_2(\mu_c,\nu_c)$. The final uncertain instances ($X_2(\mu,\nu)$) are aggregated by considering the distribution of candidate uncertain instances across all labels. We elaborate the details in Section 3.4, Section 3.5 and Section 3.6.
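A minimal sketch of the dispatch in Equation (8) is given below; the callables f1_predict, f2_enhance, and select_uncertain are hypothetical placeholders for the logical label-based model, the label enhancement model, and the IFN-based selection principle detailed in Sections 3.4, 3.5 and 3.6.

```python
import numpy as np

def three_way_predict(X2, f1_predict, f2_enhance, select_uncertain):
    """Trisect the unseen instances and act differently on the two parts (Equation (8)).

    f1_predict(X)                  -> logical-label predictions of the label-specific model
    f2_enhance(X)                  -> predictions after label enhancement on uncertain instances
    select_uncertain(X, Y_pseudo)  -> boolean mask marking instances flagged as uncertain
    """
    Y_pseudo = f1_predict(X2)                      # pseudo-labels from f1
    uncertain = select_uncertain(X2, Y_pseudo)     # trisecting via the IFN-based principle
    Y_hat = Y_pseudo.copy()
    if uncertain.any():
        Y_hat[uncertain] = f2_enhance(X2[uncertain])   # acting: enhance only the uncertain part
    return Y_hat
```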

3.4. Intuitionistic Fuzzy Membership Assignment

Although fuzzy membership assignment is capable of measuring the concept vagueness, it has the following drawbacks for multi-label classification:
  • Regardless of the concrete membership function definition, it can only describe the closeness of an instance belonging to a concept. The distribution of the heterogeneous class is thus neglected, which is of great importance in finding uncertain instances.
  • Multi-label data have some specialized characteristics, such as class imbalance. With the membership function only, the model cannot utilize such information effectively, which leads to the degeneration of model generality.
In [45], the intuitionistic fuzzy set identifies the support vectors among instances and improves the generalization of the support vector machine. In our case, instances have already obtained pseudo-labels from label-specific learning (i.e., $f_1(\cdot)$); we therefore assume a desirable hyperplane is available. This means that noisy labels are unlikely and that misclassified instances are both far from the class center and surrounded by heterogeneous instances. Therefore, from the perspective of labels, such instances are compatible with the intuitionistic fuzzy set. The degrees of the membership and non-membership functions for each unseen instance are determined in a label-specific way under the problem transformation view. Without loss of generality, we consider the construction of $\mu_c(x_i)$ and $\nu_c(x_i)$ on label $l_c$.

3.4.1. Membership Function μ c ( x i )

Let $\hat{Y}_2$ be the pseudo-label set of the unseen instance set $X_2$ learnt from the LLSF model ($f_1$); the membership of an instance is determined by its relative similarity to the predicted class and to the other class. In other words, the membership of an instance is larger if its relative distance to the predicted class is smaller and its relative distance to the other class is larger. Using the class center as a representative, for two instances that are both pseudo-positively associated (i.e., $\hat{y}_i^c=\hat{y}_j^c=1$), our goal can be formally written as:
$$\mu_c(x_i)>\mu_c(x_j)\quad\text{if}\quad\frac{D(\phi_c(x_i),\hat{C}_c^{+})}{r_c^{+}+\epsilon}-\frac{D(\phi_c(x_i),\hat{C}_c^{-})}{r_c^{-}+\epsilon}<\frac{D(\phi_c(x_j),\hat{C}_c^{+})}{r_c^{+}+\epsilon}-\frac{D(\phi_c(x_j),\hat{C}_c^{-})}{r_c^{-}+\epsilon}\tag{9}$$
where $D(\phi_c(x_i),\hat{C}_c^{+})$ and $D(\phi_c(x_i),\hat{C}_c^{-})$ are the distances of instance $x_i$ to the pseudo-positive class and the pseudo-negative class, respectively. They are defined as:
$$D(\phi_c(x_i),\hat{C}_c^{+})=\|\phi_c(x_i)-\hat{C}_c^{+}\|\tag{10}$$
$$D(\phi_c(x_i),\hat{C}_c^{-})=\|\phi_c(x_i)-\hat{C}_c^{-}\|\tag{11}$$
where $\phi_c(x_i)$ represents the high-dimensional representation of instance $x_i$ given the label-specific features w.r.t. $l_c$, and $\|\cdot\|$ is the distance between the instance and the corresponding pseudo-class center. Supposing $K(x_i^c,x_j^c)$ denotes a kernel function on the label-specific features w.r.t. $l_c$, the distance in the induced space expands as:
$$\|\phi_c(x_i)-\phi_c(x_j)\|=\sqrt{K(x_i^c,x_i^c)-2K(x_i^c,x_j^c)+K(x_j^c,x_j^c)}\tag{12}$$
$\hat{C}_c^{+}$ and $\hat{C}_c^{-}$ are the class centers of the pseudo-positive and pseudo-negative classes on label $l_c$: $\hat{C}_c^{+}$ is the average over all instances whose predicted pseudo-label on $l_c$ is 1, and $\hat{C}_c^{-}$ is the average over all instances whose predicted pseudo-label on $l_c$ is 0.
$$\hat{C}_c^{+}=\frac{1}{|l_c^{+}|}\sum_{\hat{y}_i^c=1}\phi_c(x_i)\tag{13}$$
$$\hat{C}_c^{-}=\frac{1}{|l_c^{-}|}\sum_{\hat{y}_i^c=0}\phi_c(x_i)\tag{14}$$
where $|l_c^{+}|=|\{x_i\mid\hat{y}_i^c=1\}|$ and $|l_c^{-}|=|\{x_i\mid\hat{y}_i^c=0\}|$ denote the pseudo-positive and pseudo-negative instance counts w.r.t. label $l_c$, respectively.
$r_c^{+}$ and $r_c^{-}$ are the radii of the pseudo-positive and pseudo-negative classes on label $l_c$, which can be quantified as:
$$r_c^{+}=\max_{\hat{y}_i^c=1}\|\phi_c(x_i)-\hat{C}_c^{+}\|\tag{15}$$
$$r_c^{-}=\max_{\hat{y}_i^c=0}\|\phi_c(x_i)-\hat{C}_c^{-}\|\tag{16}$$
By substituting Equations (13) and (14) into Equations (15) and (16), based on Equation (12), we have:
$$r_c^{+}=\max_{\hat{y}_i^c=1}\sqrt{K(x_i^c,x_i^c)-\frac{2}{|l_c^{+}|}\sum_{\hat{y}_j^c=1}K(x_i^c,x_j^c)+\frac{1}{|l_c^{+}|^{2}}\sum_{\hat{y}_m^c=1}\sum_{\hat{y}_n^c=1}K(x_m^c,x_n^c)}\tag{17}$$
$$r_c^{-}=\max_{\hat{y}_i^c=0}\sqrt{K(x_i^c,x_i^c)-\frac{2}{|l_c^{-}|}\sum_{\hat{y}_j^c=0}K(x_i^c,x_j^c)+\frac{1}{|l_c^{-}|^{2}}\sum_{\hat{y}_m^c=0}\sum_{\hat{y}_n^c=0}K(x_m^c,x_n^c)}\tag{18}$$
One can infer that the calculation of $D(\phi_c(x_i),\hat{C}_c^{-})$ is costly if $x_i$ is pseudo-positive on $l_c$. For simplicity, we use $D(\phi_c(x_i),\hat{C}_c)$ instead; in other words, we examine the dissimilarity of an instance to the class center of all instances.
For multi-label cases, the positive class is the minority class, whereas the negative class is the majority class. The imbalanced class distribution results in different contributions to the membership function: the center of the negative class carries a much lower empirical risk than that of the positive class, and the empirical risk of the positive-class center grows as the number of positive instances shrinks. Therefore, we introduce the symbols $p_c^{+}$ and $p_c^{-}$ to represent the pseudo-positive instance weight and the pseudo-negative instance weight w.r.t. label $l_c$. For any $l_c$, they are defined as:
$$p_c^{+}=\frac{2\times|l_c^{+}|}{|l_c^{+}|+\mathrm{Card}(X_2)}\tag{19}$$
$$p_c^{-}=\frac{2\times|l_c^{-}|}{|l_c^{-}|+\mathrm{Card}(X_2)}\tag{20}$$
where $\mathrm{Card}(X_2)=|X_2|$ refers to the number of instances in the instance set $X_2$. For each unseen instance $x_i$, the degree of membership $\mu_c(x_i)$ is defined as:
$$\mu_c(x_i)=\begin{cases}1-\left[p_c^{+}\dfrac{D(\phi_c(x_i),\hat{C}_c^{+})}{r_c^{+}+\epsilon}+(1-p_c^{+})\dfrac{D(\phi_c(x_i),\hat{C}_c)}{r_c+\epsilon}\right], & \hat{y}_i^c=1\\[2ex]1-\left[p_c^{-}\dfrac{D(\phi_c(x_i),\hat{C}_c^{-})}{r_c^{-}+\epsilon}+(1-p_c^{-})\dfrac{D(\phi_c(x_i),\hat{C}_c)}{r_c+\epsilon}\right], & \hat{y}_i^c=0\end{cases}\tag{21}$$
where $\epsilon\rightarrow0^{+}$ is an adjustable parameter, and $r_c^{+},r_c^{-},r_c$ and $\hat{C}_c^{+},\hat{C}_c^{-},\hat{C}_c$ are the radii and class centers of the pseudo-positive class, the pseudo-negative class, and all unseen instances on label $l_c$, as determined by $f_1^c(\cdot)$. Figure 3 shows an example illustrating the effect of Equation (21), where both $\mu_c(x_A)>\mu_c(x_C)$ and $\mu_c(x_D)>\mu_c(x_B)$ hold.
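The sketch below computes $\mu_c(x_i)$ for one label following Equation (21), under the simplifying assumption of an explicit (linear-kernel) representation so that class centers and radii can be computed directly in feature space; it also assumes both pseudo-classes are non-empty.

```python
import numpy as np

def membership_degree(Phi, y_pseudo, eps=1e-3):
    """Membership mu_c(x_i) on one label c (Equation (21)).

    Phi      : (n, k) explicit label-specific representation (linear-kernel assumption)
    y_pseudo : (n,)  pseudo-labels in {0, 1} predicted by f1 on label c
    """
    n = len(y_pseudo)
    pos, neg = (y_pseudo == 1), (y_pseudo == 0)
    C_pos, C_neg, C_all = Phi[pos].mean(0), Phi[neg].mean(0), Phi.mean(0)
    d_pos = np.linalg.norm(Phi - C_pos, axis=1)     # distance to pseudo-positive center
    d_neg = np.linalg.norm(Phi - C_neg, axis=1)     # distance to pseudo-negative center
    d_all = np.linalg.norm(Phi - C_all, axis=1)     # distance to the center of all instances
    r_pos, r_neg, r_all = d_pos[pos].max(), d_neg[neg].max(), d_all.max()
    p_pos = 2 * pos.sum() / (pos.sum() + n)         # pseudo-positive weight (Equation (19))
    p_neg = 2 * neg.sum() / (neg.sum() + n)         # pseudo-negative weight (Equation (20))
    mu = np.empty(n)
    mu[pos] = 1 - (p_pos * d_pos[pos] / (r_pos + eps) + (1 - p_pos) * d_all[pos] / (r_all + eps))
    mu[neg] = 1 - (p_neg * d_neg[neg] / (r_neg + eps) + (1 - p_neg) * d_all[neg] / (r_all + eps))
    return np.clip(mu, 0.0, 1.0)                    # numerical guard; values already lie in [0, 1]
```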

3.4.2. Non-Membership Function ν c ( x i )

The non-membership of an instance is determined by the following two factors. Firstly, it is impacted by the dissimilarity and similarity of the instance to the predicted class and to the other class: the non-membership is larger if an instance is located in a region where the probability of belonging to the other class is higher. Secondly, it is impacted by the heterogeneous instance distribution within the neighborhood: the non-membership is larger if an instance is surrounded by instances from the other class. We assume the non-membership grows with the complement of the membership, denoted as:
$$\nu_c(x_i)\propto1-\mu_c(x_i)\tag{22}$$
The imbalanced class distribution [46] implies that the contributions of heterogeneous instances are label-dependent. In other words, for two instances $x_i$ and $x_j$ with the same number of heterogeneous instances in their corresponding neighborhoods, the non-membership of $x_i$ is larger than that of $x_j$ if $x_i$ belongs to the pseudo-negative class whereas $x_j$ belongs to the pseudo-positive class. To implement this assumption, we introduce two symbols, $n_c^{+}$ and $n_c^{-}$, to represent the prior probabilities of being pseudo-positive and pseudo-negative, respectively.
$$n_c^{+}=\frac{|l_c^{+}|}{\mathrm{Card}(X_2)}\tag{23}$$
$$n_c^{-}=\frac{|l_c^{-}|}{\mathrm{Card}(X_2)}\tag{24}$$
The two prior probabilities are incorporated to quantify the contribution of heterogeneous instances within the neighborhood: from the perspective of the neighborhood, the non-membership degree is larger if more heterogeneous instances are included, and smaller if more homogeneous instances are included. Therefore, the weighted neighborhood difference of $x_i$ on label $l_c$ (i.e., $\rho_c(x_i)$) is defined as:
$$\rho_c(x_i)=\begin{cases}\dfrac{n_c^{+}|N_c(x_i)^{-}|}{n_c^{+}|N_c(x_i)^{-}|+n_c^{-}|N_c(x_i)^{+}|}, & \hat{y}_i^c=1\\[2ex]\dfrac{n_c^{-}|N_c(x_i)^{+}|}{n_c^{+}|N_c(x_i)^{-}|+n_c^{-}|N_c(x_i)^{+}|}, & \hat{y}_i^c=0\end{cases}\tag{25}$$
where $|N_c(x_i)^{+}|=|\{x_j\mid x_j\in N_c(x_i)\wedge\hat{y}_j^c=1\}|$ denotes the pseudo-positive instance count in the $r$-neighborhood of $x_i$ on label $l_c$, $|N_c(x_i)^{-}|=|\{x_j\mid x_j\in N_c(x_i)\wedge\hat{y}_j^c=0\}|$ denotes the pseudo-negative instance count in the same neighborhood, and $r>0$ is an adjustable parameter. We assume the non-membership degree is proportional to the weighted neighborhood difference, denoted as:
$$\nu_c(x_i)\propto\rho_c(x_i)\tag{26}$$
Based on assumptions (22) and (26), we define the non-membership degree as:
$$\nu_c(x_i)=(1-\mu_c(x_i))\cdot\rho_c(x_i)\tag{27}$$
It is easy to validate that $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$ holds. Here, we offer some explanations. $\mu_c(x_i)$ attains its largest value when $x_i$ coincides with the center of its plausible pseudo-class, and it reaches 0 only when $x_i$ is simultaneously the farthest instance from the class center of all instances. $\nu_c(x_i)$ is itself smaller than 1, as both of its factors are less than 1. Since $\rho_c(x_i)$ is smaller than 1, $\nu_c(x_i)$ is no larger than $1-\mu_c(x_i)$, which means $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$ holds.
By referring to Equation (6), we define the hesitation degree as:
$$\pi_c(x_i)=1-\mu_c(x_i)-\nu_c(x_i)\tag{28}$$
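Analogously, the following sketch computes $\rho_c(x_i)$ and $\nu_c(x_i)$ from Equations (25)-(27); the brute-force pairwise distance matrix and the fallback $\rho_c(x_i)=0$ for an empty neighborhood are our own assumptions for illustration.

```python
import numpy as np

def non_membership_degree(Phi, y_pseudo, mu, r=0.1):
    """Non-membership nu_c(x_i) = (1 - mu_c(x_i)) * rho_c(x_i) on one label c (Equations (25)-(27))."""
    n = len(y_pseudo)
    n_pos = (y_pseudo == 1).sum() / n                       # prior of pseudo-positive (Equation (23))
    n_neg = (y_pseudo == 0).sum() / n                       # prior of pseudo-negative (Equation (24))
    dist = np.linalg.norm(Phi[:, None, :] - Phi[None, :, :], axis=-1)
    nu = np.zeros(n)
    for i in range(n):
        nbrs = (dist[i] <= r) & (np.arange(n) != i)         # r-neighborhood of x_i
        k_pos = int((y_pseudo[nbrs] == 1).sum())            # |N_c(x_i)^+|
        k_neg = int((y_pseudo[nbrs] == 0).sum())            # |N_c(x_i)^-|
        denom = n_pos * k_neg + n_neg * k_pos
        if denom == 0:
            rho = 0.0                                       # empty neighborhood: treat as certain
        elif y_pseudo[i] == 1:
            rho = n_pos * k_neg / denom                     # heterogeneous neighbors are pseudo-negative
        else:
            rho = n_neg * k_pos / denom                     # heterogeneous neighbors are pseudo-positive
        nu[i] = (1.0 - mu[i]) * rho
    return nu
```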

3.5. Label-Level Instance Selection

Having the membership/non-membership degrees, the unseen instances ( X 2 ) with the intuitionistic fuzzy membership ( IFX 2 ) are denoted as:
$$\mathrm{IFX}_2=\{(x_1,\hat{y}_1,\mu_1,\nu_1),(x_2,\hat{y}_2,\mu_2,\nu_2),\ldots,(x_k,\hat{y}_k,\mu_k,\nu_k)\}$$
where $\mu_i=(\mu_1(x_i),\mu_2(x_i),\ldots,\mu_m(x_i))$ and $\nu_i=(\nu_1(x_i),\nu_2(x_i),\ldots,\nu_m(x_i))$ denote the degrees of the membership and non-membership functions of $x_i$ across all labels, respectively.
In terms of each label, we can select the instances with uncertain classifications (denoted as X 2 ( μ c , ν c ) ) as:
$$X_2(\mu_c,\nu_c)=\{x_i\mid\mu_c(x_i)>\nu_c(x_i)\wedge\nu_c(x_i)>0\}\tag{29}$$
$X_2(\mu_c,\nu_c)$ can be interpreted as the set of candidate uncertain instances. For example, consider the four unseen samples with pseudo-labels, i.e., $x_A$, $x_B$, $x_C$, and $x_D$ in Figure 4. It is more likely that $x_B$ and $x_C$ belong to $X_2(\mu_c,\nu_c)$ than $x_A$ and $x_D$, as they are closer to the hyperplane. Since instances with small separation margins tend to be more uncertain, label enhancement on $x_B$ and $x_C$ will be more informative than on $x_A$ and $x_D$, given the pseudo-label distribution on label $l_c$. One should be aware that $x_C$ would be less likely to be considered if we applied only the traditional fuzzy set model, as its affiliation degree to the positive class is much larger than that to the negative class. However, selecting $x_C$ is conducive to enhancing generality, as there are two negative-class instances among its neighbors. Hence, we can select instance $x_C$ via the intuitionistic fuzzy set.
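At the label level, the candidate selection of Equation (29) reduces to a simple filter; the sketch below assumes the membership and non-membership degrees have already been computed as above.

```python
import numpy as np

def label_level_candidates(mu_c, nu_c):
    """Candidate uncertain instances on one label (Equation (29)): the membership still
    dominates, but a non-zero non-membership indicates heterogeneous neighbors."""
    return np.where((mu_c > nu_c) & (nu_c > 0))[0]
```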

3.6. Instance-Level Instance Selection

We now determine which instances should be selected for label enhancement. Since label enhancement works at the instance level, a straightforward notion is to select the instances that are uncertain across most labels. We introduce the symbol $UC_i$ to represent the number of labels $l_c$ for which $x_i\in X_2(\mu_c,\nu_c)$ holds. It is defined as:
$$UC_i=\sum_{c}\left[\!\left[x_i\in X_2(\mu_c,\nu_c)\right]\!\right]\tag{30}$$
where the notation $[\![\cdot]\!]$ equals 1 if the condition holds and 0 otherwise. Based on $UC$, the instances to be enhanced ($X_2(\mu,\nu)$) are computed as:
$$X_2(\mu,\nu)=\left\{x_i\mid UC_i\geq\overline{UC_j},\ x_j\in X_2\wedge UC_j>0\right\}\tag{31}$$
where $\overline{UC_j}$ refers to the average of the non-zero $UC_j$ values. This means that an instance is enhanced only when it is recognized as uncertain on most labels.
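The instance-level aggregation of Equations (30) and (31) can then be sketched as follows; the input is one array of candidate indices per label, and the empty-candidate fallback is our own assumption.

```python
import numpy as np

def instance_level_selection(candidate_sets, n_instances):
    """Instances selected for label enhancement (Equations (30) and (31)).

    candidate_sets : list with one array of candidate instance indices per label
    """
    UC = np.zeros(n_instances, dtype=int)
    for idx in candidate_sets:
        UC[idx] += 1                                 # UC_i: number of labels flagging x_i
    nonzero = UC[UC > 0]
    if nonzero.size == 0:
        return np.array([], dtype=int)               # nothing flagged as uncertain
    return np.where(UC >= nonzero.mean())[0]         # keep instances reaching the average non-zero UC
```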

3.7. Complexity Analysis

We summarize the procedure of IFTWLE in Algorithm 1. The most time-consuming step is the calculation of pair-wise distances within all unseen instances (i.e., $r_c$). Let $k$ be the average length of the label-specific features, $n$ be the unseen instance count, and $m$ be the label count; then, the calculation of $r_c$ requires $O(n^2mk)$. Therefore, the computational complexity of IFTWLE is much lower than that of training $f_1(\cdot)$ [9] and $f_2(\cdot)$ [16]. However, as the complexity is quadratic in the instance count, it is inappropriate to employ this algorithm for large-scale multi-label data; local estimation of the inner products should be considered to accelerate the instance selection.
Algorithm 1: IFTWLE

4. Experiments

4.1. Dataset Characteristics

To demonstrate the effectiveness and efficiency of the proposed model, we compared the classification performance on eight multi-label benchmarks from Mulan (http://mulan.sourceforge.net/datasets.html) (accessed on 1 March 2022) and Meka (http://waikato.github.io/meka/datasets/) (accessed on 1 March 2022), covering domains including audio, music, text, biology, and images. The selected benchmarks are small or moderate in scale and are intensively referenced because of their limited baseline performances. In Table 2, for each dataset, "# Instances" is the number of instances, "# Features" is the number of features, "# Labels" is the total number of class labels, and "# Cardinality" is the average number of labels per instance, where the notation "#" denotes a count.

4.2. Evaluation Metrics

We adopted six evaluation metrics (Hamming Loss, Ranking Loss, One Error, Coverage, Average Precision, and Micro F1) [47] to evaluate the classification performance. Except for the last two, which achieve better performance when the values are larger, the remaining metrics obtain better performances when values are smaller. Let Y i and ¬ Y i denote the relevant and irrelevant label sets in ground-truth, and n be the unseen instances count, then the formulas of metrics are enumerated as:
(1)
Hamming Loss (abbreviated as Hl) evaluates the average difference between predictions and ground-truth (see Formula (32)). The smaller the value of the Hamming Loss, the better the performance of the algorithm.
$$Hl=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{l}\,\mathrm{Card}\!\left(f(x_i)\,\Delta\,Y_i\right)\tag{32}$$
where $\Delta$ is the set symmetric difference, $\mathrm{Card}(\cdot)$ is the set cardinality, and $l$ is the number of labels.
(2)
Ranking Loss (abbreviated as Rkl) evaluates the fraction of cases in which an irrelevant label ranks before a relevant label in the label predictions (see Formula (33)). The smaller the value of the Ranking Loss, the better the performance of the algorithm.
$$Rkl=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{Card}(Y_i)\,\mathrm{Card}(\neg Y_i)}\times\mathrm{Card}\!\left(\left\{(l_a,l_b)\mid\mathrm{rank}_i(l_a)>\mathrm{rank}_i(l_b),\ (l_a,l_b)\in Y_i\times\neg Y_i\right\}\right)\tag{33}$$
where $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance, and $\mathrm{Card}(\cdot)$ is the set cardinality.
(3)
One Error (abbreviated as Oe) evaluates the average fraction that the label ranking—first in prediction—is actually the irrelevant label (see Formula (34)). The smaller the value of One Error, the better the performance of the algorithm.
$$Oe=\frac{1}{n}\sum_{i=1}^{n}\left[\!\left[\arg\min_{l_j}\mathrm{rank}_i(l_j)\notin Y_i\right]\!\right]\tag{34}$$
where $[\![\cdot]\!]$ equals 1 if the condition holds and 0 otherwise, and $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance.
(4)
Coverage (abbreviated as Cvg) evaluates the average fraction for inclusion of all ground-truth labels in the ranking of label predictions (see Formula (35)). The smaller the value of Coverage, the better the performance of the algorithm.
$$Cvg=\frac{1}{n}\sum_{i=1}^{n}\left(\max_{l_j\in Y_i}\mathrm{rank}_i(l_j)-1\right)\tag{35}$$
where $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance.
(5)
Average Precision (abbreviated as Ap) evaluates the average precision of the actual relevant label rankings before relevant labels are examined by the label predictions (see Formula (36)). The larger the value of Average Precision, the better the performance of the algorithm.
$$Ap=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{Card}(Y_i)}\sum_{l_j\in Y_i}\frac{\mathrm{Card}\!\left(\{l_s\in Y_i\mid\mathrm{rank}_i(l_s)\leq\mathrm{rank}_i(l_j)\}\right)}{\mathrm{rank}_i(l_j)}\tag{36}$$
where $\mathrm{Card}(\cdot)$ is the set cardinality.
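For reference, the five metrics can be sketched as below from real-valued prediction scores and ground-truth logical labels; the sketch assumes every instance has at least one relevant and one irrelevant label and ignores ties in the ranking.

```python
import numpy as np

def ranks(scores):
    """rank_i(l_j): position of label l_j when the scores of instance i are sorted descending (1 = top)."""
    order = np.argsort(-scores, axis=1)
    r = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    r[rows, order] = np.arange(1, scores.shape[1] + 1)
    return r

def hamming_loss(Y_pred, Y_true):
    return np.mean(Y_pred != Y_true)

def ranking_loss(scores, Y_true):
    r, out = ranks(scores), []
    for i in range(len(Y_true)):
        rel, irr = np.where(Y_true[i] == 1)[0], np.where(Y_true[i] == 0)[0]
        bad = sum(r[i, a] > r[i, b] for a in rel for b in irr)
        out.append(bad / (len(rel) * len(irr)))
    return np.mean(out)

def one_error(scores, Y_true):
    top = np.argmax(scores, axis=1)                      # top-ranked label per instance
    return np.mean([Y_true[i, top[i]] == 0 for i in range(len(Y_true))])

def coverage(scores, Y_true):
    r = ranks(scores)
    return np.mean([r[i, Y_true[i] == 1].max() - 1 for i in range(len(Y_true))])

def average_precision(scores, Y_true):
    r, ap = ranks(scores), []
    for i in range(len(Y_true)):
        rel = np.where(Y_true[i] == 1)[0]
        ap.append(np.mean([np.sum(r[i, rel] <= r[i, j]) / r[i, j] for j in rel]))
    return np.mean(ap)
```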

4.3. Experimental Settings

We examined whether IFTWLE gained superiority over state-of-the-art algorithms learnt on logical labels only. For this reason, we compared IFTWLE with LLSF, multi-label k-nearest neighbour (MLkNN), multi-label learning with label-specific features (LIFT), MLTSVM, multi-label learning with global and local label correlation (Glocal), and active k-label set ensembles (ACkEL). Detailed settings are as follows.
  • LLSF (code available at https://jiunhwang.github.io/ (accessed on 1 March 2022)) [9]: this method learns label-specific feature representations for all labels based on logical labels and shares an identical structure with $f_1(\cdot)$. By comparing with this method, we can examine the contribution of label enhancement. The parameters $\delta,\eta$ are tuned in $\{2^{-10},2^{-9},\ldots,2^{9},2^{10}\}$. The calibrated threshold $\tau_1$ is fixed as 0.5.
  • MLkNN (code available at http://www.lamda.nju.edu.cn/code_MLkNN.ashx (accessed on 1 March 2022)) [48]: it learns a conditional probability distribution on all features within the adapted k-neighborhood. The introduction of the neighborhood bears some similarity to the components of the non-membership function $\nu(x_i)$; however, we take one step further and enhance the results of the uncertain instances. The value of k takes the empirical value 10.
  • LIFT (code available at http://cse.seu.edu.cn/PersonalPage/zhangml/index.htm (accessed on 1 March 2022)) [40]: it learns different feature representations to determine the label associations and was the first trial of label-specific learning for multi-label classification. By comparing with this method, we can verify whether label enhancement improves label-specific learning. The ratio parameter is searched in $\{0.1,0.2,\ldots,0.5\}$.
  • MLTSVM (code available at http://www.optimal-group.org/Resource/MLTSVM.html (accessed on 1 March 2022)) [11]: it learns distance differences based on multiple nonparallel hyperplanes. The twin support vector machine is a variant of the support vector machine; we trained the enhanced model with a linear support vector machine. The penalty and kernel parameters are searched in $\{2^{-6},2^{-5},\ldots,2^{5},2^{6}\}$ and $\{2^{-4},2^{-3},\ldots,2^{3},2^{4}\}$, respectively.
  • Glocal (code available at http://www.lamda.nju.edu.cn/code_Glocal.ashx (accessed on 1 March 2022)) [49]: it learns a mapping from the feature space to latent labels via a low-rank decomposition and is the initial attempt to simultaneously leverage both global and local label correlations. The similarity to our work is that we also consider global label correlation in a pairwise fashion. By comparing with this work, we can examine whether label importance is superior to local label correlation. The penalty $\lambda$ takes the empirical value 1.
  • ACkEL (code available at https://github.com/xuwangfmc/AkEL (accessed on 1 March 2022)) [50]: it takes an ensemble strategy on the k-label set, optimized by class separability and class uncertainty simultaneously, and is a revised version of the classic k-label set algorithm, which is assumed to deliver robust performance. By comparing with this work, we can examine whether the strategy combining pairwise label correlation with label importance gains superiority over high-order label correlation. A one-versus-all multi-class strategy was conducted on each label set; the parameters $\sigma,\beta$ were searched in $\{2^{-3},2^{-2},\ldots,2^{10},2^{11}\}$ and $\{0.1,0.3,0.5,0.7,0.9\}$, respectively. The size of each label set was fixed as 3.
  • Proposed method: there are three groups of parameters. The parameters for constructing $f_1(\cdot)$ and $f_2(\cdot)$ take the recommended settings declared in [9] and [16]. For the intuitionistic fuzzy membership assignment, all components are automatically determined by the data characteristics except for the neighborhood radius $r$, which is searched in $\{0.01,0.1\}$ via five-fold cross-validation.
We considered five evaluations [47] including Hamming Loss, One Error, Coverage, Ranking Loss, and Average Precision. Except for Average Precision, which obtains better performance if the metric becomes larger, the others achieve better performances if the metrics become smaller. The experiments were implemented using Matlab R2017b on a desktop PC with an Intel(R) Core i7 processor (2.60 GHz) and 8 GB of RAM. All parameters were selected via five-fold cross-validation.

4.4. Results

We evaluated the classification performance of all algorithms on the five evaluation metrics; the results are reported in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. The down arrow ↓ means that the smaller the metric is, the better the algorithm performs; in contrast, the up arrow ↑ means that the larger the metric is, the better the algorithm performs.
From the metric view, IFTWLE ranks first in 60% of cases (3/5) and second in 40% of cases. From the dataset view, IFTWLE ranks first in 42.5% of cases (17/40) and second in 17.5% of cases (7/40). It performs best on the Coverage metric (first place in 87.5% of cases) and worst on the One Error metric (second place in 75% of cases).
The Friedman test [51] was employed to compare the relative performance of multiple algorithms over the selected datasets. Given $k$ comparing algorithms and $N$ datasets, let $\mathrm{Rank}_j=\frac{1}{N}\sum_{i=1}^{N}r_i^j$ denote the average rank of the $j$-th algorithm. Under the null hypothesis ($H_0$) that all algorithms obtain identical performance, the Friedman statistic $F_F$ follows the F-distribution with $k-1$ degrees of freedom in the numerator and $(k-1)(N-1)$ degrees of freedom in the denominator, denoted as:
$$F_F=\frac{(N-1)\chi_F^2}{N(k-1)-\chi_F^2}\tag{37}$$
where
$$\chi_F^2=\frac{12N}{k(k+1)}\left[\sum_{j}\mathrm{Rank}_j^2-\frac{k(k+1)^2}{4}\right]\tag{38}$$
Table 3 presents the Friedman statistics $F_F$ and the corresponding critical value for all evaluation metrics in this setting. The results clearly show that, at the significance level $\alpha=0.05$, the null hypothesis ($H_0$) of statistically indistinguishable performance of all algorithms on the considered metrics is rejected. It is therefore feasible to examine whether IFTWLE gains statistical superiority over the other comparing algorithms by conducting a post hoc test, such as the Holm procedure [51].
Furthermore, regarding IFTWLE as the control algorithm, we employed the Holm procedure [51] to explore whether IFTWLE achieves a significant performance difference against each of the considered algorithms. Without loss of generality, we nominate $A_1$ as IFTWLE. For the other $k-1$ comparing algorithms (i.e., $A_j$, $2\leq j\leq k$), we stipulate that $A_j$ is the one with the $(j-1)$-th largest average ranking over all datasets on a specific evaluation metric. Consequently, the test statistic for comparing $A_1$ (i.e., IFTWLE) with $A_j$ is:
$$z_j=\frac{\mathrm{Rank}_1-\mathrm{Rank}_j}{\sqrt{\frac{k(k+1)}{6N}}}\quad(2\leq j\leq k)\tag{39}$$
Let $p_j$ denote the $p$-value of $z_j$ under the normal distribution. Given the significance level $\alpha=0.05$, the Holm procedure works in a stepwise manner by checking whether $p_j<\frac{\alpha}{k-j+1}$ in ascending order of $j$. Specifically, the procedure continues until it reaches the $j^{*}$-th step, where $j^{*}$ denotes the first $j$ such that $p_j\geq\frac{\alpha}{k-j+1}$ holds (if $p_j<\frac{\alpha}{k-j+1}$ holds for all $j$, then $j^{*}$ takes the value $k+1$).
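A sketch of the Friedman statistic (Equations (37) and (38)) and the Holm procedure against a control algorithm (Equation (39)) is given below; the use of two-sided p-values is our own assumption, since the text only states that $p_j$ is the p-value of $z_j$ under the normal distribution.

```python
import numpy as np
from scipy import stats

def friedman_statistic(rank_matrix):
    """Friedman F_F statistic (Equations (37) and (38)) from an N x k matrix of per-dataset ranks."""
    N, k = rank_matrix.shape
    Rank = rank_matrix.mean(axis=0)                          # average rank of each algorithm
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(Rank ** 2) - k * (k + 1) ** 2 / 4)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

def holm_vs_control(rank_matrix, alpha=0.05):
    """Holm procedure with the algorithm in column 0 as the control (Equation (39)).

    Returns the 1-based indices of the comparing algorithms whose difference from the
    control is judged significant.
    """
    N, k = rank_matrix.shape
    Rank = rank_matrix.mean(axis=0)
    z = (Rank[0] - Rank[1:]) / np.sqrt(k * (k + 1) / (6 * N))
    p = 2 * (1 - stats.norm.cdf(np.abs(z)))                  # two-sided p-values (an assumption)
    significant = []
    for step, idx in enumerate(np.argsort(p)):               # ascending p, i.e., j = 2, 3, ...
        if p[idx] < alpha / (k - (step + 2) + 1):            # compare with alpha / (k - j + 1)
            significant.append(idx + 1)
        else:
            break                                            # Holm stops at the first non-rejection
    return significant
```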
As can be seen in Table 4, Table 5, Table 6, Table 7 and Table 8, IFTWLE statistically outperforms ACkEL on the Ranking Loss, Coverage, and Average Precision metrics, and statistically outperforms MLTSVM on the Ranking Loss and Coverage metrics. IFTWLE achieves its strongest dominance on Coverage, where it is statistically superior to all algorithms except MLkNN and LLSF. By finding the instances with the most uncertain labels, it is more likely to revise a large proportion of misclassified labels, gaining larger improvements in the Coverage metric. However, this strategy does not discriminate with respect to the relative importance of labels in different instances, which leads to limited improvements in the One Error and Average Precision metrics.

5. Conclusions

This paper proposes a novel model called IFTWLE for multi-label classification. Unlike conventional multi-label learning algorithms, which learn models on either logical labels or numerical labels, we integrate the two forms of labels under the three-way decisions umbrella by exploring classification uncertainty. The intuitionistic fuzzy set provides an insightful way to quantify the label-level uncertainty and determines the uncertain instances in a group decision-making fashion. Comparisons on benchmarks demonstrate that IFTWLE significantly improves the classification performance.
In the future, we will examine more combinations of label-specific algorithms and label enhancement algorithms to see whether some guidelines exist. Meanwhile, we will develop advanced instance selection principles by resorting to the optimization theory.

Author Contributions

Conceptualization, formal analysis, writing—original draft preparation, T.Z.; methodology, software, validation, writing—review and editing, Y.Z.; resources, supervision, funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China, grant numbers 61976158, 61976160, 62076182, 62163016, 62006172, and 61906137, and is also partially supported by the Jiangxi "Double Thousand Plan", grant number 20212ACB202001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, M.L.; Zhou, Z.H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837.
  2. Gibaja, E.; Ventura, S. A tutorial on multilabel learning. ACM Comput. Surv. 2015, 47, 1–38.
  3. Liu, W.W.; Shen, X.B.; Wang, H.B.; Tsang, I.W. The emerging trends of multi-label learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  4. Tabatabaei, S.M.; Dick, S.; Xu, W.S. Toward non-intrusive load monitoring via multi-label classification. IEEE Trans. Smart Grid 2017, 8, 26–40.
  5. Fu, H.Z.; Cheng, J.; Xu, Y.W.; Wong, D.W.K.; Liu, J.; Cao, X.C. Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE Trans. Med. Imag. 2018, 37, 1597–1605.
  6. Wei, Y.C.; Xia, W.; Lin, M.; Huang, J.S.; Ni, B.B.; Dong, J.; Zhao, Y.; Yan, S.C. HCP: A flexible CNN framework for multi-label image classification. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1901–1907.
  7. Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recog. 2004, 37, 1757–1771.
  8. Tsoumakas, G.; Vlahavas, I. Random k-labelsets: An ensemble method for multilabel classification. Lect. Notes Artif. Intell. 2007, 4701, 406–417.
  9. Huang, J.; Li, G.R.; Huang, Q.M.; Wu, X.D. Learning label-specific features and class-dependent labels for multi-label classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 3309–3323.
  10. Wu, Q.Y.; Tan, M.K.; Song, H.J.; Chen, J.; Ng, M.K. ML-FOREST: A multi-label tree ensemble method for multi-label classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 2665–2680.
  11. Chen, W.J.; Shao, Y.H.; Li, C.N.; Deng, N.Y. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recog. 2016, 52, 61–74.
  12. Geng, X. Label distribution learning. IEEE Trans. Knowl. Data Eng. 2016, 28, 1734–1748.
  13. Tao, A.; Xu, N.; Geng, X. Labeling information enhancement for multi-label learning with low-rank subspace. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, 28–31 August 2018; pp. 671–683.
  14. Zhang, M.L.; Zhang, Q.W.; Fang, J.P.; Li, Y.K.; Geng, X. Leveraging implicit relative labeling importance information for effective multi-label learning. IEEE Trans. Knowl. Data Eng. 2021, 33, 2057–2070.
  15. Xu, N.; Liu, Y.P.; Geng, X. Label enhancement for label distribution learning. IEEE Trans. Knowl. Data Eng. 2021, 32, 1632–1643.
  16. Shao, R.F.; Xu, N.; Geng, X. Multi-label learning with label enhancement. In Proceedings of the International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 437–446.
  17. Yao, Y.Y. Three-way decision: An interpretation of rules in rough set theory. In Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, Gold Coast, Australia, 14–16 July 2009; pp. 642–649.
  18. Yao, Y.Y. Three-way decision and granular computing. Int. J. Approx. Reason. 2018, 103, 107–123.
  19. Yao, Y.Y. Tri-level thinking: Models of three-way decision. Int. J. Mach. Learn. Cybern. 2020, 11, 947–959.
  20. Yao, Y.Y. Three-way granular computing, rough sets, and formal concept analysis. Int. J. Approx. Reason. 2020, 116, 106–125.
  21. Lang, G.M.; Miao, D.Q.; Fujita, H. Three-way group conflict analysis based on pythagorean fuzzy set theory. IEEE Trans. Fuzzy Syst. 2020, 28, 447–461.
  22. Zhan, J.M.; Jiang, H.B.; Yao, Y.Y. Three-way multiattribute decision-making based on outranking relations. IEEE Trans. Fuzzy Syst. 2021, 29, 2844–2858.
  23. Zhang, X.Y.; Gou, H.Y.; Lv, Z.Y.; Miao, D.Q. Double-quantitative distance measurement and classification learning based on the tri-level granular structure of neighborhood system. Knowl.-Based Syst. 2021, 217, 106799.
  24. Jiang, C.M.; Guo, D.D.; Sun, L.J. Effectiveness measure for TAO model of three-way decisions with interval set. J. Intell. Syst. 2021, 40, 11071–11084.
  25. Yang, J.L.; Yao, Y.Y.; Zhang, X.Y. A model of three-way approximation of intuitionistic fuzzy sets. Int. J. Mach. Learn. Cybern. 2022, 13, 163–174.
  26. Guo, D.D.; Jiang, C.M.; Wu, P. Three-way decision based on confidence level change in rough set. Int. J. Approx. Reason. 2022, 143, 57–77.
  27. Huang, X.F.; Zhan, J.M.; Ding, W.P.; Pedrycz, W. An error correction prediction model based on three-way decision and ensemble learning. Int. J. Approx. Reason. 2022, 146, 21–46.
  28. Zhang, Y.J.; Miao, D.Q.; Zhang, Z.F.; Xu, J.F.; Luo, S. A three-way selective ensemble model for multi-label classification. Int. J. Approx. Reason. 2018, 103, 394–413.
  29. Ren, F.J.; Wang, L. Sentiment analysis of text based on three-way decisions. J. Intell. Fuzzy Syst. 2017, 33, 245–254.
  30. Zhang, Y.J.; Miao, D.Q.; Pedrycz, W.; Zhao, T.N.; Xu, J.F.; Yu, Y. Granular structure-based incremental updating for multi-label classification. Knowl.-Based Syst. 2020, 189, 105066.
  31. Zhang, Y.J.; Zhao, T.N.; Miao, D.Q.; Pedrycz, W. Granular multilabel batch active learning with pairwise label correlation. IEEE Trans. Syst. Man Cybern.-Syst. 2022, 52, 3079–3091.
  32. Qian, W.B.; Huang, J.T.; Wang, Y.L.; Xie, Y.H. Label distribution feature selection for multi-label classification with rough set. Int. J. Approx. Reason. 2021, 128, 32–55.
  33. Kongsorot, Y.; Horata, P.; Musikawan, P.; Sunat, K. Kernel extreme learning machine based on fuzzy set theory for multi-label classification. Int. J. Mach. Learn. Cybern. 2019, 10, 979–989.
  34. Yuichi, O.; Naoki, M.; Yusuke, N.; Hisao, I. Multiobjective fuzzy genetics-based machine learning for multi-label classification. In Proceedings of the IEEE International Conference on Fuzzy Systems, Glasgow, UK, 19–24 July 2020.
  35. Che, X.Y.; Chen, D.G.; Mi, J.S. Feature distribution-based label correlation in multi-label classification. Int. J. Mach. Learn. Cybern. 2021, 12, 1705–1719.
  36. Xiao, F.Y. A distance measure for intuitionistic fuzzy sets and its application to pattern classification problems. IEEE Trans. Syst. Man Cybern.-Syst. 2021, 51, 3980–3992.
  37. Tian, Y.; Sun, M.; Deng, Z.B.; Luo, J.; Li, Y.Q. A new fuzzy set and nonkernel svm approach for mislabeled binary classification with applications. IEEE Trans. Fuzzy Syst. 2017, 25, 1536–1545.
  38. Tian, Y.; Deng, Z.B.; Luo, J.; Li, Y.Q. An intuitionistic fuzzy set based (SVM)-V-3 model for binary classification with mislabeled information. Fuzzy Optim. Decis. Mak. 2018, 17, 475–494.
  39. Rezvani, S.; Wang, X.Z.; Pourpanah, F. Intuitionistic fuzzy twin support vector machines. IEEE Trans. Fuzzy Syst. 2019, 27, 2140–2151.
  40. Zhang, M.L.; Wu, L. LIFT: Multi-label learning with label-specific features. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 107–120.
  41. Guo, Y.M.; Chung, F.L.; Li, G.Z.; Wang, J.C.; Gee, J.C. Leveraging label-specific discriminant mapping features for multi-label learning. ACM Trans. Knowl. Discov. Data 2019, 13, 24.
  42. Yu, Z.B.; Zhang, M.L. Multi-label classification with label-specific feature generation: A wrapped approach. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  43. Jia, X.Y.; Lu, Y.N.; Zhang, F.W. Label enhancement by maintaining positive and negative label relation. IEEE Trans. Knowl. Data Eng. 2021, in press.
  44. Zheng, Q.H.; Zhu, J.H.; Tang, H.Y.; Liu, X.Y.; Li, Z.Y.; Lu, H.M. Generalized label enhancement with sample correlations. IEEE Trans. Knowl. Data Eng. 2021, in press.
  45. Ha, M.H.; Wang, C.; Chen, J.Q. The support vector machine based on intuitionistic fuzzy number and kernel function. Soft Comput. 2013, 17, 635–641.
  46. Zhang, M.L.; Li, Y.K.; Yang, H.; Liu, X.Y. Towards class-imbalance aware multi-label learning. IEEE Trans. Cybern. 2020, in press.
  47. Schapire, R.; Singer, Y. A boosting-based system for text categorization. Mach. Learn. 2000, 39, 135–168.
  48. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recog. 2007, 40, 2038–2048.
  49. Zhu, Y.; Kwok, J.T.; Zhou, Z.H. Multi-label learning with global and local label correlation. IEEE Trans. Knowl. Data Eng. 2018, 30, 1081–1094.
  50. Wang, R.; Kwong, S.; Wang, X.; Jia, Y.H. Active K-labelsets ensemble multi-label classification. Pattern Recog. 2021, 109, 107583.
  51. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Figure 1. Pipeline of IFTWLE. It follows the trisecting–acting–outcome framework.
Figure 2. Illustration of the instance selection principle in IFTWLE: six instances are used to explain the processing. After applying $f_1(\cdot)$ generated by label-specific learning, an intuitionistic fuzzy number is assigned for each label (e.g., $(\mu_a(x_i),\nu_a(x_i))$ for $l_a$). The processed red and black circles refer to the instances classified as positive and negative with limited uncertainty, and the green triangles are recognized as candidate uncertain instances. The final three uncertain instances (represented by hollow triangles) are denoted as $X_2(\mu,\nu)$ and undergo label enhancement afterwards.
Figure 3. Computing $\mu_c(x_i)$: the red blocks and black circles represent the pseudo-positive and pseudo-negative instances w.r.t. label $l_c$. The red four-angle star and the black diamond represent the class centers of the pseudo-positive and pseudo-negative instances on label $l_c$. The blue hexagon represents the center of all included instances on label $l_c$. Instances $x_A$ and $x_C$ are two randomly selected instances with a pseudo-positive label on $l_c$, whereas instances $x_B$ and $x_D$ are two randomly selected instances with a pseudo-negative label on $l_c$. The blue, red, and black lines connecting the instances with the blue hexagon, red four-angle star, and black diamond represent the distances of the instances to the overall instance center, the pseudo-positive class center, and the pseudo-negative class center, respectively.
Figure 4. Selection of uncertain candidates on label l_c: the red blocks and black circles represent the instances with pseudo-positive and pseudo-negative labels on l_c. The red four-angle star and the black diamond mark the class centers of the pseudo-positive and pseudo-negative instances on label l_c, respectively, and the blue hexagon marks the center of all included instances on label l_c. The purple circle represents the neighborhood region. Based on (29), x_B and x_C will be selected (i.e., x_B, x_C ∈ X_2(μ_c, ν_c)).
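Equation (29) is not reproduced here. As a hedged sketch of the neighborhood idea in Figure 4, an instance can be flagged as an uncertain candidate on label l_c when its neighborhood of radius r mixes pseudo-positive and pseudo-negative instances, i.e., when the weighted difference between N_c(x_i)^+ and N_c(x_i)^- is small. The weights w_pos, w_neg and the threshold tau below are illustrative stand-ins for p_c^+, p_c^-, and the paper's actual criterion.

```python
import numpy as np

def uncertain_candidates(X, pseudo_labels, r, w_pos=1.0, w_neg=1.0, tau=0.3):
    """Flag instances whose radius-r neighborhood mixes pseudo-positive and
    pseudo-negative instances on one label (illustrative criterion only)."""
    flags = []
    for x in X:
        dist = np.linalg.norm(X - x, axis=1)
        neigh = (dist <= r) & (dist > 0)                  # neighborhood of x, excluding x itself
        n_pos = np.sum(pseudo_labels[neigh] == 1)         # N_c(x)^+
        n_neg = np.sum(pseudo_labels[neigh] == 0)         # N_c(x)^-
        total = n_pos + n_neg
        # Small weighted difference -> mixed neighborhood -> uncertain candidate.
        flags.append(total > 0 and abs(w_pos * n_pos - w_neg * n_neg) / total <= tau)
    return np.array(flags)

X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [2.0, 2.0], [2.1, 1.9]])
pseudo = np.array([1, 0, 1, 0, 0])
print(uncertain_candidates(X, pseudo, r=0.5))   # points with mixed neighborhoods are flagged
```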
Figure 5. Comparison of each algorithm on the Hamming Loss metric. The average rankings of algorithms on this metric: LIFT (2.6250) > IFTWLE (3.4375) > ACkEL (3.6250) > MLkNN (3.8125) > MLTSVM (3.9375) > LLSF (5.0000) > Glocal (5.5625).
Figure 6. Comparison of each algorithm on the Ranking Loss metric. The average rankings of algorithms on this metric: IFTWLE (2.0000) > MLkNN (3.1250) = LIFT (3.1250) > Glocal (3.5625) > LLSF (3.6875) > MLTSVM (5.8750) > ACkEL (6.6250).
Figure 7. Comparison of each algorithm on the One Error metric. The average rankings of algorithms on this metric: MLTSVM (1.2500) > LIFT (2.6250) > IFTWLE (3.6250) > Glocal (4.3750) > MLkNN (4.5000) > LLSF (4.6250) > ACkEL (7.0000).
Figure 8. Comparison of each algorithm on the Coverage metric. The average rankings of algorithms on this metric: IFTWLE (1.1875) > LLSF (1.8125) > MLkNN (3.0000) > ACkEL (4.7500) > LIFT (5.0000) > Glocal (5.8750) > MLTSVM (6.3750).
Figure 9. Comparison of each algorithm on the Average Precision metric. The average rank of algorithms on this metric: IFTWLE (2.4375) > LIFT (3.2500) = Glocal (3.2500) > MLkNN (3.8125) > LLSF (4.0625) > MLTSVM (4.4375) > ACkEL (6.7500).
Table 1. Notation of IFTWLE.
Notation | Mathematical Meaning
Card(·) | set cardinality
X_1 | multi-label instance set
Y_1 | logical label set
X_2 | unseen instance set
x_i | an instance
y_i | logical label set of instance x_i
ŷ_i | pseudo-label set of instance x_i learnt by f_1
u_i | numerical label set of instance x_i
μ_c(x_i) | membership degree of x_i on label l_c
ν_c(x_i) | non-membership degree of x_i on label l_c
π_c(x_i) | hesitation degree of x_i on label l_c
X_2(μ, ν) | uncertain instance set on all labels
¬X_2(μ, ν) | certain instance set on all labels
f_1(·) | function of logical label-based learning
f_2(·) | function of numerical label-based learning
Ŷ_2* | final predicted multi-label set of X_2
l_c | label c
Y_2(μ, ν) | label set of X_2(μ, ν) learnt by f_1
¬Y_2(μ, ν) | label set of ¬X_2(μ, ν) learnt by f_1
X_2(μ_c, ν_c) | uncertain instance set on label l_c
¬X_2(μ_c, ν_c) | certain instance set on label l_c
ŷ_i^c | pseudo-label of instance x_i on label l_c learnt by f_1
φ_c(x_i) | high-dimensional representation of instance x_i given the label-specific feature on label l_c
Ĉ_c^+ | class center of the pseudo-positive class on label l_c
Ĉ_c^− | class center of the pseudo-negative class on label l_c
D(·, ·) | Euclidean distance
r_c^+ | radius of the pseudo-positive class on label l_c
r_c^− | radius of the pseudo-negative class on label l_c
x_i^c | label-specific feature of instance x_i on label l_c
p_c^+ | pseudo-positive instance weight on label l_c
p_c^− | pseudo-negative instance weight on label l_c
ρ_c | weighted neighborhood difference of x_i on label l_c
r | instance neighborhood size measured by the Euclidean distance D(·, ·)
N_c(x_i)^+ | pseudo-positive instance count in the neighborhood of x_i on label l_c
N_c(x_i)^− | pseudo-negative instance count in the neighborhood of x_i on label l_c
UC_i | number of times that instance x_i is regarded as an uncertain instance over all labels
UC̄_j | mean number of times that instance x_j is regarded as an uncertain instance over all labels
Table 2. Characteristics of data sets.
Data Set | # Instances | # Features | # Labels | Cardinality | Domain
birds | 645 | 260 | 19 | 1.014 | audio
emotions | 593 | 72 | 6 | 1.869 | music
enron | 1702 | 1001 | 53 | 3.378 | text
genbase | 662 | 1185 | 27 | 1.252 | biology
languagelog | 1460 | 1004 | 75 | 1.18 | text
medical | 978 | 1449 | 45 | 1.245 | text
scene | 2407 | 294 | 6 | 1.074 | image
yeast | 2417 | 103 | 14 | 4.237 | biology
Table 3. Summary of the Friedman statistics F_F (k = 7, N = 8) and critical values in terms of each evaluation measure (k: # comparing algorithms; N: # data sets).
Metric | F_F | Critical Value (α = 0.05)
Hamming Loss | 9.991071 | 2.3239
Ranking Loss | 27.816964 | 2.3239
One Error | 33.214286 | 2.3239
Coverage | 41.852679 | 2.3239
Average Precision | 19.473214 | 2.3239
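The statistics in Table 3 can be reproduced from the average ranks reported in Figures 5–9. The sketch below (not the authors' code) computes the Friedman statistic from the average ranks of the k = 7 algorithms over the N = 8 data sets, together with the Iman–Davenport correction that is compared against the F((k − 1), (k − 1)(N − 1)) critical value; plugging in the Hamming Loss ranks from Figure 5 yields the 9.991071 entry of Table 3.

```python
import numpy as np

def friedman_statistics(avg_ranks, n_datasets):
    """Friedman chi-square statistic and its Iman-Davenport F correction,
    computed from the average ranks of k algorithms over N data sets."""
    k = len(avg_ranks)
    ranks = np.asarray(avg_ranks, dtype=float)
    chi2 = 12.0 * n_datasets / (k * (k + 1)) * (np.sum(ranks ** 2) - k * (k + 1) ** 2 / 4.0)
    f_id = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)   # Iman-Davenport correction
    return chi2, f_id

# Hamming Loss average ranks of the seven algorithms over the eight data sets (Figure 5).
hamming_ranks = [3.4375, 2.6250, 3.6250, 3.9375, 3.8125, 5.0000, 5.5625]
chi2, f_id = friedman_statistics(hamming_ranks, n_datasets=8)
print(f"Friedman chi2 = {chi2:.6f}, Iman-Davenport F = {f_id:.4f}")   # chi2 = 9.991071
```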
Table 4. Comparison of IFTWLE (control algorithm) against the other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Hamming Loss metric.
j | Algorithm | z_j | p | Holm
2 | Glocal | −1.9674 | 0.0491 | 0.008
3 | LLSF | −1.4466 | 0.1480 | 0.010
4 | MLTSVM | −0.4629 | 0.6434 | 0.013
5 | MLkNN | −0.3472 | 0.7284 | 0.017
6 | ACkEL | −0.1736 | 0.8622 | 0.025
7 | LIFT | 0.7522 | 1.0000 | 0.050
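Tables 4–8 can be regenerated from the same average ranks: z_j = (R_IFTWLE − R_j) / sqrt(k(k + 1)/(6N)), and the Holm column lists the step-down thresholds α/(k − j + 1) against which the p-values are compared in order of significance. The sketch below is not the authors' code; in particular, the p-value convention min(1, 2Φ(z_j)) is inferred from the tabulated values. Run on the Hamming Loss ranks of Figure 5, it reproduces Table 4.

```python
import numpy as np
from scipy.stats import norm

def holm_post_hoc(control_rank, other_ranks, n_datasets, alpha=0.05):
    """z statistics, p-values, and Holm step-down thresholds for a control
    algorithm compared against the remaining algorithms via average ranks."""
    k = len(other_ranks) + 1                          # total number of compared algorithms
    se = np.sqrt(k * (k + 1) / (6.0 * n_datasets))    # standard error of average-rank differences
    rows = []
    for name, rank in other_ranks.items():
        z = (control_rank - rank) / se                # negative z: the control is ranked better
        p = min(1.0, 2.0 * norm.cdf(z))               # inferred convention; see note above
        rows.append((name, z, p))
    rows.sort(key=lambda row: row[2])                 # most significant comparison first
    return [(name, z, p, alpha / (k - 1 - i))         # Holm threshold: alpha / (k - j + 1)
            for i, (name, z, p) in enumerate(rows)]

# Hamming Loss average ranks from Figure 5; IFTWLE (3.4375) is the control algorithm.
others = {"Glocal": 5.5625, "LLSF": 5.0000, "MLTSVM": 3.9375,
          "MLkNN": 3.8125, "ACkEL": 3.6250, "LIFT": 2.6250}
for name, z, p, holm in holm_post_hoc(3.4375, others, n_datasets=8):
    print(f"{name:8s} z = {z:+.4f}  p = {p:.4f}  Holm = {holm:.3f}")
```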
Table 5. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Ranking Loss metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −4.281918 | 0.000019 | 0.008
3 | MLTSVM | −3.587553 | 0.000334 | 0.010
4 | LLSF | −1.562321 | 0.118212 | 0.013
5 | Glocal | −1.446594 | 0.148011 | 0.017
6 | MLkNN | −1.041548 | 0.297621 | 0.025
7 | LIFT | −1.041548 | 0.297621 | 0.050
Table 6. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the One Error metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −3.1246 | 0.0018 | 0.008
3 | LLSF | −0.9258 | 0.3545 | 0.010
4 | MLkNN | −0.8101 | 0.4179 | 0.013
5 | Glocal | −0.6944 | 0.4874 | 0.017
6 | LIFT | 0.9258 | 1.0000 | 0.025
7 | MLTSVM | 2.1988 | 1.0000 | 0.050
Table 7. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Coverage metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | MLTSVM | −4.802692 | 0.000002 | 0.008
3 | Glocal | −4.339782 | 0.000014 | 0.010
4 | LIFT | −3.529689 | 0.000416 | 0.013
5 | ACkEL | −3.298234 | 0.000973 | 0.017
6 | MLkNN | −1.678049 | 0.093338 | 0.025
7 | LLSF | −0.578638 | 0.562834 | 0.050
Table 8. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Average Precision metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −3.992599 | 0.000134 | 0.008
3 | MLTSVM | −1.851640 | 0.064078 | 0.010
4 | LLSF | −1.504458 | 0.132464 | 0.013
5 | MLkNN | −1.273003 | 0.203017 | 0.017
6 | LIFT | −0.752229 | 0.451913 | 0.025
7 | Glocal | −0.752229 | 0.451913 | 0.050
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
