Article

Intuitionistic Fuzzy-Based Three-Way Label Enhancement for Multi-Label Classification

1 Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
2 China UnionPay Co., Ltd., Shanghai 201201, China
3 Postdoctoral Research Station of Computer Science and Technology, Fudan University, Shanghai 200433, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(11), 1847; https://doi.org/10.3390/math10111847
Submission received: 11 April 2022 / Revised: 12 May 2022 / Accepted: 24 May 2022 / Published: 27 May 2022
(This article belongs to the Special Issue Soft Computing and Uncertainty Learning with Applications)

Abstract

Multi-label classification deals with the determination of instance-label associations for unseen instances. Although many margin-based approaches have been carefully developed, the uncertain classifications of instances with small separation margins remain unresolved. The intuitionistic fuzzy set is an effective tool for characterizing uncertainty, yet it has not been examined for multi-label cases. This paper proposes a novel model called intuitionistic fuzzy three-way label enhancement (IFTWLE) for multi-label classification. IFTWLE combines label enhancement with an intuitionistic fuzzy set under the framework of three-way decisions. For unseen instances, we generate pseudo-labels for label uncertainty evaluation from a logical label-based model. An intuitionistic fuzzy set-based instance selection principle seamlessly bridges logical label learning and numerical label learning. The principle is developed hierarchically. At the label level, membership and non-membership functions are defined pair-wisely to measure the local uncertainty and to generate candidate uncertain instances. Moving up to the instance level, we select instances from the candidates for label enhancement, while the remaining instances are left unchanged. To the best of our knowledge, this is the first attempt to combine logical label learning and numerical label learning into a unified framework for minimizing classification uncertainty. Extensive experiments demonstrate that, with the selectively reconstructed label importance, IFTWLE is statistically superior to state-of-the-art multi-label classification algorithms in terms of classification accuracy. The computational complexity of the algorithm is $O(n^2mk)$, where $n$, $m$, and $k$ denote the number of unseen instances, the number of labels, and the average label-specific feature size, respectively.

1. Introduction

In multi-label settings [1,2,3], an instance is associated with multiple labels simultaneously. For example, a picture may be relevant to tags such as sky, ocean, and seagull; a book may cover topics such as sports and art. The goal of multi-label classification is to learn a mapping from the feature space to the label space, such that the label associations of unseen instances can be determined. It is widely applied in domains such as smart grid management [4], disease diagnosis [5], and image classification [6].
Traditionally, the multi-label classification schema builds upon logical labels and provides qualitative associations between instances and labels. For robustness, many researchers focus on extensions of linear or hyperplane-based models, and their work can be roughly categorized as problem transformation and algorithm adaptation, respectively. The former transforms multi-label classification into a collection of simplified learning scenarios, such as single-label classification; representative work involves binary relevance [7], random k-labelsets [8], and learning label-specific features (LLSF) [9]. Comparatively, the latter extends existing algorithms to simultaneously generate multiple outputs; representative work includes the multi-label forest (ML-Forest) [10] and the multi-label twin support vector machine (MLTSVM) [11]. However, both strategies suffer from the calibrated threshold problem. Intuitively, the uncertainty of a label association is larger if the output is close to the threshold and smaller otherwise. Hence, we need labels with stronger supervision to boost the model performance.
A numerical label, by contrast, quantifies how much information a label carries in describing an instance. For example, a facial expression can be interpreted as the combination of slight sadness, some anger, and a bit of disgust. The fitting of numerical labels is defined as label distribution learning [12]. Although numerical labels offer more discriminative information than logical labels, it is expensive to annotate all numerical labels manually. One feasible solution is to learn numerical labels from logical labels, also known as label enhancement [13,14,15,16]. However, existing studies apply label enhancement indiscriminately to all instances, regardless of their differences in uncertainty.
Three-way decisions [17], also known as the trisecting–acting–outcome (TAO) model [18,19,20], is an emerging decision theory for problem solving under uncertainty [21,22,23]. It originates from the semantic explanation of the three regions induced by rough sets and has become an active topic in the soft computing community [24,25,26,27]. The three sequential steps, trisecting, acting, and outcome evaluation, characterize the routine for handling uncertainty. Typically, the trisecting step divides the information into three non-overlapping regions, two of which are regarded as certain and the remaining one as uncertain; the acting step applies the positive/negative strategy to certain objects and the deferment strategy to uncertain objects; the outcome step evaluates the performance induced by trisecting and acting. Such routines may continue until the uncertainty is negligible and the classification performance is improved. Existing three-way-based multi-label classification methods [28,29,30,31,32] deal with label uncertainty through either ensembles of features or ensembles of algorithms, whereas the ensemble of logical and numerical labels remains untouched.

1.1. Motivation

The semantic uncertainty analysis of multi-label classification has limitations. The fuzzy set is effective in measuring the membership degree towards multiple labels [33,34,35], but it fails to consider the non-membership degree. As a generalization of the fuzzy set, an intuitionistic fuzzy number (IFN) is effective at quantifying the vagueness of a qualitative instance-label assignment [36], which offers heuristic information for selecting instances for label enhancement. This paper presents an intuitionistic fuzzy-based three-way label enhancement (IFTWLE) model. It implements three-way decisions by "trisecting" unseen instances from label-specific learning and by "acting" through label enhancement of uncertain instances. Inspired by empirical studies on single-label classification [37,38,39], we employ an intuitionistic fuzzy number to search for instances with uncertain classifications. Concretely speaking, IFTWLE applies an IFN at the label level and defines a pair of membership and non-membership functions for every unseen instance based on the generated pseudo-labels. The membership functions evaluate the weighted closeness of instances to the specified class, whereas the non-membership functions evaluate the possibility of instances belonging to the complementary class. The ultimately uncertain instances are determined via an aggregation function. Consequently, we preserve the predicted labels if the classifications are plausible and exploit the numerical labels otherwise.

1.2. Contribution

Compared with existing multi-label classification models, our contributions are as follows:
(1)
For the first time, cascade learning of the logical labels and numerical labels of multi-label data is presented and unified under the semantics of classification uncertainty. Determining the label associations of unseen instances broadens the application of three-way decision theory.
(2)
A novel instance selection principle in two stages is presented to integrate logical label learning with numerical label learning. In this way, instances that exhibit significant uncertainty across most labels are enhanced with better discrimination (regarding label importance).
(3)
This is the first attempt to employ an intuitionistic fuzzy set on multi-label classification. It addresses the issue of identifying potentially uncertain instances on each label without stipulating many hyperparameters.
(4)
The proposed IFTWLE has demonstrated effectiveness across many domains. The computational complexity is proportional to the square of the instance count and linear in the label count and the label-specific feature count.
The remainder of the paper is organized as follows. Section 2 reviews some preliminaries on label-specific learning, label enhancement, and the intuitionistic fuzzy set; Section 3 presents our proposed model for multi-label classification; experimental results are reported in Section 4; Section 5 concludes this work and identifies our future directions.

2. Preliminaries

In this section, we briefly review some preliminaries regarding label-specific feature learning, label enhancement, and the intuitionistic fuzzy set.

2.1. Label-Specific Feature Learning

Label-specific feature learning [9,40,41,42] assumes that each label has unique characteristics and can be described by a different feature subset. For computational simplicity, learning label-specific features (LLSF) [9] considers second-order label correlations (a.k.a. pairwise correlations, i.e., one label depends on at most one other label) and rests on three hypotheses:
(1)
Discrimination: the set of i-th label-specific feature should be most pertinent to the corresponding label ( l i ), and the included components should be different from other label-specific features.
(2)
Sparsity: the label-specific features should be sparse as compared to the feature space.
(3)
Shareability: the cardinality of common features of two label-specific features with strong label correlations should be larger than those with weak label correlations.
Based on the previous hypotheses, the objective function is formulated as:
$$\min_{W}\ \frac{1}{2}\|XW-Y\|_F^2+\frac{\delta}{2}\,\mathrm{tr}\!\left(RW^{\top}W\right)+\eta\|W\|_1\tag{1}$$
where $X$ and $Y$ represent the features and logical labels of the multi-label data, $W=[w_1,w_2,\ldots,w_i,\ldots,w_m]$, and the column $w_i$ represents the weight vector for label $l_i$, whose non-zero entries identify the label-specific features of $l_i$. $\|\cdot\|_F^2$ denotes the squared Frobenius norm. $R=[r_{ij}]$ is a matrix of second-order label relevance with $r_{ij}=1-c_{ij}$, where $c_{ij}$ measures the correlation between labels $l_i$ and $l_j$ and is computed with cosine similarity. $\mathrm{tr}(\cdot)$ denotes the matrix trace. The symbols $\delta$ and $\eta$ are balance factors. The inner product of $w_i$ and $w_j$ describes the correlation between labels $l_i$ and $l_j$ from the feature view: the stronger the correlation, the larger the inner product, and vice versa.
For the prediction of unseen instances, LLSF employs logistic regression and can be denoted as:
$$\hat{Y}=\mathrm{sgn}\!\left(XW\geq\tau\right)\tag{2}$$
where $\mathrm{sgn}(\cdot)$ returns 1 if the condition holds and 0 otherwise.
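To make the formulation concrete, the following is a minimal NumPy sketch of the LLSF objective value (Equation (1)) and the thresholded prediction rule (Equation (2)). The variable names mirror the symbols above; the toy data, the cosine-similarity construction of R, and the absence of an optimization loop are our own illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def llsf_objective(X, Y, W, R, delta, eta):
    """Value of the LLSF objective: 0.5*||XW - Y||_F^2 + (delta/2)*tr(R W^T W) + eta*||W||_1."""
    fit = 0.5 * np.linalg.norm(X @ W - Y, "fro") ** 2
    corr = 0.5 * delta * np.trace(R @ W.T @ W)
    sparsity = eta * np.abs(W).sum()
    return fit + corr + sparsity

def llsf_predict(X, W, tau=0.5):
    """Thresholded prediction of Equation (2): label c is assigned when the c-th output reaches tau."""
    return (X @ W >= tau).astype(int)

# toy usage: 5 instances, 4 features, 3 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Y = rng.integers(0, 2, size=(5, 3)).astype(float)
W = rng.normal(scale=0.1, size=(4, 3))
norms = np.linalg.norm(Y, axis=0, keepdims=True)   # per-label norms
C = (Y.T @ Y) / (norms.T @ norms + 1e-12)          # cosine similarity c_ij between labels
R = 1.0 - C                                        # r_ij = 1 - c_ij
print(llsf_objective(X, Y, W, R, delta=1.0, eta=0.1))
print(llsf_predict(X, W))
```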

2.2. Label Enhancement

Label enhancement assumes that each instance is intrinsically represented by real-valued labels and, thus, can be recovered from qualitative logical label ( Y ) to quantitative numerical label ( U ) via the instance-level or label-level smoothness [15,16,43,44]. The distribution of numerical labels describes the relative importance of different labels in describing a given instance.
To guarantee the effectiveness, three hypotheses are presented in label enhancement multi-label learning (LEMLL) [16].
(1)
Linear relevance: the mapping from the feature space to the numerical labels, $g:X\rightarrow U$, follows a linear model.
(2)
Label similarity: the values of the learnt numerical labels should approximate the original logical labels.
(3)
Topology consistency: the instances with similar features share similar numerical label values.
Based on the previous assumptions, the objective function is given as:
$$\begin{aligned}
\min_{\Theta,b,U}\ & \sum_{i=1}^{n}L_R(R_i)+\alpha\|\Theta\|_F^2+\beta\|U-Y\|_F^2+\gamma\,\mathrm{tr}\!\left(U^{\top}MU\right)\\
\text{s.t.}\ & R_i=\|\xi_i\|_2=\sqrt{\xi_i^{\top}\xi_i},\quad \xi_i=u_i-\Theta^{\top}\varphi(x_i)-b,\\
& L_R(R_i)=\begin{cases}0, & R_i<\varepsilon\\ R_i^2-2R_i\varepsilon+\varepsilon^2, & R_i\geq\varepsilon\end{cases}
\end{aligned}\tag{3}$$
where $\sum_{i=1}^{n}L_R(R_i)$ is the loss function from the feature space to the numerical labels, with the residual $R_i=\|\xi_i\|_2=\sqrt{\xi_i^{\top}\xi_i}$, where $\xi_i=u_i-\Theta^{\top}\varphi(x_i)-b$ and $\varphi(x_i)$ maps instance $x_i$ to a high-dimensional space $\mathbb{R}^{H}$; $\Theta$ and $b$ are the parameters of the linear regression model. $\|U-Y\|_F^2$ implements the label similarity assumption measured by the Frobenius norm (abbreviated as F). $\alpha$, $\beta$, and $\gamma$ are balance factors. $\mathrm{tr}(U^{\top}MU)=\|U-WU\|_F^2$ implements topology consistency, where $\mathrm{tr}(\cdot)$ is the matrix trace and $M=(I-W)^{\top}(I-W)$, with $I$ an identity matrix and $W$ the weight matrix of the fully connected graph $G=(V,E,W)$, which describes the closeness among arbitrary instances.
For predictions on unseen instances, LEMLL leverages a kernel logistic regression, which is denoted as:
$$\hat{Y}=\mathrm{sgn}\!\left(\Theta^{\top}\varphi(X)+b\geq\tau\right)\tag{4}$$
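As a quick illustration of the ε-insensitive part of the LEMLL objective, the sketch below evaluates the residual $R_i$ and the loss $L_R(R_i)$ for a single instance; the explicit arguments (u, phi_x, Theta, b) are hypothetical stand-ins for a trained model and are not taken from the original implementation.

```python
import numpy as np

def eps_insensitive_loss(u, phi_x, Theta, b, eps):
    """L_R from the LEMLL objective: zero inside the eps-tube, quadratic beyond it.

    u      : (m,)  numerical label vector of one instance
    phi_x  : (H,)  high-dimensional representation phi(x_i)
    Theta  : (H, m) regression weights; b : (m,) bias
    """
    xi = u - Theta.T @ phi_x - b        # residual vector xi_i
    R = np.linalg.norm(xi)              # R_i = ||xi_i||_2
    return 0.0 if R < eps else R**2 - 2 * R * eps + eps**2
```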

2.3. Intuitionistic Fuzzy Set

Suppose an arbitrary instance $x_i\in X$ has the corresponding label $y_i$; then, for a nonempty set $X$, the intuitionistic fuzzy set is defined as follows:
$$\tilde{A}=\{(x_i,\mu_{\tilde{A}}(x_i),\nu_{\tilde{A}}(x_i))\mid x_i\in X\}\tag{5}$$
where $\mu_{\tilde{A}}(x_i)$ is the membership of instance $x_i$, expressing the chance of $x_i$ belonging to a particular class $A$, and $\nu_{\tilde{A}}(x_i)$ is the non-membership of instance $x_i\in X$, representing the possibility that $x_i$ is not related to class $A$. Both $\mu$ and $\nu$ are functions mapping from $X$ to $[0,1]$, and the following two conditions are satisfied:
1.
$\mu_{\tilde{A}}(x_i)\in[0,1]$, $\nu_{\tilde{A}}(x_i)\in[0,1]$;
2.
$0\leq\mu_{\tilde{A}}(x_i)+\nu_{\tilde{A}}(x_i)\leq1$.
The hesitation of instance x i is defined as:
$$\pi_{\tilde{A}}(x_i)=1-\mu_{\tilde{A}}(x_i)-\nu_{\tilde{A}}(x_i)\tag{6}$$
which measures the hesitation of instance $x_i$ regarding class $A$. $\mu_{\tilde{A}}$ and $\nu_{\tilde{A}}$ construct the intuitionistic fuzzy number (IFN) $\alpha=(\mu_{\tilde{A}},\nu_{\tilde{A}})$, and the score function $S(\alpha)$ is used to compare two intuitionistic fuzzy numbers:
$$S(\alpha)=\mu_{\tilde{A}}-\nu_{\tilde{A}}\tag{7}$$
which allows comparing instance $(x_i,y_i,\mu_{\tilde{A}}(x_i),\nu_{\tilde{A}}(x_i))$ with instance $(x_j,y_j,\mu_{\tilde{A}}(x_j),\nu_{\tilde{A}}(x_j))$: through the values of $S(\alpha_i)$ and $S(\alpha_j)$, we can determine whether $x_i$ or $x_j$ is more likely to be associated with class $A$.
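The following minimal sketch wraps the above definitions (membership, non-membership, hesitation, and the score function $S(\alpha)$) into a small helper class; it is only an illustration of the comparison rule, not part of any cited implementation.

```python
from dataclasses import dataclass

@dataclass
class IFN:
    """An intuitionistic fuzzy number alpha = (mu, nu) with mu, nu in [0, 1] and mu + nu <= 1."""
    mu: float   # membership degree
    nu: float   # non-membership degree

    def __post_init__(self):
        assert 0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0 and self.mu + self.nu <= 1.0

    @property
    def hesitation(self) -> float:
        return 1.0 - self.mu - self.nu          # pi = 1 - mu - nu

    def score(self) -> float:
        return self.mu - self.nu                # S(alpha) = mu - nu

# the instance with the larger score is more likely to be associated with class A
alpha_i, alpha_j = IFN(0.7, 0.1), IFN(0.5, 0.3)
print(alpha_i.score() > alpha_j.score())        # True
```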

3. Proposed Model

3.1. Notations

For ease of reference, we present a nomenclature including the major notations and mathematical meanings in Table 1.

3.2. Problem Formulation

Let $X_1=\{x_1,x_2,\ldots,x_n\}$ denote a set of multi-label instances with the associated logical label information $Y_1=\{y_1,y_2,\ldots,y_n\}$, and let $X_2$ denote the unseen instances. The logical label of $x_i$ over the label set $\{l_1,\ldots,l_c,\ldots,l_m\}$ is denoted as $y_i=\{y_i^1,\ldots,y_i^c,\ldots,y_i^m\}$, where $y_i^c=1$ holds (positive class) if $x_i$ is associated with label $l_c$, and $y_i^c=0$ (negative class) otherwise. The numerical label of instance $x_i$ is denoted by $u_i=\{u_i^1,\ldots,u_i^c,\ldots,u_i^m\}\in[0,1]^m$, which satisfies $\sum_c u_i^c=1$. For an arbitrary label $l_c$, we denote the degrees of the membership function related to $l_c$ and the non-membership function unrelated to $l_c$ as $\mathrm{IFN}_c(x_i)=(\mu_c(x_i),\nu_c(x_i))$, where $\mu_c(x_i)\in[0,1]$, $\nu_c(x_i)\in[0,1]$, and $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$. The set of uncertain instances on the $c$-th label is denoted as $X_2(\mu_c,\nu_c)$. The set of uncertain instances on all labels is denoted as $X_2(\mu,\nu)$, and the set of certain instances on all labels is its complement, denoted as $\neg X_2(\mu,\nu)$ (i.e., $X_2=X_2(\mu,\nu)\cup\neg X_2(\mu,\nu)$). Our goal is to identify uncertain classifications from logical label-based learning and to improve the performance with numerical label-based learning.

3.3. Basic Idea

IFTWLE is an implementation of the TAO model for the unseen instances $X_2$ (see Figure 1). Firstly, trisecting is realized by a logical label-based model (denoted as $f_1(\cdot)$) together with an intuitionistic fuzzy number (denoted as $(\mu,\nu)$); an instance selection principle is developed to identify the uncertain instances. Secondly, acting is realized on all instances with different strategies: label enhancement (denoted as $f_2(\cdot)$) is applied to the uncertain instances, whereas the original classifications are adopted otherwise. Finally, we conduct outcome evaluations on the deduced classifications.
For a non-trivial solution, the three-way classification result of the predicted multi-label set (i.e., Y ^ 2 * ) on X 2 is defined as:
$$\hat{Y}_2^{*}=\begin{cases}f_1(\neg X_2(\mu,\nu)), & x\in\neg X_2(\mu,\nu)\\ f_2(X_2(\mu,\nu)), & x\in X_2(\mu,\nu)\end{cases}\tag{8}$$
where $f_2(X_2(\mu,\nu))$ refers to the predicted multi-label sets of the deferred instances with large label uncertainty degrees, and $f_1(\neg X_2(\mu,\nu))$ refers to the predicted multi-label sets of the certain instances with small label uncertainty degrees.
Figure 2 describes the pipeline of the instance selection principle. Taking a problem transformation view, we assign a membership function ($\mu_c(x_i)$) and a non-membership function ($\nu_c(x_i)$) to every unseen instance on an arbitrary label $l_c$, which are then combined to search for the candidate uncertain instances, denoted as $X_2(\mu_c,\nu_c)$. The final uncertain instances ($X_2(\mu,\nu)$) are aggregated by considering the distribution of candidate uncertain instances across all labels. We elaborate the details in Section 3.4, Section 3.5 and Section 3.6.
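A minimal sketch of the dispatch in Equation (8) is given below; the callables f1_predict, f2_enhance, and select_uncertain are hypothetical placeholders for the logical label-based model, the label enhancement model, and the IFN-based selection principle detailed in Sections 3.4, 3.5 and 3.6.

```python
import numpy as np

def three_way_predict(X2, f1_predict, f2_enhance, select_uncertain):
    """Trisect the unseen instances and act differently on the two parts (Equation (8)).

    f1_predict(X)                  -> logical-label predictions of the label-specific model
    f2_enhance(X)                  -> predictions after label enhancement on uncertain instances
    select_uncertain(X, Y_pseudo)  -> boolean mask marking instances flagged as uncertain
    """
    Y_pseudo = f1_predict(X2)                      # pseudo-labels from f1
    uncertain = select_uncertain(X2, Y_pseudo)     # trisecting via the IFN-based principle
    Y_hat = Y_pseudo.copy()
    if uncertain.any():
        Y_hat[uncertain] = f2_enhance(X2[uncertain])   # acting: enhance only the uncertain part
    return Y_hat
```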

3.4. Intuitionistic Fuzzy Membership Assignment

Although fuzzy membership assignment is capable of measuring the concept vagueness, it has the following drawbacks for multi-label classification:
  • Regardless of the concrete membership function definition, it can only describe the closeness of an instance belonging to a concept. The distribution of the heterogeneous class is thus neglected, which is of great importance in finding uncertain instances.
  • Multi-label data have some specialized characteristics, such as class imbalance. With the membership function only, the model cannot utilize such information effectively, which leads to the degeneration of model generality.
In [45], the intuitionistic fuzzy set identifies the support vectors among instances and improves the generalization of the support vector machine. In our case, instances have already obtained pseudo-labels from label-specific learning (i.e., $f_1(\cdot)$); we therefore assume a desirable hyperplane is available. This means that noisy labels are unlikely and that misclassified instances are both far from the class center and surrounded by heterogeneous instances. Therefore, from the perspective of labels, such instances are compatible with the intuitionistic fuzzy set. The degrees of the membership and non-membership functions for each unseen instance are determined in a label-specific way under the problem transformation view. Without loss of generality, we consider the construction of $\mu_c(x_i)$ and $\nu_c(x_i)$ on label $l_c$.

3.4.1. Membership Function μ c ( x i )

Let $\hat{Y}_2$ be the pseudo-label set of the unseen instance set $X_2$ learnt from the LLSF model ($f_1$); the membership of an instance is determined by its relative similarity to the predicted class and to the other class. In other words, the membership of an instance is larger if its relative distance to the predicted class is smaller and its relative distance to the other class is larger. Using the class center as a representative, for two instances that are both pseudo-positively associated (i.e., $\hat{y}_i^c=\hat{y}_j^c=1$), our goal can be formally written as:
$$\mu_c(x_i)>\mu_c(x_j)\quad\text{if}\quad\frac{D(\phi_c(x_i),\hat{C}_c^{+})}{r_c^{+}+\epsilon}-\frac{D(\phi_c(x_i),\hat{C}_c^{-})}{r_c^{-}+\epsilon}<\frac{D(\phi_c(x_j),\hat{C}_c^{+})}{r_c^{+}+\epsilon}-\frac{D(\phi_c(x_j),\hat{C}_c^{-})}{r_c^{-}+\epsilon}\tag{9}$$
where $D(\phi_c(x_i),\hat{C}_c^{+})$ and $D(\phi_c(x_i),\hat{C}_c^{-})$ are the distances of instance $x_i$ to the pseudo-positive class and the pseudo-negative class, respectively. They are defined as:
$$D(\phi_c(x_i),\hat{C}_c^{+})=\|\phi_c(x_i)-\hat{C}_c^{+}\|\tag{10}$$
$$D(\phi_c(x_i),\hat{C}_c^{-})=\|\phi_c(x_i)-\hat{C}_c^{-}\|\tag{11}$$
where $\phi_c(x_i)$ represents the high-dimensional representation of instance $x_i$ given the label-specific features w.r.t. $l_c$, and $\|\cdot\|$ is the distance between the instance and the corresponding pseudo-class center. Supposing $K(x_i^c,x_j^c)$ denotes a kernel function on the label-specific features w.r.t. $l_c$, the distance in the induced space expands as:
$$\|\phi_c(x_i)-\phi_c(x_j)\|=\sqrt{K(x_i^c,x_i^c)-2K(x_i^c,x_j^c)+K(x_j^c,x_j^c)}\tag{12}$$
$\hat{C}_c^{+}$ and $\hat{C}_c^{-}$ are the class centers of the pseudo-positive and pseudo-negative classes on label $l_c$: $\hat{C}_c^{+}$ is the average over all instances whose predicted pseudo-label on $l_c$ is 1, and $\hat{C}_c^{-}$ is the average over all instances whose predicted pseudo-label on $l_c$ is 0.
$$\hat{C}_c^{+}=\frac{1}{|l_c^{+}|}\sum_{\hat{y}_i^c=1}\phi_c(x_i)\tag{13}$$
$$\hat{C}_c^{-}=\frac{1}{|l_c^{-}|}\sum_{\hat{y}_i^c=0}\phi_c(x_i)\tag{14}$$
where $|l_c^{+}|=|\{x_i\mid\hat{y}_i^c=1\}|$ and $|l_c^{-}|=|\{x_i\mid\hat{y}_i^c=0\}|$ denote the pseudo-positive and pseudo-negative instance counts w.r.t. label $l_c$, respectively.
$r_c^{+}$ and $r_c^{-}$ are the radii of the pseudo-positive and pseudo-negative classes on label $l_c$, which can be quantified as:
$$r_c^{+}=\max_{\hat{y}_i^c=1}\|\phi_c(x_i)-\hat{C}_c^{+}\|\tag{15}$$
$$r_c^{-}=\max_{\hat{y}_i^c=0}\|\phi_c(x_i)-\hat{C}_c^{-}\|\tag{16}$$
By substituting Equations (13) and (14) into Equations (15) and (16), based on Equation (12), we have:
$$r_c^{+}=\max_{\hat{y}_i^c=1}\sqrt{K(x_i^c,x_i^c)-\frac{2}{|l_c^{+}|}\sum_{\hat{y}_j^c=1}K(x_i^c,x_j^c)+\frac{1}{|l_c^{+}|^{2}}\sum_{\hat{y}_m^c=1}\sum_{\hat{y}_n^c=1}K(x_m^c,x_n^c)}\tag{17}$$
$$r_c^{-}=\max_{\hat{y}_i^c=0}\sqrt{K(x_i^c,x_i^c)-\frac{2}{|l_c^{-}|}\sum_{\hat{y}_j^c=0}K(x_i^c,x_j^c)+\frac{1}{|l_c^{-}|^{2}}\sum_{\hat{y}_m^c=0}\sum_{\hat{y}_n^c=0}K(x_m^c,x_n^c)}\tag{18}$$
One can infer that the calculation of $D(\phi_c(x_i),\hat{C}_c^{-})$ is costly if $x_i$ is pseudo-positive on $l_c$. For simplicity, we use $D(\phi_c(x_i),\hat{C}_c)$ instead; in other words, we examine the dissimilarity of an instance to the class center of all instances.
For multi-label cases, the positive class is the minority class, whereas the negative class is the majority class. The imbalanced class distribution results in different contributions to the membership function: the center of the negative class carries a much lower empirical risk than that of the positive class, and the empirical risk of the positive-class center grows as the number of positive instances shrinks. Therefore, we introduce the symbols $p_c^{+}$ and $p_c^{-}$ to represent the pseudo-positive instance weight and the pseudo-negative instance weight w.r.t. label $l_c$. For any $l_c$, they are defined as:
$$p_c^{+}=\frac{2\times|l_c^{+}|}{|l_c^{+}|+\mathrm{Card}(X_2)}\tag{19}$$
$$p_c^{-}=\frac{2\times|l_c^{-}|}{|l_c^{-}|+\mathrm{Card}(X_2)}\tag{20}$$
where $\mathrm{Card}(X_2)=|X_2|$ refers to the number of instances in the instance set $X_2$. For each unseen instance $x_i$, the degree of membership $\mu_c(x_i)$ is defined as:
$$\mu_c(x_i)=\begin{cases}1-\left[p_c^{+}\dfrac{D(\phi_c(x_i),\hat{C}_c^{+})}{r_c^{+}+\epsilon}+(1-p_c^{+})\dfrac{D(\phi_c(x_i),\hat{C}_c)}{r_c+\epsilon}\right], & \hat{y}_i^c=1\\[2ex]1-\left[p_c^{-}\dfrac{D(\phi_c(x_i),\hat{C}_c^{-})}{r_c^{-}+\epsilon}+(1-p_c^{-})\dfrac{D(\phi_c(x_i),\hat{C}_c)}{r_c+\epsilon}\right], & \hat{y}_i^c=0\end{cases}\tag{21}$$
where $\epsilon\rightarrow0^{+}$ is an adjustable parameter, and $r_c^{+},r_c^{-},r_c$ and $\hat{C}_c^{+},\hat{C}_c^{-},\hat{C}_c$ are the radii and class centers of the pseudo-positive class, the pseudo-negative class, and all unseen instances on label $l_c$, as determined by $f_1^c(\cdot)$. Figure 3 shows an example illustrating the effect of Equation (21), where both $\mu_c(x_A)>\mu_c(x_C)$ and $\mu_c(x_D)>\mu_c(x_B)$ hold.
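The sketch below computes $\mu_c(x_i)$ for one label following Equation (21), under the simplifying assumption of an explicit (linear-kernel) representation so that class centers and radii can be computed directly in feature space; it also assumes both pseudo-classes are non-empty.

```python
import numpy as np

def membership_degree(Phi, y_pseudo, eps=1e-3):
    """Membership mu_c(x_i) on one label c (Equation (21)).

    Phi      : (n, k) explicit label-specific representation (linear-kernel assumption)
    y_pseudo : (n,)  pseudo-labels in {0, 1} predicted by f1 on label c
    """
    n = len(y_pseudo)
    pos, neg = (y_pseudo == 1), (y_pseudo == 0)
    C_pos, C_neg, C_all = Phi[pos].mean(0), Phi[neg].mean(0), Phi.mean(0)
    d_pos = np.linalg.norm(Phi - C_pos, axis=1)     # distance to pseudo-positive center
    d_neg = np.linalg.norm(Phi - C_neg, axis=1)     # distance to pseudo-negative center
    d_all = np.linalg.norm(Phi - C_all, axis=1)     # distance to the center of all instances
    r_pos, r_neg, r_all = d_pos[pos].max(), d_neg[neg].max(), d_all.max()
    p_pos = 2 * pos.sum() / (pos.sum() + n)         # pseudo-positive weight (Equation (19))
    p_neg = 2 * neg.sum() / (neg.sum() + n)         # pseudo-negative weight (Equation (20))
    mu = np.empty(n)
    mu[pos] = 1 - (p_pos * d_pos[pos] / (r_pos + eps) + (1 - p_pos) * d_all[pos] / (r_all + eps))
    mu[neg] = 1 - (p_neg * d_neg[neg] / (r_neg + eps) + (1 - p_neg) * d_all[neg] / (r_all + eps))
    return np.clip(mu, 0.0, 1.0)                    # numerical guard; values already lie in [0, 1]
```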

3.4.2. Non-Membership Function ν c ( x i )

The non-membership of an instance is determined by the following two factors. Firstly, it is impacted by the dissimilarity and similarity of the instance to the predicted class and to the other class: the non-membership is larger if an instance is located in a region where the probability of belonging to the other class is higher. Secondly, it is impacted by the heterogeneous instance distribution within the neighborhood: the non-membership is larger if an instance is surrounded by instances from the other class. We assume the non-membership grows with the complement of the membership, denoted as:
$$\nu_c(x_i)\propto1-\mu_c(x_i)\tag{22}$$
The imbalanced class distribution [46] implies that the contributions of heterogeneous instances are label-dependent. In other words, for two instances $x_i$ and $x_j$ with the same number of heterogeneous instances in their corresponding neighborhoods, the non-membership of $x_i$ is larger than that of $x_j$ if $x_i$ belongs to the pseudo-negative class whereas $x_j$ belongs to the pseudo-positive class. To implement this assumption, we introduce two symbols, $n_c^{+}$ and $n_c^{-}$, to represent the prior probabilities of being pseudo-positive and pseudo-negative, respectively.
$$n_c^{+}=\frac{|l_c^{+}|}{\mathrm{Card}(X_2)}\tag{23}$$
$$n_c^{-}=\frac{|l_c^{-}|}{\mathrm{Card}(X_2)}\tag{24}$$
The two prior probabilities are incorporated to quantify the contribution of heterogeneous instances within the neighborhood: from the perspective of the neighborhood, the non-membership degree is larger if more heterogeneous instances are included, and smaller if more homogeneous instances are included. Therefore, the weighted neighborhood difference of $x_i$ on label $l_c$ (i.e., $\rho_c(x_i)$) is defined as:
$$\rho_c(x_i)=\begin{cases}\dfrac{n_c^{+}|N_c(x_i)^{-}|}{n_c^{+}|N_c(x_i)^{-}|+n_c^{-}|N_c(x_i)^{+}|}, & \hat{y}_i^c=1\\[2ex]\dfrac{n_c^{-}|N_c(x_i)^{+}|}{n_c^{+}|N_c(x_i)^{-}|+n_c^{-}|N_c(x_i)^{+}|}, & \hat{y}_i^c=0\end{cases}\tag{25}$$
where $|N_c(x_i)^{+}|=|\{x_j\mid x_j\in N_c(x_i)\wedge\hat{y}_j^c=1\}|$ denotes the pseudo-positive instance count in the $r$-neighborhood of $x_i$ on label $l_c$, $|N_c(x_i)^{-}|=|\{x_j\mid x_j\in N_c(x_i)\wedge\hat{y}_j^c=0\}|$ denotes the pseudo-negative instance count in the same neighborhood, and $r>0$ is an adjustable parameter. We assume the non-membership degree is proportional to the weighted neighborhood difference, denoted as:
$$\nu_c(x_i)\propto\rho_c(x_i)\tag{26}$$
Based on assumptions (22) and (26), we define the non-membership degree as:
$$\nu_c(x_i)=(1-\mu_c(x_i))\cdot\rho_c(x_i)\tag{27}$$
It is easy to validate that $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$ holds. Here, we offer some explanations. $\mu_c(x_i)$ attains its largest value when $x_i$ coincides with the center of its plausible pseudo-class, and it reaches 0 only when $x_i$ is simultaneously the farthest instance from the class center of all instances. $\nu_c(x_i)$ is itself smaller than 1, as both of its factors are less than 1. Since $\rho_c(x_i)$ is smaller than 1, $\nu_c(x_i)$ is no larger than $1-\mu_c(x_i)$, which means $0\leq\mu_c(x_i)+\nu_c(x_i)\leq1$ holds.
By referring to Equation (6), we define the hesitation degree as:
$$\pi_c(x_i)=1-\mu_c(x_i)-\nu_c(x_i)\tag{28}$$
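Analogously, the following sketch computes $\rho_c(x_i)$ and $\nu_c(x_i)$ from Equations (25)-(27); the brute-force pairwise distance matrix and the fallback $\rho_c(x_i)=0$ for an empty neighborhood are our own assumptions for illustration.

```python
import numpy as np

def non_membership_degree(Phi, y_pseudo, mu, r=0.1):
    """Non-membership nu_c(x_i) = (1 - mu_c(x_i)) * rho_c(x_i) on one label c (Equations (25)-(27))."""
    n = len(y_pseudo)
    n_pos = (y_pseudo == 1).sum() / n                       # prior of pseudo-positive (Equation (23))
    n_neg = (y_pseudo == 0).sum() / n                       # prior of pseudo-negative (Equation (24))
    dist = np.linalg.norm(Phi[:, None, :] - Phi[None, :, :], axis=-1)
    nu = np.zeros(n)
    for i in range(n):
        nbrs = (dist[i] <= r) & (np.arange(n) != i)         # r-neighborhood of x_i
        k_pos = int((y_pseudo[nbrs] == 1).sum())            # |N_c(x_i)^+|
        k_neg = int((y_pseudo[nbrs] == 0).sum())            # |N_c(x_i)^-|
        denom = n_pos * k_neg + n_neg * k_pos
        if denom == 0:
            rho = 0.0                                       # empty neighborhood: treat as certain
        elif y_pseudo[i] == 1:
            rho = n_pos * k_neg / denom                     # heterogeneous neighbors are pseudo-negative
        else:
            rho = n_neg * k_pos / denom                     # heterogeneous neighbors are pseudo-positive
        nu[i] = (1.0 - mu[i]) * rho
    return nu
```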

3.5. Label-Level Instance Selection

Having the membership/non-membership degrees, the unseen instances ( X 2 ) with the intuitionistic fuzzy membership ( IFX 2 ) are denoted as:
$$\mathrm{IFX}_2=\{(x_1,\hat{y}_1,\mu_1,\nu_1),(x_2,\hat{y}_2,\mu_2,\nu_2),\ldots,(x_k,\hat{y}_k,\mu_k,\nu_k)\}$$
where $\mu_i=(\mu_1(x_i),\mu_2(x_i),\ldots,\mu_m(x_i))$ and $\nu_i=(\nu_1(x_i),\nu_2(x_i),\ldots,\nu_m(x_i))$ denote the degrees of the membership and non-membership functions of $x_i$ across all labels, respectively.
In terms of each label, we can select the instances with uncertain classifications (denoted as X 2 ( μ c , ν c ) ) as:
$$X_2(\mu_c,\nu_c)=\{x_i\mid\mu_c(x_i)>\nu_c(x_i)\wedge\nu_c(x_i)>0\}\tag{29}$$
$X_2(\mu_c,\nu_c)$ can be interpreted as the set of candidate uncertain instances. For example, consider the four unseen samples with pseudo-labels, i.e., $x_A$, $x_B$, $x_C$, and $x_D$ in Figure 4. It is more likely that $x_B$ and $x_C$ belong to $X_2(\mu_c,\nu_c)$ than $x_A$ and $x_D$, as they are closer to the hyperplane. Since instances with small separation margins tend to be more uncertain, label enhancement on $x_B$ and $x_C$ will be more informative than on $x_A$ and $x_D$, given the pseudo-label distribution on label $l_c$. One should be aware that $x_C$ would be less likely to be considered if we applied only the traditional fuzzy set model, as its affiliation degree to the positive class is much larger than that to the negative class. However, selecting $x_C$ is conducive to enhancing generality, as there are two negative-class instances among its neighbors. Hence, we can select instance $x_C$ via the intuitionistic fuzzy set.
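At the label level, the candidate selection of Equation (29) reduces to a simple filter; the sketch below assumes the membership and non-membership degrees have already been computed as above.

```python
import numpy as np

def label_level_candidates(mu_c, nu_c):
    """Candidate uncertain instances on one label (Equation (29)): the membership still
    dominates, but a non-zero non-membership indicates heterogeneous neighbors."""
    return np.where((mu_c > nu_c) & (nu_c > 0))[0]
```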

3.6. Instance-Level Instance Selection

We now determine which instances should be selected for label enhancement. Since label enhancement works at the instance level, a straightforward notion is to select the instances that are uncertain across most labels. We introduce the symbol $UC_i$ to represent the number of labels $l_c$ for which $x_i\in X_2(\mu_c,\nu_c)$ holds. It is defined as:
$$UC_i=\sum_{c}\left[\!\left[x_i\in X_2(\mu_c,\nu_c)\right]\!\right]\tag{30}$$
where the notation $[\![\cdot]\!]$ equals 1 if the condition holds and 0 otherwise. Based on $UC$, the instances to be enhanced ($X_2(\mu,\nu)$) are computed as:
$$X_2(\mu,\nu)=\left\{x_i\mid UC_i\geq\overline{UC_j},\ x_j\in X_2\wedge UC_j>0\right\}\tag{31}$$
where $\overline{UC_j}$ refers to the average of the non-zero $UC_j$ values. This means that an instance is enhanced only when it is recognized as uncertain on most labels.
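The instance-level aggregation of Equations (30) and (31) can then be sketched as follows; the input is one array of candidate indices per label, and the empty-candidate fallback is our own assumption.

```python
import numpy as np

def instance_level_selection(candidate_sets, n_instances):
    """Instances selected for label enhancement (Equations (30) and (31)).

    candidate_sets : list with one array of candidate instance indices per label
    """
    UC = np.zeros(n_instances, dtype=int)
    for idx in candidate_sets:
        UC[idx] += 1                                 # UC_i: number of labels flagging x_i
    nonzero = UC[UC > 0]
    if nonzero.size == 0:
        return np.array([], dtype=int)               # nothing flagged as uncertain
    return np.where(UC >= nonzero.mean())[0]         # keep instances reaching the average non-zero UC
```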

3.7. Complexity Analysis

We summarize the procedure of IFTWLE in Algorithm 1. The most time-consuming step is the calculation of pair-wise distances within all unseen instances (i.e., $r_c$). Let $k$ be the average length of the label-specific features, $n$ be the unseen instance count, and $m$ be the label count; then, the calculation of $r_c$ requires $O(n^2mk)$. Therefore, the computational complexity of IFTWLE is much lower than that of training $f_1(\cdot)$ [9] and $f_2(\cdot)$ [16]. However, as the complexity is quadratic in the instance count, it is inappropriate to employ this algorithm for large-scale multi-label data; local estimation of the inner products should be considered to accelerate the instance selection.
Algorithm 1: IFTWLE

4. Experiments

4.1. Dataset Characteristics

To demonstrate the effectiveness and efficiency of the proposed model, we compared the classification performance on eight multi-label benchmarks from Mulan (http://mulan.sourceforge.net/datasets.html) (accessed on 1 March 2022) and Meka (http://waikato.github.io/meka/datasets/) (accessed on 1 March 2022), covering domains including audio, music, text, biology, and images. The selected benchmarks are small or moderate in scale and are intensively referenced because of their limited baseline performances. In Table 2, for each dataset, "# Instances" is the number of instances, "# Features" is the number of features, "# Labels" is the total number of class labels, and "# Cardinality" is the average number of labels per instance, where the notation "#" denotes a count.

4.2. Evaluation Metrics

We adopted six evaluation metrics (Hamming Loss, Ranking Loss, One Error, Coverage, Average Precision, and Micro F1) [47] to evaluate the classification performance. Except for the last two, which achieve better performance when the values are larger, the remaining metrics obtain better performances when values are smaller. Let Y i and ¬ Y i denote the relevant and irrelevant label sets in ground-truth, and n be the unseen instances count, then the formulas of metrics are enumerated as:
(1)
Hamming Loss (abbreviated as Hl) evaluates the average difference between predictions and ground-truth (see Formula (32)). The smaller the value of the Hamming Loss, the better the performance of the algorithm.
$$Hl=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{l}\,\mathrm{Card}\!\left(f(x_i)\,\Delta\,Y_i\right)\tag{32}$$
where $\Delta$ is the set symmetric difference, $\mathrm{Card}(\cdot)$ is the set cardinality, and $l$ is the number of labels.
(2)
Ranking Loss (abbreviated as Rkl) evaluates the fraction of cases in which an irrelevant label ranks before a relevant label in the label predictions (see Formula (33)). The smaller the value of the Ranking Loss, the better the performance of the algorithm.
$$Rkl=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{Card}(Y_i)\,\mathrm{Card}(\neg Y_i)}\times\mathrm{Card}\!\left(\left\{(l_a,l_b)\mid\mathrm{rank}_i(l_a)>\mathrm{rank}_i(l_b),\ (l_a,l_b)\in Y_i\times\neg Y_i\right\}\right)\tag{33}$$
where $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance, and $\mathrm{Card}(\cdot)$ is the set cardinality.
(3)
One Error (abbreviated as Oe) evaluates the average fraction that the label ranking—first in prediction—is actually the irrelevant label (see Formula (34)). The smaller the value of One Error, the better the performance of the algorithm.
$$Oe=\frac{1}{n}\sum_{i=1}^{n}\left[\!\left[\arg\min_{l_j}\mathrm{rank}_i(l_j)\notin Y_i\right]\!\right]\tag{34}$$
where $[\![\cdot]\!]$ equals 1 if the condition holds and 0 otherwise, and $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance.
(4)
Coverage (abbreviated as Cvg) evaluates the average fraction for inclusion of all ground-truth labels in the ranking of label predictions (see Formula (35)). The smaller the value of Coverage, the better the performance of the algorithm.
$$Cvg=\frac{1}{n}\sum_{i=1}^{n}\left(\max_{l_j\in Y_i}\mathrm{rank}_i(l_j)-1\right)\tag{35}$$
where $\mathrm{rank}_i(l_j)$ denotes the ranking position, in ascending order, of the $j$-th label on the $i$-th instance.
(5)
Average Precision (abbreviated as Ap) evaluates the average precision of the actual relevant label rankings before relevant labels are examined by the label predictions (see Formula (36)). The larger the value of Average Precision, the better the performance of the algorithm.
$$Ap=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\mathrm{Card}(Y_i)}\sum_{l_j\in Y_i}\frac{\mathrm{Card}\!\left(\{l_s\in Y_i\mid\mathrm{rank}_i(l_s)\leq\mathrm{rank}_i(l_j)\}\right)}{\mathrm{rank}_i(l_j)}\tag{36}$$
where $\mathrm{Card}(\cdot)$ is the set cardinality.
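For reference, the five metrics can be sketched as below from real-valued prediction scores and ground-truth logical labels; the sketch assumes every instance has at least one relevant and one irrelevant label and ignores ties in the ranking.

```python
import numpy as np

def ranks(scores):
    """rank_i(l_j): position of label l_j when the scores of instance i are sorted descending (1 = top)."""
    order = np.argsort(-scores, axis=1)
    r = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    r[rows, order] = np.arange(1, scores.shape[1] + 1)
    return r

def hamming_loss(Y_pred, Y_true):
    return np.mean(Y_pred != Y_true)

def ranking_loss(scores, Y_true):
    r, out = ranks(scores), []
    for i in range(len(Y_true)):
        rel, irr = np.where(Y_true[i] == 1)[0], np.where(Y_true[i] == 0)[0]
        bad = sum(r[i, a] > r[i, b] for a in rel for b in irr)
        out.append(bad / (len(rel) * len(irr)))
    return np.mean(out)

def one_error(scores, Y_true):
    top = np.argmax(scores, axis=1)                      # top-ranked label per instance
    return np.mean([Y_true[i, top[i]] == 0 for i in range(len(Y_true))])

def coverage(scores, Y_true):
    r = ranks(scores)
    return np.mean([r[i, Y_true[i] == 1].max() - 1 for i in range(len(Y_true))])

def average_precision(scores, Y_true):
    r, ap = ranks(scores), []
    for i in range(len(Y_true)):
        rel = np.where(Y_true[i] == 1)[0]
        ap.append(np.mean([np.sum(r[i, rel] <= r[i, j]) / r[i, j] for j in rel]))
    return np.mean(ap)
```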

4.3. Experimental Settings

We examined whether IFTWLE gained superiority over state-of-the-art algorithms learnt on logical labels only. For this reason, we compared IFTWLE with LLSF, multi-label k-nearest neighbour (MLkNN), multi-label learning with label-specific features (LIFT), MLTSVM, multi-label learning with global and local label correlation (Glocal), and active k-label set ensembles (ACkEL). Detailed settings are as follows.
  • LLSF (code available at https://jiunhwang.github.io/ (accessed on 1 March 2022)) [9]: this method learns label-specific feature representations for all labels based on logical labels and shares an identical structure with $f_1(\cdot)$. By comparing with this method, we can examine the contribution of label enhancement. The parameters $\delta,\eta$ are tuned in $\{2^{-10},2^{-9},\ldots,2^{9},2^{10}\}$. The calibrated threshold $\tau_1$ is fixed as 0.5.
  • MLkNN (code available at http://www.lamda.nju.edu.cn/code_MLkNN.ashx (accessed on 1 March 2022)) [48]: it learns a conditional probability distribution on all features within the adapted k-neighborhood. The introduction of the neighborhood bears some similarity to the components of the non-membership function $\nu(x_i)$; however, we take one step further and enhance the results of the uncertain instances. The value of k takes the empirical value 10.
  • LIFT (code available at http://cse.seu.edu.cn/PersonalPage/zhangml/index.htm (accessed on 1 March 2022)) [40]: it learns different feature representations to determine the label associations and was the first trial of label-specific learning for multi-label classification. By comparing with this method, we can verify whether label enhancement improves label-specific learning. The ratio parameter is searched in $\{0.1,0.2,\ldots,0.5\}$.
  • MLTSVM (code available at http://www.optimal-group.org/Resource/MLTSVM.html (accessed on 1 March 2022)) [11]: it learns distance differences based on multiple nonparallel hyperplanes. The twin support vector machine is a variant of the support vector machine; we trained the enhanced model with a linear support vector machine. The penalty and kernel parameters are searched in $\{2^{-6},2^{-5},\ldots,2^{5},2^{6}\}$ and $\{2^{-4},2^{-3},\ldots,2^{3},2^{4}\}$, respectively.
  • Glocal (code available at http://www.lamda.nju.edu.cn/code_Glocal.ashx (accessed on 1 March 2022)) [49]: it learns a mapping from the feature space to latent labels via a low-rank decomposition and is the initial attempt to simultaneously leverage both global and local label correlations. The similarity to our work is that we also consider global label correlation in a pairwise fashion. By comparing with this work, we can examine whether label importance is superior to local label correlation. The penalty $\lambda$ takes the empirical value 1.
  • ACkEL (code available at https://github.com/xuwangfmc/AkEL (accessed on 1 March 2022)) [50]: it takes an ensemble strategy on the k-label set, optimized by class separability and class uncertainty simultaneously, and is a revised version of the classic k-label set algorithm, which is assumed to deliver robust performance. By comparing with this work, we can examine whether the strategy combining pairwise label correlation with label importance gains superiority over high-order label correlation. A one-versus-all multi-class strategy was conducted on each label set; the parameters $\sigma,\beta$ were searched in $\{2^{-3},2^{-2},\ldots,2^{10},2^{11}\}$ and $\{0.1,0.3,0.5,0.7,0.9\}$, respectively. The size of each label set was fixed as 3.
  • Proposed method: there are three groups of parameters. The parameters for constructing $f_1(\cdot)$ and $f_2(\cdot)$ take the recommended settings declared in [9] and [16]. For the intuitionistic fuzzy membership assignment, all components are automatically determined by the data characteristics except for the neighborhood radius $r$, which is searched in $\{0.01,0.1\}$ via five-fold cross-validation.
We considered five evaluations [47] including Hamming Loss, One Error, Coverage, Ranking Loss, and Average Precision. Except for Average Precision, which obtains better performance if the metric becomes larger, the others achieve better performances if the metrics become smaller. The experiments were implemented using Matlab R2017b on a desktop PC with an Intel(R) Core i7 processor (2.60 GHz) and 8 GB of RAM. All parameters were selected via five-fold cross-validation.

4.4. Results

We evaluated the classification performance of all algorithms on the five evaluation metrics; the results are reported in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. The down arrow ↓ means that the smaller the metric is, the better the algorithm performs; in contrast, the up arrow ↑ means that the larger the metric is, the better the algorithm performs.
From the metric view, IFTWLE ranks first in 60% of cases (3/5) and second in 40% of cases. From the dataset view, IFTWLE ranks first in 42.5% of cases (17/40) and second in 17.5% of cases (7/40). It performs best on the Coverage metric (first place in 87.5% of cases) and worst on the One Error metric (second place in 75% of cases).
The Friedman test [51] was employed to compare the relative performance of multiple algorithms over the selected datasets. Given $k$ comparing algorithms and $N$ datasets, let $\mathrm{Rank}_j=\frac{1}{N}\sum_{i=1}^{N}r_i^j$ denote the average rank of the $j$-th algorithm. Under the null hypothesis ($H_0$) that all algorithms obtain identical performance, the Friedman statistic $F_F$ follows the F-distribution with $k-1$ degrees of freedom in the numerator and $(k-1)(N-1)$ degrees of freedom in the denominator, denoted as:
$$F_F=\frac{(N-1)\chi_F^2}{N(k-1)-\chi_F^2}\tag{37}$$
where
$$\chi_F^2=\frac{12N}{k(k+1)}\left[\sum_{j}\mathrm{Rank}_j^2-\frac{k(k+1)^2}{4}\right]\tag{38}$$
Table 3 presents the Friedman statistics $F_F$ and the corresponding critical value for all evaluation metrics in this setting. The results clearly show that, at the significance level $\alpha=0.05$, the null hypothesis ($H_0$) of statistically indistinguishable performance of all algorithms on the considered metrics is rejected. It is therefore feasible to examine whether IFTWLE gains statistical superiority over the other comparing algorithms by conducting a post hoc test, such as the Holm procedure [51].
Furthermore, regarding IFTWLE as the control algorithm, we employed the Holm procedure [51] to explore whether IFTWLE achieves a significant performance difference against each of the considered algorithms. Without loss of generality, we nominate $A_1$ as IFTWLE. For the other $k-1$ comparing algorithms (i.e., $A_j$, $2\leq j\leq k$), we stipulate that $A_j$ is the one with the $(j-1)$-th largest average ranking over all datasets on a specific evaluation metric. Consequently, the test statistic for comparing $A_1$ (i.e., IFTWLE) with $A_j$ is:
$$z_j=\frac{\mathrm{Rank}_1-\mathrm{Rank}_j}{\sqrt{\frac{k(k+1)}{6N}}}\quad(2\leq j\leq k)\tag{39}$$
Let $p_j$ denote the $p$-value of $z_j$ under the normal distribution. Given the significance level $\alpha=0.05$, the Holm procedure works in a stepwise manner by checking whether $p_j<\frac{\alpha}{k-j+1}$ in ascending order of $j$. Specifically, the procedure continues until it reaches the $j^{*}$-th step, where $j^{*}$ denotes the first $j$ such that $p_j\geq\frac{\alpha}{k-j+1}$ holds (if $p_j<\frac{\alpha}{k-j+1}$ holds for all $j$, then $j^{*}$ takes the value $k+1$).
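A sketch of the Friedman statistic (Equations (37) and (38)) and the Holm procedure against a control algorithm (Equation (39)) is given below; the use of two-sided p-values is our own assumption, since the text only states that $p_j$ is the p-value of $z_j$ under the normal distribution.

```python
import numpy as np
from scipy import stats

def friedman_statistic(rank_matrix):
    """Friedman F_F statistic (Equations (37) and (38)) from an N x k matrix of per-dataset ranks."""
    N, k = rank_matrix.shape
    Rank = rank_matrix.mean(axis=0)                          # average rank of each algorithm
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(Rank ** 2) - k * (k + 1) ** 2 / 4)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

def holm_vs_control(rank_matrix, alpha=0.05):
    """Holm procedure with the algorithm in column 0 as the control (Equation (39)).

    Returns the 1-based indices of the comparing algorithms whose difference from the
    control is judged significant.
    """
    N, k = rank_matrix.shape
    Rank = rank_matrix.mean(axis=0)
    z = (Rank[0] - Rank[1:]) / np.sqrt(k * (k + 1) / (6 * N))
    p = 2 * (1 - stats.norm.cdf(np.abs(z)))                  # two-sided p-values (an assumption)
    significant = []
    for step, idx in enumerate(np.argsort(p)):               # ascending p, i.e., j = 2, 3, ...
        if p[idx] < alpha / (k - (step + 2) + 1):            # compare with alpha / (k - j + 1)
            significant.append(idx + 1)
        else:
            break                                            # Holm stops at the first non-rejection
    return significant
```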
As can be seen in Table 4, Table 5, Table 6, Table 7 and Table 8, IFTWLE statistically outperforms ACkEL on the Ranking Loss, Coverage, and Average Precision metrics, and statistically outperforms MLTSVM on the Ranking Loss and Coverage metrics. IFTWLE achieves its strongest dominance on Coverage, where it is statistically superior to all algorithms except MLkNN and LLSF. By finding the instances with the most uncertain labels, it is more likely to revise a large proportion of misclassified labels, gaining larger improvements in the Coverage metric. However, this strategy does not discriminate with respect to the relative importance of labels in different instances, which leads to limited improvements in the One Error and Average Precision metrics.

5. Conclusions

This paper proposes a novel model called IFTWLE for multi-label classification. Unlike conventional multi-label learning algorithms, which learn models on either logical labels or numerical labels, we integrate the two forms of labels under the three-way decisions umbrella by exploring classification uncertainty. The intuitionistic fuzzy set provides an insightful way to quantify the label-level uncertainty and determines the uncertain instances in a group decision-making fashion. Comparisons on benchmarks demonstrate that IFTWLE significantly improves the classification performance.
In the future, we will examine more combinations of label-specific algorithms and label enhancement algorithms to see whether some guidelines exist. Meanwhile, we will develop advanced instance selection principles by resorting to the optimization theory.

Author Contributions

Conceptualization, formal analysis, writing—original draft preparation, T.Z.; methodology, software, validation, writing—review and editing, Y.Z.; resources, supervision, funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China, grant numbers 61976158, 61976160, 62076182, 62163016, 62006172, and 61906137, and is also partially supported by the Jiangxi "Double Thousand Plan", grant number 20212ACB202001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, M.L.; Zhou, Z.H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837.
  2. Gibaja, E.; Ventura, S. A tutorial on multilabel learning. ACM Comput. Surv. 2015, 47, 1–38.
  3. Liu, W.W.; Shen, X.B.; Wang, H.B.; Tsang, I.W. The emerging trends of multi-label learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  4. Tabatabaei, S.M.; Dick, S.; Xu, W.S. Toward non-intrusive load monitoring via multi-label classification. IEEE Trans. Smart Grid 2017, 8, 26–40.
  5. Fu, H.Z.; Cheng, J.; Xu, Y.W.; Wong, D.W.K.; Liu, J.; Cao, X.C. Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE Trans. Med. Imag. 2018, 37, 1597–1605.
  6. Wei, Y.C.; Xia, W.; Lin, M.; Huang, J.S.; Ni, B.B.; Dong, J.; Zhao, Y.; Yan, S.C. HCP: A flexible CNN framework for multi-label image classification. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1901–1907.
  7. Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recog. 2004, 37, 1757–1771.
  8. Tsoumakas, G.; Vlahavas, I. Random k-labelsets: An ensemble method for multilabel classification. Lect. Notes Artif. Intell. 2007, 4701, 406–417.
  9. Huang, J.; Li, G.R.; Huang, Q.M.; Wu, X.D. Learning label-specific features and class-dependent labels for multi-label classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 3309–3323.
  10. Wu, Q.Y.; Tan, M.K.; Song, H.J.; Chen, J.; Ng, M.K. ML-FOREST: A multi-label tree ensemble method for multi-label classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 2665–2680.
  11. Chen, W.J.; Shao, Y.H.; Li, C.N.; Deng, N.Y. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recog. 2016, 52, 61–74.
  12. Geng, X. Label distribution learning. IEEE Trans. Knowl. Data Eng. 2016, 28, 1734–1748.
  13. Tao, A.; Xu, N.; Geng, X. Labeling information enhancement for multi-label learning with low-rank subspace. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, 28–31 August 2018; pp. 671–683.
  14. Zhang, M.L.; Zhang, Q.W.; Fang, J.P.; Li, Y.K.; Geng, X. Leveraging implicit relative labeling importance information for effective multi-label learning. IEEE Trans. Knowl. Data Eng. 2021, 33, 2057–2070.
  15. Xu, N.; Liu, Y.P.; Geng, X. Label enhancement for label distribution learning. IEEE Trans. Knowl. Data Eng. 2021, 32, 1632–1643.
  16. Shao, R.F.; Xu, N.; Geng, X. Multi-label learning with label enhancement. In Proceedings of the International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 437–446.
  17. Yao, Y.Y. Three-way decision: An interpretation of rules in rough set theory. In Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, Gold Coast, Australia, 14–16 July 2009; pp. 642–649.
  18. Yao, Y.Y. Three-way decision and granular computing. Int. J. Approx. Reason. 2018, 103, 107–123.
  19. Yao, Y.Y. Tri-level thinking: Models of three-way decision. Int. J. Mach. Learn. Cybern. 2020, 11, 947–959.
  20. Yao, Y.Y. Three-way granular computing, rough sets, and formal concept analysis. Int. J. Approx. Reason. 2020, 116, 106–125.
  21. Lang, G.M.; Miao, D.Q.; Fujita, H. Three-way group conflict analysis based on pythagorean fuzzy set theory. IEEE Trans. Fuzzy Syst. 2020, 28, 447–461.
  22. Zhan, J.M.; Jiang, H.B.; Yao, Y.Y. Three-way multiattribute decision-making based on outranking relations. IEEE Trans. Fuzzy Syst. 2021, 29, 2844–2858.
  23. Zhang, X.Y.; Gou, H.Y.; Lv, Z.Y.; Miao, D.Q. Double-quantitative distance measurement and classification learning based on the tri-level granular structure of neighborhood system. Knowl.-Based Syst. 2021, 217, 106799.
  24. Jiang, C.M.; Guo, D.D.; Sun, L.J. Effectiveness measure for TAO model of three-way decisions with interval set. J. Intell. Syst. 2021, 40, 11071–11084.
  25. Yang, J.L.; Yao, Y.Y.; Zhang, X.Y. A model of three-way approximation of intuitionistic fuzzy sets. Int. J. Mach. Learn. Cybern. 2022, 13, 163–174.
  26. Guo, D.D.; Jiang, C.M.; Wu, P. Three-way decision based on confidence level change in rough set. Int. J. Approx. Reason. 2022, 143, 57–77.
  27. Huang, X.F.; Zhan, J.M.; Ding, W.P.; Pedrycz, W. An error correction prediction model based on three-way decision and ensemble learning. Int. J. Approx. Reason. 2022, 146, 21–46.
  28. Zhang, Y.J.; Miao, D.Q.; Zhang, Z.F.; Xu, J.F.; Luo, S. A three-way selective ensemble model for multi-label classification. Int. J. Approx. Reason. 2018, 103, 394–413.
  29. Ren, F.J.; Wang, L. Sentiment analysis of text based on three-way decisions. J. Intell. Fuzzy Syst. 2017, 33, 245–254.
  30. Zhang, Y.J.; Miao, D.Q.; Pedrycz, W.; Zhao, T.N.; Xu, J.F.; Yu, Y. Granular structure-based incremental updating for multi-label classification. Knowl.-Based Syst. 2020, 189, 105066.
  31. Zhang, Y.J.; Zhao, T.N.; Miao, D.Q.; Pedrycz, W. Granular multilabel batch active learning with pairwise label correlation. IEEE Trans. Syst. Man Cybern.-Syst. 2022, 52, 3079–3091.
  32. Qian, W.B.; Huang, J.T.; Wang, Y.L.; Xie, Y.H. Label distribution feature selection for multi-label classification with rough set. Int. J. Approx. Reason. 2021, 128, 32–55.
  33. Kongsorot, Y.; Horata, P.; Musikawan, P.; Sunat, K. Kernel extreme learning machine based on fuzzy set theory for multi-label classification. Int. J. Mach. Learn. Cybern. 2019, 10, 979–989.
  34. Yuichi, O.; Naoki, M.; Yusuke, N.; Hisao, I. Multiobjective fuzzy genetics-based machine learning for multi-label classification. In Proceedings of the IEEE International Conference on Fuzzy Systems, Glasgow, UK, 19–24 July 2020.
  35. Che, X.Y.; Chen, D.G.; Mi, J.S. Feature distribution-based label correlation in multi-label classification. Int. J. Mach. Learn. Cybern. 2021, 12, 1705–1719.
  36. Xiao, F.Y. A distance measure for intuitionistic fuzzy sets and its application to pattern classification problems. IEEE Trans. Syst. Man Cybern.-Syst. 2021, 51, 3980–3992.
  37. Tian, Y.; Sun, M.; Deng, Z.B.; Luo, J.; Li, Y.Q. A new fuzzy set and nonkernel svm approach for mislabeled binary classification with applications. IEEE Trans. Fuzzy Syst. 2017, 25, 1536–1545.
  38. Tian, Y.; Deng, Z.B.; Luo, J.; Li, Y.Q. An intuitionistic fuzzy set based (SVM)-V-3 model for binary classification with mislabeled information. Fuzzy Optim. Decis. Mak. 2018, 17, 475–494.
  39. Rezvani, S.; Wang, X.Z.; Pourpanah, F. Intuitionistic fuzzy twin support vector machines. IEEE Trans. Fuzzy Syst. 2019, 27, 2140–2151.
  40. Zhang, M.L.; Wu, L. LIFT: Multi-label learning with label-specific features. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 107–120.
  41. Guo, Y.M.; Chung, F.L.; Li, G.Z.; Wang, J.C.; Gee, J.C. Leveraging label-specific discriminant mapping features for multi-label learning. ACM Trans. Knowl. Discov. Data 2019, 13, 24.
  42. Yu, Z.B.; Zhang, M.L. Multi-label classification with label-specific feature generation: A wrapped approach. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
  43. Jia, X.Y.; Lu, Y.N.; Zhang, F.W. Label enhancement by maintaining positive and negative label relation. IEEE Trans. Knowl. Data Eng. 2021, in press.
  44. Zheng, Q.H.; Zhu, J.H.; Tang, H.Y.; Liu, X.Y.; Li, Z.Y.; Lu, H.M. Generalized label enhancement with sample correlations. IEEE Trans. Knowl. Data Eng. 2021, in press.
  45. Ha, M.H.; Wang, C.; Chen, J.Q. The support vector machine based on intuitionistic fuzzy number and kernel function. Soft Comput. 2013, 17, 635–641.
  46. Zhang, M.L.; Li, Y.K.; Yang, H.; Liu, X.Y. Towards class-imbalance aware multi-label learning. IEEE Trans. Cybern. 2020, in press.
  47. Schapire, R.; Singer, Y. A boosting-based system for text categorization. Mach. Learn. 2000, 39, 135–168.
  48. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recog. 2007, 40, 2038–2048.
  49. Zhu, Y.; Kwok, J.T.; Zhou, Z.H. Multi-label learning with global and local label correlation. IEEE Trans. Knowl. Data Eng. 2018, 30, 1081–1094.
  50. Wang, R.; Kwong, S.; Wang, X.; Jia, Y.H. Active K-labelsets ensemble multi-label classification. Pattern Recog. 2021, 109, 107583.
  51. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Figure 1. Pipeline of IFTWLE. It follows the trisecting–acting–outcome framework.
Figure 2. Illustration of the instance selection principle in IFTWLE: six instances are used to explain the processing. After applying $f_1(\cdot)$ generated by label-specific learning, an intuitionistic fuzzy number is assigned for each label (e.g., $(\mu_a(x_i),\nu_a(x_i))$ for $l_a$). The processed red and black circles refer to the instances classified as positive and negative with limited uncertainty, and the green triangles are recognized as candidate uncertain instances. The final three uncertain instances (represented by hollow triangles) are denoted as $X_2(\mu,\nu)$ and undergo label enhancement afterwards.
Figure 3. Computing $\mu_c(x_i)$: the red blocks and black circles represent the pseudo-positive and pseudo-negative instances w.r.t. label $l_c$. The red four-angle star and the black diamond represent the class centers of the pseudo-positive and pseudo-negative instances on label $l_c$. The blue hexagon represents the center of all included instances on label $l_c$. Instances $x_A$ and $x_C$ are two randomly selected instances with a pseudo-positive label on $l_c$, whereas instances $x_B$ and $x_D$ are two randomly selected instances with a pseudo-negative label on $l_c$. The blue, red, and black lines connecting the instances with the blue hexagon, red four-angle star, and black diamond represent the distances of the instances to the overall instance center, the pseudo-positive class center, and the pseudo-negative class center, respectively.
Figure 4. Selection of uncertain candidates on label l_c: the red blocks and black circles represent the instances with pseudo-positive and pseudo-negative labels on l_c. The red four-angle star and the black diamond mark the class centers of the pseudo-positive and pseudo-negative instances on label l_c, respectively, and the blue hexagon marks the center of all included instances on label l_c. The purple circle represents the neighborhood region. Based on (29), x_B and x_C will be selected (i.e., x_B, x_C ∈ X_2(μ_c, ν_c)).
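Equation (29) is not reproduced here. As a hedged sketch of the neighborhood idea in Figure 4, an instance can be flagged as an uncertain candidate on label l_c when its neighborhood of radius r mixes pseudo-positive and pseudo-negative instances, i.e., when the weighted difference between N_c(x_i)^+ and N_c(x_i)^- is small. The weights w_pos, w_neg and the threshold tau below are illustrative stand-ins for p_c^+, p_c^-, and the paper's actual criterion.

```python
import numpy as np

def uncertain_candidates(X, pseudo_labels, r, w_pos=1.0, w_neg=1.0, tau=0.3):
    """Flag instances whose radius-r neighborhood mixes pseudo-positive and
    pseudo-negative instances on one label (illustrative criterion only)."""
    flags = []
    for x in X:
        dist = np.linalg.norm(X - x, axis=1)
        neigh = (dist <= r) & (dist > 0)                  # neighborhood of x, excluding x itself
        n_pos = np.sum(pseudo_labels[neigh] == 1)         # N_c(x)^+
        n_neg = np.sum(pseudo_labels[neigh] == 0)         # N_c(x)^-
        total = n_pos + n_neg
        # Small weighted difference -> mixed neighborhood -> uncertain candidate.
        flags.append(total > 0 and abs(w_pos * n_pos - w_neg * n_neg) / total <= tau)
    return np.array(flags)

X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [2.0, 2.0], [2.1, 1.9]])
pseudo = np.array([1, 0, 1, 0, 0])
print(uncertain_candidates(X, pseudo, r=0.5))   # points with mixed neighborhoods are flagged
```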
Figure 5. Comparison of each algorithm on the Hamming Loss metric. The average rankings of algorithms on this metric: LIFT (2.6250) > IFTWLE (3.4375) > ACkEL (3.6250) > MLkNN (3.8125) > MLTSVM (3.9375) > LLSF (5.0000) > Glocal (5.5625).
Figure 6. Comparison of each algorithm on the Ranking Loss metric. The average rankings of algorithms on this metric: IFTWLE (2.0000) > MLkNN (3.1250) = LIFT (3.1250) > Glocal (3.5625) > LLSF (3.6875) > MLTSVM (5.8750) > ACkEL (6.6250).
Figure 7. Comparison of each algorithm on the One Error metric. The average rankings of algorithms on this metric: MLTSVM (1.2500) > LIFT (2.6250) > IFTWLE (3.6250) > Glocal (4.3750) > MLkNN (4.5000) > LLSF (4.6250) > ACkEL (7.0000).
Figure 8. Comparison of each algorithm on the Coverage metric. The average rankings of algorithms on this metric: IFTWLE (1.1875) > LLSF (1.8125) > MLkNN (3.0000) > ACkEL (4.7500) > LIFT (5.0000) > Glocal (5.8750) > MLTSVM (6.3750).
Figure 9. Comparison of each algorithm on the Average Precision metric. The average rank of algorithms on this metric: IFTWLE (2.4375) > LIFT (3.2500) = Glocal (3.2500) > MLkNN (3.8125) > LLSF (4.0625) > MLTSVM (4.4375) > ACkEL (6.7500).
Table 1. Notation of IFTWLE.
Notation | Mathematical Meaning
Card(·) | set cardinality
X_1 | multi-label instance set
Y_1 | logical label set
X_2 | unseen instance set
x_i | an instance
y_i | logical label set of instance x_i
ŷ_i | pseudo-label set of instance x_i learnt by f_1
u_i | numerical label set of instance x_i
μ_c(x_i) | membership degree of x_i on label l_c
ν_c(x_i) | non-membership degree of x_i on label l_c
π_c(x_i) | hesitation degree of x_i on label l_c
X_2(μ, ν) | uncertain instance set on all labels
¬X_2(μ, ν) | certain instance set on all labels
f_1(·) | function of logical label-based learning
f_2(·) | function of numerical label-based learning
Ŷ_2* | final predicted multi-label set of X_2
l_c | label c
Y_2(μ, ν) | label set of X_2(μ, ν) learnt by f_1
¬Y_2(μ, ν) | label set of ¬X_2(μ, ν) learnt by f_1
X_2(μ_c, ν_c) | uncertain instance set on label l_c
¬X_2(μ_c, ν_c) | certain instance set on label l_c
ŷ_i^c | pseudo-label of instance x_i on label l_c learnt by f_1
φ_c(x_i) | high-dimensional representation of instance x_i given the label-specific feature on label l_c
Ĉ_c^+ | class center of the pseudo-positive class on label l_c
Ĉ_c^− | class center of the pseudo-negative class on label l_c
D(·, ·) | Euclidean distance
r_c^+ | radius of the pseudo-positive class on label l_c
r_c^− | radius of the pseudo-negative class on label l_c
x_i^c | label-specific feature of instance x_i on label l_c
p_c^+ | pseudo-positive instance weight on label l_c
p_c^− | pseudo-negative instance weight on label l_c
ρ_c | weighted neighborhood difference of x_i on label l_c
r | instance neighborhood size measured by the Euclidean distance D(·, ·)
N_c(x_i)^+ | pseudo-positive instance count in the neighborhood of x_i on label l_c
N_c(x_i)^− | pseudo-negative instance count in the neighborhood of x_i on label l_c
UC_i | number of times that instance x_i is regarded as an uncertain instance over all labels
UC̄_j | mean number of times that instance x_j is regarded as an uncertain instance over all labels
Table 2. Characteristics of data sets.
Data Set | # Instances | # Features | # Labels | Cardinality | Domain
birds | 645 | 260 | 19 | 1.014 | audio
emotions | 593 | 72 | 6 | 1.869 | music
enron | 1702 | 1001 | 53 | 3.378 | text
genbase | 662 | 1185 | 27 | 1.252 | biology
languagelog | 1460 | 1004 | 75 | 1.18 | text
medical | 978 | 1449 | 45 | 1.245 | text
scene | 2407 | 294 | 6 | 1.074 | image
yeast | 2417 | 103 | 14 | 4.237 | biology
Table 3. Summary of the Friedman statistics F_F (k = 7, N = 8) and critical values in terms of each evaluation measure (k: # comparing algorithms; N: # data sets).
Metric | F_F | Critical Value (α = 0.05)
Hamming Loss | 9.991071 | 2.3239
Ranking Loss | 27.816964 | 2.3239
One Error | 33.214286 | 2.3239
Coverage | 41.852679 | 2.3239
Average Precision | 19.473214 | 2.3239
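The statistics in Table 3 can be reproduced from the average ranks reported in Figures 5–9. The sketch below (not the authors' code) computes the Friedman statistic from the average ranks of the k = 7 algorithms over the N = 8 data sets, together with the Iman–Davenport correction that is compared against the F((k − 1), (k − 1)(N − 1)) critical value; plugging in the Hamming Loss ranks from Figure 5 yields the 9.991071 entry of Table 3.

```python
import numpy as np

def friedman_statistics(avg_ranks, n_datasets):
    """Friedman chi-square statistic and its Iman-Davenport F correction,
    computed from the average ranks of k algorithms over N data sets."""
    k = len(avg_ranks)
    ranks = np.asarray(avg_ranks, dtype=float)
    chi2 = 12.0 * n_datasets / (k * (k + 1)) * (np.sum(ranks ** 2) - k * (k + 1) ** 2 / 4.0)
    f_id = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)   # Iman-Davenport correction
    return chi2, f_id

# Hamming Loss average ranks of the seven algorithms over the eight data sets (Figure 5).
hamming_ranks = [3.4375, 2.6250, 3.6250, 3.9375, 3.8125, 5.0000, 5.5625]
chi2, f_id = friedman_statistics(hamming_ranks, n_datasets=8)
print(f"Friedman chi2 = {chi2:.6f}, Iman-Davenport F = {f_id:.4f}")   # chi2 = 9.991071
```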
Table 4. Comparison of IFTWLE (control algorithm) against the other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Hamming Loss metric.
j | Algorithm | z_j | p | Holm
2 | Glocal | −1.9674 | 0.0491 | 0.008
3 | LLSF | −1.4466 | 0.1480 | 0.010
4 | MLTSVM | −0.4629 | 0.6434 | 0.013
5 | MLkNN | −0.3472 | 0.7284 | 0.017
6 | ACkEL | −0.1736 | 0.8622 | 0.025
7 | LIFT | 0.7522 | 1.0000 | 0.050
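Tables 4–8 can be regenerated from the same average ranks: z_j = (R_IFTWLE − R_j) / sqrt(k(k + 1)/(6N)), and the Holm column lists the step-down thresholds α/(k − j + 1) against which the p-values are compared in order of significance. The sketch below is not the authors' code; in particular, the p-value convention min(1, 2Φ(z_j)) is inferred from the tabulated values. Run on the Hamming Loss ranks of Figure 5, it reproduces Table 4.

```python
import numpy as np
from scipy.stats import norm

def holm_post_hoc(control_rank, other_ranks, n_datasets, alpha=0.05):
    """z statistics, p-values, and Holm step-down thresholds for a control
    algorithm compared against the remaining algorithms via average ranks."""
    k = len(other_ranks) + 1                          # total number of compared algorithms
    se = np.sqrt(k * (k + 1) / (6.0 * n_datasets))    # standard error of average-rank differences
    rows = []
    for name, rank in other_ranks.items():
        z = (control_rank - rank) / se                # negative z: the control is ranked better
        p = min(1.0, 2.0 * norm.cdf(z))               # inferred convention; see note above
        rows.append((name, z, p))
    rows.sort(key=lambda row: row[2])                 # most significant comparison first
    return [(name, z, p, alpha / (k - 1 - i))         # Holm threshold: alpha / (k - j + 1)
            for i, (name, z, p) in enumerate(rows)]

# Hamming Loss average ranks from Figure 5; IFTWLE (3.4375) is the control algorithm.
others = {"Glocal": 5.5625, "LLSF": 5.0000, "MLTSVM": 3.9375,
          "MLkNN": 3.8125, "ACkEL": 3.6250, "LIFT": 2.6250}
for name, z, p, holm in holm_post_hoc(3.4375, others, n_datasets=8):
    print(f"{name:8s} z = {z:+.4f}  p = {p:.4f}  Holm = {holm:.3f}")
```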
Table 5. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Ranking Loss metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −4.281918 | 0.000019 | 0.008
3 | MLTSVM | −3.587553 | 0.000334 | 0.010
4 | LLSF | −1.562321 | 0.118212 | 0.013
5 | Glocal | −1.446594 | 0.148011 | 0.017
6 | MLkNN | −1.041548 | 0.297621 | 0.025
7 | LIFT | −1.041548 | 0.297621 | 0.050
Table 6. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the One Error metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −3.1246 | 0.0018 | 0.008
3 | LLSF | −0.9258 | 0.3545 | 0.010
4 | MLkNN | −0.8101 | 0.4179 | 0.013
5 | Glocal | −0.6944 | 0.4874 | 0.017
6 | LIFT | 0.9258 | 1.0000 | 0.025
7 | MLTSVM | 2.1988 | 1.0000 | 0.050
Table 7. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Coverage metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | MLTSVM | −4.802692 | 0.000002 | 0.008
3 | Glocal | −4.339782 | 0.000014 | 0.010
4 | LIFT | −3.529689 | 0.000416 | 0.013
5 | ACkEL | −3.298234 | 0.000973 | 0.017
6 | MLkNN | −1.678049 | 0.093338 | 0.025
7 | LLSF | −0.578638 | 0.562834 | 0.050
Table 8. Comparison of IFTWLE (control algorithm) against other comparing algorithms (with the Holm procedure as the post hoc test) at the significance level α = 0.05 on the Average Precision metric. Algorithms that are statistically inferior to IFTWLE are in bold.
j | Algorithm | z_j | p | Holm
2 | ACkEL | −3.992599 | 0.000134 | 0.008
3 | MLTSVM | −1.851640 | 0.064078 | 0.010
4 | LLSF | −1.504458 | 0.132464 | 0.013
5 | MLkNN | −1.273003 | 0.203017 | 0.017
6 | LIFT | −0.752229 | 0.451913 | 0.025
7 | Glocal | −0.752229 | 0.451913 | 0.050
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
