This section analyzes in detail the proposed collaborative learning method that combines matrix factorization with the ranking support vector machine. First, a new multi-label classification framework is constructed based on the matrix factorization method. Then, we propose a kernelization of the linear model.
3.1. Preliminary
In the context of a given matrix, denoted as $A$, its transpose is represented as $A^{\top}$. The $i$-th row and $j$-th column of $A$ are denoted as $A_{i,:}$ and $A_{:,j}$, respectively. The vector $\ell_2$-norm of $a$ is represented as $\|a\|_2$ (or $\|a\|$). The matrix $\ell_{2,1}$-norm of $A$ is denoted as $\|A\|_{2,1}$, while the Frobenius norm is represented as $\|A\|_F$. The trace operator for a matrix is denoted as $\mathrm{tr}(A)$, and the rank of a matrix is represented as $\mathrm{rank}(A)$. The trace norm (or nuclear norm) of $A$, denoted as $\|A\|_*$, is calculated as $\mathrm{tr}\big(\sqrt{A^{\top}A}\big)$, or equivalently as the sum $\sum_i \sigma_i(A)$ of the singular values of $A$.
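As a quick numerical illustration of this notation (a minimal sketch of our own, not part of the paper), the following NumPy snippet checks that the nuclear norm equals the sum of the singular values:

```python
import numpy as np

# Illustrative check of the notation above.
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

fro = np.linalg.norm(A, "fro")              # Frobenius norm ||A||_F
tr = np.trace(A.T @ A)                      # tr(A^T A) = ||A||_F^2
sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A
nuclear = sigma.sum()                       # trace norm ||A||_* = sum of singular values
rank = np.linalg.matrix_rank(A)             # rank(A)

assert np.isclose(fro ** 2, tr)
print(f"||A||_F = {fro:.4f}, ||A||_* = {nuclear:.4f}, rank(A) = {rank}")
```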
3.2. Robust Ranking Support Vector Machine
We commence with the fundamental linear Rank-SVM approach [19] for multi-label classification. For an instance $x$, its real-valued prediction is obtained by $f(x) = W^{\top}x + b$, where $b$ is the bias vector and $W$ is the parameter matrix. To simplify the formulation, we can absorb $b$ into $W$ by appending 1 to each instance $x$ as an additional feature. The objective of Rank-SVM is to minimize the ranking loss while maximizing the margin. The ranking learning step can be formulated as follows:
$$\min_{W}\ \frac{1}{2}\|W\|_F^2 + C\sum_{i=1}^{n}\frac{1}{|Y_i^{+}|\,|Y_i^{-}|}\sum_{(p,q)\in Y_i^{+}\times Y_i^{-}}\max\!\big(0,\ 1-(w_p-w_q)^{\top}x_i\big),$$
where $Y_i^{+}$ (or $Y_i^{-}$) represents the index set of relevant (or irrelevant) labels associated with the instance $x_i$. The notation $|\cdot|$ represents the cardinality of a set, and the tradeoff hyper-parameter $C$ is utilized to control the complexity of the model.
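To make the pairwise ranking loss concrete, the sketch below (our own NumPy illustration of the standard hinge form above; `rank_svm_loss` is a hypothetical helper, not code from the paper) evaluates the normalized loss for a single instance:

```python
import numpy as np

def rank_svm_loss(W, x, relevant, irrelevant):
    """Normalized pairwise hinge ranking loss for one instance.

    W: (q, d) parameter matrix, one row per label (bias absorbed);
    relevant / irrelevant: index sets Y+ and Y- for this instance."""
    scores = W @ x  # real-valued predictions for all q labels
    loss = 0.0
    for p in relevant:
        for q in irrelevant:
            # hinge on the margin between a relevant and an irrelevant label
            loss += max(0.0, 1.0 - (scores[p] - scores[q]))
    # normalize by |Y+| * |Y-| as in the objective
    return loss / (len(relevant) * len(irrelevant))

# toy usage
W = np.random.randn(4, 3)
x = np.random.randn(3)
print(rank_svm_loss(W, x, relevant=[0, 2], irrelevant=[1, 3]))
```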
However, the performance of the Rank-SVM method is very sensitive to noisy data samples and may decline sharply when the noise level is high. In this paper, a robust feature space is used to mitigate the impact of noise on classification performance. Consider a data matrix $X \in \mathbb{R}^{d \times n}$, where $d$ represents the feature dimension and $n$ denotes the number of data samples. To fully account for the impact of noise on the data, this paper introduces matrix factorization. We decompose the noisy data $X$ into $U$ and $V$, where $V$ is the basis matrix and $U$ is the coefficient matrix, which can be considered as the new representation of $X$. Then, feeding the clean data matrix $U$ into Rank-SVM makes our model more robust and resistant to noise.
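As a rough sketch of this idea (using a truncated SVD purely as a stand-in for the learned factorization; the paper's actual factors come from the joint solver below), a noisy $X$ can be split into a basis and a low-rank coefficient representation:

```python
import numpy as np

# Minimal sketch: factorize a noisy data matrix X (d x n) into a basis V (d x k)
# and coefficients U (k x n); U serves as the cleaner low-rank representation.
rng = np.random.default_rng(0)
d, n, k = 50, 200, 5
X_clean = rng.standard_normal((d, k)) @ rng.standard_normal((k, n))
X = X_clean + 0.3 * rng.standard_normal((d, n))   # corrupt with noise

P, s, Qt = np.linalg.svd(X, full_matrices=False)
V = P[:, :k]                     # basis matrix (orthonormal columns)
U = np.diag(s[:k]) @ Qt[:k, :]   # coefficient matrix: new representation of X
print(np.linalg.norm(X_clean - V @ U) / np.linalg.norm(X_clean))
```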
In addition, since features selected by matrix factorization alone are independent of the multi-label classifier, the resulting low-rank representation is not optimal for multi-label classification. This paper therefore proposes a new multi-label classification model that improves classification performance in highly noisy data scenarios by seamlessly integrating matrix factorization and the multi-label classifier into a unified framework. Specifically, we use joint learning to couple feature selection and classifier learning so that each informs the other's parameters, and the best features are selected through joint optimization. The robust ranking support vector machine via matrix factorization is as follows:
where $S$ is the similarity matrix; in general, a smaller distance between two samples corresponds to a larger weight $s_{ij}$. We introduce manifold learning into the proposed framework to preserve the manifold relationships between low-rank samples, thereby enhancing the robust low-rank representation. With this robust low-rank representation, we can find robust low-rank features and apply them to obtain better classifiers. Moreover, the sparsity-inducing regularization term can eliminate redundant and irrelevant information.
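A common instantiation of such a similarity matrix is the heat-kernel weighting, with the manifold relationship enforced through a graph-Laplacian term. The sketch below assumes these standard choices for illustration; the paper's exact definition of $s_{ij}$ may differ:

```python
import numpy as np

def heat_kernel_similarity(X, sigma=1.0):
    """Gaussian (heat-kernel) similarity: smaller distance -> larger weight.

    X: (d, n) data matrix, one sample per column. One standard choice;
    the paper's weighting may differ."""
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
    S = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S

def manifold_term(U, S):
    """Graph smoothness: sum_ij s_ij ||u_i - u_j||^2 = 2 tr(U L U^T),
    where the columns of U are the low-rank representations u_i."""
    L = np.diag(S.sum(axis=1)) - S   # unnormalized graph Laplacian
    return 2.0 * np.trace(U @ L @ U.T)
```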
3.3. Kernelization
The model described in Equation (4) is a linear multi-label classifier, which limits its effectiveness in capturing nonlinear relationships between the input and output. To address this limitation, we propose to use kernel methods to develop nonlinear multi-label classifiers. The kernel function is introduced into our proposed framework via the dual formulation of the problem in Equation (4), using the Karush–Kuhn–Tucker theorem. Dual variables are then added to transform the constrained problem into an unconstrained one, which can be optimized using the Lagrangian function shown below:
We now seek a saddle point of the Lagrangian, which is a minimum with respect to the primal variables and a maximum with respect to the dual variables. To find the minimum over the primal variables, we set the corresponding derivatives to zero; in particular, we require the stationarity conditions with respect to the bias term and with respect to $w$.
Let $\phi(\cdot)$ be a feature mapping function that maps $x$ from the input space to a Hilbert space $\mathcal{H}$. Consequently, the optimization problem described in Equation (4) can be reformulated in terms of inner products $\phi(x_i)^{\top}\phi(x_j)$. Define $K$, with $K_{ij} = k(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$, to be the kernel matrix (or Gram matrix) in the RKHS. Consequently, the kernel framework is established by replacing these inner products with kernel evaluations.
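For instance, with the widely used RBF kernel (one possible choice of $k(\cdot,\cdot)$; the framework is agnostic to the specific kernel), the Gram matrix can be computed as follows:

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K with K_ij = k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

    X: (d, n), one sample per column. Any positive-definite kernel could be
    substituted here."""
    sq_norms = (X ** 2).sum(axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X.T @ X
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negatives
```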
3.4. Optimization
Since our model involves the $\ell_{2,1}$-norm together with orthogonality and nonnegativity constraints, the problem is still challenging to solve directly. Therefore, this article proposes a new and effective optimization method based on the augmented Lagrangian method (ALM). We introduce auxiliary variables to separate the constraints while preserving equivalence during optimization. Specifically, we introduce two auxiliary variables, $Z$ and $H$, and transform the objective function into the following form:
where $\mu$ is the penalty parameter that determines the penalty for infeasibility, and the associated Lagrangian multipliers penalize the gap between the target variables and the auxiliary variables.
With this transformation, we can adopt alternating optimization to iteratively solve the problem. Specifically, we optimize the objective function with respect to one variable while fixing the remaining variables. The iteration steps are detailed as follows.
(1) Update $U$: By fixing the other variables, the optimization formula for $U$ becomes
According to the Karush–Kuhn–Tucker (KKT) conditions, it can be verified that the optimal solution should be
(2) Update $V$: By fixing the other variables, the optimization formula for $V$ becomes
Considering the orthogonality constraint on $V$, the above formula can be rewritten as
This problem is commonly referred to as the orthogonal Procrustes problem, for which the global optimal solution can be obtained through the singular value decomposition of the associated matrix. To be more specific, given this SVD, the following formula updates $V$.
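The Procrustes step itself is standard: given the matrix $M$ whose SVD drives the update ($M$ is our generic name for the matrix assembled from the fixed variables), the orthogonal factor is recovered as follows:

```python
import numpy as np

def orthogonal_procrustes_solution(M):
    """Solve max_{V : V^T V = I} tr(V^T M).

    The optimum is V = P Q^T, where M = P diag(s) Q^T is the SVD of M.
    M stands in for the fixed-variable matrix of the V-subproblem."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt
```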
(3) Update $E$: By fixing the other variables, the optimization formula for $E$ becomes
Collecting the fixed terms into a single matrix, we further have
To solve the above problem, we introduce the following lemma, which is also presented in [28] with a detailed proof.
Lemma 1. Given a matrix $Q$ and a positive scalar $\lambda$, $P^{*}$ is the optimal solution of
$$\min_{P}\ \lambda\|P\|_{2,1} + \frac{1}{2}\|P - Q\|_F^2,$$
and the $j$-th column of $P^{*}$ is
$$P^{*}_{:,j} = \begin{cases}\dfrac{\|Q_{:,j}\|_2 - \lambda}{\|Q_{:,j}\|_2}\,Q_{:,j}, & \text{if } \|Q_{:,j}\|_2 > \lambda,\\ 0, & \text{otherwise.}\end{cases}$$
According to Lemma 1, the solution of the above problem follows directly from this column-wise shrinkage.
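Lemma 1 corresponds to the proximal (shrinkage) operator of the $\ell_{2,1}$-norm with columns as groups; a direct implementation reads:

```python
import numpy as np

def l21_prox(Q, lam):
    """Column-wise shrinkage from Lemma 1: solves
    min_P lam * ||P||_{2,1} + 0.5 * ||P - Q||_F^2 (columns as groups)."""
    norms = np.linalg.norm(Q, axis=0)                        # ||Q_{:,j}||_2 per column
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return Q * scale                                         # shrink each column toward zero
```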
(4) Update $Z$: By fixing the other variables, the optimization formula for $Z$ becomes
We first introduce the soft-thresholding (shrinkage) operator
$$\mathcal{S}_{\tau}[x] = \operatorname{sign}(x)\max(|x| - \tau,\ 0),$$
where $\operatorname{sign}(\cdot)$ represents the sign function. By applying the shrinkage operator element-wise to the singular values of the associated matrix, the optimal update for $Z$ can be expressed as follows:
in which the three factors are given by the SVD of the corresponding matrix.
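Applying the shrinkage operator to the singular values yields the familiar singular value thresholding (SVT) step; a minimal implementation (with $M$ as our generic name for the fixed-variable combination in the $Z$-subproblem):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*.

    Shrinks each singular value of M by tau and reassembles the matrix."""
    P, s, Qt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # S_tau applied to the singular values
    return (P * s_shrunk) @ Qt
```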
(5) Update $S$: By fixing the other variables, the optimization formula for $S$ can be derived as
where $d_{ij}$ denotes the distance between the corresponding low-rank representations. Considering the constraints on each row of $S$, we take the derivative of the Lagrangian with respect to the row $s_i$ and set it to zero; the $j$-th entry of $s_i$ then follows from the KKT conditions.
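Assuming, as is common for such similarity updates, that each row of $S$ minimizes a quadratic over the probability simplex (nonnegative entries summing to one), the KKT conditions reduce to a thresholded closed form, i.e. a Euclidean projection onto the simplex; a sketch under that assumption:

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}.

    This is the form the row-wise KKT conditions reduce to under the
    simplex assumption stated above (Duchi et al.'s sorting method)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # optimal Lagrange multiplier
    return np.maximum(v - theta, 0.0)

# each row s_i of S would then be the projection of its (negated, scaled) distances
```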
(6) Update $H$: By fixing the other variables, the optimization formula for $H$ becomes
Applying the soft-thresholding operator $\mathcal{S}_{\tau}[\cdot]$ introduced above element-wise to the singular values of the associated matrix, the optimal update of $H$ is given by
in which the three factors are given by the SVD of the corresponding matrix.
(7) Update $\alpha$: By fixing the other variables, the optimization formula for the dual variables $\alpha$ becomes
Then, we can obtain the optimal solution of Equation (37) through a general quadratic programming solver or the SMO algorithm.
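In practice this dual subproblem is a box-constrained QP; below is a hedged generic sketch using SciPy (one of several viable solvers; `P` and `q` are placeholders for the actual matrices arising from Equation (37)):

```python
import numpy as np
from scipy.optimize import minimize

def solve_box_qp(P, q, C):
    """Minimize 0.5 * a^T P a - q^T a subject to 0 <= a <= C.

    P, q stand in for the matrices of Equation (37); SMO or a dedicated
    QP solver could be used instead."""
    n = q.shape[0]
    obj = lambda a: 0.5 * a @ P @ a - q @ a
    grad = lambda a: P @ a - q
    res = minimize(obj, np.zeros(n), jac=grad, method="L-BFGS-B",
                   bounds=[(0.0, C)] * n)
    return res.x
```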
(8) Update $\mu$ and the Lagrangian multipliers: Finally, we need to update the ALM parameters, where the parameter $\rho$ is the learning rate that controls the convergence speed. The flowchart of the proposed approach is presented in Algorithm 1.
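The multiplier and penalty updates follow the standard ALM recipe, i.e. residual-weighted multiplier ascent with a geometrically increasing, capped penalty; schematically (variable names here, such as `Lambda1`, are illustrative placeholders):

```python
def alm_step(Lambda1, Lambda2, mu, residual1, residual2, rho=1.1, mu_max=1e6):
    """Standard ALM parameter update. Each residual is the gap between a
    target variable and its auxiliary copy (e.g., the Z and H splittings)."""
    Lambda1 = Lambda1 + mu * residual1   # multiplier ascent on first constraint
    Lambda2 = Lambda2 + mu * residual2   # multiplier ascent on second constraint
    mu = min(rho * mu, mu_max)           # increase penalty, capped at mu_max
    return Lambda1, Lambda2, mu
```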
Algorithm 1 The proposed approach.
1: Input: Training set, hyper-parameters
2: Initialize: all variables and the ALM parameters
3: while not converged do
4:  Fix others and update $U$ by Equation (12)
5:  Fix others and update $V$ by Equation (16)
6:  Fix others and update $E$ by Equation (21)
7:  Fix others and update $Z$ by Equation (24)
8:  Fix others and update $S$ by Equation (32)
9:  Fix others and update $H$ by Equation (35)
10: Fix others and update $\alpha$ by solving Equation (37)
11: end while
12: Output: The correlation matrices $U$, $V$, and $\alpha$.