Article

Kernel-Free Quadratic Surface Regression for Multi-Class Classification

Changlin Wang, Zhixia Yang, Junyou Ye and Xue Yang
1 College of Mathematics and Systems Science, Xinjiang University, Urumqi 830046, China
2 Institute of Mathematics and Physics, Xinjiang University, Urumqi 830046, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(7), 1103; https://doi.org/10.3390/e25071103
Submission received: 22 May 2023 / Revised: 14 July 2023 / Accepted: 14 July 2023 / Published: 24 July 2023
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

For multi-class classification problems, a new kernel-free nonlinear classifier is presented, called the hard quadratic surface least squares regression (HQSLSR). It combines the benefits of the least squares loss function and the quadratic kernel-free trick. The optimization problem of HQSLSR is convex and unconstrained, making it easy to solve. Further, to improve the generalization ability of HQSLSR, a softened version (SQSLSR) is proposed by introducing an ε-dragging technique, which can enlarge the between-class distance. The optimization problem of SQSLSR is solved by designing an alternating iteration algorithm. The convergence, interpretability and computational complexity of our methods are addressed in a theoretical analysis. The visualization results on five artificial datasets demonstrate the geometric diversity of the obtained regression functions for each category and the advantage of the ε-dragging technique. Furthermore, experimental results on benchmark datasets show that our methods perform comparably to some state-of-the-art classifiers.

1. Introduction

Consider a training set
$$T_1 = \{(x_i, y_i)\}_{i=1}^{n}, \qquad (1)$$
comprising $n$ samples, each represented by a $d$-dimensional vector $x_i \in \mathbb{R}^d$ together with a label $y_i \in \{1, 2, \ldots, K\}$ indicating which of the $K$ classes the sample belongs to.
For multi-class classification, one popular strategy is to encode each label using one-hot encoding. The original training set $T_1$ (1) is thus transformed into a new training set
$$T_2 = \{(x_i, y_i)\}_{i=1}^{n}, \qquad (2)$$
where each sample now corresponds to a label vector $y_i = \mathrm{one\text{-}hot}(y_i)$ (Definition 3). Our goal is to find $K$ functions $f_k(x)$, $k = 1, 2, \ldots, K$, that satisfy $f(x_i) \approx y_i$, where $f(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_K(x_i))^T$ for $i = 1, 2, \ldots, n$. Once these $K$ functions are determined, a new sample $x$ can be classified by the decision rule
$$g(x) = \arg\max_{k = 1, 2, \ldots, K} f_k(x). \qquad (3)$$
In recent years, numerous studies have focused on the multi-class classification problem. In 1994, Hastie et al. [1,2] proposed the original least squares regression classifier (LSR) based on label vectors. This method assigns an input sample to the class whose label vector is closest to the predicted vector. To improve the accuracy of LSR, Xiang et al. [3] introduced the ε-dragging technique to expand the margin between different classes, creating the discriminative LSR (DLSR). Zhang et al. [4] proposed a retargeted LSR (ReLSR), which learns soft labels with large-margin constraints directly from the training data. Wen et al. [5] proposed an inter-class sparsity DLSR (ICS_DLSR) by introducing inter-class sparsity constraints. Wang et al. [6] proposed a relaxed group low-rank regression model (RGLRR) that incorporates sparsity consistency and graph embedding into the group low-rank regression model. Recently, scholars have proposed several methods to improve the classification accuracy of DLSR, including the margin scalable DLSR (MSDLSR) [7], the robust DLSR (RODLSR) [8], regularized label relaxation linear regression (RLRLR) [9], low-rank DLSR (LRDLSR) [10], and discriminative least squares regression based on within-class scatter minimization (WCSDLSR) [11]. To improve the classification accuracy of ReLSR, Zhang et al. [12] introduced the intra-class compactness graph into ReLSR, proposing the discriminative marginalized LSR (DMLSR). Additionally, LSR has been extended to feature selection by Zhang et al. [13] and Zhao et al. [14]. All of the above methods are linear classification models, which require little computation time but have difficulty handling nonlinearly separable data. The kernel ridge regression classifier (KRR) was proposed to address this limitation by means of the kernel trick [15,16]. However, it is challenging to select an appropriate kernel function and kernel parameters.
In 2008, the quadratic surface SVM (QSSVM) [17] was proposed to address the issue of excessive kernel-parameter selection in SVM [18] by utilizing a kernel-free technique. Later, Luo et al. [19] introduced the soft margin quadratic surface SVM (SQSSVM). Subsequently, further kernel-free studies have been conducted on classification problems [20,21,22,23], regression problems [24], clustering problems [25], and applications [26,27,28,29].
In this paper, we propose two nonlinear classification models, the hard quadratic surface least squares regression (HQSLSR) and its softened version, the soft quadratic surface least squares regression (SQSLSR). The main contributions of this work are summarized as follows:
(1) We propose a novel nonlinear model (HQSLSR) that utilizes a kernel-free trick, which avoids the difficulty of selecting appropriate kernel functions and corresponding parameters while maintaining good interpretability. Moreover, a softened version (SQSLSR) is developed, which employs the ε-dragging technique to enlarge inter-class distances and thus further improve its discriminative ability.
(2) The proposed HQSLSR yields an unconstrained convex optimization problem, which can be solved directly. An alternating iteration algorithm is designed for SQSLSR, which involves only convex subproblems and converges quickly. Additionally, the computational complexity and interpretability of our methods are discussed.
(3) In numerical experiments, the geometric intuition behind our methods and the advantage of the ε-dragging technique are demonstrated on artificial datasets. The experimental results on benchmark datasets show that our methods achieve accuracy comparable to other nonlinear classifiers while requiring less computation time.
This paper is organized as follows. Section 2 briefly describes related work. Section 3 presents the proposed HQSLSR and SQSLSR models and their respective algorithms. Section 4 discusses relevant characteristics. Section 5 presents experimental results, and finally, we conclude in Section 6.

2. Related Works

In this section, following the presentation of notations, we provide a concise introduction to two fundamental approaches: least squares regression classifiers (LSR) [1] and discriminative least squares regression classifiers (DLSR) [3].

2.1. Notations

We begin by presenting the notations employed in this paper. Lowercase boldface and uppercase boldface fonts represent vectors and matrices, respectively. The vector $(1, 1, \ldots, 1)^T \in \mathbb{R}^n$ is denoted by $1_n$. The zero vector and the null matrix are denoted by $0$ and $O$, respectively. For a matrix $W = (w_{ij})_{d \times K}$, its $i$-th column is denoted by $w_i$. In addition, we give the following three definitions.
Definition 1.
For any real symmetric matrix $A = (a_{ij})_{d \times d} \in S^d$, its half-vectorization operator is defined as
$$\mathrm{hvec}(A) = (a_{11}, a_{12}, \ldots, a_{1d}, a_{22}, \ldots, a_{2d}, \ldots, a_{dd})^T \in \mathbb{R}^{\frac{d^2+d}{2}}.$$
Definition 2.
For any vector $x = (x_1, x_2, \ldots, x_d)^T$, its quadratic vector with cross terms is defined as
$$\mathrm{lvec}(x) = \left(\tfrac{1}{2}x_1^2, x_1 x_2, \ldots, x_1 x_d, \tfrac{1}{2}x_2^2, x_2 x_3, \ldots, \tfrac{1}{2}x_d^2\right)^T \in \mathbb{R}^{\frac{d^2+d}{2}}.$$
Definition 3.
For any given positive integer $k \in \{1, 2, \ldots, K\}$, the one-hot encoding operator is defined as
$$\mathrm{one\text{-}hot}(k) = e_k,$$
where $e_k$ is the $K$-dimensional unit vector whose $k$-th element is 1.
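The three operators above are straightforward to implement. The sketch below is our own illustration (not code from the paper): it builds hvec, lvec and one-hot with NumPy in the upper-triangular ordering of Definitions 1 and 2, and checks the identity $\tfrac{1}{2}x^T A x = \mathrm{hvec}(A)^T \mathrm{lvec}(x)$ that is used later in Equation (12).

```python
import numpy as np

def hvec(A):
    """Half-vectorization (Definition 1): upper-triangular entries of a symmetric A, row by row."""
    iu = np.triu_indices(A.shape[0])
    return A[iu]                        # length d(d+1)/2

def lvec(x):
    """Quadratic vector with cross terms (Definition 2): 1/2 * x_i^2 at the diagonal positions,
    x_i * x_j (i < j) elsewhere, in the same upper-triangular order as hvec."""
    x = np.asarray(x, dtype=float)
    M = np.outer(x, x)
    M[np.diag_indices(x.size)] *= 0.5
    return M[np.triu_indices(x.size)]

def one_hot(k, K):
    """Definition 3: K-dimensional unit vector e_k."""
    e = np.zeros(K)
    e[k - 1] = 1.0
    return e

# sanity check of the identity used in Equation (12): 0.5 * x^T A x == hvec(A) @ lvec(x)
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); A = (A + A.T) / 2      # random symmetric matrix
x = rng.normal(size=4)
assert np.isclose(0.5 * x @ A @ x, hvec(A) @ lvec(x))
```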

2.2. Least Squares Regression Classifier

Given the training set $T_2$ (2), the goal of LSR is to find the following $K$ linear functions:
$$f_k(x) = w_k^T x + c_k, \quad k = 1, 2, \ldots, K, \qquad (4)$$
where $w_k \in \mathbb{R}^d$ and $c_k \in \mathbb{R}$, $k = 1, 2, \ldots, K$.
To obtain the $K$ linear functions (4), the following optimization problem is formulated:
$$\min_{W, c} \; \|X^T W + 1_n c^T - Y\|_F^2 + \lambda \|W\|_F^2, \qquad (5)$$
where the sample matrix $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{d \times n}$ is formed by all the samples in the training set $T_2$ (2), the label matrix $Y = (y_1, y_2, \ldots, y_n)^T \in \mathbb{R}^{n \times K}$ is formed by the label vectors in $T_2$ (2), and $W = (w_1, w_2, \ldots, w_K) \in \mathbb{R}^{d \times K}$ and $c = (c_1, c_2, \ldots, c_K)^T \in \mathbb{R}^K$ are formed by the normal vectors and biases of the $K$ linear functions (4), respectively.
Clearly, the optimization problem (5) is a convex optimization problem, and its solution has the following form:
$$W = (XHX^T + \lambda I)^{-1} XHY, \quad c = \frac{1}{n}\left(Y^T 1_n - W^T X 1_n\right), \qquad (6)$$
where $H = I - \frac{1}{n} 1_n 1_n^T$. Thus, once the solution $W, c$ of the optimization problem (5) is obtained, the $K$ linear functions are determined.
For a new sample $x \in \mathbb{R}^d$, its class is given by the following decision function:
$$g(x) = \arg\max_{k = 1, 2, \ldots, K} \; \left(w_k^T x + c_k\right).$$
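As an illustration of the closed-form solution (6) and the decision function above, here is a short NumPy sketch of LSR training and prediction. It is our own minimal implementation of the formulas under our own function names, not the authors' code.

```python
import numpy as np

def lsr_fit(X, Y, lam=1.0):
    """Closed-form LSR solution (6); X is d x n (columns are samples), Y is n x K one-hot labels."""
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix H = I - (1/n) 1_n 1_n^T
    W = np.linalg.solve(X @ H @ X.T + lam * np.eye(d), X @ H @ Y)
    c = (Y.T @ np.ones(n) - W.T @ X @ np.ones(n)) / n
    return W, c

def lsr_predict(X_new, W, c):
    """Decision function: arg max_k (w_k^T x + c_k); X_new is d x m."""
    scores = X_new.T @ W + c                         # m x K
    return np.argmax(scores, axis=1) + 1

# a minimal run on random data (3 classes, 5 features)
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 60))
labels = rng.integers(1, 4, size=60)
Y = np.eye(3)[labels - 1]
W, c = lsr_fit(X, Y, lam=0.5)
print(lsr_predict(X[:, :5], W, c))
```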

2.3. Discriminative Least Squares Regression Classifier

Xiang et al. [3] proposed the discriminative least squares regression classifier (DLSR) to improve the classification performance of LSR.
For the training set $T_2$ (2), we define the constant matrix $B = (B_{ik})_{n \times K}$ as follows:
$$B_{ik} = \begin{cases} +1, & \text{if } y_{ik} = 1, \\ -1, & \text{otherwise}, \end{cases} \qquad (7)$$
where $y_{ik}$ denotes the $k$-th component of the label vector $y_i$ of the $i$-th sample. The optimization problem of DLSR is formulated as follows:
$$\min_{W, c, E} \; \|X^T W + 1_n c^T - Y - B \odot E\|_F^2 + \lambda \|W\|_F^2, \quad \text{s.t. } E \ge O, \qquad (8)$$
where $\odot$ is the Hadamard product of matrices, $E = (\varepsilon_{ik})_{n \times K}$ is the $\varepsilon$-dragging matrix to be found, and each of its non-negative elements $\varepsilon_{ik}$ is called an $\varepsilon$-dragging factor.
Compared with LSR, DLSR explicitly takes the inter-class distance into account: by introducing the ε-dragging technique, it pushes the regression targets of different classes in opposite directions, thereby increasing the inter-class distances.

3. Kernel-Free Nonlinear Least Squares Regression Classifiers

For multi-class classification problems with the training set $T_2$ (2), we propose the hard quadratic surface least squares regression classifier (HQSLSR) and its softened version (SQSLSR). The relevant properties of our methods are also analyzed theoretically.

3.1. Hard Quadratic Surface Least Squares Regression Classifier

For the training set $T_2$ (2), we aim to find $K$ quadratic functions of the form
$$f_k(x) = \tfrac{1}{2} x^T A_k x + b_k^T x + c_k, \quad k = 1, 2, \ldots, K, \qquad (9)$$
where $A_k \in S^d$, $b_k \in \mathbb{R}^d$, $c_k \in \mathbb{R}$. Once these $K$ quadratic functions are found, the label of a new sample $x$ is determined by the following decision rule:
$$g(x) = \arg\max_{k = 1, 2, \ldots, K} \; \left(\tfrac{1}{2} x^T A_k x + b_k^T x + c_k\right). \qquad (10)$$
To find the $K$ quadratic functions (9), we construct the following optimization problem:
$$\min_{A_k, b_k, c_k} \; \sum_{i=1}^{n} \sum_{k=1}^{K} \left(\tfrac{1}{2} x_i^T A_k x_i + b_k^T x_i + c_k - y_{ik}\right)^2 + \lambda \sum_{k=1}^{K} \left(\|\mathrm{hvec}(A_k)\|_2^2 + \|b_k\|_2^2\right), \qquad (11)$$
where $\lambda$ is the regularization parameter, $\mathrm{hvec}(A_k)$ is the vector of Definition 1 constituted by the upper-triangular elements of the symmetric matrix $A_k$, and $y_{ik}$ denotes the $k$-th component of the label vector $y_i$ of the $i$-th sample. In the objective function (11), the first term minimizes the sum of squared errors between the real and predicted labels; the second term is a regularization term on the model coefficients, which aims to enhance the generalization ability of the model. It is worth noting that, owing to the symmetry of $A_k$, only the upper-triangular elements of the matrix $A_k$, rather than all of its elements, enter the regularization term.
For convenience, using the symmetry of the matrix $A_k$ and Definitions 1 and 2, the first term of the objective function in the optimization problem (11) is simplified as follows:
$$\sum_{i=1}^{n} \sum_{k=1}^{K} \left(\tfrac{1}{2} x_i^T A_k x_i + b_k^T x_i + c_k - y_{ik}\right)^2 = \sum_{i=1}^{n} \sum_{k=1}^{K} \left(\mathrm{hvec}(A_k)^T \mathrm{lvec}(x_i) + b_k^T x_i + c_k - y_{ik}\right)^2 = \sum_{i=1}^{n} \sum_{k=1}^{K} \left(w_k^T z_i + c_k - y_{ik}\right)^2, \qquad (12)$$
where
$$w_k = \left(\mathrm{hvec}(A_k)^T, b_k^T\right)^T, \quad k = 1, \ldots, K, \qquad (13)$$
$$z_i = \left(\mathrm{lvec}(x_i)^T, x_i^T\right)^T, \quad i = 1, \ldots, n. \qquad (14)$$
By Equation (13), minimizing $\sum_{k=1}^{K} \left(\|\mathrm{hvec}(A_k)\|_2^2 + \|b_k\|_2^2\right)$ is equivalent to minimizing $\sum_{k=1}^{K} \|w_k\|_2^2$. Combining this with Equation (12), the optimization problem (11) can be reformulated as
$$\min_{W, c} \; J_1(W, c) = \|Z^T W + 1_n c^T - Y\|_F^2 + \lambda \|W\|_F^2, \qquad (15)$$
where $Z = (z_1, z_2, \ldots, z_n) \in \mathbb{R}^{\frac{d^2+3d}{2} \times n}$, $W = (w_1, w_2, \ldots, w_K) \in \mathbb{R}^{\frac{d^2+3d}{2} \times K}$, and $c = (c_1, c_2, \ldots, c_K)^T \in \mathbb{R}^K$.
Next, the solution of the optimization problem (15) is given by the following theorem.
Theorem 1.
The optimal solution of the optimization problem (15) is given by
$$W = (ZHZ^T + \lambda I)^{-1} ZHY, \qquad (16)$$
$$c = \frac{1}{n}\left(Y^T 1_n - W^T Z 1_n\right), \qquad (17)$$
where $H = I - \frac{1}{n} 1_n 1_n^T$.
Proof. 
Obviously, (15) is a convex optimization problem. According to the optimality conditions of an unconstrained optimization problem, we have
$$\nabla_c J_1(W, c) = W^T Z 1_n + c\, 1_n^T 1_n - Y^T 1_n = 0, \qquad (18)$$
$$\nabla_W J_1(W, c) = ZZ^T W + Z 1_n c^T - ZY + \lambda W = O. \qquad (19)$$
From Equation (18), we obtain
$$c = \frac{1}{n}\left(Y^T 1_n - W^T Z 1_n\right). \qquad (20)$$
Substituting Equation (20) into Equation (19), we have
$$W = (ZHZ^T + \lambda I)^{-1} ZHY, \qquad (21)$$
where $H = I - \frac{1}{n} 1_n 1_n^T$.    □
After solving the optimization problem (15) by Theorem 1, $w_k$ and $c_k$ are obtained as the $k$-th column of the matrix $W$ and the $k$-th component of the vector $c$, respectively. Then $A_k$ and $b_k$ can be recovered from Equation (13). Therefore, the decision function in Equation (10) can be established.
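Putting the pieces together, the following NumPy sketch illustrates HQSLSR: it lifts each sample to $z_i$ by Equation (14), solves (16)-(17), and applies the decision rule (10) in the lifted space. It is a minimal illustration under our own naming (lvec, hqslsr_fit, hqslsr_predict), not the authors' implementation; recovering $A_k$ and $b_k$ from $w_k$ via (13) is omitted, since prediction only needs $W$ and $c$.

```python
import numpy as np

def lvec(x):
    """Definition 2: quadratic features with 1/2 on the squares, in upper-triangular order."""
    M = np.outer(x, x)
    M[np.diag_indices(len(x))] *= 0.5
    return M[np.triu_indices(len(x))]

def hqslsr_fit(X, Y, lam=1.0):
    """HQSLSR via Theorem 1: z_i = (lvec(x_i)^T, x_i^T)^T, then W, c from (16)-(17).
    X is d x n, Y is n x K one-hot; returns W (l x K) and c (K,), with l = (d^2 + 3d)/2."""
    d, n = X.shape
    Z = np.column_stack([np.concatenate([lvec(X[:, i]), X[:, i]]) for i in range(n)])  # l x n
    l = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    W = np.linalg.solve(Z @ H @ Z.T + lam * np.eye(l), Z @ H @ Y)    # Equation (16)
    c = (Y.T @ np.ones(n) - W.T @ Z @ np.ones(n)) / n                # Equation (17)
    return W, c

def hqslsr_predict(X_new, W, c):
    """Decision rule (10), expressed through the lifted features z."""
    Z_new = np.column_stack([np.concatenate([lvec(X_new[:, i]), X_new[:, i]])
                             for i in range(X_new.shape[1])])
    return np.argmax(Z_new.T @ W + c, axis=1) + 1

# two concentric-circle classes: linearly inseparable, but separable by a quadratic surface
rng = np.random.default_rng(2)
r = np.r_[np.full(50, 1.0), np.full(50, 3.0)] + 0.1 * rng.normal(size=100)
t = rng.uniform(0, 2 * np.pi, 100)
X = np.vstack([r * np.cos(t), r * np.sin(t)])
Y = np.eye(2)[np.r_[np.zeros(50, int), np.ones(50, int)]]
W, c = hqslsr_fit(X, Y, lam=0.1)
pred = hqslsr_predict(X, W, c)
print("training accuracy:", np.mean(pred == np.r_[np.ones(50, int), 2 * np.ones(50, int)]))
```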

3.2. Soft Quadratic Surface Least Squares Regression Classifier

In this subsection, we propose SQSLSR by introducing the $\varepsilon$-dragging factors into HQSLSR. For the training set $T_2$ (2), the following optimization problem is constructed:
$$\min \; \sum_{i=1}^{n} \sum_{k=1}^{K} \left(\tfrac{1}{2} x_i^T A_k x_i + b_k^T x_i + c_k - (y_{ik} + B_{ik}\varepsilon_{ik})\right)^2 + \lambda \sum_{k=1}^{K} \left(\|\mathrm{hvec}(A_k)\|_2^2 + \|b_k\|_2^2\right), \quad \text{s.t. } \varepsilon_{ik} \ge 0, \; i = 1, 2, \ldots, n, \; k = 1, 2, \ldots, K, \qquad (22)$$
where $A_k$, $b_k$, $c_k$ and $\varepsilon_{ik}$, $i = 1, 2, \ldots, n$, $k = 1, 2, \ldots, K$, are the variables to be found, $\varepsilon_{ik} \ge 0$ is the $\varepsilon$-dragging factor, and the constant $B_{ik}$ is defined in Equation (7). The distances between the label vectors of different classes are expanded by the $\varepsilon$-dragging factors. Therefore, compared with the HQSLSR model, the SQSLSR model distinguishes samples from different classes more easily.
For simplicity, defining the $\varepsilon$-dragging matrix $E$ as in Section 2.3 and applying the same transformation as for the optimization problem (11), the optimization problem (22) is equivalently expressed as follows:
$$\min_{W, c, E} \; J_2(W, c, E) = \|Z^T W + 1_n c^T - (Y + B \odot E)\|_F^2 + \lambda \|W\|_F^2, \quad \text{s.t. } E \ge O, \qquad (23)$$
where $E \ge O$ means that all elements of the matrix $E$ are non-negative. To solve the optimization problem (23), we use an alternating iteration method.
First, update $W$ and $c$. Fixing the dragging matrix $E$ and letting $\tilde{Y} = Y + B \odot E$, the optimization problem (23) simplifies to
$$\min_{W, c} \; \|Z^T W + 1_n c^T - \tilde{Y}\|_F^2 + \lambda \|W\|_F^2. \qquad (24)$$
Similar to the solution of the optimization problem (15), the update equations for (24) with respect to $W$ and $c$ are
$$W = (ZHZ^T + \lambda I)^{-1} ZH\tilde{Y}, \qquad (25)$$
$$c = \frac{1}{n}\left(\tilde{Y}^T 1_n - W^T Z 1_n\right), \qquad (26)$$
where $H = I - \frac{1}{n} 1_n 1_n^T$.
Then, update the dragging matrix $E$. Fixing $W$ and $c$ and letting the residual matrix $R = Z^T W + 1_n c^T - Y$, the optimization problem (23) is transformed into
$$\min_{E} \; \|R - B \odot E\|_F^2, \quad \text{s.t. } E \ge O. \qquad (27)$$
The solution to the optimization problem (27) is given by
$$E = \max(B \odot R, O). \qquad (28)$$
Specifically, by the definition of the Frobenius norm, solving the optimization problem (27) is equivalent to solving the following $n \times K$ subproblems:
$$\min_{\varepsilon_{ik}} \; (R_{ik} - B_{ik}\varepsilon_{ik})^2, \quad \text{s.t. } \varepsilon_{ik} \ge 0, \; i = 1, 2, \ldots, n, \; k = 1, 2, \ldots, K, \qquad (29)$$
where $R_{ik}$ is the element in the $i$-th row and $k$-th column of the matrix $R$. Since $B_{ik}^2 = 1$, we have $(R_{ik} - B_{ik}\varepsilon_{ik})^2 = (B_{ik}R_{ik} - \varepsilon_{ik})^2$. Hence the solution to the optimization problem (29) is $\varepsilon_{ik} = \max(B_{ik}R_{ik}, 0)$, and Equation (28) is the solution to the optimization problem (27).
Based on the above derivation, the procedure for solving the optimization problem (23) is summarized in Algorithm 1.
After obtaining $A_k$, $b_k$, $c_k$, $k = 1, 2, \ldots, K$, from Algorithm 1, the corresponding decision function (10) can be constructed.
Algorithm 1 SQSLSR
Input: Training set $T_2 = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d, y_i \in \mathbb{R}^K\}$, maximum iteration number $T = 20$, parameter $\lambda$
1: Define the matrices $E$, $W$, $W_0$ and the vectors $c$, $c_0$
2: Initialize $E = O$, $W_0 = O$, $c_0 = 0$
3: Compute $z_i$, $i = 1, 2, \ldots, n$, by (14)
4: Construct the matrices $Z = (z_1, z_2, \ldots, z_n)$ and $Y = (y_1, y_2, \ldots, y_n)^T$
5: Calculate $H = I - \frac{1}{n} 1_n 1_n^T$ and $V = (ZHZ^T + \lambda I)^{-1} ZH$
6: for $t = 1:T$ do
7:     $\tilde{Y} = Y + B \odot E$
8:     Calculate $W = V\tilde{Y}$
9:     Calculate $c$ by (26)
10:    Calculate $E$ by (28)
11:    if $\|W - W_0\|_F^2 + \|c - c_0\|_2^2 \le 10^{-3}$ then
12:        stop
13:    end if
14:    $W_0 = W$, $c_0 = c$
15: end for
16: Calculate $A_k$, $b_k$ and $c_k$ by the inverse operation of $w_k = (\mathrm{hvec}(A_k)^T, b_k^T)^T$, where $k = 1, 2, \ldots, K$, $W_0 = (w_1, w_2, \ldots, w_K)$, and $c_0 = (c_1, c_2, \ldots, c_K)^T$
Output: $A_k$, $b_k$, $c_k$, $k = 1, 2, \ldots, K$.
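For concreteness, the sketch below mirrors Algorithm 1 in NumPy. It assumes the lifted sample matrix Z has already been built via Equation (14) (see the HQSLSR sketch in Section 3.1) and stops at W and c; recovering $A_k$, $b_k$, $c_k$ by inverting (13) is omitted. The names and details are ours, not the authors' code.

```python
import numpy as np

def sqslsr_fit(Z, Y, lam=1.0, T=20, tol=1e-3):
    """A sketch of Algorithm 1. Z is l x n (lifted samples z_i from Equation (14)),
    Y is n x K one-hot. Returns W (l x K) and c (K,)."""
    l, n = Z.shape
    K = Y.shape[1]
    B = np.where(Y == 1, 1.0, -1.0)                  # constant matrix B from Equation (7)
    E = np.zeros((n, K))                             # epsilon-dragging matrix, initialised to O
    W0, c0 = np.zeros((l, K)), np.zeros(K)
    H = np.eye(n) - np.ones((n, n)) / n
    V = np.linalg.solve(Z @ H @ Z.T + lam * np.eye(l), Z @ H)   # V = (ZHZ^T + lam I)^{-1} ZH
    for _ in range(T):
        Y_tilde = Y + B * E                                          # step 7
        W = V @ Y_tilde                                              # step 8, Equation (25)
        c = (Y_tilde.T @ np.ones(n) - W.T @ Z @ np.ones(n)) / n      # step 9, Equation (26)
        R = Z.T @ W + np.outer(np.ones(n), c) - Y                    # residual for the E-update
        E = np.maximum(B * R, 0.0)                                   # step 10, Equation (28)
        if np.sum((W - W0) ** 2) + np.sum((c - c0) ** 2) <= tol:     # step 11
            break
        W0, c0 = W, c
    return W, c

# illustration on random lifted features (5 lifted dimensions, 3 classes)
rng = np.random.default_rng(3)
Z = rng.normal(size=(5, 40))
Y = np.eye(3)[rng.integers(0, 3, size=40)]
W, c = sqslsr_fit(Z, Y, lam=0.5)
print(W.shape, c.shape)   # (5, 3) (3,)
```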

4. Discussion

In this section, we first discuss the convergence of Algorithm 1. Then, we discuss the computational complexities of HQSLSR and SQSLSR, respectively. Lastly, we analyze their interpretability.

4.1. Convergence Analysis

Since Algorithm 1 adopts an iterative method to solve the optimization problem (23), its convergence is discussed in this subsection.
Theorem 2.
If the sequence of iterates $\{W^t, c^t, E^t\}$ is generated by Algorithm 1, then the objective function $J_2(W^t, c^t, E^t)$ of the optimization problem (23) is monotonically decreasing.
Proof. 
Let $t$ denote the index of the current iteration, and denote the corresponding value of the objective function of the optimization problem (23) by $J_2(W^t, c^t, E^t)$.
By the strong convexity of the subproblem, for fixed $E^t$, the iterates $W^{t+1}$ and $c^{t+1}$ obtained from Equations (25) and (26), respectively, satisfy
$$J_2(W^{t+1}, c^{t+1}, E^t) \le J_2(W^t, c^t, E^t). \qquad (30)$$
Then, fixing $W^{t+1}$ and $c^{t+1}$, the iterate $E^{t+1}$ obtained from Equation (28) satisfies
$$J_2(W^{t+1}, c^{t+1}, E^{t+1}) \le J_2(W^{t+1}, c^{t+1}, E^t). \qquad (31)$$
Combining the inequalities (30) and (31), we have
$$J_2(W^{t+1}, c^{t+1}, E^{t+1}) \le J_2(W^t, c^t, E^t), \qquad (32)$$
which completes the proof. □

4.2. Computational Complexity

In this subsection, we provide a detailed analysis of the computational complexities of our methods. Here, $n$, $d$, and $K$ represent the numbers of samples, features, and classes, respectively. From Definitions 1 and 2 and Equation (12), it can be observed that our methods transform the feature dimension of each sample from a $d$-dimensional space to an $l = \frac{d^2+3d}{2}$-dimensional space. For simplicity, we ignore the computational cost of addition and subtraction.
The HQSLSR classifier is solved by Equations (16) and (17), which involve matrix inversion and multiplication. Therefore, the computational complexity of the HQSLSR classifier is about $O(l^3 + nl^2 + (n^2 + nK)l)$.
According to Algorithm 1, we briefly analyze the computational complexity of SQSLSR, which is mainly concentrated in steps 5, 8, 9, and 10. Step 5 involves matrix inversion and multiplication, and its computational complexity is $O(l^3 + nl^2 + n^2 l)$. Steps 8, 9, and 10 involve only matrix multiplication, so the computational complexity of each iteration is about $O(nKl + nK)$. In summary, the total computational complexity of SQSLSR is about $O(l^3 + nl^2 + n^2 l + t(nKl + nK))$, where $t$ is the number of iterations.

4.3. Interpretability

Although HQSLSR and SQSLSR are kernel-free, they can achieve the goal of nonlinear separation and retain interpretability. Therefore, we further elaborate on their interpretability.
Note that the decision functions of our methods are constructed from separating quadratic functions of the form
$$h(x) = \tfrac{1}{2} x^T A x + b^T x + c = \tfrac{1}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} a_{ij} x_i x_j + \sum_{i=1}^{d} b_i x_i + c, \qquad (33)$$
where $x_i$ is the $i$-th feature of the vector $x \in \mathbb{R}^d$, $a_{ij}$ is the element in the $i$-th row and $j$-th column of the symmetric matrix $A \in S^d$, $b_i$ is the $i$-th component of the vector $b \in \mathbb{R}^d$, and $c \in \mathbb{R}$. From the quadratic function (33), we can see that the values of $b_i$, $a_{ii}$ and $a_{ij}$ ($i \ne j$) determine the contributions of the first-order term of the $i$-th feature $x_i$, its second-order term, and the cross term of $x_i$ and $x_j$, respectively. Roughly speaking, let $\theta_{i, h(x)} = |a_{ii}| + \sum_{j \ne i} |a_{ij}| + |b_i|$; the higher the value of $\theta_{i, h(x)}$, the more the $i$-th feature $x_i$ contributes to the quadratic function (33).
For the $K$ quadratic functions $f_k(x)$, $k = 1, \ldots, K$, in Equation (10), let $\theta_{i,k} = \theta_{i, f_k(x)}$ denote the contribution of the $i$-th feature to the $k$-th quadratic function $f_k(x)$, and let $\theta_i = \sum_{k=1}^{K} \theta_{i,k}$, $i = 1, \ldots, d$. The larger $\theta_i$ is, the more important the $i$-th feature is to the decision function (10). In particular, when $\theta_i = 0$, the $i$-th feature of $x$ plays no role. Therefore, our methods have a certain degree of interpretability.
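Following the reading of $\theta_i$ given above, a small sketch of the resulting feature score is shown below (our own illustration; note that the row sums of $|A_k|$ collect $|a_{ii}|$ together with all cross terms $|a_{ij}|$, $j \ne i$).

```python
import numpy as np

def feature_importance(A_list, b_list):
    """Feature scores theta_i described above: for each feature i, sum over the K quadratic
    functions of |a_ii| + sum_{j != i} |a_ij| + |b_i|. A_list and b_list hold A_k and b_k."""
    d = A_list[0].shape[0]
    theta = np.zeros(d)
    for A, b in zip(A_list, b_list):
        # row sums of |A| already contain |a_ii| plus all cross terms |a_ij|, j != i
        theta += np.abs(A).sum(axis=1) + np.abs(b)
    return theta

# toy example with two classes and three features; feature 3 is inert in both functions
A1 = np.array([[1.0, 0.5, 0.0], [0.5, 0.2, 0.0], [0.0, 0.0, 0.0]])
A2 = np.array([[0.3, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 0.0]])
b1, b2 = np.array([0.7, 0.0, 0.0]), np.array([0.0, 0.4, 0.0])
print(feature_importance([A1, A2], [b1, b2]))   # third entry is 0: feature 3 plays no role
```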

5. Numerical Experiments

In this section, we first run our SQSLSR and HQSLSR on five artificial datasets to show their geometric meaning and compare them with LSR and DLSR. We also run SQSLSR and HQSLSR on 16 UCI benchmark datasets and compare their accuracy with LSR, DLSR, LRDLSR, WCSDLSR, linear discriminant analysis (LDA), QSSVM, reg-LSDWPTSVM [22], SVM, and KRR. For convenience, SVMs with a linear kernel and an RBF kernel are denoted by SVM-L and SVM-R, respectively. KRRs with an RBF kernel and a polynomial kernel are denoted by KRR-R and KRR-P, respectively. Notably, on multi-class classification datasets, the SVM and QSSVM methods use the one-against-rest strategy [30]. We adopt five-fold cross-validation to select the parameters of these methods. The regularization parameters of SQSLSR and the other methods are selected from the set $\{2^{-8}, 2^{-7}, \ldots, 2^{8}\}$. The parameters of the RBF kernel and the polynomial kernel are selected from the set $\{2^{-6}, 2^{-4}, \ldots, 2^{6}\}$. All numerical experiments are executed using MATLAB R2020b on a computer with a 2.80 GHz (i7-1165G7) CPU and 16 GB of available memory.
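For reference, the parameter-selection protocol described above (five-fold cross-validation over the grid $\{2^{-8}, \ldots, 2^{8}\}$) can be sketched as follows. The sketch uses the LSR closed form from Section 2.2 as the model being tuned purely for brevity; it is our own illustration, not the experiment code.

```python
import numpy as np

def lsr_fit(X, Y, lam):
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n
    W = np.linalg.solve(X @ H @ X.T + lam * np.eye(d), X @ H @ Y)
    c = (Y.T @ np.ones(n) - W.T @ X @ np.ones(n)) / n
    return W, c

def cv_select_lambda(X, y, K, grid, folds=5, seed=0):
    """Pick the regularization parameter by five-fold cross-validation over the given grid."""
    n = X.shape[1]
    idx = np.random.default_rng(seed).permutation(n)
    parts = np.array_split(idx, folds)
    best_lam, best_acc = None, -1.0
    for lam in grid:
        accs = []
        for f in range(folds):
            te = parts[f]
            tr = np.concatenate([parts[g] for g in range(folds) if g != f])
            W, c = lsr_fit(X[:, tr], np.eye(K)[y[tr] - 1], lam)
            pred = np.argmax(X[:, te].T @ W + c, axis=1) + 1
            accs.append(np.mean(pred == y[te]))
        if np.mean(accs) > best_acc:
            best_lam, best_acc = lam, float(np.mean(accs))
    return best_lam, best_acc

grid = [2.0 ** p for p in range(-8, 9)]          # the set {2^-8, 2^-7, ..., 2^8}
rng = np.random.default_rng(4)
X = rng.normal(size=(4, 100)); y = rng.integers(1, 4, size=100)
print(cv_select_lambda(X, y, K=3, grid=grid))
```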

5.1. Experimental Results on Artificial Datasets

We construct five artificial datasets to demonstrate the geometric meaning of our methods and the advantage of the ε-dragging technique. Datasets I-IV are binary classification datasets, each containing 300 points with 150 points per class. Dataset V has three classes, with 20 points per class. Since the decision functions of our proposed HQSLSR and SQSLSR methods, as well as the comparison methods LSR and DLSR, are all composed of K regression functions, we present K pairs of regression curves $f_k(x) = 0$ and $f_k(x) = 1$, $k = 1, 2$, to display their classification results. Here, $f_k(x) = 1$ is the regression curve of the k-th class, and $f_k(x) = 0$ is the regression curve for samples outside class k, k = 1, 2.
The first-class samples, $f_1(x) = 1$ and $f_1(x) = 0$ are indicated by blue "+" marks, a blue solid line and a blue dotted line, respectively. The second-class samples, $f_2(x) = 1$ and $f_2(x) = 0$ are represented by red "∘" marks, a red solid line and a red dotted line, respectively. The accuracy of each method on each artificial dataset is shown in the top right corner of the corresponding subfigure.
The artificial dataset I is linearly separable. Figure 1 shows the results of the four methods LSR, DLSR, HQSLSR, and SQSLSR. It can be observed that $f_1(x) = 1$ and $f_2(x) = 0$ coincide, and $f_2(x) = 1$ and $f_1(x) = 0$ coincide as well. The samples of each class lie close to the corresponding regression curve and stay away from the regression curves of the other classes. In addition, all four methods correctly classify the samples of this linearly separable artificial dataset I.
As shown in Figure 2, the artificial dataset II includes some intersecting samples. Our methods outperform LSR and DLSR in terms of classification accuracy, because our HQSLSR and SQSLSR can obtain two pairs of regression curves, while LSR and DLSR can only obtain two pairs of straight regression lines. It is worth noting that the accuracy of SQSLSR is slightly higher than that of HQSLSR, because the SQSLSR uses the ε -dragging technique to relax the binary labels into continuous real values, which enlarges the distances between different classes and makes the discrimination better.
Figure 3 shows the visualization results of the artificial dataset III, which is sampled from two parabolas. Note that our HQSLSR and SQSLSR can obtain parabolic-type regression curves while LSR and DLSR can only obtain straight regression lines, so our methods are more suitable for this nonlinearly separable dataset.
The results of the artificial dataset IV are shown in Figure 4. The nonlinearly separable dataset IV is obtained by sampling from two concentric circles. Obviously, our HQSLSR and SQSLSR achieve higher accuracy on this classification task, as shown in Figure 4. However, from the first two subfigures, it is not difficult to find that the samples of both classes are far away from their respective regression curves, resulting in poor results for LSR and DLSR. Note that $f_1(x) = 0$ and $f_2(x) = 1$ coincide and lie at the center of the concentric circles, which makes them hard to observe. Thus we only display $f_1(x) = 0.1$ and $f_2(x) = 0.9$, as shown in the last two subfigures.
We conducted experiments on the artificial dataset V to investigate the influence of the ε-dragging technique. The dataset consists of 60 samples from three classes, with 20 samples from each class arranged in three groups: left, middle, and right. By solving the optimization problems of HQSLSR (15) and SQSLSR (23) on dataset V, we obtained the corresponding regression labels $\tilde{f}(x) = (\tilde{f}_1(x), \tilde{f}_2(x), \tilde{f}_3(x))^T$ and $f(x) = (f_1(x), f_2(x), f_3(x))^T$, where $\tilde{f}_k(x)$ and $f_k(x)$, $k = 1, 2, 3$, denote the three regression functions obtained by HQSLSR and SQSLSR, respectively. The difference caused by the ε-dragging technique is represented by $D = f(x) - \tilde{f}(x)$, which includes three components corresponding to the three classes. Figure 5 illustrates the relationship between the index of the training samples and the three components of the difference $D$.
According to the results presented in Figure 5b, the first component of the difference matrix D exhibits positive values for the first 20 samples, while negative values are observed for the last 40 samples. This observation suggests that the introduction of the ε -dragging technique has effectively increased the gap in the first component of the difference matrix D between the first class and the remaining classes. Additionally, Figure 5c,d demonstrate that the second and third components of the difference matrix D highlight the second and third classes of samples, respectively. Therefore, the ε -dragging technique has successfully enlarged the differences in regression labels among samples from different classes, thereby enhancing the robustness of the model.
Based on the experimental results presented above, it can be concluded that the regression curve $f_k(x) = 1$, $k = 1, 2, \ldots, K$, should be close to the samples of the $k$-th class while being distant from the samples of the other classes. The K pairs of regression curves can be arbitrary quadratic curves in the plane, which enables HQSLSR and its softened version (SQSLSR) to achieve higher accuracy. SQSLSR utilizes the ε-dragging technique to relax the labels, which forces the regression labels of different classes to move in opposite directions, thereby increasing the distances between classes. Consequently, SQSLSR exhibits better discriminative ability than HQSLSR.

5.2. Experimental Results on Benchmark Datasets

In order to validate the performances of our HQSLSR and SQSLSR, we compare them with linear methods LSR, DLSR, LDA, SVM-L, LRDLSR, WCSDLSR, and nonlinear methods QSSVM, SVM-R, KRR-R, KRR-P, and reg-LSDWPTSVM. These methods are implemented on 16 UCI benchmark datasets. Numerical results are obtained by repeating five-fold cross-validation five times, including average accuracy (Acc), standard deviation (Std), and computing time (Time). The best results are highlighted in boldface. Lastly, we also calculated the sensitivity and specificity of each method on six datasets to further evaluate their classification performances. Table 1 summarizes the basic information about the 16 UCI benchmark datasets, which are taken from the website https://archive.ics.uci.edu/ml/index.php (the above datasets accessed on 18 August 2021).
In Table 2, we show the experimental results of the above 13 methods on the 16 benchmark datasets. It is obvious from Table 2 that our HQSLSR and SQSLSR outperform the linear methods LSR, LDA, DLSR, LRDLSR, WCSDLSR, and SVM-L in terms of classification accuracy on almost all datasets. Moreover, the accuracies of our HQSLSR and SQSLSR are similar to those of the other nonlinear classification methods SVM-R, KRR-R, KRR-P, QSSVM, and reg-LSDWPTSVM. Note that our SQSLSR has the highest classification accuracy on most datasets. In addition, in terms of computation time, our methods not only cost less time than the compared nonlinear methods but also show a narrow gap with the fastest linear method, LSR. In general, our HQSLSR and SQSLSR can achieve higher accuracy without increasing the time cost too much, and the generalization ability of SQSLSR in particular is better.
To further evaluate the classification performances of these 13 methods, we report their specificity and sensitivity on six datasets in Table 3. As can be seen from Table 3, our HQSLSR and SQSLSR perform well in terms of specificity and sensitivity on most of these benchmark datasets.

5.3. Convergence Analysis

In this subsection, we experimentally validate the convergence of Algorithm 1. As shown in Figure 6, the value of the objective function monotonically decreases with the increasing number of iterations on six benchmark datasets. Moreover, our SQSLSR converges within five steps on most of the datasets, which indicates that Algorithm 1 converges quickly.

5.4. Statistical Analysis

In this subsection, we use the Friedman test [31] and the Nemenyi test [32] to further illustrate the differences between our two methods and the other methods.
First, we carry out the Friedman test, where the null hypothesis is that all methods have the same classification accuracy and computation time. We rank the 13 methods based on their accuracy and computation time on the 16 benchmark datasets and present the average rank $r_i$ ($i = 1, 2, \ldots, 13$) of each algorithm in Table 4 and Table 5. Let $N$ and $s$ denote the number of datasets and algorithms, respectively. The relevant statistics are obtained by
$$\tau_{\chi^2} = \frac{12N}{s(s+1)} \left( \sum_{i=1}^{s} r_i^2 - \frac{s(s+1)^2}{4} \right), \qquad (34)$$
$$\tau_F = \frac{(N-1)\,\tau_{\chi^2}}{N(s-1) - \tau_{\chi^2}}, \qquad (35)$$
where $\tau_F$ follows an F-distribution with $s-1$ and $(s-1)(N-1)$ degrees of freedom. According to Equations (34) and (35), we obtain two Friedman statistics $\tau_F$, namely 12.6243 for accuracy and 109.9785 for computation time, while the critical value corresponding to $\alpha = 0.05$ is $F_\alpha = 1.8063$. Since $\tau_F > F_\alpha$ in both cases, we reject the null hypothesis.
Rejection of the null hypothesis suggests that our HQSLSR, SQSLSR, and the other methods perform differently in terms of accuracy and computation time. To further distinguish these methods in terms of classification accuracy and computation time, the Nemenyi post hoc test is adopted, and the critical difference is calculated by
$$CD = q_\alpha \sqrt{\frac{s(s+1)}{6N}}. \qquad (36)$$
With $\alpha = 0.05$ and $q_\alpha = 3.313$, we obtain $CD = 4.5616$ from Equation (36).
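As a quick numerical check of Equations (34)-(36), the following sketch recomputes the Friedman statistic for accuracy and the critical difference from the average ranks in Table 4 (our own illustration).

```python
import numpy as np

# average accuracy ranks of the 13 methods over the 16 datasets (Table 4)
r = np.array([9.6875, 7.65625, 11.0625, 5.3125, 8.0625, 8.59375, 6.3125,
              4.9062, 8.1875, 9.40625, 6.78125, 2.8125, 2.21875])
N, s = 16, 13                                    # number of datasets, number of algorithms

tau_chi2 = 12 * N / (s * (s + 1)) * (np.sum(r ** 2) - s * (s + 1) ** 2 / 4)   # Equation (34)
tau_F = (N - 1) * tau_chi2 / (N * (s - 1) - tau_chi2)                          # Equation (35)
CD = 3.313 * np.sqrt(s * (s + 1) / (6 * N))                                    # Equation (36), q_0.05 = 3.313

print(round(tau_F, 4), round(CD, 4))   # approximately 12.62 and 4.5616, matching the text
```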
Figure 7 and Figure 8 visually display the results of the Friedman test and the Nemenyi post hoc test. The average rank of each method is marked along the axis. Groups of methods that are not significantly different are connected by red lines.
On the one hand, our methods HQSLSR and SQSLSR are not very different from SVM-R, KRR-R, and KRR-P and are significantly better than LSR, DLSR, LDA, SVM-L, and QSSVM in terms of classification accuracy. On the other hand, our methods HQSLSR and SQSLSR are not very different from LSR, DLSR, and LDA and are significantly better than WCSDLSR, KRR-R, KRR-P, SVM-L, reg-LSDWPTSVM, SVM-R, and QSSVM in terms of computation time. In general, our HQSLSR and SQSLSR can achieve higher accuracy while maintaining relatively small computation time.

6. Conclusions

In this paper, utilizing the kernel-free trick and the ε-dragging technique, we propose two classifiers, HQSLSR and its softened version (SQSLSR). On the one hand, the quadratic surface kernel-free trick is introduced, which avoids the difficulty of selecting appropriate kernel functions and corresponding parameters while maintaining good interpretability. On the other hand, the ε-dragging technique makes the labels more flexible and enhances the generalization ability of SQSLSR. Our HQSLSR can be solved directly, while SQSLSR is solved by an alternating iteration algorithm that we designed. Additionally, the computational complexity, convergence analysis, and interpretability of our methods are also addressed. The experimental results on artificial and benchmark datasets confirm the feasibility and effectiveness of our proposed methods.
In future work, we aim to address several challenges to extend the HQSLSR and SQSLSR models. Specifically, we plan to simplify the quadratic surface to enable our approaches to process high-dimensional data, such as image data. Moreover, we intend to incorporate suitable sparse regularization terms to achieve feature selection.

Author Contributions

Conceptualization, Z.Y.; methodology, C.W. and Z.Y.; software, C.W.; validation, Z.Y., J.Y. and X.Y.; formal analysis, Z.Y.; data curation, C.W.; writing—original draft preparation, C.W.; writing—review and editing, Z.Y., J.Y. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (No. 12061071).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All of the benchmark datasets used in our numerical experiments are from the UCI Machine Learning Repository, which are available at https://archive.ics.uci.edu/ml/index.php (the above datasets accessed on 18 August 2021).

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Hastie, T.; Tibshirani, R.; Buja, A. Flexible discriminant analysis by optimal scoring. J. Am. Stat. Assoc. 1994, 89, 1255–1270. [Google Scholar] [CrossRef]
  2. Hastie, T.; Tibshirani, R.; Friedman, J. Linear methods for classification. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009; Volume 2, pp. 103–106. [Google Scholar]
  3. Xiang, S.; Nie, F.; Meng, G.; Pan, C.; Zhang, C. Discriminative least squares regression for multiclass classification and feature selection. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1738–1754. [Google Scholar] [CrossRef] [PubMed]
  4. Zhang, X.; Wang, L.; Xiang, S.; Liu, C. Retargeted least squares regression algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 2206–2213. [Google Scholar] [CrossRef] [PubMed]
  5. Wen, J.; Li, Z.; Ma, Z.; Xu, Y. Inter-class sparsity based discriminative least square regression. Neural Netw. 2016, 102, 36–47. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, S.; Ge, H.; Yang, J.; Tong, Y. Relaxed group low rank regression model for multi-class classification. Multimed. Tools Appl. 2021, 80, 9459–9477. [Google Scholar] [CrossRef]
  7. Wang, L.; Zhang, X.; Pan, C. Msdlsr: Margin scalable discriminative least squares regression for multicategory classification. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 2711–2717. [Google Scholar] [CrossRef]
  8. Wang, L.; Liu, S.; Pan, C. RODLSR: Robust discriminative least squares regression model for multi-category classification. In Proceedings of the 2017 IEEE ICASSP, New Orleans, LA, USA, 5–9 March 2017; pp. 2407–2411. [Google Scholar]
  9. Fang, X.; Xu, Y.; Li, X.; Lai, Z. Regularized label relaxation linear regression. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 1006–1018. [Google Scholar] [CrossRef]
  10. Chen, Z.; Wu, X.; Kittler, J. Low-rank discriminative least squares regression for image classification. Signal Process. 2020, 173, 107485. [Google Scholar] [CrossRef] [Green Version]
  11. Ma, J.; Zhou, S. Discriminative least squares regression for multiclass classification based on within-class scatter minimization. Appl. Intell. 2022, 52, 622–635. [Google Scholar] [CrossRef]
  12. Zhang, J.; Li, W.; Tao, R.; Du, Q. Discriminative marginalized least squares regression for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3148–3161. [Google Scholar] [CrossRef]
  13. Zhang, R.; Nie, F.; Li, X. Feature selection under regularized orthogonal least square regression with optimal scaling. Neurocomputing 2018, 273, 547–553. [Google Scholar] [CrossRef]
  14. Zhao, S.; Wu, J.; Zhang, B.; Fei, L. Low-rank inter-class sparsity based semi-flexible target least squares regression for feature representation. Pattern Recognit. 2022, 123, 108346. [Google Scholar] [CrossRef]
  15. An, S.; Liu, W.; Venkatesh, S. Face recognition using kernel ridge regression. In Proceedings of the 2007 IEEE CVPR, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–7. [Google Scholar]
  16. Zhang, X.; Chao, W.; Li, Z.; Liu, C.; Li, R. Multi-modal kernel ridge regression for social image classification. Appl. Soft Comput. 2018, 67, 117–125. [Google Scholar] [CrossRef]
  17. Dagher, I. Quadratic kernel-free nonlinear support vector machine. J. Glob. Optim. 2008, 41, 15–30. [Google Scholar] [CrossRef]
  18. Cortes, C.; Vapnik, V. Support vector machine. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  19. Luo, J.; Fang, S.; Deng, A.; Guo, X. Soft quadratic surface support vector machine for binary classification. Asia Pac. J. Oper. Res. 2016, 33, 1650046. [Google Scholar] [CrossRef]
  20. Mousavi, J.; Gao, Z.; Han, L.; Lim, A. Quadratic surface support vector machine with L1 norm regularization. J. Ind. Manag. Optim. 2022, 18, 1835–1861. [Google Scholar] [CrossRef]
  21. Zhan, Y.; Bai, Y.; Zhang, W.; Ying, S. A p-admm for sparse quadratic kernel-free least squares semi-supervised support vector machine. Neurocomputing 2018, 306, 37–50. [Google Scholar] [CrossRef]
  22. Gao, Z.; Fang, S.; Gao, X.; Luo, J.; Medhin, N. A novel kernel-free least squares twin support vector machine for fast and accurate multi-class classification. Knowl. Based Syst. 2021, 226, 107123. [Google Scholar] [CrossRef]
  23. Luo, A.; Yan, X.; Luo, J. A novel chinese points of interest classification method based on weighted quadratic surface support vector machine. Neural Process. Lett. 2022, 54, 1–20. [Google Scholar] [CrossRef]
  24. Ye, J.; Yang, Z.; Li, Z. Quadratic hyper-surface kernel-free least squares support vector regression. Intell. Data Anal. 2021, 25, 265–281. [Google Scholar] [CrossRef]
  25. Luo, J.; Tian, Y.; Yan, X. Clustering via fuzzy one-class quadratic surface support vector machine. Soft Comput. 2017, 21, 5859–5865. [Google Scholar] [CrossRef]
  26. Bai, Y.; Han, X.; Chen, T.; Yu, H. Quadratic kernel-free least squares support vector machine for target diseases classification. J. Comb. Optim. 2015, 30, 850–870. [Google Scholar] [CrossRef]
  27. Gao, Z.; Wang, Y.; Huang, M.; Luo, J.; Tang, S. A kernel-free fuzzy reduced quadratic surface ν-support vector machine with applications. Appl. Soft Comput. 2022, 127, 109390. [Google Scholar] [CrossRef]
  28. Luo, J.; Yan, X.; Tian, Y. Unsupervised quadratic surface support vector machine with application to credit risk assessment. Eur. J. Oper. Res. 2020, 280, 1008–1017. [Google Scholar] [CrossRef]
  29. Gao, Z.; Fang, S.; Luo, J.; Medhin, N. A kernel-free double well potential support vector machine with applications. Eur. J. Oper. Res. 2021, 290, 248–262. [Google Scholar] [CrossRef]
  30. Hsu, C.; Lin, C. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 2002, 13, 415–425. [Google Scholar]
  31. Demšar, J. Statistical comparisons of classifiers over multiple datasets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  32. García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064. [Google Scholar] [CrossRef]
Figure 1. Classification results of the artificial dataset I.
Figure 2. Classification results of the artificial dataset II.
Figure 3. Classification results of the artificial dataset III.
Figure 4. Classification results of the artificial dataset IV.
Figure 5. Training samples and the differences caused by ε -dragging technique: (a) sixty training samples in three classes; (b) the first component of the difference D ; (c) the second component of the difference D ; and (d) the third component of the difference D .
Figure 6. Convergence of SQSLSR.
Figure 7. Friedman test and Nemenyi post hoc test of accuracy.
Figure 8. Friedman test and the Nemenyi post hoc test of computation time.
Table 1. Basic information of benchmark datasets.
Datasets        Samples  Attributes  Class
Haberman        306      3           2
Appendicitis    106      7           2
Monk-2          432      6           2
Breast          277      9           2
Seeds           210      7           3
Iris            150      4           3
Contraceptive   1473     9           3
Balance         625      4           3
Vehicle         846      18          4
X8D5K           1000     8           5
Vowel           990      13          6
Ecoli           366      7           6
Segmentation    2310     19          7
Zoo             101      16          7
Yeast           1484     8           10
Led7digit       500      7           11
Table 2. Classification results on the 16 benchmark datasets.
LSR  DLSR  SVM-L  SVM-R  QSSVM  LDA  KRR-R  KRR-P  LRDLSR  WCSDLSR  reg-LSDWPTSVM  HQSLSR  SQSLSR
Haberman  Acc±Std  0.7049 ± 0.0345  0.7377 ± 0.0000  0.7091 ± 0.0383  0.7223 ± 0.0345  0.7158 ± 0.0390  0.6699 ± 0.0490  0.7148 ± 0.0206  0.7411 ± 0.0304  0.7129 ± 0.0135  0.7418 ± 0.0220  0.7158 ± 0.0302  0.7443 ± 0.0245  0.7639 ± 0.0080
Time (s)  0.0004  0.0030  1.0636  1.1224  0.9023  0.0016  0.2093  0.2407  0.0685  0.0086  0.0369  0.0044  0.0048
Monk-2Acc±Std 0.7763 ± 0.0131 0.7879 ± 0.0135 0.8057 ± 0.0316 0.9954 ± 0.0148 0.9839 ± 0.0213 0.7901 ± 0.0104 0.9424 ± 0.0026 0.9554 ± 0.0001 0.7970 ± 0.0396 0.7546 ± 0.0266 0.9930 ± 0.0104 0.9767 ± 0.0001 0.9770 ± 0.0001
Time (s)  0.0008  0.0030  1.4425  2.4677  1.8716  0.0716  0.4212  0.4564  0.0390  0.0184  0.7327  0.0082  0.0102
Appendicitis  Acc±Std  0.8127 ± 0.0000  0.8286 ± 0.0380  0.8121 ± 0.0638  0.8965 ± 0.0125  0.8485 ± 0.0825  0.6892 ± 0.0534  0.8000 ± 0.0222  0.8667 ± 0.0356  0.8200 ± 0.0213  0.8108 ± 0.0493  0.8675 ± 0.0409  0.9048 ± 0.0052  0.9143 ± 0.0233
Time (s)  0.0010  0.0032  0.1221  0.1271  0.1310  0.0724  0.0380  0.0119  0.0405  0.0256  1.1540  0.0044  0.0044
Breast  Acc±Std  0.7110 ± 0.0139  0.7201 ± 0.0021  0.7255 ± 0.0410  0.7440 ± 0.0432  0.6571 ± 0.0573  0.6785 ± 0.0418  0.7645 ± 0.0230  0.7174 ± 0.0244  0.7390 ± 0.0532  0.6819 ± 0.0632  0.6706 ± 0.0577  0.7646 ± 0.0182  0.7681 ± 0.0177
Time (s)  0.0009  0.0032  1.0349  0.8887  0.9383  0.0048  0.1807  0.1891  0.0389  0.0077  6.7008  0.0080  0.0086
Seeds  Acc±Std  0.9429 ± 0.0117  0.9619 ± 0.0190  0.8667 ± 0.0614  0.9286 ± 0.0261  0.9143 ± 0.0190  0.9667 ± 0.0117  0.9571 ± 0.0178  0.9762 ± 0.0150  0.9762 ± 0.0337  0.9524 ± 0.0238  0.9581 ± 0.0433  0.9810 ± 0.0095  0.9857 ± 0.0117
Time (s)  0.0027  0.0070  0.6734  0.9577  0.7920  0.0067  0.1166  0.1360  0.0393  0.0096  1.7335  0.0058  0.0474
IrisAcc±Std 0.8333 ± 0.0365 0.8400 ± 0.0249 0.7200 ± 0.0691 0.9667 ± 0.0298 0.9333 ± 0.0333 0.9467 ± 0.0163 0.9533 ± 0.0339 0.9662 ± 0.0163 0.8333 ± 0.0572 0.8133 ± 0.0298 0.9600 ± 0.0149 0.9733 ± 0.0249 0.9667 ± 0.0030
Time (s)  0.0040  0.0028  0.3334  0.4720  0.2308  0.0042  0.0590  0.0640  0.0400  0.0053  0.1385  0.0032  0.0032
ContraceptiveAcc±Std 0.5031 ± 0.0172 0.5088 ± 0.0216 0.3508 ± 0.0246 0.5479 ± 0.0153 0.4379 ± 0.0425 0.5112 ± 0.0482 0.5427 ± 0.0185 0.5417 ± 0.0230 0.4939 ± 0.0268 0.4996 ± 0.0199 0.4773 ± 0.0321 0.5475 ± 0.0112 0.5448 ± 0.0171
Time (s)  0.0033  0.0340  50.5654  49.5618  152.4766  0.0197  5.6963  6.4789  0.0836  1.0946  39.7778  0.0478  0.4666
Balance  Acc±Std  0.8592 ± 0.0099  0.8609 ± 0.0027  0.8384 ± 0.0391  0.9002 ± 0.0274  0.9440 ± 0.0236  0.6880 ± 0.0209  0.9121 ± 0.0073  0.9105 ± 0.0078  0.8739 ± 0.0146  0.8824 ± 0.0409  0.9056 ± 0.0215  0.9153 ± 0.0063  0.9162 ± 0.0062
Time (s)  0.0022  0.0100  0.8838  6.8447  1.8852  0.0050  1.0122  1.0689  0.0703  0.1482  0.1496  0.0122  0.6072
X8D5KAcc±Std 1.0000 ± 0.0000 1.0000 ± 0.0000 0.8750 ± 0.0040 1.0000 ± 0.0000 0.9860 ± 0.0020 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000 1.0000 ± 0.0000
Time (s)  0.0134  0.0023  17.1786  19.3314  41.5740  0.0617  3.4147  3.7119  0.1361  0.4812  27.1015  0.0277  0.1834
VehicleAcc±Std 0.7521 ± 0.0335 0.7686 ± 0.0238 0.6399 ± 0.0631 0.6661 ± 0.0374 0.7694 ± 0.0305 0.7694 ± 0.0375 0.7675 ± 0.0319 0.8287 ± 0.0328 0.7637 ± 0.0439 0.7471 ± 0.79 0.7494 ± 0.0148 0.8229 ± 0.0207 0.8321 ± 0.0066
Time (s)  0.0025  0.0356  21.2842  25.7992  414.1790  0.0737  2.4283  1.9887  0.0976  0.4068  4872.9805  0.0810  0.1314
Zoo  Acc±Std  0.9328 ± 0.0249  0.9399 ± 0.0200  0.8910 ± 0.0306  0.9210 ± 0.0406  0.8819 ± 0.0481  0.8654 ± 0.0250  0.9299 ± 0.0302  0.9474 ± 0.0008  0.9437 ± 0.0598  0.9210 ± 0.0266  0.9505 ± 0.0354  0.9527 ± 0.0028  0.9600 ± 0.0020
Time (s)  0.0118  0.0179  0.6321  0.2503  2.3420  0.0358  0.1827  0.2904  0.0840  0.0161  3486.3605  0.0072  0.0540
Yeast  Acc±Std  0.5508 ± 0.0161  0.5684 ± 0.0107  0.5162 ± 0.0373  0.6004 ± 0.0165  0.5596 ± 0.0055  0.5045 ± 0.0138  0.5926 ± 0.0202  0.6007 ± 0.0185  0.5354 ± 0.0210  0.5451 ± 0.0266  0.5445 ± 0.0145  0.6097 ± 0.0224  0.6154 ± 0.0183
Time (s)  0.0058  1.3578  145.6627  158.0168  109.6964  0.0837  12.8849  27.3849  0.2602  2.2958  132.7344  0.0452  1.6300
Ecoli  Acc±Std  0.7136 ± 0.0135  0.7482 ± 0.0240  0.7469 ± 0.0418  0.8900 ± 0.0341  0.8007 ± 0.0271  0.8544 ± 0.0254  0.7317 ± 0.0200  0.8928 ± 0.0146  0.7977 ± 0.0265  0.8303 ± 0.0313  0.8720 ± 0.0523  0.8927 ± 0.0254  0.8751 ± 0.0172
Time (s)  0.0020  0.0518  4.3479  5.6260  5.8922  0.0037  2.2895  1.4722  0.1105  0.1738  8.5413  0.0088  0.0594
Led7digit  Acc±Std  0.7177 ± 0.0261  0.7349 ± 0.0274  0.5420 ± 0.0105  0.6820 ± 0.0000  0.6660 ± 0.0543  0.7420 ± 0.0264  0.7331 ± 0.0374  0.7246 ± 0.0147  0.7138 ± 0.0497  0.7040 ± 0.0241  0.6960 ± 0.0456  0.7407 ± 0.0367  0.7412 ± 0.0236
Time (s)  0.0031  0.4414  14.8471  81.7604  29.9238  0.0072  1.3508  1.4445  0.1476  0.3050  25.4077  0.0116  0.4652
VowelAcc±Std 0.4335 ± 0.0201 0.4354 ± 0.0312 0.4101 ± 0.0215 0.9848 ± 0.0090 0.8192 ± 0.0200 0.5722 ± 0.0232 0.9939 ± 0.0059 0.8131 ± 0.0209 0.4647 ± 0.0369 0.3979 ± 0.0438 0.9556 ± 0.0131 0.8202 ± 0.0336 0.8667 ± 0.0174
Time (s)  0.0039  0.2780  74.6948  81.7602  485.9184  0.1485  5.7673  11.1241  0.3047  2.0553  1044.9018  0.0434  3.1902
SegmentationAcc±Std 0.8403 ± 0.0025 0.8403 ± 0.0096 0.9307 ± 0.0071 0.9476 ± 0.0146 0.9392 ± 0.0114 0.9100 ± 0.0118 0.9420 ± 0.0068 0.8952 ± 0.0050 0.8666 ± 0.0112 0.8429 ± 0.0309 0.9221 ± 0.0592 0.9429 ± 0.0060 0.9483 ± 0.0000
Time (s)  0.0060  0.4242  299.6623  294.5338  3053.9000  0.3877  23.1303  20.1449  0.2635  6.6594  8310.9828  0.2028  3.9048
Table 3. Specificity and sensitivity results of each method.
Dataset          Sensitivity (Appendicitis, Haberman, Contraceptive, X8D5K, Ecoli, Yeast)    Specificity (Appendicitis, Haberman, Contraceptive, X8D5K, Ecoli, Yeast)
LSR              0.2273  0.2143  0.4740  1.0000  0.7247  0.3986    0.9375  0.9551  0.7434  1.0000  0.9709  0.9389
DLSR             0.4400  0.2250  0.4788  1.0000  0.7167  0.3814    0.9647  0.9511  0.7422  1.0000  0.9704  0.9405
SVM-L            0.4000  0.1875  0.4016  0.9910  0.8559  0.4677    0.9412  0.9200  0.6958  0.9977  0.9667  0.9357
SVM-R            0.5000  0.3058  0.4755  1.0000  0.8476  0.5533    0.9294  0.8444  0.7403  1.0000  0.9655  0.9424
QSSVM            0.5142  0.2070  0.3530  0.9960  0.7014  0.4062    0.9412  0.9467  0.7424  1.0000  0.9659  0.9361
LDA              0.5633  0.5214  0.4871  1.0000  0.8223  0.5556    0.6592  0.7236  0.7584  1.0000  0.9609  0.9398
KRR-R            0.4521  0.2222  0.5280  1.0000  0.7139  0.5552    0.9306  0.9387  0.7626  1.0000  0.9705  0.9467
KRR-P            0.4948  0.3000  0.5234  1.0000  0.8536  0.5367    0.9640  0.9376  0.7635  1.0000  0.9717  0.9339
LRDLSR           0.400   0.2250  0.6128  1.0000  0.5317  0.3036    0.9333  0.9504  0.7439  1.0000  0.9662  0.9354
WCSDLSR          0.3333  0.3684  0.4653  1.0000  0.7854  0.3269    0.9444  0.9446  0.7370  1.0000  0.9630  0.9380
reg-LSDWPTSVM    0.4867  0.4111  0.4730  1.0000  0.8540  0.5248    0.9422  0.9149  0.7300  1.0000  0.9716  0.9451
HQSLSR           0.5700  0.3875  0.5249  1.0000  0.8581  0.5575    0.9647  0.9467  0.7671  1.0000  0.9765  0.9465
SQSLSR           0.6824  0.3176  0.5226  1.0000  0.8647  0.5629    0.9667  0.9511  0.7795  1.0000  0.9797  0.9460
Table 4. Ranks of accuracy.
Datasets  LSR  DLSR  SVM-L  SVM-R  QSSVM  LDA  KRR-R  KRR-P  LRDLSR  WCSDLSR  reg-LSDWPTSVM  HQSLSR  SQSLSR
Haberman1251167.513941037.521
Monk-212118131076913254
Appendicitis97103613125811421
Breast97641311385101221
Seeds106131112583.53.59721
Iris10.59132.5876410.512512.5
Contraceptive87131126451091123
Balance1110127134598632
X8D5K661361266666666
Vehicle9613124.54.5728111031
Zoo76119.512138459.5321
Yeast86124713531191021
Ecoli13101138612197524
Led7digit74131112156891032
Vowel11101226817913354
Segmentation12.512.56258491011731
Average ranks  9.6875  7.65625  11.0625  5.3125  8.0625  8.59375  6.3125  4.9062  8.1875  9.40625  6.78125  2.8125  2.21875
Table 5. Ranks of computation time.
Datasets  LSR  DLSR  SVM-L  SVM-R  QSSVM  LDA  KRR-R  KRR-P  LRDLSR  WCSDLSR  reg-LSDWPTSVM  HQSLSR  SQSLSR
Haberman13121311291086745
Monk-212111312789651034
Appendicitis1210111297586133.53.5
Breast12121011389741356
Seeds14101211389651327
Iris4112131158976102.52.5
Contraceptive13121113289571046
Balance13913122101156748
X8D5K21101113489571236
Vehicle12101112398571346
Zoo24119125810731316
Yeast15121310389471126
Ecoli14101112298671335
Led7digit16101312289451137
Vowel14101112389561327
Segmentation15111012498371326
Average ranks  1.3125  3.1875  10.7500  11.5625  11.7500  3.6875  8.3125  8.8125  5.6875  5.8750  11.3750  3  5.6875
