Article

An Effective Multiclass Twin Hypersphere Support Vector Machine and Its Practical Engineering Applications

Qing Ai, Anna Wang, Aihua Zhang, Wenhui Wang and Yang Wang
1 School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
2 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
3 College of Engineering, Bohai University, Jinzhou 121000, China
* Authors to whom correspondence should be addressed.
Electronics 2019, 8(10), 1195; https://doi.org/10.3390/electronics8101195
Submission received: 27 September 2019 / Revised: 16 October 2019 / Accepted: 17 October 2019 / Published: 20 October 2019
(This article belongs to the Special Issue Fault Detection and Diagnosis of Intelligent Mechatronic Systems)

Abstract

Twin-KSVC (Twin Support Vector Classification for K classes) is a novel and efficient multiclass twin support vector machine. However, Twin-KSVC has the following disadvantages: (1) each pair of binary sub-classifiers has to calculate inverse matrices; and (2) for nonlinear problems, a pair of additional primal problems needs to be constructed in each pair of binary sub-classifiers. To address these disadvantages, a new multi-class twin hypersphere support vector machine, named Twin Hypersphere-KSVC, is proposed in this paper. Twin Hypersphere-KSVC evaluates each sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC. However, instead of the two nonparallel hyperplanes sought by each pair of binary sub-classifiers in Twin-KSVC, our Twin Hypersphere-KSVC seeks a pair of hyperspheres. Compared with Twin-KSVC, Twin Hypersphere-KSVC avoids computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly. Extensive comparisons of Twin Hypersphere-KSVC with Twin-KSVC on a set of benchmark datasets from the UCI repository and on several real engineering applications show that the proposed algorithm has higher training speed and better generalization performance.

1. Introduction

The support vector machine (SVM) [1,2], as a computationally powerful tool for classification, has already been applied to a wide range of engineering problems [3,4,5,6,7,8]. Three elements make SVM successful: the structural risk minimization (SRM) principle, the kernel trick and dual theory. However, SVM has to solve a large-sized quadratic programming problem (QPP), which greatly limits its applications. To reduce the learning complexity of SVM, Jayadeva et al. proposed the twin SVM (TSVM) [9]. Unlike SVM, which seeks an optimal separating hyperplane that maximizes the margin between two classes of samples, TSVM constructs two nonparallel proximal hyperplanes, each of which is as close as possible to the corresponding class and as far as possible from the opposite class. This strategy means that TSVM only needs to solve two smaller QPPs, instead of one larger QPP as in SVM. Due to its high learning speed, TSVM has attracted considerable interest in recent years, and many improvements have been proposed [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28], among which the twin hypersphere SVM (THSVM) [25,26,27,28] is an excellent one. Unlike TSVM, THSVM seeks a pair of hyperspheres, instead of two nonparallel hyperplanes, to describe the two classes of samples. Compared with TSVM, THSVM achieves better classification performance.
SVM and TSVM can only solve binary classification problems; however, many practical engineering problems in the real world involve multi-class classification. Currently, the 1-vs-rest and 1-vs-1 strategies are usually used to solve multi-class classification problems. In the 1-vs-1 SVM, $K(K-1)/2$ binary SVM sub-classifiers are constructed, each of which is trained using only two classes of samples. Because only two classes are considered in each sub-classifier and the remaining samples are not involved, the 1-vs-1 SVM may produce unfavorable classification results. The 1-vs-rest SVM constructs $K$ binary SVM sub-classifiers, each of which is trained using all the samples; thus, the 1-vs-rest SVM may suffer from class imbalance problems. To address the above drawbacks of the 1-vs-1 SVM and the 1-vs-rest SVM, Angulo et al. proposed a support vector classification-regression machine for K classes (K-SVCR) [29]. K-SVCR constructs $K(K-1)/2$ binary sub-classifiers, each of which is trained with all the samples and evaluates each sample into a 1-vs-1-vs-rest structure. K-SVCR thus avoids both the class imbalance problem and information loss, and achieves better generalization performance than the 1-vs-1 SVM and the 1-vs-rest SVM. Twin-KSVC [30,31], an effective extension of K-SVCR, is based on TSVM and also evaluates each sample into a 1-vs-1-vs-rest structure, achieving a higher learning speed than K-SVCR. However, Twin-KSVC has the following disadvantages:
  • Each pair of sub-classifiers has to calculate inverse matrices, which is extremely time-consuming for large-scale engineering problems.
  • For nonlinear problems, each pair of sub-classifiers needs to construct a pair of additional primal problems, instead of directly applying the kernel trick to the linear case as in SVM.
To address these disadvantages of Twin-KSVC, in this paper we propose Twin Hypersphere-KSVC, inspired by THSVM. Twin Hypersphere-KSVC also evaluates each sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC. However, instead of the two nonparallel hyperplanes sought by each pair of binary sub-classifiers in Twin-KSVC, our Twin Hypersphere-KSVC seeks a pair of hyperspheres. Compared with Twin-KSVC, Twin Hypersphere-KSVC avoids computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly.
This paper is organized as follows. We briefly review the related multi-class classification algorithms in Section 2. In Section 3, Twin Hypersphere-KSVC is presented in detail. The experimental results on a set of benchmark datasets and several real engineering problems are reported in Section 4, and conclusions are drawn in the last section.

2. Related Works

In this paper, we consider a multi-class classification problem with a training dataset $D = \{x_p^k \in \mathbb{R}^d \mid k = 1, \dots, K,\ p = 1, \dots, n_k\}$, where $K$ is the number of classes and $n_k$ is the number of samples of the $k$-th class. The size of the training dataset is $n = n_1 + \dots + n_K$. For convenience, denote by $X_k = \{x_p^k \mid p = 1, \dots, n_k\}$ the set of samples of the $k$-th class.

2.1. Review of K-SVCR Multi-Classifier

The K-SVCR multi-classifier [29] is based on a decomposition-reconstruction strategy. $K(K-1)/2$ binary SVM sub-classifiers are constructed, each of which evaluates each sample into a 1-vs-1-vs-rest structure. The classification result of K-SVCR is shown intuitively in Figure 1a.
The sub-classifier $f_{ij}(x)$ for the two focused classes $i$ and $j$ in K-SVCR seeks an optimal hyperplane
$$w_{ij} \cdot x + b_{ij} = 0, \qquad (1)$$
where $w_{ij} \in \mathbb{R}^d$ is the normal vector and $b_{ij} \in \mathbb{R}$ is the bias term. The optimal hyperplane can be obtained by solving the following QPP:
$$\begin{aligned} \min \quad & \frac{1}{2}\|w_{ij}\|^2 + c_1\Big(\sum_{p=1}^{n_i}\eta_p^{ij} + \sum_{p=1}^{n_j}\eta_p^{ij*}\Big) + c_2\sum_{p=1}^{n_{\overline{ij}}}\big(\xi_p^{ij} + \xi_p^{ij*}\big), \\ \mathrm{s.t.} \quad & w_{ij}\cdot x_p^i + b_{ij} \ge 1 - \eta_p^{ij}, \quad p = 1,\dots,n_i, \\ & w_{ij}\cdot x_p^j + b_{ij} \le -1 + \eta_p^{ij*}, \quad p = 1,\dots,n_j, \\ & -\varepsilon - \xi_p^{ij*} \le w_{ij}\cdot x_p^{\overline{ij}} + b_{ij} \le \varepsilon + \xi_p^{ij}, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^{ij} \ge 0, \ p = 1,\dots,n_i, \quad \eta_p^{ij*} \ge 0, \ p = 1,\dots,n_j, \\ & \xi_p^{ij} \ge 0, \ \xi_p^{ij*} \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (2)$$
where $x_p^{\overline{ij}} \in D \setminus (X_i \cup X_j)$, $n_{\overline{ij}} = n - n_i - n_j$, $\xi_p^{ij}, \xi_p^{ij*}$ and $\eta_p^{ij}, \eta_p^{ij*}$ are slack variables, and the parameter $\varepsilon$ is restricted to $[0, 1)$.
For a testing sample $x$, the sub-classifier $f_{ij}(x) = w_{ij}\cdot x + b_{ij}$ determines its class by
$$F_{ij}(x) = \begin{cases} -1, & \text{if } f_{ij}(x) < -\varepsilon, \\ +1, & \text{if } f_{ij}(x) > \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$
For the testing sample x, the final label can be determined by vote rule.
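As a minimal illustration of the three-way output in (3), the following Python sketch assumes the hyperplane parameters $w_{ij}$, $b_{ij}$ and the tolerance $\varepsilon$ have already been obtained (the function name is our own choice):

```python
import numpy as np

def ksvcr_output(x, w_ij, b_ij, eps):
    """Three-way output of one K-SVCR sub-classifier, Eq. (3):
    +1 for focused class i, -1 for focused class j, 0 for the rest."""
    f = float(np.dot(w_ij, x) + b_ij)
    if f > eps:
        return 1
    if f < -eps:
        return -1
    return 0
```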

2.2. Review of Twin-KSVC Multi-Classifier

Twin-KSVC [30,31] is an improvement of K-SVCR. Twin-KSVC constructs $K(K-1)/2$ pairs of binary TSVM sub-classifiers, each of which evaluates each sample into a 1-vs-1-vs-rest structure. The classification result is intuitively presented in Figure 1b.
The sub-classifiers $f_i(x)$ and $f_j(x)$ for the two focused classes $i$ and $j$ in Twin-KSVC seek a pair of hyperplanes
$$w_i \cdot x + b_i = 0 \quad \text{and} \quad w_j \cdot x + b_j = 0, \qquad (4)$$
where $w_{i(j)} \in \mathbb{R}^d$ and $b_{i(j)} \in \mathbb{R}$ are the normal vector and the bias term of the corresponding hyperplane, respectively. The two hyperplanes can be obtained by solving the following QPPs:
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_i}\big(w_i\cdot x_p^i + b_i\big)^2 + c_1\sum_{p=1}^{n_j}\eta_p^i + c_2\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & -\big(w_i\cdot x_p^j + b_i\big) + \eta_p^i \ge 1, \quad p = 1,\dots,n_j, \\ & -\big(w_i\cdot x_p^{\overline{ij}} + b_i\big) + \xi_p^i \ge 1 - \varepsilon, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (5)$$
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_j}\big(w_j\cdot x_p^j + b_j\big)^2 + c_3\sum_{p=1}^{n_i}\eta_p^j + c_4\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \big(w_j\cdot x_p^i + b_j\big) + \eta_p^j \ge 1, \quad p = 1,\dots,n_i, \\ & \big(w_j\cdot x_p^{\overline{ij}} + b_j\big) + \xi_p^j \ge 1 - \varepsilon, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (6)$$
where $\eta_p^{i(j)}$ and $\xi_p^{i(j)}$ are slack variables.
For a testing sample $x$, the sub-classifiers $f_i(x) = x^T w_i + b_i$ and $f_j(x) = x^T w_j + b_j$ assign its class by
$$F_{ij}(x) = \begin{cases} +1, & \text{if } f_i(x) > -1 + \varepsilon, \\ -1, & \text{if } f_j(x) < 1 - \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$
For the testing sample x, the final label can be also determined by vote rule.

2.3. Review of THKSVM Multi-Classifier

THKSVM (Twin Hypersphere Multiclass Support Vector Machine) [32] integrates THSVM and 1-vs-rest structure. THKSVM constructs K hyperspheres in the training stage, whose classification result is intuitively shown in Figure 1c.
The sub-classifier for the focused class $i$ in THKSVM seeks a hypersphere
$$\|x - a_i\|^2 = R_i^2, \qquad (8)$$
where $a_i \in \mathbb{R}^d$ and $R_i \in \mathbb{R}$ are the center and the radius of the corresponding hypersphere, respectively. The hypersphere can be constructed by solving the following QPP:
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_{\overline{i}}}\|x_p^{\overline{i}} - a_i\|^2 - v_1 R_i^2 + c_1\sum_{p=1}^{n_i}\eta_p^i, \\ \mathrm{s.t.} \quad & \|x_p^i - a_i\|^2 \ge R_i^2 - \eta_p^i, \quad \eta_p^i \ge 0, \quad p = 1,\dots,n_i, \\ & R_i^2 \ge 0, \end{aligned} \qquad (9)$$
where $x_p^{\overline{i}} \in D \setminus X_i$, $n_{\overline{i}} = n - n_i$, and $\eta_p^i \ge 0$ are slack variables.
The class of a testing sample $x$ can be determined by
$$\text{Class } k = \arg\max_{i=1,\dots,K}\ \big(\|x - a_i\|^2 - R_i^2\big). \qquad (10)$$

3. Twin Hypersphere-KSVC

Twin Hypersphere-KSVC, inspired by THSVM and the 1-vs-1-vs-rest structure, constructs $K(K-1)/2$ pairs of hyperspheres in the training stage. For two focused classes $i$ and $j$, Twin Hypersphere-KSVC seeks a pair of hyperspheres $(a_i, R_i)$ and $(a_j, R_j)$, where $a_i$ ($a_j$) and $R_i$ ($R_j$) are the centers and radii of the corresponding hyperspheres. Each hypersphere covers the corresponding class as much as possible, keeps as far away as possible from the other focused class, contains as few of the remaining samples as possible, and has a radius that is as small as possible. Twin Hypersphere-KSVC is intuitively presented in Figure 1d.

3.1. Linear Case

For the linear case, each pair of hyperspheres $(a_i, R_i)$ and $(a_j, R_j)$ for the two focused classes $i$ and $j$ in Twin Hypersphere-KSVC is constructed by solving the following QPPs:
$$\begin{aligned} \min \quad & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|x_p^j - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & \|x_p^i - a_i\|^2 \le R_i^2 + \eta_p^i, \quad p = 1,\dots,n_i, \\ & \|x_p^{\overline{ij}} - a_i\|^2 \ge R_i^2 - \xi_p^i, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_i^2 \ge 0, \end{aligned} \qquad (11)$$
$$\begin{aligned} \min \quad & R_j^2 - \frac{v_2}{n_i}\sum_{p=1}^{n_i}\|x_p^i - a_j\|^2 + \frac{c_3}{n_j}\sum_{p=1}^{n_j}\eta_p^j + \frac{c_4}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \|x_p^j - a_j\|^2 \le R_j^2 + \eta_p^j, \quad p = 1,\dots,n_j, \\ & \|x_p^{\overline{ij}} - a_j\|^2 \ge R_j^2 - \xi_p^j, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_j^2 \ge 0, \end{aligned} \qquad (12)$$
where $\eta_p^{i(j)}$ and $\xi_p^{i(j)}$ are slack variables.
The Lagrangian function L for the QPP (11) is given by:
$$\begin{aligned} L = {} & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|x_p^j - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i \\ & + \sum_{p=1}^{n_i}\alpha_p\big(\|x_p^i - a_i\|^2 - R_i^2 - \eta_p^i\big) - \sum_{p=1}^{n_{\overline{ij}}}\beta_p\big(\|x_p^{\overline{ij}} - a_i\|^2 - R_i^2 + \xi_p^i\big) \\ & - \sum_{p=1}^{n_i}s_p\eta_p^i - \sum_{p=1}^{n_{\overline{ij}}}q_p\xi_p^i - \lambda R_i^2. \end{aligned} \qquad (13)$$
The Karush-Kuhn-Tucker (KKT) conditions are satisfied as follows:
$$\frac{2 v_1}{n_j}\sum_{p=1}^{n_j}\big(x_p^j - a_i\big) - 2\sum_{p=1}^{n_i}\alpha_p\big(x_p^i - a_i\big) + 2\sum_{p=1}^{n_{\overline{ij}}}\beta_p\big(x_p^{\overline{ij}} - a_i\big) = 0, \qquad (14)$$
$$1 - \sum_{p=1}^{n_i}\alpha_p + \sum_{p=1}^{n_{\overline{ij}}}\beta_p - \lambda = 0, \qquad (15)$$
$$\frac{c_1}{n_i} - \alpha_p - s_p = 0, \quad p = 1,\dots,n_i, \qquad (16)$$
$$\frac{c_2}{n_{\overline{ij}}} - \beta_p - q_p = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (17)$$
$$\alpha_p\big(\|x_p^i - a_i\|^2 - R_i^2 - \eta_p^i\big) = 0, \quad p = 1,\dots,n_i, \qquad (18)$$
$$\beta_p\big(\|x_p^{\overline{ij}} - a_i\|^2 - R_i^2 + \xi_p^i\big) = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (19)$$
$$s_p\,\eta_p^i = 0, \quad p = 1,\dots,n_i, \qquad (20)$$
$$q_p\,\xi_p^i = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (21)$$
$$\lambda R_i^2 = 0. \qquad (22)$$
From (14), (15) and (22), we can obtain
$$a_i = \frac{1}{1 - v_1}\Big(\sum_{p=1}^{n_i}\alpha_p x_p^i - \frac{v_1}{n_j}\sum_{p=1}^{n_j}x_p^j - \sum_{p=1}^{n_{\overline{ij}}}\beta_p x_p^{\overline{ij}}\Big). \qquad (23)$$
By denoting
$$v_i = \frac{1}{1 - v_1} \qquad (24)$$
and substituting (16)–(23) into (13), the dual optimal problem of (11) is obtained as follows:
$$\begin{aligned} \max \quad & \sum_{p=1}^{n_i}\alpha_p\, x_p^i\cdot x_p^i - \sum_{p=1}^{n_{\overline{ij}}}\beta_p\, x_p^{\overline{ij}}\cdot x_p^{\overline{ij}} + \frac{2 v_i v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_i}\alpha_{p_2}\, x_{p_1}^j\cdot x_{p_2}^i - \frac{2 v_i v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_2}\, x_{p_1}^j\cdot x_{p_2}^{\overline{ij}} \\ & - v_i\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}\alpha_{p_1}\alpha_{p_2}\, x_{p_1}^i\cdot x_{p_2}^i + 2 v_i\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\alpha_{p_1}\beta_{p_2}\, x_{p_1}^i\cdot x_{p_2}^{\overline{ij}} - v_i\sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_1}\beta_{p_2}\, x_{p_1}^{\overline{ij}}\cdot x_{p_2}^{\overline{ij}}, \\ \mathrm{s.t.} \quad & 1 - \sum_{p=1}^{n_i}\alpha_p + \sum_{p=1}^{n_{\overline{ij}}}\beta_p = 0, \\ & 0 \le \alpha_p \le \frac{c_1}{n_i}, \quad p = 1,\dots,n_i, \\ & 0 \le \beta_p \le \frac{c_2}{n_{\overline{ij}}}, \quad p = 1,\dots,n_{\overline{ij}}. \end{aligned} \qquad (25)$$
By defining $\alpha = (\alpha_1, \dots, \alpha_{n_i})^T$ and $\beta = (\beta_1, \dots, \beta_{n_{\overline{ij}}})^T$, the optimization problem (25) can be reformulated as
$$\begin{aligned} \max \quad & -v_i\,\big(\alpha^T \ \ \beta^T\big)\begin{pmatrix} X_i^T X_i & -X_i^T X_{\overline{ij}} \\ -X_{\overline{ij}}^T X_i & X_{\overline{ij}}^T X_{\overline{ij}} \end{pmatrix}\begin{pmatrix}\alpha \\ \beta\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(X_i^T X_i) + \frac{2 v_i v_1}{n_j}X_i^T X_j e_j \\ -\mathrm{diag}(X_{\overline{ij}}^T X_{\overline{ij}}) - \frac{2 v_i v_1}{n_j}X_{\overline{ij}}^T X_j e_j \end{pmatrix}^T\begin{pmatrix}\alpha \\ \beta\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_i^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\alpha \\ \beta\end{pmatrix} = 1, \\ & 0 \le \alpha \le \frac{c_1}{n_i}e_i, \qquad 0 \le \beta \le \frac{c_2}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (26)$$
where $e_i$, $e_j$ and $e_{\overline{ij}}$ denote vectors of ones of appropriate dimensions.
According to the KKT conditions (16)–(21), we can obtain $R_i^2$ by the following formula:
$$R_i^2 = \|x^* - a_i\|^2, \qquad (27)$$
where $x^* \in S_{i1} \cup S_{i2}$, $S_{i1} = \{x_p^i \mid 0 < \alpha_p < \frac{c_1}{n_i}\}$ and $S_{i2} = \{x_p^{\overline{ij}} \mid 0 < \beta_p < \frac{c_2}{n_{\overline{ij}}}\}$.
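To make the structure of (26) concrete, the following Python sketch assembles the dual matrices from column-sample data matrices and hands them to a generic QP solver (cvxopt here). The paper itself solves the QPPs with MATLAB's quadprog; the function and variable names below (dual_qp_class_i, Xr for the rest samples) are our own illustrative choices.

```python
import numpy as np
from cvxopt import matrix, solvers

def dual_qp_class_i(Xi, Xj, Xr, v1, c1, c2):
    """Illustrative solver for the dual problem (26) of the class-i hypersphere.

    Xi, Xj, Xr are (d x n_i), (d x n_j), (d x n_r) matrices holding class-i,
    class-j and 'rest' samples as columns.  Returns (alpha, beta, a_i)."""
    ni, nj, nr = Xi.shape[1], Xj.shape[1], Xr.shape[1]
    vi = 1.0 / (1.0 - v1)                                   # Eq. (24)

    # Quadratic part of (26): -v_i [alpha; beta]^T H [alpha; beta]
    H = np.block([[Xi.T @ Xi, -Xi.T @ Xr],
                  [-Xr.T @ Xi, Xr.T @ Xr]])
    # Linear part of (26)
    f = np.concatenate([
        np.sum(Xi * Xi, axis=0) + (2 * vi * v1 / nj) * (Xi.T @ Xj) @ np.ones(nj),
        -np.sum(Xr * Xr, axis=0) - (2 * vi * v1 / nj) * (Xr.T @ Xj) @ np.ones(nj)])

    # Maximizing -v_i u^T H u + f^T u  <=>  min (1/2) u^T (2 v_i H) u - f^T u
    P = matrix(2.0 * vi * H + 1e-8 * np.eye(ni + nr))       # small ridge for stability
    q = matrix(-f)
    # Box constraints 0 <= alpha <= c1/n_i, 0 <= beta <= c2/n_r
    G = matrix(np.vstack([-np.eye(ni + nr), np.eye(ni + nr)]))
    h = matrix(np.concatenate([np.zeros(ni + nr),
                               np.full(ni, c1 / ni), np.full(nr, c2 / nr)]))
    # Equality constraint sum(alpha) - sum(beta) = 1
    A = matrix(np.concatenate([np.ones(ni), -np.ones(nr)]).reshape(1, -1))
    b = matrix(1.0)

    solvers.options['show_progress'] = False
    u = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    alpha, beta = u[:ni], u[ni:]

    # Centre a_i from Eq. (23)
    a_i = vi * (Xi @ alpha - (v1 / nj) * Xj @ np.ones(nj) - Xr @ beta)
    return alpha, beta, a_i
```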
By denoting $v_j = \frac{1}{1 - v_2}$, the dual problem of (12) can be obtained as follows:
$$\begin{aligned} \max \quad & \sum_{p=1}^{n_j}\theta_p\, x_p^j\cdot x_p^j - \sum_{p=1}^{n_{\overline{ij}}}\gamma_p\, x_p^{\overline{ij}}\cdot x_p^{\overline{ij}} + \frac{2 v_j v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_j}\theta_{p_2}\, x_{p_1}^i\cdot x_{p_2}^j - \frac{2 v_j v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_2}\, x_{p_1}^i\cdot x_{p_2}^{\overline{ij}} \\ & - v_j\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}\theta_{p_1}\theta_{p_2}\, x_{p_1}^j\cdot x_{p_2}^j + 2 v_j\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\theta_{p_1}\gamma_{p_2}\, x_{p_1}^j\cdot x_{p_2}^{\overline{ij}} - v_j\sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_1}\gamma_{p_2}\, x_{p_1}^{\overline{ij}}\cdot x_{p_2}^{\overline{ij}}, \\ \mathrm{s.t.} \quad & 1 - \sum_{p=1}^{n_j}\theta_p + \sum_{p=1}^{n_{\overline{ij}}}\gamma_p = 0, \\ & 0 \le \theta_p \le \frac{c_3}{n_j}, \quad p = 1,\dots,n_j, \qquad 0 \le \gamma_p \le \frac{c_4}{n_{\overline{ij}}}, \quad p = 1,\dots,n_{\overline{ij}}. \end{aligned} \qquad (28)$$
By defining $\theta = (\theta_1, \dots, \theta_{n_j})^T$ and $\gamma = (\gamma_1, \dots, \gamma_{n_{\overline{ij}}})^T$, problem (28) can be reformulated as
$$\begin{aligned} \max \quad & -v_j\,\big(\theta^T \ \ \gamma^T\big)\begin{pmatrix} X_j^T X_j & -X_j^T X_{\overline{ij}} \\ -X_{\overline{ij}}^T X_j & X_{\overline{ij}}^T X_{\overline{ij}} \end{pmatrix}\begin{pmatrix}\theta \\ \gamma\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(X_j^T X_j) + \frac{2 v_j v_2}{n_i}X_j^T X_i e_i \\ -\mathrm{diag}(X_{\overline{ij}}^T X_{\overline{ij}}) - \frac{2 v_j v_2}{n_i}X_{\overline{ij}}^T X_i e_i \end{pmatrix}^T\begin{pmatrix}\theta \\ \gamma\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_j^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\theta \\ \gamma\end{pmatrix} = 1, \\ & 0 \le \theta \le \frac{c_3}{n_j}e_j, \qquad 0 \le \gamma \le \frac{c_4}{n_{\overline{ij}}}e_{\overline{ij}}. \end{aligned} \qquad (29)$$
We can compute $R_j^2$ by the following formula:
$$R_j^2 = \|x^* - a_j\|^2, \qquad (30)$$
where $x^* \in S_{j1} \cup S_{j2}$, $S_{j1} = \{x_p^j \mid 0 < \theta_p < \frac{c_3}{n_j}\}$ and $S_{j2} = \{x_p^{\overline{ij}} \mid 0 < \gamma_p < \frac{c_4}{n_{\overline{ij}}}\}$.
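Continuing the sketch above, Eq. (27) (and likewise (30)) only needs one sample whose multiplier lies strictly inside its box constraint; a minimal helper under the same assumptions, with our own naming, could look like this:

```python
import numpy as np

def radius_from_support_vector(Xi, Xr, alpha, beta, a_i, c1, c2, tol=1e-8):
    """Recover R_i^2 via Eq. (27): take any sample of S_i1 or S_i2 (multiplier
    strictly between its bounds) and measure its squared distance to a_i."""
    ni, nr = Xi.shape[1], Xr.shape[1]
    for p in range(ni):
        if tol < alpha[p] < c1 / ni - tol:            # x_p^i in S_i1
            return float(np.sum((Xi[:, p] - a_i) ** 2))
    for p in range(nr):
        if tol < beta[p] < c2 / nr - tol:             # rest sample in S_i2
            return float(np.sum((Xr[:, p] - a_i) ** 2))
    raise ValueError("no multiplier strictly inside its box constraint")
```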

3.2. Nonlinear Case

We extend the linear Twin Hypersphere-KSVC to the nonlinear case by directly considering a nonlinear map $\varphi: \mathbb{R}^d \to H$ (where $H$ is a high-dimensional Hilbert space), instead of the kernel-generated surfaces used in Twin-KSVC. The corresponding primal problems are:
$$\begin{aligned} \min \quad & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|\varphi(x_p^j) - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & \|\varphi(x_p^i) - a_i\|^2 \le R_i^2 + \eta_p^i, \quad p = 1,\dots,n_i, \\ & \|\varphi(x_p^{\overline{ij}}) - a_i\|^2 \ge R_i^2 - \xi_p^i, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_i^2 \ge 0, \end{aligned} \qquad (31)$$
$$\begin{aligned} \min \quad & R_j^2 - \frac{v_2}{n_i}\sum_{p=1}^{n_i}\|\varphi(x_p^i) - a_j\|^2 + \frac{c_3}{n_j}\sum_{p=1}^{n_j}\eta_p^j + \frac{c_4}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \|\varphi(x_p^j) - a_j\|^2 \le R_j^2 + \eta_p^j, \quad p = 1,\dots,n_j, \\ & \|\varphi(x_p^{\overline{ij}}) - a_j\|^2 \ge R_j^2 - \xi_p^j, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_j^2 \ge 0. \end{aligned} \qquad (32)$$
According to the dual theory, one can get the dual optimal problems of (31) and (32) as follows:
$$\begin{aligned} \max \quad & -v_i\,\big(\alpha^T \ \ \beta^T\big)\begin{pmatrix} K(X_i, X_i) & -K(X_i, X_{\overline{ij}}) \\ -K(X_{\overline{ij}}, X_i) & K(X_{\overline{ij}}, X_{\overline{ij}}) \end{pmatrix}\begin{pmatrix}\alpha \\ \beta\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(K(X_i, X_i)) + \frac{2 v_i v_1}{n_j}K(X_i, X_j)\, e_j \\ -\mathrm{diag}(K(X_{\overline{ij}}, X_{\overline{ij}})) - \frac{2 v_i v_1}{n_j}K(X_{\overline{ij}}, X_j)\, e_j \end{pmatrix}^T\begin{pmatrix}\alpha \\ \beta\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_i^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\alpha \\ \beta\end{pmatrix} = 1, \\ & 0 \le \alpha \le \frac{c_1}{n_i}e_i, \qquad 0 \le \beta \le \frac{c_2}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (33)$$
$$\begin{aligned} \max \quad & -v_j\,\big(\theta^T \ \ \gamma^T\big)\begin{pmatrix} K(X_j, X_j) & -K(X_j, X_{\overline{ij}}) \\ -K(X_{\overline{ij}}, X_j) & K(X_{\overline{ij}}, X_{\overline{ij}}) \end{pmatrix}\begin{pmatrix}\theta \\ \gamma\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(K(X_j, X_j)) + \frac{2 v_j v_2}{n_i}K(X_j, X_i)\, e_i \\ -\mathrm{diag}(K(X_{\overline{ij}}, X_{\overline{ij}})) - \frac{2 v_j v_2}{n_i}K(X_{\overline{ij}}, X_i)\, e_i \end{pmatrix}^T\begin{pmatrix}\theta \\ \gamma\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_j^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\theta \\ \gamma\end{pmatrix} = 1, \\ & 0 \le \theta \le \frac{c_3}{n_j}e_j, \qquad 0 \le \gamma \le \frac{c_4}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (34)$$
where $K(\cdot\,,\cdot)$ denotes the kernel matrix.
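The only change relative to (26) and (29) is that the inner-product blocks are replaced by Gram matrices. As a small illustration, the RBF kernel used later in Section 4.1 can be evaluated blockwise as follows (a sketch, assuming samples are stored as rows):

```python
import numpy as np

def rbf_gram(X1, X2, sigma):
    """Gram matrix K(X1, X2) for K(x, y) = exp(-||x - y||^2 / sigma^2).
    X1 is (n1 x d), X2 is (n2 x d); the result is (n1 x n2)."""
    sq_dist = (np.sum(X1 ** 2, axis=1)[:, None]
               + np.sum(X2 ** 2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dist / sigma ** 2)
```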

3.3. Decision Rule

For a testing sample x, each pair of sub-classifiers determines its label by
$$F_{ij}(x) = \begin{cases} +1, & \text{if } \|\varphi(x) - a_i\|^2 / R_i^2 \le \|\varphi(x) - a_j\|^2 / R_j^2, \\ -1, & \text{if } \|\varphi(x) - a_i\|^2 / R_i^2 > \|\varphi(x) - a_j\|^2 / R_j^2, \end{cases} \qquad (35)$$
where
$$\begin{aligned} \|\varphi(x) - a_i\|^2 = {} & K(x, x) - 2 v_i\Big(\sum_{p=1}^{n_i}\alpha_p K(x_p^i, x) - \frac{v_1}{n_j}\sum_{p=1}^{n_j}K(x_p^j, x) - \sum_{p=1}^{n_{\overline{ij}}}\beta_p K(x_p^{\overline{ij}}, x)\Big) \\ & + v_i^2\Big(\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}\alpha_{p_1}\alpha_{p_2}K(x_{p_1}^i, x_{p_2}^i) + \Big(\frac{v_1}{n_j}\Big)^2\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}K(x_{p_1}^j, x_{p_2}^j) + \sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_1}\beta_{p_2}K(x_{p_1}^{\overline{ij}}, x_{p_2}^{\overline{ij}}) \\ & \quad - \frac{2 v_1}{n_j}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_j}\alpha_{p_1}K(x_{p_1}^i, x_{p_2}^j) - 2\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\alpha_{p_1}\beta_{p_2}K(x_{p_1}^i, x_{p_2}^{\overline{ij}}) + \frac{2 v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_2}K(x_{p_1}^j, x_{p_2}^{\overline{ij}})\Big) \end{aligned} \qquad (36)$$
and
$$\begin{aligned} \|\varphi(x) - a_j\|^2 = {} & K(x, x) - 2 v_j\Big(\sum_{p=1}^{n_j}\theta_p K(x_p^j, x) - \frac{v_2}{n_i}\sum_{p=1}^{n_i}K(x_p^i, x) - \sum_{p=1}^{n_{\overline{ij}}}\gamma_p K(x_p^{\overline{ij}}, x)\Big) \\ & + v_j^2\Big(\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}\theta_{p_1}\theta_{p_2}K(x_{p_1}^j, x_{p_2}^j) + \Big(\frac{v_2}{n_i}\Big)^2\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}K(x_{p_1}^i, x_{p_2}^i) + \sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_1}\gamma_{p_2}K(x_{p_1}^{\overline{ij}}, x_{p_2}^{\overline{ij}}) \\ & \quad - \frac{2 v_2}{n_i}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_i}\theta_{p_1}K(x_{p_1}^j, x_{p_2}^i) - 2\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\theta_{p_1}\gamma_{p_2}K(x_{p_1}^j, x_{p_2}^{\overline{ij}}) + \frac{2 v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_2}K(x_{p_1}^i, x_{p_2}^{\overline{ij}})\Big). \end{aligned} \qquad (37)$$
For the testing sample x, the final label can be also determined by vote rule.
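As a brief illustration of the vote rule, the following sketch accumulates the pairwise outputs $F_{ij}(x)$ over all $K(K-1)/2$ pairs and returns the class with the most votes; `pair_output` is a hypothetical callable implementing (35) for a trained pair $(i, j)$.

```python
import numpy as np

def vote_predict(x, pair_output, K):
    """Combine the K(K-1)/2 pairwise decisions by the vote rule.

    pair_output(i, j, x) is assumed to return +1 (vote for class i),
    -1 (vote for class j) or 0 (no vote, 'rest' region)."""
    votes = np.zeros(K)
    for i in range(K):
        for j in range(i + 1, K):
            out = pair_output(i, j, x)
            if out == 1:
                votes[i] += 1
            elif out == -1:
                votes[j] += 1
    return int(np.argmax(votes))
```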

3.4. Analysis of Learning Complexity

Next, we discuss the learning complexity of the proposed Twin Hypersphere-KSVC. We take 4-class classification as an example, assume that the four classes contain approximately equal numbers of samples, and present the learning complexities of K-SVCR, Twin-KSVC and Twin Hypersphere-KSVC in Table 1. The main computational burden of Twin-KSVC consists of solving QPPs and calculating inverse matrices; therefore, the learning complexities of linear and nonlinear Twin-KSVC are $K(K-1)\big(O(d^3) + O((\frac{3}{4}n)^3)\big)$ and $K(K-1)\big(O(n^3) + O((\frac{3}{4}n)^3)\big)$, respectively. In contrast, K-SVCR and Twin Hypersphere-KSVC avoid computing inverse matrices: the learning complexities of linear and nonlinear K-SVCR are both $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$, while the learning complexities of linear and nonlinear Twin Hypersphere-KSVC are both $K(K-1)O((\frac{3}{4}n)^3)$. From the above analysis, we can see that our Twin Hypersphere-KSVC requires less learning time.
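As a back-of-the-envelope check of Table 1 (a sketch; constants and lower-order terms are ignored), the ratios below illustrate how much cheaper the proposed method is for a nonlinear 4-class problem:

```python
# Rough nonlinear-case costs from Table 1 for K = 4 classes and n training samples.
K, n = 4, 1000
twin_ksvc = K * (K - 1) * (n**3 + (0.75 * n)**3)      # matrix inverse + QPP per sub-problem
k_svcr    = K * (K - 1) / 2 * (1.5 * n)**3            # one larger QPP per pair
ths_ksvc  = K * (K - 1) * (0.75 * n)**3               # two smaller QPPs per pair

print(k_svcr / ths_ksvc)     # 4.0
print(twin_ksvc / ths_ksvc)  # about 3.4
```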

4. Experiments

In this section, we investigate the classification performance of our Twin Hypersphere-KSVC and compare it with ITKSVC [31], THKSVM, Twin-KSVC and K-SVCR on a set of benchmark datasets from the UCI repository and on several real engineering problems. All algorithms are implemented in Matlab R2012a; we use the "quadprog.m" function to solve the QPPs and the "inv.m" function to compute matrix inverses.
The parameter selection directly affects the classification performance of the above algorithms. In this section, we use the common exhaustive grid search to determine the parameters. K-SVCR has two penalty parameters $c_i$ ($i = 1, 2$) and the bandwidth parameter $\varepsilon$. Twin-KSVC has five parameters: four penalty parameters $c_i$ ($i = 1, 2, 3, 4$) and the bandwidth parameter $\varepsilon$. THKSVM has two parameters, $c_1$ and $v_1$. ITKSVC has seven parameters: six penalty parameters $c_i$ ($i = 1, \dots, 6$) and the bandwidth parameter $\varepsilon$. Our Twin Hypersphere-KSVC has six parameters: four penalty parameters $c_i$ ($i = 1, 2, 3, 4$) and two parameters $v_i$ ($i = 1, 2$). The optimal values of the penalty parameters $c_i$ are searched from the set $\{2^{-7}, \dots, 2^7\}$, the parameters $v_i$ from $\{0.1, \dots, 0.9\}$, and the bandwidth parameter $\varepsilon$ from the set $\{0, \dots, 0.5\}$.
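The exhaustive search can be sketched as follows; `evaluate` is a hypothetical callable that trains Twin Hypersphere-KSVC with a given parameter setting and returns its 5-fold cross-validation accuracy (the paper performs the equivalent search in MATLAB).

```python
import itertools

def grid_search(evaluate):
    """Enumerate the parameter grids of Section 4 and keep the best setting."""
    c_grid = [2.0 ** k for k in range(-7, 8)]            # c_i from {2^-7, ..., 2^7}
    v_grid = [round(0.1 * k, 1) for k in range(1, 10)]   # v_i from {0.1, ..., 0.9}
    best_score, best_params = float("-inf"), None
    for c1, c2, c3, c4 in itertools.product(c_grid, repeat=4):
        for v1, v2 in itertools.product(v_grid, repeat=2):
            params = {"c1": c1, "c2": c2, "c3": c3, "c4": c4, "v1": v1, "v2": v2}
            score = evaluate(params)
            if score > best_score:
                best_score, best_params = score, params
    return best_params, best_score
```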

4.1. Benchmark Datasets

We compare Twin Hypersphere-KSVC with ITKSVC, THKSVM, Twin-KSVC and K-SVCR on a set of benchmark datasets from the UCI repository in this subsection. The benchmark datasets are summarized in Table 2. Five-fold cross-validation is used to estimate the testing accuracy, and we use the radial basis function kernel $K(x, y) = e^{-\|x - y\|^2/\sigma^2}$ in this subsection, where the parameter $\sigma$ is selected from $\{2^{-7}, \dots, 2^7\}$.
The predicting accuracy and learning time of the five algorithms on the UCI benchmark datasets are shown in Table 3 and Table 4, respectively. From Table 3 and Table 4, one can draw the following conclusions.
(1)
The Twin Hypersphere-KSVC, ITKSVC, Twin-KSVC and K-SVCR obtain better test accuracy than THKSVM. This is mainly because the sub-classifiers in Twin Hypersphere-KSVC, ITKSVC, Twin-KSVC and K-SVCR avoid the class imbalance problem appearing in THKSVM.
(2)
Twin Hypersphere-KSVC trains faster than Twin-KSVC, K-SVCR and ITKSVC. This is mainly because, compared with Twin-KSVC, the sub-classifiers in Twin Hypersphere-KSVC avoid calculating inverse matrices, while compared with K-SVCR and ITKSVC, each pair of sub-classifiers in our Twin Hypersphere-KSVC only needs to solve two smaller QPPs.
(3)
Regarding the predicting accuracy of Twin-KSVC, K-SVCR, ITKSVC and Twin Hypersphere-KSVC, we can observe that no single method is superior to the others on all datasets. We therefore apply the Friedman test to analyze the test accuracies of these classifiers statistically [33,34]. The ranks of the five algorithms on all datasets in terms of test accuracy are presented in Table 5. The Friedman statistic $\chi_F^2$ can be calculated by
$$\chi_F^2 = \frac{12\, m_2}{m_1(m_1 + 1)}\Big[\sum_{k_1=1}^{m_1}\mathrm{rank}_{k_1}^2 - \frac{m_1(m_1 + 1)^2}{4}\Big], \qquad (38)$$
where $\mathrm{rank}_{k_1} = \frac{1}{m_2}\sum_{k_2=1}^{m_2} r_{k_1}^{k_2}$ and $r_{k_1}^{k_2}$ denotes the rank of the $k_1$-th of the $m_1$ classifiers on the $k_2$-th of the $m_2$ datasets. Because $\chi_F^2$ is undesirably conservative, we use the statistic
$$F_F = \frac{(m_2 - 1)\,\chi_F^2}{m_2(m_1 - 1) - \chi_F^2}, \qquad (39)$$
which is distributed according to the $F$ distribution with $(m_1 - 1)$ and $(m_1 - 1)(m_2 - 1)$ degrees of freedom.
We calculate the statistic $F_F = 10.41$, where $F_F \sim F(4, 28)$. For the significance level $\alpha = 0.05$, the critical value $F(4, 28) = 2.95$ is smaller than $F_F$, which means that there are significant differences among these classifiers. We can see from Table 5 that the average rank of Twin Hypersphere-KSVC is slightly higher than that of ITKSVC but lower than those of Twin-KSVC and K-SVCR. This implies that the classification accuracy of Twin Hypersphere-KSVC is slightly lower than that of ITKSVC but noticeably higher than those of Twin-KSVC and K-SVCR.
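For reference, the two statistics (38) and (39) can be reproduced directly from the average ranks in Table 5 ($m_1 = 5$ classifiers, $m_2 = 8$ datasets); the sketch below recovers $F_F \approx 10.41$:

```python
import numpy as np

# Average ranks from Table 5: Twin-KSVC, THKSVM, K-SVCR, ITKSVC, Twin Hypersphere-KSVC
avg_ranks = np.array([3.13, 4.69, 2.81, 2.31, 2.44])
m1, m2 = 5, 8

chi_F2 = 12 * m2 / (m1 * (m1 + 1)) * (np.sum(avg_ranks ** 2) - m1 * (m1 + 1) ** 2 / 4)
F_F = (m2 - 1) * chi_F2 / (m2 * (m1 - 1) - chi_F2)
print(round(chi_F2, 2), round(F_F, 2))   # approximately 19.13 and 10.41
```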

4.2. Handwritten Digits Recognition

We use Twin Hypersphere-KSVC to recognize handwritten digits. The USPS database is used to compare our Twin Hypersphere-KSVC with Twin-KSVC, THKSVM, ITKSVC and K-SVCR. The USPS dataset consists of 8-bit grayscale images of handwritten digits from 0 to 9, as presented in Figure 2. We choose 55 images for each handwritten digit in the USPS database, 550 images in total. We only consider the linear kernel function $K(x, y) = x^T y$, and also use 5-fold cross-validation to estimate the testing accuracy.
The handwritten digit recognition results of the five algorithms are presented in Table 6. From Table 6, we can observe that the proposed Twin Hypersphere-KSVC obtains the best accuracy among all algorithms. In terms of learning time, our Twin Hypersphere-KSVC requires less learning time than ITKSVC, Twin-KSVC and K-SVCR.

4.3. Text Classification

We apply our Twin Hypersphere-KSVC to text classification in this subsection and compare it with the other algorithms on the Reuters21578 dataset. We choose 6 classes from the Reuters21578 dataset, 708 documents in total, as presented in Table 7. We also use the linear kernel function in this subsection.
The experimental results of the five algorithms for text classification are presented in Table 8. From Table 8, we can see that our Twin Hypersphere-KSVC obtains the best accuracy among all multi-classifiers. The proposed Twin Hypersphere-KSVC also runs faster than ITKSVC, Twin-KSVC and K-SVCR.

5. Conclusions

In this paper, we propose a novel multi-class classification algorithm, named Twin Hypersphere-KSVC. Twin Hypersphere-KSVC evaluates each training sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC and K-SVCR, and constructs two hyperspheres in each pair of sub-classifiers, instead of two nonparallel hyperplanes as in Twin-KSVC. Compared with Twin-KSVC, the sub-classifiers in Twin Hypersphere-KSVC avoid computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly. The classification results on a set of benchmark datasets from the UCI repository, on handwritten digit recognition and on text classification show that Twin Hypersphere-KSVC achieves better classification performance than the other classical multi-classifiers.

Author Contributions

Investigation, Q.A., Y.W.; Methodology, Q.A., A.W.; Software, Q.A.; Validation, Q.A., W.W.; Supervision, A.W., A.Z.; Funding acquisition, Q.A., A.W.; Writing—original draft, Q.A.; Writing—review and editing, Q.A.

Funding

This research was funded by Natural Science Foundation of Liaoning province in China (20180551048 and 201601291) and Talent Cultivation Project of University of Science and Technology Liaoning in China (2018RC05).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000; pp. 17–34.
  2. Zhang, X. Introduction to statistical learning theory and support vector machines. Acta Automatica Sinica 2000, 26, 32–42.
  3. Long, W.; Song, L.; Tian, Y. A new graphic kernel method of stock price trend prediction based on financial news semantic and structural similarity. Expert Syst. Appl. 2019, 118, 411–424.
  4. Lei, C.; Deng, J.; Cao, K.; Xiao, Y.; Ma, L.; Wang, W.; Ma, T.; Shu, C. A comparison of random forest and support vector machine approaches to predict coal spontaneous combustion in gob. Fuel 2019, 239, 297–311.
  5. Zhao, X.; Zang, W.; Lv, R.; Cui, W. Effective information filtering mining of internet of brain things based on support vector machine. IEEE Access 2019, 7, 191–202.
  6. Xie, F.; Li, F.; Lei, C.; Yang, J.; Zhang, Y. Unsupervised band selection based on artificial bee colony algorithm for hyperspectral image classification. Appl. Soft Comput. 2019, 75, 428–440.
  7. Qiao, X.; Bao, J.; Zhang, H.; Wan, F.; Li, D. Underwater sea cucumber identification based on Principal Component Analysis and Support Vector Machine. Measurement 2019, 133, 444–455.
  8. Maltarollo, V.G.; Kronenberger, T.; Espinoza, G.Z.; Oliveira, P.R.; Honorio, K.M. Advances with support vector machines for novel drug discovery. Expert Opin. Drug Discov. 2019, 14, 23–33.
  9. Jayadeva; Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
  10. Mir, A.; Nasiri, J.A. KNN-based least squares twin support vector machine for pattern classification. Appl. Intell. 2018, 48, 4551–4564.
  11. Wang, H.; Zhou, Z.; Xu, Y. An improved ν-twin bounded support vector machine. Appl. Intell. 2018, 48, 1041–1053.
  12. Shao, Y.; Zhang, C.; Wang, X.; Deng, N. Improvements on twin support vector machines. IEEE Trans. Neural Netw. 2011, 22, 962–968.
  13. Qi, Z.; Tian, Y.; Shi, Y. Structural twin support vector machine for classification. Knowl. Based Syst. 2013, 43, 74–81.
  14. Tian, Y.; Qi, Z.; Ju, X.; Shi, Y.; Liu, X. Nonparallel support vector machines for pattern classification. IEEE Trans. Syst. Man Cybern. 2014, 44, 1067–1079.
  15. Wang, Z.; Shao, Y.; Bai, L.; Deng, N. Twin support vector machine for clustering. IEEE Trans. Neural Netw. 2015, 26, 2583–2588.
  16. Ye, Q.; Zhao, C.; Gao, S.; Zheng, H. Weighted twin support vector machines with local information and its application. Neural Netw. 2012, 35, 31–39.
  17. Peng, X.; Xu, D. Bi-density twin support vector machines for pattern recognition. Neurocomputing 2013, 99, 134–143.
  18. Chen, S.; Wu, X. A new fuzzy twin support vector machine for pattern classification. Int. J. Mach. Learn. Cybern. 2018, 9, 1553–1564.
  19. Xu, Y.; Yang, Z.; Pan, X. A novel twin support-vector machine with pinball loss. IEEE Trans. Neural Netw. 2017, 28, 359–370.
  20. Chen, W.; Shao, Y.; Li, C.; Deng, N. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recognit. 2016, 52, 61–74.
  21. Tang, J.; Li, D.; Tian, Y.; Liu, D. Multi-view learning based on nonparallel support vector machine. Knowl. Based Syst. 2018, 158, 94–108.
  22. Tang, L.; Tian, Y.; Yang, C. Nonparallel support vector regression model and its SMO-type solver. Neural Netw. 2018, 105, 431–446.
  23. Xie, X. Improvement on projection twin support vector machine. Neural Comput. Appl. 2018, 30, 371–387.
  24. Tang, L.; Tian, Y.; Yang, C.; Pardalos, P.M. Ramp-loss nonparallel support vector regression: Robust, sparse and scalable approximation. Knowl. Based Syst. 2018, 147, 55–67.
  25. Peng, X.; Xu, D. A twin-hypersphere support vector machine classifier and the fast learning algorithm. Inf. Sci. 2013, 221, 12–27.
  26. Xu, Y.; Wang, Q.; Pang, X.; Tian, Y. Maximum margin of twin spheres machine with pinball loss for imbalanced data classification. Appl. Intell. 2018, 48, 23–34.
  27. Peng, X.; Shen, J. A twin-hyperspheres support vector machine with automatic variable weights for data classification. Inf. Sci. 2017, 417, 216–235.
  28. Ai, Q.; Wang, A.; Wang, Y.; Sun, H. Improvements on twin-hypersphere support vector machine using local density information. Prog. Artif. Intell. 2018, 7, 167–175.
  29. Angulo, C.; Parra, X.; Catala, A. K-SVCR. A support vector machine for multi-class classification. Neurocomputing 2003, 55, 57–77.
  30. Xu, Y.; Guo, R.; Wang, L. A twin multi-class classification support vector machine. Cogn. Comput. 2013, 5, 580–588.
  31. Ai, Q.; Wang, A.; Wang, Y.; Sun, H. An improved Twin-KSVC with its applications. Neural Comput. Appl. 2018.
  32. Xu, Y.; Guo, R. A twin hyper-sphere multi-class classification support vector machine. J. Intell. Fuzzy Syst. 2014, 27, 1783–1790.
  33. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
  34. García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
Figure 1. (a) K-SVCR. (b) Twin-KSVC. (c) THKSVM. (d) Twin Hypersphere-KSVC.
Figure 2. Illustration of 10 digits in the USPS dataset.
Table 1. The learning complexities of Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC.
Kernel             Twin-KSVC                                       K-SVCR                                  Twin Hypersphere-KSVC
linear kernel      $K(K-1)\big(O(d^3)+O((\frac{3}{4}n)^3)\big)$    $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$   $K(K-1)O((\frac{3}{4}n)^3)$
nonlinear kernel   $K(K-1)\big(O(n^3)+O((\frac{3}{4}n)^3)\big)$    $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$   $K(K-1)O((\frac{3}{4}n)^3)$
Table 2. The statistics of benchmark datasets.
Dataset                #Attributes   #Samples   #Classes
Wine                   13            178        3
Iris                   4             150        3
Ecoli                  7             327        5
Soybean                35            47         4
Hayes-roth             5             132        3
Teaching-evaluation    5             151        3
Dermatology            34            358        6
Balance                4             625        3
Table 3. Classification accuracy of THKSVM, ITKSVC, Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC on UCI benchmark datasets.
Dataset                Twin-KSVC (%)    THKSVM (%)       K-SVCR (%)       ITKSVC (%)       Twin Hypersphere-KSVC (%)
Wine                   97.29 ± 2.28     92.24 ± 10.22    97.96 ± 2.47     97.53 ± 2.20     98.30 ± 2.55
Iris                   93.47 ± 3.67     91.20 ± 4.52     95.47 ± 3.07     95.53 ± 4.03     96.00 ± 3.06
Ecoli                  84.33 ± 4.60     77.75 ± 2.80     87.34 ± 3.07     85.63 ± 3.70     83.04 ± 4.21
Soybean                98.00 ± 4.47     100.0 ± 0.0      100.0 ± 0.0      100.0 ± 0.0      100.0 ± 0.0
Hayes-roth             61.07 ± 7.90     39.40 ± 2.05     54.02 ± 8.31     71.28 ± 9.92     52.50 ± 8.21
Teaching-evaluation    56.43 ± 8.23     50.28 ± 11.71    51.91 ± 8.34     52.02 ± 6.27     59.44 ± 7.52
Dermatology            95.43 ± 2.19     70.18 ± 3.16     97.66 ± 1.82     94.19 ± 2.74     96.64 ± 2.39
Balance                96.92 ± 1.29     90.05 ± 2.94     94.37 ± 1.96     97.73 ± 1.19     90.91 ± 1.34
The bolded classification accuracy is the highest one in all multi-classifiers.
Table 4. Learning time of THKSVM, ITKSVC, Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC on UCI benchmark datasets.
Dataset                Twin-KSVC (s)   THKSVM (s)   K-SVCR (s)   ITKSVC (s)   Twin Hypersphere-KSVC (s)
Wine                   2.1985          0.2749       2.5362       1.8673       1.1457
Iris                   1.7260          0.2533       2.0569       1.5347       0.7429
Ecoli                  38.7993         1.0329       62.2968      27.2987      15.1017
Soybean                1.1527          0.1787       0.4666       0.6355       0.5374
Hayes-roth             2.4358          0.1848       1.7696       1.1364       0.5671
Teaching-evaluation    3.6512          0.1849       1.7180       1.2330       1.0899
Dermatology            114.424         1.0811       89.2057      41.0467      19.6405
Balance                45.2907         3.7804       68.7381      43.7113      13.5046
Table 5. The rank of five algorithms on UCI benchmark datasets in the light of classification accuracy.
Dataset                Twin-KSVC   THKSVM   K-SVCR   ITKSVC   Twin Hypersphere-KSVC
Wine                   4           5        2        3        1
Iris                   4           5        3        2        1
Ecoli                  3           5        1        2        4
Soybean                5           2.5      2.5      2.5      2.5
Hayes-roth             2           5        3        1        4
Teaching-evaluation    2           5        4        3        1
Dermatology            3           5        1        4        2
Balance                2           5        3        1        4
Average                3.13        4.69     2.81     2.31     2.44
Table 6. Classification results of Twin Hypersphere-KSVC, Twin-KSVC, THKSVM, K-SVCR and ITKSVC for handwritten digits recognition on the USPS dataset.
Algorithms          Twin-KSVC       THKSVM          K-SVCR          ITKSVC          Twin Hypersphere-KSVC
Accuracy (%)        53.13 ± 5.38    72.55 ± 3.55    85.09 ± 2.99    77.18 ± 4.24    85.75 ± 2.15
Learning time (s)   377.546         32.1101         641.946         348.6982        342.983
Table 7. The statistics of text classification dataset.
Label              Cocoa   Coffee   Corn   Rice   Rubber   Soybean
Training dataset   52      101      176    46     35       83
Test dataset       23      44       76     21     15       36
Table 8. Classification results of Twin Hypersphere-KSVC, Twin-KSVC, THKSVM, K-SVCR and ITKSVC for text classification on the Reuters21578 dataset.
Algorithms          Twin-KSVC   THKSVM    K-SVCR    ITKSVC    Twin Hypersphere-KSVC
F1                  0.4474      0.5932    0.7076    0.7078    0.7414
Learning time (s)   36.9486     0.4758    68.7947   26.4216   11.9575
