Article

An Effective Multiclass Twin Hypersphere Support Vector Machine and Its Practical Engineering Applications

Qing Ai, Anna Wang, Aihua Zhang, Wenhui Wang and Yang Wang
1 School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
2 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
3 College of Engineering, Bohai University, Jinzhou 121000, China
* Authors to whom correspondence should be addressed.
Electronics 2019, 8(10), 1195; https://doi.org/10.3390/electronics8101195
Submission received: 27 September 2019 / Revised: 16 October 2019 / Accepted: 17 October 2019 / Published: 20 October 2019
(This article belongs to the Special Issue Fault Detection and Diagnosis of Intelligent Mechatronic Systems)

Abstract

Twin-KSVC (Twin Support Vector Classification for K classes) is a novel and efficient multiclass twin support vector machine. However, Twin-KSVC has the following disadvantages: (1) each pair of binary sub-classifiers has to calculate inverse matrices; and (2) for nonlinear problems, a pair of additional primal problems needs to be constructed in each pair of binary sub-classifiers. To address these disadvantages, a new multi-class twin hypersphere support vector machine, named Twin Hypersphere-KSVC, is proposed in this paper. Twin Hypersphere-KSVC evaluates each sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC. However, instead of the two nonparallel hyperplanes sought by each pair of binary sub-classifiers in Twin-KSVC, our Twin Hypersphere-KSVC seeks a pair of hyperspheres. Compared with Twin-KSVC, Twin Hypersphere-KSVC avoids computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly. Extensive comparisons of Twin Hypersphere-KSVC with Twin-KSVC on a set of benchmark datasets from the UCI repository and on several real engineering applications show that the proposed algorithm has higher training speed and better generalization performance.

1. Introduction

The support vector machine (SVM) [1,2], as a computationally powerful tool for classification, has already been applied to a wide range of engineering problems [3,4,5,6,7,8]. Three elements make SVM successful: the structural risk minimization (SRM) principle, the kernel trick and dual theory. However, SVM has to solve a large-sized quadratic programming problem (QPP), which greatly limits its applications. To reduce the learning complexity of SVM, Jayadeva et al. proposed the twin SVM (TSVM) [9]. Unlike SVM, which seeks an optimal separating hyperplane that maximizes the margin between two classes of samples, TSVM constructs two nonparallel proximal hyperplanes, each of which is as close as possible to the corresponding class and as far as possible from the opposite class. This strategy means that TSVM only needs to solve two smaller QPPs, instead of one larger QPP as in SVM. Due to its high learning speed, TSVM has attracted considerable interest in recent years, and many improvements have been proposed [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28], among which the twin hypersphere SVM (THSVM) [25,26,27,28] is an excellent one. Unlike TSVM, THSVM seeks a pair of hyperspheres, instead of two nonparallel hyperplanes, to describe the two classes of samples. Compared with TSVM, THSVM achieves better classification performance.
SVM and TSVM can only solve binary classification problems; however, many practical engineering problems in the real world involve multi-class classification. Currently, the 1-vs-rest and 1-vs-1 strategies are usually used to solve multi-class classification problems. In the 1-vs-1 SVM, $K(K-1)/2$ binary SVM sub-classifiers are constructed, each of which is trained using only two classes of samples. Because only two classes are considered in each sub-classifier and the remaining samples are not involved, the 1-vs-1 SVM may produce unfavorable classification results. The 1-vs-rest SVM constructs $K$ binary SVM sub-classifiers, each of which is trained using all the samples; thus, the 1-vs-rest SVM may suffer from class imbalance problems. To address the above drawbacks of the 1-vs-1 SVM and the 1-vs-rest SVM, Angulo et al. proposed a support vector classification-regression machine for K classes (K-SVCR) [29]. K-SVCR constructs $K(K-1)/2$ binary sub-classifiers, each of which is trained with all the samples and evaluates each sample into a 1-vs-1-vs-rest structure. K-SVCR thus avoids both the class imbalance problem and information loss, and achieves better generalization performance than the 1-vs-1 SVM and the 1-vs-rest SVM. Twin-KSVC [30,31], an effective extension of K-SVCR, is based on TSVM and also evaluates each sample into a 1-vs-1-vs-rest structure, achieving a higher learning speed than K-SVCR. However, Twin-KSVC has the following disadvantages:
  • Each pair of sub-classifiers has to calculate inverse matrices, which is extremely time-consuming for large-scale engineering problems.
  • For nonlinear problems, each pair of sub-classifiers needs to construct a pair of additional primal problems, instead of directly applying the kernel trick to the linear case as in SVM.
To address these disadvantages of Twin-KSVC, in this paper we propose Twin Hypersphere-KSVC, inspired by THSVM. Twin Hypersphere-KSVC also evaluates each sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC. However, instead of the two nonparallel hyperplanes sought by each pair of binary sub-classifiers in Twin-KSVC, our Twin Hypersphere-KSVC seeks a pair of hyperspheres. Compared with Twin-KSVC, Twin Hypersphere-KSVC avoids computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly.
This paper is organized as follows. We briefly review the related multi-class classification algorithms in Section 2. In Section 3, Twin Hypersphere-KSVC is presented in detail. The experimental results on a set of benchmark datasets and several real engineering problems are reported in Section 4, and conclusions are drawn in the last section.

2. Related Works

In this paper, we consider a multi-class classification problem with a training dataset $D = \{x_p^k \in \mathbb{R}^d \mid k = 1, \dots, K,\ p = 1, \dots, n_k\}$, where $K$ is the number of classes and $n_k$ is the number of samples of the $k$-th class. The size of the training dataset is $n = n_1 + \dots + n_K$. For convenience, denote by $X_k = \{x_p^k \mid p = 1, \dots, n_k\}$ the set of samples of the $k$-th class.

2.1. Review of K-SVCR Multi-Classifier

The K-SVCR multi-classifier [29] is based on a decomposition-reconstruction strategy. $K(K-1)/2$ binary SVM sub-classifiers are constructed, each of which evaluates each sample into a 1-vs-1-vs-rest structure. The classification result of K-SVCR is shown intuitively in Figure 1a.
The sub-classifier $f_{ij}(x)$ for the two focused classes $i$ and $j$ in K-SVCR seeks an optimal hyperplane
$$w_{ij} \cdot x + b_{ij} = 0, \qquad (1)$$
where $w_{ij} \in \mathbb{R}^d$ is the normal vector and $b_{ij} \in \mathbb{R}$ is the bias term. The optimal hyperplane can be obtained by solving the following QPP:
$$\begin{aligned} \min \quad & \frac{1}{2}\|w_{ij}\|^2 + c_1\Big(\sum_{p=1}^{n_i}\eta_p^{ij} + \sum_{p=1}^{n_j}\eta_p^{ij*}\Big) + c_2\sum_{p=1}^{n_{\overline{ij}}}\big(\xi_p^{ij} + \xi_p^{ij*}\big), \\ \mathrm{s.t.} \quad & w_{ij}\cdot x_p^i + b_{ij} \ge 1 - \eta_p^{ij}, \quad p = 1,\dots,n_i, \\ & w_{ij}\cdot x_p^j + b_{ij} \le -1 + \eta_p^{ij*}, \quad p = 1,\dots,n_j, \\ & -\varepsilon - \xi_p^{ij*} \le w_{ij}\cdot x_p^{\overline{ij}} + b_{ij} \le \varepsilon + \xi_p^{ij}, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^{ij} \ge 0, \ p = 1,\dots,n_i, \quad \eta_p^{ij*} \ge 0, \ p = 1,\dots,n_j, \\ & \xi_p^{ij} \ge 0, \ \xi_p^{ij*} \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (2)$$
where $x_p^{\overline{ij}} \in D \setminus (X_i \cup X_j)$, $n_{\overline{ij}} = n - n_i - n_j$, $\xi_p^{ij}, \xi_p^{ij*}$ and $\eta_p^{ij}, \eta_p^{ij*}$ are slack variables, and the parameter $\varepsilon$ is restricted to $[0, 1)$.
For a testing sample $x$, the sub-classifier $f_{ij}(x) = w_{ij}\cdot x + b_{ij}$ determines its class by
$$F_{ij}(x) = \begin{cases} -1, & \text{if } f_{ij}(x) < -\varepsilon, \\ +1, & \text{if } f_{ij}(x) > \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$
For the testing sample x, the final label can be determined by vote rule.
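As a minimal illustration of the three-way output in (3), the following Python sketch assumes the hyperplane parameters $w_{ij}$, $b_{ij}$ and the tolerance $\varepsilon$ have already been obtained (the function name is our own choice):

```python
import numpy as np

def ksvcr_output(x, w_ij, b_ij, eps):
    """Three-way output of one K-SVCR sub-classifier, Eq. (3):
    +1 for focused class i, -1 for focused class j, 0 for the rest."""
    f = float(np.dot(w_ij, x) + b_ij)
    if f > eps:
        return 1
    if f < -eps:
        return -1
    return 0
```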

2.2. Review of Twin-KSVC Multi-Classifier

Twin-KSVC [30,31] is an improvement of K-SVCR. Twin-KSVC constructs $K(K-1)/2$ pairs of binary TSVM sub-classifiers, each of which evaluates each sample into a 1-vs-1-vs-rest structure. The classification result is intuitively presented in Figure 1b.
The sub-classifiers $f_i(x)$ and $f_j(x)$ for the two focused classes $i$ and $j$ in Twin-KSVC seek a pair of hyperplanes
$$w_i \cdot x + b_i = 0 \quad \text{and} \quad w_j \cdot x + b_j = 0, \qquad (4)$$
where $w_{i(j)} \in \mathbb{R}^d$ and $b_{i(j)} \in \mathbb{R}$ are the normal vector and the bias term of the corresponding hyperplane, respectively. The two hyperplanes can be obtained by solving the following QPPs:
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_i}\big(w_i\cdot x_p^i + b_i\big)^2 + c_1\sum_{p=1}^{n_j}\eta_p^i + c_2\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & -\big(w_i\cdot x_p^j + b_i\big) + \eta_p^i \ge 1, \quad p = 1,\dots,n_j, \\ & -\big(w_i\cdot x_p^{\overline{ij}} + b_i\big) + \xi_p^i \ge 1 - \varepsilon, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (5)$$
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_j}\big(w_j\cdot x_p^j + b_j\big)^2 + c_3\sum_{p=1}^{n_i}\eta_p^j + c_4\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \big(w_j\cdot x_p^i + b_j\big) + \eta_p^j \ge 1, \quad p = 1,\dots,n_i, \\ & \big(w_j\cdot x_p^{\overline{ij}} + b_j\big) + \xi_p^j \ge 1 - \varepsilon, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \end{aligned} \qquad (6)$$
where $\eta_p^{i(j)}$ and $\xi_p^{i(j)}$ are slack variables.
For a testing sample $x$, the sub-classifiers $f_i(x) = x^T w_i + b_i$ and $f_j(x) = x^T w_j + b_j$ assign its class by
$$F_{ij}(x) = \begin{cases} +1, & \text{if } f_i(x) > -1 + \varepsilon, \\ -1, & \text{if } f_j(x) < 1 - \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$
For the testing sample x, the final label can be also determined by vote rule.

2.3. Review of THKSVM Multi-Classifier

THKSVM (Twin Hypersphere Multiclass Support Vector Machine) [32] integrates THSVM and 1-vs-rest structure. THKSVM constructs K hyperspheres in the training stage, whose classification result is intuitively shown in Figure 1c.
The sub-classifier for the focused class $i$ in THKSVM seeks a hypersphere
$$\|x - a_i\|^2 = R_i^2, \qquad (8)$$
where $a_i \in \mathbb{R}^d$ and $R_i \in \mathbb{R}$ are the center and the radius of the corresponding hypersphere, respectively. The hypersphere can be constructed by solving the following QPP:
$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{p=1}^{n_{\overline{i}}}\|x_p^{\overline{i}} - a_i\|^2 - v_1 R_i^2 + c_1\sum_{p=1}^{n_i}\eta_p^i, \\ \mathrm{s.t.} \quad & \|x_p^i - a_i\|^2 \ge R_i^2 - \eta_p^i, \quad \eta_p^i \ge 0, \quad p = 1,\dots,n_i, \\ & R_i^2 \ge 0, \end{aligned} \qquad (9)$$
where $x_p^{\overline{i}} \in D \setminus X_i$, $n_{\overline{i}} = n - n_i$, and $\eta_p^i \ge 0$ are slack variables.
The class of a testing sample $x$ can be determined by
$$\text{Class } k = \arg\max_{i=1,\dots,K}\ \big(\|x - a_i\|^2 - R_i^2\big). \qquad (10)$$

3. Twin Hypersphere-KSVC

Twin Hypersphere-KSVC, inspired by THSVM and the 1-vs-1-vs-rest structure, constructs $K(K-1)/2$ pairs of hyperspheres in the training stage. For two focused classes $i$ and $j$, Twin Hypersphere-KSVC seeks a pair of hyperspheres $(a_i, R_i)$ and $(a_j, R_j)$, where $a_i$ ($a_j$) and $R_i$ ($R_j$) are the centers and radii of the corresponding hyperspheres. Each hypersphere covers the corresponding class as much as possible, keeps as far away as possible from the other focused class, contains as few of the remaining samples as possible, and has a radius that is as small as possible. Twin Hypersphere-KSVC is intuitively presented in Figure 1d.

3.1. Linear Case

For the linear case, each pair of hyperspheres $(a_i, R_i)$ and $(a_j, R_j)$ for the two focused classes $i$ and $j$ in Twin Hypersphere-KSVC is constructed by solving the following QPPs:
$$\begin{aligned} \min \quad & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|x_p^j - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & \|x_p^i - a_i\|^2 \le R_i^2 + \eta_p^i, \quad p = 1,\dots,n_i, \\ & \|x_p^{\overline{ij}} - a_i\|^2 \ge R_i^2 - \xi_p^i, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_i^2 \ge 0, \end{aligned} \qquad (11)$$
$$\begin{aligned} \min \quad & R_j^2 - \frac{v_2}{n_i}\sum_{p=1}^{n_i}\|x_p^i - a_j\|^2 + \frac{c_3}{n_j}\sum_{p=1}^{n_j}\eta_p^j + \frac{c_4}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \|x_p^j - a_j\|^2 \le R_j^2 + \eta_p^j, \quad p = 1,\dots,n_j, \\ & \|x_p^{\overline{ij}} - a_j\|^2 \ge R_j^2 - \xi_p^j, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_j^2 \ge 0, \end{aligned} \qquad (12)$$
where $\eta_p^{i(j)}$ and $\xi_p^{i(j)}$ are slack variables.
The Lagrangian function L for the QPP (11) is given by:
$$\begin{aligned} L = {} & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|x_p^j - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i \\ & + \sum_{p=1}^{n_i}\alpha_p\big(\|x_p^i - a_i\|^2 - R_i^2 - \eta_p^i\big) - \sum_{p=1}^{n_{\overline{ij}}}\beta_p\big(\|x_p^{\overline{ij}} - a_i\|^2 - R_i^2 + \xi_p^i\big) \\ & - \sum_{p=1}^{n_i}s_p\eta_p^i - \sum_{p=1}^{n_{\overline{ij}}}q_p\xi_p^i - \lambda R_i^2. \end{aligned} \qquad (13)$$
The Karush-Kuhn-Tucker (KKT) conditions are satisfied as follows:
$$\frac{2 v_1}{n_j}\sum_{p=1}^{n_j}\big(x_p^j - a_i\big) - 2\sum_{p=1}^{n_i}\alpha_p\big(x_p^i - a_i\big) + 2\sum_{p=1}^{n_{\overline{ij}}}\beta_p\big(x_p^{\overline{ij}} - a_i\big) = 0, \qquad (14)$$
$$1 - \sum_{p=1}^{n_i}\alpha_p + \sum_{p=1}^{n_{\overline{ij}}}\beta_p - \lambda = 0, \qquad (15)$$
$$\frac{c_1}{n_i} - \alpha_p - s_p = 0, \quad p = 1,\dots,n_i, \qquad (16)$$
$$\frac{c_2}{n_{\overline{ij}}} - \beta_p - q_p = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (17)$$
$$\alpha_p\big(\|x_p^i - a_i\|^2 - R_i^2 - \eta_p^i\big) = 0, \quad p = 1,\dots,n_i, \qquad (18)$$
$$\beta_p\big(\|x_p^{\overline{ij}} - a_i\|^2 - R_i^2 + \xi_p^i\big) = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (19)$$
$$s_p\,\eta_p^i = 0, \quad p = 1,\dots,n_i, \qquad (20)$$
$$q_p\,\xi_p^i = 0, \quad p = 1,\dots,n_{\overline{ij}}, \qquad (21)$$
$$\lambda R_i^2 = 0. \qquad (22)$$
From (14), (15) and (22), we can obtain
$$a_i = \frac{1}{1 - v_1}\Big(\sum_{p=1}^{n_i}\alpha_p x_p^i - \frac{v_1}{n_j}\sum_{p=1}^{n_j}x_p^j - \sum_{p=1}^{n_{\overline{ij}}}\beta_p x_p^{\overline{ij}}\Big). \qquad (23)$$
By denoting
$$v_i = \frac{1}{1 - v_1} \qquad (24)$$
and substituting (16)–(23) into (13), the dual optimal problem of (11) is obtained as follows:
$$\begin{aligned} \max \quad & \sum_{p=1}^{n_i}\alpha_p\, x_p^i\cdot x_p^i - \sum_{p=1}^{n_{\overline{ij}}}\beta_p\, x_p^{\overline{ij}}\cdot x_p^{\overline{ij}} + \frac{2 v_i v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_i}\alpha_{p_2}\, x_{p_1}^j\cdot x_{p_2}^i - \frac{2 v_i v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_2}\, x_{p_1}^j\cdot x_{p_2}^{\overline{ij}} \\ & - v_i\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}\alpha_{p_1}\alpha_{p_2}\, x_{p_1}^i\cdot x_{p_2}^i + 2 v_i\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\alpha_{p_1}\beta_{p_2}\, x_{p_1}^i\cdot x_{p_2}^{\overline{ij}} - v_i\sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_1}\beta_{p_2}\, x_{p_1}^{\overline{ij}}\cdot x_{p_2}^{\overline{ij}}, \\ \mathrm{s.t.} \quad & 1 - \sum_{p=1}^{n_i}\alpha_p + \sum_{p=1}^{n_{\overline{ij}}}\beta_p = 0, \\ & 0 \le \alpha_p \le \frac{c_1}{n_i}, \quad p = 1,\dots,n_i, \\ & 0 \le \beta_p \le \frac{c_2}{n_{\overline{ij}}}, \quad p = 1,\dots,n_{\overline{ij}}. \end{aligned} \qquad (25)$$
By defining $\alpha = (\alpha_1, \dots, \alpha_{n_i})^T$ and $\beta = (\beta_1, \dots, \beta_{n_{\overline{ij}}})^T$, the optimization problem (25) can be reformulated as
$$\begin{aligned} \max \quad & -v_i\,\big(\alpha^T \ \ \beta^T\big)\begin{pmatrix} X_i^T X_i & -X_i^T X_{\overline{ij}} \\ -X_{\overline{ij}}^T X_i & X_{\overline{ij}}^T X_{\overline{ij}} \end{pmatrix}\begin{pmatrix}\alpha \\ \beta\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(X_i^T X_i) + \frac{2 v_i v_1}{n_j}X_i^T X_j e_j \\ -\mathrm{diag}(X_{\overline{ij}}^T X_{\overline{ij}}) - \frac{2 v_i v_1}{n_j}X_{\overline{ij}}^T X_j e_j \end{pmatrix}^T\begin{pmatrix}\alpha \\ \beta\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_i^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\alpha \\ \beta\end{pmatrix} = 1, \\ & 0 \le \alpha \le \frac{c_1}{n_i}e_i, \qquad 0 \le \beta \le \frac{c_2}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (26)$$
where $e_i$, $e_j$ and $e_{\overline{ij}}$ denote vectors of ones of appropriate dimensions.
According to the KKT conditions (16)–(21), we can obtain $R_i^2$ by the following formula:
$$R_i^2 = \|x^* - a_i\|^2, \qquad (27)$$
where $x^* \in S_{i1} \cup S_{i2}$, $S_{i1} = \{x_p^i \mid 0 < \alpha_p < \frac{c_1}{n_i}\}$ and $S_{i2} = \{x_p^{\overline{ij}} \mid 0 < \beta_p < \frac{c_2}{n_{\overline{ij}}}\}$.
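To make the structure of (26) concrete, the following Python sketch assembles the dual matrices from column-sample data matrices and hands them to a generic QP solver (cvxopt here). The paper itself solves the QPPs with MATLAB's quadprog; the function and variable names below (dual_qp_class_i, Xr for the rest samples) are our own illustrative choices.

```python
import numpy as np
from cvxopt import matrix, solvers

def dual_qp_class_i(Xi, Xj, Xr, v1, c1, c2):
    """Illustrative solver for the dual problem (26) of the class-i hypersphere.

    Xi, Xj, Xr are (d x n_i), (d x n_j), (d x n_r) matrices holding class-i,
    class-j and 'rest' samples as columns.  Returns (alpha, beta, a_i)."""
    ni, nj, nr = Xi.shape[1], Xj.shape[1], Xr.shape[1]
    vi = 1.0 / (1.0 - v1)                                   # Eq. (24)

    # Quadratic part of (26): -v_i [alpha; beta]^T H [alpha; beta]
    H = np.block([[Xi.T @ Xi, -Xi.T @ Xr],
                  [-Xr.T @ Xi, Xr.T @ Xr]])
    # Linear part of (26)
    f = np.concatenate([
        np.sum(Xi * Xi, axis=0) + (2 * vi * v1 / nj) * (Xi.T @ Xj) @ np.ones(nj),
        -np.sum(Xr * Xr, axis=0) - (2 * vi * v1 / nj) * (Xr.T @ Xj) @ np.ones(nj)])

    # Maximizing -v_i u^T H u + f^T u  <=>  min (1/2) u^T (2 v_i H) u - f^T u
    P = matrix(2.0 * vi * H + 1e-8 * np.eye(ni + nr))       # small ridge for stability
    q = matrix(-f)
    # Box constraints 0 <= alpha <= c1/n_i, 0 <= beta <= c2/n_r
    G = matrix(np.vstack([-np.eye(ni + nr), np.eye(ni + nr)]))
    h = matrix(np.concatenate([np.zeros(ni + nr),
                               np.full(ni, c1 / ni), np.full(nr, c2 / nr)]))
    # Equality constraint sum(alpha) - sum(beta) = 1
    A = matrix(np.concatenate([np.ones(ni), -np.ones(nr)]).reshape(1, -1))
    b = matrix(1.0)

    solvers.options['show_progress'] = False
    u = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    alpha, beta = u[:ni], u[ni:]

    # Centre a_i from Eq. (23)
    a_i = vi * (Xi @ alpha - (v1 / nj) * Xj @ np.ones(nj) - Xr @ beta)
    return alpha, beta, a_i
```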
By denoting $v_j = \frac{1}{1 - v_2}$, the dual problem of (12) can be obtained as follows:
$$\begin{aligned} \max \quad & \sum_{p=1}^{n_j}\theta_p\, x_p^j\cdot x_p^j - \sum_{p=1}^{n_{\overline{ij}}}\gamma_p\, x_p^{\overline{ij}}\cdot x_p^{\overline{ij}} + \frac{2 v_j v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_j}\theta_{p_2}\, x_{p_1}^i\cdot x_{p_2}^j - \frac{2 v_j v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_2}\, x_{p_1}^i\cdot x_{p_2}^{\overline{ij}} \\ & - v_j\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}\theta_{p_1}\theta_{p_2}\, x_{p_1}^j\cdot x_{p_2}^j + 2 v_j\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\theta_{p_1}\gamma_{p_2}\, x_{p_1}^j\cdot x_{p_2}^{\overline{ij}} - v_j\sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_1}\gamma_{p_2}\, x_{p_1}^{\overline{ij}}\cdot x_{p_2}^{\overline{ij}}, \\ \mathrm{s.t.} \quad & 1 - \sum_{p=1}^{n_j}\theta_p + \sum_{p=1}^{n_{\overline{ij}}}\gamma_p = 0, \\ & 0 \le \theta_p \le \frac{c_3}{n_j}, \quad p = 1,\dots,n_j, \qquad 0 \le \gamma_p \le \frac{c_4}{n_{\overline{ij}}}, \quad p = 1,\dots,n_{\overline{ij}}. \end{aligned} \qquad (28)$$
By defining $\theta = (\theta_1, \dots, \theta_{n_j})^T$ and $\gamma = (\gamma_1, \dots, \gamma_{n_{\overline{ij}}})^T$, problem (28) can be reformulated as
$$\begin{aligned} \max \quad & -v_j\,\big(\theta^T \ \ \gamma^T\big)\begin{pmatrix} X_j^T X_j & -X_j^T X_{\overline{ij}} \\ -X_{\overline{ij}}^T X_j & X_{\overline{ij}}^T X_{\overline{ij}} \end{pmatrix}\begin{pmatrix}\theta \\ \gamma\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(X_j^T X_j) + \frac{2 v_j v_2}{n_i}X_j^T X_i e_i \\ -\mathrm{diag}(X_{\overline{ij}}^T X_{\overline{ij}}) - \frac{2 v_j v_2}{n_i}X_{\overline{ij}}^T X_i e_i \end{pmatrix}^T\begin{pmatrix}\theta \\ \gamma\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_j^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\theta \\ \gamma\end{pmatrix} = 1, \\ & 0 \le \theta \le \frac{c_3}{n_j}e_j, \qquad 0 \le \gamma \le \frac{c_4}{n_{\overline{ij}}}e_{\overline{ij}}. \end{aligned} \qquad (29)$$
We can compute $R_j^2$ by the following formula:
$$R_j^2 = \|x^* - a_j\|^2, \qquad (30)$$
where $x^* \in S_{j1} \cup S_{j2}$, $S_{j1} = \{x_p^j \mid 0 < \theta_p < \frac{c_3}{n_j}\}$ and $S_{j2} = \{x_p^{\overline{ij}} \mid 0 < \gamma_p < \frac{c_4}{n_{\overline{ij}}}\}$.
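Continuing the sketch above, Eq. (27) (and likewise (30)) only needs one sample whose multiplier lies strictly inside its box constraint; a minimal helper under the same assumptions, with our own naming, could look like this:

```python
import numpy as np

def radius_from_support_vector(Xi, Xr, alpha, beta, a_i, c1, c2, tol=1e-8):
    """Recover R_i^2 via Eq. (27): take any sample of S_i1 or S_i2 (multiplier
    strictly between its bounds) and measure its squared distance to a_i."""
    ni, nr = Xi.shape[1], Xr.shape[1]
    for p in range(ni):
        if tol < alpha[p] < c1 / ni - tol:            # x_p^i in S_i1
            return float(np.sum((Xi[:, p] - a_i) ** 2))
    for p in range(nr):
        if tol < beta[p] < c2 / nr - tol:             # rest sample in S_i2
            return float(np.sum((Xr[:, p] - a_i) ** 2))
    raise ValueError("no multiplier strictly inside its box constraint")
```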

3.2. Nonlinear Case

We extend the linear Twin Hypersphere-KSVC to the nonlinear case by directly considering a nonlinear map $\varphi: \mathbb{R}^d \to H$ (where $H$ is a high-dimensional Hilbert space), instead of the kernel-generated surfaces used in Twin-KSVC. The corresponding primal problems are:
$$\begin{aligned} \min \quad & R_i^2 - \frac{v_1}{n_j}\sum_{p=1}^{n_j}\|\varphi(x_p^j) - a_i\|^2 + \frac{c_1}{n_i}\sum_{p=1}^{n_i}\eta_p^i + \frac{c_2}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^i, \\ \mathrm{s.t.} \quad & \|\varphi(x_p^i) - a_i\|^2 \le R_i^2 + \eta_p^i, \quad p = 1,\dots,n_i, \\ & \|\varphi(x_p^{\overline{ij}}) - a_i\|^2 \ge R_i^2 - \xi_p^i, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^i \ge 0, \ p = 1,\dots,n_i, \quad \xi_p^i \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_i^2 \ge 0, \end{aligned} \qquad (31)$$
$$\begin{aligned} \min \quad & R_j^2 - \frac{v_2}{n_i}\sum_{p=1}^{n_i}\|\varphi(x_p^i) - a_j\|^2 + \frac{c_3}{n_j}\sum_{p=1}^{n_j}\eta_p^j + \frac{c_4}{n_{\overline{ij}}}\sum_{p=1}^{n_{\overline{ij}}}\xi_p^j, \\ \mathrm{s.t.} \quad & \|\varphi(x_p^j) - a_j\|^2 \le R_j^2 + \eta_p^j, \quad p = 1,\dots,n_j, \\ & \|\varphi(x_p^{\overline{ij}}) - a_j\|^2 \ge R_j^2 - \xi_p^j, \quad p = 1,\dots,n_{\overline{ij}}, \\ & \eta_p^j \ge 0, \ p = 1,\dots,n_j, \quad \xi_p^j \ge 0, \ p = 1,\dots,n_{\overline{ij}}, \quad R_j^2 \ge 0. \end{aligned} \qquad (32)$$
According to the dual theory, one can get the dual optimal problems of (31) and (32) as follows:
$$\begin{aligned} \max \quad & -v_i\,\big(\alpha^T \ \ \beta^T\big)\begin{pmatrix} K(X_i, X_i) & -K(X_i, X_{\overline{ij}}) \\ -K(X_{\overline{ij}}, X_i) & K(X_{\overline{ij}}, X_{\overline{ij}}) \end{pmatrix}\begin{pmatrix}\alpha \\ \beta\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(K(X_i, X_i)) + \frac{2 v_i v_1}{n_j}K(X_i, X_j)\, e_j \\ -\mathrm{diag}(K(X_{\overline{ij}}, X_{\overline{ij}})) - \frac{2 v_i v_1}{n_j}K(X_{\overline{ij}}, X_j)\, e_j \end{pmatrix}^T\begin{pmatrix}\alpha \\ \beta\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_i^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\alpha \\ \beta\end{pmatrix} = 1, \\ & 0 \le \alpha \le \frac{c_1}{n_i}e_i, \qquad 0 \le \beta \le \frac{c_2}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (33)$$
$$\begin{aligned} \max \quad & -v_j\,\big(\theta^T \ \ \gamma^T\big)\begin{pmatrix} K(X_j, X_j) & -K(X_j, X_{\overline{ij}}) \\ -K(X_{\overline{ij}}, X_j) & K(X_{\overline{ij}}, X_{\overline{ij}}) \end{pmatrix}\begin{pmatrix}\theta \\ \gamma\end{pmatrix} + \begin{pmatrix} \mathrm{diag}(K(X_j, X_j)) + \frac{2 v_j v_2}{n_i}K(X_j, X_i)\, e_i \\ -\mathrm{diag}(K(X_{\overline{ij}}, X_{\overline{ij}})) - \frac{2 v_j v_2}{n_i}K(X_{\overline{ij}}, X_i)\, e_i \end{pmatrix}^T\begin{pmatrix}\theta \\ \gamma\end{pmatrix}, \\ \mathrm{s.t.} \quad & \big(e_j^T \ \ -e_{\overline{ij}}^T\big)\begin{pmatrix}\theta \\ \gamma\end{pmatrix} = 1, \\ & 0 \le \theta \le \frac{c_3}{n_j}e_j, \qquad 0 \le \gamma \le \frac{c_4}{n_{\overline{ij}}}e_{\overline{ij}}, \end{aligned} \qquad (34)$$
where $K(\cdot\,,\cdot)$ denotes the kernel matrix.
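The only change relative to (26) and (29) is that the inner-product blocks are replaced by Gram matrices. As a small illustration, the RBF kernel used later in Section 4.1 can be evaluated blockwise as follows (a sketch, assuming samples are stored as rows):

```python
import numpy as np

def rbf_gram(X1, X2, sigma):
    """Gram matrix K(X1, X2) for K(x, y) = exp(-||x - y||^2 / sigma^2).
    X1 is (n1 x d), X2 is (n2 x d); the result is (n1 x n2)."""
    sq_dist = (np.sum(X1 ** 2, axis=1)[:, None]
               + np.sum(X2 ** 2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dist / sigma ** 2)
```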

3.3. Decision Rule

For a testing sample x, each pair of sub-classifiers determines its label by
$$F_{ij}(x) = \begin{cases} +1, & \text{if } \|\varphi(x) - a_i\|^2 / R_i^2 \le \|\varphi(x) - a_j\|^2 / R_j^2, \\ -1, & \text{if } \|\varphi(x) - a_i\|^2 / R_i^2 > \|\varphi(x) - a_j\|^2 / R_j^2, \end{cases} \qquad (35)$$
where
$$\begin{aligned} \|\varphi(x) - a_i\|^2 = {} & K(x, x) - 2 v_i\Big(\sum_{p=1}^{n_i}\alpha_p K(x_p^i, x) - \frac{v_1}{n_j}\sum_{p=1}^{n_j}K(x_p^j, x) - \sum_{p=1}^{n_{\overline{ij}}}\beta_p K(x_p^{\overline{ij}}, x)\Big) \\ & + v_i^2\Big(\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}\alpha_{p_1}\alpha_{p_2}K(x_{p_1}^i, x_{p_2}^i) + \Big(\frac{v_1}{n_j}\Big)^2\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}K(x_{p_1}^j, x_{p_2}^j) + \sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_1}\beta_{p_2}K(x_{p_1}^{\overline{ij}}, x_{p_2}^{\overline{ij}}) \\ & \quad - \frac{2 v_1}{n_j}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_j}\alpha_{p_1}K(x_{p_1}^i, x_{p_2}^j) - 2\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\alpha_{p_1}\beta_{p_2}K(x_{p_1}^i, x_{p_2}^{\overline{ij}}) + \frac{2 v_1}{n_j}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\beta_{p_2}K(x_{p_1}^j, x_{p_2}^{\overline{ij}})\Big) \end{aligned} \qquad (36)$$
and
$$\begin{aligned} \|\varphi(x) - a_j\|^2 = {} & K(x, x) - 2 v_j\Big(\sum_{p=1}^{n_j}\theta_p K(x_p^j, x) - \frac{v_2}{n_i}\sum_{p=1}^{n_i}K(x_p^i, x) - \sum_{p=1}^{n_{\overline{ij}}}\gamma_p K(x_p^{\overline{ij}}, x)\Big) \\ & + v_j^2\Big(\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_j}\theta_{p_1}\theta_{p_2}K(x_{p_1}^j, x_{p_2}^j) + \Big(\frac{v_2}{n_i}\Big)^2\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_i}K(x_{p_1}^i, x_{p_2}^i) + \sum_{p_1=1}^{n_{\overline{ij}}}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_1}\gamma_{p_2}K(x_{p_1}^{\overline{ij}}, x_{p_2}^{\overline{ij}}) \\ & \quad - \frac{2 v_2}{n_i}\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_i}\theta_{p_1}K(x_{p_1}^j, x_{p_2}^i) - 2\sum_{p_1=1}^{n_j}\sum_{p_2=1}^{n_{\overline{ij}}}\theta_{p_1}\gamma_{p_2}K(x_{p_1}^j, x_{p_2}^{\overline{ij}}) + \frac{2 v_2}{n_i}\sum_{p_1=1}^{n_i}\sum_{p_2=1}^{n_{\overline{ij}}}\gamma_{p_2}K(x_{p_1}^i, x_{p_2}^{\overline{ij}})\Big). \end{aligned} \qquad (37)$$
For the testing sample x, the final label can be also determined by vote rule.
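As a brief illustration of the vote rule, the following sketch accumulates the pairwise outputs $F_{ij}(x)$ over all $K(K-1)/2$ pairs and returns the class with the most votes; `pair_output` is a hypothetical callable implementing (35) for a trained pair $(i, j)$.

```python
import numpy as np

def vote_predict(x, pair_output, K):
    """Combine the K(K-1)/2 pairwise decisions by the vote rule.

    pair_output(i, j, x) is assumed to return +1 (vote for class i),
    -1 (vote for class j) or 0 (no vote, 'rest' region)."""
    votes = np.zeros(K)
    for i in range(K):
        for j in range(i + 1, K):
            out = pair_output(i, j, x)
            if out == 1:
                votes[i] += 1
            elif out == -1:
                votes[j] += 1
    return int(np.argmax(votes))
```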

3.4. Analysis of Learning Complexity

Next, we discuss the learning complexity of the proposed Twin Hypersphere-KSVC. We take 4-class classification as an example, assume that the four classes contain approximately equal numbers of samples, and present the learning complexities of K-SVCR, Twin-KSVC and Twin Hypersphere-KSVC in Table 1. The main computational burden of Twin-KSVC consists of solving QPPs and calculating inverse matrices; therefore, the learning complexities of linear and nonlinear Twin-KSVC are $K(K-1)\big(O(d^3) + O((\frac{3}{4}n)^3)\big)$ and $K(K-1)\big(O(n^3) + O((\frac{3}{4}n)^3)\big)$, respectively. In contrast, K-SVCR and Twin Hypersphere-KSVC avoid computing inverse matrices: the learning complexities of linear and nonlinear K-SVCR are both $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$, while the learning complexities of linear and nonlinear Twin Hypersphere-KSVC are both $K(K-1)O((\frac{3}{4}n)^3)$. From the above analysis, we can see that our Twin Hypersphere-KSVC requires less learning time.
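As a back-of-the-envelope check of Table 1 (a sketch; constants and lower-order terms are ignored), the ratios below illustrate how much cheaper the proposed method is for a nonlinear 4-class problem:

```python
# Rough nonlinear-case costs from Table 1 for K = 4 classes and n training samples.
K, n = 4, 1000
twin_ksvc = K * (K - 1) * (n**3 + (0.75 * n)**3)      # matrix inverse + QPP per sub-problem
k_svcr    = K * (K - 1) / 2 * (1.5 * n)**3            # one larger QPP per pair
ths_ksvc  = K * (K - 1) * (0.75 * n)**3               # two smaller QPPs per pair

print(k_svcr / ths_ksvc)     # 4.0
print(twin_ksvc / ths_ksvc)  # about 3.4
```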

4. Experiments

In this section, we investigate the classification performance of our Twin Hypersphere-KSVC and compare it with ITKSVC [31], THKSVM, Twin-KSVC and K-SVCR on a set of benchmark datasets from the UCI repository and on several real engineering problems. All algorithms are implemented in Matlab R2012a; we use the "quadprog.m" function to solve the QPPs and the "inv.m" function to compute matrix inverses.
The parameter selection directly affects the classification performance of the above algorithms. In this section, we use the common exhaustive grid search to determine the parameters. K-SVCR has two penalty parameters $c_i$ ($i = 1, 2$) and the bandwidth parameter $\varepsilon$. Twin-KSVC has five parameters: four penalty parameters $c_i$ ($i = 1, 2, 3, 4$) and the bandwidth parameter $\varepsilon$. THKSVM has two parameters, $c_1$ and $v_1$. ITKSVC has seven parameters: six penalty parameters $c_i$ ($i = 1, \dots, 6$) and the bandwidth parameter $\varepsilon$. Our Twin Hypersphere-KSVC has six parameters: four penalty parameters $c_i$ ($i = 1, 2, 3, 4$) and two parameters $v_i$ ($i = 1, 2$). The optimal values of the penalty parameters $c_i$ are searched from the set $\{2^{-7}, \dots, 2^7\}$, the parameters $v_i$ from $\{0.1, \dots, 0.9\}$, and the bandwidth parameter $\varepsilon$ from the set $\{0, \dots, 0.5\}$.
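The exhaustive search can be sketched as follows; `evaluate` is a hypothetical callable that trains Twin Hypersphere-KSVC with a given parameter setting and returns its 5-fold cross-validation accuracy (the paper performs the equivalent search in MATLAB).

```python
import itertools

def grid_search(evaluate):
    """Enumerate the parameter grids of Section 4 and keep the best setting."""
    c_grid = [2.0 ** k for k in range(-7, 8)]            # c_i from {2^-7, ..., 2^7}
    v_grid = [round(0.1 * k, 1) for k in range(1, 10)]   # v_i from {0.1, ..., 0.9}
    best_score, best_params = float("-inf"), None
    for c1, c2, c3, c4 in itertools.product(c_grid, repeat=4):
        for v1, v2 in itertools.product(v_grid, repeat=2):
            params = {"c1": c1, "c2": c2, "c3": c3, "c4": c4, "v1": v1, "v2": v2}
            score = evaluate(params)
            if score > best_score:
                best_score, best_params = score, params
    return best_params, best_score
```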

4.1. Benchmark Datasets

We compare Twin Hypersphere-KSVC with ITKSVC, THKSVM, Twin-KSVC and K-SVCR on a set of benchmark datasets from the UCI repository in this subsection. The benchmark datasets are summarized in Table 2. Five-fold cross-validation is used to estimate the testing accuracy, and we use the radial basis function kernel $K(x, y) = e^{-\|x - y\|^2/\sigma^2}$ in this subsection, where the parameter $\sigma$ is selected from $\{2^{-7}, \dots, 2^7\}$.
The predicting accuracy and learning time of the five algorithms on the UCI benchmark datasets are shown in Table 3 and Table 4, respectively. From Table 3 and Table 4, one can draw the following conclusions.
(1)
The Twin Hypersphere-KSVC, ITKSVC, Twin-KSVC and K-SVCR obtain better test accuracy than THKSVM. This is mainly because the sub-classifiers in Twin Hypersphere-KSVC, ITKSVC, Twin-KSVC and K-SVCR avoid the class imbalance problem appearing in THKSVM.
(2)
Twin Hypersphere-KSVC trains faster than Twin-KSVC, K-SVCR and ITKSVC. This is mainly because, compared with Twin-KSVC, the sub-classifiers in Twin Hypersphere-KSVC avoid calculating inverse matrices, while compared with K-SVCR and ITKSVC, each pair of sub-classifiers in our Twin Hypersphere-KSVC only needs to solve two smaller QPPs.
(3)
Regarding the predicting accuracy of Twin-KSVC, K-SVCR, ITKSVC and Twin Hypersphere-KSVC, we can observe that no single method is superior to the others on all datasets. We therefore apply the Friedman test to analyze the test accuracies of these classifiers statistically [33,34]. The ranks of the five algorithms on all datasets in terms of test accuracy are presented in Table 5. The Friedman statistic $\chi_F^2$ can be calculated by
$$\chi_F^2 = \frac{12\, m_2}{m_1(m_1 + 1)}\Big[\sum_{k_1=1}^{m_1}\mathrm{rank}_{k_1}^2 - \frac{m_1(m_1 + 1)^2}{4}\Big], \qquad (38)$$
where $\mathrm{rank}_{k_1} = \frac{1}{m_2}\sum_{k_2=1}^{m_2} r_{k_1}^{k_2}$ and $r_{k_1}^{k_2}$ denotes the rank of the $k_1$-th of the $m_1$ classifiers on the $k_2$-th of the $m_2$ datasets. Because $\chi_F^2$ is undesirably conservative, we use the statistic
$$F_F = \frac{(m_2 - 1)\,\chi_F^2}{m_2(m_1 - 1) - \chi_F^2}, \qquad (39)$$
which is distributed according to the $F$ distribution with $(m_1 - 1)$ and $(m_1 - 1)(m_2 - 1)$ degrees of freedom.
We calculate the statistic $F_F = 10.41$, where $F_F \sim F(4, 28)$. For the significance level $\alpha = 0.05$, the critical value $F(4, 28) = 2.95$ is smaller than $F_F$, which means that there are significant differences among these classifiers. We can see from Table 5 that the average rank of Twin Hypersphere-KSVC is slightly higher than that of ITKSVC but lower than those of Twin-KSVC and K-SVCR. This implies that the classification accuracy of Twin Hypersphere-KSVC is slightly lower than that of ITKSVC but noticeably higher than those of Twin-KSVC and K-SVCR.
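For reference, the two statistics (38) and (39) can be reproduced directly from the average ranks in Table 5 ($m_1 = 5$ classifiers, $m_2 = 8$ datasets); the sketch below recovers $F_F \approx 10.41$:

```python
import numpy as np

# Average ranks from Table 5: Twin-KSVC, THKSVM, K-SVCR, ITKSVC, Twin Hypersphere-KSVC
avg_ranks = np.array([3.13, 4.69, 2.81, 2.31, 2.44])
m1, m2 = 5, 8

chi_F2 = 12 * m2 / (m1 * (m1 + 1)) * (np.sum(avg_ranks ** 2) - m1 * (m1 + 1) ** 2 / 4)
F_F = (m2 - 1) * chi_F2 / (m2 * (m1 - 1) - chi_F2)
print(round(chi_F2, 2), round(F_F, 2))   # approximately 19.13 and 10.41
```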

4.2. Handwritten Digits Recognition

We use Twin Hypersphere-KSVC to recognize handwritten digits. The USPS database is used to compare our Twin Hypersphere-KSVC with Twin-KSVC, THKSVM, ITKSVC and K-SVCR. The USPS dataset consists of 8-bit grayscale images of handwritten digits from 0 to 9, as presented in Figure 2. We choose 55 images for each handwritten digit in the USPS database, 550 images in total. We only consider the linear kernel function $K(x, y) = x^T y$, and also use 5-fold cross-validation to estimate the testing accuracy.
The handwritten digit recognition results of the five algorithms are presented in Table 6. From Table 6, we can observe that the proposed Twin Hypersphere-KSVC obtains the best accuracy among all algorithms. In terms of learning time, our Twin Hypersphere-KSVC requires less learning time than ITKSVC, Twin-KSVC and K-SVCR.

4.3. Text Classification

We apply our Twin Hypersphere-KSVC to text classification in this subsection and compare it with the other algorithms on the Reuters21578 dataset. We choose 6 classes from the Reuters21578 dataset, 708 documents in total, as presented in Table 7. We also use the linear kernel function in this subsection.
The experimental results of the five algorithms for text classification are presented in Table 8. From Table 8, we can see that our Twin Hypersphere-KSVC obtains the best accuracy among all multi-classifiers. The proposed Twin Hypersphere-KSVC also runs faster than ITKSVC, Twin-KSVC and K-SVCR.

5. Conclusions

In this paper, we propose a novel multi-class classification algorithm, named Twin Hypersphere-KSVC. Twin Hypersphere-KSVC evaluates each training sample into a 1-vs-1-vs-rest structure, as in Twin-KSVC and K-SVCR, and constructs two hyperspheres in each pair of sub-classifiers, instead of two nonparallel hyperplanes as in Twin-KSVC. Compared with Twin-KSVC, the sub-classifiers in Twin Hypersphere-KSVC avoid computing inverse matrices and, for nonlinear problems, can apply the kernel trick to the linear case directly. The classification results on a set of benchmark datasets from the UCI repository, on handwritten digit recognition and on text classification show that Twin Hypersphere-KSVC achieves better classification performance than the other classical multi-classifiers.

Author Contributions

Investigation, Q.A., Y.W.; Methodology, Q.A., A.W.; Software, Q.A.; Validation, Q.A., W.W.; Supervision, A.W., A.Z.; Funding acquisition, Q.A., A.W.; Writing—original draft, Q.A.; Writing—review and editing, Q.A.

Funding

This research was funded by Natural Science Foundation of Liaoning province in China (20180551048 and 201601291) and Talent Cultivation Project of University of Science and Technology Liaoning in China (2018RC05).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000; pp. 17–34.
  2. Zhang, X. Introduction to statistical learning theory and support vector machines. Acta Automatica Sinica 2000, 26, 32–42.
  3. Long, W.; Song, L.; Tian, Y. A new graphic kernel method of stock price trend prediction based on financial news semantic and structural similarity. Expert Syst. Appl. 2019, 118, 411–424.
  4. Lei, C.; Deng, J.; Cao, K.; Xiao, Y.; Ma, L.; Wang, W.; Ma, T.; Shu, C. A comparison of random forest and support vector machine approaches to predict coal spontaneous combustion in gob. Fuel 2019, 239, 297–311.
  5. Zhao, X.; Zang, W.; Lv, R.; Cui, W. Effective information filtering mining of internet of brain things based on support vector machine. IEEE Access 2019, 7, 191–202.
  6. Xie, F.; Li, F.; Lei, C.; Yang, J.; Zhang, Y. Unsupervised band selection based on artificial bee colony algorithm for hyperspectral image classification. Appl. Soft Comput. 2019, 75, 428–440.
  7. Qiao, X.; Bao, J.; Zhang, H.; Wan, F.; Li, D. Underwater sea cucumber identification based on Principal Component Analysis and Support Vector Machine. Measurement 2019, 133, 444–455.
  8. Maltarollo, V.G.; Kronenberger, T.; Espinoza, G.Z.; Oliveira, P.R.; Honorio, K.M. Advances with support vector machines for novel drug discovery. Expert Opin. Drug Discov. 2019, 14, 23–33.
  9. Jayadeva; Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
  10. Mir, A.; Nasiri, J.A. KNN-based least squares twin support vector machine for pattern classification. Appl. Intell. 2018, 48, 4551–4564.
  11. Wang, H.; Zhou, Z.; Xu, Y. An improved ν-twin bounded support vector machine. Appl. Intell. 2018, 48, 1041–1053.
  12. Shao, Y.; Zhang, C.; Wang, X.; Deng, N. Improvements on twin support vector machines. IEEE Trans. Neural Netw. 2011, 22, 962–968.
  13. Qi, Z.; Tian, Y.; Shi, Y. Structural twin support vector machine for classification. Knowl. Based Syst. 2013, 43, 74–81.
  14. Tian, Y.; Qi, Z.; Ju, X.; Shi, Y.; Liu, X. Nonparallel support vector machines for pattern classification. IEEE Trans. Syst. Man Cybern. 2014, 44, 1067–1079.
  15. Wang, Z.; Shao, Y.; Bai, L.; Deng, N. Twin support vector machine for clustering. IEEE Trans. Neural Netw. 2015, 26, 2583–2588.
  16. Ye, Q.; Zhao, C.; Gao, S.; Zheng, H. Weighted twin support vector machines with local information and its application. Neural Netw. 2012, 35, 31–39.
  17. Peng, X.; Xu, D. Bi-density twin support vector machines for pattern recognition. Neurocomputing 2013, 99, 134–143.
  18. Chen, S.; Wu, X. A new fuzzy twin support vector machine for pattern classification. Int. J. Mach. Learn. Cybern. 2018, 9, 1553–1564.
  19. Xu, Y.; Yang, Z.; Pan, X. A novel twin support-vector machine with pinball loss. IEEE Trans. Neural Netw. 2017, 28, 359–370.
  20. Chen, W.; Shao, Y.; Li, C.; Deng, N. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recognit. 2016, 52, 61–74.
  21. Tang, J.; Li, D.; Tian, Y.; Liu, D. Multi-view learning based on nonparallel support vector machine. Knowl. Based Syst. 2018, 158, 94–108.
  22. Tang, L.; Tian, Y.; Yang, C. Nonparallel support vector regression model and its SMO-type solver. Neural Netw. 2018, 105, 431–446.
  23. Xie, X. Improvement on projection twin support vector machine. Neural Comput. Appl. 2018, 30, 371–387.
  24. Tang, L.; Tian, Y.; Yang, C.; Pardalos, P.M. Ramp-loss nonparallel support vector regression: Robust, sparse and scalable approximation. Knowl. Based Syst. 2018, 147, 55–67.
  25. Peng, X.; Xu, D. A twin-hypersphere support vector machine classifier and the fast learning algorithm. Inf. Sci. 2013, 221, 12–27.
  26. Xu, Y.; Wang, Q.; Pang, X.; Tian, Y. Maximum margin of twin spheres machine with pinball loss for imbalanced data classification. Appl. Intell. 2018, 48, 23–34.
  27. Peng, X.; Shen, J. A twin-hyperspheres support vector machine with automatic variable weights for data classification. Inf. Sci. 2017, 417, 216–235.
  28. Ai, Q.; Wang, A.; Wang, Y.; Sun, H. Improvements on twin-hypersphere support vector machine using local density information. Prog. Artif. Intell. 2018, 7, 167–175.
  29. Angulo, C.; Parra, X.; Catala, A. K-SVCR. A support vector machine for multi-class classification. Neurocomputing 2003, 55, 57–77.
  30. Xu, Y.; Guo, R.; Wang, L. A twin multi-class classification support vector machine. Cogn. Comput. 2013, 5, 580–588.
  31. Ai, Q.; Wang, A.; Wang, Y.; Sun, H. An improved Twin-KSVC with its applications. Neural Comput. Appl. 2018.
  32. Xu, Y.; Guo, R. A twin hyper-sphere multi-class classification support vector machine. J. Intell. Fuzzy Syst. 2014, 27, 1783–1790.
  33. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
  34. García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
Figure 1. (a) K-SVCR. (b) Twin-KSVC. (c) THKSVM. (d) Twin Hypersphere-KSVC.
Figure 2. Illustration of 10 digits in the USPS dataset.
Table 1. The learning complexities of Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC.
Kernel             Twin-KSVC                                       K-SVCR                                  Twin Hypersphere-KSVC
linear kernel      $K(K-1)\big(O(d^3)+O((\frac{3}{4}n)^3)\big)$    $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$   $K(K-1)O((\frac{3}{4}n)^3)$
nonlinear kernel   $K(K-1)\big(O(n^3)+O((\frac{3}{4}n)^3)\big)$    $\frac{K(K-1)}{2}O((\frac{3}{2}n)^3)$   $K(K-1)O((\frac{3}{4}n)^3)$
Table 2. The statistics of benchmark datasets.
Dataset                #Attributes   #Samples   #Classes
Wine                   13            178        3
Iris                   4             150        3
Ecoli                  7             327        5
Soybean                35            47         4
Hayes-roth             5             132        3
Teaching-evaluation    5             151        3
Dermatology            34            358        6
Balance                4             625        3
Table 3. Classification accuracy of THKSVM, ITKSVC, Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC on UCI benchmark datasets.
Dataset                Twin-KSVC (%)    THKSVM (%)       K-SVCR (%)       ITKSVC (%)       Twin Hypersphere-KSVC (%)
Wine                   97.29 ± 2.28     92.24 ± 10.22    97.96 ± 2.47     97.53 ± 2.20     98.30 ± 2.55
Iris                   93.47 ± 3.67     91.20 ± 4.52     95.47 ± 3.07     95.53 ± 4.03     96.00 ± 3.06
Ecoli                  84.33 ± 4.60     77.75 ± 2.80     87.34 ± 3.07     85.63 ± 3.70     83.04 ± 4.21
Soybean                98.00 ± 4.47     100.0 ± 0.0      100.0 ± 0.0      100.0 ± 0.0      100.0 ± 0.0
Hayes-roth             61.07 ± 7.90     39.40 ± 2.05     54.02 ± 8.31     71.28 ± 9.92     52.50 ± 8.21
Teaching-evaluation    56.43 ± 8.23     50.28 ± 11.71    51.91 ± 8.34     52.02 ± 6.27     59.44 ± 7.52
Dermatology            95.43 ± 2.19     70.18 ± 3.16     97.66 ± 1.82     94.19 ± 2.74     96.64 ± 2.39
Balance                96.92 ± 1.29     90.05 ± 2.94     94.37 ± 1.96     97.73 ± 1.19     90.91 ± 1.34
The bolded classification accuracy is the highest one in all multi-classifiers.
Table 4. Learning time of THKSVM, ITKSVC, Twin-KSVC, K-SVCR and Twin Hypersphere-KSVC on UCI benchmark datasets.
Dataset                Twin-KSVC (s)   THKSVM (s)   K-SVCR (s)   ITKSVC (s)   Twin Hypersphere-KSVC (s)
Wine                   2.1985          0.2749       2.5362       1.8673       1.1457
Iris                   1.7260          0.2533       2.0569       1.5347       0.7429
Ecoli                  38.7993         1.0329       62.2968      27.2987      15.1017
Soybean                1.1527          0.1787       0.4666       0.6355       0.5374
Hayes-roth             2.4358          0.1848       1.7696       1.1364       0.5671
Teaching-evaluation    3.6512          0.1849       1.7180       1.2330       1.0899
Dermatology            114.424         1.0811       89.2057      41.0467      19.6405
Balance                45.2907         3.7804       68.7381      43.7113      13.5046
Table 5. The rank of five algorithms on UCI benchmark datasets in the light of classification accuracy.
Dataset                Twin-KSVC   THKSVM   K-SVCR   ITKSVC   Twin Hypersphere-KSVC
Wine                   4           5        2        3        1
Iris                   4           5        3        2        1
Ecoli                  3           5        1        2        4
Soybean                5           2.5      2.5      2.5      2.5
Hayes-roth             2           5        3        1        4
Teaching-evaluation    2           5        4        3        1
Dermatology            3           5        1        4        2
Balance                2           5        3        1        4
Average                3.13        4.69     2.81     2.31     2.44
Table 6. Classification results of Twin Hypersphere-KSVC, Twin-KSVC, THKSVM, K-SVCR and ITKSVC for handwritten digits recognition on the USPS dataset.
Algorithms          Twin-KSVC       THKSVM          K-SVCR          ITKSVC          Twin Hypersphere-KSVC
Accuracy (%)        53.13 ± 5.38    72.55 ± 3.55    85.09 ± 2.99    77.18 ± 4.24    85.75 ± 2.15
Learning time (s)   377.546         32.1101         641.946         348.6982        342.983
Table 7. The statistics of text classification dataset.
Label              Cocoa   Coffee   Corn   Rice   Rubber   Soybean
Training dataset   52      101      176    46     35       83
Test dataset       23      44       76     21     15       36
Table 8. Classification results of Twin Hypersphere-KSVC, Twin-KSVC, THKSVM, K-SVCR and ITKSVC for text classification on the Reuters21578 dataset.
Algorithms          Twin-KSVC   THKSVM    K-SVCR    ITKSVC    Twin Hypersphere-KSVC
F1                  0.4474      0.5932    0.7076    0.7078    0.7414
Learning time (s)   36.9486     0.4758    68.7947   26.4216   11.9575
