Article

Lagrangian Regularized Twin Extreme Learning Machine for Supervised and Semi-Supervised Classification

School of Mathematics and Information Sciences, North Minzu University, Yinchuan 750021, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(6), 1186; https://doi.org/10.3390/sym14061186
Submission received: 21 May 2022 / Revised: 31 May 2022 / Accepted: 6 June 2022 / Published: 9 June 2022
(This article belongs to the Special Issue Adaptive Filtering and Machine Learning)

Abstract:
Twin extreme learning machine (TELM) is a symmetry-based improvement of the traditional extreme learning machine (ELM) classification algorithm. Although TELM has been widely researched and applied in the field of machine learning, the need to solve two quadratic programming problems (QPPs) has greatly limited its development. In this paper, we propose a novel TELM framework called the Lagrangian regularized twin extreme learning machine (LRTELM). One significant advantage of our LRTELM over TELM is that the structural risk minimization principle is implemented by introducing a regularization term. Meanwhile, we consider the square of the $\ell_2$-norm of the vector of slack variables instead of the usual $\ell_1$-norm in order to make the objective functions strongly convex. Furthermore, a simple and fast iterative algorithm is designed for solving LRTELM, which only needs to iteratively solve a pair of linear equations, thereby avoiding solving two QPPs. Last, we extend LRTELM to semi-supervised learning by introducing manifold regularization to improve the performance of LRTELM when insufficient labeled samples are available, obtaining a Lagrangian semi-supervised regularized twin extreme learning machine (Lap-LRTELM). Experimental results on most datasets show that the proposed LRTELM and Lap-LRTELM are competitive in terms of accuracy and efficiency compared to state-of-the-art algorithms.

1. Introduction

Extreme Learning Machine (ELM) was first proposed by Huang et al. [1,2] as a novel training algorithm for single hidden layer feed-forward networks (SLFNs). Because ELM randomly generates the input weights and biases of the hidden layer, it has the advantages of simple structure, low computational cost, and good generality compared with traditional neural network algorithms [3]. Thanks to its fast learning, good generalization, and universal approximation capability [4,5,6,7,8,9,10], ELM has been used in many fields in recent years, such as bioinformatics [4,5], computer vision [6], data mining [7], robotics [8], and engineering applications [10].
Recently, ELM has been intensively studied by many researchers, and many variants have been proposed. For example, the optimization extreme learning machine (OELM) was presented by Huang et al. [11]. In [12], Yang and Zhang proposed the smooth extreme learning machine (SMELM) by applying smoothing techniques. Simultaneously, in [13], Yang and Zhang suggested a new sparse extreme learning machine (SPELM). In addition, a unified learning framework for different applications was introduced by Huang et al. [14]. Although the above improved versions of ELM have achieved good results, they are all supervised learning algorithms. To overcome this shortcoming of ELM, a novel semi-supervised extreme learning machine (SS-ELM) was suggested in [15]. Subsequently, a robust SS-ELM (RSS-ELM) was proposed by Pei et al. to overcome the effect of outliers on SS-ELM, while a Lagrangian semi-supervised extreme learning machine (LELM) for pattern recognition was proposed by Ma et al. [16].
In recent years, Jayadeva et al. [17] proposed an excellent machine learning tool for classification tasks called the twin support vector machine (TSVM). Due to its superior performance, many variants of TSVM have been proposed [18,19,20,21,22]. It is well known that a significant advantage of support vector machines (SVMs) is the implementation of the structural risk minimization (SRM) principle. However, only the empirical risk minimization (ERM) principle is considered in the standard TSVM learning framework. To improve the performance of TSVM, Shao et al. proposed the twin bounded support vector machine (TBSVM), in which the SRM principle is implemented by introducing regularization terms. Inspired by TSVM, Wan et al. proposed the twin extreme learning machine (TELM). Similar to TSVM, TELM only considers the ERM principle.
Although the above SVM-based algorithms achieve good results, they can carry a heavy computational burden during training because they need to solve quadratic programming problems (QPPs). To overcome this challenge, Mangasarian et al. [23] proposed a computationally powerful machine learning algorithm called the Lagrangian support vector machine (LSVM), which minimizes an unconstrained differentiable convex function in a space whose dimension equals the number of classified points. Recently, researchers have extended the idea of LSVM to TSVM and its variants, achieving excellent results [15,23,24,25,26,27,28,29,30]. Several representative works can be briefly reviewed, such as the Lagrangian twin support vector machine (LTSVM) proposed by Balasundaram et al. [24] and the efficient weighted Lagrangian twin support vector machine (WLTSVM) for imbalanced classification proposed by Shao et al. [25]. It is well known that the performance of supervised learning algorithms tends to deteriorate when supervised information is insufficient. An effective approach to this problem is semi-supervised learning (SSL), which makes use of the geometric information embedded in unlabeled samples [31,32,33,34,35,36,37]. Over the past decades, researchers have presented various SSL methods from different perspectives and achieved promising results, such as the Laplacian support vector machine (Lap-SVM) [31], the Laplacian twin support vector machine (Lap-TSVM) [20], the semi-supervised extreme learning machine (SS-ELM) [36], and more.
Inspired by the above excellent works, this paper proposes a new TELM learning framework, namely, a Lagrangian regularized twin extreme learning machine (LRTELM), which is based on optimization theory and the structural risk minimization principle. We then extend LRTELM to semi-supervised learning by introducing manifold regularization, obtaining a Lagrangian semi-supervised regularized twin extreme learning machine (Lap-LRTELM) that improves performance when labeled samples are insufficient. Lap-LRTELM can effectively exploit the geometric information embedded in the distribution of unlabeled samples in order to improve the generalization performance of LRTELM. Experimental results on various datasets show that the proposed algorithms, LRTELM and Lap-LRTELM, are competitive in terms of accuracy and efficiency when compared with state-of-the-art learning algorithms.
In particular, the major contributions of this paper are as follows:
(1)
Two effective and reliable learning frameworks based on TELM are proposed, namely, Lagrangian regularized twin extreme learning machine (LRTELM) and Laplacian Lagrangian regularized twin extreme learning machine (Lap-LRTELM).
(2)
LRTELM and Lap-LRTELM implement the principle of structural risk minimization by introducing regularization terms into the objective function. We consider the square of the $\ell_2$-norm of the vector of slack variables instead of the usual $\ell_1$-norm, as in TELM, to make the objective functions strongly convex.
(3)
Two fast, simple, and efficient algorithms are designed to solve the LRTELM and Lap-LRTELM, respectively. These two algorithms only need to solve two linear equations separately, avoiding the need to solve a pair of QPPs, as in TELM. The resulting iterative algorithms globally converge and have a lower computational burden.
(4)
Experimental results on a variety of datasets show that our algorithms, LRTELM and Lap-LRTELM, are competitive with other algorithms in terms of accuracy and efficiency.
The remainder of this paper is organised as follows. We briefly review ELM, TELM, and the framework of manifold regularization in Section 2. In Section 3 we describe LRTELM in detail, while Section 4 describes the details of Lap-LRTELM. Experimental results are presented in Section 5; finally, concluding remarks are provided in Section 6.

2. Related Work

In this section, we provide a brief overview of ELM, TELM, and the manifold regularization (MR) framework.

2.1. ELM

It is well known that the original ELM was proposed by Huang et al. [1]. The ELM structure consists of an input layer, a hidden layer, and an output layer. The central idea of ELM is to randomly initialize the hidden layer parameters and keep them fixed, with no iterative adjustment. The input data are then transformed from the input space into the high-dimensional hidden layer feature space by the ELM feature mapping.
Let $T_l = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (\mathbb{R}^n, \mathcal{Y}^m)^l$ be the training set of a binary classification problem. Based on the theory of ELM, we have:
$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_l) \end{bmatrix} = \begin{bmatrix} g(w_1^T x_1 + b_1) & \cdots & g(w_L^T x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1^T x_l + b_1) & \cdots & g(w_L^T x_l + b_L) \end{bmatrix}_{l \times L}$$
where $g(\cdot)$ is the activation function, $w = (w_1, w_2, \ldots, w_L)$ are the input weights, and $b = (b_1, b_2, \ldots, b_L)$ are the hidden layer biases. The hidden layer parameters $(w, b)$ can be randomly generated according to any continuous probability distribution.
It is well known that the standard ELM attempts to approximate these l samples with zero error, and can therefore be expressed as
$$\min_{\beta}\; \|H\beta - Y\|_2^2 \tag{1}$$
where $\beta = [\beta_1, \beta_2, \ldots, \beta_L]^T \in \mathbb{R}^{L \times m}$ is the output weight matrix and $Y = [y_1, y_2, \ldots, y_l]^T$ is the following label matrix:
$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_l^T \end{bmatrix} = \begin{bmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{l1} & \cdots & y_{lm} \end{bmatrix}_{l \times m}.$$
Obviously, the above optimization problem (1) can obtain the optimal solution by solving H β = Y . Thus, we have
$$\beta^* = H^\dagger Y$$
where $H^\dagger$ represents the Moore–Penrose generalized inverse matrix of $H$.
In contrast to traditional learning algorithms, ELM requires a criterion for both minimizing training error and for minimizing output weights:
$$\min_{\beta}\; \frac{C}{2}\|H\beta - Y\|^2 + \frac{1}{2}\|\beta\|^2 \tag{3}$$
where $C$ is the regularization parameter. Intuitively, by setting the gradient of (3) with respect to $\beta$ to zero, we have
$$\beta^* = \begin{cases} \left(H^T H + \frac{I}{C}\right)^{-1} H^T Y, & \text{if } l > L, \\ H^T \left(H H^T + \frac{I}{C}\right)^{-1} Y, & \text{if } l \le L. \end{cases}$$
Thus, the output function of ELM is
$$f(x) = \sum_{i=1}^{L} g(w_i, b_i, x)\,\beta_i = h(x) \cdot \beta$$
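The regularized ELM procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch under my own naming (not the authors' code); it uses a sigmoid activation, and the two branches correspond to the two cases of the closed-form output weights, chosen by whether the number of samples $l$ exceeds the number of hidden nodes $L$:

```python
import numpy as np

def elm_train(X, Y, L=50, C=1.0, rng=np.random.default_rng(0)):
    """Regularized ELM: random hidden layer plus ridge solution for beta."""
    n = X.shape[1]
    W = rng.standard_normal((n, L))            # random input weights w
    b = rng.standard_normal(L)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid activation g(.)
    l = H.shape[0]
    if l > L:    # many samples: solve in the L-dimensional space
        beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ Y)
    else:        # few samples: solve in the l-dimensional space
        beta = H.T @ np.linalg.solve(H @ H.T + np.eye(l) / C, Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Note that only `beta` is learned; `W` and `b` are drawn once and never adjusted, which is what makes ELM training a single linear solve.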

2.2. TELM

Let the training set $T_l = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (\mathbb{R}^n, \mathcal{Y})^l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathcal{Y} = \{-1, 1\}$, $i = 1, \ldots, l$; $T_l$ contains $m_1$ positive-class and $m_2$ negative-class samples, where $l = m_1 + m_2$. In particular, we use matrices $H_1$ and $H_2$ to represent the positive-class and negative-class samples, respectively. Thus, we have
$$H_1 = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_{m_1}) & \cdots & h_L(x_{m_1}) \end{bmatrix}, \quad H_2 = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_{m_2}) & \cdots & h_L(x_{m_2}) \end{bmatrix}$$
where $h_i(x) = G(w_i, b_i, x) = w_i \cdot x + b_i$, $i = 1, \ldots, L$.
Inspired by TSVM, a novel twin extreme learning machine (TELM) was proposed by Wang et al. [38]. Specifically, TELM first utilizes the random feature mapping mechanism to construct the feature space, then a pair of nonparallel separating hyperplanes are learned for the final classification. As with TSVM, for each hyperplane TELM jointly minimizes its distance from one class and requires it to move away from the other. Therefore, we have:
$$f_1(x) = \beta_1 \cdot h(x) = 0, \tag{5}$$
and
$$f_2(x) = \beta_2 \cdot h(x) = 0. \tag{6}$$
We can determine f 1 ( x ) and f 2 ( x ) by solving the following two quadratic programming problems (QPPs):
$$\begin{aligned} \min_{\beta_1, \xi}\;& \frac{1}{2}\|H_1\beta_1\|^2 + C_1 e_2^T \xi \\ \text{s.t.}\;& -H_2\beta_1 + \xi \ge e_2, \quad \xi \ge 0 \end{aligned} \tag{7}$$
and
$$\begin{aligned} \min_{\beta_2, \eta}\;& \frac{1}{2}\|H_2\beta_2\|^2 + C_2 e_1^T \eta \\ \text{s.t.}\;& H_1\beta_2 + \eta \ge e_1, \quad \eta \ge 0 \end{aligned} \tag{8}$$
where ξ and η are slack vectors, C 1 , C 2 > 0 are regularization parameters, and e 1 R m 1 and e 2 R m 2 are vectors of ones.
The differences between ELM, TELM, and LRTELM are shown in Table 1.

2.3. Manifold Regularization Framework

Let the semi-supervised training set $T = T_l \cup T_u = \{(x_i, y_i)\}_{i=1}^{l} \cup \{x_i\}_{i=l+1}^{l+u}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$; $T_l$ denotes the set of $l$ labeled samples and $T_u$ denotes the set of $u$ unlabeled samples. In the supervised learning case, $u = 0$.
Belkin et al. [31] proposed the manifold regularization framework
$$f^* = \arg\min_{f \in \mathcal{H}_K}\; \frac{1}{l}\sum_{i=1}^{l} loss(x_i, y_i, f(x_i)) + \gamma_K \|f\|_H^2 + \gamma_I \|f\|_I^2 \tag{9}$$
where $loss(\cdot)$ is the loss function, $\|f\|_H^2$ is the complexity regularizer, $\gamma_K$ and $\gamma_I$ are nonnegative regularization parameters, and $\|f\|_I^2$ is the manifold regularizer, which takes the following empirical form:
$$\|f\|_I^2 = \frac{1}{(l+u)^2}\, \mathbf{f}^T L \mathbf{f} \tag{10}$$
where $L = D - W$ is the graph Laplacian, $D$ is the diagonal matrix given by $D_{ii} = \sum_{j=1}^{l+u} W_{ij}$ and $D_{ij} = 0$ for $i \ne j$, and the normalizing coefficient $\frac{1}{(l+u)^2}$ is the natural scale factor for the empirical estimate of the Laplace operator.
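The graph Laplacian $L = D - W$ used by the manifold regularizer can be built as follows. This is an illustrative NumPy sketch (the kNN heat-kernel weight matrix and all parameter names are my own assumptions, not prescribed by the paper):

```python
import numpy as np

def graph_laplacian(X, k=5, sigma=1.0):
    """Build a kNN heat-kernel weight matrix W and its Laplacian L = D - W."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                 # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                                # symmetrize the graph
    D = np.diag(W.sum(axis=1))                            # D_ii = sum_j W_ij
    return D - W
```

By construction every row of $L$ sums to zero, so the regularizer $\mathbf{f}^T L \mathbf{f} / (l+u)^2$ vanishes for constant predictions and grows when neighboring samples receive different outputs.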

3. Lagrangian Regularized Twin Extreme Learning Machine

In this section, we introduce the formulation of our method and then propose a program to solve its objective function.

3.1. LRTELM

Inspired by TSVM and LELM, we construct the following optimization problem:
$$\begin{aligned} \min_{\beta_1, \xi}\;& \frac{1}{2}\|H_1\beta_1\|^2 + \frac{C_1}{2}\|\beta_1\|^2 + \frac{C_2}{2}\xi^T\xi \\ \text{s.t.}\;& -H_2\beta_1 + \xi \ge e_2 \end{aligned} \tag{11}$$
and
$$\begin{aligned} \min_{\beta_2, \eta}\;& \frac{1}{2}\|H_2\beta_2\|^2 + \frac{C_1}{2}\|\beta_2\|^2 + \frac{C_2}{2}\eta^T\eta \\ \text{s.t.}\;& H_1\beta_2 + \eta \ge e_1 \end{aligned} \tag{12}$$
where ξ and η are slack vectors, C 1 and C 2 are regularization parameters, and e 1 R m 1 and e 2 R m 2 are vectors of ones.
In the new objective function of (11), the first term is the same as in [38]; optimizing it forces the positive-class training points to be as close as possible to the hyperplane $f_1$. The second term, $\frac{C_1}{2}\|\beta_1\|^2$, is a regularization term that captures the structural risk, ensuring generalization and avoiding overfitting. Minimizing the third term pushes the negative-class samples as far as possible from the positive-class hyperplane $f_1$. A similar interpretation applies to problem (12).
The Lagrange function of the optimization problem (11) is
$$L(\beta_1, \xi, \alpha) = \frac{1}{2}\|H_1\beta_1\|^2 + \frac{C_1}{2}\|\beta_1\|^2 + \frac{C_2}{2}\xi^T\xi - \alpha^T(-H_2\beta_1 + \xi - e_2) \tag{13}$$
where α is the Lagrange multipliers vector.
Thus, we have:
$$\frac{\partial L}{\partial \beta_1} = H_1^T H_1 \beta_1 + C_1\beta_1 + H_2^T\alpha = 0, \tag{14a}$$
$$\frac{\partial L}{\partial \xi} = C_2\xi - \alpha = 0, \tag{14b}$$
$$\alpha \ge 0, \tag{14c}$$
$$\alpha^T(-H_2\beta_1 + \xi - e_2) = 0. \tag{14d}$$
Substituting (14a) and (14d) into (13), we can obtain the dual form of (11):
$$\min_{\alpha \ge 0}\; \frac{1}{2}\alpha^T Q_1 \alpha - e_2^T\alpha \tag{15}$$
where $Q_1 = H_2(H_1^T H_1 + C_1 I)^{-1} H_2^T$.
In the same way, we can obtain
$$\min_{\theta \ge 0}\; \frac{1}{2}\theta^T Q_2 \theta - e_1^T\theta \tag{16}$$
where $Q_2 = H_1(H_2^T H_2 + C_1 I)^{-1} H_1^T$.
Using the KKT necessary and sufficient optimality conditions, the dual problems can be reformulated as the following classical nonlinear complementarity problems:
$$0 \le \alpha \;\perp\; (Q_1\alpha - e_2) \ge 0 \tag{17}$$
and
$$0 \le \theta \;\perp\; (Q_2\theta - e_1) \ge 0 \tag{18}$$
Further, per [23] we have:
$$0 \le x \;\perp\; y \ge 0 \;\Longleftrightarrow\; y = (y - ax)_+, \quad \text{for any } a > 0, \tag{19}$$
where x and y are real vectors.
Thus, (17) and (18) can be rewritten as follows:
$$Q_1\alpha - e_2 = ((Q_1\alpha - e_2) - \eta\alpha)_+ \tag{20}$$
and
$$Q_2\theta - e_1 = ((Q_2\theta - e_1) - \lambda\theta)_+ \tag{21}$$
where η > 0 and λ > 0 .
To obtain the solutions of problems (20) and (21), we apply the following simple iterative schemes:
$$\alpha^{i+1} = Q_1^{-1}\left(e_2 + ((Q_1\alpha^i - e_2) - \eta\alpha^i)_+\right), \quad i = 0, 1, 2, \ldots \tag{22}$$
and
$$\theta^{j+1} = Q_2^{-1}\left(e_1 + ((Q_2\theta^j - e_1) - \lambda\theta^j)_+\right), \quad j = 0, 1, 2, \ldots \tag{23}$$
where $0 < \eta < \frac{2}{C_1}$ and $0 < \lambda < \frac{2}{C_1}$.
Thus, we can obtain the following decision function:
$$f(x) = \operatorname{sign}\left(\frac{\beta_1 \cdot h(x)}{\|\beta_1\|} + \frac{\beta_2 \cdot h(x)}{\|\beta_2\|}\right).$$
Based on the above discussion, the LRTELM is summarized as Algorithm 1.
Algorithm 1 Training LRTELM
  • Input: Training set $T_l = \{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$; activation function $G(x)$; number of hidden nodes $L$; regularization parameters $C_1, C_2$; fix $\eta = \lambda = 1.9/C_1$.
  • Output: The decision function of LRTELM, $f(x)$.
  • Initiate: Start with any α 0 , θ 0 and set the iterator i = 0 .
  • Process:
  • 1. Randomly assign input weights $w$ and biases $b$;
  • 2. Calculate the hidden layer output matrices $H_1$ and $H_2$;
  • 3. Compute $Q_1 = H_2(H_1^T H_1 + C_1 I)^{-1} H_2^T$ and $Q_2 = H_1(H_2^T H_2 + C_1 I)^{-1} H_1^T$;
  • 4. Via (22) and (23) calculate α and θ , respectively;
  • 5. Compute $\beta_1$ and $\beta_2$ by $\beta_1 = -(H_1^T H_1 + C_1 I)^{-1} H_2^T \alpha$ and $\beta_2 = (H_2^T H_2 + C_1 I)^{-1} H_1^T \theta$;
  • Return: The decision function $f(x) = \operatorname{sign}\left(\frac{\beta_1 \cdot h(x)}{\|\beta_1\|} + \frac{\beta_2 \cdot h(x)}{\|\beta_2\|}\right)$ of LRTELM.
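The core of Algorithm 1 can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the random feature map is omitted ($H_1$, $H_2$ are taken as given), all names are my own, and the sign of $\beta_1$ follows the KKT stationarity condition for problem (11):

```python
import numpy as np

def lagrangian_solve(Q, e, eta, max_iter=1000, tol=1e-10):
    """Fixed-point iteration a^{i+1} = Q^{-1}(e + ((Q a^i - e) - eta a^i)_+)."""
    Q_inv = np.linalg.inv(Q)
    a = np.zeros_like(e)
    for _ in range(max_iter):
        a_next = Q_inv @ (e + np.maximum((Q @ a - e) - eta * a, 0.0))
        if np.linalg.norm(a_next - a) < tol:
            break
        a = a_next
    return a_next

def lrtelm_train(H1, H2, C1=1.0):
    """Solve the two LRTELM subproblems; returns output weights beta1, beta2."""
    L = H1.shape[1]
    eta = 1.9 / C1                                # step parameter, as in Algorithm 1
    A1 = np.linalg.inv(H1.T @ H1 + C1 * np.eye(L))
    A2 = np.linalg.inv(H2.T @ H2 + C1 * np.eye(L))
    Q1 = H2 @ A1 @ H2.T
    Q2 = H1 @ A2 @ H1.T
    alpha = lagrangian_solve(Q1, np.ones(H2.shape[0]), eta)
    theta = lagrangian_solve(Q2, np.ones(H1.shape[0]), eta)
    beta1 = -A1 @ H2.T @ alpha                    # positive-class hyperplane
    beta2 = A2 @ H1.T @ theta                     # negative-class hyperplane
    return beta1, beta2
```

Each outer step costs one matrix-vector product and one projection onto the nonnegative orthant; the inverses are computed once, which is the source of LRTELM's speed advantage over QP-based TELM.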

3.2. Convergence Analysis

Theorem 1.
(Global Convergence of LRTELM) Let $Q_1$ and $Q_2$ be two symmetric positive definite matrices and assume that
$$0 < \eta < \frac{2}{C_1}$$
and
$$0 < \lambda < \frac{2}{C_1}$$
hold. Then, starting from arbitrary $\alpha^0$ and $\theta^0$, the iterative schemes (22) and (23) converge to the unique solutions $\bar{\alpha}$ and $\bar{\theta}$, respectively. Specifically, we have
$$\|Q_1\alpha^{i+1} - Q_1\bar{\alpha}\| \le \|I - \eta Q_1^{-1}\| \cdot \|Q_1\alpha^i - Q_1\bar{\alpha}\|$$
and
$$\|Q_2\theta^{j+1} - Q_2\bar{\theta}\| \le \|I - \lambda Q_2^{-1}\| \cdot \|Q_2\theta^j - Q_2\bar{\theta}\|.$$
Proof of Theorem 1.
Here, we use (15) as an example to prove Theorem 1. Suppose α ¯ is the solution to (15); then, it must satisfy optimality condition (17) for any η > 0 . Thus, we have
$$Q_1\alpha^{i+1} - e_2 = ((Q_1\alpha^i - e_2) - \eta\alpha^i)_+ \tag{31}$$
and
$$Q_1\bar{\alpha} - e_2 = ((Q_1\bar{\alpha} - e_2) - \eta\bar{\alpha})_+. \tag{32}$$
From (31) and (32) we can obtain:
$$Q_1\alpha^{i+1} - Q_1\bar{\alpha} = ((Q_1\alpha^i - e_2) - \eta\alpha^i)_+ - ((Q_1\bar{\alpha} - e_2) - \eta\bar{\alpha})_+. \tag{33}$$
Per [39], the distance between any two points in R n is not less than the distance between their projections on any convex set in R n . Thus, we have the following inequality:
$$\|Q_1\alpha^{i+1} - Q_1\bar{\alpha}\| \le \|(Q_1 - \eta I)(\alpha^i - \bar{\alpha})\| \le \|I - \eta Q_1^{-1}\| \cdot \|Q_1(\alpha^i - \bar{\alpha})\|. \tag{34}$$
If η is selected such that
$$\|I - \eta Q_1^{-1}\| < 1 \tag{35}$$
then (34) is a contraction and the algorithm converges.
Now, we only need to prove $\|I - \eta Q_1^{-1}\| < 1$. Using the eigenvalue decomposition, $Q_1$ can be represented as $M^T \Lambda M$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_m)$ and $M$ is a unitary matrix. Then,
$$\|I - \eta Q_1^{-1}\| = \|I - \eta (M^T \Lambda M)^{-1}\| = \|M^T M - \eta M^T \Lambda^{-1} M\| = \|M^T(I - \eta\Lambda^{-1})M\| = \|I - \eta\Lambda^{-1}\| \tag{36}$$
Since $\|I - \eta\Lambda^{-1}\| < 1$ holds exactly when $|1 - \eta/\lambda_i| < 1$ for every eigenvalue $\lambda_i$, condition (35) is equivalent to the following inequalities:
$$-1 < 1 - \frac{\eta}{\lambda_{min}(Q_1)} < 1 \tag{37}$$
where the inequalities in (37) can be written as
$$0 < \frac{\eta}{\lambda_{min}(Q_1)} < 2. \tag{38}$$
From (38), we can obtain the following condition:
$$0 < \eta < 2\lambda_{min}(Q_1). \tag{39}$$
Due to $Q_1 = H_2(H_1^T H_1 + C_1 I)^{-1} H_2^T$ being a symmetric positive definite matrix, we have:
$$\frac{1}{C_1} \le \lambda_{min}\left(H_2(H_1^T H_1 + C_1 I)^{-1} H_2^T\right) \tag{40}$$
which establishes (40). If $\eta$ is selected such that
$$0 < \eta < \frac{2}{C_1}, \tag{41}$$
then (34) is satisfied and the iterative method converges. □
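The spectral condition at the heart of the proof, namely that the contraction factor $\|I - \eta Q^{-1}\|$ drops below 1 exactly when $0 < \eta < 2\lambda_{min}(Q)$, can be checked numerically. This is an illustrative sketch of my own, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + np.eye(5)                    # symmetric positive definite test matrix
lmin = np.linalg.eigvalsh(Q).min()

def contraction_norm(eta):
    # spectral norm of I - eta * Q^{-1}; eigenvalues are 1 - eta / lambda_i
    return np.linalg.norm(np.eye(5) - eta * np.linalg.inv(Q), 2)

assert contraction_norm(1.9 * lmin) < 1    # eta below 2*lambda_min: contraction
assert contraction_norm(2.1 * lmin) >= 1   # eta past the bound: no contraction
```

Since the eigenvalues of $I - \eta Q^{-1}$ are $1 - \eta/\lambda_i$, the worst case occurs at $\lambda_{min}$, which is exactly why the bound in (39) involves the smallest eigenvalue.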

3.3. Compare with Other Relevant Methods

  • LRTELM vs. TELM
It is easy to see that both LRTELM and TELM [40] aim to find two non-parallel hyperplanes, (5) and (6). Comparing LRTELM and TELM, there are three main differences:
(1)
In the primal problem of TELM, only the empirical risk minimization (ERM) is considered. However, our proposed LRTELM implements the SRM principle.
(2)
TELM determines the decision hyperplane by solving a pair of QPPs. However, LRTELM determines the decision hyperplane by solving a pair of linear equations.
(3)
Compared to TELM, we make a slight modification: we replace the $\ell_1$-norm of the slack variables $\xi$ and $\eta$ with the squared $\ell_2$-norm, weighted by $\frac{C_2}{2}$, which guarantees the strict convexity of the objective function. This ensures that the LRTELM optimization problems have unique solutions.
  • LRTELM vs. LELM
Clearly, both LRTELM and LELM are supervised learning methods. However, the main difference between LRTELM and LELM is that LRTELM aims to generate two non-parallel separating hyperplanes, whereas LELM seeks to find only one separating hyperplane.
  • LRTELM vs. LSVM and LTSVM
(1)
Obviously, the objectives are different. No bias $b$ is required in LRTELM because the separating hyperplane $\beta^T h(x) = 0$ passes through the origin in the LRTELM feature space, whereas LSVM [23] and LTSVM [24] require a bias $b$ to determine the hyperplane.
(2)
In contrast to LSVM and LTSVM, LRTELM has an explicit kernel function in the form of network parameters that are generated randomly and do not need to be adjusted.

4. Laplacian Lagrangian Regularized Twin Extreme Learning Machine

It is well known that insufficient volume of labeled samples is a major challenge in supervised learning. To improve the performance of LRTELM, this paper proposes a new semi-supervised learning framework, namely, Laplacian Lagrangian regularized twin extreme learning machine (Lap-LRTELM).

4.1. Lap-LRTELM

For Lap-LRTELM, the regularization terms f 1 H 2 and f 2 H 2 can be expressed by
$$\|f_1\|_H^2 = \frac{1}{2}\|\beta_1\|_2^2, \tag{42}$$
$$\|f_2\|_H^2 = \frac{1}{2}\|\beta_2\|_2^2. \tag{43}$$
Correspondingly, the manifold regularization terms f 1 M 2 and f 2 M 2 can be written as
$$\|f_1\|_M^2 = \frac{1}{(l+u)^2}\sum_{i,j=1}^{l+u} W_{ij}\,(f_1(x_i) - f_1(x_j))^2 = \mathbf{f}_1^T L \mathbf{f}_1, \tag{44}$$
$$\|f_2\|_M^2 = \frac{1}{(l+u)^2}\sum_{i,j=1}^{l+u} W_{ij}\,(f_2(x_i) - f_2(x_j))^2 = \mathbf{f}_2^T L \mathbf{f}_2, \tag{45}$$
where $L = D - W$ is the graph Laplacian, $D$ is a diagonal matrix with $i$-th diagonal entry $D_{ii} = \sum_{j=1}^{l+u} W_{ij}$, $\mathbf{f}_1 = [f_1(x_1), \ldots, f_1(x_{l+u})]^T = H\beta_1$, $\mathbf{f}_2 = [f_2(x_1), \ldots, f_2(x_{l+u})]^T = H\beta_2$, $H \in \mathbb{R}^{(l+u) \times L}$ is the hidden layer output matrix over all labeled and unlabeled samples, and $e$ is a vector of ones of appropriate dimension.
Therefore, the primal Lap-LRTELM can be expressed as
$$\begin{aligned} \min_{\beta_1, \xi}\;& \frac{1}{2}\|H_1\beta_1\|_2^2 + \frac{C_1}{2}\|\beta_1\|_2^2 + \frac{C_2}{2}\xi^T\xi + \frac{C_3}{2}\beta_1^T H^T L H \beta_1 \\ \text{s.t.}\;& -H_2\beta_1 + \xi \ge e_2 \end{aligned} \tag{46}$$
and
$$\begin{aligned} \min_{\beta_2, \eta}\;& \frac{1}{2}\|H_2\beta_2\|_2^2 + \frac{C_1}{2}\|\beta_2\|_2^2 + \frac{C_2}{2}\eta^T\eta + \frac{C_3}{2}\beta_2^T H^T L H \beta_2 \\ \text{s.t.}\;& H_1\beta_2 + \eta \ge e_1 \end{aligned} \tag{47}$$
Thus, we can obtain the dual problems of (46) and (47),
$$\min_{\gamma \ge 0}\; \frac{1}{2}\gamma^T \Theta_1 \gamma - e_2^T\gamma \tag{48}$$
and
$$\min_{\vartheta \ge 0}\; \frac{1}{2}\vartheta^T \Theta_2 \vartheta - e_1^T\vartheta \tag{49}$$
respectively, where $\Theta_1 = H_2(H_1^T H_1 + C_1 I + C_3 H^T L H)^{-1} H_2^T$, $\Theta_2 = H_1(H_2^T H_2 + C_1 I + C_3 H^T L H)^{-1} H_1^T$, and $\gamma$ and $\vartheta$ are the Lagrangian multiplier vectors.
Based on KKT necessary and sufficient optimality conditions, we can obtain the nonlinear complementarity problems
$$0 \le \gamma \;\perp\; (\Theta_1\gamma - e_2) \ge 0 \tag{50}$$
and
$$0 \le \vartheta \;\perp\; (\Theta_2\vartheta - e_1) \ge 0 \tag{51}$$
Similar to LRTELM, we have
$$\Theta_1\gamma - e_2 = ((\Theta_1\gamma - e_2) - \delta\gamma)_+ \tag{52}$$
and
$$\Theta_2\vartheta - e_1 = ((\Theta_2\vartheta - e_1) - \mu\vartheta)_+ \tag{53}$$
In order to obtain the solutions of the above problems (52) and (53), we apply the following two simple iterative schemes:
$$\gamma^{i+1} = \Theta_1^{-1}\left(e_2 + ((\Theta_1\gamma^i - e_2) - \delta\gamma^i)_+\right), \quad i = 0, 1, 2, \ldots \tag{54}$$
and
$$\vartheta^{j+1} = \Theta_2^{-1}\left(e_1 + ((\Theta_2\vartheta^j - e_1) - \mu\vartheta^j)_+\right), \quad j = 0, 1, 2, \ldots \tag{55}$$
where $0 < \delta < \frac{2}{C_1}$ and $0 < \mu < \frac{2}{C_1}$.
Based on the above discussion, the Lap-LRTELM is summarized as Algorithm 2.
Algorithm 2 Training Lap-LRTELM
  • Input: Training set $T = T_l \cup T_u = \{(x_i, y_i)\}_{i=1}^{l} \cup \{x_i\}_{i=l+1}^{l+u}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$; $T_l$ denotes the set of $l$ labeled samples and $T_u$ the set of $u$ unlabeled samples; activation function $G(x)$; number of hidden nodes $L$; regularization parameters $C_1, C_2, C_3$; iteration counter $it = 0$ and maximum number of cycles $itmax$; fix $\delta = \mu = 1.9/C_1$.
  • Output: The decision function of Lap-LRTELM f ( x ) .
  • Initiate: Start with any $\gamma^0, \vartheta^0$ and set the iterator $i = 0$.
  • Process:
  • 1. Randomly assign input weights $w$ and biases $b$;
  • 2. Calculate the hidden layer output matrices $H_1$ and $H_2$;
  • 3. Compute graph Laplacian L ;
  • 4. Compute $\Theta_1 = H_2(H_1^T H_1 + C_1 I + C_3 H^T L H)^{-1} H_2^T$ and $\Theta_2 = H_1(H_2^T H_2 + C_1 I + C_3 H^T L H)^{-1} H_1^T$;
  • 5. Calculate $\gamma$ and $\vartheta$ via (54) and (55), respectively;
  • 6. Compute $\beta_1$ and $\beta_2$ by $\beta_1 = -(H_1^T H_1 + C_1 I + C_3 H^T L H)^{-1} H_2^T \gamma$ and $\beta_2 = (H_2^T H_2 + C_1 I + C_3 H^T L H)^{-1} H_1^T \vartheta$;
  • Return: The decision function $f(x) = \operatorname{sign}\left(\frac{\beta_1 \cdot h(x)}{\|\beta_1\|} + \frac{\beta_2 \cdot h(x)}{\|\beta_2\|}\right)$ of Lap-LRTELM.
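Relative to LRTELM, the only structural change in Algorithm 2 is the extra $C_3 H^T L H$ block inside the inverses defining $\Theta_1$ and $\Theta_2$. An illustrative sketch of that step (names are my own assumptions; the paper gives no code):

```python
import numpy as np

def lap_lrtelm_operators(H1, H2, H, Lap, C1=1.0, C3=1.0):
    """Form the Lap-LRTELM dual matrices Theta1, Theta2 (steps 3-4 above)."""
    L = H.shape[1]
    G = C1 * np.eye(L) + C3 * H.T @ Lap @ H    # shared regularization block
    A1 = np.linalg.inv(H1.T @ H1 + G)
    A2 = np.linalg.inv(H2.T @ H2 + G)
    Theta1 = H2 @ A1 @ H2.T
    Theta2 = H1 @ A2 @ H1.T
    return Theta1, Theta2, A1, A2
```

Because the graph Laplacian is symmetric positive semi-definite, the added block keeps the inner matrices positive definite, so $\Theta_1$ and $\Theta_2$ remain symmetric positive semi-definite and the same fixed-point iteration as in LRTELM applies unchanged.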

4.2. Comparison with Other Related Algorithms

In this subsection, we compare our proposed Lap-LRTELM with other related algorithms.
  • Lap-LRTELM vs. LRTELM
The main difference is that LRTELM is a supervised learning algorithm, while Lap-LRTELM is a semi-supervised learning algorithm. When choosing the appropriate parameters, our proposed Lap-LRTELM will degenerate into LRTELM.
  • Lap-LRTELM vs. Lap-LELM
Obviously, both Lap-LRTELM and Lap-LELM [16] are semi-supervised learning algorithms. However, the main difference between Lap-LRTELM and Lap-LELM is that Lap-LRTELM aims to generate two non-parallel separating hyperplanes, while Lap-LELM seeks only one separable hyperplane.
  • Lap-LRTELM vs. Lap-TELM
It is obvious that both Lap-LRTELM and Lap-TELM [37] aim to find two non-parallel hyperplanes in order to indirectly determine the decision hyperplane. Comparing Lap-LRTELM and Lap-TELM, there are two main differences:
(1)
Lap-TELM determines the decision hyperplane by solving a pair of smaller QPPs; however, in Lap-LRTELM, the decision hyperplane is indirectly determined by iteratively solving a pair of linear equations;
(2)
Compared with Lap-TELM, we make a slight modification: we replace the $\ell_1$-norm of the slack variables $\xi$ and $\eta$ with the squared $\ell_2$-norm, weighted by $\frac{C_2}{2}$, which guarantees the strict convexity of the objective function.

5. Experiment

In order to evaluate the performance of our proposed LRTELM and Lap-LRTELM, we compare our methods with related algorithms, including TELM [40], LSVM [23], LTSVM [24], LELM [16], Lap-LELM [16], SS-ELM [35], and Lap-TELM [37]. The experimental settings are provided in Section 5.1. In Section 5.2, we provide supervised learning results and analysis. In Section 5.3, we provide semi-supervised learning results and analysis.

5.1. Experimental Setup

Here, the accuracy of all experiments is calculated using the standard ten-fold cross-validation method and all parameters are selected using the grid search method. For convenience, we set the regularization parameters as C 1 = C 2 = C 3 = C . All parameter selection ranges are described as follows:
(1)
Regularization parameters $C$ and $\lambda$ and the RBF kernel parameter $\sigma$ are all selected from the set $\{2^i \mid i = -6, \ldots, 6\}$;
(2)
For the K-nearest neighbors parameter, N is selected from { 3 , 5 , 7 , 9 , 11 } ;
(3)
The number of hidden layer nodes $L$ is selected from $\{100, 200, 300, 500, 1000, 2000, 3000, 5000\}$.
The activation function $1/(1 + \exp(-(w \cdot x + b)))$ (in which $w, b$ are randomly generated) was used for LRTELM, Lap-LRTELM, LELM, Lap-LELM, TELM, Lap-TELM, and SS-ELM. Classification accuracy (ACC) is used as the evaluation indicator for all algorithms involved. The ACC value is defined as
$$ACC = \frac{TP + TN}{TP + FN + TN + FP}$$
where $TP$ denotes true positives, $TN$ true negatives, $FN$ false negatives, and $FP$ false positives. To better compare the computation times of all the algorithms employed, we recorded their running times, mainly including training and testing on all the datasets involved.
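For concreteness, the ACC computation can be written as a small helper. This is an illustrative sketch under my own naming, for labels in $\{-1, +1\}$:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """ACC = (TP + TN) / (TP + FN + TN + FP) for labels in {-1, +1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))    # true positives
    tn = np.sum((y_true == -1) & (y_pred == -1))  # true negatives
    fp = np.sum((y_true == -1) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == -1))   # false negatives
    return (tp + tn) / (tp + fn + tn + fp)
```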
To validate the effectiveness of the proposed LRTELM and Lap-LRTELM, numerical simulations were carried out on various datasets, including nine benchmark datasets from the UCI repository, four image datasets, two artificial datasets, and five near-infrared spectral datasets. We performed ten-fold cross-validation on all datasets except the four image datasets. (Because image datasets generally consist of high-dimensional, small-sample data, ten-fold cross-validation was not used on them.) Specifically, each dataset was randomly partitioned into ten subsets, one of which was retained in turn as the test set; this process was repeated ten times, and the average of the ten test results was used as the performance measure. To obtain objective experimental results, we normalized all datasets involved in the experiments to the interval $[0, 1]$. For a fair comparison, we used Matlab's Quadratic Programming (QP) toolbox to solve all QP problems in the algorithms of interest. All methods were implemented in MATLAB 2014a running on a PC with an Intel(R) Core(TM) i7-7200U processor (3.40 GHz) and 8 GB of RAM.

5.2. Supervised Learning Results

5.2.1. Experiment on Near-Infrared Spectral Datasets

Today, information plays an increasingly important role in agricultural production as a new factor of production. Because agricultural information is highly local and time-sensitive, uncovering the information hidden behind the data, improving information quality, and providing timely, practical information with predictive, seasonal, and guiding value is an urgent problem to be studied and solved. Data mining techniques are now being used in various areas of agriculture [41,42]. It is well known that the yield of maize, a major grain crop in China, is significantly correlated with seed purity. The 'Nongda 108' maize hybrid seeds and the 'parent 178' seeds used in our experiment were obtained from the 2008 harvest in Beijing, China. A total of 240 seed samples were used in our experiment, 120 from the hybrid seeds and 120 from the parent seeds. We obtained near-infrared (NIR) spectral datasets of the maize seeds using an MPA spectrometer, where the corresponding sample regions are denoted as regions A, B, C, D, and E. The information in these datasets is summarized in Table 2.
To demonstrate the generalisation performance of the proposed LRTELM method in practical applications, the following numerical experiments were carried out on five near-infrared spectral datasets. Based on the optimal parameters, all experimental results are listed in Table 3 and Figure 1. Our analysis of the experimental results reveals the following:
(1)
It is clear from Table 3 that LRTELM achieves the best average ACC and the best overall performance compared to the other related algorithms.
(2)
The experimental results on the five datasets show that LRTELM performs better than TELM or LELM.
(3)
As the table shows, LRTELM outperforms the other four algorithms in terms of learning time.
Through the above analysis of the near-infrared spectral dataset, we can draw a safe conclusion that the proposed LRTELM is effective and reliable.

5.2.2. Experimental Results on UCI Datasets

To further test the classification performance of the proposed LRTELM and other related algorithms, we conducted numerical experiments on several publicly available UCI datasets (http://archive.ics.uci.edu/ml/datasets.html (accessed on 22 March 2021)). All experimental results are presented in Table 4. The analysis of all experimental results is as follows.
From Table 4, it can be seen that in terms of classification performance, the proposed method outperforms other learning algorithms in most cases. Furthermore, in terms of learning efficiency, the proposed method outperforms other algorithms on most datasets. The main reason for this is that our LRTELM combines the advantages of TELM and LSVM while solving two smaller linear systems through an efficient iterative algorithm.
In order to statistically validate the performance of the proposed LRTELM, eight UCI datasets were selected and a series of experiments were conducted. The results of all experiments based on the optimal parameters are presented as box plots in Figure 2. Figure 2 shows the ACC box plots for LSVM, LTSVM, LELM, TELM, and LRTELM on the eight UCI datasets. The x-axis shows the different classifiers, including LSVM, LTSVM, LELM, TELM, and LRTELM, while the y-axis shows the ACC for all UCI datasets. From Figure 2, it can be seen that LRTELM has better classification accuracy than the other algorithms on most of the datasets.

5.2.3. Statistical Analysis

In this section, in order to analyse the significant differences among the five algorithms on the near-infrared spectral and UCI datasets, we employed the well-known Friedman test [43]. This is a simple, safe, and robust non-parametric test whose null hypothesis is that all algorithms perform equally well. If the null hypothesis is rejected, the Nemenyi post hoc test can be performed [43]. The average ranks of the five algorithms on all of the datasets used are shown in Table 3 and Table 4, respectively.
To begin with, we can calculate the Friedman statistic variable using the following formulation:
$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j} R_j^2 - \frac{k(k+1)^2}{4}\right] = 19.36$$
where $k$ is the number of algorithms, $N$ is the number of datasets, and $R_j$ is the average rank of the $j$th algorithm on the employed datasets; here, $k = 5$ and $N = 5$ for the near-infrared spectral datasets. Furthermore, the $\chi_F^2$ statistic follows a chi-squared distribution with $(k-1)$ degrees of freedom, and we have:
$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2} = 121$$
where $F_F$ follows the F-distribution with $(k-1)$ and $(k-1)(N-1)$ degrees of freedom. For $\alpha = 0.05$, we obtain $F_{0.05}(4, 16) = 3.01$. Clearly, $F_F > F_\alpha$; thus, the null hypothesis can be rejected.
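As a cross-check, the Friedman statistic and its F correction can be recomputed directly from the average ranks reported in Table 3 ($k = 5$ algorithms, $N = 5$ near-infrared datasets); a minimal sketch:

```python
# Friedman test statistic and its F correction, computed from the
# average ranks of k algorithms over N datasets.

def friedman_stats(avg_ranks, n_datasets):
    k = len(avg_ranks)
    N = n_datasets
    chi2_f = 12 * N / (k * (k + 1)) * (
        sum(r ** 2 for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    # F_F follows F((k - 1), (k - 1)(N - 1)) under the null hypothesis.
    f_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)
    return chi2_f, f_f

# Average ranks of LSVM, LTSVM, TELM, LELM, LRTELM on the five NIR datasets (Table 3).
chi2_f, f_f = friedman_stats([4.0, 5.0, 2.8, 2.2, 1.0], n_datasets=5)
print(round(chi2_f, 2))  # 19.36
print(f_f > 3.01)        # True: reject the null hypothesis at alpha = 0.05
```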
Next, we further compare the five algorithms pairwise using the Nemenyi post hoc test. The difference in performance between two algorithms is significant when their average-rank difference is larger than the critical value; otherwise, the difference is not significant. Dividing the Studentized range statistic by $\sqrt{2}$ gives $q_{0.05} = 2.728$. Therefore, we can calculate the critical difference (CD) using the following formulation:
$$CD = q_{0.05}\sqrt{\frac{k(k+1)}{6N}} = 2.728 \times \sqrt{\frac{5(5+1)}{6 \times 5}} = 2.728$$
Thus, if the average ranks of two algorithms differ by at least $CD$, their performance is significantly different. From Table 3, the differences between the proposed LRTELM and the other four algorithms are as follows:
D(LSVM − LRTELM) = 4 − 1 = 3 > 2.728
D(LTSVM − LRTELM) = 5 − 1 = 4 > 2.728
D(TELM − LRTELM) = 2.8 − 1 = 1.8 < 2.728
D(LELM − LRTELM) = 2.2 − 1 = 1.2 < 2.728
where D(A − B) denotes the average-rank difference between algorithms A and B. We can then conclude that the proposed LRTELM performs significantly better than LSVM and LTSVM on the near-infrared spectral datasets, while there is no significant difference between LRTELM, TELM, and LELM. Similarly, on the UCI datasets, the proposed LRTELM performs significantly better than LSVM, while there is no significant difference between LRTELM, LTSVM, TELM, and LELM according to the average ranks and relevant values reported in Table 4.
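The critical-difference computation and the pairwise comparisons above can be reproduced as follows (ranks again taken from Table 3):

```python
import math

def nemenyi_cd(q_alpha, k, n_datasets):
    # Critical difference of the Nemenyi post hoc test.
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))

# k = 5 algorithms, q_{0.05} = 2.728; N = 5 near-infrared datasets.
cd = nemenyi_cd(2.728, k=5, n_datasets=5)
print(round(cd, 3))  # 2.728

# A pair of algorithms differs significantly when the average-rank gap exceeds CD.
avg_rank = {"LSVM": 4.0, "LTSVM": 5.0, "TELM": 2.8, "LELM": 2.2, "LRTELM": 1.0}
for name in ("LSVM", "LTSVM", "TELM", "LELM"):
    gap = avg_rank[name] - avg_rank["LRTELM"]
    print(name, round(gap, 2), gap > cd)
```

With $N = 9$ UCI datasets the same function gives $CD \approx 2.03$, matching the semi-supervised comparison later in the paper.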

5.3. Semi-Supervised Learning Results

5.3.1. Experimental Results on Artificial Datasets

To verify the effect of manifold regularization on model performance, in this section we use the two artificial datasets [44,45] shown in Figure 3 to investigate the performance of the proposed Lap-LRTELM. Each dataset contains 200 samples; for each class, two labeled samples are randomly selected and the remaining 98 samples are left unlabeled.
Here, we analyze the effect of the parameter $C_3$ on the performance of the proposed Lap-LRTELM; $C_3$ controls the weight of the manifold regularization term $\|f\|_M^2$. With the other parameters fixed at their optimal values ($C_1 = 2^3$, $C_2 = 2^2$, $L = 200$, and $N = 7$), we select $C_3$ from the set $\{2^{-3}, 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}, 2^{3}\}$ in order to observe its impact on the performance of the proposed Lap-LRTELM. It is easy to see from Figure 4 that the accuracy curve first rises and then falls once $n > 2$. This means that $\|f\|_M^2$ can improve the performance of Lap-LRTELM when $C_3$ is chosen properly.
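The parameter sweep behind Figure 4 can be outlined as below. `train_lap_lrtelm` and `evaluate` are hypothetical placeholders for the actual training and scoring routines, which are not shown here; only the grid itself comes from the text:

```python
# Hypothetical sketch of the C3 sweep: train Lap-LRTELM for each candidate
# C3 = 2**n and record its accuracy, with C1, C2, L, and N held at their
# optimal values inside the (placeholder) training routine.

C3_grid = [2.0 ** n for n in range(-3, 4)]  # {2^-3, ..., 2^3}

def sweep_c3(train_lap_lrtelm, evaluate, data):
    results = {}
    for c3 in C3_grid:
        model = train_lap_lrtelm(data, C3=c3)   # placeholder trainer
        results[c3] = evaluate(model, data)     # placeholder scorer
    return results
```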
Table 5 shows the classification accuracy of LRTELM, SS-ELM, Lap-TELM, and Lap-LRTELM on the artificial datasets. According to the results shown in Table 5, it is obvious that when the amount of labeled data is relatively small, Lap-LRTELM performs better than the other three algorithms.
From the above experimental analysis on the two artificial datasets, we can conclude that the performance of the proposed Lap-LRTELM is indeed improved by incorporating manifold regularization. Intuitively, manifold regularization helps the algorithm to seek a more reasonable classifier.

5.3.2. Experimental Results on UCI Datasets

In this section, to evaluate the effectiveness of Lap-LRTELM, we conduct experiments with different fractions of labeled samples, namely, 10% and 30%. In our experiments, SS-ELM, Lap-LELM, Lap-TELM, and Lap-LRTELM construct data adjacency graphs using K-nearest neighbors. All of the experimental results are presented in Table 6.
From Table 6, it can be seen that the performance of all algorithms improves as the number of labelled samples increases. Furthermore, we find that the proposed Lap-LRTELM outperforms the other algorithms in most cases, regardless of the fraction of labelled samples, and its generalization performance exceeds that of the other relevant ELM-based algorithms on all datasets.
The analysis of the above experimental results shows that the proposed Lap-LRTELM improves the classification performance of LRTELM by using manifold regularization. Intuitively, a more reasonable classifier can be built using manifold regularization.
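The adjacency-graph step mentioned above can be sketched generically: build a symmetric unweighted K-nearest-neighbour graph $W$ and its Laplacian $L = D - W$, which is the matrix entering the manifold regularization term. This is a generic construction for illustration, not the authors' exact code:

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Symmetric K-nearest-neighbour adjacency matrix W (0/1 weights)
    and graph Laplacian L = D - W used in manifold regularization."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbours
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0             # k nearest neighbours of i
    W = np.maximum(W, W.T)                            # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W                 # L = D - W

# Row sums of a graph Laplacian are zero by construction.
X = np.random.default_rng(0).normal(size=(20, 2))
L = knn_graph_laplacian(X, k=3)
print(np.allclose(L.sum(axis=1), 0.0))  # True
```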

5.3.3. Experimental Results on Image Datasets

To further verify the performance of Lap-LRTELM, we performed experiments on the G50C (http://people.cs.uchicago.edu/vikass/manifoldregularization.html) (accessed on 22 March 2021), COIL20(B) (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php) (accessed on 22 March 2021), USPST(B) (http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html) (accessed on 22 March 2021), and MNIST(B) (http://yann.lecun.com/exdb/mnist/) (accessed on 22 March 2021) datasets.
To fairly evaluate the semi-supervised approaches involved, we adopted the same experimental setup as Melacci and Belkin [33]. Concretely, we employed a four-fold cross-validation approach for each image dataset, with one fold as the test set (denoted by T) and the remaining folds as the training set. The training set was divided into labelled data (L), unlabelled data (U), and validation data (V). This random folding was repeated three times, producing a total of twelve divisions. Detailed information on the datasets is summarized in Table 7.
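The folding protocol above can be sketched as follows; `make_divisions` is an illustrative name, and the sizes in the example follow the G50C row of Table 7 (50/313/50/137):

```python
# Sketch of the splitting protocol: 4-fold CV where one fold is the test set T
# and the remaining folds are split into labelled (L), unlabelled (U), and
# validation (V) subsets; repeating the folding 3 times yields 12 divisions.
import random

def make_divisions(n_samples, n_labeled, n_valid, n_folds=4, n_repeats=3, seed=0):
    rng = random.Random(seed)
    divisions = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        fold_size = n_samples // n_folds
        for f in range(n_folds):
            test = idx[f * fold_size:(f + 1) * fold_size]
            rest = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
            labeled = rest[:n_labeled]
            valid = rest[n_labeled:n_labeled + n_valid]
            unlabeled = rest[n_labeled + n_valid:]
            divisions.append((labeled, unlabeled, valid, test))
    return divisions

divs = make_divisions(550, n_labeled=50, n_valid=50)  # G50C-sized example
print(len(divs))  # 12
```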
The performance of these algorithms on the USPST(B), COIL20(B), G50C, and MNIST(B) datasets was evaluated in our experiments using ACC ± S (classification accuracy ± standard deviation); the experimental results are shown in Table 8.
As can be seen from Table 8, Lap-LRTELM outperforms the other four algorithms in terms of classification accuracy on all of the datasets involved. Compared with LRTELM, the proposed Lap-LRTELM can effectively utilise unlabelled samples to produce better performance. The experimental results show that by taking manifold regularization into account, the proposed Lap-LRTELM achieves better performance than relying on the labelled samples alone.
To analyse the influences of labelled and unlabelled samples on the performance of the relevant semi-supervised methods, we conducted a further series of experiments on COIL20(B), USPST(B), and G50C. The results of all experiments are shown in Figure 5 and Figure 6.
Figure 5 shows the classification accuracy of SS-ELM, Lap-TELM, Lap-LELM, and Lap-LRTELM for different numbers of labelled samples on COIL20(B), USPST(B), G50C, and MNIST(B). As can be seen from Figure 5, in most cases Lap-LRTELM achieves the best results relative to the other three algorithms, and the classification accuracy of all algorithms improves substantially as the number of labelled samples increases. Figure 6 shows the performance of the four semi-supervised algorithms for different numbers of unlabelled samples. Using the same experimental scheme as in [29], we add unlabelled samples to the unlabelled set (U) in 10% increments, while the labelled set (L), test set (T), and validation set (V) remain unchanged. From Figure 6, it is easy to observe that classification accuracy improves as unlabelled samples are added to the unlabelled set (U). Even without any unlabelled samples, Lap-LRTELM performs better than SS-ELM, Lap-TELM, and Lap-LELM. This phenomenon is consistent with Belkin et al. [31] in that manifold regularization is effective even for purely supervised learning.

5.3.4. Statistical Analysis

In this section, we use the well-known Friedman test with the corresponding post hoc test [43] to analyze and compare the algorithms' performance on the UCI datasets. The average ranks of the five algorithms on all of the datasets used are shown in Table 6. First, we compare the performance of the five algorithms on the UCI datasets in which 10% of the samples were labelled.
To begin with, we can calculate the Friedman statistic variable using the following formulation:
$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j} R_j^2 - \frac{k(k+1)^2}{4}\right] = 10.95$$
where $k$ is the number of algorithms, $N$ is the number of UCI datasets, and $R_j$ is the average rank of the $j$th algorithm on the employed datasets; here, $k = 5$ and $N = 9$. Furthermore, the $\chi_F^2$ statistic follows a chi-squared distribution with $(k-1)$ degrees of freedom, and we have:
$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2} = 3.497$$
where $F_F$ follows the F-distribution with $(k-1)$ and $(k-1)(N-1)$ degrees of freedom. For $\alpha = 0.05$, we obtain $F_{0.05}(4, 32) = 2.69$. Clearly, $F_F > F_\alpha$; thus, the null hypothesis can be rejected.
Furthermore, we compared the five algorithms pairwise using the Nemenyi post hoc test. The difference in performance between two algorithms is significant when their average-rank difference is larger than the critical value; otherwise, the difference is not significant. Dividing the Studentized range statistic by $\sqrt{2}$ gives $q_{0.05} = 2.728$. Therefore, we can calculate the critical difference (CD) using the following formulation:
$$CD = q_{0.05}\sqrt{\frac{k(k+1)}{6N}} = 2.728 \times \sqrt{\frac{5(5+1)}{6 \times 9}} = 2.03$$
Thus, if the average ranks of two algorithms differ by at least $CD$, their performance is significantly different. From Table 6, we can derive the differences between the proposed Lap-LRTELM and the other four algorithms as follows:
D(Lap-LELM − Lap-LRTELM) = 1.89 < 2.03
D(LRTELM − Lap-LRTELM) = 2.56 > 2.03
D(SS-ELM − Lap-LRTELM) = 2.34 > 2.03
D(Lap-TELM − Lap-LRTELM) = 2.11 > 2.03
In summary, the proposed Lap-LRTELM performs significantly better than LRTELM, SS-ELM, and Lap-TELM, and there is no significant difference between Lap-LRTELM and Lap-LELM on UCI datasets with 10% labeled samples. Similarly, on UCI datasets with 30% labeled samples, we can obtain the same conclusions based on the average ranks and relevant values reported in Table 6.

6. Conclusions

In this paper, we first proposed the Lagrangian regularized twin extreme learning machine (LRTELM). We then extended LRTELM to semi-supervised learning by introducing manifold regularization, obtaining a new semi-supervised learning framework, the Lagrangian semi-supervised regularized twin extreme learning machine (Lap-LRTELM). Lap-LRTELM can effectively use the geometric information embedded in the marginal distribution of unlabelled samples to construct a more reasonable classifier. A significant advantage of the proposed LRTELM and Lap-LRTELM is the implementation of the structural risk minimization (SRM) principle by adding regularization terms to the objective function. Another advantage is that only two simple linear equations need to be solved, avoiding the pair of QPPs required by TELM. Compared to existing supervised and semi-supervised ELM algorithms, the proposed LRTELM and Lap-LRTELM retain almost all of the advantages of ELM, such as high training efficiency for binary classification problems. These two ELM extensions for supervised and semi-supervised learning are expected to significantly extend the applicability of ELM and provide new insights into extreme learning paradigms. Experimental results on multiple datasets show that our LRTELM and Lap-LRTELM are highly effective compared to other methods.
We anticipate that extending our methods to multi-class classification, regression, and robust learning problems will be a topic of our future work.

Author Contributions

J.M.: Writing—original draft, Conceptualization, Methodology, Software, Funding acquisition. G.Y.: Writing—reviewing and editing, Methodology, Software, Data curation, Supervision, Validation, Project administration, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 11861002, 61907012), the Natural Science Foundation of Ningxia Province, China (No. 2022A0950), the Young Talent Cultivation Project of North Minzu University (No. 2021KYQD23), and the Fundamental Research Funds for the Central Universities (No. 2022XYZSX03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All of the benchmark datasets used in our numerical experiments are from the UCI Machine Learning Repository, and are available at http://archive.ics.uci.edu/ml/ (accessed on 22 March 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
2. Huang, G.; Huang, G.B.; Song, S.H.; You, K.Y. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48.
3. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
4. Wang, G.; Zhao, Y.; Wang, D. A protein secondary structure prediction framework based on the extreme learning machine. Neurocomputing 2008, 72, 262–268.
5. Lan, Y.; Soh, Y.C.; Huang, G.B. Extreme Learning Machine based bacterial protein subcellular localization prediction. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Hong Kong, China, 1–8 June 2008; pp. 1859–1863.
6. Mohammed, A.A.; Minhas, R.; Jonathan, Q.M.; Sid-Ahmed, M.A. Human face recognition based on multidimensional PCA and extreme learning machine. Pattern Recognit. 2011, 44, 2588–2597.
7. Nizar, A.H.; Dong, Z.Y.; Wang, Y. Power utility nontechnical loss analysis with extreme learning machine method. IEEE Trans. Power Syst. 2008, 23, 946–955.
8. Decherchi, S.; Gastaldo, P.; Dahiya, R.S.; Valle, M.; Zunino, R. Tactile data classification of contact materials using computational intelligence. IEEE Trans. Robot. 2011, 27, 635–639.
9. Choudhary, R.; Shukla, S. Reduced-Kernel Weighted Extreme Learning Machine Using Universum Data in Feature Space (RKWELM-UFS) to Handle Binary Class Imbalanced Dataset Classification. Symmetry 2022, 14, 379.
10. Owolabi, T.O.; Abd Rahman, M.A. Prediction of band gap energy of doped graphitic carbon nitride using genetic algorithm-based support vector regression and extreme learning machine. Symmetry 2021, 13, 411.
11. Huang, G.B.; Ding, X.J.; Zhou, H.M. Optimization method based extreme learning machine for classification. Neurocomputing 2010, 74, 155–163.
12. Yang, L.; Zhang, S. A smooth extreme learning machine framework. J. Intell. Fuzzy Syst. 2017, 33, 3373–3381.
13. Yang, L.; Zhang, S. A sparse extreme learning machine framework by continuous optimization algorithms and its application in pattern recognition. Eng. Appl. Artif. Intell. 2016, 53, 176–189.
14. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529.
15. Balasundaram, S.; Tanveer, M. On Lagrangian twin support vector regression. Neural Comput. Appl. 2013, 22, 257–267.
16. Ma, J.; Wen, Y.; Yang, L. Lagrangian supervised and semi-supervised extreme learning machine. Appl. Intell. 2019, 49, 303–318.
17. Jayadeva; Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905.
18. Peng, X. A ν-twin support vector machine (ν-TSVM) classifier and its geometric algorithms. Inf. Sci. 2010, 180, 3863–3875.
19. Shao, Y.H.; Zhang, C.H.; Wang, X.B.; Deng, N.Y. Improvements on twin support vector machines. IEEE Trans. Neural Netw. 2011, 22, 962–968.
20. Qi, Z.; Tian, Y.; Shi, Y. Laplacian twin support vector machine for semi-supervised classification. Neural Netw. 2012, 35, 46–53.
21. Qi, Z.; Tian, Y.; Shi, Y. Robust twin support vector machine for pattern classification. Pattern Recognit. 2013, 46, 305–316.
22. Shao, Y.H.; Hua, X.Y.; Liu, L.M.; Yang, Z.M.; Deng, N.Y. Combined outputs framework for twin support vector machines. Appl. Intell. 2015, 43, 424–438.
23. Mangasarian, O.L.; Musicant, D.R. Lagrangian support vector machines. J. Mach. Learn. Res. 2001, 1, 161–177.
24. Balasundaram, S.; Kapil, N. Application of Lagrangian Twin Support Vector Machines for Classification. In Proceedings of the Second International Conference on Machine Learning and Computing, Washington, DC, USA, 27–29 October 2010.
25. Shao, Y.H.; Chen, W.J.; Zhang, J.J.; Wang, Z.; Deng, N.Y. An efficient weighted Lagrangian twin support vector machine for imbalanced data classification. Pattern Recognit. 2014, 47, 3158–3167.
26. Balasundaram, S.; Gupta, D.; Prasad, S.C. A new approach for training Lagrangian twin support vector machine via unconstrained convex minimization. Appl. Intell. 2016, 46, 124–134.
27. Balasundaram, S.; Gupta, D. On implicit Lagrangian twin support vector regression by Newton method. Int. J. Comput. Intell. Syst. 2014, 7, 50–64.
28. Tanveer, M.; Shubham, K. A regularization on Lagrangian twin support vector regression. Int. J. Mach. Learn. Cybern. 2017, 8, 807–821.
29. Balasundaram, S.; Gupta, D. Training Lagrangian twin support vector regression via unconstrained convex minimization. Knowl.-Based Syst. 2014, 59, 85–96.
30. Tanveer, M.; Shubham, K.; Aldhaifallah, M.; Nisar, K.S. An efficient implicit regularized Lagrangian twin support vector regression. Appl. Intell. 2016, 44, 831–848.
31. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2434.
32. Chapelle, O.; Sindhwani, V.; Keerthi, S.S. Optimization techniques for semi-supervised support vector machines. J. Mach. Learn. Res. 2008, 9, 203–233.
33. Melacci, S.; Belkin, M. Laplacian support vector machines trained in the primal. J. Mach. Learn. Res. 2011, 12, 1149–1184.
34. Zhu, X. Semi-Supervised Learning Literature Survey. 2005. Available online: https://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf (accessed on 20 May 2022).
35. Huang, G.; Song, S.; Gupta, J.N.D.; Wu, C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans. Cybern. 2014, 44, 2405.
36. Liu, B.; Xia, S.X.; Meng, F.R.; Zhou, Y. Manifold regularized extreme learning machine. Neural Comput. Appl. 2016, 27, 255–269.
37. Li, S.; Song, S.; Wan, Y. Laplacian Twin Extreme Learning Machine for Semi-supervised Classification. Neurocomputing 2018, 321, 17–27.
38. Schölkopf, B.; Smola, A. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001.
39. Bertsekas, D.P. Nonlinear programming. J. Oper. Res. Soc. 1997, 48, 334.
40. Wan, Y.; Song, S.; Huang, G.; Li, S. Twin extreme learning machines for pattern classification. Neurocomputing 2017, 260, 235–244.
41. Gao, X.; Lu, T.; Liu, P.; Lu, Q. A soil moisture classification model based on SVM used in agricultural WSN. In Proceedings of the IEEE Joint International Information Technology and Artificial Intelligence Conference, Chongqing, China, 20–21 December 2015.
42. Pierna, J.F.; Lecler, B.; Conzen, J.P.; Niemoeller, A.; Baeten, V.; Dardenne, P. Comparison of various chemometric approaches for large near infrared spectroscopic data of feed and feed products. Anal. Chim. Acta 2011, 705, 30–34.
43. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
44. Chen, W.J.; Shao, Y.H.; Hong, N. Laplacian smooth twin support vector machine for semi-supervised classification. Int. J. Mach. Learn. Cybern. 2014, 5, 459–468.
45. Pei, H.; Wang, K.; Lin, Q.; Zhong, P. Robust semi-supervised extreme learning machine. Knowl.-Based Syst. 2018, 159, 203–220.
Figure 1. The learning times of the LSVM, LTSVM, TELM, LELM, and LRTELM on near-infrared spectral datasets.
Figure 2. ACC performance comparison of LSVM, LTSVM, TELM, LELM, and LRTELM on eight UCI datasets.
Figure 3. Distribution of the two lines and two moon datasets: (a) two lines dataset and (b) two moon dataset.
Figure 4. Learning results of Lap-LRTELM on the two lines and two moon datasets with different values of parameter C 3 ( C 3 = 2 n ).
Figure 5. Comparing the classification accuracy of SS-ELM, Lap-LELM, Lap-TELM, and Lap-LRTELM on COIL20(B), USPST(B), G50C, and MNIST(B) datasets with different numbers of labeled samples.
Figure 6. Comparing the classification accuracy of SS-ELM, Lap-LELM, Lap-TELM, and Lap-LRTELM on COIL20(B), USPST(B), G50C, and MNIST(B) datasets with different numbers of unlabeled samples.
Table 1. The difference between ELM, TELM, and LRTELM.
ELM: a single classification hyperplane; trained by solving one large linear equation.
TELM: a pair of nonparallel classification hyperplanes; trained by solving two smaller QPPs.
LRTELM: a pair of nonparallel classification hyperplanes; trained by solving a pair of linear equations.
Table 2. Description of near-infrared spectral datasets.
Region A: spectral range 4000–6000 cm⁻¹; 240 samples; 518 variables.
Region B: spectral range 8000–9000 cm⁻¹; 240 samples; 260 variables.
Region C: spectral range 8000–10,000 cm⁻¹; 240 samples; 518 variables.
Region D: spectral range 9000–10,000 cm⁻¹; 240 samples; 260 variables.
Region E: spectral range 4000–10,000 cm⁻¹; 240 samples; 1555 variables.
Table 3. Results of LSVM, LTSVM, TELM, LELM, and LRTELM on near-infrared spectral datasets.
DatasetsLSVMLTSVMTELMLELMLRTELM
ACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)
( C * , σ * )( C * , σ * )( C * , L * )( C * , L * )( C * , L * )
Region A71.36 ± 2.2766.67 ± 2.3171.46 ± 2.2272.50 ± 2.19 73.19 ± 2.08
( 2 2 , 2 2 )( 2 2 , 2 2 )( 2 2 , 300)( 2 3 , 300)( 2 3 , 300)
Region B73.11 ± 3.7872.08 ± 4.0273.58 ± 3.8575.83 ± 3.66 76.45 ± 3.56
( 2 3 , 2 1 )( 2 2 , 2 0 )( 2 3 , 300)( 2 1 , 500)( 2 2 , 300)
Region C62.29 ± 4.1661.32 ± 4.3862.34 ± 4.4763.17 ± 4.43 63.55 ± 4.18
( 2 3 , 2 3 )( 2 2 , 2 0 )( 2 3 , 500)( 2 2 , 500)( 2 3 , 500)
Region D72.13 ± 1.4472.08 ± 1.2972.87 ± 1.1372.75 ± 1.56 72.89 ± 1.19
( 2 3 , 2 0 )( 2 4 , 2 1 )( 2 2 , 300)( 2 1 , 300)( 2 1 , 300)
Region E73.21 ± 2.5672.69 ± 2.3373.45 ± 2.6773.88 ± 2.32 74.01 ± 2.28
( 2 2 , 2 3 )( 2 3 , 2 2 )( 2 2 , 500)( 10 1 , 1000)( 2 1 , 1000)
Avg. ACC: 70.42, 68.968, 70.74, 71.626, 72.018
Avg. rank: 4, 5, 2.8, 2.2, 1
Table 4. Results of LSVM, LSTVM, TELM, LELM, and LRTELM on UCI datasets.
LSVMLTSVMTELMLELMLRTELM
DatasetsACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)
( C * , σ * )( C * , σ * )( C * , L * )( C * , L * )( C * , L * )
Times (s)Times (s)Times (s)Times (s)Times (s)
Australian89.71 ± 2.35 92.38 ± 2.34 90.55 ± 2.5190.12 ± 2.3791.26 ± 2.06
( 690 × 14 )( 2 2 , 2 2 )( 2 3 , 2 2 )( 2 0 , 500)( 2 3 , 500)( 2 3 , 500)
1.2651.2721.2451.3351.221
German79.99 ± 5.37683.87 ± 6.24183.78 ± 3.4584.78 ± 4.61 85.74 ± 3.45
( 1000 × 24 )( 2 4 , 2 0 )( 2 2 , 2 2 )( 2 3 , 1000)( 2 1 , 1000)( 2 2 , 1000)
3.4893.4763.5423.3373.262
Breast Cancer91.56 ± 2.7693.88 ± 0.7896.26 ± 1.3295.77 ± 1.05 96.79 ± 0.89
( 699 × 9 )( 2 3 , 2 3 )( 2 1 , 2 3 )( 2 3 , 300)( 2 3 , 300)( 2 2 , 500)
2.2751.3431.5641.3511.289
WDBC94.77 ± 1.8594.89 ± 1.8395.78 ± 1.84 96.55 ± 2.56 96.36 ± 1.67
( 569 × 30 )( 2 2 , 2 3 )( 2 3 , 2 1 )( 2 3 , 500)( 2 3 , 500)( 2 3 , 500)
0.7400.7321.4790.2530.335
Spam87.58 ± 1.5686.86 ± 1.8687.58 ± 1.3886.89 ± 1.21 89.87 ± 1.62
( 4601 × 57 )( 2 3 , 2 2 )( 2 2 , 2 2 )( 2 3 , 5000)( 2 2 , 5000)( 2 3 , 5000)
8.586 6.2878.4557.5855.898
Pima74.56 ± 4.7474.86 ± 2.4376.27 ± 3.4576.94 ± 1.52 77.75 ± 1.46
( 768 × 8 )( 2 2 , 2 3 )( 2 3 , 2 1 )( 2 3 , 300)( 2 4 , 500)( 2 3 , 500)
2.2331.5035.4751.8161.145
QSAR84.86 ± 1.6584.76 ± 1.6786.87 ± 2.4786.73 ± 1.63 88.28 ± 1.78
( 1055 × 41 )( 2 1 , 2 1 )( 2 3 , 2 1 )( 2 2 , 500)( 2 3 , 1000)( 2 1 , 1000)
3.1583.6715.8733.7783.233
Banknote88.59 ± 2.0186.79 ± 1.8387.09 ± 1.3386.85 ± 1.45 89.75 ± 0.94
( 1372 × 4 )( 2 2 , 2 2 )( 2 3 , 2 3 , 10)( 2 1 , 1000)( 2 3 , 1000)( 2 4 , 1000)
4.6143.3389.7563.0192.817
Diabetes60.22 ± 1.45 60.84 ± 2.27 59.45 ± 3.39 60.05 ± 1.32 61.17 ± 2.13
( 1151 × 19 )( 2 3 , 10)( 2 1 , 10)( 2 3 , 1000)( 2 3 , 1000)( 2 0 , 1000)
3.1532.3842.6533.2372.186
Avg. ACC: 83.54, 84.35, 84.85, 84.96, 86.33
Avg. rank: 4.06, 3.67, 3.05, 3, 1.22
Table 5. Performance comparison of the SS-ELM, Lap-TELM, LRTELM, and Lap-LRTELM on the two lines and two moon datasets.
LRTELMSS-ELM Lap-TELM Lap-LRTELM
DatasetsACC (%)ACC (%)ACC (%)ACC (%)
Times (s)Times (s)Times (s)Times (s)
Two lines 87.12 92.25 94.51 95.39
2.232 3.154 2.105 1.396
Two moons 92.35 95.54 97.33 98.26
3.437 5.205 3.285 3.116
Table 6. Performance comparison of Lap-LELM, LRTELM, SS-ELM, Lap-TELM, and Lap-LRTELM on UCI datasets.
DatasetsPercentageLap-LELMLRTELMSS-ELMLap-TELMLap-LRTELM
of LabeledACC (%)ACC (%)ACC (%)ACC (%)ACC (%)
Samples( C * , L * , N * )( C * , L * )( λ * , L * , N * )( C * , L * , N * )( C * , L * , N * )
Diabetic10%58.7859.6559.7459.53 59.93
( 1151 × 19 ) ( 2 2 , 1000, 7)( 2 3 , 1000)( 2 2 , 1000, 7)( 2 1 , 500, 7)( 2 3 , 1000, 7)
30%59.8660.1460.2560.21 61.08
( 2 2 , 1000, 5)( 10 3 , 1000)( 2 1 , 1000, 7)( 2 3 , 1000, 7)( 2 3 , 1000, 5)
Australian10%85.45 86.89 84.8284.5586.37
( 690 × 14 ) ( 2 0 , 500, 3)( 2 1 , 500)( 2 2 , 500, 3)( 2 3 , 500, 3)( 2 2 , 1000, 3)
30% 86.27 87.36 85.6485.5386.73
( 2 2 , 500, 3)( 2 1 , 500)( 2 2 , 500, 3)( 2 3 , 500, 3)( 2 3 , 1000, 3)
Banknote10%83.3483.5684.5384.67 84.97
( 1372 × 4 ) ( 2 1 , 1000, 5)( 2 3 , 500)( 2 1 , 500, 5)( 2 3 , 1000, 5)( 2 2 , 1000, 5)
30% 87.2886.7588.7988.58 89.38
( 2 2 , 1000, 5)( 2 2 , 500)( 2 3 , 1000, 5)( 2 2 , 1000, 5)( 2 2 , 1000, 5)
Breast Cancer10%96.6395.1394.2395.65 97.24
( 699 × 9 ) ( 2 2 , 500, 3)( 2 0 , 500)( 2 1 , 500, 3)( 2 0 , 500, 3)( 2 3 , 1000, 3)
30% 97.3896.4596.7496.86 98.29
( 2 0 , 500, 3)( 2 2 , 500)( 2 1 , 500, 3)( 2 1 , 500, 3)( 2 3 , 1000, 3)
WDBC10% 93.83 92.3393.4393.5793.66
( 569 × 30 ) ( 2 3 , 500, 3)( 2 2 , 500)( 2 3 , 500, 3)( 2 1 , 500, 3)( 2 3 , 500, 3)
30% 94.41 93.1394.1194.1494.27
( 2 3 , 500, 3)( 2 2 , 500)( 2 3 , 500, 3)( 2 3 , 500, 3)( 2 2 , 500, 3)
German10%72.9671.1376.3976.89 77.66
( 1000 × 24 ) ( 2 4 , 1000, 7)( 2 0 , 500)( 2 2 , 500, 7)( 2 0 , 500, 3)( 2 2 , 500, 5)
30% 76.1178.9378.8178.91 79.11
( 2 4 , 1000, 7)( 2 0 , 500)( 2 1 , 500, 7)( 2 1 , 500, 3)( 2 1 , 500, 5)
Pima10%86.8181.7981.1881.65 89.64
( 768 × 8 ) ( 2 2 , 500, 3)( 2 0 , 500)( 2 2 , 500, 3)( 2 3 , 500, 3)( 2 3 , 1000, 3)
30% 88.6483.5583.3883.75 93.03
( 2 1 , 500, 5)( 2 0 , 500)( 2 1 , 500, 3)( 2 1 , 500, 5)( 2 2 , 1000, 5)
QSAR10%85.7585.1486.5486.24 87.54
( 1055 × 41 ) ( 2 3 , 1000, 9)( 2 2 , 1000)( 2 3 , 1000, 9)( 2 1 , 1000, 5)( 2 2 , 1000, 3)
30% 89.5886.2589.6689.46 91.21
( 2 2 , 1000, 9)( 2 2 , 1000)( 2 0 , 1000, 9)( 2 2 , 1000, 5)( 2 2 , 1000, 3)
Spam10%90.2789.6389.4789.83 90.89
( 4601 × 57 ) ( 2 2 , 3000, 9)( 2 1 , 3000)( 2 3 , 3000, 9)( 2 1 , 3000, 9)( 2 0 , 3000, 9)
30% 92.7690.8590.3190.78 92.37
( 2 1 , 5000, 9)( 2 3 , 5000)( 2 2 , 5000, 9)( 2 1 , 5000, 9)( 2 2 , 5000, 9)
Avg. ACC (10%): 83.65, 82.81, 83.37, 83.62, 85.32
Avg. ACC (30%): 85.81, 84.82, 85.30, 85.36, 87.28
Avg. rank (10%): 3.11, 3.78, 3.56, 3.33, 1.22
Avg. rank (30%): 3, 3.78, 3.56, 3.44, 1.22
Table 7. Description of the datasets.
Datasets (|L| / |U| / |V| / |T|):
G50C: 50 / 313 / 50 / 137
COIL20(B): 40 / 1000 / 50 / 360
USPST(B): 50 / 1409 / 50 / 498
MNIST(B): 200 / 1100 / 200 / 500
Table 8. Performance comparison of LRTELM, SS-ELM, Lap-LELM, Lap-TELM, and Lap-LRTELM.
DatasetsSubsetLRTELMSS-ELMLap-LELMLap-TELMLap-LRTELM
ACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)ACC ± S (%)
COIL20(B) U 86.47 ± 1.7886.38 ± 2.1286.79 ± 2.3189.08 ± 2.64 90.25 ± 1.54
V 89.28 ± 2.3288.60 ± 1.9688.81 ± 2.6491.13 ± 3.42 91.29 ± 2.88
T 86.37 ± 1.2286.47 ± 2.6886.61 ± 2.6389.44 ± 4.00 91.58 ± 1.54
USPST(B) U 89.67 ± 3.7490.78 ± 3.4991.56 ± 2.7291.83 ± 0.40 92.66 ± 1.38
V 91.25 ± 2.5691.58 ± 2.3491.75 ± 4.3891.34 ± 0.94 93.42 ± 1.52
T 89.53 ± 2.5990.12 ± 2.6690.47 ± 2.7690.74 ± 0.62 92.895 ± 1.53
G50C U 91.18 ± 3.2693.85 ± 1.5788.67 ± 5.3991.77 ± 2.53 93.87 ± 2.26
V 92.33 ± 3.0692.35 ± 1.5385.83 ± 4.7895.17 ± 3.24 94.45 ± 3.34
T 91.27 ± 3.7693.49 ± 1.4582.54 ± 2.6592.27 ± 3.06 94.59 ± 2.56
MNIST(B) U 85.47 ± 2.7689.33 ± 1.2989.78 ± 1.7790.36 ± 1.13 91.58 ± 1.63
V 87.66 ± 3.6290.42 ± 2.2291.56 ± 1.8692.04 ± 1.80 92.69 ± 1.78
T 85.49 ± 1.3987.05 ± 1.4989.64 ± 1.5889.88 ± 1.34 91.32 ± 1.46