Article

An Accelerated Symmetric Nonnegative Matrix Factorization Algorithm Using Extrapolation

1 School of Automation, Guangdong University of Technology, Guangzhou 510006, China
2 Guangdong Key Laboratory of IoT Information Technology, Guangzhou 510006, China
3 Peng Cheng Laboratory, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(7), 1187; https://doi.org/10.3390/sym12071187
Submission received: 2 July 2020 / Revised: 13 July 2020 / Accepted: 14 July 2020 / Published: 17 July 2020
(This article belongs to the Special Issue Advances in Symmetric Tensor Decomposition Methods)

Abstract

Symmetric nonnegative matrix factorization (SNMF) approximates a symmetric nonnegative matrix by the product of a nonnegative low-rank matrix and its transpose. SNMF has been successfully used in many real-world applications such as clustering. In this paper, we propose an accelerated variant of the multiplicative update (MU) algorithm of He et al. designed to solve the SNMF problem. The accelerated algorithm is derived by using the extrapolation scheme of Nesterov and a restart strategy. The extrapolation scheme plays the leading role in accelerating the MU algorithm of He et al., while the restart strategy ensures that the objective function of SNMF is monotonically decreasing. We apply the accelerated algorithm to clustering problems and to symmetric nonnegative tensor factorization (SNTF). The experimental results on both synthetic and real-world data show that it is more than four times faster than the MU algorithm of He et al. and performs favorably compared to recent state-of-the-art algorithms.

1. Introduction

Given a symmetric nonnegative matrix $A \in \mathbb{R}_+^{n \times n}$, symmetric nonnegative matrix factorization (SNMF) aims to find a nonnegative matrix $G \in \mathbb{R}_+^{n \times r}$ (generally $r \ll n$) such that $A \approx G G^T$. Based on the widely used Frobenius norm metric, the nonnegative factor $G$ is obtained by solving the following optimization problem:
$$\min_{G} \; F(G) = \| A - G G^T \|_F^2, \quad \text{subject to } G \geq 0. \tag{1}$$
SNMF is a special but important class of symmetric nonnegative tensor factorization (SNTF): it can serve as a basic building block of SNTF algorithms for higher-order tensors [1]. In addition, SNMF has been widely applied to clustering problems in which the cluster structure is captured by pairwise affinity relationships among data points [2,3,4,5,6,7,8,9]. Recent works have demonstrated that SNMF can provide superior clustering quality compared to many classic clustering algorithms, such as spectral clustering; see [9,10,11,12] and the references therein.
So far, many efficient algorithms have been proposed for SNMF. In [6], He et al. proposed three multiplicative update algorithms: a basic algorithm and two fast variants, the $\alpha$-SNMF and $\beta$-SNMF algorithms. It was numerically shown that these three algorithms outperform earlier multiplicative update algorithms [9,13,14]. In [10], Kuang et al. proposed a Newton-like SNMF algorithm, which uses partial second-order information to guide the scaling of the gradient direction. Compared to the projected Newton algorithm, the Newton-like algorithm greatly reduces the computational complexity, and it is guaranteed to converge to a stationary solution. In [7], Vandaele et al. suggested solving the SNMF problem by a block coordinate descent (BCD) algorithm, in which only one entry of $G$ is optimized at a time and the corresponding subproblem reduces to finding the best root of a cubic equation. The BCD algorithm has a low computational burden and storage requirement, which makes it suitable for large-scale problems. In [8], Shi et al. proposed two inexact BCD methods based on block successive upper-bounding minimization (BSUM), named scalar BSUM (sBSUM) and vector-wise BSUM (vBSUM), both of which are guaranteed to converge to stationary solutions.
In this paper, we propose an accelerated variant of the basic multiplicative update (MU) algorithm of He et al. [6]. For convenience, we refer to the basic MU algorithm as MU-SNMF, and to the accelerated variant as accelerated MU-SNMF (AMU-SNMF). We draw inspiration from Nesterov's accelerated gradient (NAG) method and derive AMU-SNMF by applying the extrapolation scheme of NAG to MU-SNMF. To overcome the non-monotonic behavior of the objective value caused by the extrapolation scheme, we restart the algorithm and take the current iterate as the new starting point whenever an increase in the objective value is observed. We apply AMU-SNMF to several real-world clustering problems and use it for SNTF following the framework of the averaging approach [1]. The experimental results show that AMU-SNMF rapidly decreases the objective value and has good clustering performance in clustering applications. Furthermore, within the framework of the averaging approach, AMU-SNMF achieves a low SNTF objective value.
The rest of this paper is organized as follows. In Section 2, we first review the MU-SNMF algorithm and the NAG method, and then present the AMU-SNMF algorithm. The averaging approach with AMU-SNMF for SNTF is discussed in the last part of Section 2. In Section 3, we conduct experiments on synthetic and real-world data. Finally, we conclude this paper in Section 4.

2. Accelerated SNMF Algorithm

2.1. Multiplicative Update Algorithm for SNMF

In this subsection, we briefly review the MU-SNMF algorithm proposed by He et al. [6]. It is derived from an auxiliary function (see Definition 1) of (1). At the $(t+1)$-th iteration, the auxiliary function is
$$f(G, G^t) = \mathrm{Tr}(A A^T) - 4\,\mathrm{Tr}(A G^t G^T) + 2\,\mathrm{Tr}(A G^t (G^t)^T) + \sum_{ij} \frac{(G^t (G^t)^T G^t)_{ij}}{(G^t_{ij})^3}\, G_{ij}^4, \tag{2}$$
where $G^t$ denotes the value of $G$ after the $t$-th iteration, and $\mathrm{Tr}(\cdot)$ denotes the trace operator. By setting the gradient $\nabla_G f(G, G^t)$ to zero, the multiplicative update rule of MU-SNMF is derived as follows:
$$G^{t+1}_{ij} = G^t_{ij} \sqrt[3]{\frac{(A G^t)_{ij}}{(G^t (G^t)^T G^t)_{ij}}}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, r. \tag{3}$$
Given a positive initialization, MU-SNMF always preserves the nonnegativity constraints on $G$. It has been proved that MU-SNMF decreases the objective value at each iteration. Moreover, MU-SNMF converges to a stationary point of (1) if $A$ is completely positive. For the detailed proofs, please refer to Propositions 1 and 2 in [6]. We summarize MU-SNMF in Algorithm 1.
Definition 1 (Auxiliary function). A function $f(x, y)$ is an auxiliary function of $F(x)$ if it satisfies two properties: (1) $F(x) \leq f(x, y)$ and (2) $F(y) = f(y, y)$ for all vectors $x, y \in \mathrm{dom}\, F$.
Algorithm 1: G = MU-SNMF(A, r).
Step 1: Initialization
  Initialize $G^0$. Set $t = 0$.
Step 2: Update stage
repeat
   $G^{t+1} = G^t \otimes \sqrt[3]{\dfrac{[A G^t]}{[G^t (G^t)^T G^t]}}$
   $t = t + 1$
until the stopping criterion is satisfied
Step 3: Output $G \leftarrow G^t$.
  $\otimes$ and $\frac{[\,\cdot\,]}{[\,\cdot\,]}$ denote elementwise product and division, respectively.
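To make the update rule (3) concrete, the following NumPy sketch implements Algorithm 1 under our own illustrative choices (function name, random positive initialization, relative-change stopping rule, and a tiny constant added to the denominator to guard against division by zero); these choices are not part of [6].

  import numpy as np

  def mu_snmf(A, r, max_iter=500, tol=1e-6, seed=0):
      # Basic MU-SNMF (Algorithm 1): G <- G * cbrt((A G) / (G G^T G)), elementwise.
      rng = np.random.default_rng(seed)
      G = rng.random((A.shape[0], r)) + 1e-3            # positive initialization
      prev = np.linalg.norm(A - G @ G.T, 'fro') ** 2
      for _ in range(max_iter):
          G = G * np.cbrt((A @ G) / (G @ (G.T @ G) + 1e-16))
          obj = np.linalg.norm(A - G @ G.T, 'fro') ** 2
          if abs(prev - obj) <= tol * max(prev, 1.0):   # simple stopping criterion
              break
          prev = obj
      return G

  # Quick check on a random symmetric nonnegative matrix:
  # A = np.random.rand(50, 10); A = A @ A.T; G = mu_snmf(A, 10)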

2.2. Nesterov’s Accelerated Gradient

The accelerated gradient method [15] is an accelerated variant of gradient descent. It was proposed by Nesterov in 1983 and is commonly referred to as Nesterov's accelerated gradient (NAG). Given a smooth convex function $J(\theta)$ to be minimized, NAG takes an initial point $\theta^0$ and runs the following iterative scheme:
$$v^0 = 0, \quad y^{t+1} = \theta^t + \gamma_t v^t, \quad \theta^{t+1} = y^{t+1} - \eta_{t+1} \nabla_{\theta} J(y^{t+1}), \quad v^{t+1} = \theta^{t+1} - \theta^t, \tag{4}$$
where the superscript of a variable denotes the iteration number, $\gamma_t = 1 - 3/(5 + t)$, and $\eta_{t+1}$ denotes the learning rate. At each iteration, NAG first moves the variable $\theta$ along the previous update direction; the intuition is that the descent direction tends to persist across successive iterations. The result is denoted by a new variable $y$. NAG then takes a gradient descent step from the point $y$. This gradient step makes a timely correction to $y$ if $y$ is indeed a poor update [16,17]. For convex functions with Lipschitz-continuous gradients, NAG achieves a global convergence rate of $O(1/t^2)$, which is the optimal rate as described in [18].
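For reference, the scheme (4) can be written in a few lines of NumPy; the quadratic test function and the fixed learning rate below are purely illustrative assumptions, not taken from [15].

  import numpy as np

  def nag(grad, theta0, lr=0.1, iters=100):
      # Nesterov's accelerated gradient with gamma_t = 1 - 3/(5 + t), as in (4).
      theta = np.asarray(theta0, dtype=float)
      v = np.zeros_like(theta)
      for t in range(iters):
          gamma = 1.0 - 3.0 / (5.0 + t)
          y = theta + gamma * v            # extrapolate along the previous update direction
          theta_new = y - lr * grad(y)     # gradient step taken from the extrapolated point
          v = theta_new - theta
          theta = theta_new
      return theta

  # Example: minimize J(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
  # nag(lambda th: th, theta0=np.ones(5))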

2.3. Accelerated MU-SNMF Algorithm

Inspired by the extrapolation scheme in NAG, we modify MU-SNMF as follows:
$$Y^{t+1} = G^t + \gamma_t (G^t - G^{t-1}), \qquad G^{t+1} = Y^{t+1} \otimes \sqrt[3]{\frac{[A Y^{t+1}]}{[Y^{t+1} (Y^{t+1})^T Y^{t+1}]}}, \tag{5}$$
where $\otimes$ denotes the elementwise product and $\frac{[\,\cdot\,]}{[\,\cdot\,]}$ denotes elementwise division. Note that $Y^{t+1}$ might contain negative entries, since the entries of $G$ are not guaranteed to be nondecreasing during the iterative process. To keep $Y^{t+1}$ elementwise positive, any entry of $Y^{t+1}$ smaller than a small positive number $\varepsilon$ is truncated to $\varepsilon$, resulting in the following iterative scheme:
$$Y^{t+1} = \max\{G^t + \gamma_t (G^t - G^{t-1}), \varepsilon\}, \qquad G^{t+1} = Y^{t+1} \otimes \sqrt[3]{\frac{[A Y^{t+1}]}{[Y^{t+1} (Y^{t+1})^T Y^{t+1}]}}. \tag{6}$$
For ease of description, we temporarily refer to the algorithm with update rule (6) as accelerated MU-SNMF (AMU-SNMF). The performance of AMU-SNMF depends strongly on the choice of $\gamma_t$: if it is chosen too small, the extrapolation scheme does little to accelerate MU-SNMF; if it is too large, AMU-SNMF will probably diverge. We find that AMU-SNMF works well with $\gamma_t = 1 - 3/(5 + t)$.
Unlike MU-SNMF, AMU-SNMF is not guaranteed to monotonically decrease the objective value. To overcome this problem, we restart AMU-SNMF and take the current iterate as the new starting point whenever an increase in the objective value is observed. We summarize the whole procedure in Algorithm 2. In the following, when we refer to AMU-SNMF we mean Algorithm 2.
Algorithm 2: G = AMU-SNMF(A, r).
Step 1: Initialization
  Initialize $G^0$. Set $t = 0$, $t_{restart} = 0$, $\varepsilon = 10^{-16}$, $F^0 = \|A - G^0 (G^0)^T\|_F^2$.
Step 2: Update stage
repeat
   $\gamma_t = 1 - \dfrac{3}{5 + t - t_{restart}}$
  if $t = t_{restart}$ then
     $Y^{t+1} = G^t$
  else
     $Y^{t+1} = \max\{(1 + \gamma_t) G^t - \gamma_t G^{t-1}, \varepsilon\}$
  end if
   $G_{new} = Y^{t+1} \otimes \sqrt[3]{\dfrac{[A Y^{t+1}]}{[Y^{t+1} (Y^{t+1})^T Y^{t+1}]}}$
   $F_{new} = \|A - G_{new} (G_{new})^T\|_F^2$
  if $F_{new} > F^t$ then
     $t_{restart} = t + 1$
     $G_{new} = G^t$
     $F_{new} = F^t$
  end if
   $G^{t+1} = G_{new}$
   $F^{t+1} = F_{new}$
   $t = t + 1$
until the stopping criterion is satisfied
Step 3: Output $G \leftarrow G^t$.
  $\otimes$ and $\frac{[\,\cdot\,]}{[\,\cdot\,]}$ denote elementwise product and division, respectively.
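The following NumPy sketch mirrors the steps of Algorithm 2; the fixed iteration budget, the random positive initialization, and the helper name are our own choices, while the extrapolation, truncation, and restart logic follow the pseudocode above.

  import numpy as np

  def amu_snmf(A, r, max_iter=500, eps=1e-16, seed=0):
      # AMU-SNMF (Algorithm 2): extrapolated MU step with a restart on objective increase.
      rng = np.random.default_rng(seed)
      G = rng.random((A.shape[0], r)) + 1e-3
      G_prev = G.copy()
      F = np.linalg.norm(A - G @ G.T, 'fro') ** 2
      t_restart = 0
      for t in range(max_iter):
          gamma = 1.0 - 3.0 / (5.0 + t - t_restart)
          if t == t_restart:
              Y = G
          else:
              Y = np.maximum((1.0 + gamma) * G - gamma * G_prev, eps)   # truncated extrapolation
          G_new = Y * np.cbrt((A @ Y) / (Y @ (Y.T @ Y)))                # MU step taken from Y
          F_new = np.linalg.norm(A - G_new @ G_new.T, 'fro') ** 2
          if F_new > F:                    # restart: discard the step and re-anchor at G^t
              t_restart = t + 1
              G_new, F_new = G, F
          G_prev, G, F = G, G_new, F_new
      return G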

2.4. Symmetric Nonnegative Tensor Factorization (SNTF)

In this subsection, we apply AMU-SNMF to the SNTF of third-order symmetric nonnegative tensors. The SNTF problem is described as follows. Given a third-order symmetric nonnegative tensor $\mathcal{A} \in \mathbb{R}_+^{n \times n \times n}$ and a positive integer $r$, SNTF seeks a nonnegative matrix $G = [g_1, \ldots, g_r] \in \mathbb{R}_+^{n \times r}$ such that
$$\mathcal{A} \approx \sum_{i=1}^{r} g_i \circ g_i \circ g_i, \tag{7}$$
where the symbol $\circ$ represents the vector outer product. Based on the widely used Euclidean distance, the matrix $G$ is obtained by solving the following optimization problem:
$$\min_{G} \; J(G) = \Big\| \mathcal{A} - \sum_{i=1}^{r} g_i \circ g_i \circ g_i \Big\|^2, \quad \text{subject to } G \geq 0, \tag{8}$$
where the tensor norm $\|\cdot\|$ is the square root of the sum of the squares of all elements, e.g., for a third-order tensor $\mathcal{X} \in \mathbb{R}^{n \times n \times n}$,
$$\| \mathcal{X} \| = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \mathcal{X}_{ijk}^2}. \tag{9}$$
For (8), Cichocki et al. [1] proposed a simple approach, referred to as the averaging approach. The idea behind it is to convert the SNTF problem into a standard SNMF problem:
$$\tilde{A} \approx \tilde{G} \tilde{G}^T, \tag{10}$$
where $\tilde{A} = \sum_{i=1}^{n} \mathcal{A}_{:,:,i} \in \mathbb{R}_+^{n \times n}$, $\mathcal{A}_{:,:,i}$ is the $i$-th frontal slice of the given tensor $\mathcal{A}$, $\tilde{G} = G D^{1/2}$, and $D \in \mathbb{R}_+^{r \times r}$ is a diagonal matrix with diagonal elements $D_{jj} = \sum_{i=1}^{n} G_{ij}$. The matrix $\tilde{G}$ can be obtained by any SNMF algorithm.
Following the framework of the averaging approach, we convert the SNTF problem (8) into the SNMF problem (10) and solve it by AMU-SNMF. Once $\tilde{G}$ is obtained, we compute $G$ from $\tilde{G}$ as follows. Denote the $j$-th column of $\tilde{G}$ by $\tilde{g}_j$, and let $\hat{g}_j$ be the normalization of $\tilde{g}_j$ such that the entries of $\hat{g}_j$ sum to 1. Then $g_j$ is given by $g_j = \alpha_j \hat{g}_j$, where $\alpha_j$ satisfies $\alpha_j^{3/2} = \sum_{i=1}^{n} \tilde{G}_{ij}$. It is worth noting that the objective value $J(G)$ is not guaranteed to decrease monotonically under the averaging approach. This can be seen in Figure 1, which shows the trajectories of $J(G)$ generated by the averaging approach on a third-order symmetric nonnegative tensor $\mathcal{A} \in \mathbb{R}_+^{10 \times 10 \times 10}$, where four different SNMF algorithms, namely MU-SNMF [6], Newton-like SNMF [10], vBSUM [8], and AMU-SNMF, were used separately for (10). Figure 1 shows that the matrix $G$ obtained at the last iteration may not have the lowest objective value $J(G)$ among all iterations and therefore may not be the best output of the averaging approach. An alternative, which we adopt in the experiments, is to store the estimate of $G$ with the best $J(G)$ and output it when the averaging approach terminates.
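A minimal sketch of the averaging approach as described above, assuming any SNMF solver (for instance, the amu_snmf sketch given after Algorithm 2) is passed in as a function; the variable names are our own.

  import numpy as np

  def sntf_averaging(T, r, snmf_solver):
      # Averaging approach [1]: reduce a third-order symmetric tensor to the SNMF problem (10).
      A_tilde = T.sum(axis=2)                 # sum of the frontal slices of the tensor
      G_tilde = snmf_solver(A_tilde, r)       # any SNMF algorithm returns G_tilde
      col_sums = G_tilde.sum(axis=0)
      G_hat = G_tilde / col_sums              # normalize each column to sum to one
      alpha = col_sums ** (2.0 / 3.0)         # alpha_j satisfies alpha_j^{3/2} = sum_i G_tilde_ij
      return G_hat * alpha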

3. Experiments and Results

In this section, we first test AMU-SNMF (Algorithm 2) on synthetic data and then apply it to some real-world clustering problems. Finally, we test AMU-SNMF in SNTF using the framework of the averaging approach [1]. In the clustering experiments, clustering quality is measured by clustering accuracy (CA) and normalized mutual information (NMI) [19]. CA measures the percentage of correctly clustered data points. Mutual information (MI) measures the mutual dependence between the predicted and ground-truth clustering labels, and NMI is a normalization of the MI score that scales the results between 0 (no mutual information) and 1 (perfect correlation). To demonstrate the effectiveness of AMU-SNMF, we compare it with seven SNMF algorithms: MU-SNMF [6], $\alpha$-SNMF [6], $\beta$-SNMF [6], Newton-like SNMF [10] (the code of Newton-like SNMF is available from http://math.ucla.edu/dakuang/), BCD [7], sBSUM [8], and vBSUM [8]. For the $\alpha$-SNMF and $\beta$-SNMF algorithms, we use the default parameter settings recommended by the authors of [6], i.e., $\alpha = 0.99$ and $\beta = 0.99$. In all the experiments, the SNMF algorithms are randomly initialized from points of the form $\omega P$ (i.e., $G^0 = \omega P$), where $P$ is a randomly generated nonnegative matrix and $\omega = \arg\min_{\omega \geq 0} \|A - \omega P P^T\|_F^2$. All experiments were implemented on a computer with a 2.8-GHz Intel Core i5-4200 CPU and 8 GB of memory using 64-bit MATLAB R2015a on Windows 7.
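The scaled random initialization admits a closed-form $\omega$, since minimizing $\|A - \omega P P^T\|_F^2$ over $\omega \geq 0$ gives $\omega = \langle A, P P^T \rangle / \|P P^T\|_F^2$, which is nonnegative here because $A$ and $P$ are nonnegative. The short sketch below is our own reading of this setup, with an illustrative function name.

  import numpy as np

  def scaled_random_init(A, r, seed=0):
      # G0 = omega * P with omega = argmin_{w >= 0} ||A - w P P^T||_F^2.
      rng = np.random.default_rng(seed)
      P = rng.random((A.shape[0], r))
      PPt = P @ P.T
      omega = max(np.sum(A * PPt) / np.sum(PPt * PPt), 0.0)   # <A, PP^T> / ||PP^T||_F^2
      return omega * P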

3.1. Synthetic Data

The synthetic data were generated as $A = G G^T + E$, where $G \in \mathbb{R}_+^{100 \times 30}$ was generated randomly with half of its entries set to zero, and $E$ was a noise matrix whose intensity was determined by the signal-to-noise ratio (SNR). We tested AMU-SNMF in both noise-free and noisy scenarios. In the noisy case, the noise level was SNR = 10 dB. For each scenario, 20 different matrices $A$ were generated. To decrease the effect of initializations, the SNMF algorithms were run 10 times with different initializations for each $A$. The SNMF algorithms were stopped when their elapsed time exceeded 10 s. Following the strategy from [20], we report
$$E(G) = \frac{\|A - G G^T\|_F}{\|A\|_F} - e_{\min} \tag{11}$$
for all the SNMF algorithms, where $e_{\min}$ denotes the lowest relative error obtained by any algorithm with any initialization. For the noise-free synthetic data, we use $e_{\min} = 0$. The use of $E(G)$ allows us to average results meaningfully over several synthetic datasets. We compute the average $E(G)$ over 200 trials (20 different $A$ and 10 different initializations for each $A$) and plot it versus runtime in Figure 2. In both the noise-free and noisy cases, we observe the following: (1) AMU-SNMF needed less than 2 s to reach a lower average $E(G)$ than MU-SNMF achieved at the end of its run (10 s), i.e., AMU-SNMF was more than five times faster than MU-SNMF; (2) AMU-SNMF needed less than 4 s to reach a lower average $E(G)$ than the other six algorithms ($\alpha$-SNMF, $\beta$-SNMF, Newton-like SNMF, BCD, sBSUM, and vBSUM) achieved at the end of their runs (10 s), i.e., AMU-SNMF was more than two times faster than these six algorithms; (3) AMU-SNMF had the lowest average $E(G)$ at the end of its run compared with all the other algorithms. These observations demonstrate that AMU-SNMF is not only fast but also highly accurate.

3.2. Document Clustering

In this experiment, we evaluate AMU-SNMF on document clustering tasks. The goal of document clustering is to automatically organize documents into different semantic categories [21]. Two widely used document collections, Topic Detection and Tracking 2 (TDT2) (http://projects.ldc.upenn.edu/TDT2/) and Reuters-21578 (http://www.daviddlewis.com/resources/testcollections/reuters21578/), were used. The TDT2 corpus consists of data collected during the first half of 1998 from six sources: ABC, CNN, VOA, NYT, PRI, and APW. It contains 11,201 documents from 96 semantic categories. The Reuters-21578 corpus was collected from the Reuters newswire in 1987 and contains 21,578 documents from 135 semantic categories. Both collections are highly unbalanced, with category sizes ranging from one to thousands of documents. To improve the reliability of our evaluations, we removed documents appearing in two or more categories and selected 10 categories from each collection: the 10th to 19th largest categories of the TDT2 corpus and the 12th to 21st largest categories of the Reuters-21578 corpus. The sizes of the selected categories are only slightly unbalanced (see Table 1).
We first normalized each document vector to unit length. Then we constructed the affinity matrix $A$ as follows. Denote the normalized document vectors by $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$. The affinity matrix $A$ was computed as
$$A_{ij} = \begin{cases} \exp\!\Big(-\dfrac{\|x_i - x_j\|^2}{\sigma_i \sigma_j}\Big), & \text{if } i \in N(j) \text{ or } j \in N(i), \\ 0, & \text{otherwise}, \end{cases} \tag{12}$$
where $N(i) = \{ j \mid x_j \text{ is one of the } k_n \text{ nearest neighbors of } x_i, \; j \neq i \}$, and the local scale parameter $\sigma_i$ of each data point was set to the distance between $x_i$ and its $k$-th nearest neighbor. We used $k_n = \lfloor \log_2(n) \rfloor + 1$ and $k = 7$, as suggested in [10,22]. After that, we performed SNMF for document clustering, where each document was assigned to the cluster with the largest entry in the corresponding row of $G$. To decrease the effect of initializations, the SNMF algorithms were run 20 times with different initializations for each document collection and were stopped when their elapsed time exceeded 30 s. The plots of the average $F(G)$ versus runtime are displayed in Figure 3. Figure 3a shows that AMU-SNMF needed less than 5 s to reach a lower average $F(G)$ than the other algorithms achieved at the end of their runs (30 s). Furthermore, Figure 3b shows that (1) all the algorithms converged, with convergence times of about 15 s (MU-SNMF), 7 s ($\alpha$-SNMF), 5 s ($\beta$-SNMF), 2.5 s (Newton-like SNMF), 25 s (BCD), 15 s (sBSUM), 15 s (vBSUM), and 5 s (AMU-SNMF); (2) AMU-SNMF needed less than 2 s to reach a lower average $F(G)$ than the other algorithms achieved at convergence. These observations demonstrate that AMU-SNMF was more than six times faster than MU-SNMF and more than two times faster than the other algorithms except Newton-like SNMF. The clustering results on TDT2 and Reuters-21578, averaged over 20 trials with different initializations, are shown in Table 2. AMU-SNMF achieved higher CA and NMI values, which demonstrates that it achieved a better clustering performance than the other algorithms.
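A sketch of the self-tuning affinity construction (12) and the row-wise argmax cluster assignment described above; the brute-force nearest-neighbor search and the function names are our own simplifications, suitable only for modest $n$.

  import numpy as np

  def affinity_matrix(X, k=7):
      # Self-tuning affinity (12); rows of X are the (unit-length) data vectors.
      n = X.shape[0]
      kn = int(np.floor(np.log2(n))) + 1
      D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
      order = np.argsort(D, axis=1)                # order[i, 0] is i itself
      sigma = D[np.arange(n), order[:, k]]         # local scale: distance to the k-th neighbor
      A = np.exp(-D ** 2 / np.outer(sigma, sigma))
      mask = np.zeros((n, n), dtype=bool)
      for i in range(n):
          mask[i, order[i, 1:kn + 1]] = True       # the k_n nearest neighbors of x_i
      keep = mask | mask.T                         # i in N(j) or j in N(i)
      np.fill_diagonal(keep, False)
      return np.where(keep, A, 0.0)

  def assign_clusters(G):
      # Each sample joins the cluster with the largest entry in its row of G.
      return np.argmax(G, axis=1)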

3.3. Object Clustering

In this experiment, we evaluate AMU-SNMF on the Columbia Object Image Library (COIL-20) dataset (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php). COIL-20 is a database of gray-scale images of 20 objects. The objects were placed on a motorized turntable against a black background, and the turntable was rotated through 360 degrees to vary the object pose with respect to a fixed camera. Images were taken at pose intervals of 5 degrees, which corresponds to 72 images per object. Each image is 128 × 128 pixels in size.
We first constructed the affinity matrix $A$ by (12). Then we performed SNMF for object clustering, where each image was assigned to the cluster with the largest entry in the corresponding row of $G$. To decrease the effect of initializations, the SNMF algorithms were run 20 times with different initializations and were stopped when their elapsed time exceeded 30 s. The plot of the average $F(G)$ versus runtime is displayed in Figure 4. We can see that (1) AMU-SNMF needed about 3.5 s to reach the average $F(G)$ that MU-SNMF achieved at convergence (about 15 s), i.e., AMU-SNMF was more than four times faster than MU-SNMF; (2) AMU-SNMF was more than three times faster than $\alpha$-SNMF, $\beta$-SNMF, BCD, sBSUM, and vBSUM. The clustering results, averaged over 20 trials with different initializations, are shown in Table 3. AMU-SNMF achieved higher CA and NMI values, which demonstrates that it achieved a better clustering performance than the other algorithms.

3.4. SNTF

In this experiment, we test AMU-SNMF in SNTF using the framework of the averaging approach [1] discussed in Section 2.4. The experiment was performed on synthetic data generated as $\mathcal{A} = \sum_{i=1}^{r} g_i \circ g_i \circ g_i + \mathcal{E}$, where $r = 3$, the vectors $g_i \in \mathbb{R}_+^{10}$ ($i = 1, 2, 3$) were generated randomly, and $\mathcal{E} \in \mathbb{R}^{10 \times 10 \times 10}$ was a symmetric noise tensor whose intensity was determined by the signal-to-noise ratio (SNR). Both noise-free and noisy (SNR = 10 dB) synthetic data were used, each comprising 10 different tensors $\mathcal{A}$. SNTF was performed by the averaging approach, with eight different SNMF algorithms used separately for (10). The averaging approach was stopped when its elapsed time exceeded 0.5 s. To decrease the effect of initializations, 10 trials were conducted with different initializations for each $\mathcal{A}$. We measure the quality of SNTF by
$$\mathrm{Fit}(G) = 1 - \frac{\big\| \mathcal{A} - \sum_{i=1}^{r} g_i \circ g_i \circ g_i \big\|}{\| \mathcal{A} \|}. \tag{13}$$
The greater the value of $\mathrm{Fit}(G)$, the better $\sum_{i=1}^{r} g_i \circ g_i \circ g_i$ approximates the tensor $\mathcal{A}$. We compute the average $\mathrm{Fit}(G)$ of the averaging approach over 100 trials (10 different $\mathcal{A}$ and 10 different initializations for each $\mathcal{A}$). The results are listed in Table 4. The averaging approach with AMU-SNMF yielded a better approximate tensor than with the other SNMF algorithms in both the noise-free and noisy cases.
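For completeness, $\mathrm{Fit}(G)$ in (13) can be evaluated directly from $G$ by explicitly reconstructing the symmetric rank-one terms; the einsum-based sketch below is our own implementation choice.

  import numpy as np

  def sntf_fit(T, G):
      # Fit(G) = 1 - ||T - sum_i g_i o g_i o g_i|| / ||T||, cf. (13).
      recon = np.einsum('ia,ja,ka->ijk', G, G, G)   # sum over columns of symmetric outer products
      return 1.0 - np.linalg.norm(T - recon) / np.linalg.norm(T)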

4. Conclusions

In this paper, we propose an accelerated variant of the MU-SNMF algorithm for the SNMF problem, referred to as AMU-SNMF. It is derived by exploiting the extrapolation scheme of NAG together with a restart strategy. Our contributions are twofold. First, we derive the accelerated variant of MU-SNMF and show numerically that AMU-SNMF is more than four times faster than MU-SNMF. Second, we empirically demonstrate that AMU-SNMF outperforms six other state-of-the-art algorithms: $\alpha$-SNMF [6], $\beta$-SNMF [6], Newton-like SNMF [10], BCD [7], sBSUM [8], and vBSUM [8]. The experimental results show that AMU-SNMF is more than two times faster than these algorithms, except for the Newton-like SNMF algorithm. Moreover, AMU-SNMF achieves better clustering performance on real-world clustering problems and is better suited to the averaging approach for SNTF. As one direction of future work, we plan to further investigate the convergence properties of the AMU-SNMF algorithm. Another direction is to apply the methodology adopted in this paper to other SNMF algorithms.

Author Contributions

Funding acquisition, Z.H. and J.L.; Methodology, P.W. and Z.H.; Supervision, Z.H.; Validation, P.W., J.L., B.T., Y.B., J.T., T.L. and Z.L.; Writing—original draft, P.W.; Writing—review and editing, P.W., Z.H., J.L., B.T., Y.B., J.T., T.L. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported in part by the National Natural Science Foundation of China under Grants 61773127 and 61727810, the Ten Thousand Talent Program approved in 2018, Guangdong Province Foundation Grant 2019B1515120036, the Natural Science Foundation of Guangdong Province under Grant 2018A030313306, the Guangzhou Science and Technology Foundation under Grant 201802010037, the Key Areas of Research and Development Plan Project of Guangdong under Grant 2019B010147001, the PCL Future Regional Network Facilities for Large-Scale Experiments and Applications Project under Grant PCL2018KP001, and GDHVPS 2014.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cichocki, A.; Jankovic, M.; Zdunek, R.; Amari, S.I. Sparse super symmetric tensor factorization. In Proceedings of the 2007 International Conference on Neural Information Processing (ICONIP), Kitakyushu, Japan, 13–16 November 2007; pp. 781–790.
  2. Lu, S.; Hong, M.; Wang, Z. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. IEEE Trans. Signal Process. 2017, 65, 3120–3135.
  3. Gao, T.; Olofsson, S.; Lu, S. Minimum-volume-regularized weighted symmetric nonnegative matrix factorization for clustering. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 247–251.
  4. Marin, M.; Vlase, S.; Paun, M. Considerations on double porosity structure for micropolar bodies. AIP Adv. 2015, 5, 037113.
  5. Ma, Y.; Hu, X.; He, T.; Jiang, X. Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data. Methods 2016, 111, 80–84.
  6. He, Z.; Xie, S.; Zdunek, R.; Zhou, G.; Cichocki, A. Symmetric Nonnegative Matrix Factorization: Algorithms and Applications to Probabilistic Clustering. IEEE Trans. Neural Netw. 2011, 22, 2117–2131.
  7. Vandaele, A.; Gillis, N.; Lei, Q.; Zhong, K.; Dhillon, I. Efficient and Non-Convex Coordinate Descent for Symmetric Nonnegative Matrix Factorization. IEEE Trans. Signal Process. 2016, 64, 5571–5584.
  8. Shi, Q.; Sun, H.; Lu, S.; Hong, M.; Razaviyayn, M. Inexact Block Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization. IEEE Trans. Signal Process. 2017, 65, 5995–6008.
  9. Zass, R.; Shashua, A. A unifying approach to hard and probabilistic clustering. In Proceedings of the 2005 International Conference on Computer Vision (ICCV), Beijing, China, 17–20 October 2005; pp. 294–301.
  10. Kuang, D.; Ding, C.; Park, H. Symmetric Nonnegative Matrix Factorization for Graph Clustering. In Proceedings of the 2012 SIAM International Conference on Data Mining (SDM), Anaheim, CA, USA, 26–28 April 2012; pp. 106–117.
  11. Kuang, D.; Yun, S.; Park, H. SymNMF: Nonnegative Low-Rank Approximation of a Similarity Matrix for Graph Clustering. J. Glob. Optim. 2015, 62, 545–574.
  12. Lu, S.; Wang, Z. Accelerated algorithms for eigen-value decomposition with application to spectral clustering. In Proceedings of the 2015 Asilomar Conference on Signals, Systems and Computers (ACSSC), Pacific Grove, CA, USA, 8–11 November 2015; pp. 355–359.
  13. Bo, L.; Zhang, Z.; Wu, X.; Yu, P.S. Relational clustering by symmetric convex coding. In Proceedings of the 2007 International Conference on Machine Learning (ICML), Corvallis, OR, USA, 20–24 June 2007; pp. 569–576.
  14. Long, B.; Zhang, Z.; Yu, P. Co-clustering by block value decomposition. In Proceedings of the 2005 SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, IL, USA, 21–24 August 2005; pp. 635–640.
  15. Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/k^2). Sov. Math. Dokl. 1983, 27, 372–376.
  16. Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 2013 International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013; pp. 1139–1147.
  17. Botev, A.; Lever, G.; Barber, D. Nesterov's accelerated gradient and momentum as approximations to regularised update descent. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1899–1903.
  18. Yudin, D.; Nemirovskii, A.S. Informational complexity and efficient methods for the solution of convex extremal problems. Matekon 1976, 13, 3–25.
  19. Wu, W.; Jia, Y.; Kwong, S.; Hou, J. Pairwise Constraint Propagation-Induced Symmetric Nonnegative Matrix Factorization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6348–6361.
  20. Ang, A.M.S.; Gillis, N. Accelerating nonnegative matrix factorization algorithms using extrapolation. Neural Comput. 2019, 31, 417–439.
  21. Cai, D.; He, X.; Han, J. Document clustering using locality preserving indexing. IEEE Trans. Knowl. Data Eng. 2005, 17, 1624–1637.
  22. Zelnik-Manor, L.; Perona, P. Self-Tuning Spectral Clustering. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 1601–1608.
Figure 1. The trajectories of $J(G)$ generated by the averaging approach on a third-order symmetric nonnegative tensor $\mathcal{A} \in \mathbb{R}_+^{10 \times 10 \times 10}$, where four different SNMF algorithms were used separately in the averaging approach.
Figure 2. The plots of the average $E(G)$ versus runtime on the synthetic data. (a) Noise-free. (b) SNR = 10 dB.
Figure 3. The plots of the average $F(G)$ versus runtime on (a) TDT2 and (b) Reuters-21578.
Figure 4. The plot of the average $F(G)$ versus runtime on the COIL-20 dataset.
Table 1. The semantic categories from the TDT2 and the Reuters-21578 collections.

TDT2           Category             10   11   12   13   14   15   16   17   18   19
               Number of documents  167  160  145  141  140  131  123  123  120  104
Reuters-21578  Category             12   13   14   15   16   17   18   19   20   21
               Number of documents  63   60   53   45   45   44   42   38   38   37
Table 2. The mean of CA and the mean of NMI of the SNMF algorithms on the TDT2 and the Reuters-21578 collections.

                      MU-SNMF [6]  α-SNMF [6]  β-SNMF [6]  Newton-like SNMF [10]  vBSUM [8]  sBSUM [8]  BCD [7]  AMU-SNMF
TDT2            CA    0.9266       0.9035      0.9187      0.9348                 0.9299     0.9312     0.9133   0.9631
                NMI   0.9363       0.9197      0.9292      0.9477                 0.9450     0.9464     0.9354   0.9625
Reuters-21578   CA    0.6711       0.6803      0.6786      0.6778                 0.6671     0.6630     0.6749   0.6877
                NMI   0.6329       0.6380      0.6383      0.6408                 0.6365     0.6345     0.6389   0.6449
Table 3. The mean of CA and the mean of NMI of the SNMF algorithms on the COIL-20 dataset.

                      MU-SNMF [6]  α-SNMF [6]  β-SNMF [6]  Newton-like SNMF [10]  vBSUM [8]  sBSUM [8]  BCD [7]  AMU-SNMF
COIL-20         CA    0.6641       0.6527      0.6890      0.7247                 0.7086     0.6657     0.6362   0.7332
                NMI   0.7826       0.7774      0.8160      0.8492                 0.8454     0.8166     0.7882   0.8554
Table 4. The mean of Fit(G) of the averaging approach using different SNMF algorithms.

                      MU-SNMF [6]  α-SNMF [6]  β-SNMF [6]  Newton-like SNMF [10]  vBSUM [8]  sBSUM [8]  BCD [7]  AMU-SNMF
Noise-free            0.9464       0.9471      0.9471      0.9430                 0.9472     0.9460     0.9460   0.9483
SNR = 10 dB           0.7230       0.7234      0.7234      0.7228                 0.7234     0.7230     0.7230   0.7243
