Article

Multi-View Graph Clustering by Adaptive Manifold Learning

Peng Zhao, Hongjie Wu and Shudong Huang
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 College of Computer Science, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(11), 1821; https://doi.org/10.3390/math10111821
Submission received: 28 April 2022 / Revised: 17 May 2022 / Accepted: 20 May 2022 / Published: 25 May 2022
(This article belongs to the Special Issue Trustworthy Graph Neural Networks: Models and Applications)

Abstract

Graph-oriented methods have been widely adopted in multi-view clustering because of their efficiency in learning heterogeneous relationships and complex structures hidden in data. However, existing methods are typically investigated based on a Euclidean structure instead of the more suitable manifold topological structure that is expected to carry out intrinsic similarity learning. In this paper, we explore the implied adaptive manifold for multi-view graph clustering. Specifically, our model seamlessly integrates multiple adaptive graphs into a consensus graph with the manifold topological structure considered. We further manipulate the consensus graph with a useful rank constraint so that its connected components precisely correspond to distinct clusters. As a result, our model is able to directly achieve a discrete clustering result without any post-processing. Our method achieves the best clustering performance in 22 out of 24 cases, measured by four evaluation metrics on six datasets, which demonstrates the effectiveness of the proposed model. In terms of computational performance, our optimization algorithm is generally faster than, or in line with, other state-of-the-art algorithms, which validates its efficiency.

1. Introduction

In real scenarios, since data typically originates from various sources or consists of different features, a large amount of multi-view data emerges. For example, an object can be described by audio, images, video, and text; news can also be written in different languages [1]. Since the majority of the data is unlabeled, multi-view clustering, which combines the latent information from multiple views and separates the data into different categories [2,3], has become a significant application [4,5,6].
Graph-oriented learning is an efficient approach for modeling heterogeneous relationships and complex structures hidden in data and has therefore been widely adopted in multi-view clustering [7,8,9]. Among these approaches, multi-view clustering based on the adaptive neighbor technique [10], which conducts local manifold structure learning and clustering simultaneously, has been widely utilized with superior performance. The method in [11] learns an optimized graph for each view by assigning adaptive neighbors and then integrates these graphs into a global graph in a well-designed way. To allocate each data sample to the most appropriate cluster and guarantee consistency across views, the method in [12] proposes to make all views share the same similarity matrix. The methods in [13,14] learn a similarity graph for each view and then adopt an automated weighting strategy to efficiently combine the different views into a unified one. Since all samples in separate views share the same cluster structure, the method in [15] exploits the shared information derived from the links between the different views to obtain a better consensus clustering result. Instead of constructing a similarity graph in the original feature space, the method in [16] learns the critical graph in a spectral embedding space to eliminate the disturbance of noise and redundant information. In addition, the adaptive neighbor technique has been effectively applied to the field of incomplete multi-view clustering [17], and other learning paradigms have also been designed for the clustering task [18,19].
Although these methods have demonstrated excellent performance, they typically work in Euclidean space and ignore the manifold topological structure, which is crucial for clustering data lying on a manifold. In this paper, we explicitly explore the manifold topological structure across multiple adaptive graphs by learning a consensus graph. We further manipulate the consensus graph with a useful rank constraint so that its connected components precisely correspond to distinct clusters. As a result, our model is able to directly achieve the discrete clustering result without any post-processing. Our model seamlessly accomplishes three subtasks: it constructs an adaptive graph for each view, integrates the multiple adaptive graphs into a consensus graph with the manifold topological structure considered, and allocates the discrete cluster label for each sample. By leveraging the interactions between these three subtasks in a unified framework, each subtask is iteratively boosted in a mutually reinforcing manner. An iterative updating algorithm is introduced to solve the optimization problem. Experiments on several real-world datasets demonstrate the effectiveness of the proposed model, compared to state-of-the-art competitors, in terms of four widely used clustering evaluation metrics.
The main contributions of this work are summarized as follows:
  • The proposed multi-view graph clustering method, for the first time to the best of our knowledge, explores the topological manifold structure from multiple adaptive graphs such that the topological relevance across multiple views can be explicitly detected.
  • Essentially as an end-to-end single-stage learning paradigm, our model seamlessly achieves three subtasks: It constructs the adaptive graphs for each view, explores the topological manifold structure across multiple graphs, and allocates the discrete cluster label for each sample.
  • An iterative updating algorithm is carefully designed to solve the optimization problem. Experiments on several benchmark datasets demonstrate the effectiveness of the proposed model.
The remainder of this paper is organized as follows: In Section 2, we introduce the preliminary work. Section 3 describes the proposed model in detail. Section 4 introduces the solution and optimization algorithm of the model. Section 5 verifies the validity of the proposed model through experiments. We conclude the paper in Section 6.
Notations. Throughout this paper, a boldface uppercase letter, e.g., $\mathbf{A}$, denotes a matrix; $\mathbf{a}_i$ and $a_{ij}$ represent the i-th column and the ij-th element of $\mathbf{A}$, respectively. $\|\cdot\|_F$ denotes the Frobenius norm, $\mathbf{1}$ is a column vector with all elements equal to 1, and $\mathbf{I}$ represents the identity matrix of proper size.

2. Preliminary Work

Previous works have proven that real-world data are usually sampled from a nonlinear low-dimensional manifold that is embedded in high-dimensional ambient space [20,21,22]. Thus, it is beneficial to reveal the manifold structure implied within the data to boost the corresponding learning performance.
To accurately measure the intrinsic similarity relationships of crowds, the authors in [23] explored the topological relationship between individuals by using a propagation-based manifold learning method. It aims to uncover the topological relevance such that the manifold topological structure can be explicitly considered. There is a simple and intuitive assumption for this consideration: the topological connectivities between distinct individuals could be propagated from near to far. In other words, the spatial similarity between two individuals may be low, but their topological relevance to each other would be high if they are linked by consecutive neighbors. Instead of only making use of the Euclidean structure, it is expected that a more suitable manifold topological structure will be adopted to carry out intrinsic similarity learning. Figure 1 visualizes the manifold topological structure learning procedure. As we can see, though the green and red samples have low similarity in terms of spatial location and velocity, they are closely connected to each other considering the high topological relevance between them. That is to say, if two samples maintain a high consistency, their topological relevance to any other sample should be similar.
Suppose a predefined similarity graph $\mathbf{Z}=[z_{ij}]\in\mathbb{R}^{n\times n}$ depicts the intrinsic similarity relationships of data $\mathbf{X}\in\mathbb{R}^{d\times n}$ with n samples and d features. Based on the assumption that data samples with large similarity share similar topological relevance to any other sample, the authors in [23] extract the topological relationship of data samples by solving the following problem:
$$\min_{\mathbf{S}}\ \frac{1}{2}\sum_{i,j,k=1}^{n}z_{ij}\left(s_{ki}-s_{kj}\right)^{2}+\beta\left\|\mathbf{S}-\mathbf{I}\right\|_{F}^{2}, \tag{1}$$
where β is a balance parameter, $\mathbf{S}$ is the target topological relationship matrix, and $s_{ij}$ denotes data sample j's topological relevance to sample i. The first term in Equation (1) is a smoothness constraint guaranteeing that samples i and j share a similar topological relevance to any other sample k whenever i and j are similar. The second term is a fitting constraint that avoids the trivial solution. Based on Equation (1), the topological consistency is propagated through neighbors with high similarities, and distant data samples will maintain a close relationship if they are linked by consecutive neighbors. Finally, the topological relationship matrix $\mathbf{S}$ can be found by solving the problem defined in Equation (1).
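For intuition, note that without the simplex and rank constraints introduced in Section 3, Equation (1) is an unconstrained quadratic in $\mathbf{S}$: the smoothness term equals $\operatorname{Tr}(\mathbf{S}\mathbf{L}_Z\mathbf{S}^T)$ with $\mathbf{L}_Z$ the Laplacian of $\mathbf{Z}$, which yields the closed-form minimizer $\mathbf{S}=\beta(\mathbf{L}_Z+\beta\mathbf{I})^{-1}$. Below is a minimal sketch under that simplification; all helper names are our own, not from the paper.

```python
# Sketch of the unconstrained propagation step behind Equation (1):
# S = beta * (L_Z + beta I)^{-1}. Helper names are ours.
import numpy as np

def build_laplacian(Z):
    """Unnormalized graph Laplacian L = D - (Z + Z^T) / 2."""
    W = (Z + Z.T) / 2.0            # symmetrize the similarity graph
    D = np.diag(W.sum(axis=1))     # degree matrix
    return D - W

def topological_relevance(Z, beta=1.0):
    """Closed-form minimizer of 1/2 sum_ijk z_ij (s_ki - s_kj)^2 + beta ||S - I||_F^2."""
    n = Z.shape[0]
    L = build_laplacian(Z)
    return beta * np.linalg.inv(L + beta * np.eye(n))

# toy usage: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
dist = ((X[:, None] - X[None]) ** 2).sum(-1)
Z = np.exp(-dist)                  # a simple predefined similarity graph
S = topological_relevance(Z, beta=0.5)
print(np.round(S[0], 2))           # relevance of sample 0 concentrates in its own blob
```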
Note that the similarity graph $\mathbf{Z}$ involved in Equation (1) is fixed in advance and thus might not be optimal for subsequent learning. More often, it is expected that the similarity graph will be learned automatically from the original data. To this end, the method in [10] learns a similarity graph for clustering tasks by assigning adaptive and optimal neighbors to each data sample based on local connectivity, following the natural assumption that data samples with a smaller distance should have a larger probability of being neighbors. Instead of using a predefined similarity graph, an adaptive graph can thus be learned automatically by solving the following problem:
$$\min_{\mathbf{Z}}\ \sum_{i,j=1}^{n}\left(\left\|\mathbf{x}_{i}-\mathbf{x}_{j}\right\|_{2}^{2}z_{ij}+\alpha z_{ij}^{2}\right)\quad\text{s.t.}\ z_{ij}\geq 0,\ \mathbf{z}_{i}^{T}\mathbf{1}=1, \tag{2}$$
where α is a trade-off parameter and can be determined according to the number of adaptive neighbors [10].
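Concretely, the per-row solution of Equation (2) with k adaptive neighbors has the closed form given in [10]: $z_{ij}=\big(d_{i,k+1}-d_{ij}\big)_{+}\big/\big(k\,d_{i,k+1}-\sum_{h=1}^{k}d_{ih}\big)$, where $d_{i1}\leq\cdots\leq d_{in}$ are sample i's sorted squared distances, and this choice also fixes α implicitly. A minimal sketch (function names are ours):

```python
# Sketch of the adaptive-neighbor graph of Equation (2), following the
# closed-form k-neighbor solution of Nie et al. [10]. Names are ours.
import numpy as np

def adaptive_neighbor_graph(X, k=5):
    """X: (n, d) data matrix; returns a row-stochastic adaptive graph Z."""
    n = X.shape[0]
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    Z = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[: k + 1]         # k nearest plus the (k+1)-th
        dk1, dk = d2[i, idx[k]], d2[i, idx[:k]]
        Z[i, idx[:k]] = (dk1 - dk) / (k * dk1 - dk.sum() + 1e-12)
    return Z

Z = adaptive_neighbor_graph(np.random.default_rng(1).normal(size=(20, 3)), k=5)
print(Z.sum(axis=1))   # each row sums to one, as required by z_i^T 1 = 1
```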
Let $\{\mathbf{X}^{(1)},\ldots,\mathbf{X}^{(m)}\,|\,\mathbf{X}^{(v)}\in\mathbb{R}^{n\times d_{v}}\}$ be a multi-view dataset with m views and $d_v$ features in the v-th view. Equation (2) is easily extended to a multi-view formulation as
$$\min_{\mathbf{Z}^{(v)}}\ \sum_{v=1}^{m}\sum_{i,j=1}^{n}\left(\left\|\mathbf{x}_{i}^{(v)}-\mathbf{x}_{j}^{(v)}\right\|_{2}^{2}z_{ij}^{(v)}+\alpha\big(z_{ij}^{(v)}\big)^{2}\right)\quad\text{s.t.}\ z_{ij}^{(v)}\geq 0,\ \mathbf{z}_{i}^{(v)T}\mathbf{1}=1, \tag{3}$$
where Z ( v ) denotes the adaptive graph learned from the v-th view.

3. The Proposed Model

Note that the formulation in Equation (1) has two drawbacks. First, if a data sample is connected to many similar neighbors, it will disproportionately affect the objective value; that is, a sample with too many connections would dominate the objective function. Hence, a normalized version of Equation (1) is required. The normalization serves two main purposes: (1) it ensures that each sample is treated equally; (2) it is equivalent to a sparsity constraint, i.e., an $\ell_1$-norm, on the similarity graph. Second, the learned $\mathbf{S}$ does not contain explicit cluster structures, so a subsequent post-processing step would be needed to output the final discrete clustering results. It would be preferable to learn the similarity graph and the cluster structure simultaneously. At first glance, it seems unrealistic to obtain such a purely structured graph; fortunately, we can achieve it with a useful rank constraint. Taking the above concerns into consideration, Equation (1) can be upgraded to
$$\min_{\mathbf{S}}\ \frac{1}{2}\sum_{i,j,k=1}^{n}z_{ij}\left(\frac{s_{ki}}{\sqrt{d_{ii}}}-\frac{s_{kj}}{\sqrt{d_{jj}}}\right)^{2}+\beta\left\|\mathbf{S}-\mathbf{I}\right\|_{F}^{2}\quad\text{s.t.}\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ s_{ij}\geq 0,\ \operatorname{rank}(\mathbf{L}_{S})=n-c, \tag{4}$$
where $\mathbf{D}=[d_{ij}]\in\mathbb{R}^{n\times n}$ is the degree matrix of $\mathbf{Z}$, $\mathbf{L}_S$ is the Laplacian matrix of $\mathbf{S}$, and $\operatorname{rank}(\mathbf{L}_S)=n-c$ is a rank constraint that guarantees that $\mathbf{S}$ contains exactly c connected components (c is the cluster number of the data). The rank constraint has been successfully used to achieve a clear cluster structure [10,24]. Note that, unlike Equation (1), here we constrain the sum of each row of $\mathbf{S}$ to be one and all elements of $\mathbf{S}$ to be nonnegative. Finally, a structured target graph $\mathbf{S}$ that reveals the topological relevance can be acquired by solving Equation (4).
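In practice, the rank constraint can be monitored through the spectrum of $\mathbf{L}_S$: $\mathbf{S}$ has exactly c connected components if and only if the c smallest eigenvalues of its Laplacian are (numerically) zero while the (c+1)-th is positive. A small sketch of such a check (our own naming):

```python
# Sketch: check rank(L_S) = n - c via the c smallest Laplacian eigenvalues.
import numpy as np

def has_c_components(S, c, tol=1e-8):
    """True iff the graph S has exactly c connected components."""
    W = (S + S.T) / 2.0
    L = np.diag(W.sum(axis=1)) - W          # Laplacian of the consensus graph
    ev = np.sort(np.linalg.eigvalsh(L))     # ascending eigenvalues
    return ev[c - 1] < tol and ev[c] > tol  # c zeros, then a spectral gap
```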
In this paper, we extract the topological relevance from multiple adaptive graphs such that the manifold topological structure can be explicitly detected for the clustering task. Combining Equation (4) and Equation (3), our new multi-view graph clustering model can be formulated as
$$\begin{aligned}
\min_{\mathbf{Z}^{(v)},\,\mathbf{S}}\ &\sum_{v=1}^{m}\sum_{i,j=1}^{n}\left(\left\|\mathbf{x}_{i}^{(v)}-\mathbf{x}_{j}^{(v)}\right\|_{2}^{2}z_{ij}^{(v)}+\alpha\big(z_{ij}^{(v)}\big)^{2}\right)+\frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}\sum_{i,j,k=1}^{n}z_{ij}^{(v)}\left(\frac{s_{ki}}{\sqrt{d_{ii}^{(v)}}}-\frac{s_{kj}}{\sqrt{d_{jj}^{(v)}}}\right)^{2}+\beta\left\|\mathbf{S}-\mathbf{I}\right\|_{F}^{2}\\
&\text{s.t.}\ \mathbf{z}_{i}^{(v)T}\mathbf{1}=1,\ z_{ij}^{(v)}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ s_{ij}\geq 0,\ \operatorname{rank}(\mathbf{L}_{S})=n-c,
\end{aligned} \tag{5}$$
where μ ( v ) denotes the weight of the v-th view and can be determined by an inverse distance weighting strategy, i.e.,
$$\mu^{(v)}=\frac{1}{2\sqrt{\displaystyle\sum_{i,j,k=1}^{n}z_{ij}^{(v)}\left(\frac{s_{ki}}{\sqrt{d_{ii}^{(v)}}}-\frac{s_{kj}}{\sqrt{d_{jj}^{(v)}}}\right)^{2}}}. \tag{6}$$
Note that Equation (6) is essentially a kind of auto-weighted strategy, which has been widely utilized in previous works [7,25,26]. To further verify the effect of an auto-weighting strategy, we will conduct an ablation study in a later section.
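To make the weighting concrete, here is a small sketch (our own naming) of how Equation (6) can be evaluated for one view given the current consensus graph $\mathbf{S}$; views whose graphs disagree with $\mathbf{S}$ receive a larger smoothness residual and hence a smaller weight:

```python
# Sketch of the inverse distance weighting of Equation (6). Names are ours.
import numpy as np

def view_weight(Z_v, S):
    """mu^(v) = 1 / (2 sqrt(sum_ijk z_ij (s_ki/sqrt(d_ii) - s_kj/sqrt(d_jj))^2))."""
    d = Z_v.sum(axis=1) + 1e-12                # degrees of the v-th graph
    Sn = S / np.sqrt(d)                        # column i scaled by 1/sqrt(d_ii)
    G = Sn.T @ Sn                              # Gram matrix of scaled columns
    sq = np.diag(G)
    # residual = sum_ij z_ij * ||Sn[:, i] - Sn[:, j]||^2
    residual = (Z_v * (sq[:, None] + sq[None, :] - 2 * G)).sum()
    return 1.0 / (2.0 * np.sqrt(residual) + 1e-12)
```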
Note that, if we denote $\sigma_i(\mathbf{L}_S)$ as the i-th smallest eigenvalue of $\mathbf{L}_S$, the rank constraint $\operatorname{rank}(\mathbf{L}_S)=n-c$ is satisfied if $\sum_{i=1}^{c}\sigma_i(\mathbf{L}_S)=0$, since $\mathbf{L}_S$ is a positive semidefinite matrix. According to Ky Fan's theorem [27], $\sum_{i=1}^{c}\sigma_i(\mathbf{L}_S)=\min_{\mathbf{F}\in\mathbb{R}^{n\times c},\,\mathbf{F}^{T}\mathbf{F}=\mathbf{I}}\operatorname{Tr}(\mathbf{F}^{T}\mathbf{L}_S\mathbf{F})$, so we can incorporate the rank constraint into the objective function, and finally we arrive at
$$\begin{aligned}
\min_{\mathbf{Z}^{(v)},\,\mathbf{S},\,\mathbf{F}}\ &\sum_{v=1}^{m}\sum_{i,j=1}^{n}\left(\left\|\mathbf{x}_{i}^{(v)}-\mathbf{x}_{j}^{(v)}\right\|_{2}^{2}z_{ij}^{(v)}+\alpha\big(z_{ij}^{(v)}\big)^{2}\right)+\frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}\sum_{i,j,k=1}^{n}z_{ij}^{(v)}\left(\frac{s_{ki}}{\sqrt{d_{ii}^{(v)}}}-\frac{s_{kj}}{\sqrt{d_{jj}^{(v)}}}\right)^{2}\\
&\quad+\beta\left\|\mathbf{S}-\mathbf{I}\right\|_{F}^{2}+2\gamma\operatorname{Tr}\left(\mathbf{F}^{T}\mathbf{L}_{S}\mathbf{F}\right)\\
&\text{s.t.}\ \mathbf{z}_{i}^{(v)T}\mathbf{1}=1,\ z_{ij}^{(v)}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ s_{ij}\geq 0,\ \mathbf{F}\in\mathbb{R}^{n\times c},\ \mathbf{F}^{T}\mathbf{F}=\mathbf{I},
\end{aligned} \tag{7}$$
where γ is a self-adjusted parameter, and $\mathbf{F}$ denotes the cluster indicator matrix. When γ is large enough, the optimal solution $\mathbf{S}$ of Equation (7) will enforce the last term $\operatorname{Tr}(\mathbf{F}^{T}\mathbf{L}_S\mathbf{F})$, i.e., $\sum_{i=1}^{c}\sigma_i(\mathbf{L}_S)$, to be zero, so the constraint $\operatorname{rank}(\mathbf{L}_S)=n-c$ in Equation (7) is satisfied. Moreover, according to [26], γ can be tuned heuristically: initialize it to a positive value (e.g., γ = 10) and then, in each iteration, automatically halve it (γ ← γ/2) when the number of connected components of $\mathbf{S}$ is greater than the cluster number c, or double it (γ ← 2γ) when the number is smaller. In this way, the target graph $\mathbf{S}$ is modified until it contains precisely c connected components.
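As a sketch of this heuristic (function names are ours), the component count of $\mathbf{S}$ can be obtained from its binarized symmetric support:

```python
# Sketch of the heuristic gamma schedule: halve gamma when S has more than c
# connected components, double it when fewer. Names are ours.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def adjust_gamma(S, c, gamma, tol=1e-8):
    """Return the adjusted gamma and the current component count of S."""
    A = csr_matrix((np.abs(S) + np.abs(S.T)) / 2 > tol)   # binarized support
    n_comp, _ = connected_components(A, directed=False)
    if n_comp > c:
        gamma /= 2.0
    elif n_comp < c:
        gamma *= 2.0
    return gamma, n_comp
```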

4. Optimization

Since the optimization problem in Equation (7) is not jointly convex in all variables, we optimize it with respect to one variable while fixing other variables.

4.1. Update Z ( v )

The optimization problem for Z ( v ) can be stated as
$$\min_{\mathbf{Z}^{(v)}}\ \sum_{v=1}^{m}\sum_{i,j=1}^{n}\left(\left\|\mathbf{x}_{i}^{(v)}-\mathbf{x}_{j}^{(v)}\right\|_{2}^{2}z_{ij}^{(v)}+\alpha\big(z_{ij}^{(v)}\big)^{2}\right)+\frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}\sum_{i,j,k=1}^{n}z_{ij}^{(v)}\left(\frac{s_{ki}}{\sqrt{d_{ii}^{(v)}}}-\frac{s_{kj}}{\sqrt{d_{jj}^{(v)}}}\right)^{2}\quad\text{s.t.}\ \mathbf{z}_{i}^{(v)T}\mathbf{1}=1,\ z_{ij}^{(v)}\geq 0. \tag{8}$$
Since Equation (8) is independent for different v, for a specific v, we need to solve
$$\min_{\mathbf{Z}^{(v)}}\ \sum_{i,j=1}^{n}\left(h_{ij}^{x}z_{ij}^{(v)}+\alpha\big(z_{ij}^{(v)}\big)^{2}+\frac{1}{2}\mu^{(v)}h_{ij}^{s}z_{ij}^{(v)}\right)\quad\text{s.t.}\ \mathbf{z}_{i}^{(v)T}\mathbf{1}=1,\ z_{ij}^{(v)}\geq 0, \tag{9}$$

where $h_{ij}^{x}=\left\|\mathbf{x}_{i}^{(v)}-\mathbf{x}_{j}^{(v)}\right\|_{2}^{2}$ and $h_{ij}^{s}=\left\|\frac{\mathbf{s}_{i}}{\sqrt{d_{ii}^{(v)}}}-\frac{\mathbf{s}_{j}}{\sqrt{d_{jj}^{(v)}}}\right\|_{2}^{2}$. Note that, for each i, Equation (9) can be transformed into a compact vector form as
$$\min_{\mathbf{z}_{i}^{(v)T}\mathbf{1}=1,\ z_{ij}^{(v)}\geq 0}\ \left\|\mathbf{z}_{i}^{(v)}+\frac{1}{2\alpha}\mathbf{h}_{i}\right\|_{2}^{2}, \tag{10}$$
where h i is a vector with its j-th element being
$$h_{ij}=h_{ij}^{x}+\frac{1}{2}\mu^{(v)}h_{ij}^{s}. \tag{11}$$
Note that Equation (10) has a closed-form solution according to [28].
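For reference, a minimal sketch of that closed-form step: Equation (10) is a Euclidean projection of $-\mathbf{h}_i/(2\alpha)$ onto the probability simplex, which the standard sort-based projection of [28] solves exactly (function names are ours):

```python
# Sketch of the closed-form step for Equation (10): a Euclidean projection
# of v = -h_i / (2 alpha) onto the probability simplex. Names are ours.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, z^T 1 = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)       # shift that restores the sum-to-one
    return np.maximum(v + theta, 0.0)

def update_z_row(h_i, alpha):
    """Solve min ||z + h_i / (2 alpha)||^2 s.t. z in the simplex."""
    return project_simplex(-h_i / (2.0 * alpha))
```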

4.2. Update S

The optimization problem with respect to S can be denoted as
$$\min_{\mathbf{S}}\ \frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}\sum_{i,j,k=1}^{n}z_{jk}^{(v)}\left(\frac{s_{ij}}{\sqrt{d_{jj}^{(v)}}}-\frac{s_{ik}}{\sqrt{d_{kk}^{(v)}}}\right)^{2}+\beta\left\|\mathbf{S}-\mathbf{I}\right\|_{F}^{2}+2\gamma\operatorname{Tr}\left(\mathbf{F}^{T}\mathbf{L}_{S}\mathbf{F}\right)\quad\text{s.t.}\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ s_{ij}\geq 0. \tag{12}$$
It is obvious that Equation (12) can be rewritten as
$$\min_{\mathbf{S}}\ \sum_{i=1}^{n}\left(\frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}\sum_{j,k=1}^{n}z_{jk}^{(v)}\left(\frac{s_{ij}}{\sqrt{d_{jj}^{(v)}}}-\frac{s_{ik}}{\sqrt{d_{kk}^{(v)}}}\right)^{2}+\beta\sum_{j=1}^{n}\left(s_{ij}-e_{ij}\right)^{2}+\gamma\sum_{j=1}^{n}\left\|\mathbf{f}_{i}-\mathbf{f}_{j}\right\|_{2}^{2}s_{ij}\right)\quad\text{s.t.}\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ s_{ij}\geq 0, \tag{13}$$

where $e_{ij}$ is the ij-th element of the identity matrix $\mathbf{I}$.
Note that Equation (13) is independent for different i, so we have
$$\min_{s_{ij}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1}\ \mathbf{s}_{i}^{T}\left(\sum_{v=1}^{m}\mu^{(v)}\Big(\mathbf{I}-\mathbf{D}^{(v)-\frac{1}{2}}\mathbf{Z}^{(v)}\mathbf{D}^{(v)-\frac{1}{2}}\Big)\right)\mathbf{s}_{i}+\beta\left\|\mathbf{s}_{i}-\mathbf{e}_{i}\right\|_{2}^{2}+\mathbf{s}_{i}^{T}\mathbf{u}_{i}, \tag{14}$$

where $\mathbf{u}_i$ is a vector with its j-th element $u_{ij}=\gamma\left\|\mathbf{f}_{i}-\mathbf{f}_{j}\right\|_{2}^{2}$.
Denote $\mathbf{A}=\sum_{v=1}^{m}\mu^{(v)}\Big(\mathbf{I}-\mathbf{D}^{(v)-\frac{1}{2}}\mathbf{Z}^{(v)}\mathbf{D}^{(v)-\frac{1}{2}}\Big)+\beta\mathbf{I}$ and $\mathbf{b}=2\beta\mathbf{e}_{i}-\mathbf{u}_{i}$. Equation (14) can then be stated as

$$\min_{s_{ij}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1}\ \mathbf{s}_{i}^{T}\mathbf{A}\mathbf{s}_{i}-\mathbf{s}_{i}^{T}\mathbf{b}. \tag{15}$$
Equation (15) turns out to be a quadratic convex optimization problem, which can be tackled by the augmented Lagrangian multiplier (ALM) strategy [29]. According to the ALM, the counterpart of Equation (15) is defined as
$$\min_{s_{ij}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ \mathbf{p}=\mathbf{s}_{i}}\ \mathbf{s}_{i}^{T}\mathbf{A}\mathbf{p}-\mathbf{s}_{i}^{T}\mathbf{b}, \tag{16}$$

where $\mathbf{p}$ is an auxiliary variable constrained to equal $\mathbf{s}_i$. The augmented Lagrangian function of Equation (16) can then be represented as
$$\min_{s_{ij}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1,\ \mathbf{p}}\ \mathbf{s}_{i}^{T}\mathbf{A}\mathbf{p}-\mathbf{s}_{i}^{T}\mathbf{b}+\frac{\eta}{2}\left\|\mathbf{s}_{i}-\mathbf{p}+\frac{1}{\eta}\mathbf{q}\right\|_{2}^{2}, \tag{17}$$

where η is the penalty coefficient and $\mathbf{q}$ is the corresponding Lagrangian multiplier.
To solve Equation (17), we can update p and s i iteratively:
(1) Update p with a fixed s i . The Lagrange function of Equation (17) with respect to p can be formulated as
$$\mathcal{L}(\mathbf{p})=\mathbf{s}_{i}^{T}\mathbf{A}\mathbf{p}+\frac{\eta}{2}\left\|\mathbf{s}_{i}-\mathbf{p}+\frac{1}{\eta}\mathbf{q}\right\|_{2}^{2}. \tag{18}$$
Taking the derivative of $\mathcal{L}(\mathbf{p})$ with respect to $\mathbf{p}$ and setting it to zero, i.e.,

$$\frac{\partial\mathcal{L}(\mathbf{p})}{\partial\mathbf{p}}=\mathbf{0}, \tag{19}$$

we obtain the following solution:

$$\mathbf{p}=\mathbf{s}_{i}+\frac{1}{\eta}\left(\mathbf{q}-\mathbf{A}^{T}\mathbf{s}_{i}\right). \tag{20}$$
(2) Update $\mathbf{s}_i$ with a fixed $\mathbf{p}$. The subproblem of Equation (17) with respect to $\mathbf{s}_i$ is

$$\min_{s_{ij}\geq 0,\ \mathbf{s}_{i}^{T}\mathbf{1}=1}\ \left\|\mathbf{s}_{i}-\mathbf{p}+\frac{1}{\eta}\mathbf{q}+\frac{\mathbf{A}\mathbf{p}-\mathbf{b}}{\eta}\right\|_{2}^{2}, \tag{21}$$

which has a similar form to Equation (10) and thus can be solved effectively. Note that η is increased by a factor of ρ in each iteration, and $\mathbf{q}$ is updated by $\mathbf{q}\leftarrow\mathbf{q}+\eta(\mathbf{s}_{i}-\mathbf{p})$. The optimization routine for Equation (17) is outlined in Algorithm 1.
Algorithm 1: Algorithm to solve Equation (17).
Require: a nonzero matrix A and a nonzero vector b.
  Set 1 < ρ < 2; initialize η > 0 and q.
Ensure: S (assembled row-wise from the solved s_i).
 1: repeat
 2:  Update p according to Equation (20).
 3:  Update s_i according to Equation (21).
 4:  Update η ← ρη.
 5:  Update q ← q + η(s_i − p).
 6: until convergence
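A compact sketch of Algorithm 1 in code (our own naming; project_simplex is repeated from the earlier snippet so that the block is self-contained):

```python
# Sketch of Algorithm 1: ALM for min_s s^T A s - s^T b over the simplex.
import numpy as np

def project_simplex(v):
    """Sort-based Euclidean projection onto {z : z >= 0, z^T 1 = 1} [28]."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, v.size + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

def solve_row_alm(A, b, n_iter=50, eta=1.0, rho=1.5):
    """Approximately solve one row subproblem, Equation (15)."""
    n = A.shape[0]
    s = np.full(n, 1.0 / n)        # feasible start on the simplex
    p = s.copy()                   # auxiliary splitting variable
    q = np.zeros(n)                # Lagrange multiplier
    for _ in range(n_iter):
        p = s + (q - A.T @ s) / eta                            # Eq. (20)
        s = project_simplex(p - q / eta - (A @ p - b) / eta)   # Eq. (21)
        q = q + eta * (s - p)                                  # multiplier step
        eta *= rho                                             # inflate penalty
    return s
```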

4.3. Update F

The optimal solution of F can be searched by solving
$$\min_{\mathbf{F}\in\mathbb{R}^{n\times c},\ \mathbf{F}^{T}\mathbf{F}=\mathbf{I}}\ \operatorname{Tr}\left(\mathbf{F}^{T}\mathbf{L}_{S}\mathbf{F}\right), \tag{22}$$
which is a classical problem in spectral theory, and the solution is formed by the c eigenvectors of L S corresponding to the c smallest eigenvalues.
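A minimal sketch of this step (our own naming), computing the c bottom eigenvectors of the consensus Laplacian:

```python
# Sketch of the F-update, Equation (22): stack the c eigenvectors of L_S
# corresponding to its c smallest eigenvalues. Names are ours.
import numpy as np

def update_F(S, c):
    W = (S + S.T) / 2.0
    L = np.diag(W.sum(axis=1)) - W          # Laplacian of the consensus graph
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    return eigvecs[:, :c]                   # F in R^{n x c}, F^T F = I
```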
The overall algorithm for solving the proposed objective function in Equation (7) is summarized in Algorithm 2. We provide the time complexity and convergence analysis of Algorithm 2 in the following subsections.
Algorithm 2: The algorithm for our model.
Require: multi-view data $\{\mathbf{X}^{(1)},\mathbf{X}^{(2)},\ldots,\mathbf{X}^{(m)}\}$ with m views, cluster number c, parameters α and β.
  Initialize the weight of each view $\mu^{(v)}=\frac{1}{m}$.
  Initialize $\mathbf{Z}^{(v)}$ for each view by Equation (2).
  Initialize the consensus graph $\mathbf{S}=\sum_{v=1}^{m}\mu^{(v)}\mathbf{Z}^{(v)}$.
Ensure: the target graph $\mathbf{S}\in\mathbb{R}^{n\times n}$ with exactly c connected components and the cluster indicator matrix $\mathbf{F}$.
 1: repeat
 2:  Update $\mathbf{Z}^{(v)}$ according to Equation (10).
 3:  Update $\mathbf{S}$ by Algorithm 1.
 4:  Update $\mathbf{F}$ according to Equation (22).
 5:  Update $\mu^{(v)}$ according to Equation (6).
 6: until convergence
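A schematic sketch of the outer loop of Algorithm 2, wiring together the helpers sketched in the previous snippets (adaptive_neighbor_graph, view_weight, solve_row_alm, update_F, adjust_gamma, all our own names); it shows the data flow rather than a tuned reference implementation, and the per-iteration refresh of $\mathbf{Z}^{(v)}$ by Equation (10) is elided:

```python
# Schematic sketch of Algorithm 2; assumes the earlier helpers are in scope.
import numpy as np

def update_S(Zs, mus, F, beta, gamma):
    """Row-wise consensus-graph update, assembling A and b of Equation (15)."""
    n = F.shape[0]
    M = beta * np.eye(n)
    for mu, Z in zip(mus, Zs):
        d = Z.sum(axis=1) + 1e-12
        M += mu * (np.eye(n) - Z / np.sqrt(np.outer(d, d)))  # I - D^-1/2 Z D^-1/2
    U = gamma * ((F[:, None] - F[None]) ** 2).sum(-1)        # u_ij = gamma ||f_i - f_j||^2
    S = np.zeros((n, n))
    for i in range(n):
        b = 2.0 * beta * np.eye(n)[i] - U[i]                 # b = 2 beta e_i - u_i
        S[i] = solve_row_alm(M, b)                           # Algorithm 1
    return S

def multi_view_graph_clustering(Xs, c, k=15, beta=1.0, gamma=10.0, n_iter=30):
    """Xs: list of (n, d_v) view matrices; returns the consensus graph S and F."""
    m = len(Xs)
    Zs = [adaptive_neighbor_graph(X, k) for X in Xs]         # per-view graphs, Eq. (2)
    mus = [1.0 / m] * m                                      # uniform initial weights
    S = sum(mu * Z for mu, Z in zip(mus, Zs))                # consensus initialization
    for _ in range(n_iter):
        F = update_F(S, c)                                   # Eq. (22)
        S = update_S(Zs, mus, F, beta, gamma)
        mus = [view_weight(Z, S) for Z in Zs]                # Eq. (6)
        gamma, n_comp = adjust_gamma(S, c, gamma)            # heuristic schedule
        if n_comp == c:                                      # S has c components
            break
    return S, F
```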

4.4. Time Complexity Analysis

As shown in Algorithm 2, there are four steps that mainly determine the complexity of our model. Recall that n, m, and c denote the number of data points, views, and clusters, respectively. t denotes the number of iterations. We summarize the computational complexity of each step in Table 1.
In practice, we have m ≪ n and c ≪ n, so the overall complexity is $O(n^{2}t)$, which is in line with classical graph-based methods and hence acceptable.

4.5. Convergence Analysis

In Algorithm 2, we obtain closed-form solutions with respect to $\mathbf{Z}^{(v)}$, $\mu^{(v)}$, and $\mathbf{F}$, as described above, and ALM optimization theory [30] guarantees that the inner iterations of Algorithm 1 converge. In short, updating the variables $\mathbf{Z}^{(v)}$, $\mu^{(v)}$, and $\mathbf{F}$ with the iterative optimization steps monotonically decreases the objective function in Equation (7) toward its lower bound. Hence, we only need to prove the convergence of Equation (12) with respect to $\mathbf{S}$; we show that our algorithm finds a locally optimal solution. Before proving convergence, we first introduce an important lemma [31]:
Lemma 1.
For any positive real numbers q and t, the following inequality holds:

$$\sqrt{q}-\frac{q}{2\sqrt{t}}\leq\sqrt{t}-\frac{t}{2\sqrt{t}}. \tag{23}$$
Proof. 
It is obvious that $(\sqrt{q}-\sqrt{t})^{2}\geq 0$, so we have

$$(\sqrt{q}-\sqrt{t})^{2}\geq 0\ \Rightarrow\ q-2\sqrt{qt}+t\geq 0\ \Rightarrow\ \sqrt{q}-\frac{q}{2\sqrt{t}}\leq\frac{\sqrt{t}}{2}=\sqrt{t}-\frac{t}{2\sqrt{t}},$$

which completes the proof. □
Theorem 1.
In each iteration, the updated $\mathbf{S}$ monotonically decreases the objective in Equation (12), which generally makes the solution converge to a local optimum of Equation (12).
Proof. 
Denote $h^{(v)}(\mathbf{S})=\sum_{i,j,k=1}^{n}z_{jk}^{(v)}\Big(\frac{s_{ij}}{\sqrt{d_{jj}^{(v)}}}-\frac{s_{ik}}{\sqrt{d_{kk}^{(v)}}}\Big)^{2}$ and $G(\mathbf{S})=\beta\|\mathbf{S}-\mathbf{I}\|_{F}^{2}$, and suppose the updated matrix in the current iteration is $\tilde{\mathbf{S}}$. By solving Equation (17), we obtain

$$\tilde{\mathbf{S}}=\arg\min_{\mathbf{S}}\ \frac{1}{2}\sum_{v=1}^{m}\mu^{(v)}h^{(v)}(\mathbf{S})+G(\mathbf{S}). \tag{24}$$

According to Equation (6), i.e., $\mu^{(v)}=1\big/\big(2\sqrt{h^{(v)}(\mathbf{S})}\big)$, we obtain

$$\frac{1}{2}\sum_{v=1}^{m}\frac{h^{(v)}(\tilde{\mathbf{S}})}{2\sqrt{h^{(v)}(\mathbf{S})}}+G(\tilde{\mathbf{S}})\leq\frac{1}{2}\sum_{v=1}^{m}\frac{h^{(v)}(\mathbf{S})}{2\sqrt{h^{(v)}(\mathbf{S})}}+G(\mathbf{S}). \tag{25}$$

Based on Lemma 1 (with $q=h^{(v)}(\tilde{\mathbf{S}})$ and $t=h^{(v)}(\mathbf{S})$), we have

$$\frac{1}{2}\sum_{v=1}^{m}\left(\sqrt{h^{(v)}(\tilde{\mathbf{S}})}-\frac{h^{(v)}(\tilde{\mathbf{S}})}{2\sqrt{h^{(v)}(\mathbf{S})}}\right)\leq\frac{1}{2}\sum_{v=1}^{m}\left(\sqrt{h^{(v)}(\mathbf{S})}-\frac{h^{(v)}(\mathbf{S})}{2\sqrt{h^{(v)}(\mathbf{S})}}\right). \tag{26}$$

Summing Equations (25) and (26) on both sides, we obtain

$$\frac{1}{2}\sum_{v=1}^{m}\sqrt{h^{(v)}(\tilde{\mathbf{S}})}+G(\tilde{\mathbf{S}})\leq\frac{1}{2}\sum_{v=1}^{m}\sqrt{h^{(v)}(\mathbf{S})}+G(\mathbf{S}), \tag{27}$$

which completes the proof; that is, the objective function value monotonically decreases in each iteration of updating $\mathbf{S}$. □

5. Experiments

To verify the effectiveness and efficiency of our proposed method, we compare it with the following state-of-the-art methods: self-weighted multi-view clustering (SwMC) [7], multi-view learning with adaptive neighbours (MLAN) [12], multi-view clustering via adaptively weighted procrustes (AWP) [32], weighted multi-view spectral clustering (WMSC) [33], multi-view consensus graph clustering (MCGC) [15], consistent and specific multi-view subspace clustering (CSMSC) [34], graph-based multi-view clustering (GMC) [14], consensus one-step multi-view subspace clustering (COMSC) [35], and multi-view clustering via consensus graph learning (CGL) [16].
We conduct the experiments on several prevalent datasets, namely, 3Sources, MSRC, BBCSport, COIL-20, Caltech-7, and Caltech-20. The detailed information of all datasets is as follows: 3Sources (http://mlg.ucd.ie/datasets/3sources.html (accessed on 26 April 2022)) is collected from three news sources, i.e., Reuters, BBC, and The Guardian. It contains 948 news articles covering 416 different news stories. Among them, 169 stories were reported by all three sources, and each story was annotated with one of six topical labels: business, health, politics, entertainment, sport, and technology. MSRC comprises 240 images in eight classes; we selected seven classes, each containing 30 images, and five visual features are extracted for each image as a comprehensive description. BBCSport is a sports news dataset that consists of 544 articles in 5 areas with 2 views: 3183-dimensional MTX features and 3203-dimensional TERMS features. COIL-20 is a subset of the Columbia Object Image Library (http://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php (accessed on 26 April 2022)), which includes 100 categories in total. Images of each object were taken five degrees apart as the object was rotated on a turntable, so each object has 72 gray images; each image is described by three types of features. Caltech101 is an object recognition dataset with 101 categories, represented by 6 types of features. Following [12], we selected 1474 images within 7 classes (Caltech-7) and 2386 images within 20 classes (Caltech-20).
The specific characteristics of the datasets are given in Table 2.
The parameters of the comparison algorithms were set according to the recommendations in their corresponding papers. The parameter settings of our model are introduced later. All algorithms were repeated 10 times, and the average results are presented.
To achieve a comprehensive evaluation, four widely used metrics (clustering accuracy (ACC), normalized mutual information (NMI), purity, and F-score) are adopted in this paper.
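For reproducibility, here is a hedged sketch of how two of these metrics are typically computed: ACC requires an optimal matching between predicted cluster ids and ground-truth labels (Hungarian algorithm), while NMI is available directly in scikit-learn. The helper name clustering_accuracy is ours:

```python
# Sketch of two standard clustering metrics: ACC (with Hungarian matching) and NMI.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy between predicted cluster ids and ground-truth labels."""
    labels_t, labels_p = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((len(labels_p), len(labels_t)))
    for i, lp in enumerate(labels_p):
        for j, lt in enumerate(labels_t):
            cost[i, j] = -np.sum((y_pred == lp) & (y_true == lt))  # negated overlap
    row, col = linear_sum_assignment(cost)       # optimal label permutation
    return -cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])            # same partition, permuted ids
print(clustering_accuracy(y_true, y_pred))                   # 1.0
print(normalized_mutual_info_score(y_true, y_pred))          # 1.0
```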

5.1. Clustering Results

The clustering performance of the different methods on all datasets is reported in Table 3. Note that the best performance is in bold, and the second-best performance is underlined. As we can see, our method outperforms the other methods in most cases, which verifies the effectiveness of the proposed model. Specifically, our method outperforms the most competitive competitor on the six datasets by 1.5%, 12.2%, 3.1%, 1.1%, 15.2%, and 24.5%, respectively, in terms of ACC, and by 2.8%, 17.3%, 4.9%, 1.0%, 15.0%, and 4.4%, respectively, in terms of the F-score. The improvements with respect to the other clustering metrics are also notable.
It is worth noting that MLAN, MCGC, GMC, and CGL are all multi-view clustering methods that involve an adaptive graph. From the results, we can see that the proposed method, which adopts the manifold topological structure, achieves superior performance, which further illustrates the effectiveness of adaptive manifold learning. Taking the COIL-20 and BBCSport datasets as examples, we applied t-SNE [36] to visualize the clustering results. As shown in Figure 2 and Figure 3, our model achieves a much clearer cluster structure with better separability, and the gaps between different clusters are evident, which further validates that our model can better uncover the intrinsic structure of the data. Considering that real-world data are usually sampled from a nonlinear low-dimensional manifold, our method can thus be expected to achieve a promising clustering performance in the general case.

5.2. Sensitivity Analysis

We showcase the sensitivity of the proposed model with respect to different parameter settings. As described before, γ is a self-adjusted parameter and can be tuned heuristically, so we only need to search the parameters α and β. As mentioned in Equation (2), α can be determined according to the number of adaptive neighbors [10]. In this paper, we empirically search the number of adaptive neighbors k in the range {10, 15, 20, 25, 30, 35, 40} and β in the range {0.1, 0.5, 1, 5, 10, 50, 100}. The clustering results on all datasets are plotted in Figure 4. It is obvious that the performance is relatively stable under a wide range of parameter settings, which demonstrates the robustness of our model. Generally speaking, we can expect a promising clustering performance when β varies from 0.5 to 10 and k from 15 to 30.

5.3. Ablation Study

In order to verify the importance of the auto-weight learning strategy in Equation (6), we tested a special case in which all views are of the same importance to the clustering task, i.e., $\mu^{(v)}=\frac{1}{m}$, denoted as the simple version of our model (Ours_smp). As shown in Figure 5, we report the clustering performance in terms of two evaluation metrics (ACC and NMI) on the six datasets. We can see that the experimental results of the auto-weight learning strategy (as described in Equation (6)) are superior to those obtained by the simple average weighting strategy. Moreover, the overall improvement is remarkable, which clearly showcases the effectiveness of our auto-weight learning strategy.

5.4. Computational Performance

Given the computational complexity of our algorithm theoretically analyzed above, here we empirically compare the computational speed of our method with other multi-view graph clustering approaches. The computational time of all algorithms on a machine with a 2.60 GHz Intel Xeon Gold 6240 CPU and 256 GB RAM is shown in Table 4. We see that COMSC and SwMC are the two most time-consuming algorithms; COMSC in particular is slower than the other methods by nearly two orders of magnitude. Generally speaking, our algorithm is faster than COMSC and SwMC and in line with the other methods, which validates the efficiency of the proposed algorithm.

5.5. Convergence Study

The convergence property of our algorithm was theoretically analyzed previously. Here, we empirically validate the convergence speed of the proposed algorithm. The convergence curves along with the clustering results of our algorithm are shown in Figure 6, where the orange line represents the ACC of the proposed model, and the blue line denotes the value of the objective function. We can observe that the iterative updating algorithm converges very fast, which implies the efficiency of our algorithm.

6. Conclusions

In this paper, we explore the implied adaptive manifold for multi-view graph clustering. Our model seamlessly integrates multiple adaptive graphs into a consensus graph with the manifold topological structure considered. In addition, we manipulate the consensus graph with a useful rank constraint so that its connected components precisely correspond to distinct clusters. As a result, our model is able to directly achieve the discrete clustering result without any post-processing. An alternating iterative algorithm is introduced to solve the optimization problem. Experiments on several benchmark datasets illustrate the effectiveness of the proposed model, compared to the state-of-the-art algorithms in terms of four clustering evaluation metrics. In detail, the experimental results have shown that (1) our model achieved the best results in the majority of cases, (2) the proposed model is very stable across a wide range of parameter settings, and (3) the designed optimization algorithm is very efficient and converges fast. However, our model cannot deal with nonlinear data, which can be considered in future work. Furthermore, we are also interested in extending the proposed framework to other machine learning applications, such as medical data analysis and genetic data analysis.

Author Contributions

Conceptualization, P.Z., H.W. and S.H.; investigation, H.W.; methodology, P.Z. and S.H.; software, P.Z. and H.W.; supervision, S.H.; validation, P.Z., H.W. and S.H.; writing—original draft, P.Z.; writing—review & editing, P.Z., H.W. and S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China under Grant 62106164, the Sichuan Science and Technology Program under Grant 2021ZDZX0011, the Sichuan Key Project of Research and Development Program under Grant 2022YFG0188, and the 111 Project under Grant B21044.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

  1. Tang, C.; Zhu, X.; Liu, X.; Li, M.; Wang, P.; Zhang, C.; Wang, L. Learning a Joint Affinity Graph for Multiview Subspace Clustering. IEEE Trans. Multimed. 2019, 21, 1724–1736.
  2. Zhang, C.; Fu, H.; Hu, Q.; Cao, X.; Xie, Y.; Tao, D.; Xu, D. Generalized latent multi-view subspace clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 86–99.
  3. Fan, S.; Wang, X.; Shi, C.; Lu, E.; Lin, K.; Wang, B. One2multi graph autoencoder for multi-view graph clustering. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 3070–3076.
  4. Liu, J.; Liu, X.; Yang, Y.; Guo, X.; Kloft, M.; He, L. Multiview Subspace Clustering via Co-Training Robust Data Representation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13.
  5. Jiang, R.; Yin, D.; Wang, Z.; Wang, Y.; Deng, J.; Liu, H.; Cai, Z.; Deng, J.; Song, X.; Shibasaki, R. DL-Traff: Survey and Benchmark of Deep Learning Models for Urban Traffic Prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, Australia, 1–5 November 2021; pp. 4515–4525.
  6. Wu, H.; Lv, J.; Wang, J. Automatic Cataract Detection with Multi-Task Learning. In Proceedings of the International Joint Conference on Neural Networks, Shenzhen, China, 18–22 July 2021; pp. 1–8.
  7. Nie, F.; Li, J.; Li, X. Self-weighted multiview clustering with multiple graphs. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 2564–2570.
  8. Wang, X.; Fan, S.; Kuang, K.; Shi, C.; Liu, J.; Wang, B. Decorrelated clustering with data selection bias. In Proceedings of the IJCAI, Yokohama, Japan, 11–17 July 2020.
  9. Kang, Z.; Shi, G.; Huang, S.; Chen, W.; Pu, X.; Zhou, J.T.; Xu, Z. Multi-graph fusion for multi-view spectral clustering. Knowl.-Based Syst. 2020, 189, 105102.
  10. Nie, F.; Wang, X.; Huang, H. Clustering and projected clustering with adaptive neighbors. In Proceedings of the KDD, New York, NY, USA, 24–27 August 2014; pp. 977–986.
  11. Zhan, K.; Zhang, C.; Guan, J.; Wang, J. Graph learning for multiview clustering. IEEE Trans. Cybern. 2017, 48, 2887–2895.
  12. Nie, F.; Cai, G.; Li, X. Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 2408–2414.
  13. Zhuge, W.; Nie, F.; Hou, C.; Yi, D. Unsupervised single and multiple views feature extraction with structured graph. IEEE Trans. Knowl. Data Eng. 2017, 29, 2347–2359.
  14. Wang, H.; Yang, Y.; Liu, B. GMC: Graph-based Multi-view Clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1116–1129.
  15. Zhan, K.; Nie, F.; Wang, J.; Yang, Y. Multiview consensus graph clustering. IEEE Trans. Image Process. 2018, 28, 1261–1270.
  16. Li, Z.; Tang, C.; Liu, X.; Zheng, X.; Yue, G.; Zhang, W.; Zhu, E. Consensus Graph Learning for Multi-view Clustering. IEEE Trans. Multimed. 2021, 24, 2461–2472.
  17. Wen, J.; Yan, K.; Zhang, Z.; Xu, Y.; Wang, J.; Fei, L.; Zhang, B. Adaptive graph completion based incomplete multi-view clustering. IEEE Trans. Multimed. 2020, 23, 2493–2504.
  18. Tkachenko, R.; Izonin, I. Model and principles for the implementation of neural-like structures based on geometric data transformations. In Proceedings of the International Conference on Computer Science, Engineering and Education Applications, Kiev, Ukraine, 18–20 January 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 578–587.
  19. Tkachenko, R. An Integral Software Solution of the SGTM Neural-Like Structures Implementation for Solving Different Data Mining Tasks. In Proceedings of the International Scientific Conference "Intellectual Systems of Decision Making and Problem of Computational Intelligence", Zalizny Port, Ukraine, 24–28 May 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 696–713.
  20. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
  21. Zhang, Z.; Wang, J.; Zha, H. Adaptive manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 253–265.
  22. Minh, H.Q.; Bazzani, L.; Murino, V. A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning. J. Mach. Learn. Res. 2016, 17, 1–72.
  23. Wang, Q.; Chen, M.; Li, X. Quantifying and detecting collective motion by manifold learning. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 4292–4298.
  24. Huang, S.; Tsang, I.; Xu, Z.; Lv, J.C. Measuring Diversity in Graph Learning: A Unified Framework for Structured Multi-view Clustering. IEEE Trans. Knowl. Data Eng. 2021, 1–15.
  25. Nie, F.; Li, J.; Li, X. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. In Proceedings of the IJCAI, New York, NY, USA, 9–15 July 2016; pp. 1881–1887.
  26. Huang, S.; Kang, Z.; Tsang, I.W.; Xu, Z. Auto-weighted multi-view clustering via kernelized graph learning. Pattern Recognit. 2019, 88, 174–184.
  27. Fan, K. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proc. Natl. Acad. Sci. USA 1949, 35, 652–655.
  28. Huang, J.; Nie, F.; Huang, H. A new simplex sparse learning model to measure data similarity for clustering. In Proceedings of the IJCAI, Buenos Aires, Argentina, 25–31 July 2015; pp. 3569–3575.
  29. Bertsekas, D.P. Nonlinear programming. J. Oper. Res. Soc. 1997, 48, 334.
  30. Bazaraa, M.S.; Sherali, H.D.; Shetty, C.M. Nonlinear Programming: Theory and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  31. Nie, F.; Huang, H.; Cai, X.; Ding, C.H. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2010; pp. 1813–1821.
  32. Nie, F.; Tian, L.; Li, X. Multiview clustering via adaptively weighted procrustes. In Proceedings of the KDD, London, UK, 19–23 August 2018; pp. 2022–2030.
  33. Zong, L.; Zhang, X.; Liu, X.; Yu, H. Weighted multi-view spectral clustering based on spectral perturbation. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; pp. 4621–4628.
  34. Luo, S.; Zhang, C.; Zhang, W.; Cao, X. Consistent and Specific Multi-View Subspace Clustering. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  35. Zhang, P.; Liu, X.; Xiong, J.; Zhou, S.; Zhao, W.; Zhu, E.; Cai, Z. Consensus One-step Multi-view Subspace Clustering. IEEE Trans. Knowl. Data Eng. 2020.
  36. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Figure 1. Illustration of topological relevance. The red data sample and the green data sample show a low similarity in terms of spatial location and velocity, but they maintain high topological relevance to each other.
Figure 2. Visualization of the clustering results of the first 5 classes of COIL-20 with t-SNE. (a) MLAN. (b) MCGC. (c) GMC. (d) CGL. (e) Ours.
Figure 3. Visualization of the clustering results of the first 5 classes of BBCSport with t-SNE. (a) MLAN. (b) MCGC. (c) GMC. (d) CGL. (e) Ours.
Figure 4. Accuracy (ACC) with respect to β and the number of neighbors (k). (a) 3Sources. (b) MSRC. (c) BBCSport. (d) COIL-20. (e) Caltech-7. (f) Caltech-20.
Figure 5. Auto-weight learning strategy (adopted in our original model) vs. average weighting strategy (denoted by Ours_smp). (a) ACC. (b) NMI.
Figure 6. Convergence analysis of the proposed method, where OBJ denotes the objective value. (a) 3Sources. (b) MSRC. (c) BBCSport. (d) COIL-20. (e) Caltech-7. (f) Caltech-20.
Table 1. Details of computational complexity.

Steps           Calculation                   Complexity
Equation (10)   update Z^(v)                  O(nmc)
Algorithm 1     update S                      O(nm²c + nmc)
Equation (22)   c eigenvectors of L_S         O(n²c)
Equation (6)    view weight μ^(v)             O(n²mc)
Total                                         O(n²t)
Table 2. Characteristics of all datasets.

Data Set     n     m   c    d1    d2    d3    d4    d5   d6
3Sources     169   3   6    3560  3631  3068
MSRC         210   5   7    24    576   512   256   254
BBCSport     544   2   5    3183  3203
COIL-20      1440  3   20   1024  3304  6750
Caltech-7    1474  6   7    48    40    254   1984  512  928
Caltech-20   2386  6   20   48    40    254   1984  512  928

n, m, and c denote the number of samples, views, and clusters, respectively. d_v denotes the dimensionality of the features in the v-th view.
Table 3. Clustering results of all methods on different datasets (%). The best performance is bolded, and the second-best performance is underlined.

Dataset      SwMC   MLAN   AWP    WMSC   MCGC   CSMSC  GMC    COMSC  CGL    Ours

ACC
3Sources     39.64  79.88  54.44  57.81  56.80  78.28  69.23  57.22  56.27  81.06
MSRC         72.38  51.48  63.33  68.81  74.86  80.48  74.76  80.48  81.90  91.90
BBCSport     35.85  87.50  59.74  35.29  47.13  82.35  80.70  47.50  79.78  90.25
COIL-20      86.39  75.28  66.46  76.08  77.22  72.94  79.10  65.42  56.57  87.22
Caltech-7    40.77  63.30  58.96  38.89  55.22  63.49  69.20  58.30  61.47  79.71
Caltech-20   55.11  53.91  51.55  33.43  47.53  47.22  45.64  42.17  54.82  68.60

NMI
3Sources     11.81  64.12  45.88  49.20  34.21  70.71  54.80  39.70  61.91  71.23
MSRC         71.78  45.24  54.88  59.30  70.18  71.43  74.21  69.91  74.43  84.02
BBCSport     11.51  76.52  43.06  20.94  22.01  68.22  72.26  27.88  71.92  79.37
COIL-20      94.29  83.06  78.24  83.90  89.86  83.61  91.89  76.54  76.07  94.49
Caltech-7    23.38  54.66  46.25  28.07  47.00  54.35  60.56  37.28  54.16  67.75
Caltech-20   47.02  59.00  57.90  41.66  54.57  58.17  38.46  42.97  60.62  61.43

Purity
3Sources     44.38  80.47  63.31  71.54  65.09  83.61  74.56  64.97  76.33  84.61
MSRC         77.14  52.43  63.33  71.19  80.95  80.48  79.05  79.52  81.90  91.90
BBCSport     36.58  87.50  66.54  42.10  47.13  82.35  84.38  52.06  83.64  94.11
COIL-20      89.86  76.41  68.13  77.76  82.92  76.01  84.79  71.86  62.51  90.00
Caltech-7    56.24  88.74  83.04  79.58  82.97  87.84  88.47  76.07  86.50  88.60
Caltech-20   67.98  77.12  73.39  67.19  68.65  78.11  55.49  62.37  78.92  77.66

F-score
3Sources     35.95  72.67  42.46  50.66  51.58  73.09  60.47  51.88  55.56  75.10
MSRC         66.19  46.47  53.76  57.29  69.68  70.13  72.46  68.70  71.87  84.33
BBCSport     38.35  84.27  47.42  30.23  48.70  74.09  79.43  41.21  78.32  88.38
COIL-20      84.44  71.68  63.15  72.80  75.57  69.84  79.20  57.12  51.69  85.26
Caltech-7    45.98  61.87  61.83  37.76  58.78  63.91  72.17  52.21  62.63  82.98
Caltech-20   38.98  45.26  53.43  30.21  40.17  42.24  34.03  36.26  49.34  55.79
Table 4. Comparison of computational time (seconds).

Dataset      SwMC    MLAN   AWP    WMSC    MCGC   CSMSC  GMC    COMSC     CGL     Ours
3Sources     0.51    0.11   0.08   0.16    0.50   0.19   0.22   27.29     0.34    0.66
MSRC         1.05    0.12   0.09   0.19    0.22   0.08   0.29   29.31     0.58    0.84
BBCSport     2.67    0.90   0.45   0.61    1.00   0.85   0.76   103.04    1.66    1.99
COIL-20      192.25  13.48  8.41   12.98   16.03  9.85   11.15  2015.47   26.08   10.94
Caltech-7    143.45  13.15  16.60  50.01   17.31  6.78   12.62  3493.87   54.16   19.89
Caltech-20   679.34  66.75  74.00  224.91  78.76  27.28  39.94  12484.65  201.20  75.11