Article

Estimating Mixed Memberships in Directed Networks by Spectral Clustering

School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
Entropy 2023, 25(2), 345; https://doi.org/10.3390/e25020345
Submission received: 29 December 2022 / Revised: 4 February 2023 / Accepted: 10 February 2023 / Published: 13 February 2023
(This article belongs to the Section Complexity)

Abstract

Community detection is an important and powerful way to understand the latent structure of complex networks in social network analysis. This paper considers the problem of estimating community memberships of nodes in a directed network, where a node may belong to multiple communities. For such a directed network, existing models either assume that each node belongs solely to one community or ignore variation in node degree. Here, a directed degree corrected mixed membership (DiDCMM) model is proposed by considering degree heterogeneity. An efficient spectral clustering algorithm with a theoretical guarantee of consistent estimation is designed to fit DiDCMM. We apply our algorithm to small-scale computer-generated directed networks and several real-world directed networks.

1. Introduction

Many real-world complex networks have community structure such that nodes within the same community (also known as a cluster or module) have more links with each other than with nodes in other communities. For example, in social networks, communities can be groups of students in the same department; in co-authorship networks, a community can be formed by researchers in the same field. However, the community structure of a real-world network is usually not directly observable. To address this problem, community detection, also known as graph clustering, is a popular tool for uncovering a latent community structure in a network [1,2]. For decades, many community detection methods have been proposed for non-overlapping undirected networks in which each node belongs to a single community, and the interactions between two nodes are symmetric or undirected. The stochastic block model (SBM) [3] is a popular generative model for non-overlapping undirected networks. In SBM, it is assumed that each node only belongs to one community and that nodes in the same community have the same expected degrees. Ref. [4] proposes the classical degree corrected stochastic block model (DCSBM), which extends SBM by considering variation in node degree. In recent years, numerous algorithms have been developed to estimate node community for non-overlapping undirected networks generated from SBM and DCSBM, see [5,6,7,8,9,10,11,12,13,14,15]. For recent developments about SBM, see the wonderful review paper [16].
However, in most real-world networks, a node may belong to more than one community at a time. In recent years, the problem of estimating mixed memberships for the undirected network has received a lot of attention [17,18,19,20,21,22,23,24,25,26,27,28,29], and references therein. Ref. [17] extends the SBM model from non-overlapping undirected networks to mixed membership undirected networks and designs the mixed membership stochastic block (MMSB) model. Based on the MMSB model, ref. [24] designs a model called the degree corrected mixed membership (DCMM) model by considering degree heterogeneity, where DCMM can also be seen as an extension of the non-overlapping model DCSBM, and ref. [24] also develops an efficient and provably consistent spectral algorithm. Ref. [27] presents a spectral algorithm under MMSB and establishes per-node rates for mixed memberships by sharp row-wise eigenvector deviation. Ref. [29] proposes an overlapping continuous community assignment model (OCCAM), which is also an extension of MMSB, by considering degree heterogeneity. To fit OCCAM, ref. [29] develops a spectral algorithm requiring a relatively small fraction of mixed nodes when building theoretical frameworks. Ref. [26] finds the cone structure inherent in the normalization of the eigen-decomposition of the population adjacency matrix under DCMM and develops a spectral algorithm to hunt corners in the cone structure.
Though the above works are encouraging and appealing, they focus on undirected networks. In reality, numerous directed networks exist, such as citation networks, protein–protein interaction networks, and the hyperlink network of websites. In recent years, many works with encouraging results have been developed for directed networks. Ref. [30] proposes a stochastic co-block model (ScBM) and its extension DC-ScBM, which considers degree heterogeneity, to model non-overlapping directed networks, where ScBM and DC-ScBM can model directed networks whose row nodes may be different from column nodes, and the number of row communities may also be different from the number of column communities. Ref. [31] studies the theoretical guarantees for the algorithm DSCORE [32] and its variants designed under DC-ScBM. Ref. [33] studies the spectral clustering algorithms designed by a data-driven regularization of the adjacency matrix under ScBM. Ref. [34] studies higher-order spectral clustering of directed graphs by designing a nearly linear time algorithm. Based on the fact that the above works only consider non-overlapping directed networks, ref. [35] develops a directed mixed membership stochastic block model (DiMMSB), which is an extension of ScBM, and models directed networks with mixed memberships. DiMMSB can also be seen as a direct extension of MMSB from an undirected network to a directed network.
Recall that DCSBM, DCMM, and DCScBM extend SBM, MMSB, and ScBM, respectively, by considering node degree variation. This paper aims at proposing a model that extends DiMMSB by considering node degree heterogeneity and at building an efficient spectral algorithm to fit the proposed model. In this paper, we focus on directed networks with mixed memberships. Our contributions are as follows:
(i)
We propose a novel generative model for directed networks with a mixed membership, the directed degree corrected mixed membership (DiDCMM) model. DiDCMM models a directed network with mixed memberships when row nodes have degree heterogeneities, while column nodes do not. We present the identifiability of DiDCMM under popular conditions which are also required by models modeling mixed membership networks when considering degree heterogeneity. Meanwhile, our results also show that modeling a directed network with mixed membership when considering degree heterogeneity for both row and column nodes needs nontrivial conditions. DiDCMM can be seen as an extension of the DCScBM model from a non-overlapping directed network to an overlapping directed network. DiDCMM also extends the DCMM model from an undirected network to a directed network and extends the DiMMSB model by considering node degree heterogeneity. For a detailed comparison of our DiDCMM with previous models, see Remark 2.
(ii)
To fit DiDCMM, we present a spectral algorithm called DiMSC, which is designed based on the investigation that there exists an ideal cone structure inherent in the normalized version of the left singular vectors and an ideal simplex structure inherent in the right singular vectors of the population adjacency matrix. We prove that our DiMSC exactly recovers the membership matrices for both row and column nodes in the oracle case under DiDCMM, and this also supports the identifiability of DiDCMM. We obtain the upper bounds of error rates for each row (and column) node and show that our method produces asymptotically consistent parameter estimations under mild conditions. Our theoretical results are consistent with classical results when DiDCMM degenerates to SBM and MMSB under mild conditions. Numerical results of simulated directed networks support our theoretical results and show that our approach outperforms its competitors. We also apply our algorithm to several real-world directed networks to test the existence of highly mixed nodes and asymmetric structures between row and column communities.
Notations. 
We take the following general notations in this paper. For a vector $x$ and fixed $q>0$, $\|x\|_q$ denotes its $l_q$-norm. For a matrix $M$, $M'$ denotes the transpose of the matrix $M$, $\|M\|$ denotes the spectral norm, $\|M\|_F$ denotes the Frobenius norm, and $\|M\|_{2\to\infty}$ denotes the maximum $l_2$-norm of all the rows of $M$. Let $\mathrm{rank}(M)$ denote the rank of matrix $M$. Let $\sigma_i(M)$ be the $i$-th largest singular value of matrix $M$, and let $\lambda_i(M)$ denote the $i$-th largest eigenvalue of the matrix $M$ ordered by magnitude. $M(i,:)$ and $M(:,j)$ denote the $i$-th row and the $j$-th column of matrix $M$, respectively. $M(S_r,:)$ and $M(:,S_c)$ denote the rows and columns of $M$ in the index sets $S_r$ and $S_c$, respectively. For any matrix $M$, we simply use $Y=\max(0,M)$ to represent $Y_{ij}=\max(0,M_{ij})$ for any $i,j$. For any matrix $M\in\mathbb{R}^{m\times m}$, let $\mathrm{diag}(M)$ be the $m\times m$ diagonal matrix whose $i$-th diagonal entry is $M(i,i)$. Here, $\mathbf{1}$ and $\mathbf{0}$ are column vectors with all entries being ones and zeros, respectively; $e_i$ is a column vector whose $i$-th entry is one, while its other entries are zero. $C$ is a positive constant that may vary occasionally.
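As a quick numerical companion to these definitions, the following minimal NumPy sketch (our own illustration; the variable names are not from the paper) evaluates the spectral norm, the Frobenius norm, the maximum row-wise $l_2$ norm, and the rank of a small matrix.

```python
import numpy as np

M = np.arange(12, dtype=float).reshape(4, 3)

spectral_norm = np.linalg.norm(M, 2)          # ||M||, the largest singular value of M
frobenius_norm = np.linalg.norm(M, 'fro')     # ||M||_F
max_row_l2 = np.linalg.norm(M, axis=1).max()  # maximum l_2 norm over the rows of M
rank_M = np.linalg.matrix_rank(M)             # rank(M)

print(spectral_norm, frobenius_norm, max_row_l2, rank_M)
```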

2. The Directed Degree Corrected Mixed Membership Model

Consider a directed network $\mathcal{N}=(V_r,V_c,E)$, where $V_r=\{1,2,\ldots,n_r\}$ is the set of row nodes, $V_c=\{1,2,\ldots,n_c\}$ is the set of column nodes ($n_r$ and $n_c$ indicate the number of row nodes and the number of column nodes, respectively), and $E$ is the set of edges. Note that when $V_r=V_c$, such that row nodes are the same as column nodes, $\mathcal{N}$ is a traditional directed network [31,36,37,38,39,40,41,42]; when $V_r\neq V_c$, $\mathcal{N}$ is a bipartite network (also known as a bipartite graph) [30,33,35,43,44,45]; see Figure 1 for illustrations of the topological structures of a directed network and a bipartite network. Without confusion, we also occasionally call bipartite networks directed networks in this paper.
We assume that the row nodes of the directed network N belong to K perceivable communities (called row communities in this paper)
$\mathcal{C}_r^{(1)},\ \mathcal{C}_r^{(2)},\ \ldots,\ \mathcal{C}_r^{(K)},$
and the column nodes of the directed network N belong to K perceivable communities (called column communities in this paper)
$\mathcal{C}_c^{(1)},\ \mathcal{C}_c^{(2)},\ \ldots,\ \mathcal{C}_c^{(K)}.$
Define an $n_r\times K$ row nodes membership matrix $\Pi_r$ and an $n_c\times K$ column nodes membership matrix $\Pi_c$ such that $\Pi_r(i,:)$ is a $1\times K$ probability mass function (PMF) for row node $i$, $\Pi_c(j,:)$ is a $1\times K$ PMF for column node $j$, and
$\Pi_r(i,k)$ is the weight of row node $i$ on $\mathcal{C}_r^{(k)}$, $1\leq k\leq K$,
$\Pi_c(j,k)$ is the weight of column node $j$ on $\mathcal{C}_c^{(k)}$, $1\leq k\leq K$.
Call row node $i$ ‘pure’ if $\Pi_r(i,:)$ is degenerate (i.e., one entry is 1 and the other $K-1$ entries are 0) and ‘mixed’ otherwise. The same definitions hold for column nodes. Note that the mixed nodes considered in this article are not the boundary nodes introduced in [46], since boundary nodes are defined based on non-overlapping networks, while mixed nodes belong to multiple communities.
Let $A\in\{0,1\}^{n_r\times n_c}$ be the bi-adjacency matrix of $\mathcal{N}$ such that, for each entry, $A(i,j)=1$ if there is a directional edge from row node $i$ to column node $j$, and $A(i,j)=0$ otherwise. So, the $i$-th row of $A$ records how row node $i$ sends edges, and the $j$-th column of $A$ records how column node $j$ receives edges. Let $P$ be a $K\times K$ matrix such that
$P(k,l)\geq 0$ for $1\leq k,l\leq K$.
Note that since we consider a directed network in this paper, P may be asymmetric.
Without loss of generality, suppose that row nodes have degree heterogeneities while column nodes do not, i.e., row nodes have variation in degree while column nodes do not. Note that in a directed network, if column nodes have degree heterogeneities while row nodes do not, then to detect memberships of both row nodes and column nodes, we set the transpose of the adjacency matrix as the input when applying our algorithm DiMSC. Meanwhile, in a directed network, if both row and column nodes have degree heterogeneities, then to model such a directed network with mixed memberships, we need nontrivial constraints on the degree heterogeneities of row nodes and column nodes for model identifiability; for details, see Remark 1.
Let $\theta_r$ be an $n_r\times 1$ vector whose $i$-th entry is the positive degree heterogeneity of row node $i$. For all pairs $(i,j)$ with $1\leq i\leq n_r$, $1\leq j\leq n_c$, DiDCMM models the entries of $A$ such that $A(i,j)$ are independent Bernoulli random variables satisfying
$\mathbb{P}(A(i,j)=1)=\theta_r(i)\sum_{k=1}^{K}\sum_{l=1}^{K}\Pi_r(i,k)\Pi_c(j,l)P(k,l).$   (6)
Equation (6) means that $\mathbb{P}(A(i,j)=1)=\theta_r(i)\Pi_r(i,:)P\Pi_c'(j,:)$, i.e., the probability of generating a directional edge from row node $i$ to column node $j$ is $\theta_r(i)\Pi_r(i,:)P\Pi_c'(j,:)$, and this probability is controlled by the degree heterogeneity parameter $\theta_r(i)$ of row node $i$, the connecting matrix $P$, and the memberships of nodes $i$ and $j$. Equation (6) functions similarly to Equation (1.4) in [24], and both equations define the probability of generating an edge. For comparison, Equation (6) defines the probability of generating a directional edge under DiDCMM for a directed network, while Equation (1.4) in [24] defines the probability of generating an edge under DCMM for an undirected network, i.e., DiDCMM can be seen as an extension of DCMM from an undirected network to a directed network.
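To make the generating process of Equation (6) concrete, here is a minimal simulation sketch (our own toy parameters, not taken from the paper): it forms $\Omega=\Theta_r\Pi_r P\Pi_c'$ and draws each entry of $A$ independently from a Bernoulli distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_c, K = 200, 300, 3

# membership matrices: each row is a PMF over the K communities
# (Dirichlet rows are an arbitrary illustrative choice)
Pi_r = rng.dirichlet(np.ones(K), size=n_r)
Pi_c = rng.dirichlet(np.ones(K), size=n_c)

# positive degree heterogeneity for row nodes only
theta_r = rng.uniform(0.2, 1.0, size=n_r)

# connecting matrix with unit diagonal (Condition (I1)), possibly asymmetric
P = np.array([[1.0, 0.3, 0.2],
              [0.1, 1.0, 0.4],
              [0.2, 0.1, 1.0]])

Omega = theta_r[:, None] * (Pi_r @ P @ Pi_c.T)   # Omega = Theta_r Pi_r P Pi_c'
A = rng.binomial(1, Omega)                       # A(i,j) ~ Bernoulli(Omega(i,j))
```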
Introduce the degree heterogeneity diagonal matrix $\Theta_r\in\mathbb{R}^{n_r\times n_r}$ for row nodes such that
$\Theta_r(i,i)=\theta_r(i)$ for $1\leq i\leq n_r$.   (7)
Equation (7) uses a diagonal matrix Θ r to contain all degree heterogeneities, and Θ r is useful for further theoretical analysis through Equation (8).
Definition 1. 
Call model (1)–(6) the directed degree corrected mixed membership (DiDCMM) model, and denote it by D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) .
The following conditions are sufficient for the identifiability of DiDCMM:
  • (I1) rank ( P ) = K , and P has unit diagonals.
  • (I2) There is at least one pure node for each of the K row and K column communities.
When building statistical models for a network in which nodes can belong to multiple communities, the full rank requirement of connecting matrix P and pure nodes condition are always necessary for model identifiability, see models for an undirected network such as MMSB considered in [23,27], DCMM considered in [24,26], and OCCAM considered in [26,29]. Meanwhile, if models modeling networks with mixed memberships consider degree heterogeneity, the unit diagonals requirement on connecting matrix P is also necessary for model identifiability, see the identifiability requirement of DCMM and OCCAM considered in [24,26,29]. Furthermore, based on the fact that DiDCMM, DCMM, and OCCAM can include the well-known model SBM, letting P have unit diagonals is not a serious problem since many wonderful works study a special case of SBM when P has unit diagonals and a network has K equal size clusters (this special case of SBM is also known as a planted partition model), see [12,47,48,49,50,51,52].
Let $\Omega=\mathbb{E}[A]$ be the expectation of the adjacency matrix $A$. Under DiDCMM, we have
$\Omega=\Theta_r\Pi_r P\Pi_c'.$   (8)
We refer to $\Omega$ as the population adjacency matrix. Since $\Theta_r$ is a full-rank diagonal matrix and $\mathrm{rank}(\Pi_r)=K$, $\mathrm{rank}(\Pi_c)=K$, and $\mathrm{rank}(P)=K$ by Equation (7) and Conditions (I1) and (I2), the rank of $\Omega$ is $K$. Recall that $K$ is the number of communities, and it is much smaller than the network size, so $\Omega$ has a low-dimensional structure. The form of $\Omega$ given in Equation (8) is the key to building the spectral algorithm developed in this paper to fit DiDCMM. Analyzing properties of the population adjacency matrix to build a spectral algorithm that fits a statistical model is a common strategy in community detection; for example, references [24,26,27,35] also use this strategy to design their algorithms fitting DCMM, MMSB, and DiMMSB.
For 1 k K , let I r ( k ) = { i { 1 , 2 , , n r } : Π r ( i , k ) = 1 } and I c ( k ) = { j { 1 , 2 , , n c } : Π c ( j , k ) = 1 } . By Condition (I2), I r ( k ) and I c ( k ) are nonempty for all 1 k K . For 1 k K , select one row node from I r ( k ) to construct the index set I r , i.e., I r is the indices of row nodes corresponding to K pure row nodes, one from each community, and I c is defined similarly. W.L.O.G., let Π r ( I r , : ) = I K and Π c ( I c , : ) = I K (Lemma 2.1 [27] has a similar setting to design their spectral algorithm under MMSB.), where I K is the K × K identity matrix. The proposition below shows that the DiDCMM model is identifiable.
Proposition 1. 
(Identifiability). When Conditions (I1) and (I2) hold, DiDCMM is identifiable: for eligible $(P,\Pi_r,\Pi_c,\Theta_r)$ and $(\tilde P,\tilde\Pi_r,\tilde\Pi_c,\tilde\Theta_r)$, set $\Omega=\Theta_r\Pi_r P\Pi_c'$ and $\tilde\Omega=\tilde\Theta_r\tilde\Pi_r\tilde P\tilde\Pi_c'$. If $\Omega=\tilde\Omega$, then $\Theta_r=\tilde\Theta_r$, $\Pi_r=\tilde\Pi_r$, $\Pi_c=\tilde\Pi_c$, and $P=\tilde P$.
 Remark 1.
(The reason that we do not model a directed network with mixed memberships where both row and column nodes have degree heterogeneities). Suppose both row and column nodes have degree heterogeneities in a mixed membership directed network. To model such a directed network, the probability of generating an edge from row node i to column node j is
$\mathbb{P}(A(i,j)=1)=\theta_r(i)\theta_c(j)\sum_{k=1}^{K}\sum_{l=1}^{K}\Pi_r(i,k)\Pi_c(j,l)P(k,l),$
where $\theta_c$ is an $n_c\times 1$ vector whose $j$-th entry is the degree heterogeneity of column node $j$. Set $\Omega=\mathbb{E}[A]$; then $\Omega=\Theta_r\Pi_r P\Pi_c'\Theta_c$, where $\Theta_c\in\mathbb{R}^{n_c\times n_c}$ is a diagonal matrix whose $j$-th diagonal entry is $\theta_c(j)$. Set $\Omega=U\Lambda V'$ as the compact SVD of $\Omega$. Following a similar analysis to Lemma 1, we see that $U=\Theta_r\Pi_r B_r$ and $V=\Theta_c\Pi_c B_c$ (without causing confusion, we still use $B_c$ here for convenience). For model identifiability, following a similar analysis to the proof of Proposition 1, since $\Omega(\mathcal{I}_r,\mathcal{I}_c)=\Theta_r(\mathcal{I}_r,\mathcal{I}_r)\Pi_r(\mathcal{I}_r,:)P\Pi_c'(\mathcal{I}_c,:)\Theta_c(\mathcal{I}_c,\mathcal{I}_c)=\Theta_r(\mathcal{I}_r,\mathcal{I}_r)P\Theta_c(\mathcal{I}_c,\mathcal{I}_c)=U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:)$, we see that $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)P\Theta_c(\mathcal{I}_c,\mathcal{I}_c)=U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:)$. To obtain $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$ and $\Theta_c(\mathcal{I}_c,\mathcal{I}_c)$ from $U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:)$ when $P$ has unit diagonals, it is impossible to recover $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$ and $\Theta_c(\mathcal{I}_c,\mathcal{I}_c)$ unless we add the condition that $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=\Theta_c(\mathcal{I}_c,\mathcal{I}_c)$. Now, suppose $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=\Theta_c(\mathcal{I}_c,\mathcal{I}_c)$ holds and call it Condition (I3); we have $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)P\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:)$; hence, $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=\Theta_c(\mathcal{I}_c,\mathcal{I}_c)=\sqrt{\mathrm{diag}(U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:))}$ when $P$ has unit diagonals. However, Condition (I3) is nontrivial since it requires $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=\Theta_c(\mathcal{I}_c,\mathcal{I}_c)$, and we always prefer a directed network in which there is no connection between the row nodes' degree heterogeneities and the column nodes' degree heterogeneities. For example, when all nodes are pure in a directed network, ref. [30] models such a directed network using the DC-ScBM model, under which $\Omega=\Theta_r\Pi_r P\Pi_c'\Theta_c$, and $\Theta_r$ and $\Theta_c$ are independent under DC-ScBM. Because Condition (I3) is nontrivial, we do not model a mixed membership directed network in which all nodes have degree heterogeneities.
For DiDCMM’s identifiability, the number of row communities should equal that of column communities when both row and column nodes may belong to more than one community. However, when only row nodes have mixed memberships while column nodes do not, the number of row communities can be smaller than that of column communities, and this is also discussed in [53]. All proofs of our theoretical results are provided in Appendix A.1.
Unless specified otherwise, we treat Conditions (I1) and (I2) as default from now on. Proposition 1 is important since it guarantees that our model DiDCMM is well defined, and we can design efficient spectral algorithms to fit DiDCMM based on its identifiability. The reason that we do not consider degree heterogeneity for the column nodes of our DiDCMM is mainly its identifiability. As analyzed in Remark 1, considering degree heterogeneity for both row and column nodes makes the model unidentifiable unless some nontrivial conditions on the model parameters are added. Meanwhile, many previous statistical models in the community detection area are identifiable, and spectral algorithms can be applied to fit them. For example, SBM [3], DCSBM [4], MMSB [17], DCMM [24], OCCAM [29], ScBM (and DCScBM) [30], and DiMMSB [35] are identifiable. In particular, though different statistical models may have different requirements on model parameters for identifiability, the proofs of identifiability follow a similar idea to that of Proposition 1; for instance, Proposition 1.1 of [24] and Theorem 2.1 of [27] build theoretical guarantees on identifiability for DCMM and MMSB, respectively.
Remark 2. 
We compare our DiDCMM with some previous models in this remark.
  • When $\Theta_r=\rho I$ for $\rho>0$, Equation (8) gives $\Omega=\rho\Pi_r P\Pi_c'$ and DiDCMM degenerates to DiMMSB [35], where $\rho$ is known as a sparsity parameter [9,27,35]. So, DiDCMM includes DiMMSB as a special case, and the relationship between DiDCMM and DiMMSB is similar to that between DCSBM and SBM [3,4]. Meanwhile, DiDCMM considers the degree heterogeneity parameter $\Theta_r$ at the cost that DiDCMM requires $P$ to have unit diagonals for model identifiability, while there is no such requirement on $P$ for DiMMSB’s identifiability. Note that both DiDCMM and DiMMSB are identifiable only when $P$ is a full-rank square matrix.
  • When Θ r = ρ I for ρ > 0 and all nodes are pure, DiDCMM reduces to ScBM [30]. DiDCMM can model a directed network in which nodes enjoy overlapping memberships, while ScBM cannot. Meanwhile, DiDCMM enjoys this advantage at the cost of requiring rank ( P ) = K for model identifiability, while ScBM is identifiable even when P is not a square matrix, i.e., ScBM can model a directed network in which the number of row communities can be different from the number of column communities. A comparison between DiDCMM and DCScBM [30] is similar.
  • When $\Theta_r=\rho I$ and the network is undirected, DiDCMM reduces to MMSB [17]. However, DiDCMM models directed networks with mixed memberships, while MMSB only models undirected networks with mixed memberships. Again, DiDCMM enjoys its advantage at the cost of $P$ having unit diagonals for its identifiability (note that DiDCMM allows $P$ to be asymmetric since DiDCMM models directed networks), while MMSB is identifiable even when $P$ has non-unit diagonals (note that $P$ is symmetric under MMSB since it models undirected networks). Meanwhile, the identifiability of both DiDCMM and MMSB requires the square matrix $P$ to have full rank.
  • When Θ r = ρ I , the network is undirected and all nodes are pure, DiDCMM reduces to SBM [3]. For comparison, DiDCMM models directed networks and allows nodes to belong to multiple communities, while SBM only models undirected networks in which a node only belongs to one community. Meanwhile, DiDCMM enjoys these advantages at the cost of requiring P to be full rank with unit diagonals for its identifiability, while SBM is identifiable even when P is not full rank and P has non-unit diagonals. Note that DiDCMM allows P to be asymmetric, while P must be symmetric for SBM since DiDCMM models directed networks, while SBM models undirected networks. Comparison between DiDCMM and DCSBM [4] is similar.
  • Compared with DCMM introduced in [24] and OCCAM introduced in [29]: DCMM and OCCAM model undirected networks with mixed memberships, while DiDCMM models directed networks with mixed memberships. DiDCMM, DCMM, and OCCAM all consider degree heterogeneity for overlapping networks, and they are identifiable only when the square matrix $P$ is of full rank and has unit diagonals. Meanwhile, DiDCMM allows $P$ to be asymmetric, while $P$ must be symmetric for DCMM and OCCAM, since DiDCMM models directed networks, while DCMM and OCCAM model undirected networks.

3. Algorithm

The primary goal of the proposed algorithm is to estimate the row membership matrix Π r and column membership matrix Π c from the observed adjacency matrix A with given K. We start by considering the ideal case when Ω is known, and then we extend what we learn in the ideal case to the real case.

3.1. The Ideal Simplex (IS), the Ideal Cone (IC), and the Ideal DiMSC

Recall that $\mathrm{rank}(\Omega)=K$ under Conditions (I1) and (I2), and $K$ is much smaller than $\min\{n_r,n_c\}$. Let $\Omega=U\Lambda V'$ be the compact singular value decomposition of $\Omega$ such that $U\in\mathbb{R}^{n_r\times K}$, $\Lambda\in\mathbb{R}^{K\times K}$, $V\in\mathbb{R}^{n_c\times K}$, $U'U=I_K$, and $V'V=I_K$. The goal of the ideal case is to use $U$, $\Lambda$, and $V$ to exactly recover $\Pi_r$ and $\Pi_c$. As stated in [8,24], $\theta_r$ is one of the major nuisances, and, similar to [7], we remove the effect of $\theta_r$ by normalizing each row of $U$ to have unit $l_2$ norm. Set $U_*\in\mathbb{R}^{n_r\times K}$ by $U_*(i,:)=\frac{U(i,:)}{\|U(i,:)\|_F}$, and let $N_U$ be the $n_r\times n_r$ diagonal matrix such that $N_U(i,i)=\frac{1}{\|U(i,:)\|_F}$ for $1\leq i\leq n_r$. Then, $U_*$ can be rewritten as $U_*=N_UU$. The existence of the ideal cone (IC for short) structure inherent in $U_*$ and of the ideal simplex (IS for short) structure inherent in $V$ is guaranteed by the following lemma.
Lemma 1.  (Ideal Simplex and Ideal Cone). Under $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, there exist a unique $K\times K$ matrix $B_r$ and a unique $K\times K$ matrix $B_c$ such that
  • $U=\Theta_r\Pi_rB_r$, where $B_r=\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)U(\mathcal{I}_r,:)$, and $U_*=YU_*(\mathcal{I}_r,:)$, where $Y=N_M\Pi_r\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)$ with $N_M$ being an $n_r\times n_r$ diagonal matrix whose diagonal entries are positive. Meanwhile, $U_*(i,:)=U_*(\bar i,:)$ if $\Pi_r(i,:)=\Pi_r(\bar i,:)$ for $1\leq i,\bar i\leq n_r$.
  • $V=\Pi_cB_c$, where $B_c=V(\mathcal{I}_c,:)$. Meanwhile, $V(j,:)=V(\bar j,:)$ if $\Pi_c(j,:)=\Pi_c(\bar j,:)$ for $1\leq j,\bar j\leq n_c$.
Lemma 1 says that the rows of $V$ form a $K$-simplex in $\mathbb{R}^K$, which we call the ideal simplex (IS), with the $K$ rows of $B_c$ being the vertices. Such an IS is also found in [24,27,35]. Lemma 1 also shows that the form $U_*=YU_*(\mathcal{I}_r,:)$ is actually the ideal cone structure mentioned in [26]. Meanwhile, we remove the influence of $\theta_r$ by normalizing each row of $U$ to have unit norm in this paper. Using the entry-wise ratio idea of [8] also works; ref. [24] develops spectral algorithms to fit DCMM using this idea. Designing algorithms based on nonnegative matrix factorization [25] to fit DiDCMM by adding some constraints on $\Omega$ may also work. We leave the study of using these ideas to fit DiDCMM or its submodels for our future work.
For column nodes (recall that column nodes have no degree heterogeneities), since $B_c$ is of full rank, if $V$ and $B_c$ are known in advance, ideally we can exactly recover $\Pi_c$ by setting $\Pi_c=VB_c'(B_cB_c')^{-1}\equiv VB_c^{-1}$. For convenience, to transfer the ideal case to the real case, set $Z_c=VB_c^{-1}$. Since $Z_c\equiv\Pi_c$, we have
$\Pi_c(j,:)=\frac{Z_c(j,:)}{\|Z_c(j,:)\|_1},\quad 1\leq j\leq n_c.$
With given $V$, since it enjoys the IS structure $V=\Pi_cB_c\equiv\Pi_cV(\mathcal{I}_c,:)$, as long as we can obtain $V(\mathcal{I}_c,:)$ (i.e., $B_c$), we can recover $\Pi_c$ exactly. As mentioned in [24,27], for such an IS, the successive projection (SP) algorithm [54] (i.e., Algorithm A2 in Appendix E) can be applied to $V$ with $K$ column communities to find the column corner matrix $B_c$. The above analysis shows how to recover $\Pi_c$ ideally with given $\Omega$ and $K$ under DiDCMM.
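Since the SP algorithm is the corner-hunting routine used for the IS structure, we include a generic successive projection sketch below (a standard implementation written by us, not the paper's Algorithm A2 verbatim): it repeatedly selects the row with the largest norm and projects the remaining rows onto the orthogonal complement of the selected one.

```python
import numpy as np

def successive_projection(V, K):
    """Return the indices of K (approximate) corner rows of V."""
    R = np.array(V, dtype=float)
    corners = []
    for _ in range(K):
        # pick the row with the largest Euclidean norm as the next corner
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        corners.append(j)
        u = R[j] / np.linalg.norm(R[j])
        # project all rows onto the orthogonal complement of the chosen corner
        R = R - np.outer(R @ u, u)
    return corners
```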
Next, we aim to recover $\Pi_r$ from $U$ with the given $K$. Since $\mathrm{rank}(U_*)=K$, we have $\mathrm{rank}(U_*(\mathcal{I}_r,:))=K$. As $U_*(\mathcal{I}_r,:)\in\mathbb{R}^{K\times K}$, the inverse of $U_*(\mathcal{I}_r,:)$ exists. Therefore, Lemma 1 also gives that
$Y=U_*(U_*(\mathcal{I}_r,:))^{-1}.$   (9)
Equation (9) holds because $U_*=YU_*(\mathcal{I}_r,:)$ and $U_*(\mathcal{I}_r,:)$ is a nonsingular matrix. By Lemma 1, we know that for row nodes, their membership matrix $\Pi_r$ appears in the expression of $Y$. Therefore, we aim to use Equation (9) to find the exact expression of $\Pi_r$ in terms of $U$, $V$, and $\Lambda$ by putting $Y$ on the left-hand side of the equality. Since $Y=N_M\Pi_r\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)$ by Lemma 1 and $U_*=N_UU$, substituting these expressions for $Y$ and $U_*$ in Equation (9) gives $N_U^{-1}N_M\Pi_r\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)=U(U_*(\mathcal{I}_r,:))^{-1}$, which gives
$N_U^{-1}N_M\Pi_r=U(U_*(\mathcal{I}_r,:))^{-1}N_U(\mathcal{I}_r,\mathcal{I}_r)\Theta_r(\mathcal{I}_r,\mathcal{I}_r).$   (10)
From Equation (10), we have found the expression of $\Pi_r$ as a function of $U$, $U_*$, $\Theta_r$, $N_U$, and $\mathcal{I}_r$, where we do not move $N_U^{-1}N_M$ to the right-hand side of Equation (10) because it is a diagonal matrix and does not influence the expression of $\Pi_r$; see our next step for details. When designing a spectral algorithm in the ideal case with given $\Omega$ and $K$, we aim at recovering $\Pi_r$ and $\Pi_c$ by taking advantage of the singular value decomposition of $\Omega$. Though Equation (10) provides an expression for $\Pi_r$ through $\Omega$'s SVD, it contains the term $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$, which relates to degree heterogeneity, so we aim at expressing $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$ through $\Omega$'s SVD. By the proof of Lemma 1, we know that $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)=\mathrm{diag}(U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:))$ when Condition (I1) holds. Thus, substituting $\mathrm{diag}(U(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:))$ for $\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$ in Equation (10), we obtain an expression of $\Pi_r$ that is directly related to $\Omega$'s SVD and the two index sets $\mathcal{I}_r$ and $\mathcal{I}_c$. For convenience, set $J_*=N_U(\mathcal{I}_r,\mathcal{I}_r)\Theta_r(\mathcal{I}_r,\mathcal{I}_r)\equiv\mathrm{diag}(U_*(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:))$, $Z_r=N_U^{-1}N_M\Pi_r$, and $Y_*=U(U_*(\mathcal{I}_r,:))^{-1}$. By Equation (10), we have
$Z_r=Y_*J_*\equiv U(U_*(\mathcal{I}_r,:))^{-1}\mathrm{diag}(U_*(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:)).$   (11)
Equation (11) looks similar to Equation (7) of [55]. However, Equation (11) is related to two index sets I r and I c , while Equation (7) of [55] is only related to one index set because Equation (11) aims at designing a spectral algorithm for directed network generated under DiDCMM and Equation (7) of [55] aims at reviewing the generation of the SVM-cone-DCMMSB algorithm proposed in [26] for undirected network generated under DCMM. Meanwhile, since N U 1 N M is an n r × n r positive diagonal matrix, we have
$\Pi_r(i,:)=\frac{Z_r(i,:)}{\|Z_r(i,:)\|_1},\quad 1\leq i\leq n_r.$   (12)
With given Ω and K, we can obtain U , V ; thus, the above analysis shows that once the two index sets I r and I c are known, we can exactly recover Π r by Equations (11) and (12). Meanwhile, from Equation (10), we see that it is important to express Θ r ( I r , I r ) as a combination of U , V , Λ , and the two index sets I r and I c , where we successfully obtain an expression of Θ r ( I r , I r ) by Condition (I1), the unit diagonal constraint on P. Otherwise, if P has no unit diagonals, we cannot obtain an expression of Θ r ( I r , I r ) unless adding some nontrivial conditions on model parameters, just as analyzed in Remark 1. Similarly, references [24,26] also design their spectral algorithms to fit DCMM by using the unit diagonal constraint on P to obtain an expression of a sub-matrix of degree heterogeneity matrix, see Equations (6)–(8) of [55] as an example.
Given $\Omega$ and $K$, to recover $\Pi_r$ in the ideal case, we need to obtain $Z_r$ by Equation (11), which means that the only difficulty is in finding the index set $\mathcal{I}_r$, since $V(\mathcal{I}_c,:)$ can be obtained by the SP algorithm from the IS structure $V=\Pi_cV(\mathcal{I}_c,:)$. From Lemma 1, we know that $U_*=YU_*(\mathcal{I}_r,:)$ forms the IC structure. In [26], their SVM-cone algorithm (i.e., Algorithm A3 in Appendix F) can exactly obtain the row corner matrix $U_*(\mathcal{I}_r,:)$ from the ideal cone $U_*=YU_*(\mathcal{I}_r,:)$ as long as the condition $(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))^{-1}\mathbf{1}>0$ holds (see Lemma 2).
Lemma 2. 
Under $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, $(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))^{-1}\mathbf{1}>0$ holds.
Based on the above analysis, we are now ready to give the following four-stage algorithm, which we call ideal DiMSC. Input: $\Omega$ and $K$. Output: $\Pi_r$ and $\Pi_c$.
  • Let $\Omega=U\Lambda V'$ be the compact SVD of $\Omega$ such that $U\in\mathbb{R}^{n_r\times K}$, $V\in\mathbb{R}^{n_c\times K}$, $\Lambda\in\mathbb{R}^{K\times K}$, $U'U=I$, $V'V=I$. Let $U_*=N_UU$, where $N_U$ is the $n_r\times n_r$ diagonal matrix whose $i$-th diagonal entry is $\frac{1}{\|U(i,:)\|_F}$ for $1\leq i\leq n_r$.
  • Run the SP algorithm on $V$ assuming that there are $K$ column communities to obtain the column corner matrix $V(\mathcal{I}_c,:)$ (i.e., $B_c$). Run the SVM-cone algorithm on $U_*$ assuming that there are $K$ row communities to obtain $\mathcal{I}_r$.
  • Set $J_*=\mathrm{diag}(U_*(\mathcal{I}_r,:)\Lambda V'(\mathcal{I}_c,:))$, $Y_*=U(U_*(\mathcal{I}_r,:))^{-1}$, $Z_r=Y_*J_*$, and $Z_c=V(V(\mathcal{I}_c,:))^{-1}$.
  • Recover $\Pi_r$ and $\Pi_c$ by setting $\Pi_r(i,:)=\frac{Z_r(i,:)}{\|Z_r(i,:)\|_1}$ for $1\leq i\leq n_r$ and $\Pi_c(j,:)=\frac{Z_c(j,:)}{\|Z_c(j,:)\|_1}$ for $1\leq j\leq n_c$.
The following theorem guarantees that ideal DiMSC exactly recovers nodes memberships, and this verifies the identifiability of DiDCMM in turn. Meanwhile, it should be noted that many spectral algorithms designed to fit identifiable statistical models in the community detection area can exactly recover node memberships for the ideal case. For example, the spectral clustering for K many clusters algorithm addressed in [5] under SBM, the regularized spectral clustering designed in [7] under DCSBM, the SCORE algorithm designed in [8] under DCSBM, the two algorithms designed and studied in [9] under SBM and DCSBM, the RSC- τ algorithm studied in [11] under SBM, the mixed-SCORE algorithm designed in [24] under DCMM, the DI-SIM algorithm designed in [30] under DCScBM, the D-SCORE algorithm studied in [31,32] under DCScBM, the SVM-cone-DCMMSB algorithm designed in [26] under DCMM, and the SPACL algorithm designed in [27] under MMSB can exactly recover membership matrices under respective models for the ideal case by using the population adjacency matrix to replace the adjacency matrix in the input of these algorithms. The fact that ideal cases for the above spectral algorithms can return community information also supports the identifiability of the above models.
Theorem 1. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , the ideal DiMSC exactly recovers the row nodes membership matrix Π r and the column nodes membership matrix Π c .
To demonstrate that $U_*$ has the ideal cone structure, we drew Panel (a) of Figure 2. The simulated data used for Panel (a) are generated from $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$ with $n_r=600$, $n_c=400$, and $K=3$; each row (and column) community has 120 pure nodes. For the 240 mixed row nodes, we set $\Pi_r(i,1)=\mathrm{rand}(1)/2$, $\Pi_r(i,2)=\mathrm{rand}(1)/2$, and $\Pi_r(i,3)=1-\Pi_r(i,1)-\Pi_r(i,2)$, where $\mathrm{rand}(1)$ is a random number in $(0,1)$ and $i$ is a mixed row node. For the 40 mixed column nodes, we set $\Pi_c(j,1)=\mathrm{rand}(1)/2$, $\Pi_c(j,2)=\mathrm{rand}(1)/2$, and $\Pi_c(j,3)=1-\Pi_c(j,1)-\Pi_c(j,2)$. For the degree heterogeneity parameter, we set $\theta_r(i)=\mathrm{rand}(1)$ for all row nodes $i$. The matrix $P$ is set as
$P=\begin{pmatrix}1 & 0.4 & 0.3\\ 0.2 & 1 & 0.1\\ 0.1 & 0.4 & 1\end{pmatrix}.$
Under such a setting, after computing $\Omega$ and obtaining $U_*$ and $V$ from $\Omega$, we can plot Figure 2. Panel (a) shows that all rows of $U_*$ corresponding to mixed row nodes are located on one side of the hyperplane formed by the $K$ rows of $U_*(\mathcal{I}_r,:)$; this happens because each row of $U_*$ is a scaled convex combination of the $K$ rows of $U_*(\mathcal{I}_r,:)$, as guaranteed by the IC structure $U_*=YU_*(\mathcal{I}_r,:)$. Thus, Panel (a) shows the existence of the ideal cone structure formed by $U_*$. Similarly, to demonstrate that $V$ has the ideal simplex structure, we drew Panel (b) of Figure 2, where Panel (b) is obtained under the same setting as Panel (a). Panel (b) shows that the rows of $V$ corresponding to mixed column nodes are located inside the simplex formed by the $K$ rows of $V(\mathcal{I}_c,:)$; this happens because each row of $V$ is a convex combination of the $K$ rows of $V(\mathcal{I}_c,:)$, as guaranteed by the IS structure $V=\Pi_cV(\mathcal{I}_c,:)$. Thus, Panel (b) shows the existence of the ideal simplex structure formed by $V$.

3.2. DiMSC Algorithm

We now extend the ideal case to the real case. Set $\tilde A=\hat U\hat\Lambda\hat V'$ to be the top-$K$-dimensional SVD of $A$ such that $\hat U\in\mathbb{R}^{n_r\times K}$, $\hat V\in\mathbb{R}^{n_c\times K}$, $\hat\Lambda\in\mathbb{R}^{K\times K}$, $\hat U'\hat U=I_K$, $\hat V'\hat V=I_K$, and $\hat\Lambda$ contains the top $K$ singular values of $A$. Let $\hat U_*$ be the row-wise normalization of $\hat U$, i.e., $\hat U_*=N_{\hat U}\hat U$, where $N_{\hat U}\in\mathbb{R}^{n_r\times n_r}$ is a diagonal matrix whose $i$-th diagonal entry is $\frac{1}{\|\hat U(i,:)\|_F}$. For the real case, we use $\hat J_*$, $\hat Y_*$, $\hat Z_r$, $\hat Z_c$, $\hat\Pi_r$, and $\hat\Pi_c$ given in Algorithm 1 to estimate $J_*$, $Y_*$, $Z_r$, $Z_c$, $\Pi_r$, and $\Pi_c$, respectively. Algorithm 1, called the directed mixed simplex and cone (DiMSC for short) algorithm, is a natural extension of the ideal DiMSC to the real case.
Algorithm 1: Directed Mixed Simplex and Cone (DiMSC) algorithm
Require: The adjacency matrix $A\in\mathbb{R}^{n_r\times n_c}$ of a directed network and the number of row (column) communities $K$.
Ensure: The estimated $n_r\times K$ row membership matrix $\hat\Pi_r$ and the estimated $n_c\times K$ column membership matrix $\hat\Pi_c$.
1: Obtain $\tilde A=\hat U\hat\Lambda\hat V'$, the top-$K$-dimensional SVD of $A$. Compute $\hat U_*$ from $\hat U$.
2: Apply the SP algorithm (i.e., Algorithm A2) to the rows of $\hat V$ assuming there are $K$ column communities to obtain $\hat{\mathcal{I}}_c$, the index set returned by the SP algorithm.
3: Similarly, apply the SVM-cone algorithm (i.e., Algorithm A3) to the rows of $\hat U_*$ with $K$ row communities to obtain $\hat{\mathcal{I}}_r$, the index set returned by the SVM-cone algorithm.
4: Set $\hat J_*=\mathrm{diag}(\hat U_*(\hat{\mathcal{I}}_r,:)\hat\Lambda\hat V'(\hat{\mathcal{I}}_c,:))$, $\hat Y_*=\hat U(\hat U_*(\hat{\mathcal{I}}_r,:))^{-1}$, $\hat Z_r=\hat Y_*\hat J_*$, and $\hat Z_c=\hat V(\hat V(\hat{\mathcal{I}}_c,:))^{-1}$. Then, set $\hat Z_r=\max(0,\hat Z_r)$ and $\hat Z_c=\max(0,\hat Z_c)$.
5: Estimate $\Pi_r(i,:)$ by $\hat\Pi_r(i,:)=\hat Z_r(i,:)/\|\hat Z_r(i,:)\|_1$, $1\leq i\leq n_r$, and estimate $\Pi_c(j,:)$ by $\hat\Pi_c(j,:)=\hat Z_c(j,:)/\|\hat Z_c(j,:)\|_1$, $1\leq j\leq n_c$.
In the fourth step, we set the negative entries of $\hat Z_r$ to 0 by setting $\hat Z_r=\max(0,\hat Z_r)$ because the weights of any row node should be nonnegative, while there may exist some negative entries in $\hat Y_*\hat J_*$. A similar argument holds for $\hat Z_c$. The flowchart of DiMSC is displayed in Figure 3. Meanwhile, in community detection, researchers often use the top-$K$-dimensional SVD of $A$ or of its variants, such as the Laplacian matrix or the regularized Laplacian matrix, to design spectral clustering algorithms that fit identifiable statistical models, such as the spectral methods designed or studied in [5,7,8,9,11,24,26,27,29,31,33,35,56]. Furthermore, as discussed in [57], the SVS+ and SVS* algorithms may be used as substitutes for the SP algorithm in our DiMSC for a better estimation of $\Pi_r$. When applying the entry-wise normalization idea developed in [8] to deal with $U$, as analyzed in [24], we obtain a simplex structure, and we can use the SP algorithm (or the combinatorial vertex search and sketched vertex search approaches developed in [24]) to hunt for the corners. The above ideas suggest that we can design different spectral algorithms to fit our model DiDCMM; we leave them for our future work. In particular, in this paper, we apply the SVM-cone algorithm to hunt for the corners of the cone structure inherent in $U_*$ mainly for theoretical convenience, because ref. [26] has developed a nice theoretical framework on the performance of the SVM-cone algorithm.
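To make the five steps of Algorithm 1 concrete, the following sketch outlines DiMSC in NumPy/SciPy. It reuses the `successive_projection` routine sketched in Section 3.1 and, purely for brevity, applies it to the rows of $\hat U_*$ as a stand-in for the SVM-cone corner-hunting step (Algorithm A3), which is the method actually analyzed in the paper; all function and variable names are ours.

```python
import numpy as np
from scipy.sparse.linalg import svds

def dimsc(A, K):
    """Sketch of DiMSC: estimate row/column membership matrices from A."""
    # Step 1: top-K SVD of A (svds does not sort the singular values)
    U, s, Vt = svds(np.asarray(A, dtype=float), k=K)
    order = np.argsort(-s)
    U, Lam, V = U[:, order], np.diag(s[order]), Vt[order].T

    # row-wise normalization of U (removes the row degree heterogeneity)
    U_star = U / np.linalg.norm(U, axis=1, keepdims=True)

    # Steps 2-3: corner hunting (SP on V; SP on U_star as a stand-in for SVM-cone)
    idx_c = successive_projection(V, K)
    idx_r = successive_projection(U_star, K)

    # Step 4: recover the (scaled) membership matrices and clip negatives
    J_star = np.diag(np.diag(U_star[idx_r] @ Lam @ V[idx_c].T))
    Z_r = U @ np.linalg.inv(U_star[idx_r]) @ J_star
    Z_c = V @ np.linalg.inv(V[idx_c])
    Z_r, Z_c = np.maximum(Z_r, 0), np.maximum(Z_c, 0)

    # Step 5: row-normalize (small floor avoids division by zero for all-zero rows)
    Pi_r_hat = Z_r / np.maximum(Z_r.sum(axis=1, keepdims=True), 1e-12)
    Pi_c_hat = Z_c / np.maximum(Z_c.sum(axis=1, keepdims=True), 1e-12)
    return Pi_r_hat, Pi_c_hat
```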

3.3. Computational Complexity

The computing cost of DiMSC mainly comes from the SVD, SP, and SVM-cone steps. The computational complexity of the full SVD is $O(\max(n_r,n_c)\min(n_r^2,n_c^2))$. Since the adjacency matrix $A$ of real-world network data sets is usually sparse, using the power method discussed in [58], the computational complexity of obtaining the top-$K$-dimensional SVD of $A$ is only slightly larger than $O(\max(n_r^2,n_c^2)K)$ [8,24]. The SP algorithm step in DiMSC has a complexity of $O(\max(n_r,n_c)K^2)$ [24]. The complexity of the one-class SVM step of the SVM-cone algorithm is $O(\max(n_r,n_c)K^2)$ [26,59]. The complexity of the K-means step of the SVM-cone algorithm is $O(\max(n_r,n_c)K^2)$ [60]. Since the number of communities $K$ considered in this paper is much smaller than the network size, the total complexity of DiMSC is $O(\max(n_r^2,n_c^2)K)$. Results in Section 5 show that, for a computer-generated network with 15,000 nodes under SBM, DiMSC takes hundreds of seconds on a standard personal computer (Thinkpad X1 Carbon Gen 8) using MATLAB R2021b. Meanwhile, many spectral methods developed under the models SBM, DCSBM, MMSB, ScBM, DCScBM, OCCAM, DCMM, and DiMMSB for community detection also have complexity $O(\max(n_r^2,n_c^2)K)$; see the spectral algorithms designed or studied in [5,7,8,9,11,24,26,27,29,30,31,33,35,61,62]. Researchers design spectral algorithms for community detection under various identifiable statistical models mainly for their convenience in building a theoretical guarantee of consistent estimation, and we also provide a theoretical guarantee on DiMSC's estimation consistency in the next section.

4. Consistency Results

In this section, we show the consistency of our algorithm for fitting DiDCMM by proving that the sample-based estimates $\hat\Pi_r$ and $\hat\Pi_c$ concentrate around the true mixed membership matrices $\Pi_r$ and $\Pi_c$. Throughout this paper, $K$ is a known positive integer. Set $\theta_{r,\max}=\max_{1\leq i\leq n_r}\theta_r(i)$ and $\theta_{r,\min}=\min_{1\leq i\leq n_r}\theta_r(i)$. Assume that
Assumption 1. 
$P_{\max}\max(\|\theta_r\|_1,\ \theta_{r,\max}n_c)\geq\log(n_r+n_c).$
Assumption 1 means that the network cannot be too sparse, and it also means that we allow $\theta_{r,\max}$ to go to zero as the numbers of row nodes and column nodes increase. When building theoretical guarantees on consistent estimation, controlling network sparsity is popular in the community detection area. For example, Condition (2.9) of [8], Theorem 3.1 of [9], Condition (2.13) of [24], Assumption 3.1 of [27], and Assumption 2 of [31] all control network sparsity for their theoretical analysis. In particular, when DiDCMM reduces to SBM by letting $\Theta_r=\rho I$, $n=n_r=n_c$, $\Pi_r=\Pi_c$, and all nodes be pure for $\rho>0$, Assumption 1 requires that $\rho n\geq\log(n)$, which is consistent with the sparsity requirement in [8,9,24,31]. As analyzed in [55], our requirement on network sparsity is optimal since it matches the sharp threshold for obtaining a connected Erdös–Rényi (ER) random graph [63] when SBM reduces to an ER random graph by letting $K=1$.
For notational convenience, set $\varpi=\max(\|\hat U\hat U'-UU'\|_{2\to\infty},\ \|\hat V\hat V'-VV'\|_{2\to\infty})$, $\hat f_r=\max_{1\leq i\leq n_r}\|e_i'(\hat\Pi_r-\Pi_r\mathcal{P}_r)\|_1$, $\hat f_c=\max_{1\leq j\leq n_c}\|e_j'(\hat\Pi_c-\Pi_c\mathcal{P}_c)\|_1$, and $\pi_{r,\min}=\min_{1\leq k\leq K}\mathbf{1}'\Pi_re_k$, where $\varpi$ is the row-wise singular vector deviation, which can be bounded by Theorem 4.4 of [64], $\hat f_r$ and $\hat f_c$ (in which $\mathcal{P}_r$ and $\mathcal{P}_c$ are $K\times K$ permutation matrices) measure the per-node clustering error of DiMSC, and $\pi_{r,\min}$ measures the minimum total weight of row nodes belonging to a certain row community. Increasing $\pi_{r,\min}$ makes the network tend to be more balanced and vice versa. Meanwhile, the row-wise singular vector deviation is important when building a theoretical guarantee for spectral methods fitting models of networks with mixed memberships; for example, refs. [24,26,27,35] also consider $\varpi$ when establishing consistent estimation for their spectral methods.
The next theorem gives theoretical bounds on estimations of memberships for both row and column nodes, which is the main theoretical result for our DiMSC method.
Theorem 2. 
Under $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, let $\hat\Pi_r$ and $\hat\Pi_c$ be obtained from Algorithm 1. When Assumption 1 holds and $\sigma_K(\Omega)\geq C\sqrt{\theta_{r,\max}P_{\max}(n_r+n_c)\log(n_r+n_c)}$, with probability at least $1-o((n_r+n_c)^{-3})$, we have
$\hat f_r=O\Big(\frac{K^{5.5}\theta_{r,\max}^{15}\varpi\,\kappa^{4.5}(\Pi_r'\Pi_r)\,\kappa(\Pi_c)\,\lambda_1^{1.5}(\Pi_r'\Pi_r)}{\theta_{r,\min}^{15}\,\pi_{r,\min}}\Big),\qquad \hat f_c=O\big(\varpi K\,\kappa(\Pi_c'\Pi_c)\sqrt{\lambda_1(\Pi_c'\Pi_c)}\big).$
In Theorem 2, the condition $\sigma_K(\Omega)\geq C(\theta_{r,\max}P_{\max}(n_r+n_c)\log(n_r+n_c))^{1/2}$ is needed when applying Theorem 4.4 of [64] to obtain a theoretical upper bound on $\varpi$. When building a theoretical guarantee on estimation consistency for spectral methods fitting models of networks with mixed memberships, a lower bound requirement on $\sigma_K(\Omega)$ is necessary; see [24,26,27,35]. Actually, this requirement matches the consistency requirement on $\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$ obtained from the theoretical upper bound of the error rates for a balanced network; see Remark 4 for details. Meanwhile, similar to [7,11,30], we can also design a spectral algorithm that fits DiDCMM via an application of the regularized Laplacian matrix.
The following corollary is obtained by adding conditions on model parameters similar to Corollary 3.1 in [27], where these conditions give a directed network in which each community has the same order of size, and each node has the same order of degree, i.e., a balanced network.
Corollary 1. 
Under $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, when the conditions of Theorem 2 hold, suppose $\lambda_K(\Pi_r'\Pi_r)=O(\frac{n_r}{K})$, $\lambda_K(\Pi_c'\Pi_c)=O(\frac{n_c}{K})$, $\pi_{r,\min}=O(\frac{n_r}{K})$, and $K=O(1)$; then with probability at least $1-o((n_r+n_c)^{-3})$, we have
$\hat f_r=O\Big(\big(\tfrac{\theta_{r,\max}}{\theta_{r,\min}}\big)^{15.5}\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n_r+n_c)}{\theta_{r,\min}n_c}}\Big),\qquad \hat f_c=O\Big(\big(\tfrac{\theta_{r,\max}}{\theta_{r,\min}}\big)^{0.5}\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n_r+n_c)}{\theta_{r,\min}n_r}}\Big).$
Meanwhile,
  • when $\theta_{r,\max}=O(\rho)$ and $\theta_{r,\min}=O(\rho)$ (i.e., $\frac{\theta_{r,\min}}{\theta_{r,\max}}=O(1)$), we have
    $\hat f_r=O\Big(\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n_r+n_c)}{\rho n_c}}\Big),\qquad \hat f_c=O\Big(\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n_r+n_c)}{\rho n_r}}\Big);$
  • when $n_r=O(n)$, $n_c=O(n)$, $\theta_{r,\max}=O(\rho)$, and $\theta_{r,\min}=O(\rho)$, we have
    $\hat f_r=O\Big(\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n)}{\rho n}}\Big),\qquad \hat f_c=O\Big(\frac{1}{\sigma_K(P)}\sqrt{\frac{P_{\max}\log(n)}{\rho n}}\Big).$
Consider a directed mixed membership network under the settings of Corollary 1 with $\theta_{r,\max}=O(\rho)$ and $\theta_{r,\min}=O(\rho)$ for $\rho>0$. To obtain consistent estimations for both row nodes and column nodes, by Corollary 1, $\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$ should shrink more slowly than $\sqrt{\frac{\log(n_r+n_c)}{\rho\min(n_r,n_c)}}$, where consistent estimation means that the theoretical upper bound of the error rate goes to zero as the network size increases. In particular, when $n_r=O(n)$ and $n_c=O(n)$, $\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$ should shrink more slowly than $\sqrt{\frac{\log(n)}{\rho n}}$. We further assume that $P=(2-\beta)I_K+(\beta-1)\mathbf{1}\mathbf{1}'$ for $\beta\in[1,2)\cup(2,\infty)$ and let $\tilde P=\rho P$ (note that for this $P$, we have $\sigma_K(P)=|\beta-2|$ and $P_{\max}=\max(1,\beta-1)$). So the diagonal elements of $\tilde P$ are $\rho$, and the non-diagonal elements are $\rho(\beta-1)$. Setting $p_{\mathrm{in}}$ as the diagonal entries of $\tilde P$ and $p_{\mathrm{out}}$ as the non-diagonal entries of $\tilde P$, we have $p_{\mathrm{in}}=\rho$, $p_{\mathrm{out}}=\rho(\beta-1)$, and $\frac{|p_{\mathrm{in}}-p_{\mathrm{out}}|}{\sqrt{\max(p_{\mathrm{in}},p_{\mathrm{out}})}}=\frac{\rho|\beta-2|}{\sqrt{\rho\max(1,\beta-1)}}=\sqrt{\rho}\,\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$. Hence, for consistent estimation, we see that $\frac{|p_{\mathrm{in}}-p_{\mathrm{out}}|}{\sqrt{\max(p_{\mathrm{in}},p_{\mathrm{out}})}}$ should shrink more slowly than $\sqrt{\frac{\log(n_r+n_c)}{\min(n_r,n_c)}}$ by Corollary 1 and more slowly than $\sqrt{\frac{\log(n)}{n}}$ when $n_r=O(n)$ and $n_c=O(n)$, where this result is consistent with the classical separation condition for a standard network with two equal-sized clusters obtained by applying the separation condition and sharp threshold criterion developed in [55].
Remark 3. 
When the network is undirected (i.e., $n_r=n_c=n$, $\Pi_r=\Pi_c$) with $K=O(1)$, by setting $\theta_r(i)=\rho$ for $1\leq i\leq n_r$, DiDCMM degenerates to the MMSB considered in [27], and the upper bound of the error rate for DiMSC is $O\big(\frac{1}{\sigma_K(P)}\sqrt{\frac{\log(n)}{\rho n}}\big)$ when $P_{\max}=1$. Replacing the $\Theta$ in [24] by $\Theta=\rho I$, their DCMM model degenerates to MMSB. Then, their conditions in Theorem 2.2 are our Assumption 1 and $\lambda_K(\Pi'\Pi)=O(\frac{n}{K})$, where $\Pi=\Pi_r=\Pi_c$ for MMSB. When $K=O(1)$, the error bound in Theorem 2.2 of [24] is $O\big(\frac{1}{\sigma_K(P)}\sqrt{\frac{\log(n)}{\rho n}}\big)$, which is consistent with ours.
Remark 4. 
By Lemma A5 in Appendix D, we know $\sigma_K(\Omega)\geq\theta_{r,\min}\sigma_K(P)\sigma_K(\Pi_r)\sigma_K(\Pi_c)$. To ensure that the condition $\sigma_K(\Omega)\geq C(\theta_{r,\max}P_{\max}(n_r+n_c)\log(n_r+n_c))^{1/2}$ in Theorem 2 holds, we need
$\frac{\sigma_K(P)}{\sqrt{P_{\max}}}\geq C\Big(\frac{\theta_{r,\max}(n_r+n_c)\log(n_r+n_c)}{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)\lambda_K(\Pi_c'\Pi_c)}\Big)^{1/2}.$   (13)
When $K=O(1)$, $n_r=O(n)$, $n_c=O(n)$, $\lambda_K(\Pi_r'\Pi_r)=O(\frac{n_r}{K})$, $\lambda_K(\Pi_c'\Pi_c)=O(\frac{n_c}{K})$, $\theta_{r,\max}=O(\rho)$, and $\theta_{r,\min}=O(\rho)$, Equation (13) gives that $\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$ should shrink more slowly than $\sqrt{\frac{\log(n)}{\rho n}}$, which matches the consistency requirement on $\frac{\sigma_K(P)}{\sqrt{P_{\max}}}$ of Corollary 1.
For convenience, we need the following definition.
Definition 2. 
Let $\mathrm{DiDCMM}(n,K,\Pi_r,\Pi_c,\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})$ be a special case of $\mathrm{DiDCMM}_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$ when $\Theta_r=\rho I$, $n_r=n_c=n$, $\lambda_K(\Pi_r'\Pi_r)=O(n/K)$, $\lambda_K(\Pi_c'\Pi_c)=O(n/K)$, $\pi_{r,\min}=O(n/K)$, $K=O(1)$, and $\tilde P=\rho P$ has diagonal entries $p_{\mathrm{in}}=\alpha_{\mathrm{in}}\frac{\log(n)}{n}$ and non-diagonal entries $p_{\mathrm{out}}=\alpha_{\mathrm{out}}\frac{\log(n)}{n}$.
$\mathrm{DiDCMM}(n,K,\Pi_r,\Pi_c,\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})$ denotes a special directed network in which row communities have nearly equal sizes, since $\lambda_K(\Pi_r'\Pi_r)=O(n/K)$, and column communities also have nearly equal sizes. By Corollary 1, for consistent estimation, we need $\frac{|p_{\mathrm{in}}-p_{\mathrm{out}}|}{\sqrt{\max(p_{\mathrm{in}},p_{\mathrm{out}})}}\gg\sqrt{\frac{\log(n)}{n}}$ under $\mathrm{DiDCMM}(n,K,\Pi_r,\Pi_c,\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})$. Since $\frac{|p_{\mathrm{in}}-p_{\mathrm{out}}|}{\sqrt{\max(p_{\mathrm{in}},p_{\mathrm{out}})}}=\frac{|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|}{\sqrt{\max(\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})}}\sqrt{\frac{\log(n)}{n}}$, for consistent estimation, we need
$\frac{|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|}{\sqrt{\max(\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})}}\gg 1.$   (14)
Our numerical results in Section 5 support that DiMSC can estimate memberships for both row and column nodes when the threshold $\frac{|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|}{\sqrt{\max(\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})}}\gg 1$ in Equation (14) holds under $\mathrm{DiDCMM}(n,K,\Pi_r,\Pi_c,\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})$.
Remark 5. 
When $K=2$, the network is undirected (i.e., $\Pi_r=\Pi_c$), all nodes are pure, and each community has an equal size, $\mathrm{DiDCMM}(n,K,\Pi_r,\Pi_c,\alpha_{\mathrm{in}},\alpha_{\mathrm{out}})$ reduces to the SBM case such that nodes connect with probability $p_{\mathrm{in}}$ within clusters and $p_{\mathrm{out}}$ across clusters. This case has been well studied in recent years; see [50] and references therein. In particular, for this case, ref. [50] finds that exact recovery is possible if $|\sqrt{\alpha_{\mathrm{in}}}-\sqrt{\alpha_{\mathrm{out}}}|>\sqrt{2}$ and impossible if $|\sqrt{\alpha_{\mathrm{in}}}-\sqrt{\alpha_{\mathrm{out}}}|<\sqrt{2}$. For convenience, we use $\mathrm{SBM}(n,p_{\mathrm{in}},p_{\mathrm{out}})$ to denote this case. Our numerical results in Section 5 show that DiMSC returns consistent estimation under $\mathrm{SBM}(n,p_{\mathrm{in}},p_{\mathrm{out}})$ when $\alpha_{\mathrm{in}}$ and $\alpha_{\mathrm{out}}$ are set in the impossible region of exact recovery but satisfy Equation (14).
Remark 6. 
In information theory, Shannon entropy [65] quantifies the amount of information in a variable and is a measure of the uncertainty of a probability distribution. We use a node membership entropy (NME) derived from Shannon theory to measure a node's uncertainty over all communities [66,67]. For row node $i$ with membership $\Pi_r(i,:)$, since $\sum_{k=1}^{K}\Pi_r(i,k)=1$ and $\Pi_r(i,k)$ can be seen as the probability that row node $i$ belongs to row cluster $k$ for $1\leq k\leq K$, the NME of row node $i$ is the Shannon entropy associated with $\Pi_r(i,:)$:
$\mathrm{NME}(i)=-\sum_{k=1}^{K}\Pi_r(i,k)\log(\Pi_r(i,k)).$   (15)
For column node $j$ with membership $\Pi_c(j,:)$, we can also obtain its NME by Equation (15). In particular, if a node belongs to each cluster with equal probability $\frac{1}{K}$, its NME is $\log(K)$, which is the maximum among all NMEs; if a node belongs to two clusters with equal probability $\frac{1}{2}$, its NME is $\log(2)$, which is less than $\log(K)$ when $K\geq 3$. Generally, we see that recovering memberships for mixed nodes is harder than for pure nodes, since NME is 0 for pure nodes while NME is larger than 0 for mixed nodes by the definition of NME.
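Computing NME from a membership matrix is straightforward; the short sketch below (our code) evaluates Equation (15) row by row and clips the probabilities away from zero so that pure nodes do not produce log(0).

```python
import numpy as np

def node_membership_entropy(Pi):
    """Shannon entropy of each node's membership PMF (Equation (15))."""
    Pi = np.clip(Pi, 1e-12, 1.0)              # avoid log(0) for pure nodes
    return -(Pi * np.log(Pi)).sum(axis=1)

# a pure node has NME close to 0; a node split evenly over K clusters has NME log(K)
```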

5. Simulations

In this section, several experiments are conducted to investigate the performance of our DiMSC under DiDCMM. We compare our DiMSC with three model-based methods that can be thought of as special cases of our model DiDCMM. Model-based methods we compare include the DISIM algorithm proposed in [30], the DSCORE algorithm studied in [31], and the DiPCA algorithm which is obtained by using the adjacency matrix A to replace the regularized graph Laplacian matrix in the DISIM algorithm. Similar to [24,27], for simulations, we measure the errors for the inferred community membership matrices instead of simply each node. We measure the performance of DiMSC and its competitors by the mixed Hamming error rate (MHamm for short) defined below
$\mathrm{MHamm}=\max\Big(\min_{\mathcal{P}\in S_P}\frac{\|\hat\Pi_r\mathcal{P}-\Pi_r\|_1}{n_r},\ \min_{\mathcal{P}\in S_P}\frac{\|\hat\Pi_c\mathcal{P}-\Pi_c\|_1}{n_c}\Big),$
where S P is the set of K × K permutation matrices.
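For small $K$, the minimum over permutation matrices in the definition of MHamm can be computed by brute force. The sketch below (our code, treating $\|\cdot\|_1$ as the entry-wise $l_1$ norm of the difference matrix) does exactly that; the function names are ours.

```python
import numpy as np
from itertools import permutations

def mixed_hamming(Pi_hat, Pi):
    """min over column permutations of ||Pi_hat P - Pi||_1 / n (entry-wise l_1 norm)."""
    n, K = Pi.shape
    best = np.inf
    for perm in permutations(range(K)):
        err = np.abs(Pi_hat[:, list(perm)] - Pi).sum() / n
        best = min(best, err)
    return best

def mhamm(Pi_r_hat, Pi_r, Pi_c_hat, Pi_c):
    # MHamm: the worse of the row-node and column-node errors
    return max(mixed_hamming(Pi_r_hat, Pi_r), mixed_hamming(Pi_c_hat, Pi_c))
```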
For all simulations in this section, unless specified, we set the parameters $(n_r,n_c,K,P,\Pi_r,\Pi_c,\Theta_r)$ under DiDCMM as follows: let each row community and each column community have $n_0$ pure nodes; let all mixed row nodes (and mixed column nodes) have membership $(1/K,1/K,\ldots,1/K)$; for $z\geq 1$, we generate the degree parameters for row nodes as follows: let $\bar\theta_r\in\mathbb{R}^{n_r\times 1}$ be such that $1/\bar\theta_r(i)\overset{\mathrm{iid}}{\sim}U(1,z)$ for $1\leq i\leq n_r$, where $U(1,z)$ denotes the uniform distribution on $[1,z]$, and set $\theta_r=\rho\bar\theta_r$, where we use $\rho$ to control the sparsity of the network; when $K=2$, $P$ is set as
$P_1=\begin{pmatrix}1 & 0.1\\ 0.2 & 1\end{pmatrix}\quad\text{or}\quad P_2=\begin{pmatrix}0.8 & 0.1\\ 0.2 & 0.9\end{pmatrix};$
when K = 3 ,
$P_3=\begin{pmatrix}1 & 0.1 & 0.3\\ 0.2 & 1 & 0.4\\ 0.5 & 0.2 & 1\end{pmatrix}\quad\text{or}\quad P_4=\begin{pmatrix}0.8 & 0.1 & 0.3\\ 0.2 & 0.9 & 0.4\\ 0.5 & 0.2 & 1\end{pmatrix},$
where P 2 and P 4 have non-unit diagonals, and we consider the two cases because we want to investigate DiMSC’s sensitivity when P has non-unit diagonals such that P disobeys Condition (I1).
After obtaining P , Π r , Π c , θ r , similar to the five simulation steps in [8], each simulation experiment contains the following steps:
(a) Let $\Theta_r$ be the $n_r\times n_r$ diagonal matrix such that $\Theta_r(i,i)=\theta_r(i)$, $1\leq i\leq n_r$. Set $\Omega=\Theta_r\Pi_r P\Pi_c'$.
(b) Let $W$ be an $n_r\times n_c$ matrix such that the $W(i,j)$ are independent centered Bernoulli variables with parameters $\Omega(i,j)$. Let $\tilde A=\Omega+W$.
(c) Set $\tilde S_r=\{i:\sum_{j=1}^{n_c}\tilde A(i,j)=0\}$ and $\tilde S_c=\{j:\sum_{i=1}^{n_r}\tilde A(i,j)=0\}$, i.e., $\tilde S_r$ ($\tilde S_c$) is the set of row (column) nodes with 0 edges. Let $A$ be the adjacency matrix obtained by removing the rows corresponding to nodes in $\tilde S_r$ and the columns corresponding to nodes in $\tilde S_c$ from $\tilde A$. Similarly, update $\Pi_r$ by removing nodes in $\tilde S_r$ and update $\Pi_c$ by removing nodes in $\tilde S_c$.
(d) Apply the DiMSC algorithm (and its competitors) to A. Record MHamm under investigations.
(e) Repeat (b)–(d) 50 times, and report the averaged MHamm over the 50 repetitions.
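Putting steps (a)-(e) together, one Monte Carlo repetition can be organized roughly as follows (our own sketch, reusing the `dimsc` and `mhamm` helpers sketched earlier; the parameter values fed into this function are placeholders).

```python
import numpy as np

def one_repetition(Pi_r, Pi_c, P, theta_r, rng):
    # (a) population adjacency matrix
    Omega = theta_r[:, None] * (Pi_r @ P @ Pi_c.T)
    # (b) sample the adjacency matrix entrywise
    A = rng.binomial(1, Omega)
    # (c) drop zero-degree rows/columns and the corresponding memberships
    keep_r = A.sum(axis=1) > 0
    keep_c = A.sum(axis=0) > 0
    A, Pi_r_kept, Pi_c_kept = A[keep_r][:, keep_c], Pi_r[keep_r], Pi_c[keep_c]
    # (d) fit DiMSC and record the mixed Hamming error rate
    Pi_r_hat, Pi_c_hat = dimsc(A, P.shape[0])
    return mhamm(Pi_r_hat, Pi_r_kept, Pi_c_hat, Pi_c_kept)

# (e): average the returned MHamm over 50 independent repetitions
```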
Let n r , A be the number of rows of A and n c , A be the number of columns of A. In our experiments, n r , A and n c , A are usually very close to n r and n c ; therefore we do not report the exact values of n r , A and n c , A . After providing the above steps about how to generate A numerically under DiDCMM and how to record the error rates, now we describe our experiments in detail. We consider six experiments here. In experiments 1–6, we study the influence of the fraction of pure nodes, degree heterogeneity, connectivity across communities, sparsity, phase transition, and network size on performances of these methods, respectively.
Experiment 1 (a): Fraction of pure nodes. Set n r = 200 , n c = 300 , z = 5 , ρ = 1 and P as P 1 . Let n 0 range in { 10 , 20 , 30 , , 100 } . The numerical results are shown in Panel (a) of Figure 4. The results show that as the fraction of pure nodes increases for both row and column communities, all approaches perform better. Meanwhile, DiMSC performs best among all methods in Experiment 1 (a).
Experiment 1 (b): Fraction of pure nodes. All parameters are set the same as Experiment 1 (a) except that we set P as P 2 here. The numerical results are shown in Panel (b) of Figure 4. The results show that all methods perform better as n 0 increases, DiMSC outperforms its competitors, and DiMSC enjoys satisfactory performance even when P has non-unit diagonals.
Experiment 1 (c): Fraction of pure nodes. Set n r = 600 , n c = 900 , z = 5 , ρ = 1 , and P as P 3 . Let n 0 range in { 20 , 40 , 60 , , 200 } . The numerical results are shown in Panel (c) of Figure 4, and we see that all methods perform better when there are more pure nodes and our DiMSC performs best.
Experiment 1 (d): Fraction of pure nodes. All parameters are set the same as Experiment 1 (c) except that we set P as P 4 here. The numerical results are shown in Panel (d) of Figure 4, and the analysis is similar to that of Experiment 1 (b).
Experiment 2 (a): Degree heterogeneity. Set n_r = 200, n_c = 300, n_0 = 80, ρ = 1, and P as P_1. Let z range in { 2, 3, 4, …, 12 }. A larger z generates fewer edges. The results are displayed in Panel (a) of Figure 5. They suggest that the error rates of DiMSC for both row and column nodes tend to increase as z increases. This happens because decreasing the degree heterogeneities of row nodes lowers the number of edges in the directed network, so the community structure becomes harder to detect for both row and column nodes. Meanwhile, DiMSC outperforms its competitors in this experiment, and it is interesting to see that the error rates of DI-SIM, DiPCA, and DSCORE are almost the same for this experiment.
Experiment 2 (b): Degree heterogeneity. All parameters are set the same as Experiment 2 (a) except that we set P as P 2 here. The results are displayed in Panel (b) of Figure 5, and we see that DiMSC performs satisfactorily when the directed network is not too sparse (i.e., a small z case) even when P has non-unit diagonals. Meanwhile, DiMSC significantly outperforms its competitors in this experiment.
Experiment 2 (c): Degree heterogeneity. Set n r = 600 , n c = 900 , n 0 = 150 , ρ = 1 , and P as P 3 . Let z range in { 2 , 3 , 4 , , 12 } . The results are shown in Panel (c) of Figure 5 and can be analyzed similarly to Experiment 2 (a).
Experiment 2 (d): Degree heterogeneity. All parameters are set the same as Experiment 2 (c) except that we set P as P 4 here. The results are displayed in Panel (d) of Figure 5 and are similar to that of Experiment 2 (b).
Experiment 2 (e): Degree heterogeneity. All parameters are set the same as Experiment 2 (a) except that we set n_0 = 0 (so there are no pure nodes in either the row or the column communities), all mixed row nodes have one of two memberships, (0.9, 0.1) and (0.1, 0.9), each with n_r/K = 100 row nodes, and all mixed column nodes also have these two memberships, each with n_c/K = 150 column nodes. Panel (e) of Figure 5 shows the results, and we see that DiMSC performs satisfactorily for a small z even when there are no pure nodes in the row and column communities. Meanwhile, DiMSC performs better than its competitors when z < 7 and worse than its competitors when z ≥ 8 in this experiment. Furthermore, compared with the numerical results of Experiment 2 (a), we see that DI-SIM, DiPCA, and DSCORE have better performances in Experiment 2 (e). A possible reason is that the memberships (0.9, 0.1) and (0.1, 0.9) are somewhat close to the pure memberships (1, 0) and (0, 1).
Experiment 2 (f): Degree heterogeneity. All parameters are set the same as Experiment 2 (b) except that we set Π r and Π c the same as Experiment 2 (e). The results are shown in Panel (f) of Figure 5 and are similar to that of Experiment 2 (e).
Experiment 2 (g): Degree heterogeneity. All parameters are set the same as Experiment 2 (c) except that we set n_0 = 0, all mixed row nodes have one of three memberships, (0.8, 0.1, 0.1), (0.1, 0.8, 0.1), and (0.1, 0.1, 0.8), each with n_r/K = 200 row nodes, and all mixed column nodes also have these three memberships, each with n_c/K = 300 column nodes. The results are displayed in Panel (g) of Figure 5 and are similar to those of Experiment 2 (e).
Experiment 2 (h): Degree heterogeneity. All parameters are set the same as Experiment 2 (d) except that we set Π r and Π c the same as Experiment 2 (g). The results are shown in Panel (h) of Figure 5 and are similar to that of Experiment 2 (e).
Experiment 3 (a): Connectivity across communities. Set n r = 200 , n c = 300 , n 0 = 80 , z = 5 , ρ = 1 . Set
$$P = \begin{pmatrix} 1 & \beta-1 \\ \beta-1 & 1 \end{pmatrix},$$
and let β range in { 1, 1.2, 1.4, …, 4 }. Decreasing |β − 2| increases the hardness of detecting such directed networks. Note that P(A(i,j) = 1) = Ω(i,j) = θ_r(i) Π_r(i,:) P (Π_c(j,:))′ gives that max_{i,j} Ω(i,j) ≤ θ_{r,max} P_max, which should be no larger than 1. Since P_max may be larger than one in this experiment, after obtaining θ_r, we update θ_r as θ_r/P_max. The results are displayed in Panel (a) of Figure 6, and they support the arguments given after Corollary 1: DiMSC performs better when |β − 2| increases and vice versa. Meanwhile, our DiMSC outperforms its competitors in this experiment.
Experiment 3 (b): Connectivity across communities. All parameters are set the same as Experiment 3 (a) except that we set
$$P = \begin{pmatrix} 0.8 & \beta-1 \\ \beta-1 & 0.9 \end{pmatrix}.$$
The results are displayed in Panel (b) of Figure 6, and we see that DiMSC performs better when |β − 2| increases even when P has non-unit diagonals. Meanwhile, our DiMSC performs better than its competitors here.
Experiment 3 (c): Connectivity across communities. Set n r = 600 , n c = 900 , n 0 = 150 , z = 5 , ρ = 1 . Set
$$P = \begin{pmatrix} 1 & \beta-1 & \beta-1 \\ \beta-1 & 1 & \beta-1 \\ \beta-1 & \beta-1 & 1 \end{pmatrix},$$
and let β range in { 1 , 1.2 , 1.4 , , 4 } . The results are displayed in Panel (c) of Figure 6 and can be analyzed similarly to Experiment 3 (a).
Experiment 3 (d): Connectivity across communities. All parameters are set the same as Experiment 3(c) except that we set
$$P = \begin{pmatrix} 0.8 & \beta-1 & \beta-1 \\ \beta-1 & 0.9 & \beta-1 \\ \beta-1 & \beta-1 & 1 \end{pmatrix}.$$
The results are displayed in Panel (d) of Figure 6 and can be analyzed similarly to Experiment 3 (b).
Experiment 3 (e): Connectivity across communities. All parameters are set the same as Experiment 3 (a) except that we let Π_r and Π_c be the same as those of Experiment 2 (e) (so there are no pure nodes in either the row or the column communities). Panel (e) of Figure 6 shows the results, and we see that DiMSC enjoys better performance when |β − 2| increases even when there are no pure nodes in the row and column communities. Meanwhile, all methods have competitive performances in this experiment, and the possible reason that DiMSC's competitors enjoy better performances here than in Experiment 3 (a) is analyzed in Experiment 2 (e).
Experiment 3 (f): Connectivity across communities. All parameters are set the same as Experiment 3 (b) except that we set Π r and Π c the same as Experiment 2 (e). The results are displayed in Panel (f) of Figure 6 and can be analyzed similarly to Experiment 3 (e).
Experiment 3 (g): Connectivity across communities. All parameters are set the same as Experiment 3 (c) except that we let Π r and Π c be the same as that of Experiment 2 (g) (so there are no pure nodes). Panel (g) of Figure 6 shows the results, and the analysis is similar to that of Experiment 3 (b).
Experiment 3 (h): Connectivity across communities. All parameters are set the same as Experiment 3 (d) except that we set Π r and Π c the same as Experiment 2 (g). Panel (h) of Figure 6 shows the results, and the analysis is similar to that of Experiment 3 (b).
Experiment 4 (a): Sparsity. Set n r = 200 , n c = 300 , n 0 = 80 , z = 5 , and P as P 1 . Let ρ range in { 0.2 , 0.3 , , 1 } . A larger ρ indicates a denser network. Panel (a) in Figure 7 displays the simulation results of this experiment. We see that DiMSC performs better as the simulated directed network becomes denser, and DiMSC significantly outperforms its competitors in this experiment.
Experiment 4 (b): Sparsity. All parameters are set the same as Experiment 4 (a) except that P is set as P 2 . Panel (b) of Figure 7 shows the results, and the analysis is similar to that of Experiment 2 (b).
Experiment 4 (c): Sparsity. Set n r = 600 , n c = 900 , n 0 = 150 , z = 5 , and P as P 3 . Let ρ range in { 0.2 , 0.3 , , 1 } . Panel (c) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (a).
Experiment 4 (d): Sparsity. All parameters are set the same as Experiment 4 (c) except that P is set as P 4 . Panel (d) of Figure 7 displays the results, and the analysis is similar to that of Experiment 4 (b).
Experiment 4 (e): Sparsity. All parameters are set the same as Experiment 4 (a) except that we let Π r and Π c be the same as that of Experiment 2 (e). Panel (e) of Figure 7 shows the results, and we see that DiMSC’s error rates decrease for a denser directed network even when all nodes are mixed. Meanwhile, all methods enjoy similar performances in this experiment.
Experiment 4 (f): Sparsity. All parameters are set the same as Experiment 4 (b) except that we set Π r and Π c the same as Experiment 2(e). Panel (f) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).
Experiment 4 (g): Sparsity. All parameters are set the same as Experiment 4 (c) except that we let Π r and Π c be the same as that of Experiment 2 (g). Panel (g) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).
Experiment 4 (h): Sparsity. All parameters are set the same as Experiment 4 (d) except that we set Π r and Π c the same as Experiment 2(g). Panel (h) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).
Experiment 5 (a): Phase transition. Under DiDCMM(n, K, Π_r, Π_c, α_in, α_out), set K = 2 and n = n_r = n_c = 300. Let each row community have 100 pure nodes, each column community have 120 pure nodes, and all mixed nodes have membership (1/2, 1/2). Since max(p_in, p_out) = max(α_in, α_out) log(n)/n ≤ 1, α_in and α_out should be set in (0, n/log(n)]. We let α_in and α_out range over { 2.5, 5, 7.5, …, 50 }. Panel (a) of Figure 8 displays the results. We see that DiMSC performs satisfactorily when α_in and α_out satisfy Equation (14), which means that DiMSC achieves the threshold provided in Equation (14) under DiDCMM(n, K, Π_r, Π_c, α_in, α_out).
Experiment 5 (b): Phase transition. Under D i D C M M ( n , K , Π r , Π c , α in , α out ) , set K = 3 , n = n r = n c = 300 . Let each row community have 60 pure nodes, each column community have 80 pure nodes, and all mixed nodes have membership ( 1 / 3 , 1 / 3 , 1 / 3 ) . We also let α in and α out be in the range of { 2.5 , 5 , 7.5 , , 50 } . Panel (b) of Figure 8 displays the results, and the analysis is similar to that of Experiment 5 (a).
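As a rough sketch of how the grid in Experiment 5 can be set up, the code below builds p_in = α_in log(n)/n and p_out = α_out log(n)/n and the corresponding K × K connectivity matrix; placing p_in on the diagonal and p_out off the diagonal is my reading of DiDCMM(n, K, Π_r, Π_c, α_in, α_out) rather than an excerpt from the paper's code.

```python
import numpy as np

def connectivity_matrix(n, K, alpha_in, alpha_out):
    """K x K matrix with p_in on the diagonal and p_out off the diagonal (assumed parameterization)."""
    p_in = alpha_in * np.log(n) / n
    p_out = alpha_out * np.log(n) / n
    assert max(p_in, p_out) <= 1, "alpha_in and alpha_out must lie in (0, n / log(n)]"
    return p_out * np.ones((K, K)) + (p_in - p_out) * np.eye(K)

alpha_grid = np.arange(2.5, 52.5, 2.5)   # the grid {2.5, 5, ..., 50} used above
```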
For Experiments 1–5, we can conclude that DiMSC outperforms its competitors, and this supports our analysis in Remark 6 because DiMSC is designed to estimate mixed memberships, while its competitors are designed for community partition of pure nodes.
Experiment 6: Network size. Under SBM(n, p_in, p_out), let α_in = 2 and α_out = 0.0001. On the one hand, we have √α_in − √α_out = √2 − 0.01 < √2, i.e., (α_in, α_out) lies in the impossible region of exact recovery introduced in [50]. On the other hand, we have (α_in − α_out)/√α_in > 1, i.e., α_in and α_out satisfy Equation (14) for DiMSC's consistent estimation. Let n range in { 1000, 2000, 3000, …, 15000 }. For each n in this experiment, we report the averaged error rate and running time of DiMSC over 10 independent repetitions. The results are shown in Figure 9. From Panel (a) of Figure 9, we see that DiMSC enjoys satisfactory performance with a small error rate in this experiment. Panel (b) of Figure 9 shows that DiMSC processes computer-generated networks of up to 15,000 nodes within hundreds of seconds.
Remark 7. 
For visualization purposes, we provide some examples of different types of directed networks generated under DiDCMM in this remark. Let θ_r(i) = 0.9 + i²/(9 n_r²) for 1 ≤ i ≤ n_r. Let each row community have n_{r,0} pure nodes and each column community have n_{c,0} pure nodes. Let all mixed nodes have membership (1/K, …, 1/K). We set P as
$$P_a = \begin{pmatrix} 0.9 & 0.05 \\ 0.1 & 0.95 \end{pmatrix}\ \text{or}\ P_b = \begin{pmatrix} 0.1 & 0.95 \\ 0.9 & 0.05 \end{pmatrix}\ \text{or}\ P_c = \begin{pmatrix} 12 & 1 \\ 0 & 12 \end{pmatrix}\frac{\log(n_r)}{n_r}\ \text{or}\ P_d = \begin{pmatrix} 0 & 12 \\ 12 & 1 \end{pmatrix}\frac{\log(n_r)}{n_r}\ \text{or}\ P_e = \begin{pmatrix} 12 & 1 & 0 \\ 0 & 12 & 0 \\ 1 & 0 & 12 \end{pmatrix}\frac{\log(n_r)}{n_r}\ \text{or}\ P_f = \begin{pmatrix} 1 & 0 & 12 \\ 12 & 0 & 0 \\ 0 & 12 & 1 \end{pmatrix}\frac{\log(n_r)}{n_r},$$
where K = 2 when P is P_a, P_b, P_c, or P_d, and K = 3 when P is P_e or P_f. By considering these six settings of P, we can generate different types of directed networks under DiDCMM; these types are also considered in Experiments 1–6, and this remark mainly provides visualizations of directed networks with the different structures encoded by the different choices of P. Note that we allow P to have non-unit diagonals here because Condition (I1) is mainly needed for our theoretical analysis, and the results of the previous experiments show that DiMSC performs stably even when P has non-unit diagonals. We consider the following eight settings.
Model Setup 1: Set n_r = 16, n_{r,0} = 6, n_c = 16, n_{c,0} = 7, and P as P_a. For this setup, a directed network with 16 row nodes and 16 column nodes is generated from DiDCMM. Figure 10 shows a directed network N generated under Model Setup 1, where we also report DiMSC's error rate. Figure 10 shows that more directed edges are sent from row nodes 1–6 to column nodes 1–7 than from row nodes 7–12 to column nodes 1–7 under P_a. With the adjacency matrix A and the known memberships Π_r and Π_c for this setup, readers can apply our DiMSC directly to the A given in Panel (a) of Figure 10 to check the effectiveness of DiMSC.
Model Setup 2: All settings are the same as Model Setup 1 except that we let P be P_b. The directed network N and its adjacency matrix are shown in Figure 11. We see that more directed edges are sent from row nodes 1–6 to column nodes 10–16 than from row nodes 7–12 to column nodes 10–16 under P_b, which means that the directed network generated using P_b and the directed network from P_a have different structures.
Model Setup 3: Set n_r = 32, n_{r,0} = 14, n_c = 28, n_{c,0} = 12, and P as P_a. For this setup, a bipartite network with 32 row nodes and 28 column nodes is generated from DiDCMM. Figure 12 shows this bipartite network and its adjacency matrix.
Model Setup 4: All settings are the same as Model Setup 3 except that we let P be P b . Figure 13 displays the results, and we see that the bipartite network from P b also has a different structure compared with the one generated from using P a under DiDCMM.
Model Setup 5: Set n r = 100 , n r , 0 = 48 , n c = 100 , n c , 0 = 45 , and P as P c . Figure 14 shows the row and column communities for a directed network generated from Setup 5 under DiDCMM, where we plot the directed network directly.
Model Setup 6: All settings are the same as Model Setup 5 except that we let P be P d . Figure 15 shows a directed network obtained from this setup, and we see that the structure of the directed network from P d in Figure 15 differs a lot from that of the directed network from P c shown in Figure 14.
Model Setup 7: Set n r = 100 , n r , 0 = 30 , n c = 100 , n c , 0 = 32 , and P as P e . Figure 16 shows a directed network generated from this setup.
Model Setup 8: All settings are the same as Model Setup 7 except that we let P be P f . Figure 17 displays a directed network generated from this setup, and we see that directed networks from P f and P e have different structures by comparing Figure 16 and Figure 17.

6. Application to Real-World Directed Networks

For the empirical directed networks considered here, row nodes are always the same as column nodes, so we let n_r = n_c = n. For Π̂_r, we call node i a highly mixed node if max_{1≤k≤K} Π̂_r(i,k) ≤ 0.8, and similarly for Π̂_c. Highly mixed nodes indicate which nodes have mixed memberships and belong to multiple communities. Let τ_r = |{ i : max_{1≤k≤K} Π̂_r(i,k) ≤ 0.8 }|/n be the proportion of highly mixed nodes among all nodes, which measures the mixability of the row communities; define τ_c similarly. Let ℓ̂_r be a vector such that ℓ̂_r(i) = argmax_{1≤k≤K} Π̂_r(i,k) for 1 ≤ i ≤ n, where ℓ̂_r(i) denotes the home-base row community of node i; define ℓ̂_c similarly. To measure the asymmetric structure of a directed network, we use
$$\mathrm{Hamm}_{rc} = \min_{\mathcal{P}\in S_{\mathcal{P}}}\frac{\|\hat{\Pi}_c\mathcal{P}-\hat{\Pi}_r\|_1}{n},$$
where a large Hamm_rc means that the structure of the row clusters differs a lot from that of the column clusters. For 1 ≤ i ≤ n, let d_r(i) = Σ_{j=1}^{n} A(i,j) be the number of edges sent by node i and d_c(i) = Σ_{j=1}^{n} A(j,i) be the number of edges received by node i, i.e., d_r(i) and d_c(i) are the out-degree and in-degree of node i, respectively. Since many nodes have zero in-degree or out-degree in real-world directed networks, we apply the following pre-processing: for any directed network N and any positive integer m, we let A_m be its adjacency matrix restricted to a subnetwork such that A_m is connected and every node has in-degree and out-degree at least m in A_m (a sketch of this pre-processing, together with τ_r and Hamm_rc, is given below).
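For concreteness, here is a small sketch of the quantities just defined and of the pre-processing that produces A_m; using the entrywise ℓ1 norm in Hamm_rc and this particular removal order in the pre-processing are my assumptions.

```python
import numpy as np
from itertools import permutations

def tau(Pi_hat):
    """Proportion of highly mixed nodes, i.e., nodes with max_k Pi_hat(i, k) <= 0.8."""
    return np.mean(Pi_hat.max(axis=1) <= 0.8)

def hamm_rc(Pi_hat_r, Pi_hat_c):
    """min over K x K permutation matrices of ||Pi_hat_c P - Pi_hat_r||_1 / n (entrywise l1 assumed)."""
    n, K = Pi_hat_r.shape
    return min(np.abs(Pi_hat_c @ np.eye(K)[:, list(perm)] - Pi_hat_r).sum() / n
               for perm in permutations(range(K)))

def preprocess(A, m):
    """Iteratively drop nodes whose in-degree or out-degree falls below m; connectivity is checked separately."""
    keep = np.arange(A.shape[0])
    while True:
        ok = (A.sum(axis=1) >= m) & (A.sum(axis=0) >= m)
        if ok.all():
            break
        A, keep = A[ok][:, ok], keep[ok]
    return A, keep
```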
We apply DiMSC to the following real-world directed networks to discover their mixability, asymmetries, and directional communities.
Political blogs: This is a directed network of hyperlinks between weblogs on US politics [68]. In this data, a node is a blog, and an edge is a hyperlink. The data can be downloaded from http://www-personal.umich.edu/~mejn/netdata/ (accessed on 28 August 2022). It is well-known that there are two parties, "liberal" and "conservative", so K = 2 for this data. There are 1490 nodes in the original data. After pre-processing, A_1 ∈ {0,1}^{813×813}, A_3 ∈ {0,1}^{495×495}, A_6 ∈ {0,1}^{285×285}, A_9 ∈ {0,1}^{158×158}, where we focus on the cases m = 1, 3, 6, 9 for this data. Meanwhile, we use political blogs A_m to denote this network when its adjacency matrix is A_m, in which every node has in-degree and out-degree at least m. Similar notations hold for the other real-world directed networks used in this paper.
Wikipedia links (gan): This directed network consists of the Wikilinks of the Wikipedia in the Gan Chinese language (gan). In this data, a node is an article, and a directed edge is a Wikilink [69]. The data can be downloaded from http://konect.cc/networks/wikipedia_link_gan (accessed on 28 August 2022). There are 9189 nodes in the original data. After pre-processing, A_1 ∈ {0,1}^{6012×6012}, A_30 ∈ {0,1}^{820×820}, A_60 ∈ {0,1}^{559×559}, A_90 ∈ {0,1}^{240×240}, where we study the cases m = 1, 30, 60, 90 for this data. The leading 20 singular values of A_1, A_30, A_60, and A_90 shown in Panels (e)–(h) of Figure 18 suggest K = 2 for these four adjacency matrices, where [30] also uses the eigengap to estimate K.
Wikipedia links (nah): This network consists of the Wikilinks of the Wikipedia in the Nāhuatl language (nah) [69] and can be downloaded from http://konect.cc/networks/wikipedia_link_nah/ (accessed on 28 August 2022). The original data has 10285 nodes. After pre-processing, A_1 ∈ {0,1}^{6924×6924}, A_20 ∈ {0,1}^{1057×1057}, A_30 ∈ {0,1}^{486×486}, A_40 ∈ {0,1}^{136×136}. Panel (i) of Figure 18 suggests K = 4 for A_1, and Panels (j)–(l) of Figure 18 suggest K = 2 for A_20, A_30, and A_40. Note that it only takes around 4 seconds for DiMSC to estimate the memberships of Wikipedia links (nah) A_1.
The proportions of highly mixed nodes and Hamm_rc obtained when applying DiMSC to the above real-world directed networks are reported in Table 1. For the political blogs network, the small τ_r, τ_c, and Hamm_rc indicate that there are only a few highly mixed nodes and that the structure of the row communities is similar to that of the column communities, i.e., there is only slight asymmetry in this data. For Wikipedia links (gan) A_1 and Wikipedia links (nah) A_1, there is a large proportion of highly mixed nodes in both the row and column communities, and the row communities differ a lot from the column communities, suggesting a heavily asymmetric structure between row and column communities for these two data sets. For Wikipedia links (gan) A_30 and A_60 and Wikipedia links (nah) A_20, we see that the proportion of highly mixed nodes for the row (column) communities is small (large), and there is only slight asymmetry for these data. For Wikipedia links (gan) A_90 and Wikipedia links (nah) A_30 and A_40, there are no highly mixed nodes, and the structure of the row clusters is similar to that of the column clusters. For visualization, we plot the row and column communities as well as the highly mixed nodes obtained by applying DiMSC to some of these directed networks in Figure 19 and Figure 20.

7. Discussion and Conclusions

In this paper, we propose a novel directed degree corrected mixed membership (DiDCMM) model. DiDCMM models a directed network with mixed memberships, with degree heterogeneity for row nodes and without degree heterogeneity for column nodes. DiDCMM is identifiable when the two widely used Conditions (I1) and (I2) hold. It should be mentioned that a model allowing mixed memberships and degree heterogeneity for both row and column nodes is unidentifiable unless nontrivial additional conditions are imposed. To fit the model, we propose a provably consistent spectral algorithm called DiMSC to infer the community memberships of both row and column nodes in a directed network generated by DiDCMM. DiMSC is designed based on the SVD of the adjacency matrix, where we apply the SP algorithm to hunt for the corners of the simplex structure and the SVM-cone algorithm to hunt for the corners of the cone structure. The theoretical results show that DiMSC consistently recovers the memberships of both row and column nodes under mild conditions. Meanwhile, when DiDCMM degenerates to MMSB, our theoretical results match those of Theorem 2.2 of [24] when their DCMM degenerates to MMSB under mild conditions. Experiments conducted on synthetic directed networks generated from DiDCMM verify the effectiveness of DiMSC and its stability with respect to Conditions (I1) and (I2). Results for real-world directed networks show that DiMSC reveals highly mixed nodes and asymmetries in the structure of row and column communities. The model DiDCMM and the algorithm DiMSC developed in this paper are useful for discovering asymmetry in a directed network with mixed memberships. DiDCMM can also generate artificial directed networks with mixed memberships as benchmark directed networks for research purposes. We hope that DiDCMM and DiMSC can be widely applied in social network analysis.
The proposed model DiDCMM and the algorithm DiMSC can be extended in many ways. Similar to [24,57], we may obtain an ideal simplex from U using the entry-wise ratio idea proposed in [8]. Meanwhile, DiMSC is designed based on the SVD of the adjacency matrix, and similar to [5,7,11,30], we may design spectral algorithms based on the regularized Laplacian matrix under DiDCMM. Extending DiDCMM from un-weighted directed networks to weighted directed networks by applying the distribution-free idea introduced in [62] is one of our future research directions. The SVD step of DiMSC can be accelerated by the random projection and random sampling ideas introduced in [70] to process large-scale directed networks. Instead of simply using the eigengap to find K, in future work it is worth focusing on estimating the number of communities in a directed network generated under ScBM (and DCScBM) [30] and DiDCMM. Ref. [46] proposes an algorithm to uncover boundary nodes that spread information between communities in undirected social networks. It is an interesting topic to extend the work in [46] to directed networks generated from ScBM, DCScBM, and DiDCMM. We leave these for our future work.

Funding

This research was funded by the Scientific Research Start-up Fund of CUMT, NO. 102520253, and the High-level Personal Project of Jiangsu Province, NO. JSSCBS20211218.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
SBM        Stochastic Blockmodel
DCSBM      Degree Corrected Stochastic Blockmodel
MMSB       Mixed Membership Stochastic Blockmodel
DCMM       Degree Corrected Mixed Membership model
OCCAM      Overlapping Continuous Community Assignment model
ScBM       Stochastic co-Blockmodel
DC-ScBM    Degree Corrected Stochastic co-Blockmodel
DiMMSB     Directed Mixed Membership Stochastic Blockmodel
DiDCMM     Directed Degree Corrected Mixed Membership model
SP         Successive projection algorithm
SVD        Singular value decomposition
DiMSC      Directed Mixed Simplex & Cone algorithm

Appendix A. Proof for Identifiability

Appendix A.1. Proof of Proposition 1

Proof. 
Let Ω = UΛV′ be the compact singular value decomposition of Ω. Lemma 1 gives V = Π_cB_c = Π_cV(I_c,:). Since Ω = Ω̃, V also equals Π̃_cV(I_c,:), which gives Π_c = Π̃_c.
Since Ω(I_r, I_c) = Θ_r(I_r,I_r)Π_r(I_r,:)P(Π_c(I_c,:))′ = Θ_r(I_r,I_r)P = U(I_r,:)ΛV′(I_c,:) by Condition (I2), we have Θ_r(I_r,I_r)P = U(I_r,:)ΛV′(I_c,:), which, by Condition (I1), gives Θ_r(I_r,I_r) = diag(U(I_r,:)ΛV′(I_c,:)). From this step, we see that if P's diagonal entries are not ones, we cannot obtain Θ_r(I_r,I_r) = diag(U(I_r,:)ΛV′(I_c,:)), which would lead to Θ_r(I_r,I_r) ≠ Θ̃_r(I_r,I_r); hence Condition (I1) is necessary. Since Ω = Ω̃, we also have Θ̃_r(I_r,I_r) = diag(U(I_r,:)ΛV′(I_c,:)), which gives Θ_r(I_r,I_r) = Θ̃_r(I_r,I_r). Since Θ̃_r(I_r,I_r)P̃ also equals U(I_r,:)ΛV′(I_c,:), we have P = P̃.
Lemma 1 gives that U = Θ_rΠ_rB_r, where B_r = Θ_r^{-1}(I_r,I_r)U(I_r,:). Since Ω = Ω̃, we also have U = Θ̃_rΠ̃_rB̃_r. Since B̃_r = Θ̃_r^{-1}(I_r,I_r)U(I_r,:) = Θ_r^{-1}(I_r,I_r)U(I_r,:), we have B̃_r = B_r. Since U = Θ_rΠ_rB_r = Θ̃_rΠ̃_rB̃_r = Θ̃_rΠ̃_rB_r, we have Θ_rΠ_r = Θ̃_rΠ̃_r. Since each row of Π_r or Π̃_r is a PMF, we have Θ_r = Θ̃_r and Π_r = Π̃_r, and the claim follows.    □

Appendix B. Ideal Simplex, Ideal Cone

Appendix B.1. Proof of Lemma 1

Proof. 
First, we consider U and V. Since Ω = UΛV′ and V′V = I_K, we have U = ΩVΛ^{-1}. Recalling that Ω = Θ_rΠ_rPΠ_c′, we have U = Θ_rΠ_rPΠ_c′VΛ^{-1} = Θ_rΠ_rB_r, where we set B_r = PΠ_c′VΛ^{-1}, and B_r is unique. Since U(I_r,:) = Θ_r(I_r,I_r)Π_r(I_r,:)B_r = Θ_r(I_r,I_r)B_r, we have B_r = Θ_r^{-1}(I_r,I_r)U(I_r,:).
Similarly, since Ω = UΛV′ and U′U = I_K, we have V′ = Λ^{-1}U′Ω, hence V = Ω′UΛ^{-1}. Recalling that Ω = Θ_rΠ_rPΠ_c′, we have V = (Θ_rΠ_rPΠ_c′)′UΛ^{-1} = Π_cP′Π_r′Θ_rUΛ^{-1} = Π_cB_c, where we set B_c = P′Π_r′Θ_rUΛ^{-1}, and B_c is unique. Since V(I_c,:) = Π_c(I_c,:)B_c = B_c, we have B_c = V(I_c,:). Meanwhile, for 1 ≤ j ≤ n_c, we have V(j,:) = e_j′Π_cB_c = Π_c(j,:)B_c. Hence, V(j,:) = V(j̄,:) as long as Π_c(j,:) = Π_c(j̄,:).
Now, we show the ideal cone structure that appears in U_*. For convenience, set M = Π_rB_r; since U = Θ_rΠ_rB_r, this gives U = Θ_rM. Hence, we have U(i,:) = e_i′U = Θ_r(i,i)M(i,:). Therefore, U_*(i,:) = U(i,:)/‖U(i,:)‖_F = M(i,:)/‖M(i,:)‖_F; combining this with the fact that B_r = Θ_r^{-1}(I_r,I_r)U(I_r,:), we have
$$U_* = \begin{pmatrix}\frac{1}{\|M(1,:)\|_F} & & & \\ & \frac{1}{\|M(2,:)\|_F} & & \\ & & \ddots & \\ & & & \frac{1}{\|M(n_r,:)\|_F}\end{pmatrix}\Pi_r B_r = \begin{pmatrix}\Pi_r(1,:)/\|M(1,:)\|_F\\ \Pi_r(2,:)/\|M(2,:)\|_F\\ \vdots\\ \Pi_r(n_r,:)/\|M(n_r,:)\|_F\end{pmatrix}\Theta_r^{-1}(I_r,I_r)N_U^{-1}(I_r,I_r)\,U_*(I_r,:).$$
Therefore, we have
$$Y = \begin{pmatrix}\Pi_r(1,:)/\|M(1,:)\|_F\\ \Pi_r(2,:)/\|M(2,:)\|_F\\ \vdots\\ \Pi_r(n_r,:)/\|M(n_r,:)\|_F\end{pmatrix}\Theta_r^{-1}(I_r,I_r)N_U^{-1}(I_r,I_r) = N_M\,\Pi_r\,\Theta_r^{-1}(I_r,I_r)\,N_U^{-1}(I_r,I_r),$$
where N M is a diagonal matrix with N M ( i , i ) = 1 M ( i , : ) F for 1 i n r . All entries of Y are nonnegative, and since we assume that each community has at least one pure node, no row of Y is 0.
Then, we prove that U * ( i , : ) = U * ( i ¯ , : ) when Π r ( i , : ) = Π r ( i ¯ , : ) . For 1 i n r , we have
$$U_*(i,:) = e_i'U_* = \frac{1}{\|M(i,:)\|_F}e_i'M = \frac{\Pi_r(i,:)B_r}{\|\Pi_r(i,:)B_r\|_F},$$
and the claim follows immediately.    □

Appendix B.2. Proof of Lemma 2

Proof. 
Since I = U′U = B_r′Π_r′Θ_r²Π_rB_r = U′(I_r,:)Θ_r^{-1}(I_r,I_r)Π_r′Θ_r²Π_rΘ_r^{-1}(I_r,I_r)U(I_r,:) and rank(U(I_r,:)) = K (i.e., the inverse of U(I_r,:) exists), we have (U(I_r,:)U′(I_r,:))^{-1} = Θ_r^{-1}(I_r,I_r)Π_r′Θ_r²Π_rΘ_r^{-1}(I_r,I_r). Since U_*(I_r,:) = N_U(I_r,I_r)U(I_r,:), we have
$$\big(U_*(I_r,:)U_*'(I_r,:)\big)^{-1} = N_U^{-1}(I_r,I_r)\,\Theta_r^{-1}(I_r,I_r)\,\Pi_r'\Theta_r^2\Pi_r\,\Theta_r^{-1}(I_r,I_r)\,N_U^{-1}(I_r,I_r).$$
Since all entries of N_U^{-1}(I_r,I_r), Π_r, and Θ_r are nonnegative and N_U, Θ_r are diagonal matrices, we see that all entries of (U_*(I_r,:)U_*′(I_r,:))^{-1} are nonnegative and its diagonal entries are strictly positive; hence (U_*(I_r,:)U_*′(I_r,:))^{-1}1 > 0.    □

Appendix B.3. Proof of Theorem 1

Proof. 
For column nodes, Remark A1 guarantees that SP algorithm returns I c when the input is V with K column communities, hence ideal DiMSC recovers Π c exactly. For row nodes, Remark A2 guarantees that SVM-cone algorithm returns I r when the input is U * with K row communities, hence ideal DiMSC recovers Π r exactly, and this theorem follows.    □

Appendix C. Equivalence Algorithm

In this section, we design an algorithm, DiMSC-equivalence, which returns the same estimates as DiMSC. Set U_2 = UU′ ∈ R^{n_r×n_r}, Û_2 = ÛÛ′ ∈ R^{n_r×n_r}, V_2 = VV′ ∈ R^{n_c×n_c}, V̂_2 = V̂V̂′ ∈ R^{n_c×n_c}. Set U_{*,2} ∈ R^{n_r×n_r} such that U_{*,2}(i,:) = U_2(i,:)/‖U_2(i,:)‖_F for 1 ≤ i ≤ n_r; Û_{*,2} is defined similarly. The next lemma guarantees that V_2 enjoys the IS structure and U_{*,2} enjoys the IC structure.
Lemma A1. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , we have V 2 = Π c V 2 ( I c , : ) , and U * , 2 = Y U * , 2 ( I r , : ) .
Proof. 
By Lemma 1, we know that V = Π_cV(I_c,:), which gives V_2 = VV′ = Π_cV(I_c,:)V′ = Π_c(VV′)(I_c,:) = Π_cV_2(I_c,:). For U, since U = Θ_rΠ_rΘ_r^{-1}(I_r,I_r)U(I_r,:) by Lemma 1, we have U_2 = UU′ = Θ_rΠ_rΘ_r^{-1}(I_r,I_r)U(I_r,:)U′ = Θ_rΠ_rΘ_r^{-1}(I_r,I_r)(UU′)(I_r,:) = Θ_rΠ_rΘ_r^{-1}(I_r,I_r)U_2(I_r,:). Setting M_2 = Π_rΘ_r^{-1}(I_r,I_r)U_2(I_r,:), we have U_2 = Θ_rM_2. Then, following a proof similar to that of Lemma 1, we have U_{*,2} = Y_2U_{*,2}(I_r,:), where Y_2 = N_{M_2}Π_rΘ_r^{-1}(I_r,I_r)N_{U_2}^{-1}(I_r,I_r), and N_{M_2}, N_{U_2} are n_r×n_r diagonal matrices whose i-th diagonal entries are 1/‖M_2(i,:)‖_F and 1/‖U_2(i,:)‖_F, respectively. Since ‖U_2(i,:)‖_F = ‖U(i,:)U′‖_F = ‖U(i,:)‖_F, we have N_{U_2} = N_U. Since ‖M_2(i,:)‖_F = ‖Π_r(i,:)Θ_r^{-1}(I_r,I_r)U_2(I_r,:)‖_F = ‖Π_r(i,:)Θ_r^{-1}(I_r,I_r)U(I_r,:)U′‖_F = ‖M(i,:)‖_F, we have N_{M_2} = N_M. Hence, Y_2 ≡ Y, and the claim follows.    □
Since U_{*,2}(I_r,:) ∈ R^{K×n_r} and V_2(I_c,:) ∈ R^{K×n_c}, both U_{*,2}(I_r,:) and V_2(I_c,:) have rank K by Condition (I1), so the inverses of U_{*,2}(I_r,:)U_{*,2}′(I_r,:) and V_2(I_c,:)V_2′(I_c,:) exist. Therefore, Lemma A1 gives that
Y = U * , 2 U * , 2 ( I r , : ) ( U * , 2 ( I r , : ) U * , 2 ( I r , : ) ) 1 , Π c = V 2 V 2 ( I c , : ) ( V 2 ( I c , : ) V 2 ( I c , : ) ) 1 .
Since U_{*,2} = N_UU_2 and Y = N_MΠ_rΘ_r^{-1}(I_r,I_r)N_U^{-1}(I_r,I_r), we see that Y_* also equals U_2U_{*,2}′(I_r,:)(U_{*,2}(I_r,:)U_{*,2}′(I_r,:))^{-1} by basic algebra.
Based on the above analysis, we are now ready to give the ideal DiMSC-equivalence. Input: Ω. Output: Π_r and Π_c.
  • Obtain U , Λ , V , U * , 2 , V 2 from Ω .
  • Run SP algorithm on V 2 with K column communities to obtain V 2 ( I c , : ) . Run SVM-cone algorithm on U * , 2 with K row communities to obtain I r .
  • Set J * = diag ( U * ( I r , : ) Λ V ( I c , : ) ) , Y * = U 2 U * , 2 ( I r , : ) ( U * , 2 ( I r , : ) U * , 2 ( I r , : ) ) 1 , Z r = Y * J * and Z c = V 2 V 2 ( I c , : ) ( V 2 ( I c , : ) V 2 ( I c , : ) ) 1 .
  • Recover Π r and Π c by setting Π r ( i , : ) = Z r ( i , : ) Z r ( i , : ) 1 for 1 i n r , and Π c ( j , : ) = Z c ( j , : ) Z c ( j , : ) 1 for 1 j n c .
For the real case, set U ^ 2 = U ^ U ^ , V ^ 2 = V ^ V ^ , U ^ * , 2 = N U ^ U ^ 2 . We now extend the ideal case to the real one given by Algorithm A1.
Algorithm A1: DiMSC-equivalence
Require:  The adjacency matrix A R n r × n c of a directed network, the number of row communities (column communities) K.
Ensure:  The estimated n r × K row membership matrix Π ^ r , 2 and the estimated n c × K column membership matrix Π ^ c , 2 .
1:
Obtain A ˜ = U ^ Λ ^ V ^ , the top-K-dimensional SVD of A. Compute U ^ * , U ^ 2 , V ^ 2 , U ^ * , 2 .
2:
Apply SP algorithm on the rows of V ^ 2 assuming there are K column communities to obtain I ^ c , 2 , the index set returned by SP algorithm.
3:
    Apply SVM-cone algorithm on the rows of U ^ * , 2 with K row communities to obtain I ^ r , 2 , the index set returned by SVM-cone algorithm.
4:
Set J ^ * , 2 = diag ( U ^ * ( I ^ r , 2 , : ) Λ ^ V ^ ( I ^ c , 2 , : ) ) , Y ^ * , 2 = U ^ 2 U ^ * , 2 ( I ^ r , 2 , : ) ( U ^ * , 2 ( I ^ r , 2 , : ) U ^ * , 2 ( I ^ r , 2 , : ) ) 1 , Z ^ r , 2 = Y ^ * , 2 J ^ * , 2 and Z ^ c , 2 = V ^ 2 V ^ 2 ( I ^ c , 2 , : ) ( V ^ 2 ( I ^ c , 2 , : ) V ^ 2 ( I ^ c , 2 , : ) ) 1 . Then, set Z ^ r , 2 = max ( 0 , Z ^ r , 2 ) and Z ^ c , 2 = max ( 0 , Z ^ c , 2 ) .
5:
Estimate Π r ( i , : ) by Π ^ r , 2 ( i , : ) = Z ^ r , 2 ( i , : ) / Z ^ r , 2 ( i , : ) 1 , 1 i n r and estimate Π c ( j , : ) by Π ^ c , 2 ( j , : ) = Z ^ c , 2 ( j , : ) / Z ^ c , 2 ( j , : ) 1 , 1 j n c .
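A minimal numpy sketch of the matrices Û_2, V̂_2, and Û_{*,2} that Algorithm A1 operates on is given below; it only forms these matrices from the top-K SVD of A and does not implement the SP or SVM-cone steps.

```python
import numpy as np

def projection_matrices(A, K):
    """Form U_hat_2 = U_hat U_hat', V_hat_2 = V_hat V_hat', and the row-normalized U_hat_{*,2}."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_hat, V_hat = U[:, :K], Vt[:K, :].T                        # top-K left and right singular vectors
    U2 = U_hat @ U_hat.T
    V2 = V_hat @ V_hat.T
    U_star2 = U2 / np.linalg.norm(U2, axis=1, keepdims=True)    # normalize each row of U_hat_2
    return U2, V2, U_star2
```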
Lemma A2.(Equivalence). For the empirical case, we have I ^ r , 2 I ^ r , I ^ c , 2 I ^ c , U ^ * , 2 ( I ^ r , 2 , : ) U ^ * , 2 ( I ^ r , 2 , : ) U ^ * ( I ^ r , : ) U ^ * ( I ^ r , : ) , Y ^ * , 2 Y ^ * , J ^ * , 2 J ^ * , Z ^ r , 2 Z ^ r , Z ^ c , 2 Z ^ c , Π ^ r , 2 Π ^ r and Π ^ c , 2 Π ^ c .
Proof. 
For column nodes, Lemma 3.2 [27] gives I ^ c = I ^ c , 2 (i.e., SP algorithm will return the same indices on both V ^ and V ^ 2 .), which gives that V ^ 2 V ^ 2 ( I ^ c , 2 , : ) = V ^ 2 V ^ 2 ( I ^ c , : ) = V ^ V ^ ( ( V ^ V ^ ) ( I ^ c , : ) ) = V ^ V ^ ( V ^ ( I ^ c , : ) V ^ ) = V ^ V ^ V ^ V ^ ( I ^ c , : ) = V ^ V ^ ( I ^ c , : ) , and V ^ 2 ( I ^ c , 2 , : ) V ^ 2 ( I ^ c , 2 , : ) = V ^ 2 ( I ^ c , : ) V ^ 2 ( I ^ c , : ) = V ^ ( I ^ c , : ) V ^ ( V ^ ( I ^ c , : ) V ^ ) = V ^ ( I ^ c , : ) V ^ ( I ^ c , : ) . Therefore, we have Z ^ c , 2 = Z ^ c , Π ^ c , 2 = Π ^ c .
For row nodes, Lemma G.1 [26] guarantees that I ^ r = I ^ r , 2 (i.e., SVM-cone algorithm will return the same indices on both U ^ * and U ^ * , 2 .), so immediately we have J ^ * , 2 = J ^ * . Since U ^ * , 2 ( I ^ r , 2 , : ) = U ^ * , 2 ( I ^ r , : ) = N U ^ ( I ^ r , I ^ r ) U ^ 2 ( I ^ r , : ) = N U ^ ( I ^ r , I ^ r ) U ^ ( I ^ r , : ) U ^ = U ^ * ( I ^ r , : ) U ^ , we have U ^ 2 U ^ * , 2 ( I ^ r , 2 , : ) = U ^ 2 U ^ * , 2 ( I ^ r , : ) = U ^ U ^ U ^ U ^ * ( I ^ r , : ) = U ^ U ^ * ( I ^ r , : ) and ( U ^ * , 2 ( I ^ r , 2 , : ) U ^ * , 2 ( I ^ r , 2 , : ) ) 1 = ( U ^ * , 2 ( I ^ r , : ) U ^ * , 2 ( I ^ r , : ) ) 1 = ( U ^ * ( I ^ r , : ) U ^ * ( I ^ r , : ) ) 1 , which give that Y ^ * , 2 = Y ^ * , and the claim follows immediately.    □
Lemma A2 guarantees that DiMSC and DiMSC-equivalence return the same estimates of the memberships of both row and column nodes. In this article, we introduce the DiMSC-equivalence algorithm because it is helpful for building the theoretical framework of DiMSC; see Remarks A3 and A4 for details.

Appendix D. Basic Properties of Ω

Lemma A3. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , we have
$$\frac{\theta_{r,\min}}{\theta_{r,\max}\sqrt{K\lambda_1(\Pi_r'\Pi_r)}}\le\|U(i,:)\|_F\le\frac{\theta_{r,\max}}{\theta_{r,\min}\sqrt{\lambda_K(\Pi_r'\Pi_r)}},\ 1\le i\le n_r,\qquad \frac{1}{\sqrt{K\lambda_1(\Pi_c'\Pi_c)}}\le\|V(j,:)\|_F\le\frac{1}{\sqrt{\lambda_K(\Pi_c'\Pi_c)}},\ 1\le j\le n_c.$$
Proof. 
Since I = U U = U ( I r , : ) Θ r 1 ( I r , I r ) Π r Θ r 2 Π r Θ 1 ( I r , I r ) U ( I r , : ) , we have
( ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ( ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ) 1 = Π r Θ r 2 Π r ,
which gives that
max k e k ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F 2 = max k e k ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ( Θ r 1 ( I r , I r ) U ( I r , : ) ) e k max x F = 1 x ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ( Θ r 1 ( I r , I r ) U ( I r , : ) ) x = λ 1 ( ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ( Θ r 1 ( I r , I r ) U ( I r , : ) ) ) = 1 λ K ( Π r Θ r 2 Π r ) 1 θ r , min 2 λ K ( Π r Π r ) .
Similarly, we have
min k e k ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F 2 1 λ 1 ( Π r Θ r 2 Π r ) 1 θ r , max 2 λ 1 ( Π r Π r ) .
Since U ( i , : ) = e i U = e i Θ r Π r Θ r 1 ( I r , I r ) U ( I r , : ) = θ r ( i ) Π r ( i , : ) Θ r 1 ( I r , I r ) U ( I r , : ) for 1 i n r , we have
U ( i , : ) F = θ r ( i ) Π r ( i , : ) Θ r 1 ( I r , I r ) U ( I r , : ) F = θ r ( i ) Π r ( i , : ) Θ r 1 ( I r , I r ) U ( I r , : ) F θ r ( i ) max i Π r ( i , : ) F max i e i ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F θ r ( i ) max i e i ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F θ r , max θ r , min λ K ( Π r Π r ) .
Similarly, we have
U ( i , : ) F θ r ( i ) min i Π r ( i , : ) F min i e i ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F θ r ( i ) min i e i ( Θ r 1 ( I r , I r ) U ( I r , : ) ) F / K θ r , min θ r , max K λ 1 ( Π r Π r ) .
For V ( j , : ) F , since V = Π c B c , we have
min j e j V F 2 = min j e j V V e j = min j Π c ( j , : ) B c B c Π c ( j , : ) = min j Π c ( j , : ) F 2 Π c ( j , : ) Π c ( j , : ) F B c B c Π c ( j , : ) Π c ( j , : ) F min j Π c ( j , : ) F 2 min x F = 1 x B c B c x = min j Π c ( j , : ) F 2 λ K ( B c B c ) = By Lemma A 4 min j Π c ( j , : ) F 2 λ 1 ( Π c Π c ) 1 K λ 1 ( Π c Π c ) .
Meanwhile,
max j e j V F 2 = max j Π c ( j , : ) F 2 Π c ( j , : ) Π c ( j , : ) F B c B c Π c ( j , : ) Π c ( j , : ) F max j Π c ( j , : ) F 2 max x F = 1 x B c B c x = max j Π c ( j , : ) F 2 λ K ( B c B c ) = By Lemma A 4 max j Π c ( j , : ) F 2 λ K ( Π c Π c ) 1 λ K ( Π c Π c ) .
   □
Lemma A4. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , we have
$$\frac{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)}{\theta_{r,\max}^2\lambda_1(\Pi_r'\Pi_r)}\le\lambda_K\big(U_*'(I_r,:)U_*(I_r,:)\big),\qquad \lambda_1\big(U_*'(I_r,:)U_*(I_r,:)\big)\le\frac{\theta_{r,\max}^2K\lambda_1(\Pi_r'\Pi_r)}{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)},\qquad \text{and}\qquad \lambda_1(B_cB_c')=\frac{1}{\lambda_K(\Pi_c'\Pi_c)},\quad \lambda_K(B_cB_c')=\frac{1}{\lambda_1(\Pi_c'\Pi_c)}.$$
Proof. 
Recall that V = Π c B c and V V = I , we have I = B c Π c Π c B c . As B c is full rank, we have Π c Π c = ( B c B c ) 1 , which gives
λ 1 ( B c B c ) = 1 λ K ( Π c Π c ) , λ K ( B c B c ) = 1 λ 1 ( Π c Π c ) .
By the proof of Lemma 2, we know that
( U * ( I r , : ) U * ( I r , : ) ) 1 = N U 1 ( I r , I r ) Θ 1 ( I r , I r ) Π r Θ r 2 Π r Θ r 1 ( I r , I r ) N U 1 ( I r , I r ) ,
which gives that
U * ( I r , : ) U * ( I r , : ) = N U ( I r , I r ) Θ ( I r , I r ) ( Π r Θ r 2 Π r ) 1 Θ r ( I r , I r ) N U ( I r , I r ) .
Then, we have
λ 1 ( U * ( I r , : ) U * ( I r , : ) ) = λ 1 ( N U ( I r , I r ) Θ ( I r , I r ) ( Π r Θ r 2 Π r ) 1 Θ r ( I r , I r ) N U ( I r , I r ) ) = λ 1 ( N U 2 ( I r , I r ) Θ r 2 ( I r , I r ) ( Π r Θ r 2 Π r ) 1 ) λ 1 2 ( N U ( I r , I r ) Θ r ( I r , I r ) ) λ 1 ( ( Π r Θ r 2 Π r ) 1 ) = λ 1 2 ( N U ( I r , I r ) Θ r ( I r , I r ) ) / λ K ( Π r Θ r 2 Π r ) ( max i I r θ r ( i ) / U ( i , : ) F ) 2 / λ K ( Π r Θ r 2 Π r ) θ r , max 2 K λ 1 ( Π r Π r ) λ K ( Π r Θ r 2 Π r ) θ r , max 2 K λ 1 ( Π r Π r ) θ r , min 2 λ K ( Π r Π r ) .
Similarly, we have
λ K ( U * ( I r , : ) U * ( I r , : ) ) = λ K ( N U ( I r , I r ) Θ ( I r , I r ) ( Π r Θ r 2 Π r ) 1 Θ r ( I r , I r ) N U ( I r , I r ) ) = λ K ( N U 2 ( I r , I r ) Θ r 2 ( I r , I r ) ( Π r Θ r 2 Π r ) 1 ) λ K 2 ( N U ( I r , I r ) Θ r ( I r , I r ) ) λ K ( ( Π r Θ r 2 Π r ) 1 ) = λ K 2 ( N U ( I r , I r ) Θ r ( I r , I r ) ) / λ 1 ( Π r Θ r 2 Π r ) ( min i I r θ r ( i ) / U ( i , : ) F ) 2 / λ 1 ( Π r Θ r 2 Π r ) θ r , min 2 λ K ( Π r Π r ) λ 1 ( Π r Θ r 2 Π r ) θ r , min 2 λ K ( Π r Π r ) θ r , max 2 λ 1 ( Π r Π r ) .
   □
Lemma A5. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , we have
$$\sigma_K(\Omega)\ge\theta_{r,\min}\sigma_K(P)\sigma_K(\Pi_r)\sigma_K(\Pi_c),\qquad \sigma_1(\Omega)\le\theta_{r,\max}\sigma_1(P)\sigma_1(\Pi_r)\sigma_1(\Pi_c).$$
Proof. 
For σ K ( Ω ) , we have
σ K 2 ( Ω ) = λ K ( Ω Ω ) = λ K ( Θ r Π r P Π c Π c P Π r Θ r ) = λ K ( Θ r 2 Π r P Π c Π c P Π r ) θ r , min 2 λ K ( Π r Π r P Π c Π c P ) θ r , min 2 λ K ( Π r Π r ) λ K ( P Π c Π c P ) = θ r , min 2 λ K ( Π r Π r ) λ K ( Π c Π c P P ) θ r , min 2 λ K ( Π r Π r ) λ K ( Π c Π c ) λ K ( P P ) = θ r , min 2 σ K 2 ( Π r ) σ K 2 ( Π c ) σ K 2 ( P ) ,
where we have used the fact for any matrices X , Y , the nonzero eigenvalues of X Y are the same as the nonzero eigenvalues of Y X . Following a similar analysis, the lemma follows.    □
Lemma A6. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , when Assumption A1 holds, with probability at least 1 o ( ( n r + n c ) 3 ) , we have
$$\|A-\Omega\| = O\Big(\sqrt{P_{\max}\max(\|\theta_r\|_1,\,\theta_{r,\max}n_c)\log(n_r+n_c)}\Big).$$
Proof. 
Since the proof is similar to that of Lemma 7 [35], we omit most of the details. Let e i be an n r × 1 vector, where e i ( i ) = 1 and 0 elsewhere, for row nodes 1 i n r , and e ˜ j be an n c × 1 vector, where e ˜ j ( j ) = 1 and 0 elsewhere, for column nodes 1 j n c . Set W = i = 1 n r j = 1 n c W ( i , j ) e i e ˜ j , where W = A Ω . Set W ( i , j ) = W ( i , j ) e i e ˜ j , for 1 i n r , 1 j n c . Then, we have E ( W ( i , j ) ) = 0 . For 1 i n r , 1 j n c , we have
$$\|W^{(i,j)}\| = \|W(i,j)e_i\tilde{e}_j'\| = |A(i,j)-\Omega(i,j)|\le 1.$$
Next, we consider the variance parameter
σ 2 : = max ( i = 1 n r j = 1 n c E ( W ( i , j ) ( W ( i , j ) ) ) , i = 1 n r j = 1 n c E ( ( W ( i , j ) ) W ( i , j ) ) ) .
Since
E ( W 2 ( i , j ) ) = E ( ( A ( i , j ) Ω ( i , j ) ) 2 ) = Var ( A ( i , j ) ) ,
where Var ( A ( i , j ) ) denotes the variance of the Bernoulli random variable A ( i , j ) , we have
E ( W 2 ( i , j ) ) = Var ( A ( i , j ) ) = P ( A ( i , j ) = 1 ) ( 1 P ( A ( i , j ) = 1 ) ) P ( A ( i , j ) = 1 ) = Ω ( i , j ) = e i Θ r Π r P Π c e ˜ j = θ r ( i ) e i Π r P Π c e ˜ j θ r ( i ) P max .
Since e i e i is an n r × n r diagonal matrix with ( i , i ) -th entry being one and other entries being zero, we have
i = 1 n r j = 1 n c E ( W ( i , j ) ( W ( i , j ) ) ) = i = 1 n r j = 1 n c E ( W 2 ( i , j ) ) e i e i = max 1 i n r | j = 1 n c E ( W 2 ( i , j ) ) | θ r , max P max n c .
Similarly, we have i = 1 n r j = 1 n c E ( ( W ( i , j ) ) W ( i , j ) ) P max θ r 1 , which gives that
σ 2 = max ( i = 1 n r j = 1 n c E ( W ( i , j ) ( W ( i , j ) ) ) , i = 1 n r j = 1 n c E ( ( W ( i , j ) ) W ( i , j ) ) ) P max max ( θ r 1 , θ r , max n c ) .
By the rectangular version of the Bernstein inequality [71], combining with
σ 2 P max max ( θ r 1 , θ r , max n c ) , R = 1 , d 1 + d 2 = n r + n c , set
t = α + 1 + α 2 + 20 α + 19 3 P max max ( θ r 1 , θ r , max n c ) log ( n r + n c ) for any α > 0 , we have
P ( W t ) = P ( i = 1 n r j = 1 n c W ( i , j ) t ) ( n r + n c ) exp ( t 2 / 2 σ 2 + R t 3 ) ( n r + n c ) exp ( t 2 / 2 P max max ( θ r 1 , θ r , max n c ) + t / 3 ) = ( n r + n c ) exp ( ( α + 1 ) log ( n r + n c ) · 1 18 ( α + 19 + α + 1 ) 2 + 2 α + 1 α + 19 + α + 1 log ( n r + n c ) P max max ( θ r 1 , θ r , max n c ) ) ( n r + n c ) exp ( ( α + 1 ) log ( n r + n c ) ) = 1 ( n r + n c ) α ,
where we have used Assumption 1 in the last inequality. Set α = 3 , and the claim follows.    □

Appendix E. Proof of Consistency for DiMSC

Similar to [24,26,27], the main theoretical results for our DiMSC (i.e., Theorem 2) rely on row-wise singular vector deviation bounds for the singular vectors of the adjacency matrix.
Lemma A7.  (Row-wise singular vector deviation) Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , when Assumption 1 holds, suppose σ_K(Ω) ≥ C√(θ_{r,max}(n_r+n_c)log(n_r+n_c)); then, with probability at least 1 − o((n_r+n_c)^{-3}), we have
$$\max\big(\|\hat{U}\hat{U}'-UU'\|_{2\rightarrow\infty},\ \|\hat{V}\hat{V}'-VV'\|_{2\rightarrow\infty}\big)=O\Big(\frac{\sqrt{P_{\max}\theta_{r,\max}K\log(n_r+n_c)}}{\theta_{r,\min}\sigma_K(P)\sigma_K(\Pi_r)\sigma_K(\Pi_c)}\Big).$$
Proof. 
Let H U ^ = U ^ U , and H U ^ = U H U ^ Σ H U ^ V H U ^ be the SVD decomposition of H U ^ with U H U ^ , V H U ^ R n r × K , where U H U ^ and V H U ^ represent, respectively, the left and right singular matrices of H U ^ . Define sgn ( H U ^ ) = U H U ^ V H U ^ ; sgn ( H V ^ ) is defined similarly. Since E ( A ( i , j ) Ω ( i , j ) ) = 0 , E [ ( A ( i , j ) Ω ( i , j ) ) 2 ] θ r ( i ) P max θ r , max P max by the proof of Lemma A6, 1 θ r , max P max min ( n r , n c ) / ( μ log ( n r + n c ) ) O ( 1 ) holds by Assumption 1, where μ is the incoherence parameter defined as μ = max ( n r U 2 2 K , n c V 2 2 K ) . By Theorem 4.4 [64], with high probability, we have below row-wise singular vector deviation
max ( U ^ sgn ( H U ^ ) U 2 , V ^ sgn ( H V ^ ) V 2 ) C P max θ r , max K ( κ ( Ω ) max ( n r , n c ) μ min ( n r , n c ) + log ( n r + n c ) ) σ K ( Ω ) C P max θ r , max K log ( n r + n c ) σ K ( Ω ) ,
provided that c 1 σ K ( Ω ) θ r , max P max ( n r + n c ) log ( n r + n c ) for some sufficiently small constant c 1 , and here we set max ( n r , n c ) μ min ( n r , n c ) = O ( 1 ) for convenience since this term has little effect on the error bounds of DiMSC, especially for the case when n r n c = O ( 1 ) .
Since U U = I , U ^ U ^ = I , we have U ^ U ^ U U 2 2 U U ^ sgn ( H U ^ ) 2 by basic algebra. Now, we are ready to bound U ^ U ^ U U 2 :
U ^ U ^ U U 2 = max 1 i n r e i ( U U U ^ U ^ ) F 2 U U ^ sgn ( H U ^ ) 2 C P max θ r , max K log ( n r + n c ) σ K ( Ω ) By Lemma A 5 C P max θ r , max K log ( n r + n c ) θ r , min σ K ( P ) σ K ( Π r ) σ K ( Π c ) .
The lemma holds by following similar proof for V ^ V ^ V V 2 .    □
When Θ_r = ρI, n_r = n_c, Π_r = Π_c = Π, and P_max = O(1), DiDCMM degenerates to MMSB, and the bound in Lemma A7 is O(√(K log(n))/(σ_K(P)√ρ λ_K(Π′Π))). If we further assume that λ_K(Π′Π) = O(n/K) and K = O(1), the bound is of order O((1/σ_K(P))(1/√n)√(log(n)/(ρn))). Setting the Θ in [24] as ρI, their DCMM degenerates to MMSB and their assumptions translate into our λ_K(Π′Π) = O(n/K); when K = O(1), the row-wise singular vector deviation bound in the fourth bullet of Lemma 2.1 of [24] is O((1/σ_K(P))(1/√n)√(log(n)/(ρn))), which is consistent with ours. Meanwhile, if we further assume that σ_K(P) = O(1), the bound is of order (1/√n)√(log(n)/(ρn)).
The next lemma is the cornerstone to characterizing the behaviors of DiMSC.
Lemma A8. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , when conditions of Lemma A7 hold, there exist two permutation matrices P r , P c R K × K such that with probability at least 1 o ( ( n r + n c ) 3 ) , we have
max 1 k K e k ( U ^ * , 2 ( I ^ r , : ) P r U * , 2 ( I r , : ) ) F = O ( K 3 θ r , max 11 ϖ κ 3 ( Π r Π r ) λ 1 1.5 ( Π r Π r ) θ r , min 11 π r , min ) , max 1 k K e k ( V ^ 2 ( I ^ c , : ) P c V 2 ( I c , : ) ) F = O ( ϖ κ ( Π c Π c ) ) .
Proof. 
First, we consider column nodes. The details of the SP algorithm are given in Algorithm A2.
Algorithm A2 Successive Projection (SP) [54]
Require:  Near-separable matrix Y_sp = S_sp M_sp + Z_sp ∈ R_+^{m×n}, where S_sp and M_sp should satisfy Assumption 1 of [54], and the number r of rows to be extracted.
Ensure:  Set of indices K such that Y_sp(K,:) ≈ S_sp (up to permutation).
1:
Let R = Y_sp, K = { }, k = 1.
2:
While R ≠ 0 and k ≤ r do
3:
    k* = argmax_k ‖R(k,:)‖_F.
4:
    u_k = R(k*,:).
5:
    R ← R(I − u_k′u_k/‖u_k‖_F²).
6:
    K = K ∪ {k*}.
7:
    k = k + 1.
8:
end while
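The following numpy sketch mirrors the listing above, selecting r rows by successive projection; it is an illustration rather than the authors' implementation.

```python
import numpy as np

def successive_projection(Y, r):
    """Greedily select r rows of Y with maximal residual norm, projecting each selected row out in turn."""
    R = np.array(Y, dtype=float)
    selected = []
    for _ in range(r):
        k_star = int(np.argmax(np.linalg.norm(R, axis=1)))   # row with the largest residual norm
        u = R[k_star].copy()
        selected.append(k_star)
        R -= np.outer(R @ u, u) / (u @ u)                    # R <- R (I - u'u / ||u||_F^2)
    return selected
```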
Based on Algorithm A2, the following theorem is Theorem 1.1 in [54].
Theorem A1.  Fix m ≥ r and n ≥ r. Consider a matrix Y_sp = S_sp M_sp + Z_sp, where S_sp ∈ R^{m×r} has full column rank, M_sp ∈ R^{r×n} is a nonnegative matrix such that the sum of each column is at most 1, and Z_sp = [Z_{sp,1}, …, Z_{sp,n}] ∈ R^{m×n}. Suppose M_sp has a submatrix equal to I_r. Write ε ≡ max_{1≤i≤n} ‖Z_{sp,i}‖_F. Suppose ε = O(σ_min(S_sp)/(√r κ²(S_sp))), where σ_min(S_sp) and κ(S_sp) are the minimum singular value and condition number of S_sp, respectively. If we apply the SP algorithm to the columns of Y_sp, then it outputs an index set K ⊂ {1, 2, …, n} such that |K| = r and max_{1≤k≤r} min_{j∈K} ‖S_sp(:,k) − Y_sp(:,j)‖_F = O(ε κ²(S_sp)), where S_sp(:,k) is the k-th column of S_sp.
Let m = K , r = K , n = n c , Y s p = V ^ 2 , Z s p = V ^ 2 V 2 , S s p = V 2 ( I c , : ) , and M s p = Π c . By Condition (I2), M s p has an identity submatrix I K . By Lemma A7, we have
ϵ c = max 1 j n c V ^ 2 ( j , : ) V 2 ( j , : ) F = V ^ 2 ( j , : ) V 2 ( j , : ) 2 ϖ .
By Theorem A1, there exists a permutation matrix P c such that
max 1 k K e k ( V ^ 2 ( I ^ c , : ) P c V 2 ( I c , : ) ) F = O ( ϵ c κ 2 ( V 2 ( I c , : ) ) K ) = O ( ϖ κ 2 ( V 2 ( I c , : ) ) ) .
Since κ 2 ( V 2 ( I c , : ) ) = κ ( V 2 ( I c , : ) V 2 ( I c , : ) ) = κ ( V ( I c , : ) V ( I c , : ) ) = κ ( Π c Π c ) , where the last equality holds by Lemma A4, we have
max 1 k K e k ( V ^ 2 ( I ^ c , : ) P c V 2 ( I c , : ) ) F = O ( ϖ κ ( Π c Π c ) ) .
Remark A1.For the ideal case, let m = K , r = K , n = n c , Y s p = V , Z s p = V V 0 , S s p = V ( I c , : ) , and M s p = Π c . Then, we have max 1 j n c V ( j , : ) V ( j , : ) F = 0 . By Theorem A1, SP algorithm returns I c when the input is V assuming there are K column communities.
Now, we consider row nodes. From Lemma 2, we see that U * ( I r , : ) satisfies Condition 1 in [26]. Meanwhile, since ( U * ( I r , : ) U * ( I r , : ) ) 1 1 > 0 , we have ( U * ( I r , : ) U * ( I r , : ) ) 1 1 η 1 , hence U * ( I r , : ) satisfies Condition 2 in [26]. Now, we give a lower bound for η to show that η is strictly positive. By the proof of Lemma A4, we have
( U * ( I r , : ) U * ( I r , : ) ) 1 = N U 1 ( I r , I r ) Θ 1 ( I r , I r ) Π r Θ r 2 Π r Θ r 1 ( I r , I r ) N U 1 ( I r , I r ) θ r , min 2 θ r , max 2 N U , max 2 Π r Π r θ r , min 4 θ r , max 4 K λ 1 ( Π r Π r ) Π r Π r ,
where we set N U , max = max 1 i n r N U ( i , i ) , and we have used the facts that N U , Θ r are diagonal matrices, and N U , max θ r , max K λ 1 ( Π r Π r ) θ r , min by Lemma A3. Then, we have
η = min 1 k K ( ( U * ( I r , : ) U * ( I r , : ) ) 1 1 ) ( k ) θ r , min 4 θ r , max 4 K λ 1 ( Π r Π r ) min 1 k K e k Π r Π r 1 = θ r , min 4 θ r , max 4 K λ 1 ( Π r Π r ) min 1 k K e k Π r 1 = θ r , min 4 π r , min θ r , max 4 K λ 1 ( Π r Π r ) ,
i.e., η is strictly positive. Since U * , 2 ( I r , : ) U * , 2 ( I r , : ) U * ( I r , : ) U * ( I r , : ) , we have U * , 2 ( I r , : ) also satisfies Conditions 1 and 2 in [26]. The above analysis shows that we can directly apply Lemma F.1 of [26] since the ideal DiMSC algorithm satisfies Conditions 1 and 2 in [26], therefore there exists a permutation matrix P r R K × K such that
max 1 k K e k ( U ^ * , 2 ( I ^ r , : ) P r U * , 2 ( I r , : ) ) F = O ( K ζ ϵ r λ K 1.5 ( U * , 2 ( I r , : ) ) U * , 2 ( I r , : ) ) ,
where ζ 4 K η λ K 1.5 ( U * , 2 ( I r , : ) U * , 2 ( I r , : ) ) = O ( K η λ K 1.5 ( U * ( I r , : ) U * ( I r , : ) ) ) , and ϵ r = max 1 i n r U ^ * , 2 ( i , : ) U * , 2 ( i , : ) . Next, we bound ϵ r as below
U ^ * , 2 ( i , : ) U * , 2 ( i , : ) F = U ^ 2 ( i , : ) U 2 ( i , : ) F U 2 ( i , : ) U ^ 2 ( i , : ) F U ^ 2 ( i , : ) F U 2 ( i , : ) F F 2 U ^ 2 ( i , : ) U 2 ( i , : ) F U 2 ( i , : ) F 2 U ^ 2 U 2 2 U 2 ( i , : ) F 2 ϖ U 2 ( i , : ) F = 2 ϖ ( U U ) ( i , : ) F = 2 ϖ U ( i , : ) U F = 2 ϖ U ( i , : ) F 2 ϖ θ r , max K λ 1 ( Π r Π r ) θ r , min ,
where the last inequality holds by Lemma A3. Then, we have ϵ r = O ( ϖ θ r , max K λ 1 ( Π r Π r ) θ r , min ) . Finally, by Lemma A4, we have
max 1 k K e k ( U ^ * , 2 ( I ^ r , : ) P r U * , 2 ( I r , : ) ) F = O ( K 3 θ r , max 11 ϖ κ 3 ( Π r Π r ) λ 1 1.5 ( Π r Π r ) θ r , min 11 π r , min ) .
Remark A2. For the ideal case, when setting U_* as the input of the SVM-cone algorithm assuming there are K row communities, since ‖U_* − U_*‖_2 = 0, Lemma F.1 of [26] guarantees that the SVM-cone algorithm returns I_r exactly. Meanwhile, Appendix F gives another way to see that the SVM-cone algorithm exactly obtains I_r when the input is U_* (or U_{*,2}): instead of simply applying Lemma F.1 of [26], it follows the three steps of the SVM-cone algorithm to show that it returns I_r with input U_* (or U_{*,2}).
   □
Lemma A9. 
Under D i D C M M n r , n c ( K , P , Π r , Π c , Θ r ) , when conditions of Lemma A7 hold, with probability at least 1 o ( ( n r + n c ) 3 ) , we have
max 1 i n r e i ( Z ^ r Z r P r ) F = O ( K 5 θ r , max 15 ϖ κ 4.5 ( Π r Π r ) κ ( Π c ) λ 1 1.5 ( Π r Π r ) θ r , min 14 π r , min ) , max 1 j n c e j ( Z ^ c Z c P c ) F = O ( ϖ κ ( Π c Π c ) K λ 1 ( Π c Π c ) ) .
Proof. 
First, we consider column nodes. Recall that V ( I c , : ) = B c . For convenience, set V ^ ( I ^ c , : ) = B ^ c , V 2 ( I c , : ) = B 2 c , V ^ 2 ( I ^ c , : ) = B ^ 2 c . We bound e j ( Z ^ c Z c P c ) F when the input is V ^ in the SP algorithm. Recall that Z c = max ( V V ( I c , : ) ( V ( I c , : ) V ( I c , : ) ) 1 , 0 ) Π c , for 1 j n c , we have
e j ( Z ^ c Z c P c ) F = e j ( max ( 0 , V ^ B ^ c ( B ^ c B ^ c ) 1 ) V B c ( B c B c ) 1 P c ) F e j ( V ^ B ^ c ( B ^ c B ^ c ) 1 V B c ( B c B c ) 1 P c ) F = e j ( V ^ V ( V V ^ ) ) B ^ c ( B ^ c B ^ c ) 1 + e j ( V ( V V ^ ) B ^ c ( B ^ c B ^ c ) 1 V ( V V ^ ) ( P c ( B c B c ) ( B c ) 1 ( V V ^ ) ) 1 ) F e j ( V ^ V ( V V ^ ) ) B ^ c ( B ^ c B ^ c ) 1 F + e j V ( V V ^ ) ( B ^ c ( B ^ c B ^ c ) 1 ( P c ( B c B c ) ( B c ) 1 ( V V ^ ) ) 1 ) F e j ( V ^ V ( V V ^ ) ) F B ^ c 1 F + e j V ( V V ^ ) ( B ^ c ( B ^ c B ^ c ) 1 ( P c ( B c B c ) ( B c ) 1 ( V V ^ ) ) 1 ) F K e j ( V ^ V ( V V ^ ) ) F / λ K ( B ^ c B ^ c ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F = K e j ( V ^ V ^ V V ) V ^ F O ( λ 1 ( Π c Π c ) ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F K e j ( V ^ V ^ V V ) F O ( λ 1 ( Π c Π c ) ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F K ϖ O ( λ 1 ( Π c Π c ) ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F = O ( ϖ K λ 1 ( Π c Π c ) ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F ,
where we have used similar idea in the proof of Lemma VII.3 in [27] such that apply O ( 1 λ K ( B c B c ) ) to estimate 1 λ K ( B ^ c B ^ c ) , then by Lemma A4, we have 1 λ K ( B ^ c B ^ c ) = O ( λ 1 ( Π c Π c ) ) .
Now, we aim to bound e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F . For convenience, set T c = V V ^ , S c = P c B c T c . We have
e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F = e j V T c S c 1 ( S c B ^ c ) B ^ c 1 F e j V T c S c 1 ( S c B ^ c ) F B ^ c 1 F e j V T c S c 1 ( S c B ^ c ) F K | λ K ( B ^ c ) | = e j V T c S c 1 ( S c B ^ c ) F K λ K ( B ^ c B ^ c ) e j V T c S c 1 ( S c B ^ c ) F O ( K λ 1 ( Π c Π c ) ) = e j V T c T c 1 B c ( B c B c ) 1 P c ( S c B ^ c ) F O ( K λ 1 ( Π c Π c ) ) = e j V B c ( B c B c ) 1 P c ( S c B ^ c ) F O ( K λ 1 ( Π c Π c ) ) = e j Z c P c ( S c B ^ c ) F O ( K λ 1 ( Π c Π c ) ) By Z c = Π c max 1 k K e k ( S c B ^ c ) F O ( K λ 1 ( Π c Π c ) ) = max 1 k K e k ( B ^ c P c B c V V ^ ) F O ( K λ 1 ( Π c Π c ) ) = max 1 k K e k ( B ^ c V ^ P c B c V ) V ^ F O ( K λ 1 ( Π c Π c ) ) max 1 k K e k ( B ^ c V ^ P c B c V ) F O ( K λ 1 ( Π c Π c ) ) = max 1 k K e k ( B ^ 2 c P c B 2 c ) F O ( K λ 1 ( Π c Π c ) ) = O ( ϖ κ ( Π c Π c ) K λ 1 ( Π c Π c ) ) .
Remark A3.  Equation (A2) supports our statement that building the theoretical framework of DiMSC benefits a lot from introducing the DiMSC-equivalence algorithm, since the bound on ‖B̂_{2c} − P_cB_{2c}‖_{2→∞} is obtained from DiMSC-equivalence (i.e., inputting V̂_2 into the SP algorithm yields ‖B̂_{2c} − P_cB_{2c}‖_{2→∞}).
Then, we have
e j ( Z ^ c Z c P c ) F O ( ϖ K λ 1 ( Π c Π c ) ) + e j V ( V V ^ ) ( B ^ c 1 ( P c B c ( V V ^ ) ) 1 ) F O ( ϖ K λ 1 ( Π c Π c ) ) + O ( ϖ κ ( Π c Π c ) K λ 1 ( Π c Π c ) ) = O ( ϖ κ ( Π c Π c ) K λ 1 ( Π c Π c ) ) .
Next, we consider row nodes. For 1 i n r , since Z r = Y * J * , Z ^ r = Y ^ * J ^ * , we have
e i ( Z ^ r Z r P r ) F = e i ( max ( 0 , Y ^ * J ^ * ) Y * J * P r ) F e i ( Y ^ * J ^ * Y * J * P r ) F = e i ( Y ^ * Y * P r ) J ^ * + e i Y * P r ( J ^ * P r J * P r ) F e i ( Y ^ * Y * P r ) F J ^ * F + e i Y * P r F J ^ * P r J * P r F = e i ( Y ^ * Y * P r ) F J ^ * F + e i Y * F J ^ * P r J * P r F .
Therefore, the bound of e i ( Z ^ r Z r P r ) F can be obtained as long as we bound e i ( Y ^ * Y * P r ) F , J ^ * F , e i Y * F and J ^ * P r J * P r F . We bound the four terms as below:
  • We bound ‖e_i′(Ŷ_* − Y_*P_r)‖_F first. Similarly to the bound on ‖e_j′(Ẑ_c − Z_cP_c)‖_F, we set U_*(I_r,:) = B_R, Û_*(Î_r,:) = B̂_R, U_{*,2}(I_r,:) = B_{2R}, and Û_{*,2}(Î_r,:) = B̂_{2R} for convenience. We bound ‖e_i′(Ŷ_* − Y_*P_r)‖_F when the input of the SVM-cone algorithm is Û_*. For 1 ≤ i ≤ n_r, we have
$$\begin{aligned}
\|e_i'(\hat{Y}_*-Y_*\mathcal{P}_r)\|_F&=\|e_i'(\hat{U}\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}-UB_R'(B_RB_R')^{-1}\mathcal{P}_r)\|_F\\
&=\|e_i'(\hat{U}-U(U'\hat{U}))\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}+e_i'(U(U'\hat{U})\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}-U(U'\hat{U})(\mathcal{P}_r'(B_RB_R')(B_R')^{-1}(U'\hat{U}))^{-1})\|_F\\
&\leq\|e_i'(\hat{U}-U(U'\hat{U}))\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}\|_F+\|e_i'U(U'\hat{U})(\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}-(\mathcal{P}_r'(B_RB_R')(B_R')^{-1}(U'\hat{U}))^{-1})\|_F\\
&\leq\|e_i'(\hat{U}-U(U'\hat{U}))\|_F\|\hat{B}_R^{-1}\|_F+\|e_i'U(U'\hat{U})(\hat{B}_R'(\hat{B}_R\hat{B}_R')^{-1}-(\mathcal{P}_r'(B_RB_R')(B_R')^{-1}(U'\hat{U}))^{-1})\|_F\\
&\leq\sqrt{K}\|e_i'(\hat{U}-U(U'\hat{U}))\|_F/\sqrt{\lambda_K(\hat{B}_R\hat{B}_R')}+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F\\
&\overset{(i)}{=}\sqrt{K}\|e_i'(\hat{U}\hat{U}'-UU')\hat{U}\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F\\
&\leq\sqrt{K}\|e_i'(\hat{U}\hat{U}'-UU')\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F\\
&\leq\sqrt{K}\varpi\,O\Big(\frac{\theta_{r,\max}\sqrt{\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F\\
&=O\Big(\frac{\varpi\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F,
\end{aligned}$$
    where we have used a similar idea to that in the proof of Lemma VII.3 in [27]: we apply $O(\frac{1}{\lambda_K(B_RB_R')})$ to estimate $\frac{1}{\lambda_K(\hat{B}_R\hat{B}_R')}$; hence, (i) holds by Lemma A4.
Now, we aim to bound $\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F$. For convenience, set $T_r=U'\hat{U}$ and $S_r=\mathcal{P}_r'B_RT_r$. We have
$$\begin{aligned}
\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F&=\|e_i'UT_rS_r^{-1}(S_r-\hat{B}_R)\hat{B}_R^{-1}\|_F\leq\|e_i'UT_rS_r^{-1}(S_r-\hat{B}_R)\|_F\|\hat{B}_R^{-1}\|_F\\
&\leq\|e_i'UT_rS_r^{-1}(S_r-\hat{B}_R)\|_F\frac{\sqrt{K}}{|\lambda_K(\hat{B}_R)|}=\|e_i'UT_rS_r^{-1}(S_r-\hat{B}_R)\|_F\frac{\sqrt{K}}{\sqrt{\lambda_K(\hat{B}_R\hat{B}_R')}}\\
&\leq\|e_i'UT_rS_r^{-1}(S_r-\hat{B}_R)\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&=\|e_i'UT_rT_r^{-1}B_R'(B_RB_R')^{-1}\mathcal{P}_r(S_r-\hat{B}_R)\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&=\|e_i'UB_R'(B_RB_R')^{-1}\mathcal{P}_r(S_r-\hat{B}_R)\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&=\|e_i'Y_*\mathcal{P}_r(S_r-\hat{B}_R)\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&\leq\|e_i'Y_*\|_F\|S_r-\hat{B}_R\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&\overset{\text{by Equation (A4)}}{\leq}\frac{\theta_{r,\max}^2\sqrt{K\lambda_1(\Pi_r'\Pi_r)}}{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)}\sqrt{K}\max_{1\leq k\leq K}\|e_k'(S_r-\hat{B}_R)\|_F\,O\Big(\frac{\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)\\
&=\max_{1\leq k\leq K}\|e_k'(\hat{B}_R-\mathcal{P}_r'B_RU'\hat{U})\|_F\,O\Big(\frac{\theta_{r,\max}^3K^{1.5}\kappa(\Pi_r'\Pi_r)}{\theta_{r,\min}^3\sqrt{\lambda_K(\Pi_r'\Pi_r)}}\Big)\\
&=\max_{1\leq k\leq K}\|e_k'(\hat{B}_R\hat{U}'-\mathcal{P}_r'B_RU')\hat{U}\|_F\,O\Big(\frac{\theta_{r,\max}^3K^{1.5}\kappa(\Pi_r'\Pi_r)}{\theta_{r,\min}^3\sqrt{\lambda_K(\Pi_r'\Pi_r)}}\Big)\\
&\leq\max_{1\leq k\leq K}\|e_k'(\hat{B}_R\hat{U}'-\mathcal{P}_r'B_RU')\|_F\,O\Big(\frac{\theta_{r,\max}^3K^{1.5}\kappa(\Pi_r'\Pi_r)}{\theta_{r,\min}^3\sqrt{\lambda_K(\Pi_r'\Pi_r)}}\Big)\\
&=\max_{1\leq k\leq K}\|e_k'(\hat{B}_{2R}-\mathcal{P}_r'B_{2R})\|_F\,O\Big(\frac{\theta_{r,\max}^3K^{1.5}\kappa(\Pi_r'\Pi_r)}{\theta_{r,\min}^3\sqrt{\lambda_K(\Pi_r'\Pi_r)}}\Big)\\
&\overset{\text{by Lemma A8}}{=}O\Big(\frac{K^{4.5}\theta_{r,\max}^{14}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\lambda_1(\Pi_r'\Pi_r)}{\theta_{r,\min}^{14}\pi_{r,\min}}\Big).
\end{aligned}$$
Remark A4. Similar to Equation (A2), Equation (A3) supports our statement that building the theoretical framework of DiMSC benefits greatly from introducing the DiMSC-equivalence algorithm, since the bound on $\|\hat{B}_{2R}-\mathcal{P}_r'B_{2R}\|_{2\to\infty}$ is obtained from DiMSC-equivalence (i.e., inputting $\hat{U}_{*,2}$ in the SVM-cone algorithm yields the bound on $\|\hat{B}_{2R}-\mathcal{P}_r'B_{2R}\|_{2\to\infty}$).
Then, we have
$$\begin{aligned}
\|e_i'(\hat{Y}_*-Y_*\mathcal{P}_r)\|_F&\leq O\Big(\frac{\varpi\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+\|e_i'U(U'\hat{U})(\hat{B}_R^{-1}-(\mathcal{P}_r'B_R(U'\hat{U}))^{-1})\|_F\\
&\leq O\Big(\frac{\varpi\theta_{r,\max}\sqrt{K\kappa(\Pi_r'\Pi_r)}}{\theta_{r,\min}}\Big)+O\Big(\frac{K^{4.5}\theta_{r,\max}^{14}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\lambda_1(\Pi_r'\Pi_r)}{\theta_{r,\min}^{14}\pi_{r,\min}}\Big)\\
&=O\Big(\frac{K^{4.5}\theta_{r,\max}^{14}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\lambda_1(\Pi_r'\Pi_r)}{\theta_{r,\min}^{14}\pi_{r,\min}}\Big).
\end{aligned}$$
  • For $\|e_i'Y_*\|_F$: since $Y_*=U[U_*(\mathcal{I}_r,:)]^{-1}$, by Lemmas A3 and A4, we have
$$\|e_i'Y_*\|_F\leq\|U(i,:)\|_F\|U_*^{-1}(\mathcal{I}_r,:)\|_F\leq\frac{\sqrt{K}\|U(i,:)\|_F}{\sqrt{\lambda_K(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))}}\leq\frac{\theta_{r,\max}^2\sqrt{K\lambda_1(\Pi_r'\Pi_r)}}{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)}.$$
  • For $\|\hat{J}_*\|_F$: recall that $\hat{J}_*=\mathrm{diag}(\hat{U}_*(\hat{\mathcal{I}}_r,:)\hat{\Lambda}\hat{V}(\hat{\mathcal{I}}_c,:)')$; we have
$$\begin{aligned}
\|\hat{J}_*\|&=\max_{1\leq k\leq K}\hat{J}_*(k,k)=\max_{1\leq k\leq K}e_k'\hat{U}_*(\hat{\mathcal{I}}_r,:)\hat{\Lambda}\hat{V}(\hat{\mathcal{I}}_c,:)'e_k\leq\max_{1\leq k\leq K}\|e_k'\hat{U}_*(\hat{\mathcal{I}}_r,:)\|\,\|\hat{\Lambda}\hat{V}(\hat{\mathcal{I}}_c,:)'e_k\|\\
&\leq\max_{1\leq k\leq K}\|e_k'\hat{U}_*(\hat{\mathcal{I}}_r,:)\|_F\|\hat{U}'A\hat{V}(\hat{\mathcal{I}}_c,:)'e_k\|=\max_{1\leq k\leq K}\|\hat{U}'A\hat{V}(\hat{\mathcal{I}}_c,:)'e_k\|\leq\max_{1\leq k\leq K}\|A(e_k'\hat{V}(\hat{\mathcal{I}}_c,:))'\|\\
&\leq\max_{1\leq k\leq K}\|A\|\|e_k'\hat{V}(\hat{\mathcal{I}}_c,:)\|_F\leq\|A\|\|\hat{V}\|_{2\to\infty}=\|A\|\|\hat{V}\mathrm{sgn}(H_{\hat{V}})-V+V\|_{2\to\infty}\leq\|A\|(\|\hat{V}\mathrm{sgn}(H_{\hat{V}})-V\|_{2\to\infty}+\|V\|_{2\to\infty}),
\end{aligned}$$
    where the second line uses $\hat{\Lambda}\hat{V}'=\hat{U}'A$ and the fact that the rows of $\hat{U}_*$ have unit norm.
    By Lemmas A6 and A5, $\|A\|=\|A-\Omega+\Omega\|\leq\|A-\Omega\|+\sigma_1(\Omega)\leq\|A-\Omega\|+\theta_{r,\max}\sigma_1(P)\sigma_1(\Pi_r)\sigma_1(\Pi_c)=O(\theta_{r,\max}\sigma_1(\Pi_r)\sigma_1(\Pi_c))$. By Lemma A5 and Equation (A1),
    $\|\hat{V}\mathrm{sgn}(H_{\hat{V}})-V\|_{2\to\infty}\leq\frac{C\theta_{r,\max}\sqrt{K\log(n_r+n_c)}}{\theta_{r,\min}\sigma_K(P)\sigma_K(\Pi_r)\sigma_K(\Pi_c)}$. By Lemma A3, $\|V\|_{2\to\infty}\leq\frac{1}{\sqrt{\lambda_K(\Pi_c'\Pi_c)}}$, which gives $\|\hat{V}\mathrm{sgn}(H_{\hat{V}})-V\|_{2\to\infty}+\|V\|_{2\to\infty}=O(\frac{1}{\sqrt{\lambda_K(\Pi_c'\Pi_c)}})$ (this can be seen as simply using $\|V\|_{2\to\infty}$ to estimate $\|\hat{V}\|_{2\to\infty}$ since $\frac{1}{\sqrt{\lambda_K(\Pi_c'\Pi_c)}}$ is of the same order as $\frac{\theta_{r,\max}\sqrt{K\log(n_r+n_c)}}{\theta_{r,\min}\sigma_K(P)\sigma_K(\Pi_r)\sigma_K(\Pi_c)}$). Then, we have $\|\hat{J}_*\|=O(\theta_{r,\max}\sigma_1(\Pi_r)\kappa(\Pi_c))$, which gives $\|\hat{J}_*\|_F=O(\theta_{r,\max}\sqrt{K}\sigma_1(\Pi_r)\kappa(\Pi_c))$.
  • For $\|\hat{J}_*-\mathcal{P}_r'J_*\mathcal{P}_r\|_F$: since $J_*=N_U(\mathcal{I}_r,\mathcal{I}_r)\Theta_r(\mathcal{I}_r,\mathcal{I}_r)$, we have $\|J_*\|\leq N_{U,\max}\theta_{r,\max}\leq\frac{\theta_{r,\max}^2\sqrt{K\lambda_1(\Pi_r'\Pi_r)}}{\theta_{r,\min}}$, which gives $\|J_*\|_F\leq\frac{\theta_{r,\max}^2K\sigma_1(\Pi_r)}{\theta_{r,\min}}$. Thus, we have $\|\hat{J}_*-\mathcal{P}_r'J_*\mathcal{P}_r\|_F=O(\frac{\theta_{r,\max}^2K\sigma_1(\Pi_r)}{\theta_{r,\min}})$.
Combining the above results, we have
$$\begin{aligned}
\|e_i'(\hat{Z}_r-Z_r\mathcal{P}_r)\|_F&\leq\|e_i'(\hat{Y}_*-Y_*\mathcal{P}_r)\|_F\|\hat{J}_*\|_F+\|e_i'Y_*\|_F\|\hat{J}_*-\mathcal{P}_r'J_*\mathcal{P}_r\|_F\\
&=O\Big(\frac{K^{4.5}\theta_{r,\max}^{14}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\lambda_1(\Pi_r'\Pi_r)}{\theta_{r,\min}^{14}\pi_{r,\min}}\Big)O(\theta_{r,\max}\sqrt{K}\sigma_1(\Pi_r)\kappa(\Pi_c))+\frac{\theta_{r,\max}^2\sqrt{K\lambda_1(\Pi_r'\Pi_r)}}{\theta_{r,\min}^2\lambda_K(\Pi_r'\Pi_r)}O\Big(\frac{\theta_{r,\max}^2K\sigma_1(\Pi_r)}{\theta_{r,\min}}\Big)\\
&=O\Big(\frac{K^{5}\theta_{r,\max}^{15}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\kappa(\Pi_c)\lambda_1^{1.5}(\Pi_r'\Pi_r)}{\theta_{r,\min}^{14}\pi_{r,\min}}\Big).
\end{aligned}$$
   □
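The equality $\hat{V}-V(V'\hat{V})=(\hat{V}\hat{V}'-VV')\hat{V}$ (and its analogue for $\hat{U}$) used in the chains above relies only on the orthonormality $\hat{V}'\hat{V}=I_K$. As a quick, purely illustrative check (synthetic orthonormal matrices, not quantities from the paper), the following Python snippet confirms the identity numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 50, 3
V, _ = np.linalg.qr(rng.normal(size=(n, K)))      # columns of V are orthonormal
V_hat, _ = np.linalg.qr(rng.normal(size=(n, K)))  # columns of V_hat are orthonormal

lhs = V_hat - V @ (V.T @ V_hat)
rhs = (V_hat @ V_hat.T - V @ V.T) @ V_hat
print(np.allclose(lhs, rhs))  # True: the identity only needs V_hat' V_hat = I_K
```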

Appendix E.1. Proof of Theorem 2

Proof. 
We bound $\|e_j'(\hat{\Pi}_c-\Pi_c\mathcal{P}_c)\|_1$ first. Recall that $Z_c=\Pi_c$, $\Pi_c(j,:)=\frac{Z_c(j,:)}{\|Z_c(j,:)\|_1}$, and $\hat{\Pi}_c(j,:)=\frac{\hat{Z}_c(j,:)}{\|\hat{Z}_c(j,:)\|_1}$ for $1\leq j\leq n_c$. Since
$$\begin{aligned}
\|e_j'(\hat{\Pi}_c-\Pi_c\mathcal{P}_c)\|_1&=\Big\|\frac{e_j'\hat{Z}_c}{\|e_j'\hat{Z}_c\|_1}-\frac{e_j'Z_c\mathcal{P}_c}{\|e_j'Z_c\mathcal{P}_c\|_1}\Big\|_1=\Big\|\frac{e_j'\hat{Z}_c\|e_j'Z_c\|_1-e_j'Z_c\mathcal{P}_c\|e_j'\hat{Z}_c\|_1}{\|e_j'\hat{Z}_c\|_1\|e_j'Z_c\|_1}\Big\|_1\\
&=\Big\|\frac{e_j'\hat{Z}_c\|e_j'Z_c\|_1-e_j'\hat{Z}_c\|e_j'\hat{Z}_c\|_1+e_j'\hat{Z}_c\|e_j'\hat{Z}_c\|_1-e_j'Z_c\mathcal{P}_c\|e_j'\hat{Z}_c\|_1}{\|e_j'\hat{Z}_c\|_1\|e_j'Z_c\|_1}\Big\|_1\\
&\leq\frac{\big\|e_j'\hat{Z}_c(\|e_j'Z_c\|_1-\|e_j'\hat{Z}_c\|_1)\big\|_1+\big\|(e_j'\hat{Z}_c-e_j'Z_c\mathcal{P}_c)\|e_j'\hat{Z}_c\|_1\big\|_1}{\|e_j'\hat{Z}_c\|_1\|e_j'Z_c\|_1}\\
&=\frac{\big|\|e_j'Z_c\|_1-\|e_j'\hat{Z}_c\|_1\big|+\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_1}{\|e_j'Z_c\|_1}\leq\frac{2\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_1}{\|e_j'Z_c\|_1}\\
&=\frac{2\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_1}{\|e_j'\Pi_c\|_1}=2\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_1\leq2\sqrt{K}\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_F,
\end{aligned}$$
by Lemma A9, we have
$$\|e_j'(\hat{\Pi}_c-\Pi_c\mathcal{P}_c)\|_1=O(\sqrt{K}\|e_j'(\hat{Z}_c-Z_c\mathcal{P}_c)\|_F)=O(\varpi K\kappa(\Pi_c'\Pi_c)\sqrt{\lambda_1(\Pi_c'\Pi_c)}).$$
For row nodes $1\leq i\leq n_r$, recall that $Z_r=Y_*J_*=N_U^{-1}N_M\Pi_r$, $\hat{Z}_r=\hat{Y}_*\hat{J}_*$, $\Pi_r(i,:)=\frac{Z_r(i,:)}{\|Z_r(i,:)\|_1}$, and $\hat{\Pi}_r(i,:)=\frac{\hat{Z}_r(i,:)}{\|\hat{Z}_r(i,:)\|_1}$, where $N_M$ and $M$ are defined in the proof of Lemma 1 such that $U=\Theta_rM=\Theta_r\Pi_rB_r$ and $N_M(i,i)=\frac{1}{\|M(i,:)\|_F}$. Similar to the proof for column nodes, we have
$$\|e_i'(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1\leq\frac{2\|e_i'(\hat{Z}_r-Z_r\mathcal{P}_r)\|_1}{\|e_i'Z_r\|_1}\leq\frac{2\sqrt{K}\|e_i'(\hat{Z}_r-Z_r\mathcal{P}_r)\|_F}{\|e_i'Z_r\|_1}.$$
Now, we provide a lower bound for $\|e_i'Z_r\|_1$ as follows:
$$\begin{aligned}
\|e_i'Z_r\|_1&=\|e_i'N_U^{-1}N_M\Pi_r\|_1=N_U^{-1}(i,i)\|e_i'N_M\Pi_r\|_1=N_U^{-1}(i,i)N_M(i,i)\|e_i'\Pi_r\|_1=\frac{N_M(i,i)}{N_U(i,i)}=\|U(i,:)\|_FN_M(i,i)\\
&=\frac{\|U(i,:)\|_F}{\|M(i,:)\|_F}=\frac{\|U(i,:)\|_F}{\|e_i'M\|_F}=\frac{\|U(i,:)\|_F}{\|e_i'\Theta_r^{-1}U\|_F}=\frac{\|U(i,:)\|_F}{\Theta_r^{-1}(i,i)\|e_i'U\|_F}=\theta_r(i)\geq\theta_{r,\min}.
\end{aligned}$$
Therefore, by Lemma A9, we have
$$\|e_i'(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1\leq\frac{2\sqrt{K}\|e_i'(\hat{Z}_r-Z_r\mathcal{P}_r)\|_F}{\|e_i'Z_r\|_1}\leq\frac{2\sqrt{K}\|e_i'(\hat{Z}_r-Z_r\mathcal{P}_r)\|_F}{\theta_{r,\min}}=O\Big(\frac{K^{5.5}\theta_{r,\max}^{15}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\kappa(\Pi_c)\lambda_1^{1.5}(\Pi_r'\Pi_r)}{\theta_{r,\min}^{15}\pi_{r,\min}}\Big).$$
   □
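The proof of Theorem 2 repeatedly uses the elementary bound $\|\frac{x}{\|x\|_1}-\frac{y}{\|y\|_1}\|_1\leq\frac{2\|x-y\|_1}{\|y\|_1}$ for nonzero vectors $x$ and $y$ (applied to rows of $\hat{Z}$ and $Z\mathcal{P}$). The short numerical check below (random nonnegative vectors, purely illustrative and not data from the paper) confirms the inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    x = np.abs(rng.normal(size=5))  # plays the role of a row of Z_hat
    y = np.abs(rng.normal(size=5))  # plays the role of the matched row of Z P
    lhs = np.abs(x / x.sum() - y / y.sum()).sum()
    rhs = 2 * np.abs(x - y).sum() / y.sum()
    assert lhs <= rhs + 1e-12
print("normalization bound holds on all random trials")
```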

Appendix E.2. Proof of Corollary 1

Proof. 
Under the conditions of Corollary 1, we have
$$\|e_i'(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{K^{5.5}\theta_{r,\max}^{15}\varpi\kappa^{4.5}(\Pi_r'\Pi_r)\kappa(\Pi_c)\lambda_1^{1.5}(\Pi_r'\Pi_r)}{\theta_{r,\min}^{15}\pi_{r,\min}}\Big)=O\Big(\frac{\theta_{r,\max}^{15}\varpi\sqrt{n_r}}{\theta_{r,\min}^{15}}\Big),\qquad\|e_j'(\hat{\Pi}_c-\Pi_c\mathcal{P}_c)\|_1=O(\varpi K\kappa(\Pi_c'\Pi_c)\sqrt{\lambda_1(\Pi_c'\Pi_c)})=O(\varpi\sqrt{n_c}).$$
Under the conditions of Corollary 1, Lemma A7 gives $\varpi=O\Big(\frac{\sqrt{P_{\max}\theta_{r,\max}\log(n_r+n_c)}}{\theta_{r,\min}\sigma_K(P)\sqrt{n_rn_c}}\Big)$, which gives
$$\|e_i'(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{\theta_{r,\max}^{15}\varpi\sqrt{n_r}}{\theta_{r,\min}^{15}}\Big)=O\Big(\frac{\theta_{r,\max}^{15.5}\sqrt{P_{\max}\log(n_r+n_c)}}{\theta_{r,\min}^{16}\sigma_K(P)\sqrt{n_c}}\Big),\qquad\|e_j'(\hat{\Pi}_c-\Pi_c\mathcal{P}_c)\|_1=O(\varpi\sqrt{n_c})=O\Big(\frac{\sqrt{P_{\max}\theta_{r,\max}\log(n_r+n_c)}}{\theta_{r,\min}\sigma_K(P)\sqrt{n_r}}\Big).$$
By basic algebra, this corollary follows.    □

Appendix F. SVM-Cone Algorithm

For readers’ convenience, we briefly introduce the SVM-cone algorithm given in [26] and provide another view that the SVM-cone algorithm exactly recovers $\Pi_r$ when the input is $U_*$ (or $U_{*,2}$). Let $S$ be a matrix whose rows have unit $l_2$ norm and that can be written as $S=HS_C$, where $H\in\mathbb{R}^{n\times K}$ has nonnegative entries, no row of $H$ is $0$, and $S_C\in\mathbb{R}^{K\times m}$ consists of $K$ rows of $S$ (i.e., there exists an index set $\mathcal{I}$ with $K$ entries such that $S_C=S(\mathcal{I},:)$). Inferring $H$ from $S$ is called the ideal cone problem, i.e., Problem 1 in [26]. The ideal cone problem can be solved by applying the one-class SVM to the rows of $S$, and the $K$ rows of $S_C$ are the support vectors found by the one-class SVM:
$$\text{maximize } b\quad\text{s.t.}\quad S(i,:)w\geq b\ (\text{for }i=1,2,\ldots,n)\quad\text{and}\quad\|w\|_F\leq1.$$
The solution $(w,b)$ of the ideal cone problem when $(S_CS_C')^{-1}\mathbf{1}>0$ is given by
$$w=b^{-1}\cdot\frac{S_C'(S_CS_C')^{-1}\mathbf{1}}{\mathbf{1}'(S_CS_C')^{-1}\mathbf{1}},\qquad b=\frac{1}{\sqrt{\mathbf{1}'(S_CS_C')^{-1}\mathbf{1}}}.$$
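As a quick sanity check (not part of the original derivation), this closed-form pair $(w,b)$ can be verified numerically on a small synthetic corner matrix $S_C$: every corner row should lie exactly on the hyperplane, i.e., $S_Cw=b\mathbf{1}$, and $\|w\|_F=1$. The sketch below assumes only NumPy and an arbitrary toy $S_C$:

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 3, 6

# Synthetic corner matrix: K rows with unit l2 norm (nonnegative entries make
# the condition (S_C S_C')^{-1} 1 > 0 easy to satisfy on this toy example).
S_C = np.abs(rng.normal(size=(K, m)))
S_C /= np.linalg.norm(S_C, axis=1, keepdims=True)

G_inv_one = np.linalg.solve(S_C @ S_C.T, np.ones(K))    # (S_C S_C')^{-1} 1
b = 1.0 / np.sqrt(np.ones(K) @ G_inv_one)                # b = 1 / sqrt(1'(S_C S_C')^{-1} 1)
w = (S_C.T @ G_inv_one) / (np.ones(K) @ G_inv_one) / b   # w = b^{-1} S_C'(S_C S_C')^{-1} 1 / (1'(S_C S_C')^{-1} 1)

print(np.allclose(S_C @ w, b))           # corner rows lie exactly on the hyperplane
print(np.isclose(np.linalg.norm(w), 1))  # and w has unit norm
```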
For the empirical case, let $\hat{S}\in\mathbb{R}^{n\times m}$ be a matrix whose rows all have unit $l_2$ norm; inferring $H$ from $\hat{S}$ with $K$ given is called the empirical cone problem, i.e., Problem 2 in [26]. For the empirical cone problem, the one-class SVM is applied to all rows of $\hat{S}$ to obtain estimates $\hat{w}$ and $\hat{b}$ of $w$ and $b$. Then, the K-means algorithm is applied to the rows of $\hat{S}$ that are close to the hyperplane, grouping them into $K$ clusters, and an estimate of the index set $\mathcal{I}$ is obtained from these $K$ clusters. Algorithm A3 below is the SVM-cone algorithm provided in [26].
Algorithm A3: SVM-cone [26]
Require: 
$\hat{S}\in\mathbb{R}^{n\times m}$ with rows having unit $l_2$ norm, number of corners $K$, estimated distance $\gamma$ of the corners from the hyperplane.
Ensure: 
The near-corner index set $\hat{\mathcal{I}}$.
1:
Run the one-class SVM on the rows $\hat{S}(i,:)$ to obtain $\hat{w}$ and $\hat{b}$.
2:
Run the K-means algorithm on the set $\{\hat{S}(i,:)\,|\,\hat{S}(i,:)\hat{w}\leq\hat{b}+\gamma\}$ of rows that are close to the hyperplane, grouping them into $K$ clusters.
3:
Pick one point from each cluster to obtain the near-corner set $\hat{\mathcal{I}}$.
As suggested in [26], we can start with $\gamma=0$ and incrementally increase it until $K$ distinct clusters are found.
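The steps above can be put together in a few lines. The following Python sketch is a minimal, hypothetical implementation of Algorithm A3 (it is not the code used in this paper): it assumes scikit-learn's `OneClassSVM` with a linear kernel and a small `nu` as a soft-margin stand-in for the hard-margin one-class SVM in Equation (A5), `KMeans` for Step 2, and an arbitrary increment for $\gamma$. For DiMSC, $\hat{S}$ would be $\hat{U}_*$ (or $\hat{U}_{*,2}$) and $K$ the number of communities.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

def svm_cone(S_hat, K, gamma=0.0):
    """Return one near-corner row index per cluster (a sketch of Algorithm A3).

    S_hat : (n, m) array whose rows have unit l2 norm.
    K     : number of corners.
    gamma : tolerance for the distance of a row to the separating hyperplane.
    """
    # Step 1: one-class SVM on the rows of S_hat; with a linear kernel and a
    # small nu this approximates max b s.t. S(i,:) w >= b, ||w|| <= 1.
    ocsvm = OneClassSVM(kernel="linear", nu=1e-3).fit(S_hat)
    w = ocsvm.coef_.ravel()
    b = -ocsvm.intercept_[0]
    w_norm = np.linalg.norm(w)
    w, b = w / w_norm, b / w_norm  # rescale so that ||w|| = 1

    # Step 2: keep the rows close to the hyperplane; as suggested in [26],
    # start with gamma = 0 and increase it until at least K rows are selected.
    margins = S_hat @ w
    near = np.where(margins <= b + gamma)[0]
    while near.size < K:
        gamma += 0.01
        near = np.where(margins <= b + gamma)[0]
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(S_hat[near])

    # Step 3: pick one point from each cluster as the near-corner set.
    return np.array([near[labels == k][0] for k in range(K)])
```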
Now, we turn to our DiMSC algorithm and focus on estimating $\mathcal{I}_r$ given $U_*$, $U_{*,2}$, and $K$. By Lemmas 1 and A1, we know that $U_*$ and $U_{*,2}$ enjoy the ideal cone structure, and Lemma 2 guarantees that the one-class SVM can be applied to the rows of $U_*$ and $U_{*,2}$. Set $w_1=b_1^{-1}\frac{U_*'(\mathcal{I}_r,:)(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))^{-1}\mathbf{1}}{\mathbf{1}'(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))^{-1}\mathbf{1}}$, $b_1=\frac{1}{\sqrt{\mathbf{1}'(U_*(\mathcal{I}_r,:)U_*'(\mathcal{I}_r,:))^{-1}\mathbf{1}}}$, and $w_2=b_2^{-1}\frac{U_{*,2}'(\mathcal{I}_r,:)(U_{*,2}(\mathcal{I}_r,:)U_{*,2}'(\mathcal{I}_r,:))^{-1}\mathbf{1}}{\mathbf{1}'(U_{*,2}(\mathcal{I}_r,:)U_{*,2}'(\mathcal{I}_r,:))^{-1}\mathbf{1}}$, $b_2=\frac{1}{\sqrt{\mathbf{1}'(U_{*,2}(\mathcal{I}_r,:)U_{*,2}'(\mathcal{I}_r,:))^{-1}\mathbf{1}}}$. Note that $w_1$ and $b_1$ are the solutions of the one-class SVM in Equation (A5) when $S=U_*$, and $w_2$ and $b_2$ are the solutions of the one-class SVM in Equation (A5) when $S=U_{*,2}$. Lemma A11 says that if row node $i$ is a pure node, we have $U_*(i,:)w_1=b_1$. This suggests that, in the SVM-cone algorithm with input matrix $U_*$ and $\gamma=0$, we can find all pure row nodes; i.e., the set $\{U_*(i,:)\,|\,U_*(i,:)w_1=b_1\}$ contains all rows of $U_*$ corresponding to pure row nodes while excluding mixed row nodes. By Lemma 1, these pure row nodes belong to the $K$ distinct row communities such that if row nodes $i,\bar{i}$ are in the same row community, then $U_*(i,:)=U_*(\bar{i},:)$. This is why we need to apply the K-means algorithm to the set obtained in Step 2 of the SVM-cone algorithm to obtain the $K$ distinct row communities, and it is also why we say that the SVM-cone algorithm returns the index set $\mathcal{I}_r$ exactly when the input is $U_*$. These conclusions also hold when the input of the SVM-cone algorithm is $U_{*,2}$.
Lemma A10. 
Under $DiDCMM_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, for $1\leq i\leq n_r$, $U_*(i,:)$ can be written as $U_*(i,:)=r_1(i)\Phi_1(i,:)U_*(\mathcal{I}_r,:)$, where $r_1(i)\geq1$. Meanwhile, $r_1(i)=1$ and $\Phi_1(i,:)=e_k'$ if $i$ is a pure node such that $\Pi_r(i,k)=1$; $r_1(i)>1$ and $\Phi_1(i,:)\neq e_k'$ if $\Pi_r(i,k)<1$ for $1\leq k\leq K$. Similarly, $U_{*,2}(i,:)$ can be written as $U_{*,2}(i,:)=r_2(i)\Phi_2(i,:)U_{*,2}(\mathcal{I}_r,:)$, where $r_2(i)\geq1$. Meanwhile, $r_2(i)=1$ and $\Phi_2(i,:)=e_k'$ if $\Pi_r(i,k)=1$; $r_2(i)>1$ and $\Phi_2(i,:)\neq e_k'$ if $\Pi_r(i,k)<1$ for $1\leq k\leq K$.
Proof. 
Since $U_*=YU_*(\mathcal{I}_r,:)$ by Lemma 1, for $1\leq i\leq n_r$, we have
$$U_*(i,:)=Y(i,:)U_*(\mathcal{I}_r,:)=\|Y(i,:)\|_1\frac{Y(i,:)}{\|Y(i,:)\|_1}U_*(\mathcal{I}_r,:)=r_1(i)\Phi_1(i,:)U_*(\mathcal{I}_r,:),$$
where we set $r_1(i)=\|Y(i,:)\|_1$ and $\Phi_1(i,:)=\frac{Y(i,:)}{\|Y(i,:)\|_1}$, and $\mathbf{1}$ is a $K\times1$ vector with all entries being ones.
By the proof of Lemma 1, $Y(i,:)=\frac{\Pi_r(i,:)\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)}{\|M(i,:)\|_F}$, where $M=\Pi_r\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)U(\mathcal{I}_r,:)$. For convenience, set $T=\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)$, $Q=N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)$, and $R=U(\mathcal{I}_r,:)$ (this choice of $T,Q,R$ is only for notational convenience in the proof of Lemma A10).
On the one hand, if row node $i$ is pure such that $\Pi_r(i,k)=1$ for a certain $k$ among $\{1,2,\ldots,K\}$ (i.e., $\Pi_r(i,:)=e_k'$ when $\Pi_r(i,k)=1$), we have $M(i,:)=\Pi_r(i,:)\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)U(\mathcal{I}_r,:)=T(k,k)R(k,:)$ and $\Pi_r(i,:)TQ=T(k,k)Q(k,:)$, which give $Y(i,:)=\frac{T(k,k)Q(k,:)}{\|T(k,k)R(k,:)\|_F}=\frac{Q(k,:)}{\|R(k,:)\|_F}$. Recall that the $k$-th diagonal entry of $N_U^{-1}(\mathcal{I}_r,\mathcal{I}_r)$ is $\|[U(\mathcal{I}_r,:)](k,:)\|_F$, i.e., $\|Q(k,:)\|_1=\|R(k,:)\|_F$, which gives $r_1(i)=\|Y(i,:)\|_1=1$ and $\Phi_1(i,:)=e_k'$ when $\Pi_r(i,k)=1$.
On the other hand, if $i$ is a mixed node, since $\|M(i,:)\|_F=\|\Pi_r(i,:)\Theta_r^{-1}(\mathcal{I}_r,\mathcal{I}_r)U(\mathcal{I}_r,:)\|_F=\|\sum_{k=1}^K\Pi_r(i,k)T(k,k)R(k,:)\|_F<\sum_{k=1}^K\Pi_r(i,k)T(k,k)\|R(k,:)\|_F=\sum_{k=1}^K\Pi_r(i,k)T(k,k)Q(k,k)$, combining this with $\|\Pi_r(i,:)TQ\|_1=\sum_{k=1}^K\Pi_r(i,k)T(k,k)Q(k,k)$ gives $r_1(i)=\|Y(i,:)\|_1=\frac{\|\Pi_r(i,:)TQ\|_1}{\|M(i,:)\|_F}>1$. The lemma follows by a similar analysis for $U_{*,2}$. □
Lemma A11. 
Under $DiDCMM_{n_r,n_c}(K,P,\Pi_r,\Pi_c,\Theta_r)$, for $1\leq i\leq n_r$, if row node $i$ is a pure node such that $\Pi_r(i,k)=1$ for a certain $k$, we have
$$U_*(i,:)w_1=b_1\quad\text{and}\quad U_{*,2}(i,:)w_2=b_2.$$
Meanwhile, if row node i is a mixed node, the above equalities do not hold.
Proof. 
For the claim that $U_*(i,:)w_1=b_1$ holds when $i$ is pure: by Lemma A10, when $i$ is a pure node such that $\Pi_r(i,k)=1$, $U_*(i,:)$ can be written as $U_*(i,:)=e_k'U_*(\mathcal{I}_r,:)$, so $U_*(i,:)w_1=b_1$ holds immediately. When $i$ is a mixed node, by Lemma A10, $r_1(i)>1$ and $\Phi_1(i,:)\neq e_k'$ for any $k=1,2,\ldots,K$; hence $U_*(i,:)\neq e_k'U_*(\mathcal{I}_r,:)$ if $i$ is mixed, which gives the result (indeed, $U_*(i,:)w_1=r_1(i)\Phi_1(i,:)U_*(\mathcal{I}_r,:)w_1=r_1(i)b_1\Phi_1(i,:)\mathbf{1}=r_1(i)b_1>b_1$ since $r_1(i)>1$). Following a similar analysis, we obtain the results associated with $U_{*,2}$, and the lemma follows. □
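To make Lemma A11 concrete, here is a small synthetic illustration (hypothetical membership matrix and corner matrix, not data from the paper): rows of the row-normalized cone matrix built from pure memberships satisfy $S(i,:)w=b$ up to machine precision, while rows built from mixed memberships give a strictly larger value, so thresholding at the hyperplane separates pure from mixed nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
K, m = 3, 6

# Corner matrix with unit l2-norm rows (same toy construction as before).
S_C = np.abs(rng.normal(size=(K, m)))
S_C /= np.linalg.norm(S_C, axis=1, keepdims=True)

# Hypothetical memberships: three pure rows and two mixed rows.
Pi = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.5, 0.5, 0.0],
               [0.2, 0.3, 0.5]])

# Row-normalized cone matrix S = H S_C, as in the ideal cone problem.
X = Pi @ S_C
S = X / np.linalg.norm(X, axis=1, keepdims=True)

# Closed-form (w, b) from the corner matrix, as in Appendix F.
G_inv_one = np.linalg.solve(S_C @ S_C.T, np.ones(K))
b = 1.0 / np.sqrt(np.ones(K) @ G_inv_one)
w = (S_C.T @ G_inv_one) / np.sqrt(np.ones(K) @ G_inv_one)

print(np.round(S @ w - b, 6))  # ~0 for the three pure rows, > 0 for the mixed rows
```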

References

  1. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174. [Google Scholar] [CrossRef]
  2. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44. [Google Scholar] [CrossRef]
  3. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137. [Google Scholar] [CrossRef]
  4. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 16107. [Google Scholar] [CrossRef]
  5. Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915. [Google Scholar] [CrossRef]
  6. Zhao, Y.; Levina, E.; Zhu, J. Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Stat. 2012, 40, 2266–2292. [Google Scholar] [CrossRef]
  7. Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128. [Google Scholar]
  8. Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89. [Google Scholar] [CrossRef]
  9. Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237. [Google Scholar] [CrossRef]
  10. Cai, T.T.; Li, X. Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Stat. 2015, 43, 1027–1059. [Google Scholar] [CrossRef]
  11. Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791. [Google Scholar] [CrossRef]
  12. Chen, Y.; Li, X.; Xu, J. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Stat. 2018, 46, 1573–1602. [Google Scholar] [CrossRef]
  13. Passino, F.S.; Heard, N.A. Bayesian estimation of the latent dimension and communities in stochastic blockmodels. Stat. Comput. 2020, 30, 1291–1307. [Google Scholar] [CrossRef]
  14. Li, X.; Chen, Y.; Xu, J. Convex Relaxation Methods for Community Detection. Stat. Sci. 2021, 36, 2–15. [Google Scholar] [CrossRef]
  15. Jing, B.; Li, T.; Ying, N.; Yu, X. Community detection in sparse networks using the symmetrized Laplacian inverse matrix (SLIM). Stat. Sin. 2022, 32, 1–22. [Google Scholar] [CrossRef]
  16. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531. [Google Scholar]
  17. Airoldi, E.M.; Blei, D.M.; Fienberg, S.E.; Xing, E.P. Mixed Membership Stochastic Blockmodels. J. Mach. Learn. Res. 2008, 9, 1981–2014. [Google Scholar]
  18. Ball, B.; Karrer, B.; Newman, M.E.J. Efficient and principled method for detecting communities in networks. Phys. Rev. E 2011, 84, 36103. [Google Scholar] [CrossRef]
  19. Wang, F.; Li, T.; Wang, X.; Zhu, S.; Ding, C. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 2011, 22, 493–521. [Google Scholar] [CrossRef]
  20. Gopalan, P.K.; Blei, D.M. Efficient discovery of overlapping communities in massive networks. Proc. Natl. Acad. Sci. USA 2013, 110, 14534–14539. [Google Scholar] [CrossRef]
  21. Anandkumar, A.; Ge, R.; Hsu, D.; Kakade, S.M. A tensor approach to learning mixed membership community models. J. Mach. Learn. Res. 2014, 15, 2239–2312. [Google Scholar]
  22. Kaufmann, E.; Bonald, T.; Lelarge, M. A spectral algorithm with additive clustering for the recovery of overlapping communities in networks. Theor. Comput. Sci. 2017, 742, 3–26. [Google Scholar] [CrossRef]
  23. Panov, M.; Slavnov, K.; Ushakov, R. Consistent estimation of mixed memberships with successive projections. In Proceedings of the International Conference on Complex Networks and their Applications, Lyon, France, 29 November–1 December 2017; Springer: Cham, Switzerland, 2017; pp. 53–64. [Google Scholar]
  24. Jin, J.; Ke, Z.T.; Luo, S. Mixed membership estimation for social networks. arXiv 2017, arXiv:1708.07852. [Google Scholar] [CrossRef]
  25. Mao, X.; Sarkar, P.; Chakrabarti, D. On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2324–2333. [Google Scholar]
  26. Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136. [Google Scholar]
  27. Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating Mixed Memberships With Sharp Eigenvector Deviations. J. Am. Stat. Assoc. 2020, 116, 1928–1940. [Google Scholar] [CrossRef] [Green Version]
  28. Wang, Y.; Bu, Z.; Yang, H.; Li, H.J.; Cao, J. An effective and scalable overlapping community detection approach: Integrating social identity model and game theory. Appl. Math. Comput. 2021, 390, 125601. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Levina, E.; Zhu, J. Detecting Overlapping Communities in Networks Using Spectral Methods. SIAM J. Math. Data Sci. 2020, 2, 265–283. [Google Scholar] [CrossRef]
  30. Rohe, K.; Qin, T.; Yu, B. Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 2016, 113, 12679–12684. [Google Scholar] [CrossRef]
  31. Wang, Z.; Liang, Y.; Ji, P. Spectral Algorithms for Community Detection in Directed Networks. J. Mach. Learn. Res. 2020, 21, 1–45. [Google Scholar]
  32. Ji, P.; Jin, J. Coauthorship and citation networks for statisticians. Ann. Appl. Stat. 2016, 10, 1779–1812. [Google Scholar] [CrossRef]
  33. Zhou, Z.; Amini, A.A. Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 2019, 20, 1–47. [Google Scholar]
  34. Laenen, S.; Sun, H. Higher-order spectral clustering of directed graphs. Adv. Neural Inf. Process. Syst. 2020, 33, 941–951. [Google Scholar]
  35. Qing, H.; Wang, J. Directed mixed membership stochastic blockmodel. arXiv 2021, arXiv:2101.02307. [Google Scholar]
  36. Wang, Y.J.; Wong, G.Y. Stochastic Blockmodels for Directed Graphs. J. Am. Stat. Assoc. 1987, 82, 8–19. [Google Scholar] [CrossRef]
  37. Fagiolo, G. Clustering in complex directed networks. Phys. Rev. E 2007, 76, 026107. [Google Scholar] [CrossRef] [PubMed]
  38. Leicht, E.A.; Newman, M.E. Community structure in directed networks. Phys. Rev. Lett. 2008, 100, 118703. [Google Scholar] [CrossRef] [PubMed]
  39. Kim, Y.; Son, S.W.; Jeong, H. Finding communities in directed networks. Phys. Rev. E 2010, 81, 016103. [Google Scholar] [CrossRef]
  40. Malliaros, F.D.; Vazirgiannis, M. Clustering and Community Detection in Directed Networks: A Survey. Phys. Rep. 2013, 533, 95–142. [Google Scholar] [CrossRef]
  41. Zhang, X.; Lian, B.; Lewis, F.L.; Wan, Y.; Cheng, D. Directed Graph Clustering Algorithms, Topology, and Weak Links. IEEE Trans. Syst. Man, Cybern. Syst. 2021, 52, 3995–4009. [Google Scholar] [CrossRef]
  42. Zhang, J.; Wang, J. Identifiability and parameter estimation of the overlapped stochastic co-block model. Stat. Comput. 2022, 32, 1–14. [Google Scholar] [CrossRef]
  43. Florescu, L.; Perkins, W. Spectral thresholds in the bipartite stochastic block model. In Proceedings of the Conference on Learning Theory. PMLR, New York, NY, USA, 23–26 June 2016; pp. 943–959. [Google Scholar]
  44. Neumann, S. Bipartite stochastic block models with tiny clusters. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
  45. Ndaoud, M.; Sigalla, S.; Tsybakov, A.B. Improved clustering algorithms for the bipartite stochastic block model. IEEE Trans. Inf. Theory 2021, 68, 1960–1975. [Google Scholar] [CrossRef]
  46. Mantzaris, A.V. Uncovering nodes that spread information between communities in social networks. EPJ Data Sci. 2014, 3, 1–17. [Google Scholar] [CrossRef]
  47. McSherry, F. Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, Newport Beach, CA, USA, 8–11 October 2001; pp. 529–537. [Google Scholar]
  48. Massoulié, L. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 31 May–3 June 2014; pp. 694–703. [Google Scholar]
  49. Mossel, E.; Neeman, J.; Sly, A. Reconstruction and estimation in the planted partition model. Probab. Theory Relat. Fields 2015, 162, 431–461. [Google Scholar] [CrossRef]
  50. Abbe, E.; Bandeira, A.S.; Hall, G. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 2015, 62, 471–487. [Google Scholar] [CrossRef]
  51. Hajek, B.; Wu, Y.; Xu, J. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inf. Theory 2016, 62, 2788–2797. [Google Scholar] [CrossRef]
  52. Mossel, E.; Neeman, J.; Sly, A. A proof of the block model threshold conjecture. Combinatorica 2018, 38, 665–708. [Google Scholar] [CrossRef]
  53. Qing, H. Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models. Entropy 2022, 24, 1216. [Google Scholar] [CrossRef]
  54. Gillis, N.; Vavasis, S.A. Semidefinite Programming Based Preconditioning for More Robust Near-Separable Nonnegative Matrix Factorization. SIAM J. Optim. 2015, 25, 677–698. [Google Scholar] [CrossRef]
  55. Qing, H. A useful criterion on studying consistent estimation in community detection. Entropy 2022, 24, 1098. [Google Scholar] [CrossRef] [PubMed]
  56. Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
  57. Ke, Z.T.; Jin, J. The SCORE normalization, especially for highly heterogeneous network and text data. arXiv 2022, arXiv:2204.11097. [Google Scholar]
  58. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
  59. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (Tist) 2011, 2, 1–27. [Google Scholar] [CrossRef]
  60. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678. [Google Scholar] [CrossRef] [PubMed]
  61. Palmer, W.R.; Zheng, T. Spectral clustering for directed networks. In Proceedings of the International Conference on Complex Networks and Their Applications, Madrid, Spain, 1–3 December 2020; Springer: Cham, Switzerland, 2020; pp. 87–99. [Google Scholar]
  62. Qing, H. Degree-corrected distribution-free model for community detection in weighted networks. Sci. Rep. 2022, 12, 15153. [Google Scholar] [CrossRef] [PubMed]
  63. Erdös, P.; Rényi, A. On the evolution of random graphs. In The Structure and Dynamics of Networks; Princeton University Press: Princeton, NJ, USA, 2011; pp. 38–82. [Google Scholar] [CrossRef]
  64. Chen, Y.; Chi, Y.; Fan, J.; Ma, C. Spectral Methods for Data Science: A Statistical Perspective. Found. Trends Mach. Learn. 2021, 14, 566–806. [Google Scholar] [CrossRef]
  65. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  66. Žalik, K.R.; Žalik, B. Memetic algorithm using node entropy and partition entropy for community detection in networks. Inf. Sci. 2018, 445, 38–49. [Google Scholar] [CrossRef]
  67. Feutrill, A.; Roughan, M. A review of Shannon and differential entropy rate estimation. Entropy 2021, 23, 1046. [Google Scholar] [CrossRef]
  68. Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43. [Google Scholar]
  69. Kunegis, J. Konect: The koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350. [Google Scholar]
  70. Zhang, H.; Guo, X.; Chang, X. Randomized spectral clustering in large-scale stochastic block models. J. Comput. Graph. Stat. 2022, 31, 887–906. [Google Scholar] [CrossRef]
  71. Tropp, J.A. User-Friendly Tail Bounds for Sums of Random Matrices. Found. Comput. Math. 2012, 12, 389–434. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Illustration for directed network and bipartite network. Panel (a): directed network; Panel (b): bipartite network.
Figure 2. Panel (a): plot of U_* and the hyperplane formed by U_*(I_r,:). Blue points denote rows respective to mixed row nodes of U_*, and black points denote the K rows of the corner matrix U_*(I_r,:). The plane in Panel (a) is the hyperplane formed by the triangle of the 3 rows of U_*(I_r,:). Panel (b): plot of V and the ideal simplex formed by V(I_c,:). Blue points denote rows respective to mixed column nodes of V, and black points denote the K rows of the corner matrix V(I_c,:). Since K = 3, for visualization, we have projected these points from R^3 to R^2.
Figure 3. Flowchart of Algorithm 1.
Figure 4. Errors against increasing n_0. y-axis: MHamm. Panels (a)–(d): Experiments 1 (a)–(d).
Figure 5. Errors against increasing z. y-axis: MHamm. Panels (a)–(h): Experiments 2 (a)–(h).
Figure 6. Errors against increasing β. y-axis: MHamm. Panels (a)–(h): Experiments 3 (a)–(h).
Figure 7. Errors against increasing ρ. y-axis: MHamm. Panels (a)–(h): Experiments 4 (a)–(h).
Figure 8. Phase transition for DiMSC: darker pixels represent lower error rates. The red lines represent |α_in − α_out|/max(α_in, α_out) = 1. Panel (a): Experiment 5 (a); Panel (b): Experiment 5 (b).
Figure 9. Numerical results for Experiment 6. Panel (a): MHamm; Panel (b): running time.
Figure 10. Illustration for a directed network under Model Setup 1. Panel (a): Adjacency matrix of N, where a black square denotes 1; Panel (b): directed network N, where red (blue) points indicate row (column) nodes. The error rate MHamm defined in Equation (16) of our DiMSC algorithm for this directed network N is 0.0377.
Figure 11. Illustration for a directed network under Model Setup 2. Panel (a): Adjacency matrix A; Panel (b): directed network N. MHamm of DiMSC for this directed network N is 0.0424.
Figure 12. Illustration for a bipartite network under Model Setup 3. Panel (a): Adjacency matrix A; Panel (b): bipartite network N. MHamm of DiMSC for this bipartite network N is 0.0313.
Figure 13. Illustration for a bipartite network under Model Setup 4. Panel (a): Adjacency matrix A; Panel (b): bipartite network N. MHamm of DiMSC for this bipartite network N is 0.0320.
Figure 14. Illustration for a directed network under Model Setup 5. Panels (a,b) show the row and column communities, respectively. In these two panels, dots in the same color are pure nodes in the same communities, and a square indicates mixed nodes. MHamm of DiMSC for this directed network N is 0.0181.
Figure 15. Illustration for a directed network under Model Setup 6. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network N is 0.0185.
Figure 16. Illustration for a directed network under Model Setup 7. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network N is 0.0266.
Figure 17. Illustration for a directed network under Model Setup 8. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network N is 0.0279.
Figure 18. Leading 20 singular values of real-world directed networks used in this paper. Panel (a): political blogs A_1; Panel (b): political blogs A_3; Panel (c): political blogs A_6; Panel (d): political blogs A_9; Panel (e): Wikipedia links (gan) A_1; Panel (f): Wikipedia links (gan) A_30; Panel (g): Wikipedia links (gan) A_60; Panel (h): Wikipedia links (gan) A_90; Panel (i): Wikipedia links (nah) A_1; Panel (j): Wikipedia links (nah) A_20; Panel (k): Wikipedia links (nah) A_30; Panel (l): Wikipedia links (nah) A_40.
Figure 19. Row and column communities detected by DiMSC for political blogs. Colors indicate clusters, and a green square indicates highly mixed nodes, where the row and column communities are obtained from the estimated memberships, respectively. Panels (a,b): political blogs A_1; Panels (c,d): political blogs A_3; Panels (e,f): political blogs A_6; Panels (g,h): political blogs A_9.
Figure 20. Row and column communities detected by DiMSC for Wikipedia links (gan) A_90 and Wikipedia links (nah) A_40. Colors indicate clusters, where the row and column communities are obtained from the estimated memberships, respectively. Panels (a,b): Wikipedia links (gan) A_90; Panels (c,d): Wikipedia links (nah) A_40.
Table 1. τ_r, τ_c, and Hamm_rc obtained from DiMSC for the real-world directed networks used in this paper.

Data                          τ_r      τ_c      Hamm_rc
Political blogs A_1           0.0455   0.1353   0.0893
Political blogs A_3           0.0481   0.1570   0.0705
Political blogs A_6           0.0386   0.1368   0.0662
Political blogs A_9           0.0443   0.1772   0.0771
Wikipedia links (gan) A_1     0.1505   0.6051   0.3528
Wikipedia links (gan) A_30    0.0817   0.1902   0.0547
Wikipedia links (gan) A_60    0.0054   0.1145   0.0664
Wikipedia links (gan) A_90    0        0        0.0203
Wikipedia links (nah) A_1     0.2718   0.3521   0.2065
Wikipedia links (nah) A_20    0.0937   0.1722   0.0488
Wikipedia links (nah) A_30    0        0        0.0046
Wikipedia links (nah) A_40    0        0        0