Community Detection in Multiplex Networks Based on Orthogonal Nonnegative Matrix Tri-Factorization

Networks are commonly used to model complex systems. The different entities in the system are represented by nodes of the network and their interactions by edges. In most real life systems, the different entities may interact in different ways necessitating the use of multiplex networks where multiple links are used to model the interactions. One of the major tools for inferring network topology is community detection. Although there are numerous works on community detection in single-layer networks, existing community detection methods for multiplex networks mostly learn a common community structure across layers and do not take the heterogeneity across layers into account. In this paper, we introduce a new multiplex community detection method that identifies communities that are common across layers as well as those that are unique to each layer. The proposed method, Multiplex Orthogonal Nonnegative Matrix Tri-Factorization, represents the adjacency matrix of each layer as the sum of two low-rank matrix factorizations corresponding to the common and private communities, respectively. Unlike most of the existing methods which require the number of communities to be pre-determined, the proposed method also introduces a two stage method to determine the number of common and private communities. The proposed algorithm is evaluated on synthetic and real multiplex networks, as well as for multiview clustering applications, and compared to state-of-the-art techniques.


I. INTRODUCTION
C OMPLEX networks are usually used to represent many real world systems, ranging from social to biological ones [1], where the different agents and their relations are represented as the nodes and edges of the network, respectively.Traditional network models employ simple graphs, where there is a single edge between any two nodes.Thus, these models cannot capture multiple modes of interaction that may exist between the nodes.Recently, multiplex networks that represent multiple modes of interaction have been proposed.A multiplex network is a multilayer network where all layers share the same set of nodes with different topologies [2].Multiplex networks have been used to model a variety of complex systems, including living organisms, human societies, transportation systems and critical infrastructures [3], [4].
Community detection is an important tool in network analysis, where communities are defined as groups of nodes that are more densely connected to each other than they are to the rest of the network.Most of the existing work on community detection [5] focuses on single layer networks.Community detection methods for multiplex networks [6] can be grouped into three main classes.The first class of methods merges the layers in a multiplex network, using a flattening algorithm, then apply single-layer community detection to the aggregated network [7], [8], [9].While these methods are computationally efficient, they can only identify communities that are common across all layers.Moreover, due to the flattening process, some spurious communities may emerge.The second class of methods applies community detection to each layer individually and then merges the results [10], [11], [12].These methods include nodes in the same community only when they are part of the same community in at least one layer.Finally, the third class of methods operates directly on the multiplex network model [13], [14], [15], [16], [17], [18].
Existing multiplex community detection approaches typically assume that the community structure is the same across layers and find the partition that best fits all layers.Thus, they do not differentiate between communities that are common across layers from those that are unique to each layer.This is particularly important for real world applications where the networks are heterogeneous, and the different layers correspond to different modes of interaction.For example, in social networks, a group of individuals may be well connected via friendships on Facebook; however, this group of individuals will likely not work at the same company.Thus, in a situation like this, a given community will only be present in a subset of the layers, and different communities may be present in different subsets of layers.
In this work, common communities are defined as communities that are observed in more than one layer, i.e., communities that are common across any subset of two or more layers, and private communities as communities that are unique to each layer.The problem of detecting common and private communities is then formulated using a novel framework titled Multiplex Orthogonal Nonnegative Matrix Trifactorization (MX-ONMTF).In the proposed framework, each layer's adjacency matrix is represented as the sum of two low-rank matrix factorizations corresponding to the common and private communities, respectively.The resulting optimization problem is solved using an iterative multiplicative update algorithm.The proposed approach also addresses the problem of determining the number of communities.Unlike most existing work, where the number of communities is determined through a greedy search, in this paper, a twostep approach is proposed.The proposed algorithm is first evaluated on synthetic benchmark multiplex networks with arXiv:2205.00626v2[cs.SI] 8 Jan 2023 different numbers of layers, nodes, communities, noise levels, and inter-layer dependency probability.Next, the proposed method is applied to real networks including social and biological networks.Finally, the algorithm is evaluated for multiview clustering task, where the communities across all layers are assumed to be the same.
This paper extends our prior work [19] where the community detection problem is formulated for two-layer multiplex networks.This work introduces significant new contributions.First, MX-ONMTF is generalized for multiplex networks with L layers where the common communities can be observed for a subset of layers.Second, a new two stage method is introduced to determine the number of common communities such that communities that are common across two or more layers can be identified.A detailed theoretical analysis of the convergence of the algorithm and recovery guarantees is also provided.Finally, this paper includes an extensive evaluation of the method on synthetic and real networks, and multiview clustering applications.
The rest of the paper is organized as follows.Section II presents a summary of related works.Section III provides background on community detection, multiplex networks, and orthogonal nonnegative matrix tri-factorization.Sections IV and V present the proposed multiplex community detection algorithm and its convergence analysis.Section VI establishes the theoretical properties of the algorithm, while Section VII illustrates results on both simulated and real networks.Finally, Section VIII provides conclusions and discussion on future work.

II. RELATED WORKS
The method proposed in this paper belongs to the third class of algorithms, which operate directly on the multiplex network model.There are different types of algorithms that fall in this class: random walk, statistical generative network models, label propagation, objective function optimization, and Nonnegative Matrix Factorization (NMF).
Methods based on random walkers model the dynamic process on networks as random walks where the process is more likely to persist on the vertices in the same community and far less on the vertices in different communities.For instance, LART [13] is initialized by assigning each node in each layer to its own community.Hierarchical clustering is then used to merge nodes based on a distance matrix.The partition with the highest multiplex modularity is chosen.In [14], Infomap which is based on a compression of network flows is proposed to identify communities within and across layers.However, Infomap tends to assign each physical node across layers to the same community, not differentiating the topological differences across layers.
Statistical methods such as [20] use Weighted Stochastic Block Model (WSBM) to detect common and private communities in heterogeneous weighted networks.Although this method addresses the heterogeneity of networks across layers, the method is limited to detecting only common communities that are shared by all layers, ignoring communities that may be shared by only a subset or different subsets of layers.
In [21], authors propose a generative model and an expectation maximization algorithm for community detection and link prediction in multilayer networks.Although the method allows for different connectivity patterns in each layer, the interdependence between layers is only taken into account for link prediction, while the layers are assumed to share a common community structure.
The third class of methods, Label Propagation Algorithms (LPA), is based on the intuition that a label can become dominant in a densely connected group of nodes but will have trouble crossing a sparsely connected region.In [22], an LPAbased method for community detection in multidimensional networks is proposed to identify communities and the subset of layers in which each of these communities is observed, simultaneously.However, this algorithm fails to detect communities that are private to each layer and communities that may be common among a small number of layers.
The fourth type of multiplex community detection methods is based on defining an objective function and identifying the community structure that maximizes/minimizes the objective function.For example, Generalized Louvain (GenLouvain) [15] uses an extended definition of modularity and is one of the fastest methods for community detection in multiplex networks.As GenLouvain assigns each node-layer tuple to its own community, it cannot identify common communities across layers.More recently, multiobjective genetic and evolutionary algorithms such as MultiMOGA [16] and MOEA/D-TS [23] have been used to jointly maximize the modularity of each layer and the similarity between the community structures across layers.These methods find a shared community structure across all layers, not differentiating communities that may be unique to each layer.In [24], extension of normalized cut to multiplex networks is proposed by constructing a block Laplacian matrix with each block corresponding to a layer.This method relies on selecting a parameter β that controls the consistency of the community structure across different layers.
The last class of methods is based on NMF which, because of its interpretability and good performance, has been broadly used for community detection in single-layer, multiplex, multilayer, and dynamic networks [25], [26], [27], [28].In [29], Semi-Supervised joint Nonnegative Matrix Factorization (S2-jNMF) is proposed for detecting the common communities across layers in a multiplex network.A greedy search of dense subgraphs is performed and these subgraphs are used as a priori information to create new adjacency matrices for each layer.In [30], a two-step approach is proposed, where first a nonnegative low dimensional feature representation of each layer is found using one of the four different NMF models.These community structures are then used to obtain a consensus community structure.Authors in [31] use NMF for detecting communities in multiplex social networks, where both unifying and coupling approaches are proposed.The unifying approach finds a common community structure by aggregating all layers, while the coupling approach finds mostly consistent community structures.Most of the aforementioned NMF based methods find a common structure across all layers or for a majority of layers and do not consider cases where common communities may be present in different subsets of layers.Moreover, they do not detect private communities.These methods also require that the number of communities is provided a priori.

A. Community Detection
Community detection for a single-layer network is the partitioning of a node set V as C = {C 1 , ..., C K } where K is the number of communities.One of the most popular algorithms for partitioning graphs is the minimum cut method and its variants such as ratio cut and normalized cut [32].In these methods, the network is partitioned such that the number of edges between different communities is minimized.
Given a single layer graph, G = {V, E, A}, where V , E and A are the set of nodes, edges, and adjacency matrix of the graph, respectively, the min-cut problem aims to find a partition by minimizing the following objective function: where Cp is the complement of C p and Cut(C p , Cp ) = i∈Cp,j∈ Cp A ij for two disjoint sets C p and Cp .The min-cut problem is NP-hard.However, it has been shown [32], [25] that spectral clustering and nonnegative matrix factorization provide solutions to relaxed versions of the min-cut problem.
In particular, spectral clustering solves the min-cut problem by embedding each node in a lower dimensional subspace spanned by the eigenvectors of the normalized Laplacian matrix L, where L = D −1/2 (D − A)D −1/2 , with D being the diagonal matrix defined as D ii = j A ij .k-means clustering is then applied to this low-dimensional subspace spanned by the eigenvectors.Authors in [25] show that NMF is equivalent to Laplacian based spectral clustering.Normalized Cut using the normalized adjacency matrix, Ã = D −1/2 AD −1/2 , is equivalent to the nonnegative matrix factorization problem argmin

B. Multiplex Networks
Multiplex networks can be represented using a finite sequence of graphs {G l }, where l ∈ {1, 2, . . ., L}, G l = (V l , E l , A l ) [33].V l is the set of nodes in layer l and A l ∈ R n×n is the adjacency matrix for layer l.In this paper, we use undirected (symmetric) weighted and binary adjacency matrices.For a weighted adjacency matrix, A lij ∈ [0, 1], and for a binary adjacency matrix, A lij ∈ {0, 1}.

C. Orthogonal Nonnegative Matrix Tri-Factorization
Nonnegative Matrix Factorization decomposes a nonnegative matrix W ∈ R n×m into the product of two low-rank nonnegative matrices V ∈ R n×k and U ∈ R m×k , such that W ≈ VU and k n, m.V and U are found by solving the optimization problem argmin In NMF-based community detection, V and U are the community feature matrix and the community indicator matrix, respectively.Adding orthogonality constraints improves the performance of NMF as orthogonality and nonnegativity force each row of V (U) to have only one nonzero element which implies that each node belongs only to one community.Orthogonal NMF has been broadly used in community detection where W is usually the adjacency matrix and k is the number of communities [34], [35].Different extensions of NMF, such as Symmetric Nonnegative Matrix Tri-factorization [36], where W is approximated by USU , have been proposed for community detection.U contains community membership information while the symmetric matrix S ∈ R k×k provides more degrees of freedom to the approximation.In this paper, we will use Orthogonal Nonnegative Matrix Tri-factorization (ONMTF) to formulate the community detection problem in multiplex networks.

IV. PROPOSED METHOD (MX-ONMTF)
The proposed method, MX-ONMTF, models each layer's adjacency matrix as a sum of low-rank representations of common and private communities using Orthogonal Nonnegative Matrix Tri-Factorization (ONMTF).Fig. 1 illustrates the overview of the proposed algorithm for a multiplex network with L layers and two common communities.

A. Problem Formulation
In this paper, we define common communities as communities that are observed in more than one layer.Definition 1: An ideal common community C in a multiplex network {G l } is defined as a subgraph with the same set of nodes for a subset of layers m ⊆ {1, 2, . . ., L}, where |m| > 1. Mathematically, C can be defined as Definition 2: A private community C in a multiplex network {G l } is defined as any community that is not common across at least two layers.For a multiplex network with L layers and adjacency matrices, A l ∈ R n×n , l ∈ {1, 2, . . ., L}, we model each layer's adjacency matrix in terms of common and individual communities using ONMTF.The resulting objective function can be formulated as where H ∈ R n×kc and H l ∈ R n×kp l , l ∈ {1, 2, . . ., L} are the community membership matrices corresponding to the common and private communities, respectively, and S l and G l are symmetric matrices.In this work, it is assumed that the L layers have a total of k c common communities and k p l private communities in each layer l.The goal is to simultaneously identify communities that are common across any subset of two or more layers and communities that are unique to each layer.Therefore, H will contain information for all common communities.

B. Optimization solution
ONMTF optimization problem in (3) can be solved using a multiplicative update algorithm (MUA) [36].Multiplicative update algorithms for solving NMF problems were introduced in [37], while solving NMTF with orthogonal constraints was first addressed by [36].In this paper, we follow their approach to derive the multiplicative update rules for each variable.
To find the update rules for H, H l , S l , and G l , the following Lagrangian function with Lagrange multipliers Λ and Λ l is minimized: For updating H, we find ∇ H L as ( Applying the KKT conditions ∇ H L = 0 and ∇ Λ L = 0, we obtain: Substituting (i) and (ii) in Eq. ( 5), we get As discussed in [38], if the gradient of an error function, ε, is of the form ∇ε = ∇ε + − ∇ε − , where ∇ε + > 0 and ∇ε − > 0, then the multiplicative update for parameter Θ has the form Θ = Θ ∇ε − ∇ε + .It can be easily seen that the multiplicative update preserves the nonnegativity of Θ, while ∇ε = 0 when the convergence is achieved.Following this procedure, from the gradient of the error function in Eq. ( 6), we derive the following multiplicative update rule for , (7) where the multiplication and division are performed elementwise and both numerator and denominator are positive.Similarly, we obtain the following update rules for H l , S l , and G l , for each l ∈ {1, 2, . . ., L}: Since NMF algorithms are initialized with random matrices, different runs yield local minima.For this reason, we run the algorithm 50 times and report the best results [39], [40].As shown in Algorithm 1, for each random initialization of H, H l , S l , and G l , the multiplicative update rules described in Eqs. ( 7)-( 10) are repeated for 1000 iterations or until convergence.We then select the solution that yields the maximum value of the performance metric across the different runs.For synthetic networks for which a ground truth is available, Normalized Mutual Information (NMI) [41] is used.For networks without ground truth, Modularity Density (Q D ) [42] is used as the performance metric.Randomly initialize H, H l , S l , G l > 0 4: for 1000 iterations or until convergence do 5: update H according to Eq. ( 7) 6: update H l for each l ∈ {1, 2, . . ., L} according to Eq. ( update S l for each l ∈ {1, 2, . . ., L} according to Eq. ( update G l for each l ∈ {1, 2, . . ., L} according to Eq. ( 10) end for 10: for each layer l do 11: Apply Algorithm 3 with A l , H, and k p l as inputs to to find H c l . 12: for each i do 13: end for

C. Number of communities
In most NMF-based community detection algorithms, the number of communities (k) is an input parameter.This problem is usually addressed by detecting communities with different values of k and selecting the one that gives the solution with the best pre-determined performance metric, such as modularity [43].
In this paper, a two-step approach is proposed to determine the number of communities per layer and the number of common communities.First, the number of communities per layer (k 1 , k 2 ,. . ., k L ), is found using the eigengap rule [44].Next, ONMTF is applied to each layer [36] and the low-rank embedding matrices, U l ∈ R n×k l , are obtained.Each element of the embedding matrices, U l (i, j), represents the likelihood of node i belonging to community j.An agglomerative hierarchical clustering algorithm using Euclidean distance is applied on the rows of X = [U 1 , U 2 , . . ., U L ] ∈ R m×n , where m = L l=1 k l , to obtain the number of common communities.At each step of the algorithm, the two columns with the smallest distance are aggregated, and the distances between the newly formed cluster and the remaining ones are updated.A dendrogram like the one shown in Fig. 2 Randomly initialize U l ≥ 0, S l ≥ 0. 6: for 1000 iterations or until convergence do This agglomerative hierarchical clustering algorithm outputs a matrix F ∈ R m−1×3 .The first two columns of F(i, :) correspond to the labels of the two leaves of the dendrogram that form cluster m + i and the third column contains the distance between these two leaves.This distance matrix F is used in Algorithm 2 to determine the number of common communities, k c , and the number of private communities per layer, k p l .The algorithm iterates until the minimum distance between any two clusters increases by more than 50% of the minimum distance from the previous iteration.for each node i do 3: end for 6: H(f ind(idx = j), j) = 0 B 0 ← A l (H(:, j)H(:, j) ) 13: end for 16: [q sorted , J] ← sort(q, descending).J contains the sorted indices of the elements of q. 17: pos ← J(1 : k p l ).

E. Time complexity
The time complexity of the proposed algorithm is mostly due to the Multiplicative Rules, Eqs. ( 7)- (10).The time complexity for the product of two matrices, e.g., the product of a m×k matrix by a k×n matrix, is O(mkn).Table II shows the time complexities of Eqs. ( 7)-( 10) and the total complexity, with l ∈ {1, 2, . . ., L}, and k = k c + L l=1 k p l .

F. Storage complexity
The storage complexity of our algorithm is determined by the sizes of the matrices H, H l , S l , and G l .It can be seen that the total storage complexity is O(nk).For a multiplex network of size n × n × L, this is a significant reduction in memory cost.V. CONVERGENCE ANALYSIS In this section, we will prove the convergence of the multiplicative update rule defined by Eq. ( 7) using the auxiliary function approach.As the other update rules are similar, we will not explicitly prove their convergence.We first introduce the definition of auxiliary function as follows.Definition 1: A function Z(H, H t ) is called an auxiliary function of L(H) if it satisfies

Z(H, H t ) ≥ L(H) and Z(H, H) = L(H).
The auxiliary function is a useful concept because of the following lemma which is proved in [37].Lemma 1.If Z is an auxiliary function, then L is nonincreasing under the update Theorem 1.Given H l , S l , and G l the Lagrangian function L(H) is monotonically decreasing under the update rule (7).
Proof.For convenience, let L(h) denote the part of L(H) dependent on H ij .From Eq. ( 6) we have The second-order derivative of L(h) with respect to h ij is Let h t ij denote the updated value of h ij after the tth iteration, then the Taylor series expansion of L(h) at h t ij can be written as Now, the key is to find an appropriate auxiliary function Z(h, h t ij ).We choose the following Z(h, h t ij ) and prove in Appendix A, that it satisfies the conditions to be an auxiliary function of L(h).

Z(h, h
According to Lemma 1, we must find the minimum of Z(h, h t ij ) with respect to h.

∂Z(h, h
Replacing L (h t ij ) in the equation above and canceling the common terms, we obtain Replacing h by h t+1 ij we obtain the following update rule , which is the same as the update rule shown in Eq. ( 7).

VI. RECOVERY GUARANTEES
In this section, we will establish the theoretical properties of the proposed community detection method.We want to determine if, by optimizing the objective function, our algorithm will return good community structures.In particular, we will investigate the consistency properties of the global optimizer of the objective function under the multilayer stochastic blockmodel (MLSBM).The optimization problem in (3) can be rewritten as where F l is a block matrix defined as is the concatenation of the community membership matrices of the common and private communities, H ∈ R n×kc and H l ∈ R n×kp l , respectively.For the n × n × L population adjacency tensor A = {A 1 , ..., A L }, we can define a multiplex SBM [Z, Θ] as in [20], with each of the L slices A l ∈ R n×n .The mulitplex SBM with parameters [Z = {Z 1 , ..., Z L }, B = {Θ 1 , ..., Θ L }], can be written in the matrix form as, for each layer l.Θ l is a block matrix defined as with Θ c and Θ p l being the affinity probability matrices of the common and private communities, respectively.
is the concatenation of the community membership matrices of the common and private communities, Z c ∈ R n×kc and Z p l ∈ R n×kp l , respectively.
To prove that our method can correctly recover the community assignments, we propose the following lemma following the work in [45].
Lemma 2. The optimization problem in (3) applied to A has H l = Z l (Z l Z l ) −1/2 and F l = (Z l Z l ) 1/2 Θ l (Z l Z l ) 1/2 , l = 1, ..., L, as the unique solution up to an orthogonal matrix, provided at least one of the Θ l is full rank.Proof.To prove lemma 2, we can show that a solution to the optimization problem in (12).Substituting the solution to (12), we have the value of this minimization objective function is 0, and Now, we need to show the uniqueness of this solution.By assumption, at least one of the Θ l is full rank.For a non-singular matrix P ∈ R (kc+kp l )×(kc+kp l ) we can say that H l P and P −1 Θ l P −1 is also a solution.Due to the orthogonality constraint, we must have (H l P) H l P = P (Z l Z l ) −1/2 Z l Z l (Z l Z l ) −1/2 P = I, which implies P P = I, and therefore the solution is unique up to an orthogonal matrix.Moreover, since Q −1/2 = (Z l Z l ) −1/2 is a diagonal matrix with positive elements and therefore invertible, we have that

VII. EXPERIMENTS
A. Synthetic Multiplex Networks 1) Model description: Multiplex benchmark networks based on the model described in [46], [47] were generated.The authors in [46] propose a two-step approach to generate multilayer networks with a community structure.First, a multilayer partition with the user-defined number of nodes in each layer, number of layers, and an interlayer dependency tensor that specifies the desired dependency structure between layers is generated.Next, for the given multilayer partition, edges in each layer are generated following a degree-corrected block model [48] parameterized by the distribution of expected degrees and a community mixing parameter µ ∈ [0, 1].The mixing parameter µ controls the modularity of the network.When µ = 0, all edges lie within communities, whereas µ = 1 implies that edges are distributed independently.For multiplex networks, the probabilities in the interlayer dependency tensor are the same for all pairs of layers and are specified by p ∈ [0, 1].When p = 0, the partitions are independent across layers while p = 1 indicates an identical partition across layers.
In this paper, we extend the model described above to generate multiplex benchmark networks with common and private communities.We first generate the common communities by randomly selecting n c nodes across all layers and setting the inter-layer dependency probability to p 1 .For each common community, we decide whether it exists in a particular layer or not.Next, we independently generate the private communities for each layer with the remaining nodes in that layer.We generated 100 different random realizations of each multiplex network in order to report the average performance metric on the experiments.
3) Experiment 1: In this experiment, we generated two different types of networks, one where the common communities are present across all layers and another where the common communities are present in different subsets of layers.Fig. 3 shows the results for the networks with 2 common communities across all layers for 3 (3a), 4 (3b), and 5 layers (3c), and for the networks with 3 common communities across different subsets of layers for 3 (3d), 4 (3e), and 5 layers (3f).The results indicate that our method performs well for both networks with common communities across all layers as well as for networks with common communities that do not span all layers.Our method discovers the complete structure of the network rather than forcing it to have a consensus partition.Moreover, our method is robust to noise for larger values of µ compared to the other methods.We can also conclude that our algorithm performs better when the common communities are across all layers (see Fig. 3a, 3b, 3c) than when the multiplex community structure is more complex with common communities across subsets of layers (see Fig. 3d, 3e, 3f ), but it still outperforms the rest of the methods.From Fig. 3 we can see that GenLouvain performs well when µ is small, but its performance deteriorates for µ values above 0.6.Another observation is that when the number of layers is small, the NMF-based methods perform closer to GenLouvain but when the number of layers increases, NMF algorithms perform worse.This is because these NMF methods perform aggregation on either the adjacency or the community indicator matrices.When there is more variation across layers, these methods fail to capture this heterogeneity.
4) Experiment 2: In the second experiment, we evaluated the robustness of the algorithm against variations in the common community structure by fixing µ = 0.1 and varying the inter-layer dependency probability, p 1 , i.e., the common communities are allowed to vary across layers.The performance of all methods for a 5-layer network are reported in Fig. 4 based on the average NMI over 100 realizations of the network.As we can see in Fig. 4, our method still outperforms the other seven methods when there is some variation in the common community structure.This demonstrates that our method is robust to variations of the common community structure across layers.However, our algorithm is more sensitive to the drop in p 1 than the rest of the methods.This is because when the common communities have a high variation our algorithm may try to assign some of those nodes to the private communities.munities are common across layers and there are no private communities.As we can see in Fig. 5, as k c is increased, the performance of all the other methods improves, as expected, because these methods are designed to detect the common community structure.When the communities are common across layers, most of the methods, except Infomap, converge to the same NMI value.The performance of MX-ONMTF is not affected by increasing k c , and when all communities are common it performs similarly to the other methods.

6) Scalabilty Analysis:
In this experiment, we evaluate the effect of network size on the run time of the proposed algorithm.For this purpose, we fixed µ = 0.3, L = 5, k c = 3 and varied n from 32 to 8192.From Fig. 6, it can be seen that our method's run time is almost log-linear.This is comparable with all the other NMF based methods.However, our as shown in the previous experiments, our method performs better.Most of this time complexity is due to the multiplicative update rule used in NMF-based algorithms and can be reduced using alternative approaches as discussed in [50].[51] is a multiplex social network with 71 nodes and three layers representing Co-work, Friendship and Advice relationships between partners and associates of a corporate law firm.This data set also includes information about some attributes of each node such as status, gender, office location, years with the firm, age, type of practice, and law school.
Applying MX-ONMTF to this network, we obtain one common community across all layers composed of the nodes colored in red as well as private communities for each layer, as shown in Fig. 7.This network does not have ground truth community structure, but we can compute the NMI between the detected community structure and each of node attributes, i.e., metadata, to gain better insight into the results and to be able to provide quantitative results [18].For each of the attributes, the nodes are divided into communities based on that particular attribute.For example, for the status, the network is divided into two communities, partners and associates.For Age and Seniority, the nodes were grouped into five-year bins.The community structure for each attribute is used as ground truth to compute the NMI between each attribute and the community structure detected by our method.The NMI values given in Table III for the partition obtained by our method, suggest that office location and type of practice (litigation or corporate) are highly correlated with community membership across co-work, friendship and advice relationships.We can also see that the partition detected by MX-ONMTF has greater NMI values for each of the attributes.Therefore, our method detects a community structure that takes all of the attributes into account instead of partitioning with respect to just one attribute as the Aggregated Average does.
2) C. Elegans Network: C. Elegans Network [52], [53] is a multiplex network with 279 nodes and 3 layers representing different synaptic junctions (electric, chemical monadic, and polyadic) of 279 neurons of the Caenorhabditis Elegans connectome.Information about different attributes of the neurons in this dataset such as the group of neuron they belong to (bodywall, mechanosensory, ring interneurons, head motor neurons, etc.), the type of neuron (motor neurons, sensory neurons, interneurons), and the color (blue, red, yellow, orange, etc.) is available.
Table IV shows the NMI values between the community structures detected by each method and each of the three attributes available for this dataset.The partition detected by MX-ONMTF has greater NMI values for each of the attributes compared to the other four methods.3) YeastLandscape Multiplex Network: Yeast Landscape is a multiplex genetic interaction network of a specie of yeast, Saccharomyces Cerevisiae [54], [52].This network has 4458 nodes and 4 layers representing the positive and negative interaction networks of genes in Saccharomyces cerevisiae and positive and negative correlation based networks in which genes with similar interaction profiles are connected to each other.For this paper, we use the bioprocess annotations of the genes available on the supplementary data file S6 of [54] as ground truth.We divided the genes into 18 groups according to their primary bioprocess.There were 1580 genes in this network without attributes.
Table V shows the NMI values between the community structures detected by each method and the bioprocess of the genes.MX-ONMTF gives the highest NMI value followed by the other NMF-based community detection methods.

C. Multiview Networks
In order to evaluate the performance of our method on networks where the communities are common across all layers, we use two multiview data sets, UCI Handwritten Digits 1 [55] and Caltech [56].
The UCI Handwritten Digits data set consists of features of handwritten digits from (0-9) extracted from a collection of Dutch utility maps.There is a total of 2000 patterns that have been digitized in binary images, 200 patterns per digit.These digits are represented by six different feature sets: Fourier coefficients of the character shapes, profile correlations, Karhunen-Loève coefficients, pixel averages in 2 × 3 windows, Zernike moments, and morphological features.Each layer of the multiplex network represents one of the 6 features.The graphs are constructed using k-nearest neighbors graphs with the nearest 50 neighbors and Euclidean distance.
Caltech-101 is a well-known object recognition dataset that consists of pictures of objects belonging to 102 categories.There are about 40 to 800 images per category for a total of 9144 images.This dataset consists of 6 types of features extracted from each image.A multiplex network with 6 layers representing each of the features, 102 classes, and 9144 nodes is constructed from this dataset using k-nearest neighbors graphs with the nearest 50 neighbors.A smaller version of this dataset is also used in these experiments, where only 20 objects are selected, resulting in a multiplex network with 6 layers, 20 communities, and 2386 nodes.
In this case, as we have the true class assignment, we compute the NMI with respect to this ground truth.As it can be seen in Table VI, our method performs better than the rest of the methods for the three networks.This indicates that even 1 https://archive.ics.uci.edu/ml/datasets/Multiple+Features in cases where there are no private communities, our method is successful at obtaining the consensus community structure, thus can be used as an alternative to multiview clustering.

VIII. CONCLUSIONS
In this paper, we proposed a multiplex community detection method based on ONMTF.The proposed method, MX-ONMTF, is able to detect both common and private communities across layers, allowing us to differentiate between the topologies across layers.The proposed algorithm is based on multiplicative update rules and a proof of convergence is provided.A new approach based on the eigengap criterion is introduced for determining the number of communities.Results for both synthetic and real-world networks show that our method performs better than existing community detection methods for multiplex networks as it is able to handle the heterogeneity of the network topology across layers.Moreover, experiments on multiview networks show that our method also performs well in cases where a consensus community structure is needed.

APPENDIX A AUXILIARY FUNCTION PROOF
Proposition 1.The following function, Z(h, h t ij ), is an auxiliary function of L(H), Proof.First, when h = h t ij the equality Z(h, h) = L(h) holds.Now, we need to show that Z(h, h t ij ) ≥ L(h).It can be seen that the first and second terms of Z(h, h t ij ) are greater than the first and second terms in L(h).Therefore, it suffices to show that  11) is an auxiliary function of L(h).
NMI r or Q Dr .22: end for 23: Choose the partition r * = argmax r NMI r (r * = argmax r Q Dr ).

Fig. 2
shows the dendrogram of the hierarchical clustering of the columns of the embedding matrices U 1 , U 2 , U 3 of a 3-layer network with k 1 = 6, k 2 = 6, and k 3 = 5 and the red line indicates where the algorithm stops.For this example, k c = 3, k p1 = 3, k p2 = 4, and k p3 = 3.D. Determining the common community labels for each layerH ∈ R n×kc is the community membership matrix corresponding to the common communities.Each row of H indicates whether a particular node belongs to any of the k c common communities.Algorithm 3 describes how the

Fig. 2 :
Fig. 2: Example dendrogram illustrating the hierarchical clustering of the columns of the embedding matrices U 1 , U 2 , U 3 for a 3-layer network with 3 common communities.The red line indicates where the algorithm stops.

Fig. 3 :
Fig. 3: Mean NMI over 100 realizations of (a)-(c) 3-layer, 4-layer and 5-layer benchmark networks, respectively, for the scenario with 2 common communities across all layers; (d)-(f) 3-layer, 4-layer and 5-layer benchmark networks, respectively, for the scenario with 3 common communities across different subsets of layers.All networks are generated with 8 different values of the mixing parameter µ and n = 256.

Fig. 4 :
Fig. 4: 5-layer network generated with 6 different values of the interlayer dependency probability p 1 , with µ = 0.1, and n = 256 5) Experiment 3: Another parameter in our model is the number of common communities k c .In this experiment, we fixed n = 256, µ = 0.3, and the number of communities in each layer, and varied k c from 1 to 7. When k c = 7, all com-

Fig. 5 :
Fig. 5: 5-layer network generated with different values of common communities k c across layers.

Fig. 7 :
Fig. 7: Communities detected in Lazega Law Firm network across the three layers, advice, friendship, and co-work relationships.Red nodes are in the common community across the three layers.
2: for r=1 to 50 do 3: can be used to represent the different iterations of this algorithm.The m leaves of the dendrogram correspond to the total number of Algorithm 2 Finding k l , k c and k p l , for l ∈ {1, 2, . . ., L} .Input: Adjacency matrices A l for l ∈ {1, 2, . . ., L}. Output: Number of common communities k c , total number of communities per layer k l , and number of private communities per layer k p l , for l ∈ {1, 2, . . ., L}. 1: Let L null l be the normalized Laplacian the of Erdős-Rényi null model. 2:

TABLE I :
Computational complexity of updating each variable per iteration.

TABLE II :
Storage complexity of each variable.

TABLE III :
NMI of the obtained community partition for each method and the metadata available for the Lazega Law Firm Multiplex Network.

TABLE IV :
NMI of the obtained community partition for each method and the metadata available for C. Elegans Network.

TABLE V :
NMI of the obtained community partition for each method with respect to the metadata available for YeastLandscape Network.

TABLE VI :
NMI of the obtained community partition for multiview networks.
L l=1 (4H l G l H l H t S l +4H t H t A l H t S l )ij It can be shown that (H l G l H l H t S l ) ij h t (H t H t A l H t S l ) ij h t ij = p h t ip (H t A l H t S l ) pj h t ij ≥ (H t A l H t S l ) jj , (H t H t A l H t S l ) ij h t ij = p,q h t ip h t qp (A l H t S l ) qj h t ij ≥ h t ij (A l H t S l ) ij , (H t H t A l H t S l ) ij h t ij = m,b (H t H t A l ) im h bj mb h t ij ≥ (H t H t A l ) ii S ljj .H t A l H t S l ) ij h t ij ≥ (H t A l H t S l ) ij + h t ij (A l H t S l ) ij +(H t H t A l ) ii S ljj ,and thus3 L l=1 (4H l G l H l H t S l +4H t H t A l H t S l )ij ij = p,q (H l G l H l ) ip h qj pq h t ij ≥ (H l G l H l ) ii S jj ,