Community Detection in Fully-Connected Multi-layer Networks Through Joint Nonnegative Matrix Factorization

Modern data analysis and processing tasks typically involve large sets of structured data. Graphs provide a powerful tool to describe the structure of such data, where the entities and the relationships between them are modeled as the nodes and edges of the graph. Traditional single layer network models are insufficient for describing the multiple entity types and modes of interaction encountered in real-world applications. Recently, multi-layer network models, which consider the different types of interactions both within and across layers, have emerged to model these systems. One of the important tools in understanding the topology of these high-dimensional networks is community detection. In this paper, a joint nonnegative matrix factorization approach is proposed to detect the community structure in multi-layer networks. The proposed approach models the multi-layer network as the union of a multiplex and bipartite network and formulates community detection as a regularized optimization problem. This optimization problem simultaneously finds the nonnegative low-rank embedding of the intra- and inter-layer adjacency matrices while minimizing the distance between the two to guarantee pair-wise similarity across embeddings. The proposed approach can detect the community structure for both homogeneous and heterogeneous multi-layer networks and is robust to noise and sparsity. The performance of the proposed approach is evaluated for both simulated and real networks and compared to state-of-the-art methods.


I. INTRODUCTION
Many real-world systems, including social and biological systems, are often represented as complex networks capturing the interactions between multiple agents [1], [2]. The different agents are represented as the nodes of the network, and the relationships among them are encoded by the edges of the network. In many real-world systems, multiple modes of interaction may exist between the different entities. These interactions may be interdependent, revealing different kinds of structure in the network. Some examples include social networks where the same people may interact in different ways such as friendship and collaborations or transportation networks where the different cities may be connected through different modes of transportation such as by air, train or bus [3], [4]. These networks can be modeled using multi-layer The associate editor coordinating the review of this manuscript and approving it for publication was Md. Abdur Razzaque . network models where each type of link can be separated into its own layer, thereby connecting the same set of nodes in multiple ways [5]. However, understanding the large-scale structure of multi-layer networks is made difficult by the fact that the patterns of one type of link may be similar to, uncorrelated with, or different from the patterns of another type of link. These differences from layer to layer may exist at the level of individual links, connectivity patterns among groups of nodes, or even the hidden groups themselves to which each node belongs. Therefore, finding community structure in multi-layer networks is an important problem and requires simultaneously considering the interdependence between layers and accurately defining intra-layer and interlayer communities [6], [7].
Existing literature on community detection in multi-layer networks, e.g., [8]- [13], focuses primarily on multiplex networks, where each layer has the same set of entities of the same type. In these networks, intra-layer edges are shown explicitly while inter-layer edges are not given. Therefore, the existing methods are not directly applicable to fully connected multi-layer networks where both intra-and inter-layer edges are given explicitly. Moreover, existing approaches to multi-layer community detection mostly focus on extracting the common (or consensus) community structure across layers. As such, they are not very accurate when the layers are heterogenous.
In this paper, we address this issue by introducing a joint nonnegative matrix factorization based approach for community detection in fully-connected homogeneous and heterogeneous networks. The contributions of the proposed framework are three fold. First, the proposed approach models a multi-layer network as the union of a multiplex network and a bipartite network, where the intra-layer edges correspond to the edges of the multiplex network and the interlayer edges correspond to the edges of the bipartite network. In this manner, the different roles that these edges play in community formation are explicitly taken into account. The respective normalized cut minimization problems for single layer and bipartite networks are then combined with the added constraint that the community assignments are consistent across the two types of networks. Second, the proposed algorithm can detect the community structure in both homogeneous and heterogeneous binary and weighted multi-layer networks efficiently and does not require all the layers to have the same number of nodes. Finally, the proposed optimization based framework is robust to noise and outliers as it employs L 2,1 -norm. This paper is organized as follows: Section II reviews the related work. Section III provides background on graph theory, nonnegative matrix factorization, community detection and asymptotical surprise metric. In Section IV, the proposed approach along with its solution and proof of convergence are presented with the additional details provided in Appendix A-D. Experiments and results are presented in Section V. Finally, the conclusions are summarized in Section VI.

II. RELATED WORK
Unlike the vast literature on community detection on single layer (monoplex) networks [14]- [16], the literature on community detection on multiplex and multilayer networks have been relatively scarce [17]. Majority of the existing work focuses on multiplex networks where there are no explicit inter-layer connectivity. Early work in this area focused on community detection by aggregating the links of all layers into a single layer, and then applying a community detection method to that single layer [18]- [21]. While these methods are computationally efficient, they are only able to identify communities that are common across all layers, and some spurious communities may emerge because of the aggregation process. State-of-the-art community detection methods for multiplex networks are built by generalizing the community detection methods from single layer.
The first class of methods is based on modularity maximization and generalizes the notion of modularity to multiple layers [11], [22], [23]. Principal Modularity Maximization (PMM) [18] extracts structural features for each layer by optimizing its modularity, and then applies PCA on concatenated matrix of structural feature matrices, to find the principal vectors, followed by K-means to perform community assignment. The main drawback of this approach is that it treats structural feature matrices of all layers equally ignoring the heterogenity across layers. On the other hand, Mucha et al. [11] proposed a generalized modularity metric for multiplex networks that can handle layer complementarity and an efficient algorithm, Generalized Louvain (or GenLouvain, GL), to optimize the modularity function. Despite the efficiency, the method is not designed to find consensus clustering assignment across different layers; instead, it only provides node clustering assignment for each individual layer. Moreover, this method does not handle varying strength between the nodes of different layers. In [10], a multiobjective genetic algorithm, MultiMOGA, is proposed to detect a common community structure in multidimensional networks that simultaneously maximizes the modularity on each layer and minimizes the difference between the community structure of that layer and the remaining layers. Similarly, Multi-Objective Evolutionary Algorithm based on Decomposition with Tabu Search (MOEA/D-TS), proposed in [24], detects shared communities in multiplex networks by first finding the community structure of the first layer using modularity density measure, and then formulating the problem as the maximization of modularity for each layer and the Normalized Mutual Information (NMI) between pairs of layers, simultaneously. While these papers are limited to multiplex networks, more recently a few methods have considered the extension of the modularity function and its optimization for fully-connected multi-layer networks [25], [26]. However, these methods still suffer from the resolution limit problem [27], [28].
The second class of methods relies on spectral clustering that generalizes the eigendecomposition from single to multiple Laplacian matrices representing network layers. One of the state-of-the-art spectral clustering methods for multiplex graphs is Spectral Clustering on Multi-Layer (SCML) [29]. For each network layer, SC-ML computes a subspace spanned by the principal eigenvectors of its Laplacian matrix and then, by interpreting each subspace as a point on Grassmann manifold, SC-ML merges subspaces into a consensus subspace from which the clusters are extracted. SC-ML cannot adequately handle network layers with missing or weak connections, or layers that have disconnected parts. In [30], authors propose an extension of normalized cut to multiplex networks by constructing a block Laplacian matrix with M blocks corresponding to the M layers. Standard spectral clustering is applied to this Block Laplacian matrix to obtain the community structure of the multiplex network. This method relies on the selection of a parameter β that controls the consistency of the community structure across different layers.
The third class of methods are information diffusion-based approaches that utilize the concept of diffusion on networks to integrate network layers. One such method is Similarity Network Fusion (SNF) proposed by Wang et al. [31] and captures both shared and complementary information in network layers. It computes a fused matrix from the similarity matrices derived from all layers through parallel interchanging diffusion process on network layers and then applies a spectral clustering method on the fused matrix to extract communities. Another widely used method is Multiplex Infomap [9], which optimizes the map equation, and exploits the informationtheoretic duality between network dimensionality reduction, and the problem of network community detection. However, for noisy networks, the diffusion process, i.e., information propagation, is not very efficient and it may result in poor clustering performance [32].
The last category of methods relies on matrix and tensor factorization and utilize collective factorization of adjacency matrices representing network layers [29], [33]- [36]. Reference [34] introduced the Linked Matrix Factorization (LMF) which fuses information from multiple network layers by factorizing each adjacency matrix into a layer-specific factor and a factor that is common to all network layers. Dong et al. [29], introduced the Spectral Clustering with Generalized Eigendecomposition (SC-GED) which factorizes Laplacian matrices instead of adjacency matrices. Papalexakis et al. [35] proposed GraphFuse, a method for clustering multi-layer networks based on sparse PARAllel FACtor (PARAFAC) decomposition with nonnegativity constraints. A similar approach has been adopted by Gauvin et al. [36] who used PARAFAC decomposition for time-varying networks. Cheng et al. [33] introduced Co-regularized Graph Clustering based on NMF (CGC-NMF), where each adjacency matrix is factorized using symmetric NMF while the Euclidean distance between their nonnegative low-dimensional representations is kept small. More recently, NMF based multiplex community detection methods using different non-negative matrix factorization models, e.g., collective symmetric NMF (CSNMF), collective projective NMF (CPNMF), collective symmetric nonnegative matrix trifactorization (CSNMTF), have been proposed [37], where first, a non-negative, low-dimensional feature representation of each network layer is found and then, the feature representation of layers are fused into a common non-negative, low-dimensional feature representation via collective factorization. In Javed et al. [38], SNMTF based multiplex community detection method is proposed where the objective function finds a common community structure across layers while ensuring that the common community matrix is close to all individual community matrices. In [39], authors propose a Semi-Supervised joint Nonnegative Matrix Factorization (S2-jNMF) algorithm for community detection in multiplex networks whose main goal is to detect common communities across layers. A greedy search of dense subgraphs is performed and these subgraphs are used as a priori information to create new adjacency matrices for each layer. The algorithm jointly decomposes these newly created adjacency matrices into a basis matrix and multiple coefficient matrices and discover the communities based on this basis matrix. This method cannot deal with fullyconnected networks and heterogeneity across layers. Reference [40] use NMF for detecting communities in Multiplex Social Networks, where two different classes of approaches, Unifying and Coupling, are proposed. The first approach finds a common community structure in the networks by aggregating all layers while the second one finds mostly consistent community structures.

III. BACKGROUND
Notation: List of the main notation used in this paper is summarized in Table 1.

A. GRAPH THEORY
A multi-layer network with M layers can be defined as G M = (G Intra , G Inter ), where G Intra = ∪ M α=1 L α and G Inter = ∪ M α,β=1,α =β L αβ represent the set of intra-layer and inter-layer graphs, respectively [5]. Each intra-layer, L α , is denoted as a triplet L α = (V α , E α , A α ) where V α and E α ⊆ (V α × V α ) are the set of intra-layer nodes with cardinality |V α | = n α and intra-layer edges, respectively. A α ∈ R n α ×n α is the intralayer adjacency matrix. On the other hand, inter-layer graph is denoted as a quadruplet as . . , M } and α = β represents the set of inter-layer edges and A αβ ∈ R n α ×n β is the inter-layer adjacency matrix.
The adjacency matrices of the intra-and inter-layer networks, A α ∈ R n α ×n α and A αβ ∈ R n α ×n β , can be used to construct a symmetric supra-adjacency matrix, A Supra ∈ R n×n where n = M α=1 n α , of the multi-layer network, G M , as: In this paper, two types of multi-layer networks are considered: Homogeneous and heterogeneous multi-layer networks. In homogeneous multi-layer networks (HoMLNs), all layers have the same set of objects, i.e., V α is the same as V β for α = β and n α = n β . On the other hand, in heterogeneous multi-layer networks (HeMLNs), each layer has a different set of objects, i.e., V α is different from V β for α = β and n α may or may not be equal to n β . For both types of networks, the supra-adjacency matrix can be constructed using Eq. 1.

B. NONNEGATIVE MATRIX FACTORIZATION (NMF)
Given a data matrix X ∈ R n×m , basic NMF factorizes X into the product of two nonnegative low-rank matrices as: where and . F are the transpose operator and the Frobenious norm, respectively. Multiple approaches have been proposed to solve the NMF problem [41]- [43]. One of the commonly used solutions to the optimization problem in Eq. 2 is the multiplicative update rule (MUR) with the following update rules: These update rules are iterated for each variable by fixing the other one until convergence [41]- [43]. A special case of this factorization is when the input matrix is the adjacency matrix of an undirected graph, where X = X = A. In this special case, also known as the symmetric nonnegative matrix factorization (SymNMF), the factors B 1 = B 2 = B and the optimization problem is defined as follows: and the update rules can be obtained by a Newton-like algorithm or an alternating nonnegative least squares (ANLS) algorithm using projected gradient methods as suggested in [44] and [45]. In addition to the aforementioned NMF approaches, [46] introduced nonnegative matrix tri-factorization (NMTF). In NMTF, an extra factor, S, is added to the factorization to absorb the different scales of X, B 1 and B 2 . The purpose of the factor S is to provide additional degrees of freedom to maintain the accuracy of the low-rank matrix representation. NMTF problem is formulated as follows [46]: and the solution of the problem can be computed using multiplicative update rules as follows: One of the drawbacks of the standard NMF is its instability to noise or outliers since it uses least square error function. Consequently, a robust version of NMF is introduced in [47]. Robust NMF uses the L 2,1 -norm loss function as follows: 22 , . . . , b 2m ] and the iterative update rules are defined as: where D ∈ R m×m represents a diagonal matrix with the ith diagonal element equal to:

C. COMMUNITY DETECTION
Over the past decades, numerous approaches have been proposed for graph partitioning. One of the most popular approaches is solving the minimum cut (min-cut) problem which involves optimization of different objective functions, e.g., ratio cut or normalized cut [48]. Given a static graph, G = {V , E, A}, where V , E and A are the set of nodes, edges and adjacency matrix of the graph, respectively, the mincut problem aims to find a partition, C = {C 1 , C 2 , . . . , C K }, by minimizing the following objective function: whereC k is the complement of C k and cut(C k ,C k ) = i∈C k ,j∈C k A ij for the two disjoint sets C k andC k , where C k ,C k ⊂ V . While the min-cut problem is NP-hard, spectral clustering and nonnegative matrix factorization provide efficient solutions to different relaxed versions of the mincut problem [48], [49]. In particular, spectral clustering solves the min-cut problem by retaining the orthogonality constraint on the basis matrices. On the other hand, nonnegative matrix VOLUME 10, 2022 factorization solves the min-cut problem by retaining the nonnegativity constraint on the factor matrices. In this paper, the proposed approach takes advantage of the equivalency between spectral clustering and non-negative matrix factorization [44], [49], [50] to develop a unified community detection framework for multi-layer networks.

D. ASYMPTOTICAL SURPRISE METRIC
One of the challenges in community detection is determining the number of communities when there is no prior knowledge about the network's ground truth. Different quality metrics have been utilized to tackle this issue including eigengap criterion [30], [48], dispersion coefficent [51], [52], modularity [53] and asymptotical surprise metric [54].
In this paper, the asymptotical surprise 1 (AS) metric is adopted to determine the number of communities in the multilayer network. For a simple graph, L α , AS is defined as AS = 2E α D KL (q|| q ), where 2E α is the total sum of edge weights in the adjacency matrix, A α , q = E intra α /2E α is the observed ratio of the intra-cluster edge weights to the total edge weights and q = p intra /p is the expected ratio of the total sum of intra-cluster edge weights p intra to the total density of the graph, p. D KL is the Kullback-Leibler divergence [55]. Specifically, the AS metric compares the distribution of the nodes and edges in the detected communities in a certain network with respect to a null model.

IV. COMMUNITY DETECTION IN MULTI-LAYER NETWORKS: JOINT NONNEGATIVE MATRIX FACTORIZATION (ML-JNMF) A. DEFINITION OF COMMUNITIES
Let G M be a multi-layer network with M layers. A community C k in G M , can be defined as a tuple C k = {C k,intra , C k,inter } which consists of a subset of nodes from one or more layers. More precisely, . . , M } and α = β. Following the aforementioned definitions, a community, C k , in a multi-layer network can be either an intra-layer community or interlayer community. Intra-and inter-layer communities can be defined as follows.
Definition 1: A community, C k , in a multi-layer network is called an intra-layer or within-layer community, if C k,inter = ∅.
Definition 2: A community, C k , in a multi-layer network is called an inter-layer or cross-layer community, if C k,inter = ∅.
In this paper, the objective of the proposed approach is to detect the community structure of the multi-layer network with M = 2, i.e., 12 ) are the intra-and inter-layer networks, respectively. In particular, we focus on 1 https://github.com/CarloNicolini/communityalg detecting a set of K disjoint intra-and inter-layer communities, C M = {C 1 , C 2 , . . . , C K }.

B. PROBLEM FORMULATION
In the proposed approach, the community structure of the multi-layer network is detected by taking advantage of prior work in nonnegative low-rank approximation of static networks [44], [45] and bipartite networks which encode the relationship between different types of vertex sets [56]- [58]. In the case of a two-layer network, the inter-layer adjacency matrix A 12 is similar to a bipartite network as it encodes the relationships between two sets of vertices in the two layers. However, unlike bipartite networks where there are no edges within a layer, we have intra-layer adjacency matrices, A α . Therefore, any partitioning between layers needs to be consistent with within-layer partitioning.
Given a two-layer network, the objective function based on normalized cut can be written as: Intra-layer low-rank approximation Inter-layer low-rank approximation where U α and B α are the coefficient matrices corresponding to intra-and inter-layer adjacency matrices, respectively. S 12 is the low-rank embedding of A 12 . The different terms in the optimization problem in Eq. 15 achieve the following objectives: • The first term provides nonnegative low-rank embedding of the intra-layer adjacency matrix where the embedding is used for intra-layer community community assignment.
• The second term provides nonnegative low-rank embedding of the inter-layer adjacency matrix where it is used for inter-layer community assignment.
• The last term quantifies the similarity between nonnegative embedding matrices, U α and B α . This term ensures that the within-layer and across-layer community assignments are consistent with each other. Let U α ∈ R n α ×r α and B α ∈ R n α ×r α be two coefficient matrices, the similarity between the two coefficient matrices can then be computed using the following cost function: (16) where K (B α ) represents the kernel matrix of the coefficient matrix B α . In this paper, the linear kernel function, K (B α ) = B α B α , is adopted to quantify the similarity between the coefficients matrices.
• µ 1 and µ 2 are the regularization parameters. A block diagram illustrating the proposed approach is given in Fig. 1.

C. OPTIMIZATION ALGORITHM
Since the objective function, J , in Eq. 15 is not convex for all variables, it is infeasible to find a global minimum for the optimization problem. However, J is convex in terms of each variable separately. Consequently, an iterative updating scheme can be adopted to find the local minima for the objective function. In particular, the optimization problem is divided into multiple subproblems and each variable is optimized while fixing the others. Solving for non-negative matrix factorization with L 2,1 -norm was introduced in [47] using multiplicative update algorithm. In this paper, a similar update procedure will be derived to solve Eq. 15. Splitting the proposed optimization problem results in the following subproblems: 1) Fixing B α with α ∈ {1, 2} and S 12 , U α -subproblem can be written as: 2) Fixing U α with α ∈ {1, 2}, S 12 and B 2 , B 1 -subproblem can be written as: 3) Fixing U α with α ∈ {1, 2}, S 12 and B 1 , B 2 -subproblem can be written as: 4) Fixing U α and B α with α ∈ {1, 2}, S 12 -subproblem can be written as: The subproblems in Eqs. 17-20 can be efficiently optimized using the following update rules: where where A detailed derivation of the update rules can be found in the Appendix. With the update rules in Eqs. 21-24, the proposed algorithm is summarized in Algorithm 1. VOLUME 10, 2022

D. DETERMINING THE QUALITY AND NUMBER OF COMMUNITIES
The quality of the community structure and the final number of communities in the multi-layer network are determined using the following steps: • Initialize U 1 ∈ R n 1 ×r 1 and U 2 ∈ R n 2 ×r 2 in Step 1 of Algorithm 1 using nonnegative double singular value decomposition 2 (NNDSVD) [59] and calculate the AS metric for varying number of communities, e.g., (2−20). The initial number of communities, r 1 and r 2 , for each intra-layer network, is determined as the number that maximizes the AS metric for that layer. On the other hand, the initial number of communities for the interlayer network is determined as r 12 = min{r 1 , r 2 } and B 1 ∈ R n 1 ×r 12 and B 2 ∈ R n 2 ×r 12 are initialized using NNDSVD.
• After the algorithm converges, a set of intra-and interlayer communities are detected through Steps 9-15 of Algorithm 1. In particular, the largest entry in each row of the basis matrices indicates the community assignment of each node. At the end of this step, each node is assigned to both intra-and inter-layer communities, simultaneously.
• The quality of the detected communities is calculated in Step 10 using the communitude metric of the detected communities [60], [61]. The communitude of a community, C k , is computed with respect to the supra-adjacency matrix as follows: where E C k intra is the sum of the edges that connect all the nodes within the community C k , E C k inter is the sum of edges that connect the nodes in community C k to nodes from other communities and 2E is the total edge weight of the supra-adjacency matrix. The communitude metric has an upper bound of 1.
• In Step 11 of Algorithm 1, the detected communities are sorted in a descending order based on their communitude values to determine the best set of communities.
In the proposed approach, each node can belong to either an intra-or inter-layer community. Consequently, the communitude values of the intra-and inter-layer communities that the node belongs to are compared and the community with the higher value is chosen.
• The best set of K communities for the given multilayer network is determined as the set with the highest communitude values, C M = {C 1 , C 2 , . . . , C K }.

E. COMPUTATIONAL COMPLEXITY OF THE PROPOSED ALGORITHM
In the proposed approach, the variables U α , B α where α ∈ {1, 2} and S 12 are updated using the multiplicative Update S 12 using Eq. 24. 8: end while 9: for i = 1 : n α do 10: j * = max j U α (i, j) %Intra-layer communities 11: end for 12: for i = 1 : n α do 13: %Inter-layer communities 14: end for 15: Determine inter-layer communities by factorizing S 12 using NMF. 16: Calculate the quality of the communities using communitude metric defined in Eq. 25. 17: Determine the best communities. update algorithm. The nonnegative embeddings, U α and B α , are initialized using NNDSVD with complexity O(r α n α 2 ) and O(r αβ n α n β ), respectively. The AS metric is then used to determine the initial number of communities in each intra-layer network over a range of communities, {2, 3, . . . , K max }. The computational complexity for computing AS is K max O(|E α |), α ∈ {1, 2} where |E α | is the number of edges in the α-th layer. For each iteration, the updates of U α and B α have a complexity of O(r α n α 2 ) and O(r αβ n α n β + r α n α 2 ), respectively. Moreover, the update of S 12 has a complexity of O(r αβ n α n β ). Therefore, for a 2-layer network with the total number of iterations required until convergence equal to l max , the total complexity of the proposed approach is approximately equal to 2l max O(r α max(n α , n β ) 2 ).

F. CONVERGENCE ANALYSIS OF THE PROPOSED APPROACH
To prove the convergence of the update rule of U α in Eq. 21, the auxiliary function method is adopted [62]. Following [62], an auxiliary function can be defined as follows.
Definition 3: L(U α , U t α ) is an auxiliary function of J (U α ), if it satisfies the following conditions: The concept of the auxiliary function is helpful because of the following lemma: Lemma 1: If L is an auxiliary function, then J (U α ) is nonincreasing under the update: The proof of Lemma 1 is given in [62]. Proof: Let J (u α ) be the part of J (U α ) that is dependent on (u α ) ij , using Eq. 36, we get: The second-order derivative of J (u α ) with respect to (u α ) ij is then equal to: Let (u α ) t ij denote the t th iterative update value of (u α ) ij , the Taylor series expansion of J (u α ) at (u α ) t ij can be then formulated as: We define the following L(u α , (u α ) t ij ) as an auxiliary function of J (u α ): The proof that the function L(u α , (u α ) t ij ) in Eq. 31 is an auxiliary function of J (u α ) is shown in the Appendix.
In line with Lemma 1, we need to find the minimum of L(u α , (u α ) t ij ) with respect to u α which can be determined by setting the gradient to zero: Substituting J ((u t α ) ij ) from Eq. 28 in Eq. 32 and simplifying the equation by cancelling the common terms, we get: Finally, replacing u α by (u t+1 α ) ij , we obtain the same update rule in Eq. 21 as: Similar procedure can be followed to prove the convergence of the other update rules. Consequently, their proofs of convergence will not be presented explicitly in this paper.

V. RESULTS
The performance of the proposed algorithm is evaluated on multiple simulated and real-world multi-layer networks. All experiments are performed using MATLAB R2020b on a desktop with the specifications (Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz 3.00 GHz and RAM of 16GB). The quality of the detected community structure is evaluated through multiple evaluation metrics with respect to the ground truth including normalized mutual information (NMI), variation of information (VI), F-value, recall and purity. All evaluation metrics are normalized between [0, 1]. High values of NMI, F-value, recall and purity and low values of VI indicate better community detection results.

A. SIMULATED MULTI-LAYER NETWORKS 1) SIMULATED WEIGHTED MULTI-LAYER NETWORKS
A set of undirected weighted two-layer networks are generated to evaluate the performance of the proposed approach. The edges of the networks are randomly selected from a truncated Gaussian distribution in the range of [0, 1]. The parameters of the generated networks are shown in Table 2. The networks are constructed with different community structures and number of communities. Moreover, different levels of sparse noise (sn%) is randomly added to the network to evaluate the robustness of the different algorithms against outliers. In addition, a percent of inter-community edges are set randomly to zero (ze%) to control the amount of mixing between communities. More specifically, when ze% = 0, the network is fully connected and as ze% increases the communities become more distinct. In the implementation of the proposed community detection algorithm for weighted networks, a node's community membership is determined by k-means instead of using the largest entry in each row of the basis matrices. This is done since the entries of the basis matrices tend to be very close to each other and the maximum may not always give the correct community assignment. The performance of the proposed approach in detecting the community structure of weighted multi-layer networks (WMLNs) is compared to other algorithms as presented in Table 3. The experiments are repeated 50 times and the comparison is conducted in terms of average NMI, VI, F-value, recall and purity. The regularization parameters are selected by searching the best values in the (µ 1 , µ 2 ) grid. As it can be seen in Table 3, the proposed approach outperforms the other algorithms in terms of the different evaluation metrics. The performance of BLSC, GenLov, SymNMF and ONMTF is relatively good for WMLN 1 since it consists of two large inter-layer communities. However, the performance of these algorithms decays as the size of the inter-layer communities decreases as in WMLNs 2 − 6 and as the sparse noise increases as in WMLN 4 − 6. Moreover, GDG and SSCF algorithms show poor performance over all the networks since they cannot handle weighted networks. On the contrary, the proposed algorithm is robust to noise and outliers. As it can be seen in Table 3, the proposed algorithm is capable of detecting the community structure in the different networks accurately, such as fully connected networks with ze% = 0 as 9 https://sites.google.com/site/nmftool/home WMLN 2 and 6, or noisy networks as in WMLN 4 and 6. In addition, the proposed algorithm performs better than the other algorithms in detecting the community structure of WMLN 5 where the community structure is composed of a large number of small communities. Finally, the proposed approach can handle multi-layer networks with variable number of nodes across layers as in WMLN 3 and 6, unlike CSNMF, CPNMF, CSNMTF and MTCP.

2) SIMULATED GIRVAN-NEWMAN MULTI-LAYER NETWORKS
In order to evaluate the performance of the proposed algorithm in detecting the community structure in binary multi-layer networks, a Girvan-Newman (GN) multi-layer network (GNMLN) is constructed by extending the model for single-layer networks [65]. Each layer in the the constructed network consists of 128 nodes and each node has an average degree of 16. Nodes in each layer are partitioned into 4 communities with 32 nodes each. The community structure is varied using the parameter z out , which determines the number of external edges for each node, i.e., the number of edges that connects a node to nodes from other communities. In particular, as z out decreases, the communities become more distinct. Moreover, the probability of a community to be an interlayer community is determined by a user defined parameter, a ∈ [0, 1]. Particularly, when a = 0 the multi-layer network consists of only intra-layer communities whereas when a = 1 the multi-layer network consists of only inter-layer communities. In particular, the probability of a community to be an inter-layer community is set to 1 − M −1 √ (1 − a) and the probability of intra-and inter-community edges are defined as p in = (16 − z out )/32 and p out = z out /96, respectively.
The proposed approach is applied to a set of Girvan-Newman multi-layer networks with M = 2 and the parameters given in Table 4. The simulations are repeated 50 times. As it can be seen in the table, the proposed algorithm performs better than the other algorithms in detecting the community structure of the underlying network in terms of the different quality metrics. For instance, in GNMLNs 1 − 3, where z out = 5, the communities are more distinct compared to GNMLN 4 and 5. Consequently, the existing algorithms perform relatively better in detecting the community structure of the former. In addition, we notice that the performance of the existing algorithms is affected by the parameter a, with the performance decaying as a decreases. On the other hand, the proposed algorithm performs well for a variety of network structures. In particular, the proposed algorithm can detect the community structure even when z out is increased. Moreover, it is not sensitive to the parameter a and can detect both intra-and inter-layer communities. Finally, it can be seen that GenLov algorithm performs slightly better than the proposed algorithm in detecting the community structure of GNMLN 5. This is because GenLov algorithm relies on modularity maximization which is known to perform better when the network consists of large communities. However, Genlov fails to detect the network's community structure TABLE 2. Parameters and ground truth community structure of the simulated weighted multi-layer networks (WMLNs) including: 1) Percent of zero inter-community edges (ze%); 2) Percent of randomly added sparse noise (sn%); 3) The number of nodes in each layer (n 1 , n 2 ); 4) number of communities (NOC); 5) Ground truth communities; 6) Mean and standard deviation of intra-and inter-community edges for both intra-and inter-layer graphs.
when a decreases which leads to smaller communities, as in GNMLN 4.

B. REGULARIZATION PARAMETER SELECTION
In the proposed approach, two regularization parameters are used, µ 1 and µ 2 . In particular, µ 1 controls the inter-layer lowrank approximation term while µ 2 penalizes the similarity between intra-and inter-layer nonnegative embeddings. The effect of the regularization parameters on the performance of the proposed algorithm is studied through experimental validation. In particular, we found that the proposed algorithm is insensitive to the different regularization parameters values when the network is not noisy and consists of distinct communities. Moreover, we noticed that the algorithm is less sensitive to µ 1 than µ 2 and performs well under different values of µ 1 in noisy networks. On the other hand, µ 2 can be selected depending on the multi-layer network under study. More specifically, if the network tends to consist of intralayer communities only, small values of µ 2 are recommended whereas if the network consists of only inter-layer communities or a mixture of intra-and inter-layer communities, larger values of µ 2 are recommended.

C. SCALABILITY ANALYSIS
In order to test the scalability of the proposed approach, a set of weighted multi-layer networks with different sizes are constructed. The size of the networks is varied from 32 to 8192 on a logarithmic scale. The networks are constructed with the intra-and inter-community edges randomly selected from a truncated Gaussian distribution in the range of [0, 1] with: The network consists of two layers with n 1 and n 2 and each layer consists of two equal size communities: C 1 1 and C 1 2 in layer 1 and C 2 1 and C 2 2 in layer 2. The community structure of the multi-layer network consists of C 12 1 = {C 1 1 , C 2 1 }, C 1 2 and C 2 2 . The number of communities is given as an input for each one of the algorithms. The run time of the different algorithms with respect to the variation of the network's size is shown in Fig. 2. In addition, a performance comparison between the different methods in detecting the community structure is presented in Table 5 in terms of the different quality metrics. As it can be seen from Fig. 2, the different methods are log-linear as the number of nodes increases. In addition, the proposed algorithm is faster than SymNMF, GDG and SSCF with increasing number of nodes. On the other hand, the remaining methods are faster compared to the proposed approach. However, as it can be seen from Table 5, the proposed algorithm maintains its good performance in detecting the community structure as the network's size increases.

D. REAL-WORLD MULTI-LAYER NETWORKS
To evaluate the performance of the proposed algorithm in detecting the community structure of real multi-layer networks, multiple networks from different disciplines are adopted. In this section, a brief description of the tested realworld networks is provided. VOLUME 10, 2022 TABLE 3. Performance comparison between the proposed method (ML-JNMF), block spectral clustering (BLSC), generalized louvain (GenLov), symmetric nonnegative matrix factorization (SymNMF), orthogonal nonnegative matrix tri-factorization (ONMTF), collective symmetric nonnegative matrix factorization (CSNMF), collective projective nonnegative matrix factorization (CSNMF), collective symmetric nonnegative matrix tri-factorization (CSNMTF), Geodesic Density Gradient (GDG) algorithm, subspace based community detection with fusion (SSCF) and multiplex cellular communities for tissue phenotyping (MCTP) in detecting the community structure in weighted multi-layer networks (WMLNs) given in Table 2. The comparison is conducted in terms of average NMI, VI, F-value, recall and purity.

1) MOBILE PHONE NETWORKS
Two mobile phone networks, MIT Reality Mining, 10 [66] and Nokia Research Center (NRC) Lausanne [67], are adopted to test the performance of the proposed approach. 10 http://reality.media.mit.edu/download.php Interactions in both networks represent three types of mobile phone communications between 87 users for the MIT network and 136 users for the NRC network. The interactions are the physical location, bluetooth scans and phone calls. The multi-layer network is constructed where the intra-layer graphs encode physical location and bluetooth scans and the TABLE 4. Performance comparison between the proposed method (ML-JNMF), block spectral clustering (BLSC), generalized louvain (GenLov), symmetric nonnegative matrix factorization (SymNMF), orthogonal nonnegative matrix tri-factorization (ONMTF), collective symmetric nonnegative matrix factorization (CSNMF), collective projective nonnegative matrix factorization (CSNMF), collective symmetric nonnegative matrix tri-factorization (CSNMTF), Geodesic Density Gradient (GDG) algorithm, subspace based community detection with fusion (SSCF) and multiplex cellular communities for tissue phenotyping (MCTP) in detecting the community structure in binary Girvan-Newman multi-layer networks (GNMLNs). The comparison is conducted in terms of average NMI, VI, F-value, recall and purity.
inter-layer graph encodes the phone calls interactions. More details about the constructed networks can be found in [29] and they are publicly available along with their annotated ground truth. In the MIT 11 network, the ground truth communities are the self-reported affiliations of the users, while 11 https://github.com/VGligorijevic/NF-CCE/tree/master/data/nets in the NRC 12 network, the ground truth communities are the users' email affiliations.
The proposed approach is applied to both MIT and NRC networks to detect their community structure. The performance of the proposed approach is compared to other existing algorithms using different quality metrics and the results are reported in Table 6. The detected community structure is evaluated with respect to each one of the layers. As it can be seen in the table, the proposed approach outperforms the other algorithms with respect to multiple quality metrics in both layers.

2) SOCIAL NETWORKS
To evaluate the performance of the proposed algorithm in detecting the community structure in real social multilayer networks, a multi-layer network corresponding to the relationships between workers at Lazega law firm 13 (LLF) [68], [69] is adopted. This network models the interactions between 71 partners and associates who work at a corporate law partnership. The network has three layers, where each layer encodes a particular relationship, i.e., co-work, advice and friendship. In addition, the data set includes 7 attributes that can be used to evaluate the detected community structure. In this paper, the (1) status (partner or associate) (LLF S ), and (3) office location (Boston, Hartford, or Providence) (LLF O ) are used as the ground truth for the communities.
A homogeneous 2-layer, unweighted and undirected network is constructed where the intra-layer graphs encode work relations, i.e., co-work and advice, whereas the inter-layer graph models the friendship relations.
The proposed algorithm is applied to the constructed network and the office and status attributes are used as the ground truth to evaluate the detected community structure. Table 6 presents a comparison between the different methods in terms of NMI, VI, F-value, recall and purity of the detected community structure with respect to each of the ground 13 https://manliodedomenico.com/data.php truth attributes. Since both layers consist of the same set of nodes, the detected community structure is evaluated with respect to each one of the layers. As it can be seen in Table 6, for both layers, the detected community structure is closely related to the office locations, which agrees with previous studies [70], [71]. We also note that layer 2, which encodes the co-work interactions among the workers, reflects the community structure of the network with respect to the office location better than layer 1, which encodes the advice interactions. On the other hand, the community structure detected from the advice layer is more closely related to the status (partner or associate) than the co-work layer.

3) BIOLOGICAL DATA: C. ELEGANS NETWORK
The C. Elegans 14 network [3], [72] is a multi-layer network that consists of 279 nodes. The edges in the network reflect the different synaptic junctions, namely electric, chemical monadic, and polyadic between the neurons of the Caenorhabditis Elegans connectome. This data set, also, includes three attributes that can be used as the ground truth communities. In this paper, (1) the group of the neuron (bodywall, mechanosensory, head motor neurons, etc.) (CEleg G ), and (2) the color of the neuron (grey, red, orange, etc.) (CEleg C ) are used to evaluate the ability of the different algorithms in detecting the community structure. The performance comparison between the different methods is given in Table 6. As it can be seen from the table, the proposed method can detect the community structure better than the other methods with respect to NMI, purity and recall metrics compared to the group and color attributes.

4) HAND-WRITTEN DIGITS DATA: UCI
The UCI 15 [73] data set consists of features of handwritten digits from (0-9) extracted from a collection of Dutch utility maps. There are 200 patterns per digit which results in a total of 2000 patterns that have been digitized in binary images. These digits are represented by different feature sets including (1) Fourier coefficients of the character shapes, (2) profile correlations and (3) Karhunen-Loéve coefficients. Intra-and inter-layer graphs, A 1 , A 2 ∈ R 2000×2000 and A 12 ∈ R 2000×2000 , are constructed using the Guassian kernel similarity function [48] and keeping the nearest 100 neighbors. The different algorithms are applied to the constructed network and their performance comparison is given in Table 7. The proposed approach performs better than all the other methods in terms of all quality metrics in the second layer and in terms of purity in the first layer.

5) CALTECH101 DATA SET
The Caltech101 16 [74] data set is a well-known object recognition data set. It contains 102 objects, i.e., communities, with 40 − 800 images for each object which results in 9144 images in total. The original data set consists of 6 types  Performance comparison between the proposed method (ML-JNMF), block spectral clustering (BLSC), generalized louvain (GenLov), symmetric nonnegative matrix factorization (SymNMF), orthogonal nonnegative matrix tri-factorization (ONMTF), collective symmetric nonnegative matrix factorization (CSNMF), collective projective nonnegative matrix factorization (CSNMF), collective symmetric nonnegative matrix tri-factorization (CSNMTF), Geodesic Density Gradient (GDG) algorithm, subspace based community detection with fusion (SSCF) and multiplex cellular communities for tissue phenotyping (MCTP) in detecting the community structure in real multi-layer networks. The comparison is conducted in terms of average NMI, VI, F-value, recall and purity. The best performance is shown in bold and when ML-JNMF achieves the second best performance is underlined.

TABLE 7.
Performance comparison between the proposed method (ML-JNMF), block spectral clustering (BLSC), generalized louvain (GenLov), symmetric nonnegative matrix factorization (SymNMF), orthogonal nonnegative matrix tri-factorization (ONMTF), collective symmetric nonnegative matrix factorization (CSNMF), collective projective nonnegative matrix factorization (CSNMF), collective symmetric nonnegative matrix tri-factorization (CSNMTF), Geodesic Density Gradient (GDG) algorithm, subspace based community detection with fusion (SSCF) and multiplex cellular communities for tissue phenotyping (MCTP) in detecting the community structure in UCI and Caltech101 networks. The comparison is conducted in terms of average NMI, VI, F-value, recall and purity. The best performance is shown in bold and when ML-JNMF achieves the second best performance is underlined.
of features [75]. In our experiments we selected three types to construct the multi-layer network including (1) Gabor feature, (2) wavelet moments and (3) CENTRIST feature. Intra-and inter-layer graphs, A 1 , A 2 ∈ R 9144×9144 and A 12 ∈ R 9144×9144 , are constructed using the Guassian kernel similarity function [48] and keeping the nearest 1000 neighbors. Since this is a large data set, the number of communities is given as an input to all algorithms. The performance of the different methods in detecting the community structure of the Caltech101 network is compared and the results are reported in Table 7. As it can be seen from the table, the proposed algorithm performs better than the other algorithms in terms of NMI, F-value and purity in the second layer. However, the CSNMF approach preforms slightly better than the proposed approach in the first layer. The reduced accuracy of all the methods for this case is mainly due to the high inter-cluster edge density.

6) IMDB MULTI-LAYER NETWORK
To assess the ability of the proposed algorithm in detecting the community structure in heterogeneous multi-layer networks, a data set that contains a collection of best and worst movies from the Internet Movie DataBase (IMDB) is adopted. The complete IMDB data set is publicly available in SQL format. 17 In this paper, we used an IMDB data set consisting of movies on the IMDB's top 250 and bottom 100 chart along with their genres and directors with 166 movies, 130 directors and 20 movie and director genres. 17 https://relational.fit.cvut.cz/dataset/IMDb

a: HETEROGENEOUS MULTI-LAYER NETWORK CONSTRUCTION
A 2-layer multi-layer network is constructed to model the IMDB data set. The intra-layer graphs model the relations among directors and movies, respectively. On the other hand, the inter-layer graph models the relation between directors and movies. Intra-layer adjacency matrices are constructed based on genre similarity. Genre can be considered as a categorical variable as it can only take on a fixed number of values. For example, in the first layer, a pair of nodes, i.e., directors, are connected if they directed the same genre, where each director can direct the same genre multiple times. Consequently, for every director, a binary-valued vector, g ∈ R 1×20 , is generated where the entries of the vector are 1 if the director directed a movie in that genre and 0 otherwise. Pair-wise Pearson correlation coefficient is then calculated between the directors' genre vectors. The final adjacency matrix is constructed by keeping the edges with correlation values greater than 0.5. The second layer corresponds to the movies and is constructed based on the genre similarity following the procedure described above. The inter-layer adjacency matrix that models director-movie relations is constructed such that a node from the director layer is connected to a node from the movie layer if the director directed that movie. The constructed network and the IMDB data set details are available in our GitHub repository. 18  The proposed approach is applied to detect the community structure of the IMDB multi-layer network. The proposed algorithm detects 4 intra-layer communities in the directors layer and 5 intra-layer communities in the movies layer as well as 2 inter-layer communities. In order to analyze and interpret the detected community structure, the shared genres 43038 VOLUME 10, 2022 between the nodes in the directors, movies and directorsmovies communities are reported in Fig. 3-5. The genres that are shared by at least 50% of the nodes in each community are reported. As it can be seen from the figures, the proposed algorithm is capable of revealing communities that share similar genres. For example, in Fig. 3, 100% of the directors in C 1 1 and C 1 2 directed comedy and drama genres, respectively. Similarly, in Fig. 4, 100% of the movies in C 2 1 , C 2 2 , C 2 3 and C 2 4 are in the comedy, crime, drama and war/drama, respectively. With respect to the inter-layer communities, it was observed that in one of the communities, C 12 1 , the majority of the movies and directors shared the same genre with 94.74% of the movies and directors sharing the family genre. In the second inter-layer community, 60.7% of the nodes corresponded to movies directed by the directors in that community.

VI. CONCLUSION
In this paper, a new joint nonnegative matrix factorization approach is introduced to detect the community structure of multi-layer networks. The proposed approach models the multi-layer network as the union of a multiplex network and a bipartite network. The multiplex network encodes the intra-layer interactions while the bipartite network encodes the inter-layer interactions. The proposed algorithm aims to detect intra-and inter-layer communities, simultaneously.
The performance of the proposed approach is validated on both simulated and real-world multi-layer networks and compared to other existing algorithms. The simulation results show that the proposed algorithm outperforms existing algorithms in detecting the community structure of the network. In particular, the proposed algorithm can detect the community structure in both homogeneous and heterogeneous binary and weighted multi-layer networks by optimizing the same objective function. Moreover, the proposed algorithm is robust to noise and outliers and can reveal the community structure of the underlying network even when it consists of small communities. Furthermore, the proposed method combines computational efficiency and accuracy in detecting the community structure of the multi-layer network. Extensive application of the proposed algorithm to real networks from multiple disciplines shows that it performs well for both homogeneous and heterogeneous multi-layer networks and the detected communities are in agreement with the available metadata. Finally, future work will consider extending the proposed approach to M-layer networks.

APPENDIX A THE U α -SUBPROBLEM
The U α -subproblem in Eq. 17 can be rewritten using the definition of the Frobenious norm and L 2,1 -norm as follows: where Tr is the trace function and Z α ii ∈ R n α ×n α = 1/ a α i − U α u αi . The optimization subproblem in Eq. 35 can be solved by following the standard procedure used in NMF to find the multiplicative updating rules. The gradient of the objective function, J 1 , with respect to U α can be calculated as follows: Using the Karush-Kuhn Tucker (KKT) complementarity conditions, a static point can be reached where the KKT condition for the factor U α can be defined as: VOLUME 10, 2022 which leads to the update rule of the U α factor as: where the update rule can be expressed in matrix form as: where and denote the Hadamard product and division, respectively.

APPENDIX B THE B α -SUBPROBLEM
where Z 12 ii ∈ R n 2 ×n 2 = 1/ a 12 i − B 1 S 12 b 2i . Following the same procedure of the U α -subproblem, the gradient of the objective function, J 2 , with respect to B 1 can be found as: Similar to U α , we use the KKT conditions to derive the update rule for B 1 , then, where µ 2 = 2µ 2 . Similarly, the update rule of B 2 can be found as: The last update rule can be computed similar to the previous factors as follows, where the KKT condition for S 12 is defined as: and results in the following update rule: or in matrix form as: Proof: For L(u α , (u α ) t ij ) to be an auxiliary function of J (u α ), it should satisfy both conditions given in Definition 3. By comparing the Taylor series expansion of J (u α ) in Eq. 30 with L(u α , (u α ) t ij ) defined in Eq. 31, it is obvious that the first two terms in L(u α , (u α ) t ij ) are greater than the first two terms in J (u α ). Now, we have to prove the following: First, we need to prove the following: Since U α and Z α are nonnegative, we have: