Microscopic Structural Analysis of Complex Networks: An Empirical Study using Motifs

Complex networks can depict a clear image of real-world systems. A real-world scenario can be represented as a graph with interconnected layers, called a multilayer network. Finding motifs gives an idea of the topology of complex systems and helps to understand the dynamics in the graphs. Treating motifs as the atoms of the network helps to analyze the relationships between nodes and between layers. In this work, we suggest a subgraph enumeration approach to find and count the motifs in a multilayer network. The proposed work has many applications in graph mining, particularly to the structure and dynamics of complex networks.


I. INTRODUCTION
Real-world systems, whether social, biological, or technological, such as transportation networks, protein networks, and citation networks, are naturally represented as complex networks. Each of these systems has multiple subsystems. As the data associated with complex networks became more heterogeneous and complex, the demand for organizing a complex network into a multilayer network grew [1], [2], [3]. Research in complex networks has moved the representation of real-world entities from single-layer networks to multilayer networks by grouping the sets of nodes that exhibit identical behavior [4].
A multilayer graph consists of a set of nodes and a set of edges, organised in layers. The number of layers and their interpretation differ from graph to graph based on the application scenario. The edges represent the relationships or interactions between nodes in the same or in different layers, as shown in Figure 1.
Motifs are small, well-connected substructures that act as the basic building units of a network. Milo et al. [5] define network motifs as subgraph patterns that occur significantly more often in the network than in randomized versions of the network; they reflect specific properties of the network. A graph H is called a subgraph of G whenever V(H) ⊆ V(G) and E(H) ⊆ E(G). A network motif is thus a recurrent subgraph pattern. In this work, motifs are subgraphs built from triads. Network motifs are studied to understand either individual nodes or the network as a whole. For understanding the network structure or topology, the identification and analysis of motifs and their implications are essential to network science [6]. Identifying network motifs is computationally hard, as it demands matching against all possible subgraph patterns. The detection of motifs in subgraphs is studied extensively in [7].
Finding the structural similarity among graphs is termed graph matching. Detecting a common subgraph helps to measure the similarity of two graphs: the larger the maximum common subgraph of G_A and G_B, the more similar the two graphs are. Given two graphs G_A = (V_A, E_A) and G_B = (V_B, E_B) with |V_A| = |V_B|, if there exists a one-to-one mapping f : V_A → V_B such that (u, v) ∈ E_A iff (f(u), f(v)) ∈ E_B, then f is an isomorphism, i.e., G_B is said to be isomorphic to G_A. This type of problem is called exact graph matching.
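The exact-matching condition can be illustrated with a minimal brute-force sketch. It is not the method proposed in this article; it simply tries every one-to-one mapping f and is practical only for very small graphs. The function name `is_isomorphic`, the vertex numbering 0..n-1, and the restriction to simple undirected graphs are assumptions made for illustration.

```python
from itertools import permutations


def is_isomorphic(n_a, edges_a, n_b, edges_b):
    """Brute-force exact matching for small simple undirected graphs.

    Vertices are assumed to be numbered 0..n-1 and edges are (u, v) pairs
    with u != v. Returns True if some bijection f maps the edge set of the
    first graph exactly onto the edge set of the second.
    """
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if n_a != n_b or len(set_a) != len(set_b):
        return False
    for f in permutations(range(n_a)):
        # f[u] plays the role of f(u) in the definition of isomorphism.
        mapped = {frozenset((f[u], f[v])) for u, v in map(tuple, set_a)}
        if mapped == set_b:
            return True
    return False


# Two differently labelled paths on three vertices.
same = is_isomorphic(3, [(0, 1), (1, 2)], 3, [(0, 2), (0, 1)])
# A star and a path on four vertices have equal edge counts but differ.
different = is_isomorphic(4, [(0, 1), (0, 2), (0, 3)], 4, [(0, 1), (1, 2), (2, 3)])
```

The search visits up to n! mappings, which is why practical matchers prune partial mappings instead of enumerating all of them.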
Graph isomorphism establishes that two graphs are structurally equal, disregarding the labels of the original graphs. Two multilayer networks are structurally equivalent if the vertices in one of them can be relabeled by a bijection so that the first network coincides with the second one. The structural analysis of real-world networks becomes feasible with subgraph enumeration, in which we enumerate subgraphs of different sizes. Graph isomorphism is used for identifying the motifs. Subgraph isomorphism is NP-complete [8], so computing network motifs by subgraph isomorphism is intractable in general. Identifying isomorphism in multilayer networks establishes relations among the various networks represented as layers. A motif of any size is broken down into smaller motifs consisting of three nodes, which we call the stable size of a motif in complex networks.

FIGURE 1: Spectral diagram of a graph: multilayer network. When studying a multilayer network, entities can interact in multiple directions. Top: a three-layer multilayer network whose layers correspond to different dimensions of interrelation, showing the intra-layer and inter-layer relations between nodes; it can be represented by its own adjacency matrix or tensor. Bottom: (a) a multilayer graph in which cross-layer edges are shown, which can be represented by a tensor or supra-adjacency matrix; (b) its corresponding projection.
This article proposes a new algorithmic approach to subgraph enumeration for isomorphism in multilayer complex networks. Motif discovery and analysis have benefited from the study of graph isomorphism. Subgraph enumeration on a multilayer network is carried out as motif enumeration using a G-Trie (Graph reTRIEval) [10]. A G-Trie is a multiway (m-way) tree capable of storing graphs. Each node of the tree holds a graph vertex and its edges to the vertices held by ancestor nodes. A path from the root to any node represents one distinct graph, and the descendants of a G-Trie node share a common subgraph. G-Tries have proved to be a feasible and very efficient data structure for network motif discovery on single-layer graphs [9]. We extend motif discovery to multilayer networks using the G-Trie.
The main contributions of this article are:
• We propose the use of the G-Trie data structure over the multilayer network to perform subgraph matching and motif counting.
• We propose a subgraph enumeration procedure for identifying isomorphism in multilayer networks. We prove mathematically that exact subgraph counting is possible with our proposed solution.
Our contributions will help in the discovery of motifs and in the wider adoption of multilayer network models for systems such as Twitter. The paper is organized as follows. Section II reviews the background studies. The proposed work and its analysis are explained in Section III. The proof of correctness is given in Section IV. Finally, the article is concluded in Section V.

A. MULTILAYER NETWORK : DEFINITIONS AND NOTATIONS
We define a multilayer graph G_M as a quadruple G_M = (V, L, V_M, E_M), where V is the complete set of vertices, L represents the layers, V_M is the set of vertices present in each layer, and E_M is the set of interconnecting edges. With d aspects, L = {L_a}, a = 1, ..., d, collects the elementary layers of each aspect; integrating all the elementary layers as L_1 × ... × L_d results in a multilayer network. The vertices in a layer α are denoted V_α = {v^α_1, v^α_2, ..., v^α_n}, and the edges interconnecting layers α and β are denoted E_αβ. The adjacency matrix of each layer α is A^[α] = (a^α_ij), for 1 ≤ i, j ≤ N_α and 1 ≤ α ≤ m. The cross-layer (inter-layer) adjacency matrix corresponding to E_αβ is the matrix A^[αβ] = (a^αβ_ij), whose entry a^αβ_ij is 1 if an edge of E_αβ joins v^α_i and v^β_j and 0 otherwise. The projection of the multilayer network is the network graph shown in Figure 1b.
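As an illustration of these matrices, the following sketch assembles a supra-adjacency matrix from intra-layer and inter-layer edge lists. The function name `supra_adjacency`, the input format, the undirected 0/1 entries, and the ordering of node-layer pairs (layer by layer) are assumptions made for this example rather than notation fixed by the article.

```python
def supra_adjacency(layers, intra_edges, inter_edges):
    """Build a supra-adjacency matrix for an undirected multilayer network.

    layers:      list of vertex lists, one per layer (layer index = position)
    intra_edges: dict layer -> list of (u, v) edges inside that layer
    inter_edges: dict (alpha, beta) -> list of (u, v) with u in layer alpha
                 and v in layer beta
    Returns the matrix (list of lists) and the index of each (layer, vertex).
    """
    index = {}
    for alpha, vertices in enumerate(layers):
        for v in vertices:
            index[(alpha, v)] = len(index)
    size = len(index)
    supra = [[0] * size for _ in range(size)]

    # Diagonal blocks: one adjacency matrix per layer.
    for alpha, edges in intra_edges.items():
        for u, v in edges:
            i, j = index[(alpha, u)], index[(alpha, v)]
            supra[i][j] = supra[j][i] = 1

    # Off-diagonal blocks: cross-layer adjacency matrices.
    for (alpha, beta), edges in inter_edges.items():
        for u, v in edges:
            i, j = index[(alpha, u)], index[(beta, v)]
            supra[i][j] = supra[j][i] = 1
    return supra, index


supra, index = supra_adjacency(
    layers=[["a", "b"], ["a", "c"]],
    intra_edges={0: [("a", "b")], 1: [("a", "c")]},
    inter_edges={(0, 1): [("a", "a")]},
)
```

In this small example, vertex a appears in both layers and the single inter-layer edge couples its two copies.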

B. MATRIX REPRESENTATION
Multilayer networks can be designed and modeled accurately using the notion of a tensor. In tensor algebra, a multilayer network corresponds to a multilinear object in the tensor product of two vector spaces, V ⊗ L, whose elements are linear combinations of v ⊗ l, where v ∈ V and l ∈ L. A multiplex network M made of m layers can be described by such a tensor product. A group of vectors is combined by summing scalar multiples of the vectors, and the set of all such linear combinations is the span of the set [12], [13]:
span({v_1, ..., v_m}) = ⟨{v_1, ..., v_m}⟩ (8)
Let the sets {v_1, v_2, ..., v_N} and {l_1, ..., l_m} be bases of V and L, respectively. Then the products v_i ⊗ l_α form a basis of V ⊗ L.

C. MOTIF
Motifs are small connected substructures that act as the basic building units of a network [5]. They occur in a network far more often than expected, which gives them great importance in the structural analysis of complex networks. Motif discovery and analysis benefit from the study of graph isomorphism, in which the enumerated subgraphs are grouped into isomorphism classes and reviewed [14], [15], [16]. Small isomorphism classes keep the computation manageable and make its results easier to interpret. We therefore take advantage of limiting the size of a subgraph, that is, its number of vertices and number of layers.
The number of multilayer isomorphism classes grows as a function of the number of vertices and the number of layers. The elementary component of a network is an edge: a directed or undirected connection between a pair of nodes, which reside either in the same layer or in different layers. An edge is described as a tuple of its two endpoint nodes together with their layers.
In [21], the authors present an algorithm for estimating the frequency of subgraphs in random networks that enumerates all size-k subgraphs. The algorithm starts with a vertex v from the input graph and adds to the extension set V_extension only those vertices that satisfy two properties: first, their labels must be larger than that of v; second, they may be neighbors of the newly added vertex w but not of any vertex already in V_subgraph.
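The two extension rules described above can be sketched as a short recursive enumeration in the style of that algorithm. This is a minimal single-layer sketch, not the procedure proposed in this article; the function name `enumerate_subgraphs` and the adjacency-dictionary input with comparable vertex labels are assumptions.

```python
def enumerate_subgraphs(adj, k):
    """Enumerate the vertex sets of all connected size-k subgraphs.

    adj maps each vertex to the set of its neighbours (undirected graph);
    vertex labels must be comparable, e.g. integers.
    """
    found = []

    def extend(v_subgraph, v_extension, v):
        if len(v_subgraph) == k:
            found.append(frozenset(v_subgraph))
            return
        v_extension = set(v_extension)
        while v_extension:
            w = v_extension.pop()
            # Vertices already in the subgraph or adjacent to it.
            closed = set(v_subgraph)
            for u in v_subgraph:
                closed |= adj[u]
            # Rule 1: label larger than v. Rule 2: neighbour of w only,
            # not of any vertex already in the subgraph.
            exclusive = {u for u in adj[w] if u > v and u not in closed}
            extend(v_subgraph | {w}, v_extension | exclusive, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triples = sorted(sorted(s) for s in enumerate_subgraphs(graph, 3))
```

The two rules together ensure that each connected vertex set is reported once, from its smallest-labelled vertex.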
In [33], the authors developed a statistical theory for estimating motif counts in a larger graph from a sample covering only a fraction of its vertices. They use a Horvitz-Thompson type estimation approach and a neighborhood sampling approach. For subgraph sampling, the sampling ratio p is expressed, for any connected motif h on k vertices, in terms of d, the maximal degree of the parent graph, s(g, G), the number of induced subgraphs of G, and ϵ, the multiplicative error. In the neighborhood sampling method, the authors also label the neighbors of all vertices in the sample S; the result is denoted G_S. In this approach, the rows of the adjacency matrix of G are sampled independently with probability p, and the sampled rows are observed together with their row indices, so that the neighborhood information is acquired for each sampled vertex. Since this method considers only a fraction of the vertices, some triad structures may remain unidentified. The proposed algorithm overcomes this disadvantage.
Graph matching and motif analysis are key research activities in complex network analysis. The inherent properties of a network are discovered through motif analysis. Subgraph matching based on a structural parameter such as the adjacency matrix is a feasible technique. Parallel algorithms have been designed on layered graphs to address computationally challenging problems such as Minimum Vertex Cover and Maximum Independent Set. These problems pave the way to solving many social network problems [19], [20]. The nearest-neighborhood trust properties and the second-hop neighborhood are discovered through the supra-adjacency matrix or tensor algebra. Since our graph matching algorithm relies entirely on algebraic connections, the proposed work offers a promising outlook for complex network research, particularly structural analysis.

D. GRAPH MATCHING ALGORITHMS
The graph matching process compares two graphs or subgraphs for similarity or duplication. There are two types of matching algorithms, namely exact matching and inexact matching. The authors of [17] suggest two approaches that generate candidate structures from a complex graph without redundant structures.

Ullmann's Algorithm
It is an exact matching algorithm for graph isomorphism and subgraph isomorphism, implemented with a depth-first search (DFS) strategy. Let us consider two graphs, where B denotes the adjacency matrix of the graph to be searched. An n × m permutation matrix M is to be constructed. It contains only 0s and 1s, with exactly one 1 in each row and at most one 1 in each column. Such a matrix is generated by repeatedly exchanging rows and columns of an identity matrix. The products act on B as follows: M B moves row j to row i; (M B)^T moves column j to column i; M (M B)^T moves column j to column i and row j to row i. It is a popular algorithm for graph isomorphism and subgraph isomorphism. An isomorphic subgraph can be enumerated from graph B by relabeling the nodes [18].
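The role of the matrix M can be illustrated with a small sketch that tests the product M (M B)^T against the adjacency matrix of the pattern graph, here called A (a name assumed for this example). Instead of Ullmann's depth-first search with refinement, the sketch exhaustively tries every admissible M, so it demonstrates only the matrix condition, not the efficiency of the algorithm. The helper names are hypothetical.

```python
from itertools import permutations


def matmul(x, y):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(row) for row in zip(*x)]


def find_permutation_matrix(a, b):
    """Search for an n x m 0/1 matrix M embedding pattern A into graph B.

    M has exactly one 1 per row and at most one 1 per column. The embedding
    is accepted when every edge of A is also present in C = M (M B)^T.
    Returns M, or None if no such matrix exists.
    """
    n, m = len(a), len(b)
    for cols in permutations(range(m), n):
        big_m = [[1 if cols[i] == j else 0 for j in range(m)] for i in range(n)]
        c = matmul(big_m, transpose(matmul(big_m, b)))
        if all(c[i][j] == 1 for i in range(n) for j in range(n) if a[i][j] == 1):
            return big_m
    return None


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Triangle on vertices 0, 1, 2 plus a pendant vertex 3 attached to vertex 2.
target = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

embedding = find_permutation_matrix(triangle, target)
no_embedding = find_permutation_matrix(triangle, path)
```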

VF2 Algorithm
The VF2 algorithm is an exact matching algorithm that adds one pair of nodes to the partial mapping at each stage and checks the result against a set of feasibility rules [22].
The process of finding the similarity between two graphs G_i and G_j is carried out by a mapping function M_iso that pairs nodes of G_i with nodes of G_j. Many graph similarity checking procedures are available, and all are expensive in terms of execution time; the one used for subgraph isomorphism is NP-complete. Even the most feasible algorithms, ideally linear in time, face performance bottlenecks on graphs with many nodes and a high average degree. The mapping function M_iso is expressed as a set of pairs (v_i, v_j), with v_i a node of G_i and v_j a node of G_j. A mapping M_iso ⊂ N_i × N_j is said to be an isomorphism only when M_iso is a bijective function that preserves the structure of the two graphs under consideration. A mapping M_iso ⊂ N_i × N_j is said to be a graph-subgraph isomorphism only when M_iso is an isomorphism between one of the graphs and a subgraph of the other.
Each state s of the matching process is described using a State Space Representation (SSR). Each state s of the SSR is associated with a partial mapping solution M_iso(s), which is a subset of M_iso and identifies the subgraphs G_i(s) and G_j(s) of G_i and G_j.
The transition from a state s to a successor state s′ is the addition of a pair (v_i, v_j) of matched nodes to the partial mapping of s. Only a small subset of the states in the SSR is consistent with the desired kind of isomorphism, so few paths lead to a solution. The consistency condition, for isomorphism or subgraph isomorphism, requires that the partial graphs G_i(s) and G_j(s) associated with M_iso(s) are isomorphic. A state that violates the consistency condition has no consistent successors, so the search only needs to generate consistent states.
The function F(s, v_i, v_j) checks feasibility and returns true or false: it is true when adding the pair (v_i, v_j) to the state s satisfies the feasibility rules. The feasibility function is written as F(s, v_i, v_j) = F_syn(s, v_i, v_j) ∧ F_sem(s, v_i, v_j), where the syntactic feasibility F_syn() depends only on the structure of the graphs, and the semantic feasibility F_sem() depends on the attributes.
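The state-space search with a combined syntactic and semantic feasibility check can be sketched as a simple backtracking routine. This is a simplified illustration under stated assumptions, not the VF2 implementation: it omits VF2's look-ahead rules and candidate ordering, it finds non-induced subgraph matches, and the names `subgraph_match`, `f_syn`, and `f_sem` are chosen for this example.

```python
def subgraph_match(adj_p, adj_t, labels_p=None, labels_t=None):
    """Depth-first search over partial mappings from pattern to target.

    adj_p / adj_t map each node to its neighbour set. Returns a dict that
    maps every pattern node to a distinct target node such that pattern
    edges land on target edges, or None if no such mapping exists.
    """
    order = list(adj_p)

    def f_syn(state, vp, vt):
        # Structure only: mapped neighbours of vp must be neighbours of vt.
        return all(state[u] in adj_t[vt] for u in adj_p[vp] if u in state)

    def f_sem(vp, vt):
        # Attributes only: node labels must agree when labels are given.
        return labels_p is None or labels_p[vp] == labels_t[vt]

    def search(state):
        if len(state) == len(order):       # goal state: mapping is complete
            return dict(state)
        vp = order[len(state)]
        used = set(state.values())
        for vt in adj_t:
            if vt not in used and f_syn(state, vp, vt) and f_sem(vp, vt):
                state[vp] = vt             # move to successor state s'
                result = search(state)
                if result is not None:
                    return result
                del state[vp]              # backtrack
        return None

    return search({})


pattern = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
target = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
match = subgraph_match(
    pattern, target,
    labels_p={"a": "x", "b": "y", "c": "y"},
    labels_t={0: "y", 1: "y", 2: "x", 3: "x"},
)
```

In the usage example, the labels force pattern node a onto target node 2, since it is the only node labelled x that lies on the triangle.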
VF3 Algorithm
VF3 uses a tree search to find subgraph isomorphisms and employs a depth-first strategy to reach the goal. VF3 also employs a State Space Representation (SSR) in which each state s represents a partial mapping that satisfies the matching constraints; a goal state is a state whose mapping is complete, that is, it covers all the nodes in the graph [23].
In the proposed method, we introduce an m-way tree to hold all the subgraph nodes under consideration. Subgraph enumeration on a multilayer network is carried out as motif enumeration using a G-Trie. This approach avoids the additional task of relabeling when a matching pattern is found. The proposed algorithm generates a match for the motif pattern by considering the intra-layer and inter-layer links in the multilayer network.

III. PROPOSED WORK
Isomorphism formalizes the idea that two graphs have equivalent structures [24], [25]. Two graphs are isomorphic if the first can be transformed into the second by renaming the vertices [26]. Edges carry no labels of their own; the vertex labels at the two endpoints identify an edge, so the transformation updates the edges as well. Isomorphism in multilayer networks can be defined in the same way as for plain graphs: two multilayer graphs M_1 and M_2 are isomorphic if one can be transformed into the other by some renaming. For vertex isomorphism, we introduce a bijective function ψ on the vertices such that M_1^ψ = M_2. One more elementary bijective function is needed to define isomorphism on layers, called layer isomorphism: the function τ_α : L_α → L′_α renames the layers of a network. In some situations, both vertices and layers must be relabeled together to achieve isomorphism; this defines a function δ = (ψ, τ_1, τ_2, ..., τ_m) that combines the vertex map ψ and the layer maps τ.
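To make the combined vertex and layer relabeling concrete, the following sketch searches for a pair (ψ, τ) by brute force on a network with a single aspect. It assumes that both networks share the same vertex set and layer set, that edges are pairs of (vertex, layer) tuples, and that the networks are tiny; the function names are hypothetical and the exhaustive search is for illustration only.

```python
from itertools import permutations


def relabel(edges, psi, tau):
    """Apply vertex map psi and layer map tau to every edge endpoint."""
    return {
        frozenset(((psi[u], tau[a]), (psi[v], tau[b])))
        for (u, a), (v, b) in edges
    }


def vertex_layer_isomorphism(edges_1, edges_2, vertices, layers):
    """Return (psi, tau) turning network 1 into network 2, or None.

    Edges are undirected pairs ((u, alpha), (v, beta)) of distinct
    node-layer tuples. Restricting tau to the identity would give a pure
    vertex isomorphism; restricting psi would give a layer isomorphism.
    """
    goal = {frozenset(e) for e in edges_2}
    for vertex_perm in permutations(vertices):
        psi = dict(zip(vertices, vertex_perm))
        for layer_perm in permutations(layers):
            tau = dict(zip(layers, layer_perm))
            if relabel(edges_1, psi, tau) == goal:
                return psi, tau
    return None


m1 = [(("a", 0), ("b", 0)), (("b", 1), ("c", 1))]
m2 = [(("a", 1), ("b", 1)), (("b", 0), ("c", 0))]
m3 = [(("a", 0), ("b", 0)), (("b", 0), ("c", 0))]

found = vertex_layer_isomorphism(m1, m2, ["a", "b", "c"], [0, 1])
missing = vertex_layer_isomorphism(m1, m3, ["a", "b", "c"], [0, 1])
```

Here m1 and m2 differ only by swapping the two layers, while m3 places both edges in one layer and therefore cannot be matched.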
The different kinds of isomorphism, with respect to vertices, layers, and both, are illustrated in Fig. 3. In practice, the various types of isomorphism can be computed with low computational complexity. The induced subgraph enumeration process becomes more complicated as we move to multiple kinds of layers, called aspects, whose number is denoted d.
Complexity grows as the number of aspects increases. The auxiliary graph construction is defined for 2^d types of isomorphism. The small subgraphs that form the basic building blocks are called motifs. For motif analysis, all subgraphs of a given graph need to be examined and classified into isomorphism classes, and each element of every class is to be verified.

MULTIPLEX NETWORKS
We use computational methods to analyze isomorphism in multilayer networks. The leading and most common type of multilayer network is the multiplex network, which is well suited to empirical data [27], [28], [29], [30]. Multiplex networks illustrate the advantages of isomorphism. There are multiple types of interactions between vertices, either within a layer or between layers. Multiplex networks are represented using an array of graphs.
The main feature of a multiplex network is that each layer has the same set of vertices, V_α = V_β for all layers α, β. A multiplex network with this property is called vertex-aligned. To bridge the conceptual gap, a multiplex network is described by choosing a single aspect, M_α, and linking each vertex to its replicas in neighboring layers. The way in which each vertex connects to its counterparts in other layers is known as coupling. Isomorphism is facilitated either by categorical coupling or by the inter-layer connections that are established. The isomorphism classes are well defined in multiplex networks. In vertex isomorphism, the vertex labels are permuted while the types of edges are preserved; in layer isomorphism, the layers are relabeled in the same structure-preserving way. Matching by graph isomorphism is computationally most effective when the number of subgraphs to be compared is relatively small; motif analysis is the best example. The subgraph enumeration problem consists of matching and counting non-isomorphic subgraphs [31]. When this is extended to multiplex networks, the subgraph counts are tabulated for n vertices and aspect b.
Definition 2: A node label function is a function ψ : (V_1^l1, V_2^l2) → K, where K is an arbitrary set whose elements are the node labels. Nodes v_1^l1 and v_2^l2 are syntactically and semantically equivalent iff ψ(v_1^l1) = ψ(v_2^l2).
Definition 3: Two graphs M_1 and M_2 are isomorphic under the labeling function ψ() iff there exists a bijective function m_map : V_1^l1 → V_2^l2 that preserves node labels and maps adjacent nodes to adjacent nodes and non-adjacent nodes to non-adjacent nodes.
Definition 4: M_1 is a subgraph of M_2 under the node labeling function ψ() if there exists an injective function m_map : V_1^l1 → V_2^l2 that preserves node labels and maps every edge of M_1 to an edge of M_2.
Definition 5: M_1 is an induced subgraph of M_2 under the node labeling function ψ() if there exists an injective function m_map : V_1^l1 → V_2^l2 that preserves node labels and maps edges to edges and non-edges to non-edges.
Let the multilayer graph M_1 be a subgraph of M_2, so that M_1 can be found in M_2. An injective mapping function m_map : V_1^l1 → V_2^l2 is called a partial mapping or a full mapping depending on its domain D_map. If D_map ⊂ V_1^l1, it is a partial mapping, and if D_map = V_1^l1, it is a full mapping.
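The distinctions between subgraph and induced subgraph, and between partial and full mappings, can be checked mechanically. The sketch below assumes that the conditions amount to label preservation, edge preservation, and, for induced subgraphs, preservation of non-edges. The function name `classify_mapping` and the dictionary-based inputs are assumptions made for illustration.

```python
def classify_mapping(m_map, adj_1, adj_2, labels_1, labels_2):
    """Classify a node mapping from graph M1 into graph M2.

    m_map maps some (partial) or all (full) nodes of M1 to nodes of M2.
    adj_* map nodes to neighbour sets; labels_* play the role of psi.
    """
    domain = set(m_map)
    injective = len(set(m_map.values())) == len(m_map)
    labels_ok = all(labels_1[u] == labels_2[m_map[u]] for u in domain)
    # Every mapped edge of M1 must be an edge of M2.
    edges_ok = all(
        m_map[v] in adj_2[m_map[u]]
        for u in domain for v in adj_1[u] if v in domain
    )
    # Induced case: an edge between images requires an edge in M1.
    non_edges_ok = all(
        v in adj_1[u]
        for u in domain for v in domain
        if u != v and m_map[v] in adj_2[m_map[u]]
    )
    subgraph = injective and labels_ok and edges_ok
    return {
        "kind": "full" if domain == set(adj_1) else "partial",
        "subgraph": subgraph,
        "induced": subgraph and non_edges_ok,
    }


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
labels_path = {v: "x" for v in path}
labels_triangle = {v: "x" for v in triangle}

full = classify_mapping({"a": 0, "b": 1, "c": 2}, path, triangle, labels_path, labels_triangle)
partial = classify_mapping({"a": 0, "b": 1}, path, triangle, labels_path, labels_triangle)
```

The path maps into the triangle as a subgraph but not as an induced subgraph, because the images of a and c are adjacent although a and c are not.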

A. CANDIDATE PAIR
The pair of nodes considered for addition to the existing mapping function m_map in a given state is called a candidate pair. The set of candidate pairs P(m_map) contains the pairs formed from the unmapped neighbors of already-mapped nodes; if there is no such combination of nodes, the candidate pairs are formed from two unmapped nodes, one from each graph.

B. CHECK CONSISTENCY
When a candidate pair P(m_map) is added to an existing mapping function for the given subgraph isomorphism problem, a function ConS(P(m_map), m) checks whether adding the candidate pair to m leads to a consistent mapping. The procedure for analyzing and counting motifs in a multilayer graph differs from frequent subgraph mining. The procedure for subgraph enumeration is shown in Algorithm 1. Counting a subgraph proceeds in two steps: measuring the size and testing isomorphism. No part of the main graph can be skipped during subgraph matching, so considering the subgraphs one by one means searching all portions of the tree, which consumes a lot of time.
Algorithm 1
procedure MOTIF-COUNT(G, g_k)
    for all subgraph c of T.root do
        Add v to end of G_part
    end for
    if T.isLeaf() then
        ReportGraph()
        for all children c of T do
            match(c, G, k + 1, G_sub)
        end for
    end if
end procedure
    Remove m from G_sub
end procedure

D. G-TRIES
We use the advantages of the tree data structure in the process of mining induced subgraphs. All the subgraphs are loaded into the tree and follow a common topology. We recognize and label the predecessors among the tree nodes in order to capture the shared arrangement and pattern. For this purpose, we use the G-Trie (Graph reTRIEval) data structure.
The G-Trie is a form of multiway tree that can be used to store a collection of graphs, as shown in Fig. 5. Each tree node holds the information of a single graph vertex and its connections to the vertices held by its predecessors. A traversal from the root to a leaf represents a single graph. Nodes on the same ancestry path of a G-Trie contribute a common subgraph. Every node in a G-Trie stores the information for a newly added vertex connected to the ancestry path. The most common graph representation is the adjacency matrix, because of its simplicity: 1 denotes a connection between two vertices and 0 its absence. Each tree node can store the corresponding row of the matrix. As the number of shared predecessor structures increases, the size of the tree decreases. The compression rate depends on the number of nodes and vertices in the G-Trie. The G-Trie compression ratio is termed G-Trie_cr:
G-Trie_cr = 1 − (number of nodes in the tree) / (number of nodes of the stored graphs)
The objective of the first subtask is to enumerate all subgraphs of a given size k. It is computationally expensive because the subgraphs range from the smallest to the largest. The second subtask is the induced subgraph enumeration technique. In the last part, we explicitly identify the size of each subgraph class for a given k and a given degree.
All natural, biological, and engineered systems have interconnected subsystems. A multilayer graph represents such complex networks. When every node resides in every layer, the model is called a multiplex graph. In almost all complex systems, however, not all nodes are present in all layers; such a representation is called a multilayer graph in general. Nodes interconnect with each other within a layer and between layers. Such graphs have complex topology and structural properties. The microscopic structures that repeat in these systems help to study their structural and dynamical behavior.
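The storage scheme and the compression ratio described in this section can be sketched with a small prefix tree over adjacency-matrix rows. The sketch assumes undirected graphs whose vertices are already in a fixed order (a real G-Trie relies on a canonical labeling, which is omitted here), and the names `GTrieNode`, `insert`, and `compression_ratio` are chosen for this example.

```python
class GTrieNode:
    """One tree node: a graph vertex plus its links to ancestor vertices."""

    def __init__(self):
        self.children = {}     # key: tuple of 0/1 links to earlier vertices
        self.is_graph = False  # True if a stored graph ends at this node


def insert(root, adjacency):
    """Insert a graph given as a 0/1 adjacency matrix (list of rows)."""
    node = root
    for i, row in enumerate(adjacency):
        key = tuple(row[: i + 1])   # links of vertex i to vertices 0..i
        node = node.children.setdefault(key, GTrieNode())
    node.is_graph = True


def count_nodes(root):
    """Number of tree nodes below the root."""
    return sum(1 + count_nodes(child) for child in root.children.values())


def compression_ratio(root, graphs):
    """1 - (nodes in the tree) / (total vertices of the stored graphs)."""
    stored = sum(len(g) for g in graphs)
    return 1 - count_nodes(root) / stored


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

root = GTrieNode()
for graph in (triangle, path):
    insert(root, graph)
ratio = compression_ratio(root, [triangle, path])
```

The triangle and the path share their first two vertices, so the tree holds four nodes for six stored vertices.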
The self-repeating structure of a system is called a motif. The study of motifs helps to understand complex networks.
The most stable structure is the triad, the relation between three nodes. The isomorphic structure of the triad is depicted in Fig. 8. In a layered structure, there are two possibilities for its layout. In the first case, all three nodes are in the same layer, and the properties of a simple graph suffice for the study. In the second case, one of the nodes resides in one layer and the other two reside in another layer. The connection or relationship established between layers is shown in Fig. 9.
The isomorphic view of the motif in multilayer networks is shown in Fig. 9. Layer 1 has three nodes. The motif has three nodes, with either two nodes in layer two and the third node in layer one, or one node in layer two and the other two in layer one. The motif with three nodes repeats throughout the entire structure. The structure and dynamics of complex networks can be studied through motif analysis, especially with the help of the G-Trie.
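The two layouts of a three-node motif, entirely inside one layer or split across layers, can be counted with a short exhaustive sketch. It assumes an undirected network whose nodes are (vertex, layer) tuples and examines every triple, so it is a small illustration rather than the G-Trie based procedure of this article; the function name `count_triads` and the category names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_triads(adj):
    """Count connected three-node structures by shape and layer placement.

    adj maps each (vertex, layer) node to the set of adjacent nodes.
    """
    counts = Counter()
    for trio in combinations(sorted(adj), 3):
        links = sum(1 for x, y in combinations(trio, 2) if y in adj[x])
        if links < 2:
            continue   # fewer than two links cannot connect three nodes
        shape = "closed" if links == 3 else "open"
        n_layers = len({layer for _, layer in trio})
        placement = "intra-layer" if n_layers == 1 else "cross-layer"
        counts[(shape, placement)] += 1
    return counts


def build(edges):
    """Helper: adjacency dictionary from an undirected edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


network = build([
    (("a", 1), ("b", 1)), (("b", 1), ("c", 1)), (("a", 1), ("c", 1)),  # layer 1
    (("d", 2), ("e", 2)),                                              # layer 2
    (("a", 1), ("d", 2)), (("a", 1), ("e", 2)),                        # inter-layer
])
triads = count_triads(network)
```

In the example network, the layer-1 triangle is the only intra-layer triad, and the node a together with d and e forms the only closed cross-layer triad.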
The detailed study of motif occurrences and their frequencies is a direct application of isomorphism, which is also addressed here. Larger networks and more complex motifs can be analyzed, since G-Tries reduce computational and space complexity. Studying the microscopic structure of a complex network helps to model applications across domains such as brain networks, DNA networks, food webs, and transportation networks. Exploring the structure of networks helps in discovering the dynamic nature of real-world networks. The microscopic structures help in discovering properties such as symmetry, transitivity or clustering, and reciprocity among the nodes. This study will further help in the spatial perception of real-world networks.
For a multilayer graph G_M, Algorithm 2 finds a ■
Proof 2: The above theorem is proved by induction on the size of G_sub. In the minimal case |G_sub| = 2, no triad can exist, so T ∉ G_sub and hence G_part = ϕ. Let |G_sub| = 3. For i ≠ j, T_i ≠ T_j ⇒ T_i ∈ G_part and T_i ∉ G^c_part. Now assume |G_sub| = k, k > 3; then there exists at least one triad T_i ∈ G_part.
The procedure match takes a subgraph as input and looks for the triad structure. It picks a matching structure and records it in the set G_part in Algorithm 1. The algorithm is guaranteed to find a triad if one exists. A triad structure is repeatedly constructed among adjacent pairs of vertices within and across the layers. Algorithm 1 also ensures that no triad is considered twice. As the size of the subgraph G_sub increases, the running time grows exponentially. Algorithm 2 takes all the matching subgraphs from G_part and enumerates the matching motifs.

V. CONCLUSION
Research on isomorphisms provides an excellent foundation for many analyses of multilayer networks. A notable application of isomorphism in multilayer networks is the structural analysis that defines the equivalence between multilayer networks. In this article, we studied the detection, analysis, and counting of motifs in multilayer networks. We applied the induced subgraph enumeration method to multilayer networks explicitly. We defined motifs and accounted for them with respect to the isomorphism-class structure used for enumerating and counting subgraphs. Experiments on complex real-world networks indicate that our methodology performs more efficiently than other approaches. By concentrating on the algorithmic view of motif detection, the proposed work goes beyond simply enumerating and counting and addresses the analysis of large multilayer complex networks. Matching and counting network motifs is NP-hard. The pattern to be compared is matched against all the induced subgraphs that are enumerated. The complexity of motif analysis grows exponentially as the number of nodes increases.

DR. SWAMINATHAN J serves as an Associate Professor at the Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri Campus, located at Kollam, Kerala. He has over 22 years of experience in industry, research, and academia. He is passionate about teaching. He leads the Code@amrita club. His research interests include program analytics, visualization, and verification. He has publications in several reputed international journals and conferences.