Exploring Hubs and Overlapping Nodes Interactions in Modular Complex Networks

In real-world networks, nodes are usually organized into modules or communities of densely connected nodes. When nodes can belong to multiple communities, we say that the communities overlap, and the nodes shared by more than one community are called overlapping nodes. This occurs especially in social networks, where an individual belongs to various social groups and organizations such as working circles, family, friendship or virtual groups on the Internet. Complex networks are also known to have a heavy-tailed degree distribution. Indeed, they are organized with a vast majority of nodes with few interactions and a small set of highly connected nodes called hubs. In this paper, our goal is to study the relationship between the overlapping nodes and the hubs. We suspect that the hubs are in the vicinity of the overlapping nodes. If this assumption is confirmed, it gives a new perspective on how the communities are organized and on the crucial importance of the overlapping nodes. To investigate the ubiquity of this property, we perform a series of experiments on various real-world networks with overlapping community structure. Results show that the hubs always represent a large proportion of the one-step neighbors of overlapping nodes. These results may have implications in various contexts. For example, searching for the hubs in large networks can be done starting from the overlapping nodes. Furthermore, this study may also provide new directions for designing community detection algorithms.


I. INTRODUCTION
In the study of complex networks, such as social, biological and information networks, many different topological features have been observed to occur commonly. One of the main common characteristics is the degree distribution of the nodes. It is well described by a non-homogeneous distribution with a heavy tail. As a result, the majority of nodes have few connections, while a small number of remaining nodes have a large number of connections. The latter, commonly referred to as hubs [1], tend to be extremely influential. Indeed, in transportation networks, for example, the underlying topology based on a small number of hubs allows efficient travel services. Hubs also play the role of super-spreaders in the context of epidemic spreading.
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Zakarya.
Community structure is another frequently observed characteristic. Indeed, the majority of real-world networks are found to be naturally partitioned into multiple modules or communities. Many studies have been conducted to model and to analyse the community structure of a network [2]-[14]. To date, there is no consensus on the definition of the community structure. The most influential informal definition considers the community structure as a partition of the network into groups of vertices which are densely interconnected while being loosely connected with vertices of other communities. In other words, a community is a subset of highly connected nodes sparsely connected with the rest of the network. This assumption has been challenged. Indeed, other works [15]-[22] have shown that real-world networks can display overlapping and nested communities. While non-overlapping communities are made of modules where nodes belong to a single community, in overlapping communities, nodes can belong to more than one community. This is typical of social networks, where individuals can belong to various groups at the same time such as family, school, hobby and so on. Our goal here is to go deeper into the understanding of the interplay between the microscopic and the mesoscopic organization of real-world complex networks. Indeed, the heavy-tailed degree distribution of the networks is characterized by the emergence of a small set of highly connected nodes. Furthermore, the presence of overlapping nodes is one of the main organizing principles of modular networks. Investigating the relations between those two types of nodes is essential to develop effective analysis and modeling tools for real-world networks. Analyzing the topological properties of the community structure is of prime interest. Indeed, it allows a better understanding of one of the main organizing principles encountered in real-world networks.
It may also provide clues about the emergence of the community structure in real-world networks. Despite the interest of the issue, few works dive deeply into the subject, in particular in the case of overlapping community structure. The main contribution of this paper is to characterize the relationship between the overlapping nodes and the hubs in networks with overlapping community structure. To our knowledge, this is the first attempt to conduct a systematic study in order to check whether hubs lie in the vicinity of overlapping nodes, and whether this is a ubiquitous property of real-world complex networks. Based on a series of experiments (using different evaluation measures) performed on a variety of real-world networks originating from various fields, we show that the overlapping nodes are neighbors of the hubs. Exploring this property is a challenging issue from the perspective of the network structure. Indeed, our analysis sheds more light on how the communities are organized.
The rest of the paper is organized as follows. In Section 2, related work about the topological properties of the overlapping community structure of real-world networks is presented. Section 3 introduces the data and the community detection algorithms used to perform the experiments. Section 4 is devoted to the presentation of the various measures and methods used to compare the set of overlapping nodes with the set of hubs. In Section 5, we report and discuss our empirical findings. Finally, conclusions are given in Section 6.

II. RELATED WORK
In a seminal paper, Palla et al. [15] show that real-world networks can display significant overlap between communities. They introduce four relevant quantities in order to characterize the overlapping community structure in large networks. The membership number of a node quantifies the number of communities to which it belongs. The overlap size between two communities is the number of nodes they share. The community degree of a community is the number of other communities it overlaps with. Finally, the community size is the number of nodes of a given community. In their work, they investigate the distributions of these four quantities. Their results show that the community size exhibits a power-law distribution. They also show that a power-law distribution is a good fit for the overlap size as well as for the membership number. However, the community degree distribution exhibits a different behavior. We can distinguish two parts: an exponential decay followed by a power-law tail.
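The four quantities can be made concrete with a short sketch. It assumes, purely for illustration, that a cover is stored as a dictionary mapping a community identifier to its set of nodes; the function name `cover_statistics` and the toy cover are hypothetical and do not come from [15].

```python
from collections import Counter
from itertools import combinations


def cover_statistics(communities):
    """Compute the four quantities for a cover given as {community id: set of nodes}."""
    # Membership number: how many communities each node belongs to.
    membership = Counter(node for members in communities.values() for node in members)
    # Community size: number of nodes in each community.
    community_size = {c: len(members) for c, members in communities.items()}
    overlap_size = {}                       # (community a, community b) -> shared nodes
    community_degree = {c: 0 for c in communities}
    for a, b in combinations(sorted(communities), 2):
        shared = len(communities[a] & communities[b])
        if shared > 0:
            overlap_size[(a, b)] = shared
            community_degree[a] += 1        # a overlaps with one more community
            community_degree[b] += 1
    return membership, overlap_size, community_degree, community_size


# Toy cover (illustrative data only): A and B share node 4, B and C share node 6.
cover = {"A": {1, 2, 3, 4}, "B": {4, 5, 6}, "C": {6, 7}}
membership, overlap_size, community_degree, community_size = cover_statistics(cover)
print(membership[4], overlap_size, community_degree, community_size)
```

In this toy cover, nodes 4 and 6 are the overlapping nodes, since their membership number is 2.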
Another important study has been conducted by Yang et al. [23] in order to characterize the overlapping areas of the community structure. The authors show that nodes belonging to the overlapping zones between communities are more densely connected than those belonging to the non-overlapping areas of the network. This behavior has been observed in a series of experiments conducted on six large networks of various origins. These results contradict the conventional assumption that the overlapping nodes are more sparsely connected than the non-overlapping parts of the communities. In their work, they study the edge probability of a pair of overlapping nodes as a function of the number of their shared communities. Results show that the edge probability between two overlapping nodes increases with the number of communities they have in common. In their experiments, they also investigate whether the most connected node in each community belongs to the overlapping zone. More precisely, they study the probability that a hub belongs to the community overlap as a function of the overlap size. Results show that community hubs are not central in a community. They actually tend to reside in the overlapping zone. In addition, the probability that a hub belongs to the overlap area increases linearly with the overlap size. Another work reported in [8] studies the relationship between the transitivity and the community structure strength measured by the network modularity. Extensive experiments show that transitivity increases with the community structure strength. Furthermore, while a weak community structure is associated with a low transitivity value, the converse is not true. A network with a close-to-zero transitivity can still have a well-defined community structure.
More recently, Kudelka et al. [24] presented a new perspective on the problem of group detection, bridging the gap between structural and ground-truth communities. Using the non-symmetric structural similarity between pairs of nodes, they introduce an algorithm to detect groups referred to as zones. Their approach highlights the prominent nodes responsible for large zone overlaps. Results of their investigations on real-world networks clearly show the existence of large and dense overlaps of detected groups.
Besides these works on the topology of the community structure, there have been some other attempts to characterize the overlapping nodes, mainly in the context of diffusion dynamics [25]-[28]. Indeed, overlapping nodes play a highly relevant role in the network due to their ability to reach multiple communities. We recently reported their importance in the epidemic spreading process [29], and how they can be used to design effective immunization strategies. Indeed, when an epidemic outbreak occurs, it is not possible to immunize every individual through vaccination due to the limited amount of resources or time. Designing effective immunization strategies is therefore crucial to control the epidemic spreading and to reduce the cost of vaccine resources. Classically, deterministic targeted immunization strategies select the most influential spreaders to immunize according to a given centrality measure and an immunization budget. Nodes are ranked in decreasing order of their centrality value and immunized in that order until the vaccine budget is exhausted. We show that overlapping nodes deserve special treatment in deterministic immunization strategies. Indeed, these nodes have a higher level of both local influence and global influence as compared to nodes belonging to a single community.
Works in the same vein have been reported in [27], but in the context of random immunization strategies. In order to mitigate epidemic outbreaks in networks with overlapping community structure, Kumar et al. proposed the so-called OverlapNeighborhood strategy. It immunizes randomly selected neighbors of the overlapping nodes. This local strategy is agnostic about the global structure of the network. It only requires locating the overlapping nodes. It is, therefore, more appropriate for large-scale networks than a deterministic strategy that requires ranking all the nodes according to a centrality measure. The main idea of the OverlapNeighborhood strategy is that there is a high probability that overlapping nodes are neighbors of high-degree nodes. Thus, once the overlapping nodes are identified, one can target the hubs in their neighborhood for immunization. Experiments performed with four empirical networks showed that it is almost as effective as the deterministic degree strategy, where nodes are ranked in decreasing order of their degree. Therefore, OverlapNeighborhood can select highly connected nodes for immunization. It sometimes performs as well as the betweenness strategy while using less information about the overall network structure. Recall that random immunization strategies are usually less effective than deterministic ones. These results corroborate the importance of the overlapping nodes. The fact that the random OverlapNeighborhood strategy is almost as effective as the deterministic degree centrality strategy suggests that largely the same set of nodes is targeted in both cases. Consequently, there is a high probability that the hubs targeted by the deterministic strategy are immediate neighbors of the overlapping nodes.
To summarize, there is a great deal of work on the topological properties of complex networks. It is commonly accepted that they are organized around a small set of highly connected nodes, and that the overall structure is modular with nodes belonging to multiple communities. However, how these two types of nodes interact is still an open question. In this paper, we investigate the assumption that the overlapping nodes are the neighbors of the hubs. This idea is inspired by the effectiveness of the OverlapNeighborhood immunization strategy as compared to the deterministic degree strategy.
Note that, to our knowledge, no previous work that systematically explores the relationship between overlapping nodes and hubs has been reported.

III. DATA AND METHODS
In this section, we briefly present the data under investigation and the community detection algorithms used in the study. Indeed, as there is no information about the true community structure of these networks, different community detection algorithms are used to uncover it. This allows us to check the sensitivity of the results to the variations of the community structure linked to the algorithms.

A. DATASET DESCRIPTION
The selected networks come from a variety of domains (social, co-appearance, collaboration and e-commerce networks). Their sizes range from tens to hundreds of thousands of nodes, with up to millions of edges, to cover a wide range of situations. A short description of these networks follows; for more information, refer to [34].

1) ZACHARY'S KARATE CLUB
It is a social network of 34 members of a karate club at a US university in 1970. The nodes represent the members of the club while the edges represent their friendship.

2) DOLPHINS
It is a social network of frequent associations between 62 dolphins in a community living in Doubtful Sound, New Zealand.

3) TERRORIST
It is a social network of the known social associations of the hijackers responsible for the September 11th, 2001 terrorist attacks.

4) ECOLOGY
This network represents the interactions among species and organisms within an ecosystem.

5) LES MISERABLES
It is a co-appearance network of characters in Victor Hugo's novel 'Les Miserables'. The nodes in this network represent the characters of this book and an edge between two nodes exists if they appear in the same chapter of the novel.

6) GAME OF THRONES
It is a co-appearance network of the characters of the Game of Thrones series. An edge between two nodes exists if they appear within 15 words of each other in the text.

7) ADJNOUN
This network contains the common adjective and noun adjacencies for the novel "David Copperfield" by Charles Dickens. An edge connects any pair of words that occur in adjacent positions in the text of the book.
79652 VOLUME 8, 2020

8) AMERICAN COLLEGE FOOTBALL
It is a network of American football games between Division IA colleges during the regular season of Fall 2000.

9) EUROROAD
This network represents the international road network located mostly in Europe. It is an undirected network where nodes stand for cities and an edge between two nodes denotes that they are connected by a road.

10) CRIME
This network contains persons who appeared in at least one crime case as either a suspect, a victim, a witness or both a suspect and victim at the same time.

11) YEAST PROTEIN INTERACTION
It is a network of the protein interactions contained in yeast. Nodes represent proteins while edges stand for their metabolic interaction.

12) EGO-FACEBOOK
The ego-Facebook network is collected from survey participants using the Facebook app.

13) ca-GRQC
It is a collaboration network which has been collected from the e-print arXiv. It covers scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category.

14) GNUTELLA PEER-TO-PEER
It is a sequence of snapshots of the Gnutella peer-to-peer file sharing network from August 2002. The nodes represent the hosts in the Gnutella network while the edges are connections between the Gnutella hosts.

15) WIKIPEDIA VOTE
This network contains all the Wikipedia voting data from the inception of Wikipedia until January 2008. The nodes represent Wikipedia users and an edge from node i to node j means that user i voted on user j.

16) ca-HepTh
It is a collaboration network extracted from arXiv. It covers scientific collaborations between authors of the High Energy Physics - Theory category. The network contains an edge from i to j if an author i co-authored a paper with author j.

17) ca-AstroPh
This network represents scientific collaborations between authors of papers submitted to the Astro Physics category. It contains an undirected edge between i and j if an author i co-authored a paper with author j.

18) FACEBOOK LARGE PAGE-PAGE
It is a network of verified Facebook sites collected in November 2017 and restricted to pages from four categories (television shows, politicians, companies and organizations).
The nodes stand for official Facebook pages while the edges represent mutual likes between sites.

19) ca-CondMat
It is a collaboration network extracted from arXiv. It covers scientific collaborations between authors of the Condensed Matter category. The network contains an edge from i to j if an author i co-authored a paper with author j.

20) ENRON EMAIL
It is a communication network where nodes represent email addresses. A given node i is connected to j if the address i sent at least one email to address j.

21) AMAZON CO-PURCHASING NETWORK
This network is collected from the Amazon website. If a product i is frequently co-purchased with product j, the graph contains an undirected edge between i and j.

22) DBLP
The DBLP computer science bibliography contains the list of research papers in computer science. Nodes represent authors and edges connect nodes that have co-authored a paper.
The basic topological properties of these networks are given in Table 1. There are eight small networks with sizes ranging from 34 to 115 nodes. Zachary's karate club network is the smallest one. Seven big empirical networks are also used, with sizes ranging from 9877 to 334863 nodes. Amazon and DBLP are the biggest networks. Six networks have a medium size. All the networks have a relatively high clustering coefficient value ranging from 0.258 to 0.633. They are disassortative except for American college football and the collaboration networks. The networks of medium and big sizes have a very small density, while it is relatively high for small networks. The basic properties of these empirical networks are very typical of what is generally observed in many real-world situations.

B. COMMUNITY DETECTION ALGORITHMS
Community structure is a common property of many real-world networks [29], [36]-[44]. Research on community detection is very active and a plethora of new algorithms based on various definitions are regularly published. In this paper, three influential overlapping community detection algorithms belonging to two different classes are used. Lancichinetti Fortunato Method and EAGLE (LFME) [45] is a local expansion and optimization algorithm. The Speaker-listener Label Propagation Algorithm (SLPA) and the Democratic Estimate of the Modular Organization of a Network (DEMON) [46] belong to the label propagation category. More details are given in recent surveys about community detection methods [6], [7].

1) LANCICHINETTI FORTUNATO METHOD AND EAGLE (LFME)
This algorithm is a combination of the Lancichinetti Fortunato Method (LFM) [47] and the agglomerativE hierarchicAl clusterinG based on maximaL cliquE (EAGLE) [48] algorithm. It is commonly known that communities are usually overlapping and hierarchical. Yet, the majority of algorithms investigate these two properties separately. LFME uses the fitness function optimization of the LFM algorithm to identify the overlapping communities accurately. In addition, the hierarchical structure is defined using the EAGLE algorithm. The LFME algorithm begins by picking a random seed node as the original member of a community. Then, nodes are added to this community until a fitness function is locally maximal. After uncovering one community, the same process is applied to another seed node until all the vertices of the network are assigned to at least one community. The second stage of this algorithm consists in selecting a specific similarity function. After that, the pair of communities with the maximum similarity is merged into one community. Then, the similarity function is computed between the new community and the other communities. This process stops when only one community remains. Thus, the whole process forms a dendrogram, which means that the hierarchical structure is uncovered. Through this dendrogram, the overlapping modularity is computed. Finally, the dendrogram is cut where the overlapping modularity reaches its maximum value.

TABLE 1. Common topological properties of the real-world networks under study. N is the total number of nodes. E is the number of edges. ⟨k⟩ is the average degree. k_max is the maximum degree. C is the average clustering coefficient. α is the degree assortativity of the network. µ represents the density.

2) SPEAKER-LISTENER LABEL PROPAGATION ALGORITHM (SLPA)
SLPA [35] is an extension of the Label Propagation Algorithm (LPA). While in LPA each node holds only a single label that is iteratively updated by adopting the majority label in the neighborhood, in SLPA each node possesses a memory containing multiple labels. Starting from a node selected as a listener, its neighbors send out a label following certain speaking rules. The listener selects one label according to a listening rule and adds it to its memory. Once all the nodes have been visited, the communities are extracted from the nodes' memories, which are converted into probability distributions of labels that define the membership degree to communities.
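The speaker-listener mechanism can be illustrated with a minimal sketch. It is not the reference implementation of [35]: it assumes the commonly described rules (a speaker sends a label drawn at random from its memory, the listener keeps the most frequent label received, and labels whose frequency in a memory falls below a threshold are discarded), and the parameter names `iterations`, `threshold` and `seed` are illustrative choices. Removal of nested communities is omitted.

```python
import random
from collections import Counter, defaultdict


def slpa(adj, iterations=20, threshold=0.1, seed=0):
    """Simplified SLPA on a graph given as {node: set of neighbors}."""
    rng = random.Random(seed)
    memory = {v: [v] for v in adj}          # each node starts with its own label
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue                    # isolated node: nothing to listen to
            # Speaking rule (assumed): each neighbor sends one label drawn from
            # its memory, i.e. with probability proportional to its frequency.
            received = Counter(rng.choice(memory[s]) for s in adj[listener])
            # Listening rule (assumed): keep the most frequent received label,
            # breaking ties at random.
            best = max(received.values())
            ties = sorted(label for label, c in received.items() if c == best)
            memory[listener].append(rng.choice(ties))
    # Post-processing: a node joins every community whose label is frequent
    # enough in its memory, so a node may belong to several communities.
    communities = defaultdict(set)
    for v, labels in memory.items():
        for label, count in Counter(labels).items():
            if count / len(labels) >= threshold:
                communities[label].add(v)
    return dict(communities)


# Toy graph (illustrative data only): two disconnected triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = slpa(adj)
print(communities)
```

Because labels only travel along edges, no detected community can mix nodes from the two triangles, whatever the random seed.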

3) DEMOCRATIC ESTIMATE OF THE MODULAR ORGANIZATION OF A NETWORK (DEMON)
DEMON [46] assigns a node to its most frequent community by applying a label propagation algorithm to the sub-graph formed by its neighbors. In other words, for each node, its neighbors vote for its community membership. All the votes are then combined to construct the overlapping community structure.

IV. EVALUATION MEASURES
We suspect that the hubs are neighbors of the overlapping nodes. In order to investigate this hypothesis, we need to form and compare these two sets. To do so, the overlapping community structure of each real-world network is first uncovered using an overlapping community detection algorithm. The overlapping nodes and the set of their neighbors are then extracted. Let n denote the size of the set of neighbors of the overlapping nodes. The set of hubs is formed using the n highest degree nodes of the network (top n nodes). Strictly speaking, the set of hubs need not have the same size as the set of neighbors of the overlapping nodes. Nevertheless, this choice is conservative. It is motivated by the fact that some similarity measures need to be computed on sets of the same size. Some nodes in this set may not have a degree high enough to be called hubs; however, rather than speaking of "the set of the top degree nodes", we choose to call it the set of hubs for short. As our purpose is to investigate the similarities between these two sets, we compute classical measures such as the proportion of common nodes of the two sets, the Jaccard index, the Rank-biased overlap and the correlation measures between the two sets (Pearson and Spearman). We also compare their degree distributions. Secondly, we form the sub-network restricted to the overlapping nodes, the hubs and their links (the so-called Overlap-Hub network), in order to compare some of its topological properties (diameter, mean shortest path) with those of the original network. Indeed, we can expect the distances to be smaller in the Overlap-Hub network than in the original network if the proportion of hubs that are immediate neighbors of the overlapping nodes is high enough.

Algorithm 1 Extraction of the Set of Hubs
Input: Graph G(V, E)
Output: Set of neighbors of the overlapping nodes X and set of hubs Y
Extract the set of overlapping nodes S_o using a given overlapping community detection algorithm
Initialize the set of neighbors of the overlapping nodes X ← ∅
for each v ∈ S_o do
    Add all the neighbors of the node v to the set X (each neighbor is kept once)
end
Set n ← |X|, the size of the set X
Sort the set of neighbors of the overlapping nodes X in decreasing order according to their degree
Sort all the nodes of the network in decreasing order according to their degree
Add the top n nodes of the network to the set of hubs Y
Return X, Y

A. COMPARISON BETWEEN THE SET OF NEIGHBORS OF THE OVERLAPPING NODES AND THE SET OF HUBS
Various measures are computed on real-world data to compare the set of neighbors of the overlapping nodes and the set of hubs. They are defined as follows:

1) PROPORTION OF COMMON NODES IN THE SET OF NEIGHBORS OF THE OVERLAPPING NODES AND THE SET OF HUBS
This measure assesses how many hubs are directly connected to the overlapping nodes. The set of hubs is computed according to Algorithm 1. At first, all the neighbors of the overlapping nodes are added to the set X. Next, all the repeated elements are removed from the set X. The size of the set of neighbors of the overlapping nodes is defined as n = |X|. After that, the set of hubs is computed. All the nodes of the network are sorted in decreasing order according to their degree. Finally, the set of hubs Y is defined as the top n nodes of the network.
The proportion of common nodes in the set of neighbors of the overlapping nodes and the set of hubs (proportion of hubs for short) is defined as the ratio of the size of the intersection of the two sets to the size of the set of neighbors of the overlapping nodes. For a given network G, it is given by:

$$\mathrm{Proportion}(G) = \frac{|X \cap Y|}{n}$$

where X and Y are the set of neighbors of the overlapping nodes and the set of hubs respectively, and n is the size of the set of neighbors of the overlapping nodes. If the hubs are randomly distributed, this proportion should be very small. We consider that if this value is greater than 50%, a high proportion of the hubs are neighbors of the overlapping nodes.
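As an illustration, the construction of the two sets and the proportion of hubs can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the graph is stored as a dictionary of neighbor sets, the cover as a dictionary of node sets, and ties between nodes of equal degree are broken by node identifier, a choice the text does not specify. All function names are hypothetical.

```python
def overlapping_nodes(communities):
    """Return the nodes that belong to more than one community."""
    seen, overlap = set(), set()
    for members in communities.values():
        overlap |= seen & members       # nodes already met in an earlier community
        seen |= members
    return overlap


def neighbors_and_hubs(adj, overlap):
    """Return (X, Y): neighbors of the overlapping nodes and the top-n degree nodes."""
    X = set()
    for v in overlap:
        X |= adj[v]                     # a set keeps each neighbor only once
    n = len(X)

    def by_degree(v):
        # Decreasing degree; ties broken by node id (assumption, not from the text).
        return (-len(adj[v]), v)

    return sorted(X, key=by_degree), sorted(adj, key=by_degree)[:n]


def proportion_of_hubs(X, Y):
    """Size of the intersection of X and Y divided by n = |X|."""
    return len(set(X) & set(Y)) / len(X) if X else 0.0


# Toy graph (illustrative data only): two star-like communities sharing node 4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6), (5, 7), (5, 8)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
communities = {"A": {0, 1, 2, 3, 4}, "B": {4, 5, 6, 7, 8}}

overlap = overlapping_nodes(communities)
X, Y = neighbors_and_hubs(adj, overlap)
print(overlap, X, Y, proportion_of_hubs(X, Y))
```

In this toy example, the only overlapping node is node 4, its two neighbors are the two star centers, and these are also the two highest degree nodes, so the proportion of hubs equals 1.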

2) RANK-BIASED OVERLAP
This measure, presented in [30], quantifies how well two rankings agree with each other. It computes the fraction of overlapping elements of the two rankings while incrementally increasing their depth. Furthermore, the overlap at each depth has a fixed weight. It is set by a parameter p that gives more weight to the top elements of the rankings when it has a low value. Consequently, the top of a ranking gets a higher weight than its tail. Let X and Y be two infinite rankings. The set of elements from position 1 to position d in X is denoted $X_{:d}$. The proportion of the overlap of X and Y at depth d can be defined as:

$$A_d = \frac{|X_{:d} \cap Y_{:d}|}{d}$$

where $|X_{:d} \cap Y_{:d}|$ is the size of the intersection between X and Y at depth d. The Rank-biased overlap between X and Y is defined as follows:

$$\mathrm{RBO}(X, Y, p) = \sum_{d=1}^{\infty} w_d \, A_d$$

where $w_d$ is the weight at position d, which is equal to:

$$w_d = (1 - p)\, p^{d-1}$$

The Rank-biased overlap belongs to the interval [0, 1]. The value 0 means that X and Y are disjoint, while 1 means that they are identical. The parameter p determines the weights of the elements. The smaller p, the more top-weighted the metric. When p = 0, only the top-ranked item is considered, and the RBO score is either zero or one. When p approaches 1, the evaluation becomes deep in both rankings.
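For finite rankings, the infinite sum has to be truncated. The sketch below assumes the usual geometric weights w_d = (1 - p) p^(d-1) and simply stops at the length of the shorter ranking, so the score is a lower bound: two identical rankings of length k obtain 1 - p^k rather than 1. Whether the study uses this truncation or an extrapolated variant is not stated in the text; the function name is hypothetical.

```python
def rank_biased_overlap(X, Y, p=0.9):
    """Truncated RBO of two ranked lists without repeated items."""
    depth = min(len(X), len(Y))
    seen_x, seen_y = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_x.add(X[d - 1])
        seen_y.add(Y[d - 1])
        agreement = len(seen_x & seen_y) / d        # overlap proportion at depth d
        score += (1 - p) * p ** (d - 1) * agreement # weight of depth d times agreement
    return score


same = rank_biased_overlap([1, 2, 3], [1, 2, 3], p=0.5)
disjoint = rank_biased_overlap([1, 2], [3, 4], p=0.5)
top_only = rank_biased_overlap(["a", "b"], ["a", "c"], p=0)
print(same, disjoint, top_only)
```

With p = 0.5, the three weights are 0.5, 0.25 and 0.125, which explains why identical rankings of length 3 reach 0.875. With p = 0, only the first position carries weight, in line with the remark above.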

3) JACCARD INDEX
The Jaccard index is a statistic used to measure the similarity between two finite sample sets. It is formally defined as the ratio of the size of the intersection of the sets to the size of their union:

$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$

where X and Y are two sets. The Jaccard index can take values ranging between 0 and 1. It is equal to 0 when there is no overlap and 1 when there is a complete overlap between the sets.
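A direct sketch of this definition is given below. Returning 1 for two empty sets is a convention chosen here for illustration. Note that when both sets have the same size n, as is the case for the neighbors of the overlapping nodes and the hubs, the Jaccard index J and the proportion of common nodes P are linked by J = P / (2 - P), because the union then contains 2n minus the number of common nodes.

```python
def jaccard_index(X, Y):
    """Size of the intersection divided by the size of the union."""
    X, Y = set(X), set(Y)
    union = X | Y
    # Two empty sets are treated as identical (convention assumed here).
    return len(X & Y) / len(union) if union else 1.0


j = jaccard_index({1, 2, 3}, {2, 3, 4})
print(j)  # 2 common nodes out of 4 distinct nodes
```

In this example, P = 2/3 and P / (2 - P) = 0.5, which matches the value of J.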

4) CORRELATION
The correlation is used to measure the association between the set of neighbors of the overlapping nodes and the set of hubs. Two different types of correlation coefficients are employed. The Pearson correlation coefficient is a common measure of association between two vectors. Let X and Y be the vectors of degrees of the nodes belonging respectively to the ordered set of neighbors of the overlapping nodes and the ordered set of hubs. The Pearson correlation coefficient ρ between the two vectors X and Y can be obtained by:

$$\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Spearman's correlation coefficient is a rank-based version of the Pearson correlation coefficient. We compute the Pearson correlation between the ranked set of neighbors of the overlapping nodes and the ranked set of hubs of the same size. The ranks are computed according to the degree of the nodes. The Spearman correlation coefficient, denoted ρ_s, between the two vectors X and Y can be written as follows:

$$\rho_s = \frac{\sum_{i=1}^{n} \left(\mathrm{rank}(x_i) - \overline{\mathrm{rank}(x)}\right)\left(\mathrm{rank}(y_i) - \overline{\mathrm{rank}(y)}\right)}{\sqrt{\sum_{i=1}^{n} \left(\mathrm{rank}(x_i) - \overline{\mathrm{rank}(x)}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(\mathrm{rank}(y_i) - \overline{\mathrm{rank}(y)}\right)^2}}$$

where rank(x_i) and rank(y_i) represent the ranks of the elements x_i and y_i respectively, and the overlined terms are the mean ranks. Both correlation coefficients range from −1 to +1. When the values of the two vectors tend to increase or decrease simultaneously (positive monotonic association), ρ > 0. When the values of one vector tend to increase while the values of the other decrease (negative monotonic association), ρ < 0. The absence of a monotonic association between the two vectors results in ρ equal to 0. In addition, the correlation is moderate if the coefficient ranges between 0.5 and 0.7. The correlation is significant if the coefficient ranges between 0.7 and 1.0.
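Both coefficients can be sketched in a few lines, applying the standard Pearson formula to the raw degree vectors and then to their ranks. Since many nodes share the same degree, the sketch assigns tied values their average rank; this tie-handling rule is an assumption, as the text does not say how ties are treated. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(pearson([1, 2, 3], [2, 4, 6]), spearman([1, 2, 3], [1, 4, 9]))
```

The second call illustrates the difference between the two measures: the relation between the vectors is monotonic but not linear, and the Spearman coefficient still equals 1.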

5) DEGREE DISTRIBUTION
Real-world networks have some common topological features that distinguish them from random graphs. The most popular property is the heavy-tailed degree distribution [31]. The degree of a node represents the number of connections the node has to other nodes in the network (number of neighbors). Thus, the degree distribution P(k) of a network is estimated as the proportion of nodes in the network with degree k. It is a relevant characteristic of networks which indicates the overall pattern of connections. For a large number of real-world networks, it often follows a heavy-tailed distribution such as a power-law [32]. The power-law distribution can be denoted as $P(k) \propto k^{-\gamma}$, where γ is a positive exponent. The exponent value usually ranges between 2 and 3 according to several experimental studies [32], [33]. The power-law exponent γ can be employed to characterize the networks. In order to estimate the degree distribution of a network, a well-known approach is to fit a power-law distribution and to find its power-law exponent.
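The text does not say which fitting procedure is used. As one possible illustration, the sketch below estimates P(k) empirically and then applies the widely used approximate maximum-likelihood estimator for discrete power laws, γ ≈ 1 + m / Σ ln(k_i / (k_min − 1/2)), computed over the m degrees k_i that are at least k_min. The choice of this estimator and the parameter `k_min` are assumptions of this sketch.

```python
from collections import Counter
from math import log


def degree_distribution(adj):
    """Empirical P(k): fraction of nodes having degree k."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def power_law_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of gamma for degrees >= k_min."""
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degree is greater than or equal to k_min")
    return 1 + len(tail) / sum(log(k / (k_min - 0.5)) for k in tail)


# Toy star graph (illustrative data only): one center and three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
distribution = degree_distribution(star)
gamma = power_law_exponent([len(neighbors) for neighbors in star.values()])
print(distribution, gamma)
```

Such an estimate is only meaningful on large degree sequences; the toy graph merely shows how the functions are called.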

B. COMPARISON BETWEEN THE OVERLAP-HUB NETWORK AND THE GLOBAL NETWORK
Our aim is to compare the main properties of the global network and the sub-network called the "Overlap-Hub network". It is formed with the overlapping nodes, the hubs and their interactions. To this end, we analyze different properties. We measure the average distance, the median distance as well as the diameter of the Overlap-Hub network. These values are compared to their counterparts in the global network. Before describing these properties, we provide a formal definition of the Overlap-Hub network.

1) OVERLAP-HUB NETWORK
Let G(V, E) be a network, where V and E denote the set of nodes and the set of edges respectively. The set of its overlapping nodes is denoted S_o, and its set of hubs contains k nodes, where k represents the number of hubs. The Overlap-Hub network is formed from the union of the set of overlapping nodes and the set of hubs. It is obtained by removing all nodes that do not belong to one of the two sets. In other words, its set of nodes is the union of the overlapping nodes and the hubs, and its set of edges contains the edges of E whose two endpoints both belong to this union. Figure 1 illustrates the representation of the Global and the Overlap-Hub network for the Karate club network.
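The construction amounts to taking the sub-network induced by the overlapping nodes and the hubs. A minimal sketch, assuming the graph is stored as a dictionary of neighbor sets and using a hypothetical function name, could be:

```python
def overlap_hub_network(adj, overlap, hubs):
    """Sub-network induced by the union of the overlapping nodes and the hubs."""
    keep = set(overlap) | set(hubs)
    # Keep only the edges whose two endpoints both survive.
    return {v: adj[v] & keep for v in keep}


# Toy graph (illustrative data only): two stars centered on 0 and 5, joined by node 4.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5},
       5: {4, 6, 7, 8}, 6: {5}, 7: {5}, 8: {5}}
sub = overlap_hub_network(adj, overlap={4}, hubs=[0, 5])
print(sub)
```

Here the six leaves are removed and only the path 0 - 4 - 5 remains.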

2) DISTANCES
Three topological properties are introduced in this subsection. They are computed on both the original network and the Overlap-Hub network.

a: AVERAGE DISTANCE
A shortest path between two vertices in a network is a path with the minimum number of links. The average distance is the mean of the shortest path lengths in a network. Let d(v_i, v_j) denote the distance, i.e. the length of the shortest path, between nodes v_i and v_j. The average distance l of a given network G is defined as:

$$ l = \frac{1}{n(n-1)} \sum_{i \neq j} d(v_i, v_j) $$

where n is the number of nodes of network G.

b: MEDIAN DISTANCE
The median distance is the value separating the higher half from the lower half of the lengths of the shortest paths in the network. The advantage of the median distance over the average distance is that it may give a more representative shortest path value. Indeed, the median distance is not skewed by a small proportion of extremely small or large values. To compute this value, the shortest path lengths between all the pairs of nodes of the network are sorted in increasing order. Then, the middle value is picked as the median distance if there is an odd number of values. Otherwise, the median distance is defined as the average of the two middle values.

FIGURE 1. Representation of the Global and Overlap-Hub networks formed from Karate club data. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community. Those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the remaining nodes. The size of the nodes is proportional to their degree. In the Overlap-Hub network, the overlapping nodes are at the left while the hubs are at the right. The community structure is revealed using the SLPA detection algorithm.

c: DIAMETER
The diameter of a given network is the largest distance between any two nodes in the network. A small diameter indicates that the nodes of the network are tightly connected. The diameter δ of a network is defined as follows:

$$ \delta = \max_{i, j} d(v_i, v_j) $$
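The three distance measures of this subsection can be sketched with a breadth-first search from every node. This is a minimal illustration for unweighted, undirected graphs stored as an adjacency dictionary; pairs of nodes with no path between them are simply ignored, which is an assumption since the handling of disconnected pairs is not specified here.

```python
from collections import deque
from statistics import mean, median

def shortest_path_lengths(adj):
    """Yield d(u, v) for every ordered pair of distinct, connected nodes."""
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                yield d

def distance_summary(adj):
    """Return (average distance, median distance, diameter).

    Raises an error if the graph has no connected pair of nodes.
    """
    lengths = list(shortest_path_lengths(adj))
    return mean(lengths), median(lengths), max(lengths)

# Toy path graph 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
avg, med, diam = distance_summary(path)
```

Applying the same function to the global network and to the Overlap-Hub network gives the pairs of values that are compared in the experiments.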

V. RESULTS AND DISCUSSION
In this section, we report and discuss our empirical findings.
Various evaluation criteria are used to analyze the relationship between the overlapping nodes and the hubs. First, we compare the set of neighbors of the overlapping nodes with the set of hubs. To this end, the proportion of common nodes in the two sets and the Rank-biased overlap are computed. The correlation is also used to measure the statistical association between the two sets. Moreover, the degree distributions of both sets are estimated and compared. In the second part of our experiments, the so-called Overlap-Hub networks are formed by keeping only the overlapping nodes, the hubs and their connections from the original networks. Topological properties of the Overlap-Hub networks are then compared with those of their original counterparts. To do so, we analyze popular topological properties such as the average and the median distance for both networks. Furthermore, the influence of both the neighborhood size and the membership degree of the overlapping nodes is investigated. In the following, results are first reported based on the community structure uncovered by SLPA. Then, in order to evaluate the impact of the community detection algorithm, comparisons are performed with the community structures uncovered by LFME and DEMON.

FIGURE 2. Representation of the Global and Overlap-Hub networks formed from the Les Miserables network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

A. COMPARISON BETWEEN THE SET OF NEIGHBORS OF THE OVERLAPPING NODES WITH THE SET OF HUBS

1) PROPORTION OF COMMON NODES IN THE SET OF NEIGHBORS OF THE OVERLAPPING NODES AND THE SET OF HUBS
In this experiment, we compute the proportion of identical nodes in the set of hubs and the set of neighbors of the overlapping nodes. A high value of this proportion is needed to validate the assumption that the two sets are similar. Let us first consider the small networks, for which the two sets can be compared visually. Figures 1 to 8 illustrate the tripartite representation of the Zachary's Karate Club, Les Miserables, Dolphins, Game of Thrones, Terrorist, Ecology, Adjnoun and American college football networks respectively. The gray nodes of the left column are the overlapping nodes, the nodes in the middle column are the hubs and those in the right column are the rest of the nodes. Note that the size of the nodes is proportional to their degree. These networks have a number of overlapping communities ranging from 2 to 5. One can notice that all the overlapping nodes are connected to the hubs and that these hubs belong to different communities. As shown in all the figures, the overlapping nodes are densely connected with the hubs while being sparsely connected with the rest of the nodes.

FIGURE 3. Representation of the Global and Overlap-Hub networks formed from the Dolphins network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

FIGURE 4. Representation of the Global and Overlap-Hub networks formed from the Game of Thrones network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

Another way to compare both sets is to rank the nodes of both sets by decreasing degree and to compute the largest number of top-ranked nodes for which 100% of the nodes are common to both sets. This gives a conservative value as compared to the proportion of common nodes. However, it allows us to see whether we can reasonably accept the assumption that the hubs are the neighbors of the overlapping nodes. Indeed, we know that the proportion of hubs in a network with a heavy-tailed degree distribution is quite small. If the top 10% of the nodes are common to both sets, there is a high probability that they include all the hubs of the network. Table 2 reports the set of overlapping nodes, the set of neighbors of the overlapping nodes as well as the set of hubs for these small networks.
The nodes are ranked in decreasing order of their degree. Labels that are common to the set of neighbors of the overlapping nodes and the set of hubs are written in blue. One can notice that there is a high overlap between these two sets. Furthermore, the nodes that differ between the two sets are the ones with the lowest degree values. In the Karate Club network, the top 5 high degree nodes are in the neighborhood of the overlapping nodes. As the network is made of 34 nodes, this represents 14.7% of the nodes with the highest degree. In Les Miserables, the top 23 high degree nodes out of the 77 nodes of the network belong to the neighborhood of the overlapping nodes. Therefore, 29.8% of the nodes with the highest degree are in the vicinity of the overlapping nodes. The top 16 high degree nodes out of the 107 nodes of the Game of Thrones network are neighbors of the overlapping nodes. This represents 14.9% of the highest degree nodes. In the Dolphin network, the node Jet (ranked fifth by degree) does not belong to the neighborhood of the overlapping nodes. So, we can say that the top 4 high degree nodes are in the vicinity of the overlapping nodes. This corresponds to 6.8% of the nodes with the highest degree in the network. For the Ecology network, the top 6 high degree nodes out of the 69 nodes of the network are neighbors of the overlapping nodes. This represents 9% of the highest degree nodes. In addition, in Adjnoun, the top 15 high degree nodes out of 112 all belong to the neighborhood of the overlapping nodes, except for the node 'little', which is itself one of the overlapping nodes. So, this represents 12.5% of the highest degree nodes. In the Terrorist network, not all the highest-ranked nodes belong to the neighborhood of the overlapping nodes. If we consider the six nodes with the highest degrees, only three of them ('32', '51' and '24') are neighbors of the overlapping nodes. However, the node '38', which is the second most connected node, is one of the overlapping nodes. In the American college football network, only the top 6 high degree nodes out of 115 nodes are neighbors of the overlapping nodes (5.21%). This small percentage is due to the fact that this network does not exhibit a heavy-tailed degree distribution: the majority of its nodes have the same degree.

In some networks, it is also noticed that the set of hubs and the set of overlapping nodes have some nodes in common. Labels that are common to these two sets are written in red. In the Terrorist network, two nodes belonging to the set of overlapping nodes ('38' and '22') are also considered as hubs. This corresponds to 8% of the hubs. Furthermore, the node '38' is the second-ranked node of the network. Similarly, in the Ecology network, two overlapping nodes belong to the set of hubs. They represent 4.44% of the hubs. In the Adjnoun network, six overlapping nodes are also considered as hubs. They represent 11.11% of the hubs. Additionally, one of these nodes (node 'little') is the most connected node of the network. That being said, a very high proportion of the hubs belongs to the neighborhood of the overlapping nodes in small networks. Another, much smaller proportion belongs to the set of overlapping nodes. The hubs excluded from this area are the ones with low degrees. All these results are a strong indication that hubs are neighbors of the overlapping nodes.

FIGURE 5. Representation of the Global and Overlap-Hub networks formed from the Terrorist network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

FIGURE 6. Representation of the Global and Overlap-Hub networks formed from the Ecology network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

FIGURE 7. Representation of the Global and Overlap-Hub networks formed from the Adjnoun network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.
Results for all the networks under test are reported in Table 3. We consider that the fraction of common nodes in both sets is high if its value is higher than 50%. Across all the networks, the proportion of hubs belonging to the neighborhood of the overlapping nodes ranges from 59.25% to 87.27%. It is always above 50%. However, the highest values are obtained for small networks. As the network size grows, there are more and more peripheral hubs that are not in the vicinity of the overlapping nodes. Indeed, overlapping nodes are in the core of the communities. Moreover, as for small networks, we also check whether the top-ranked nodes of the set of hubs are neighbors of the overlapping nodes. To do so, we compute the proportion of the most connected nodes located in the neighborhood of the overlapping nodes. Here, the top-ranked nodes are the 10% of the nodes of the network with the highest degrees. Experimental results show that the proportion of the top-ranked hubs is very high for most of the networks. This means that the highest-degree nodes of the set of hubs are neighbors of the overlapping nodes. Additionally, it can be noticed that the set of hubs and the set of overlapping nodes have some nodes in common. This is consistent with the results of the study by Yang et al. [23] showing that the overlapping nodes are densely connected. However, this proportion of common nodes is relatively small as compared to the proportion of nodes common to the neighborhood of the overlapping nodes and the set of hubs. Therefore, the majority of hubs are neighbors of the overlapping nodes, and a small proportion of them are also overlapping nodes.

FIGURE 8. Representation of the Global and Overlap-Hub networks formed from the American college football network. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. In the global network, the left column contains the overlapping nodes, the middle column the hubs and the right column the rest of the nodes. In the Overlap-Hub network, the left column contains the overlapping nodes and the middle column the hubs. The size of the nodes is proportional to their degree. The community structure is revealed using the SLPA detection algorithm.

2) JACCARD INDEX
We compute the Jaccard index between the set of neighbors of the overlapping nodes and the set of hubs. This measure is used to emphasize the similarity between both sets.
Results are reported in Table 3. One can notice that the Jaccard index displays high values for the vast majority of the networks. Only 5 networks out of the 22 under study exhibit a Jaccard index value below 50%. Results are well correlated with the previous observations. Values are smaller as compared to the proportion of common nodes in both sets. This is due to the fact that the size of the neighborhood of the overlapping nodes is sometimes very large. Its size is even greater than 65% in some small networks. Consequently, some nodes belonging to the set of hubs have small degrees, since this set has the same size as the set of neighbors of the overlapping nodes. So, there is a high chance that these low-degree nodes do not take part in the neighborhood of the overlapping nodes. This explains the dissimilarities between the set of neighbors of the overlapping nodes and the set of hubs.

TABLE 3. Estimated parameters of the real-world networks under study for the SLPA community detection algorithm. N is the size of the network. µ is the density. on is the fraction of overlapping nodes. S is the size of the neighborhood of the overlapping nodes. $A_n$ is the proportion of hubs in the neighborhood of the overlapping nodes while $\sigma_A$ is its standard deviation. J represents the Jaccard index between the set of neighbors of the overlapping nodes and the set of hubs while $\sigma_J$ is its standard deviation. $A_t$ (%) is the proportion of the top connected nodes in the neighborhood of the overlapping nodes, where t is equal to 10% of the most highly ranked nodes of the network ($t = 0.1 \times n$). $A_n^o$ is the proportion of hubs belonging to the set of overlapping nodes. Each proportion value is the average of 10 SLPA simulation runs.
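To make the comparison between the two measures concrete, the following Python sketch computes the proportion of hubs found in the neighborhood of the overlapping nodes and the Jaccard index of the two sets. The function names and the toy sets are illustrative assumptions. When both sets have the same size k and share c nodes, the proportion is c/k while the Jaccard index is c/(2k - c), which is why the Jaccard values are systematically the smaller of the two.

```python
def proportion_of_hubs_in_neighborhood(neighborhood, hubs):
    """Fraction of the hubs that are also neighbors of the overlapping nodes."""
    hubs = set(hubs)
    return len(hubs & set(neighborhood)) / len(hubs)

def jaccard_index(a, b):
    """|A intersection B| / |A union B| for two non-empty sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Toy sets of equal size k = 4 sharing c = 3 nodes.
neighborhood = {1, 2, 3, 4}
hubs = {1, 2, 3, 5}
proportion = proportion_of_hubs_in_neighborhood(neighborhood, hubs)  # c / k
jaccard = jaccard_index(neighborhood, hubs)                          # c / (2k - c)
```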

3) RANK-BIASED OVERLAP
The Rank-biased overlap (RBO) is also computed in order to compare the set of neighbors of the overlapping nodes and the set of hubs. This measure quantifies the fraction of overlapping elements of two ranked sets while incrementally increasing the depth at which they are compared. Two sets are identical if their RBO value is equal to 1, whereas they are disjoint if the value is equal to 0. The RBO is computed between the set of neighbors of the overlapping nodes and the set of hubs while assigning different weights to the elements of both sets. The parameter p determines the weights of the elements (see Subsection II.A). More importance is given to the comparison of the top elements of the two sets when the p value is small. Results of the RBO computation for various p values are reported in Table 4. One can notice that the RBO values are quite high when almost all the elements have the same weight (p = 0.98). Indeed, the overlap between the two sets ranges between 83.4% and 99.9%. This value increases when more weight is given to the top-ranked elements of the sets. The overlap between both sets ranges from 98.9% to 100% in this case (p = 0.5). These results confirm the previous observations reported in Table 2. Indeed, almost all the top-ranked nodes are colored in blue. Thus, these nodes belong to the set of neighbors of the overlapping nodes. Therefore, almost all the most highly connected nodes of the network are located in the neighborhood of the overlapping nodes. To conclude, the set of neighbors of the overlapping nodes is generally very similar to the set of hubs. This supports the assumption that hubs are neighbors of the overlapping nodes. Furthermore, the top-ranked nodes according to their degree are almost always in the vicinity of the overlapping nodes.
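The role of the parameter p can be illustrated with a short Python sketch of the extrapolated RBO for two rankings of equal length, following the standard formulation of the measure. Which RBO variant is used in the experiments is not stated in this section, so the choice of the extrapolated form, like the function name, is an assumption.

```python
def rbo_ext(ranking_a, ranking_b, p):
    """Extrapolated RBO for two equal-length rankings without duplicates.

    RBO_ext = (X_k / k) * p**k + ((1 - p) / p) * sum_{d=1..k} (X_d / d) * p**d,
    where X_d is the number of items shared by the two top-d prefixes.
    """
    if len(ranking_a) != len(ranking_b) or not ranking_a:
        raise ValueError("this sketch only handles non-empty, equal-length rankings")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = len(ranking_a)
    seen_a, seen_b = set(), set()
    overlap = 0          # X_d, updated incrementally as the depth d grows
    weighted_sum = 0.0
    for d in range(1, k + 1):
        x, y = ranking_a[d - 1], ranking_b[d - 1]
        if x == y:
            overlap += 1
        else:
            overlap += (x in seen_b) + (y in seen_a)
        seen_a.add(x)
        seen_b.add(y)
        weighted_sum += (overlap / d) * p ** d
    return (overlap / k) * p ** k + (1 - p) / p * weighted_sum

# Two rankings that agree on the top element but swap the next two.
score = rbo_ext([1, 2, 3], [1, 3, 2], p=0.5)
```

With a small p the weights decay quickly with depth, so agreement at the top of the two rankings dominates the score.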

4) CORRELATION
In this section, we report results on the correlation between the set of hubs and the set of neighbors of the overlapping nodes. First, the Pearson correlation coefficient, given in Equation 5, is used to measure the degree of relationship between both sets. It is computed using the degrees of the nodes of these sets sorted in decreasing order. Second, we also use the Spearman correlation to measure the strength of the monotonic relationship between the ranks of the set of neighbors of the overlapping nodes and the set of hubs. The ranks are computed according to the degree of the nodes. The value of both correlation measures varies between −1 (perfect negative correlation) and 1 (perfect positive correlation). To illustrate the process, Table 5 represents the correlation between the set of neighbors of the overlapping nodes and the set of hubs for the Karate club network. First, we define both sets based on Algorithm 1. The Pearson correlation is then computed between the degrees of the nodes belonging to these two sets. The nodes are sorted in decreasing order of their degree and each degree is associated with a rank, the highest degree having a rank equal to 1. The Spearman correlation is then computed based on the set of ranks of the neighbors of the overlapping nodes and the set of ranks of the hubs according to Equation 7. It is noticed from Table 5 that the nodes belonging to the set of neighbors of the overlapping nodes and those belonging to the set of hubs have about the same degrees and ranks (the overlapping elements of both sets are in blue). That explains why the values of both the Pearson and the Spearman correlation are very high (close to 1). Therefore, there is a very strong monotonic relationship between the set of neighbors of the overlapping nodes and the set of hubs in the Karate club network.
Results of the Pearson and Spearman correlation for all the other empirical networks are reported in Table 6. One can notice that the values of the correlation measures are most of the time greater than 0.9, whatever the size and origin of the network. This clearly indicates that there is a very strong relationship between the set of hubs and the set of neighbors of the overlapping nodes for all the networks under test. Indeed, the degrees, as well as the ranks, of the nodes belonging to the set of neighbors of the overlapping nodes decrease in the same way as those of the nodes belonging to the set of hubs. Globally, both sets have nearly the same degrees and ranks. These results corroborate the findings made using the previous measures. Thus, one can say that a large number of nodes belong to both sets.
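The two correlation measures can be sketched in plain Python as follows. The Pearson coefficient is applied to two equal-length degree sequences, and the Spearman coefficient is taken in its standard form, i.e. the Pearson coefficient of the rank-transformed sequences with tied values sharing their average rank. The exact rank convention of Equation 7 is not visible in this section, so the tie handling and the ranking of each sequence on its own are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Undefined (division by zero) if one of the sequences is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def degree_ranks(degrees):
    """Rank 1 for the highest degree; tied degrees share their average rank."""
    order = sorted(range(len(degrees)), key=lambda i: -degrees[i])
    ranks = [0.0] * len(degrees)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and degrees[order[j + 1]] == degrees[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(degree_ranks(x), degree_ranks(y))

# Hypothetical degree sequences, both sorted in decreasing order.
neighbor_degrees = [16, 12, 10, 9, 6]
hub_degrees = [17, 16, 12, 10, 9]
r = pearson(neighbor_degrees, hub_degrees)
rho = spearman(neighbor_degrees, hub_degrees)
```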

5) HUB DISTRIBUTION ACCORDING TO THE DEGREE OF OVERLAPPING NODES
One interesting question is whether hubs tend to link more with overlapping nodes of high degree. In order to answer it, we compute the proportion of hubs in the neighborhood of the overlapping nodes as a function of the degree of the overlapping nodes. Results for the various networks are reported in Figure 9. One can see from these figures that the distributions are quite uniform. This result holds for almost all the overlapping nodes, whatever their degree, and for all the networks. Therefore, we can conclude that hubs are linked to overlapping nodes quite uniformly.

6) DEGREE DISTRIBUTION
Here, we examine the degree distribution of both the set of neighbors of the overlapping nodes and the set of hubs. Our purpose is to check whether they present the same empirical distribution. The degree distribution can be appropriately described as a power law ($P(k) \sim k^{-\alpha}$) for a large number of networks [32], [49]. The value of the exponent of the power law ranges between 2 and 3 according to many experimental studies [32]. Figure 10 represents the cumulative degree distribution of the neighbors of the overlapping nodes as well as of the hubs for the various empirical networks. These figures show that the cumulative degree distributions of both sets display the same behavior and are approximately the same. This is most apparent for the Condensed matter collaboration and Enron Email networks. Moreover, Figure 11 reports the empirical degree distribution of the set of neighbors of the overlapping nodes together with that of the hubs for all the real-world networks. This figure shows that the two sets exhibit similar, heavy-tailed degree distributions. Their estimated distribution under the power-law hypothesis is also represented in this figure. The maximum likelihood estimator [31] is used to compute the exponent values, which are reported in Table 7. For all the networks, the estimated exponents of the theoretical distribution for the neighbors of the overlapping nodes and for the hubs have nearly the same values. Therefore, one can conclude from this experiment that the power law seems to be a suitable distribution for both sets, with exponents of about the same value.
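As a hedged illustration of the exponent estimation, the sketch below uses the widely used closed-form approximation of the maximum likelihood estimator for discrete power-law data, $\hat{\alpha} \approx 1 + n \left[ \sum_i \ln \frac{k_i}{k_{min} - 1/2} \right]^{-1}$. Whether this exact form is the one used to produce Table 7 is an assumption, as are the function name and the default lower cut-off k_min = 1.

```python
from math import log

def powerlaw_exponent_mle(degrees, k_min=1):
    """Approximate maximum likelihood estimate of the power-law exponent.

    Only degrees >= k_min are used; k_min is the assumed start of the tail.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degree is greater than or equal to k_min")
    return 1 + len(tail) / sum(log(k / (k_min - 0.5)) for k in tail)

# Hypothetical degree sequences for the two sets being compared.
alpha_neighbors = powerlaw_exponent_mle([1, 1, 1, 2, 2, 3, 5, 9])
alpha_hubs = powerlaw_exponent_mle([1, 1, 2, 2, 3, 4, 6, 9])
```

Comparing the two estimates obtained for the set of neighbors and for the set of hubs mirrors the comparison of exponents described above.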

B. COMPARISON BETWEEN THE GLOBAL NETWORK AND THE OVERLAP-HUB NETWORK
In this experiment, we compare the mean and median distance values of the Overlap-Hub network (containing only the overlapping nodes and the hubs) with those of the original network. Panels (b) of Figures 1 to 8 show the bipartite representation of the Overlap-Hub network for the Karate club, Les Miserables, Dolphins, Game of Thrones, Terrorist, Ecology, Adjnoun and American college football networks respectively. Nodes in the left column (colored in gray) are the overlapping nodes while the nodes in the right column are the hubs of the overall network presented in panels (a) of the same figures. The distance in the Overlap-Hub network is quantified by its average and its median. It tells us how close the overlapping nodes are to the hubs. The smaller the value of this measure, the more likely the overlapping nodes are neighbors of the hubs. Table 11 reports the average and the median distance of the Overlap-Hub network (distance between the overlapping nodes and the hubs) together with the same measures for the global network.
Results show that the average distance values for the Overlap-Hub network are quite small. They are lower than the average distance of the global network. In small networks (Karate club, Les Miserables, Dolphin, Game of Thrones and American football), however, the average distance of the Overlap-Hub network is only slightly lower than that of the global network. In these networks, the density of links is quite high as compared to the density of large networks. On top of that, small networks also have a relatively small diameter. So, the nodes are tightly connected even if we consider the entire network. This is why the average shortest path of the global network also has a small value, which explains the small differences between the values of these measures computed on both networks.
In medium and large networks, the average distance of the Overlap-Hub network is always much smaller than the same measure computed on the global network. These networks have a very low link density and a large diameter. That explains why the average distance between overlapping nodes and hubs is much smaller than the average distance of the global network. This means that the overlapping nodes and the hubs are tightly connected as compared to the rest of the network. Moreover, results also show that the median distance displays the same behavior as the average distance. Indeed, the median distance of the Overlap-Hub network is slightly lower than or equal to that of the global network for all the small networks, whereas the difference is more significant in medium and large networks. Therefore, this confirms that each overlapping node can be a neighbor of a large number of hubs.

C. INFLUENCE OF THE NEIGHBORHOOD SIZE OF OVERLAPPING NODES
In the previous experiments, we checked whether the hubs are immediate neighbors of the overlapping nodes. Now, we want to know whether more hubs are found when the vicinity of the overlapping nodes is extended. To do so, we increase the size of the neighborhood of the overlapping nodes by increasing the maximum distance used to define the neighborhood. This parameter represents the maximum number of links that must be traversed from an overlapping node to reach the perimeter of its neighborhood. Thus, we select only the immediate neighbors of the overlapping nodes if we set the distance to 1. By incrementing the distance, we also enlarge the circle of the overlapping nodes' neighborhood. The purpose of this investigation is to examine the influence of the size of the neighborhood of the overlapping nodes on the proportion of hubs belonging to this area. Table 10 reports the proportion of hubs in the neighborhood of the overlapping nodes as a function of the distance. One can see that, for all the networks, the proportion of nodes common to the set of hubs and the set of neighbors of the overlapping nodes exhibits the same behavior: it increases with the distance. Note that it is equal to 100% for small networks when the distance is incremented only once. This is because these networks have a small diameter. In medium and large networks, the proportion of hubs does not reach its maximum value until the distance is set to 5 or 6. This is due to their relatively larger diameter. Note that the maximum value of the proportion of hubs in the neighborhood of the overlapping nodes is usually less than 100%. This is because some hubs also belong to the set of overlapping nodes. Actually, by increasing the size of the neighborhood of the overlapping nodes, there is a high chance that it contains more hubs belonging to different communities. That is the reason why the proportion of hubs always increases with the size of the neighborhood of the overlapping nodes, whatever the network origin and size.
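The enlargement of the neighborhood can be sketched with a multi-source breadth-first search that stops at a maximum distance. In this minimal Python illustration the hub set is kept fixed while the distance grows and the overlapping nodes are excluded from their own neighborhood; both choices, like the function names, are assumptions made for the example.

```python
from collections import deque

def neighborhood_within(adj, sources, max_distance):
    """Nodes at distance 1..max_distance from at least one source node."""
    sources = set(sources)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == max_distance:
            continue  # perimeter of the neighborhood reached
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist) - sources

def hub_proportion_by_distance(adj, overlap, hubs, max_distance):
    """Proportion of hubs in the neighborhood for each distance 1..max_distance."""
    hubs = set(hubs)
    return [len(hubs & neighborhood_within(adj, overlap, r)) / len(hubs)
            for r in range(1, max_distance + 1)]

# Toy path graph 0 - 1 - 2 - 3 - 4 with node 0 as the only overlapping node.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
growth = hub_proportion_by_distance(path, {0}, {1, 3}, 3)
capped = hub_proportion_by_distance(path, {0}, {0, 1}, 3)
```

In the second call one hub is itself an overlapping node, so the proportion can never reach 100%, which is the saturation effect described above.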

D. INFLUENCE OF THE MEMBERSHIP NUMBER OF THE OVERLAPPING NODES
The aim of this experiment is to study the influence of the membership number of the overlapping nodes on the proportion of hubs in their neighborhood. Recall that the membership number is the number of communities to which an overlapping node belongs. Table 8 reports the example of the Dolphin network. This network has thirteen overlapping nodes with a membership degree equal to 2, while it has eleven overlapping nodes with a membership degree equal to 3. This table shows that the overlapping nodes with the lower membership (om = 2) also have a lower proportion of hubs in their neighborhood. However, this proportion reaches 100% in the neighborhood of the overlapping nodes with the higher membership (om = 3). Table 9 reports the proportion of hubs in the neighborhood of the overlapping nodes as a function of their membership number for the other networks under study. In some networks, all the overlapping nodes have the same membership number (Karate club, Les Miserables, Game of Thrones, Dolphins and ego-Facebook). They are not represented in this table. First, note that the overlap with the set of hubs generally increases with the membership number. In ca-GrQc, for instance, the proportion of hubs is around 86% if the overlapping nodes belong to only two different communities, while it is 100% when the membership of the overlapping nodes is equal to 6. Indeed, if an overlapping node belongs to several communities, it is more likely to be a neighbor of a high number of hubs belonging to these multiple modules. In other words, the proportion of hubs in the neighborhood of an overlapping node is higher if it belongs to a higher number of communities. However, as the size of the networks increases, differences are less pronounced.

TABLE 8. Proportion of hubs in the neighborhood of overlapping nodes as a function of their membership for the Dolphin network. om is the membership degree of the overlapping nodes. $A_n$ is the proportion of hubs in the neighborhood of the overlapping nodes.
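The grouping by membership number can be sketched as follows. This minimal Python illustration counts, for each node, the number of communities it belongs to, groups the overlapping nodes by that number and measures the proportion of hubs in the one-step neighborhood of each group. Using a single, fixed hub set for all the groups is an assumption, as are the function names and the toy data.

```python
from collections import Counter, defaultdict

def membership_numbers(communities):
    """Membership number of every overlapping node (nodes in 2+ communities)."""
    counts = Counter(v for community in communities for v in set(community))
    return {v: m for v, m in counts.items() if m > 1}

def hub_proportion_by_membership(adj, communities, hubs):
    """Proportion of hubs neighboring the overlapping nodes of each membership."""
    hubs = set(hubs)
    groups = defaultdict(set)
    for v, m in membership_numbers(communities).items():
        groups[m].add(v)
    result = {}
    for m, nodes in sorted(groups.items()):
        neighborhood = set().union(*(adj[v] for v in nodes)) - nodes
        result[m] = len(hubs & neighborhood) / len(hubs)
    return result

# Toy example: node "a" belongs to 2 communities, node "b" to 3.
adj = {
    "a": {"h1"}, "b": {"h1", "h2", "x"},
    "h1": {"a", "b"}, "h2": {"b"}, "x": {"b"},
}
communities = [{"a", "b", "h1"}, {"a", "b", "h2"}, {"b", "x"}]
by_membership = hub_proportion_by_membership(adj, communities, {"h1", "h2"})
```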

E. INFLUENCE OF THE COMMUNITY DETECTION ALGORITHMS
In this section, we report the same series of experiments conducted on all the empirical networks of various sizes using overlapping community detection algorithms other than SLPA. Both the Lancichinetti Fortunato Method and EAGLE (LFME) [45] and the Democratic Estimate of the Modular Organization of a Network (DEMON) [46] detection algorithms are used in these experiments. Our main goal is to check the validity of our hypothesis when a different overlapping community detection algorithm is used.

1) COMPARISON OF THE UNCOVERED COMMUNITY STRUCTURES
Here, we analyze five measures to highlight the differences between the communities revealed by the three community detection algorithms used in the experiments. The Normalized Mutual Information (NMI), the mixing parameter estimate, the number of communities, the fraction of overlapping nodes and the overlapping modularity are computed to compare the community structures uncovered by SLPA, LFME and DEMON. Results for all the networks are reported in Table 12.
In the community detection literature, the Normalized Mutual Information (NMI) [50] is commonly used to compare two community structures. Its value is close to 1 if the community structures are very similar and close to 0 if they do not share any information. In small networks, the NMI between the three algorithms has medium values, and it is slightly higher between SLPA and LFME. This shows that there are significant similarities between the communities revealed by the three algorithms. In medium and large networks, the NMI has relatively small values for all the algorithms, although it remains higher between SLPA and LFME. This indicates that in this type of network, the community structures are quite different. We also report the mixing parameter in Table 12. This measure represents the fraction of a node's links that are intercommunity links. It determines the strength of the community structure: for small values, the communities are well defined. For all the networks, the mixing parameter has small values for the three detection algorithms. However, its values are quite close between SLPA and LFME, while it is higher for DEMON. Thus, the community structures defined by SLPA and LFME have similar strengths. Moreover, for small networks, the three algorithms detect nearly the same number of communities. This confirms the similarity of the community structures in this type of network. In medium and large networks, we observe a larger variation of this measure. The proportion of overlapping nodes is another evaluated parameter. The results show that LFME detects the lowest fraction of overlapping nodes, followed by SLPA, while this measure is quite high for DEMON.
Furthermore, the modularity is also used to quantify the quality of the community structure. It assesses the internal connectivity of the identified communities as compared to a random network with no community structure. Note that the overlapping modularity values are low for all three detection algorithms. However, SLPA has the highest values, followed by LFME, while DEMON has the lowest values. Therefore, according to this measure, SLPA is the most accurate algorithm on the networks under study, followed by LFME and then DEMON.
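To make the mixing parameter discussed in this subsection concrete, the following Python sketch estimates it for an overlapping cover. It assumes that a link is intercommunity when its two endpoints share no community, and that the estimate is the average of the per-node fractions over the nodes with at least one link. The function name mixing_parameter and this handling of overlapping nodes are assumptions of the example, since the exact estimator used for Table 12 is not restated here.

```python
def mixing_parameter(adj, communities):
    """Average fraction of intercommunity links per node (hypothetical helper).

    adj: dict mapping node -> set of neighbor nodes.
    communities: list of sets of nodes (a cover, possibly overlapping).
    """
    # Map each node to the indices of the communities it belongs to.
    member = {}
    for index, community in enumerate(communities):
        for node in community:
            member.setdefault(node, set()).add(index)

    fractions = []
    for node, neighbors in adj.items():
        if not neighbors:
            continue  # isolated nodes have no links to classify
        own = member.get(node, set())
        # A link is external when the endpoints share no community
        # (an assumption of this sketch).
        external = sum(1 for u in neighbors if not (own & member.get(u, set())))
        fractions.append(external / len(neighbors))

    return sum(fractions) / len(fractions) if fractions else 0.0
```

With this convention, a value close to 0 indicates well-defined communities, which matches the interpretation given above.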

2) COMPARISON OF THE DISCOVERED OVERLAPPING NODES
Here, we compare the sets of overlapping nodes defined by the various detection algorithms. Our purpose is to check whether all the community detection algorithms uncover the same set of overlapping nodes. To do so, we measure the overlap between the sets two by two using the Jaccard index as well as the Rank-biased overlap (RBO). We start with the small networks, which can be visualized. Figures 12 and 13 present the tripartite representation of the Karate club and Les Miserables networks, respectively, using (a) SLPA, (b) LFME and (c) DEMON. The left column of each representation contains the overlapping nodes, the middle column contains the hubs, and the right column contains the remaining nodes. In Karate club, the sets of overlapping nodes detected by SLPA, LFME and DEMON have only one node in common (node 30), while two overlapping nodes are shared by SLPA and DEMON (nodes 30 and 8). In this network, the overlap proportion between SLPA and LFME is higher than the one between SLPA and DEMON. The same behavior is noticed in Les Miserables, where the sets of overlapping nodes detected by the three algorithms have only two nodes in common (Fantine and Marius). In these networks, DEMON detects a larger number of overlapping nodes than the two other algorithms. In addition, some nodes defined as hubs with SLPA and LFME belong to the set of overlapping nodes for DEMON. That is why, in the Karate club and Les Miserables networks, the overlap between the sets of overlapping nodes is higher for SLPA and LFME than when DEMON is involved. Table 13 reports the proportion of overlap between the sets of overlapping nodes for all the real-world networks when different detection algorithms are used. Results show that the overlaps between the sets of overlapping nodes for the different algorithms are small. They do not exceed 34%. However, unlike in the previous networks (Karate club and Les Miserables), SLPA and DEMON generally have more nodes in common with each other than with LFME. Table 13 also reports the Rank-biased overlap between these sets. In this measure, we set the parameter p to 0.5 to give more weight to the top-ranked nodes. The RBO values confirm the previous findings. They are small, but they are higher between SLPA and DEMON. This means that the sets of overlapping nodes generated by SLPA and DEMON are usually more similar. These two algorithms detect a large proportion of overlapping nodes, while LFME detects a smaller proportion. Therefore, the sets of overlapping nodes defined by SLPA and DEMON have more nodes in common with each other than with the one defined by LFME.
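The two similarity measures used in this comparison can be sketched as follows. The Jaccard index operates on plain sets, whereas the Rank-biased overlap needs ranked lists. The sketch assumes that the nodes are ranked beforehand (for instance by decreasing degree), that a list contains no duplicates, and that uneven lists are truncated to the shorter length before applying the extrapolated form of the measure. The function names jaccard and rbo_ext and the truncation rule are assumptions of this example.

```python
def jaccard(a, b):
    """Jaccard index between two collections of nodes."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


def rbo_ext(s, t, p=0.5):
    """Extrapolated rank-biased overlap of two ranked lists without duplicates.

    Both lists are truncated to the length k of the shorter one
    (a simplification of this sketch). Small p gives more weight
    to the top-ranked elements.
    """
    k = min(len(s), len(t))
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    weighted_sum = 0.0   # sum over depths d of (overlap_d / d) * p**d
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1
        else:
            # Each new element may match an element seen earlier in the other list.
            overlap += (x in seen_t) + (y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        weighted_sum += overlap / d * p ** d
    # Extrapolation term plus weighted agreement up to depth k.
    return overlap / k * p ** k + (1 - p) / p * weighted_sum
```

With the extrapolated form, two identical rankings yield a value of 1 and two disjoint rankings yield 0, which is consistent with the range of values reported for r in the tables.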

3) COMPARISON OF THE DISCOVERED NEIGHBORS OF THE OVERLAPPING NODES
TABLE 10.
Proportion of hubs in the neighborhood of overlapping nodes as a function of the distance. The distance is the maximum number of links from an overlapping node to reach the perimeter of its neighborhood. A_n is the proportion of hubs in the neighborhood of overlapping nodes. A_n^o is the proportion of hubs belonging to the set of overlapping nodes. S represents the size of the neighborhood of the overlapping nodes. Each proportion value is the average of 10 SLPA simulation runs. The standard deviation values are omitted in this table because of their small values ranging from 0 to 3.7.

TABLE 11.
Distance measures of the empirical networks. δ_G represents the diameter of the global network while δ is the diameter of the Overlap-Hub network. l_G represents the average distance of the global network while l is the average distance between the overlapping nodes and the hubs. l_Gm is the median value of the average distance of the global network while l_m is the median value of the average distance of the Overlap-Hub network. Each proportion value is the average of 10 SLPA simulation runs. The standard deviation values are omitted in this table because the ranges are very small.

Here, we repeat the same experiments on the sets of neighbors of the overlapping nodes defined by the SLPA, LFME and DEMON algorithms. We measure the proportion of overlap as well as the Rank-biased overlap between these sets two by two.
It can be noticed from Figures 12 and 13 that the sets of neighbors of the overlapping nodes revealed by SLPA and LFME have the majority of their nodes in common for both the Karate club and Les Miserables networks. DEMON has fewer nodes in common with SLPA and LFME in these networks. This is because the set of neighbors of the overlapping nodes revealed by DEMON is very large compared to the other algorithms. Table 14 reports the proportion of overlap between the sets of neighbors of overlapping nodes for the different empirical networks when various community detection algorithms are used. Results show that the overlap between the sets of neighbors of overlapping nodes for the different algorithms is usually high. It can even reach 78%. Yet, unlike in the Karate club and Les Miserables networks, SLPA and DEMON generally have a higher proportion of overlap with each other than with LFME. Indeed, both SLPA and DEMON detect a large proportion of overlapping nodes. Thus, the neighborhood of their overlapping nodes is also large compared to LFME. Table 14 also reports the Rank-biased overlap between the sets of neighbors of the overlapping nodes. The parameter p is set to 0.5 to give more weight to the top-ranked nodes. The RBO also exhibits very high values (close to 1). This means that the most highly connected nodes belong to all the sets of neighbors of the overlapping nodes defined by the three community detection algorithms.

TABLE 12.
The estimated number of communities N_c, fraction of overlapping nodes on, the mixing parameter µ, the overlapping modularity Q and the Normalized Mutual Information (NMI) in the networks under study for community structures uncovered by SLPA, DEMON and LFME algorithms.

FIGURE 12.
Tripartite representation of the Karate club network using (a) SLPA, (b) LFME and (c) DEMON. Nodes are highlighted in different colors according to the community they belong to. Nodes with the same color belong to the same community while those in gray represent the overlapping nodes. The left column contains the overlapping nodes, the middle column contains the hubs, and the right column contains the rest of the nodes. The size of the nodes is proportional to their degree.

4) COMPARISON OF THE DISCOVERED OVERLAP-HUB NETWORKS
In this part of the experiment, we measure the similarities between the discovered sets of the nodes forming the Overlap-Hub network (overlapping nodes and hubs) when SLPA, LFME and DEMON algorithms are used. To do so, we compute the Jaccard index as well as the Rank-biased Overlap between the revealed sets.
It can be noticed from Figures 12 and 13 that the nodes of the Overlap-Hub network (left and middle columns) uncovered by the three community detection algorithms in the Karate club and Les Miserables networks are mostly the same. Indeed, some nodes can be identified as overlapping nodes by some algorithms and as hubs by others. For instance, nodes 33, 0 and 32 in the Karate club network are considered as hubs with SLPA and LFME and as overlapping nodes with DEMON. The same behavior appears in the Les Miserables network for nodes Valjean, Gavroche, Javert and Enjolras, which are considered as hubs for SLPA and LFME and as overlapping nodes for DEMON. An explanation can be found in the study reported by Yang et al. [23]. The authors show that the overlapping nodes are more densely connected than the other nodes of the network. Thus, they can be considered as hubs under some community detection algorithms. Table 15 reports the Jaccard index between the sets of overlapping nodes and hubs revealed by the different community detection algorithms. Experimental results show that the Jaccard index values are very high (close to 1). The values are even higher between SLPA and DEMON. Therefore, the Overlap-Hub networks defined by the different algorithms have a majority of nodes in common. Table 15 also reports the Rank-biased overlap between the same sets for the three tested algorithms, with the parameter p set to 0.5 to give more weight to the top-ranked nodes. The RBO values are equal to 1 most of the time. These values consolidate the results obtained using the Jaccard index and show that the Overlap-Hub networks uncovered by SLPA, LFME and DEMON are very similar.

5) COMPARISON BETWEEN THE SET OF NEIGHBORS OF THE OVERLAPPING NODES AND THE SET OF HUBS
In this section, we use the same evaluation criteria to examine the relationship between the overlapping nodes and the hubs using the LFME and DEMON algorithms. First, the set of neighbors of the overlapping nodes and the set of hubs are compared. To do so, the previous measures (proportion of hubs, Rank-biased overlap and correlation) are computed. We also analyze their degree distributions. Table 16 reports the proportion of hubs in the neighborhood of the overlapping nodes. As for SLPA, in all the studied networks, the proportion of hubs in the neighborhood of the overlapping nodes is always high for both LFME and DEMON. However, the fraction of hubs that are neighbors of the overlapping nodes is higher for DEMON. This is due to the higher number of overlapping nodes revealed by this detection algorithm. In this case, the neighborhood of the overlapping nodes is larger, so there is a higher chance that it contains more hubs. Table 17 reports the Rank-biased overlap for LFME and DEMON. Note that the RBO values are quite high for all the community detection algorithms. As for the previous measure, DEMON always has slightly higher values. In addition, the RBO values get higher when more weight is given to the top-ranked nodes. This means that the most highly connected nodes of the network belong to the neighborhood of the overlapping nodes for all the community detection algorithms. We also compute both the Pearson and Spearman correlations for both detection algorithms. The results are reported in Table 18. For all the empirical networks, the values of the two coefficients are most of the time very close to 1 when LFME and DEMON are employed. Therefore, there is a strong monotonic relationship between the set of neighbors of the overlapping nodes and the set of hubs for the three detection algorithms. Globally, DEMON gives better results (in terms of the proportion of hubs, RBO and correlation) than the other algorithms. However, this algorithm defines a very large set of neighbors of the overlapping nodes, which can sometimes reach 71% of the size of the network. In this case, several nodes can be identified as hubs even if they have a small degree, because the set of hubs has the same size as the set of neighbors of the overlapping nodes. Thus, LFME identifies a more meaningful set of hubs, followed by SLPA.

TABLE 13.
Overlap between the sets of overlapping nodes discovered by SLPA, LFME and DEMON for different real-world networks. J represents the Jaccard index. r is the Rank-biased overlap. Its parameter p is set to 0.5 to give more weight to the top-ranked nodes.
TABLE 14.
Overlap between the sets of neighbors of the overlapping nodes discovered by SLPA, LFME and DEMON for different real-world networks. J represents the Jaccard index. r is the Rank-biased overlap. Its parameter p is set to 0.5 to give more weight to the top-ranked nodes.

The degree distribution of the set of neighbors of the overlapping nodes and the set of hubs is also studied for the LFME and DEMON algorithms. Figure 14 presents the degree distribution of the neighbors of the overlapping nodes and of the hubs for both algorithms. The two sets exhibit the same behavior for both algorithms: they display a heavy-tailed degree distribution. The estimated exponents under the power-law hypothesis are also computed and reported in Table 19. Results show that the exponents of the set of neighbors of the overlapping nodes and the set of hubs are very close whatever the community detection algorithm used. Furthermore, we also compute the proportion of hubs while increasing the size of the neighborhood of the overlapping nodes when LFME and DEMON are used. Results reported in Table 21 show that, as for SLPA, the proportion of hubs increases when the neighborhood of the overlapping nodes is enlarged for both algorithms. Moreover, Table 20 reports the proportion of hubs as a function of the membership of the overlapping nodes. Results also show that the proportion of hubs increases as the membership of the overlapping nodes gets higher for all the community detection algorithms.
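As a rough illustration of how an exponent can be estimated under the power-law hypothesis, the sketch below applies a commonly used discrete approximation of the maximum-likelihood estimator to a list of degrees. The estimation procedure behind Table 19 is not restated in this section, so the estimator, the function name powerlaw_exponent and the fixed lower cutoff k_min are all assumptions of the example.

```python
import math


def powerlaw_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood exponent of a discrete power law.

    Uses alpha = 1 + n / sum(ln(k_i / (k_min - 0.5))) over the n degrees
    k_i >= k_min. This approximation is only a sketch: it is more accurate
    for larger k_min, and k_min is assumed to be chosen beforehand.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degree is >= k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum
```

Applying the function separately to the degrees of the neighbors of the overlapping nodes and to the degrees of the hubs gives two exponents that can then be compared, in the spirit of the comparison reported above.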

6) COMPARISON BETWEEN THE GLOBAL NETWORK AND THE OVERLAP-HUB NETWORK
TABLE 15.
Overlap between the sets formed by the union of the overlapping nodes and the hubs discovered by SLPA, LFME and DEMON for different real-world networks. J represents the Jaccard index. r is the Rank-biased overlap. Its parameter p is set to 0.5 to give more weight to the top-ranked nodes.

TABLE 16.
Estimated parameters of the real-world networks for LFME and DEMON detection algorithms. on is the number of overlapping nodes. S is the size of the neighborhood of the overlapping nodes. A_n is the proportion of hubs in the neighborhood of the overlapping nodes while σ_A is its standard deviation. Each proportion value is the average of 10 simulation runs.

TABLE 17.
Rank-biased overlap r between the set of neighbors of the overlapping nodes and the set of hubs. This measure is computed using LFME and DEMON detection algorithms. p determines the weights of the elements. More weight is given to the first elements of both sets if p has a small value. Each proportion value is the average of 10 simulation runs. The standard deviations of the RBO values are omitted in this table because the ranges are very small (close to 0).

FIGURE 14.
Log-log representation of the degree distribution for the real-world networks under study using LFME and DEMON detection algorithms. The empirical degree distribution of the neighbors of the overlapping nodes is in blue, and that of the hubs is in red. Power-law estimates are in black.

In the second part of this experiment, we also compare the Overlap-Hub network and the global network using the LFME and DEMON algorithms. Table 22 reports the average and the median distance of both networks for LFME and DEMON. We aim to measure how close the overlapping nodes are to the hubs when different community detection algorithms are used. In small networks, as for SLPA, the average and median distances of the Overlap-Hub network are slightly smaller than those of the global network when LFME and DEMON are used. This is due to the high density of links in these types of networks. In medium and large networks, the average and the median distance between overlapping nodes and hubs are much smaller than those of the global network for both algorithms. However, the values of the average and median distance of the Overlap-Hub network get higher when DEMON is used. Indeed, this algorithm uncovers a large number of overlapping nodes having a large neighborhood size. Thus, the Overlap-Hub network is also larger in this case, which explains why its average and median distances are higher for DEMON. Globally, the distance of the Overlap-Hub network is much smaller than that of the global network whatever the community detection algorithm used. Therefore, the overlapping nodes and the hubs are tightly connected compared to the rest of the network for all three community detection algorithms.
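The distance comparison in this subsection can be illustrated with a short breadth-first search sketch. It assumes that the Overlap-Hub network is the subgraph induced by the overlapping nodes and the hubs, that distances are unweighted shortest-path lengths, and that only reachable pairs are counted. The function names induced_subgraph, path_lengths and distance_summary are hypothetical and are introduced only for this example.

```python
from collections import deque
from statistics import mean, median


def induced_subgraph(adj, nodes):
    """Subgraph induced by `nodes`; adj maps node -> set of neighbors."""
    nodes = set(nodes)
    return {v: adj[v] & nodes for v in nodes}


def path_lengths(adj):
    """Shortest-path lengths between all reachable ordered pairs (BFS)."""
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        # Keep every reachable target except the source itself.
        lengths.extend(d for node, d in dist.items() if node != source)
    return lengths


def distance_summary(adj):
    """Average and median shortest-path length of a graph."""
    lengths = path_lengths(adj)
    if not lengths:
        return 0.0, 0.0
    return mean(lengths), median(lengths)
```

Comparing distance_summary(adj) with distance_summary(induced_subgraph(adj, overlap_and_hubs)), where overlap_and_hubs is a hypothetical set of node labels, gives the kind of global versus Overlap-Hub contrast discussed above. Running one search per node is adequate for small graphs but would need sampling on the largest networks.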

7) SUMMARY
To summarize, the community detection algorithms uncover similar communities in small networks. Indeed, they have medium NMI values and close values of the mixing parameter and of the number of communities. In medium and large networks, their community structures can be quite different. They have small NMI values (relatively higher between SLPA and LFME) and a different number of communities. Furthermore, we also use the overlapping modularity to measure the quality of the community structure revealed by the algorithms. The modularity values obtained with the different algorithms are small, with the lowest values for DEMON. Thus, SLPA is the most accurate detection algorithm, followed by LFME and then DEMON. Moreover, we measure the overlap proportion between the sets of overlapping nodes defined by the three community detection algorithms. Results show that the overlap proportion between the algorithms is small. However, it is most of the time higher between SLPA and DEMON. The same experiment is performed between the sets of neighbors of the overlapping nodes. Results show that there is generally a large intersection of these sets for all the community detection algorithms. Yet, there are smaller overlaps between LFME and both SLPA and DEMON. This is because LFME usually detects a smaller number of overlapping nodes, which therefore have a smaller neighborhood. Additionally, the most highly connected nodes of the network belong to the sets of neighbors of the overlapping nodes revealed by the three community detection algorithms.

TABLE 20.
Proportion of hubs in the neighborhood of overlapping nodes as a function of their membership. om is the membership degree of the overlapping nodes. on(%) is the proportion of overlapping nodes. A_n is the proportion of hubs in the neighborhood of overlapping nodes. Each proportion value is the average of 10 simulations for LFME and DEMON detection algorithms. The standard deviation values are omitted in this table because of their small values.
Besides comparing the community detection algorithms, we also perform the same experiments using the LFME and DEMON algorithms to check the robustness of the results obtained using SLPA. We first compare the set of neighbors of the overlapping nodes and the set of hubs. The proportion of hubs, the Rank-biased overlap, the correlation and the degree distribution of both sets are analyzed. Experimental results show that all the measures behave quite similarly to the ones obtained with SLPA. Generally, we get slightly better results when DEMON is used, even though it is the least accurate community detection algorithm. Indeed, this algorithm detects a large number of overlapping nodes. Thus, the ego-network of the overlapping nodes is large for all the networks (it can reach 71% of the network). This is why the results are better when DEMON is used. However, we obtain close results when LFME and SLPA are employed. Secondly, we compare the Overlap-Hub network and the global network by computing their average and median distance using the LFME and DEMON algorithms. Results show that the distance of the Overlap-Hub network is always much smaller than the same measure computed on the global network. Yet, this difference gets slightly smaller when DEMON is used. That being said, both experiments show that the overlapping nodes are neighbors of a large proportion of hubs. Therefore, our hypothesis seems valid whatever the community detection algorithm used to uncover the community structure.

TABLE 21.
Proportion of hubs in the neighborhood of overlapping nodes as a function of the distance. The distance is the maximum number of links from an overlapping node to reach the perimeter of its neighborhood. A_n is the proportion of hubs in the neighborhood of overlapping nodes. A_n^o is the proportion of hubs belonging to the set of overlapping nodes. S represents the size of the neighborhood of the overlapping nodes. Each proportion value is the average of 10 simulation runs for LFME and DEMON detection algorithms. The standard deviation values are omitted in this table because of their small values.

TABLE 22.
Distance measures of the empirical networks. δ_G represents the diameter of the global network while δ is the diameter of the Overlap-Hub network. l_G represents the average distance of the global network while l is the average distance between the overlapping nodes and the hubs. l_Gm is the median value of the average distance of the global network while l_m is the median value of the average distance of the Overlap-Hub network. Each proportion value is the average of 10 simulation runs for LFME and DEMON detection algorithms. The standard deviation values are omitted in this table because the ranges are very small.

VI. CONCLUSION
The community structure is one of the main topological features of a vast majority of real-world networks. Unveiling its properties is of great interest in order to gain a better understanding of the structure and dynamics of complex networks. In this paper, we characterize the relationship between the overlapping nodes and the highly connected nodes of the networks (hubs). The overlapping nodes can belong to multiple communities. Thus, we believe that there is a high chance that they are neighbors of the highly connected nodes in their respective communities. In order to characterize the relation between the hubs and the overlapping nodes, a series of experiments has been performed on a set of real-world networks of different sizes and origins. First, we compare the set of neighbors of overlapping nodes and the set of hubs using several evaluation measures (the proportion of hubs, Rank-biased overlap and correlation). Extensive investigations show that there is a large overlap between these two sets. It appears that a high proportion of the hubs are one-step neighbors of the overlapping nodes. This supports the assumption that overlapping nodes are neighbors of the highly connected nodes of the network. Results also show that the Rank-biased overlap between the set of neighbors of the overlapping nodes and the set of hubs has very high values, which confirms that there is a great similarity between the two sets. Additionally, there is a very high correlation between the set of neighbors of the overlapping nodes and the hubs. Moreover, the degree distribution analysis of both sets shows that they display comparable empirical distributions. They exhibit a power-law degree distribution with very close exponent values. Secondly, the global network has been compared with its sub-network formed by the overlapping nodes and the hubs (the so-called Overlap-Hub network). The average and median shortest paths of both networks have been compared in order to examine how close the overlapping nodes are to the hubs. Experimental results show that the distance between overlapping nodes and hubs is much smaller than in the overall network.
Furthermore, the influence of the neighborhood size of the overlapping nodes on the proportion of hubs has been investigated. Results show that using an n-step neighborhood allows finding a larger proportion of hubs in this neighborhood. In addition, we also look at the influence of the membership degree of overlapping nodes on the proportion of hubs in their neighborhood. Results show that if overlapping nodes belong to a higher number of communities, the proportion of hubs in their neighborhood becomes higher. We also performed the same experiments using the LFME and DEMON detection algorithms to uncover the communities. Results of the investigations show that the conclusions are quite comparable. Even if the community structures are not the same, the evaluation measures of the proximity of overlapping nodes and hubs have overall close values. This holds for all the networks under test. It confirms that the overlapping nodes are neighbors of a large number of hubs.
Results of our investigations are relevant in multiple settings. Indeed, our analysis sheds light on the organization of complex networks and provides new directions for research on community detection. This work can also help to elaborate new strategies to target the most influential nodes in modular networks.