Local Community Detection: A Survey

Community detection is a flourishing research field with applications ranging from biology to sociology. Local community detection has emerged as a promising subfield concerned with identifying the community around a set of seed nodes. It is of practical importance for numerous real-world applications, such as the analysis of protein interactions and targeted advertising. Since 2005, when the first research paper on local community detection appeared, the literature has grown vast and difficult to navigate, as each method works best under certain conditions and assumptions regarding the seed nodes and the identification of their community. For this reason, and motivated by the many real-world applications of local community detection, in this paper we provide a comprehensive overview and taxonomy of local community detection algorithms. Many surveys on community detection refer briefly to local community detection. However, they do not achieve a systematic and comprehensive coverage of this particular field. Since the research area of local community detection is quite extensive, it is necessary to categorize and discuss the various methods, techniques, and assumptions used to address the problem. This survey aims to fill this gap and help researchers get a clear overview of the local community detection problem. To this end, we have also gathered the best documented tools and the most commonly used datasets in the local community detection literature to help researchers identify the tools they can use to evaluate their methods.


I. INTRODUCTION
Various real world systems [1], [2], [3], [4] are often described by networks because it is a convenient way to represent data. In particular, community detection can reveal the hidden structures and functions in networks [5], [6], and therefore attracts the attention of researchers from various fields [7].
As a result, there are many algorithms in the literature which aim at identifying communities in networks. Most of them focus on detecting the partitioning of the entire network into communities, i.e., global community detection [8], which implies a global knowledge of the structure of the entire network. Girvan and Newman's (GN) [9] method was the first proposed method for global community detection. Since then, the problem of community detection has attracted the attention of a large part of the scientific community, and a very large number of articles have already been published. More importantly, this large number of articles spans a variety of different disciplines, from computer science to physics, to biology, and to social sciences.
The associate editor coordinating the review of this manuscript and approving it for publication was Yang Tang.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In reality, however, it is more common to know only part of a network, either because of its size, because it is dynamic (e.g., the WWW network), or because one is interested in a particular part of the network (e.g., particular neurons in a brain network). In such cases, it is very difficult or even impossible to obtain the information of the whole network [10]. As a result, local community detection (LCD) has attracted the attention of researchers. In general, LCD is applied to find one or a few communities starting from certain nodes of the network, called seeds. This problem is also known as seed set expansion [11]. LCD is of great practical importance for numerous real-world applications, such as analysing terrorist activities or determining the placement of advertisements in social media [12]. Furthermore, LCD can be applied in web search, where, starting from a few known pages containing similar information, one can generate a larger set of web pages with content relevant to a particular search query. Besides, in product networks, expanding the starting set allows an analyst to automatically categorize products that are in the same community as the tagged items [13]. In addition, LCD is also critical in the analysis of biological networks such as protein-protein interaction networks, where LCD helps determine additional members of a protein complex [14], or in human brain co-activation networks, where the discovered communities provide spatial and functional meaning to the analyst. Nowadays, people pay great attention to the flavor profiles of culinary ingredients in order to customize their diets. 
The network of flavors, where nodes represent ingredients and an edge connects two ingredients if they have similar flavor compounds, can be used to create new recipes from existing ones by considering the local communities discovered in this network for key ingredients [15]. What is more, LCD is effectively used in collaboration networks, where an analyst can determine the membership of a particular person in a working group by detecting local communities [16]. Clauset [17] was the first to suggest an LCD method. Since then, several LCD methods have been proposed by researchers. These approaches are not all appropriate for every LCD problem, and many factors need to be evaluated to select the most appropriate one [18].
The motivation of this survey stems mainly from the fact that the literature for local community detection is scattered mainly between Physics and Computer Science and there is not a single point of entrance for a new researcher in this area. This results in raising the difficulty of entrance to this area of research. Indeed, to the surprise of the authors when they started working on local community detection, there was no publication (or a survey to be more specific) that would allow us to get to know this research area faster. This was the starting point for envisaging such a survey that could help researchers quickly review existing work and decide which of them fits their own problem.
Additionally, motivated by the numerous practical applications of LCD in real-world networks, and with a view to helping scientists quickly choose the best-fitting method for particular problems, we present here a classification of existing local community detection techniques. While there are some surveys on community detection in general, as far as the authors are aware, there is no specific survey on the important problem of LCD. Therefore, the innovation of our paper lies in the fact that it attempts to fill this gap by providing an assistive taxonomy of existing LCD methods in the literature. Furthermore, the fundamental difference between our survey and other existing surveys is that our work is dedicated to local community detection techniques and not to community detection in general.

A. CONTRIBUTIONS
The purpose of this review is to organize and categorize the existing approaches to LCD in networks that have been proposed in the literature especially over the last decade. Taking into account past, present, and future trends, the present work aims to help researchers and practitioners understand the field of local community detection with respect to the following aspects:
• Taxonomy and Review: We propose a systematic taxonomy of existing local community detection techniques. For each class, we summarize and review representative work.
• Source Code, Tools and Datasets: We provide statistics on the availability of the source code of the papers studied and the datasets used. We have also compiled and compared the available and best documented tools that help researchers detect communities in networks, by type, platform, and license.
• Trends and Future Directions: As research in local community detection has seen an upsurge in recent years, we discuss current trends in the field and open problems for the future.
The rest of the survey is organized as follows. Section II contains basic definitions related to the current survey domain. In Section III, related surveys on the general problem of community detection are discussed. The classification of the existing LCD approaches is presented in Section IV. In Section V, well-documented tools for community detection are presented. Besides, commonly used datasets in LCD experimental evaluations are presented in the same section. The authors' insights over the Local Community Detection literature as well as future directions are discussed in Section VI. Finally, we conclude in Section VII.

II. DEFINITIONS
In the present work, we assume that a network $G = (V, E)$ consists of a node set $V = \{1, \dots, n\}$, where $n = |V|$ is the number of nodes, and an edge set $E \subseteq V^2$, where $m = |E|$ is the number of edges. In Table 1 we summarize the main mathematical symbols used throughout the paper.
Definition 1: A Community C is defined as a sub-network of G in which the nodes inside C are more densely connected to nodes within C than to nodes outside C.
Definition 2: Seed Nodes or Seeds or Source Nodes are the nodes that define the community one wishes to discover.
Definition 3: A Local Community LC is defined as the community to which the seed nodes belong. Thus, a network G can be divided into LC and the rest of the network G−LC = U . Figure 1 shows the commonly accepted definition of the local community in a network G.
Definition 4: Community Detection is the procedure of finding all communities $C_i$, with $i \in \mathbb{Z}$, that form the network G, such that $\bigcup_i C_i = V$. Local Community Detection (LCD) is the process of finding only a subset of the communities which constitute the network. Thus, only a part of G is known at a local level, and not its whole community structure.
At this point it should be noted that LCD is a term used in the literature for two similar problems. On the one hand, LCD refers to finding the community to which a seed node (or group of seed nodes) belongs. On the other hand, LCD refers to a method that uses local information to discover all communities in the network [19]. In the taxonomy of a following section, algorithms of both categories are presented, as they can both be used to uncover the community of particular node(s).

III. RELATED WORK
There are many surveys in the literature on the problem of community detection. However, there is no survey that focuses exclusively on methods for detecting local communities. Nevertheless, in some of these works there is a subsection describing the techniques of LCD, but this is not comprehensive. More specifically, Table 2 lists in chronological order some of the community detection surveys published in the last decade. In particular, Papadopoulos et al. [20] investigated the computational and memory performance of community detection algorithms specifically in social media. They studied six different local methods, focusing mainly on their scalability compared to global methods. In addition, Dhouioui Z. and Akaichi J. [21] reviewed community detection literature that can be used to detect overlapping communities. Under these circumstances, they dedicated one of the five algorithm categories to local expansion algorithms. Specifically, eleven different local community detection algorithms for overlapping communities were reviewed. Furthermore, Harenberg et al. [22] evaluated community detection algorithms with datasets that have ground-truth communities. Two of the thirteen algorithms evaluated focus on local expansion. Fortunato and Hric [7] gave a comprehensive overview of the problem of community detection in networks. However, this review did not specifically address local methods, but only mentioned five of them. Besides, Maivizhi et al. [23] investigated another aspect of the community detection problem, that of the tools used. In particular, the authors provided an overview of the tools available for community detection and mining. In addition, Javed et al. [24] surveyed community detection approaches and their related real-world applications. The authors categorized community detection algorithms into four different classes. 
Among these surveyed algorithms, the authors have selected some local algorithms that fall into the first four classes, so no section is devoted to local approaches. Moreover, Ma et al. [25] presented an experimental evaluation of local community metrics. More specifically, they divided the metrics into degree-based and similarity-based ones and conducted experiments on eight different metrics. Besides, Dakiche et al. [26] presented a classification of community evolution tracking methods in dynamic social networks into four main categories. The authors discussed three approaches for detecting local communities that fall into these categories. Furthermore, Dilmaghani et al. [19] proposed a scheme to explore the concept of locality at each step of a community detection process. The authors suggested a four-step community detection process (flow) and reviewed works that follow this flow. More specifically, they reviewed nineteen different works that use the concept of locality in one or more steps of their flow. Huang et al. [27] presented a survey of approaches to community detection in multilayer networks. The work discussed included three local approaches for multilayer networks. Souravlas et al. [18] proposed a three-class categorization of community detection methods in social networks. Among the discussed methods there were some local approaches. Furthermore, Meena et al. [28] survey also presented community detection techniques applied to social networks. They also reported other applications where community detection is used. The methods discussed are divided into three categories. A few local approaches that fall into these categories were presented. More recently, Su et al. [29] presented a survey of deep learning methods for community detection. These methods were divided into three categories. Popular datasets and evaluation metrics were also discussed.
The overview includes a discussion of a corresponding local method.
All of this related literature presents indicative methods for local community detection, but it is very far from covering the field completely, as is intended in this survey. Therefore, existing survey papers on community detection leave a big gap with respect to the extent and depth of their coverage of local community detection methods, which this survey aspires to fill. Of course, there is an undeniable connection between the general concept of community detection discussed in the related work and LCD, but there are also important applications and specific methods that exploit aspects of local approaches. Thus, there is a need for an overview of methods related to local community detection.
To the best of the authors' knowledge, the present study is the first comprehensive study dealing exclusively with local community detection techniques in networks.

IV. LOCAL COMMUNITY DETECTION TAXONOMY
It has been quite a while since Clauset [17] first proposed the formal definition of local community detection in networks in 2005. Since then, many approaches have been proposed, making LCD a very active research problem in networks. In this survey, we provide an overview of the work that has been proposed on this problem since the introduction of LCD in 2005. However, we mainly focus on works from the last decade (2012-2022), since they are the most widely used today. We also present the most popular of the earlier approaches (2005-2011).
These methods can be classified into different groups considering various criteria. Therefore, the classification of the approaches is not absolute, but differs with respect to the aim of the taxonomy. Algorithms proposed to detect local communities are classified here according to the type of networks in terms of static, temporal/dynamic or fully streaming ones. Besides, we classify the methods according to the type of approach to detect local communities: greedy or non-greedy ones.
The three categories of local community detection algorithms in terms of network type are: 1) the methods used in static networks, 2) the techniques used in dynamic/temporal networks, and 3) those used in fully streaming networks. Static networks are those whose components, i.e., nodes and edges, remain the same. Dynamic or temporal networks are networks that change over time. Thus, both nodes and edges can be added or removed, resulting in evolving communities. For both forms of networks, the initial or past information about them is always available. More precisely, for the static networks the former is self-explanatory. For the dynamic networks, this means that all past information about the network is available when the network changes. In contrast, edges in graph streams are only available once. More specifically, when an edge $e_i$ is processed in stream S, the past edges cannot be accessed again, and the subsequent edges in the stream are unknown, i.e., $\forall j \neq i$, $e_j$ is not accessible when $e_i$ arrives in stream S.
Figure 2 shows the general taxonomy of local community detection techniques that the present survey suggests. Table 3 briefly presents all works reviewed for the survey, classified according to the proposed taxonomy of Figure 2. We have used the same colors for each class as in Figure 2 to help the reader understand in which class each work is classified. The arrows show the flow of further classification.

A. TECHNIQUES FOR STATIC NETWORKS
Techniques related to static or dynamic networks can be divided into two main groups of algorithms in terms of the approach they follow to expand a community. These groups are greedy and non-greedy. Regarding greedy algorithms, these methods focus on the technique used to add nodes: A node is selected for inclusion in a community if and only if  that node has the maximum value of a quality function that assigns a value to each node [36]. The majority of existing algorithms belong to this category. The group of non-greedy algorithms includes techniques that do not attempt to greedily expand a community, but use various methods to find local communities. Most algorithms in this category use random walks.

1) GREEDY TECHNIQUES
Greedy techniques follow a general scheme that describes the knowledge of the graph at each step. This scheme consists of three main steps and is depicted in Figure 3. LC represents the local community under investigation in a network G, and is usually initialised with the starting node(s). The set N contains the nodes outside LC that are adjacent to at least one node of LC. Nodes that are not adjacent to any element of LC form the set U, the unexplored nodes. Apart from this first classification, LC can be further divided into two subsets: $LC_{core}$, which contains nodes that have no neighbour in the set N, and $LC_{boundary}$, which contains nodes with at least one edge to a node in the set N. Thus, $LC_{core} = LC - LC_{boundary}$. Greedy local community detection methods are also called greedy seed expansion methods. In the following, we provide the description of the basic greedy scheme for LCD based on Chen et al. [45].
1) Initialize LC with the seed node $n_0$.
2) Initialize N with the neighbours of $n_0$.
3) Set the value of the quality function Q of the initial community to 0.
4) Find the node n ∈ N which maximizes the quality function used in the algorithm.
5) If the insertion of this node into the set LC increases the quality criterion, move the node from N to LC and update the set N.
6) Repeat steps 4 and 5 until there is no node n whose inclusion in LC increases the quality function.
7) (Optional step) Check LC for nodes that need to be removed according to a filtering method.
8) LC now contains the local community of $n_0$.
This process can be generalized to more than one seed node.
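The scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Chen et al. [45] or of any surveyed method: the adjacency-dictionary representation, the function names, and in particular the quality function (the fraction of edges incident to LC that lie inside LC) are assumptions chosen for brevity. The optional filtering step 7 is omitted.

```python
from typing import Dict, Set, Tuple

# Assumed representation: node id -> set of neighbouring node ids (undirected).
Graph = Dict[int, Set[int]]


def quality(graph: Graph, lc: Set[int]) -> float:
    """Hypothetical quality function: internal / (internal + outgoing) edges."""
    internal_twice = 0  # every internal edge is seen from both endpoints
    outgoing = 0
    for u in lc:
        for v in graph[u]:
            if v in lc:
                internal_twice += 1
            else:
                outgoing += 1
    internal = internal_twice // 2
    total = internal + outgoing
    return internal / total if total else 0.0


def shell(graph: Graph, lc: Set[int]) -> Set[int]:
    """The set N: nodes outside LC adjacent to at least one node of LC."""
    return {v for u in lc for v in graph[u]} - lc


def greedy_expand(graph: Graph, seed: int) -> Set[int]:
    lc = {seed}  # step 1
    q = 0.0      # step 3 (a single seed has no internal edges)
    while True:
        candidates = shell(graph, lc)  # steps 2 and 5: (re)compute N
        if not candidates:
            break
        # Step 4: candidate that maximizes the quality (ties: smallest id).
        best = max(sorted(candidates), key=lambda n: quality(graph, lc | {n}))
        best_q = quality(graph, lc | {best})
        if best_q <= q:  # step 6: stop when no candidate improves Q
            break
        lc.add(best)     # step 5: move the node from N to LC
        q = best_q
    return lc            # step 8


def split_core_boundary(graph: Graph, lc: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Split LC into (core, boundary); boundary nodes have a neighbour outside LC."""
    boundary = {u for u in lc if graph[u] - lc}
    return lc - boundary, boundary


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge edge 2-3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(greedy_expand(example, 0)))  # [0, 1, 2]
```

With this particular quality function, the expansion stops at the bridge because adding node 3 would lower the fraction of internal edges from 3/4 to 4/6.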
In general, there are two main problems to solve in greedy techniques: 1) starting node selection and 2) the quality criterion [47]. The first describes the problem of which node should be selected as the starting node (seed) to begin the process of discovering a local community. The second concerns the quality criterion that is checked after each node is added to the community. This criterion is also crucial for the termination of the process, i.e. when to stop adding nodes to the community.
Based on these two main problems, we categorize greedy techniques for LCD into two groups: a) greedy methods that try to improve the node selection process (Section IV-A1.a), and b) greedy methods that try to optimize the quality function (Section IV-A1.b).

a: METHODS FOCUSING ON NODE SELECTION
This category contains algorithms that focus mainly on optimizing the selection of the seed node(s). There are two main problems that arise when using greedy seed expansion, and both are related to the selection of the seed node(s). The first is the seed-dependent problem, which means that the location of the seed node(s) affects the quality of the detected local community. The second is known as the seed-invalid problem, which means that many LCD algorithms cannot ensure that the seed node(s) participate in the final detected local community [37]. The methods in this category follow different approaches to selecting the starting node, with a view to overcoming the above problems. Most of these approaches focus on finding important nodes that are good representatives of the community to which they belong.
Bagrow et al. [30] aim to better represent the actions that members of a network would take to identify their own communities. To achieve this, they propose to start the LCD process from an l-shell, where l is the distance from the starting node to all shell nodes N. The suggested method is based on a parameter, α, which controls when to stop the spread of the l-shell. The performance of their approach highly depends on the parameter l as well as on the starting node, since the resulting communities differ a lot if the algorithm starts from border nodes. The algorithm is also very slow [10]. Experiments are conducted only on small networks; the largest one is the polbooks dataset [103] with 105 nodes.
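To illustrate how a shell-spreading procedure of this kind can stop, the sketch below grows breadth-first shells from a starting node and halts when the total emerging degree of the newest shell, divided by that of the previous shell, falls below α. The stopping rule in this exact form, the adjacency-dictionary representation, and all function names are assumptions made for the example; the text above only states that α controls when the spread of the l-shell stops.

```python
from typing import Dict, Set

# Assumed representation: node id -> set of neighbouring node ids (undirected).
Graph = Dict[int, Set[int]]


def emerging_degree(graph: Graph, shell_nodes: Set[int], visited: Set[int]) -> int:
    """Number of edges leading from the shell to nodes not visited yet."""
    return sum(1 for u in shell_nodes for v in graph[u] if v not in visited)


def l_shell_community(graph: Graph, start: int, alpha: float) -> Set[int]:
    """Spread shells from `start` until the emerging-degree ratio drops below alpha."""
    visited = {start}
    frontier = {start}  # the shell at distance l = 0
    prev_k = emerging_degree(graph, frontier, visited)
    while prev_k > 0:
        # Next shell: unvisited neighbours of the current shell (distance l + 1).
        frontier = {v for u in frontier for v in graph[u]} - visited
        visited |= frontier
        k = emerging_degree(graph, frontier, visited)
        if k / prev_k < alpha:  # assumed stopping rule
            break
        prev_k = k
    return visited


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge edge 2-3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(l_shell_community(example, 0, alpha=1.0)))  # [0, 1, 2]
```

In this toy graph a lower threshold such as α = 0.4 lets the shell cross the bridge and absorb all six nodes, which mirrors the sensitivity to parameters noted above.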
Xu et al. [12] acknowledge the limited ability of existing methods to deal with large-scale networks and suggest a community discovery method based on seed expansion, which is inspired by ego-centered theory. More specifically, they suggest dividing the network into k communities and finding the top-k leader nodes. These top-k leaders are then selected as the seed nodes. In each iteration the extension can start with these new seed nodes. A node is added to the community if its similarity with the community (calculated based on common neighbors) is above a threshold σ. They only experiment with real networks, and besides discovering the local community, their method also finds the linked nodes that are loosely related to the community. The authors conclude that the threshold σ has a great impact on the community size, i.e., the larger the threshold, the smaller the community. The largest dataset that they experiment with (Sina micro-blog [104]) has 310,000 nodes.
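The effect of the threshold σ on community size can be illustrated with a small sketch. It is not the method of Xu et al. [12]: the similarity used here (the best Jaccard overlap between the closed neighbourhood of a candidate and that of any community member) is a hypothetical stand-in for their common-neighbour similarity, and the function names and graph representation are assumptions.

```python
from typing import Dict, Iterable, Set

# Assumed representation: node id -> set of neighbouring node ids (undirected).
Graph = Dict[int, Set[int]]


def similarity(graph: Graph, node: int, community: Set[int]) -> float:
    """Hypothetical node-to-community similarity based on shared neighbours."""
    closed = graph[node] | {node}
    best = 0.0
    for member in community:
        other = graph[member] | {member}
        best = max(best, len(closed & other) / len(closed | other))
    return best


def expand_by_similarity(graph: Graph, seeds: Iterable[int], sigma: float) -> Set[int]:
    """Repeatedly add neighbouring nodes whose similarity is at least sigma."""
    community = set(seeds)
    while True:
        candidates = {v for u in community for v in graph[u]} - community
        accepted = {v for v in candidates if similarity(graph, v, community) >= sigma}
        if not accepted:
            return community
        community |= accepted


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge edge 2-3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for sigma in (0.3, 0.5, 0.8):
    print(sigma, sorted(expand_by_similarity(example, [0], sigma)))
```

On this toy graph the detected community shrinks from all six nodes (σ = 0.3) to one triangle (σ = 0.5) and then to two nodes (σ = 0.8), in line with the observation that a larger threshold yields a smaller community.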
Another method that focuses on the starting node is the one proposed by Zhang et al. [31]. The authors' motivation is the fact that if the position of the starting node is at the boundary of the communities, the local community structures detected have low accuracy. Thus, they propose to count the degree centrality of each node in the communities and select the nodes with maximum degree as core nodes. The first step is to find the communities of the network and then identify the core nodes of each community. From these nodes, they start the LCD process to obtain the new communities of core nodes. They choose to test their method on real networks. However, sometimes their proposed method fails to detect communities with high accuracy. The largest dataset that they experiment with is Political blogosphere [105] with 1733 nodes.
Moradi et al. [33] propose a different parameter-free method for finding better seed nodes. In particular, the authors suggest a seeding algorithm that first computes the similarity score of each node based on link prediction, and then it uses a biased graph coloring algorithm to improve the seeding process. There are various proposed similarity scores but they prefer using metrics that require only access to the node's neighborhood. The experiments performed on real networks of products, collaboration and social networks, have shown that the proposed method successfully detects communities of good quality in less time than without their proposed seeding procedure. The largest dataset that they use to test their method is a dataset collected by the authors from Soundcloud with 5187722 nodes.
Furthermore, Xia et al. [10] suggest the ILCDSP algorithm with a view to overcoming the seed-invalid problem. This algorithm uses a selection probability for the candidate nodes at each step, so that the nodes with a high selection probability are more likely to be included in the community. They experiment with both synthetic and real networks. The largest network that the authors use to evaluate their method is a synthetic one generated by the LFR method [106] with 5000 nodes. Compared to Clauset [17] and Luo et al. [44], the proposed method achieves better overall F1 score, precision and recall values for the small number of real networks that were used. Although their suggested algorithm improves the accuracy of community detection, it is not fully stable, as it does not achieve the best precision values on all datasets.
Moreover, Fanrong et al. [34] discuss the seed-invalid problem, where the seed node sometimes does not participate in the detected community. Thus, they propose to start community expansion not from a single node, but from a maximal clique containing the seed node, and suggest an algorithm named Local Community Detection algorithm based on Maximum Cliques extension (LCD-MC). Their experiments with both synthetic and real networks show that the proposed method can detect high-quality communities. The largest datasets that the authors use are synthetic networks generated by the LFR method [106] with 5000 nodes each. More precisely, compared to Clauset [17] and Wu et al. [48], their method presents better results in terms of F1 score and NMI (Normalized Mutual Information) [107].
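Starting the expansion from a clique rather than a single node requires finding the maximal cliques that contain the seed. The sketch below does this with a basic Bron-Kerbosch recursion (without pivoting) restricted to the neighbourhood of the seed. It is an illustration under assumed names and an assumed adjacency-dictionary representation, not the LCD-MC implementation, and it picks the largest clique found as the initial community.

```python
from typing import Dict, List, Set

# Assumed representation: node id -> set of neighbouring node ids,
# undirected and without self-loops.
Graph = Dict[int, Set[int]]


def maximal_cliques_with(graph: Graph, seed: int) -> List[Set[int]]:
    """Enumerate all maximal cliques of the graph that contain `seed`."""
    cliques: List[Set[int]] = []

    def extend(clique: Set[int], candidates: Set[int], excluded: Set[int]) -> None:
        if not candidates and not excluded:
            cliques.append(clique)  # nothing can extend it: clique is maximal
            return
        for v in sorted(candidates):
            extend(clique | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}  # v has been fully explored
            excluded = excluded | {v}

    # Every clique containing the seed lies inside the seed's neighbourhood.
    extend({seed}, set(graph[seed]), set())
    return cliques


def seed_clique(graph: Graph, seed: int) -> Set[int]:
    """Largest maximal clique containing the seed, used as the initial community."""
    return max(maximal_cliques_with(graph, seed), key=len)


# Node 0 belongs to the triangle (0, 1, 2) and to the edge (0, 3).
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0}, 4: {2}}
print(sorted(seed_clique(example, 0)))  # [0, 1, 2]
```

Because the seed is a member of the initial clique by construction, a subsequent expansion that only adds nodes cannot drop it, which is how a clique start can address the seed-invalid problem.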
Hamann et al. [36] also suggest starting the expansion of the community from the largest clique in the neighbourhood of the seed node. Besides, they introduce Triangle-based Community Expansion (TCE) as an alternative strategy for greedy community expansion. Their method utilizes the fact that edges inside communities are usually embedded in triangles. Thus, the authors propose triangle-based edge evaluation to decide which node should be added to the community in each step. The proposed approach is applicable to both unweighted and weighted networks and to overlapping communities. They conduct experiments with synthetic as well as real networks and compare their method with other algorithms, including Infomap [108], a global community detection algorithm. They compare their method to seven different local community detection approaches. Four of them are density based ([45], [71], [109], [110]) and the others ([34], [46], [53]) start the LCD process from a maximal clique (e.g., triangles). Their largest datasets are from Facebook [111] with at most 41536 nodes. They conclude that any algorithm which starts from a maximal clique rather than a single node leads to higher quality communities in terms of the F1 score, especially in real networks. However, the choice of the expansion algorithm should always depend on the network type. In addition, the source code of their proposed method is publicly available.
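The intuition that intra-community edges are embedded in triangles can be made concrete with a short sketch. The scoring below (count the triangles through each edge, then sum these counts over the edges that link a candidate to the community) is a simplified, hypothetical scheme written for illustration; it is not the edge evaluation used by TCE, and the names and graph representation are assumptions.

```python
from typing import Dict, Set

# Assumed representation: node id -> set of neighbouring node ids (undirected).
Graph = Dict[int, Set[int]]


def edge_triangles(graph: Graph, u: int, v: int) -> int:
    """Number of triangles through edge (u, v): the common neighbours of u and v."""
    return len(graph[u] & graph[v])


def candidate_score(graph: Graph, community: Set[int], node: int) -> int:
    """Hypothetical score: triangle counts summed over edges from node into the community."""
    return sum(edge_triangles(graph, node, u) for u in graph[node] & community)


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge edge 2-3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(candidate_score(example, {0, 1}, 2))     # 2: both connecting edges close a triangle
print(candidate_score(example, {0, 1, 2}, 3))  # 0: the bridge edge is in no triangle
```

A greedy expansion that ranks candidates by such a score would absorb node 2 into the community {0, 1} but would have no triangle-based evidence for crossing the bridge to node 3.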
Besides, Ding et al. [37] aim to overcome both the seed-dependent and the seed-invalid problems, and suggest replacing the seed node with the core member of the target community. The core node is a node that has the strongest relationship with the starting node and also higher centrality than the starting node. Next, the core member is taken as the initial community and then they extend it based on community relation strength. Their method is called RTLCD. The authors conduct experiments on both synthetic and real networks and find that their suggested method is more robust to the seed-dependent problem and the seed-invalid problem compared to [17], [44], [45], [48], [34], and [112] in terms of F1 score, NMI and NCR (Node Coverage Rate). Furthermore, their method tends to find more ground-truth community members. The largest dataset that this method is tested on is the DBLP network [113] with 317080 nodes.
Tasgin et al. [38] focus on the boundary nodes of the communities and propose an LCD method based on label propagation. In this method, after identifying the boundary nodes between communities and ranking them, a node with the highest score among its neighbors propagates its label and finally, the communities are extracted from the network. The suggested method is better than existing ones in terms of NMI only when the communities are subtle. However, experiments with synthetic and real networks (co-purchase, collaboration and social) revealed the main weaknesses of this method, namely instability in determining the final communities and weak performance in identifying a reasonable number of communities. The largest dataset that this method is tested on is the youtube network [113] with 1134890 nodes. The source code of the proposed method is publicly available.
In addition, Xu et al. [39] present a distributed LCD algorithm named DLCD-CCE, based on community center extension for large complex (weighted) networks. Their motivation is to make the algorithm available to as wide a scientific and analytics community as possible. The algorithm is evaluated using a Spark-based prototype system to verify its accuracy and scalability. The results of experiments on co-purchase, social and collaboration networks show that DLCD-CCE has better accuracy, stability, and scalability than typical local community detection algorithms like [17], [44], and [32], in terms of precision, recall and F1 score, and effectively overcomes the sensitivity of existing algorithms to the location of the initial seeds. The largest dataset that this method is tested on is the youtube network [113] with 1134890 nodes.
Guo et al. [8] propose the InfoNode algorithm, aiming at overcoming quality and stability deficiencies in overlapping community detection. It uses local degree central nodes and the Jaccard coefficient to detect core members in the network. The detected core members are then treated as seed nodes, which guarantees that the seeds are central nodes of the communities. Next, the node with the highest degree among the seed nodes applies the fitness function strategy for pre-expansion. In the last step, the top k nodes with the best performance in the pre-expansion process are expanded by the fitness function with internal force between nodes. Three parameters have to be defined to use InfoNode: α (used to control the scale of communities detected), k (used to select the top k nodes that have the best performance in the pre-expansion process), and a third parameter (used to select seeds in the network). Experiments with synthetic and real networks have shown that InfoNode can accurately discover communities. The largest dataset that this method is tested on is the CA-GrQc network [114] with 14845 nodes. More precisely, InfoNode was compared to several methods like [109], [32], and [75] on many different datasets (social, co-purchase, power, collaboration), and was found to accurately uncover local communities in terms of EM (extension of Modularity) [115] and NMI.
Furthermore, Liu et al. [16] propose the algorithm HqsMLCD to detect multiple overlapping communities for a given starting node. Their motivation is to define an LCD method that is not sensitive to the position of the seed node but is sensitive to the local structure around it. HqsMLCD first finds seeds of higher quality than the given seed node, and then expands these seeds sequentially to obtain multiple communities that are likely to overlap. 
They test their method on real networks and find that it successfully discovers high-quality communities (in terms of F1 score), compared to [75], [80], and [116]. They experiment with three real datasets (co-purchase, collaboration and social), the largest of which is the LiveJournal network [113] with 3997962 nodes.
In addition, a recent approach by Aghaalizadeh et al. [41] proposes a local and deterministic method named LCDR for detecting communities in social unweighted and weighted networks using core nodes. The authors are motivated by the fact that the quality of the detected communities depends on the selected important nodes as community cores. In this method, core nodes of the networks that are responsible for influencing other remaining nodes are detected and then communities are formed around these nodes using a Sorensen similarity index. The core nodes are selected according to their importance, which is calculated considering the first and second degree neighbors of each node. The experiments performed on both synthetic and real networks (co-purchase, collaboration, social) show that the proposed method is stable and detects high-quality communities quite quickly, compared to other approaches like [108] and [38], in terms of modularity and NMI. The largest dataset that this method is tested on is the youtube network [113] with 1134890 nodes.
Finally, Hu et al. [42] propose three hybrid local centrality measures which combine degree and denseness of node neighborhood to evaluate how good a node is as a seed, and then apply them to a local algorithm for selecting seeds in networks. They focus on local seeding because of its efficiency. Experimental results with the proposed method, called S-LM, in both synthetic and real networks show that their method performs well. However, it is inferior to using the conductance centrality in networks where the difference between degrees is small and the community structure is clear. The largest dataset that this method is tested on is the LiveJournal network [113] with 3997962 nodes. The results of the experiments were evaluated with F1 score and overlapping modularity [117].
Further research papers that belong in this category include Chen et al. [32] (LMD method), Wang et al. [35] (MAGA-LC method), Luo et al. [40] (LCDNN method) and Ji et al. [43] (CAELCD method) which are not analyzed due to the similarity with the previously discussed methods.

b: METHODS FOCUSING ON QUALITY FUNCTION
The local community approaches of the present category focus mainly on community valuation. More precisely, the starting node of the local community we want to discover is already given. The goal is to extend the community beyond the starting node by optimizing a quality function.
Clauset [17] was the first to propose an approach for LCD, motivated by the fact that in many cases global knowledge of a graph's topology is lacking. Thus, the author proposes a measure of community structure that is independent of global properties, the local modularity R. This metric represents the fraction of boundary edges which are internal to the local community. The proposed algorithm merges, one by one, the neighbouring nodes that increase the value of R, until the community reaches a predefined size. This means that it is not a parameter-free approach. The suggested method is used in both synthetic and real networks (co-purchase, collaboration). The largest dataset that this method is tested on is the Amazon.com recommender network with 409687 nodes. The discovered communities usually have high recall but low accuracy (compared to the ground-truth communities), as the algorithm is sensitive to the position of the source node. It is also possible for this method to identify multiple communities for the seed node as it explores the network. Furthermore, the resulting community may contain many outliers.
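The greedy procedure described above can be illustrated with a minimal Python sketch. It is not the implementation of [17]: the adjacency-dictionary representation, the names local_modularity_r, greedy_expand and GRAPH, the tie-breaking by node id, and the convention R = 1 for a community without boundary edges are all assumptions made for illustration. R is computed as the fraction of edges incident to boundary nodes whose two endpoints both lie in the community.

```python
def local_modularity_r(adj, community):
    """Fraction of the edges touching the community boundary that are internal."""
    community = set(community)
    # Boundary nodes: members with at least one neighbour outside the community.
    boundary = {v for v in community if any(u not in community for u in adj[v])}
    seen = set()
    internal = 0
    for v in boundary:
        for u in adj[v]:
            edge = frozenset((u, v))
            if edge in seen:
                continue
            seen.add(edge)
            if u in community:
                internal += 1
    # Assumed convention: with no boundary edges nothing leaves the community.
    return internal / len(seen) if seen else 1.0


def greedy_expand(adj, seed, max_size):
    """Add, one node at a time, the neighbour giving the largest R (hypothetical helper)."""
    community = {seed}
    while len(community) < max_size:
        shell = {u for v in community for u in adj[v]} - community
        if not shell:
            break
        # sorted() makes ties deterministic (lowest node id wins).
        best = max(sorted(shell), key=lambda u: local_modularity_r(adj, community | {u}))
        community.add(best)
    return community


# Toy graph (an assumption): two triangles joined by the bridge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(greedy_expand(GRAPH, seed=0, max_size=3))
```

The max_size argument plays the role of the predefined community size, which is why the approach is not parameter-free.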
Chen et al. [45] propose the metric L to decide whether a new node should be added to the local community, aiming to avoid outliers. This metric is the ratio of the community's internal relation (based on node degrees) to its external relation (based on node degrees). Based on this metric, a two-phase algorithm that does not require initial parameters is suggested to identify the local community of starting nodes. Experiments conducted on real networks (football, co-purchase) demonstrate the accuracy and the effectiveness of the proposed method compared to the ground-truth communities in terms of precision, recall and F1 score. The largest dataset that this method is tested on is the Amazon.com network collected by [44] with 585283 nodes. However, when the starting node is on the boundary of another community, the proposed method may not recognize its proper community.
The LTE algorithm is suggested by Huang et al. [46], based on a quality metric named similarity-based tightness. The authors focus on avoiding the a priori assumptions about network properties and the predefined parameters that other methods rely on. The tightness metric is an adaptation of the cosine similarity function and computes the internal similarity between nodes in a community. The proposed method can be used in both unweighted and weighted networks, as well as for overlapping and non-overlapping communities. Experiments on both real (social, football, co-purchase) and synthetic networks showed that the suggested approach achieves good performance in efficient time compared to methods [17], [44], [109] in terms of precision, recall, NMI and Generalized Normalized Mutual Information (GNMI) [109]. The largest dataset that this method is tested on is the Amazon.com network collected by [44] with 585283 nodes. However, the suggested method is seed-dependent in networks without a clear community structure.
Branting et al. [47] focus on improving node selection in the community expansion phase independently of the termination and filtering stages. The authors compare different local community detection methods, which are classified into two categories, xenophobic and non-xenophobic: the former try to maximize internal connectivity or minimize external connectivity, while the latter disregard external connectivity. They also propose an evaluation criterion for LCD algorithms that takes into account the relative centrality of nodes within the target community, Normalized Utility-Weighted Recall (NUWR). After an experimental evaluation with synthetic and real networks (power, collaboration, football, social), the authors conclude that there is no single best LCD algorithm; instead, algorithms should be selected based on the properties of the graph and the nature of the community to be detected. The largest dataset that this method is tested on is the Western US Power Grid [118] with 4941 nodes.
Moreover, Wu et al. [48] propose a method to determine the local community structure by analyzing the link similarity between the community and a node, motivated by the fact that most existing methods depend on the seed node and impose too strict a policy in the expansion phase. The link similarity of a node is the ratio between the number of edges the node shares with the community and the number of neighbors of the node. Inspired by the fact that elements in the same community are more likely to have common edges, the community structure is explored heuristically by prioritizing nodes that have high link similarity with the community. To improve the quality of the community structure, a three-step process is also used. Experiments with both synthetic and real networks (football, citation) show that the suggested method detects stable and high-quality communities compared to [17], [44], and [45] in terms of F1 score, although it is only suitable for undirected networks. The largest dataset that this method is tested on is the Cora Citation Network [119] with about 17000 nodes.
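As a rough illustration of the link similarity ratio, the following Python sketch assumes that "common edges" means the edges joining the node to current community members; this reading, the adjacency-dictionary representation, and the names link_similarity, next_candidate and GRAPH are assumptions, and the three-step refinement of [48] is not reproduced.

```python
def link_similarity(adj, node, community):
    """Share of the node's neighbours that already belong to the community."""
    neighbours = adj[node]
    if not neighbours:
        return 0.0
    return sum(1 for u in neighbours if u in community) / len(neighbours)


def next_candidate(adj, community):
    """Shell node with the highest link similarity (hypothetical helper), or None."""
    shell = {u for v in community for u in adj[v]} - set(community)
    if not shell:
        return None
    # sorted() makes ties deterministic (lowest node id wins).
    return max(sorted(shell), key=lambda u: link_similarity(adj, u, community))


# Toy graph (an assumption): two triangles joined by the bridge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(next_candidate(GRAPH, {0, 1}), link_similarity(GRAPH, 2, {0, 1}))
```

In this toy graph, node 2 has two of its three neighbours inside {0, 1}, whereas node 3 has only one of three neighbours inside {0, 1, 2}, so the bridge endpoint 3 is a much weaker candidate.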
Ngonmang et al. [49] try to optimize a well-known quality metric, modularity, in order to detect local overlapping communities, overcoming the failure that occurs when the starting node is at the boundary of a community. More specifically, they build on the work of Chen et al. [45], which suggests a metric that considers the densities of intra-community edges and outer edges rather than their number, and try to optimize it. The problem with the former method is that it may not recognize the local community of a starting node that is on the boundary of another local community. The method that is suggested here is called IOLoCo. Experiments with synthetic and real networks (collaboration, co-purchase, social) show that their proposed method can be successfully used to improve recommendation systems in social networks, compared to [17], [44], and [45] and evaluated by performance index [120]. The largest dataset that this method is tested on is the Amazon.com network collected by [121] with 548552 nodes.
Furthermore, a method called GMAC for LCD is presented by Ma et al. [51], focusing on the seed-dependence problem. This method estimates the similarity between nodes by examining their neighborhoods and reveals a local community by maximizing its internal similarity while minimizing its external similarity. The Compactness-Isolation (CI) metric is used to decide the inclusion of new nodes in the local community. According to this metric, the nodes of a good local community should have high internal similarities and low external similarities, resulting in a high CI value. The authors conducted experiments on synthetic and real networks (social, football, collaboration, power, co-purchase) and showed that their method is less sensitive to seeds than approaches like [46] and [17]. In addition, the suggested method discovered local communities more accurately in terms of precision, recall and F1 score. The largest datasets that the authors use are synthetic ones generated by the LFR method [106] with 10000 nodes each.
Moreover, Fagnan et al. [53] focus on accurately identifying communities, outliers, and hubs in social networks. The main component of the algorithm is the T-metric, which evaluates the relative quality of a community by considering the number of internal and external triads (i.e., 3-node cliques/triangles) it contains. Intuitively, this metric favours nodes that form many triads with nodes inside the community and few triads with nodes outside the community. Experiments on real networks with ground truth communities (co-purchase, football, social, politics) verify that the proposed method performs as well as, or in some cases better than, state-of-the-art methods like [17], [44], and [45], in terms of Adjusted Rand Index (ARI) [122]. The largest dataset that they experiment with is Political blogosphere [105] with 1733 nodes. The authors consider their method applicable to directed networks too, even though they did not experiment on such networks.
Besides, Wu et al. [54] examine an interesting aspect of LCD methods: the free-rider effect. This effect is related to the observation that most existing metrics tend to include irrelevant subgraphs in the detected local community. More precisely, they study the existing goodness metrics and provide theoretical explanations for the causes of the free-rider effect. They also develop a query-oriented node weighting scheme to reduce this effect, and a corresponding algorithm for LCD. Experimental results in synthetic and real networks (co-purchase, social, collaboration) show that the proposed method efficiently reduces the free-rider effect and discovers ground-truth communities more accurately in terms of F1 score when compared to other approaches like [17], [44], [71], [79]. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes. Besides, the authors propose two variants of their suggested method for overlapping and multiple communities respectively.
Xia et al. [56] propose a new local modularity metric G and a two-stage algorithm for LCD based on it, considering networks lacking global information. The method they follow adds nodes to the community until G stops increasing. G is defined as the ratio between the number of internal edges in the community and the sum of internal edges in the community, edges with only one endpoint in the community, and the number of edges between the neighboring nodes' set of the community. The suggested method was tested on synthetic and real networks (football) and proved to be quite effective in terms of precision, recall and F1 score compared to [17] and [44] methods. The largest dataset that this method is tested on is NCAA football network [9] with 179 nodes.
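The verbal definition of G can be read as G = E_in / (E_in + E_out + E_shell), where E_in counts internal edges, E_out counts edges with exactly one endpoint in the community, and E_shell counts edges among the neighbouring (shell) nodes. The following Python sketch follows that reading; the symbol names, the helper names metric_g and expand_by_g, the toy GRAPH, and the tie-breaking are assumptions, and only the "add nodes until G stops increasing" rule is shown, not the full two-stage algorithm of [56].

```python
def metric_g(adj, community):
    """G = E_in / (E_in + E_out + E_shell) under the reading given in the lead-in."""
    community = set(community)
    shell = {u for v in community for u in adj[v]} - community
    # Each undirected internal edge is seen from both endpoints, hence // 2.
    e_in = sum(1 for v in community for u in adj[v] if u in community) // 2
    e_out = sum(1 for v in community for u in adj[v] if u not in community)
    e_shell = sum(1 for v in shell for u in adj[v] if u in shell) // 2
    total = e_in + e_out + e_shell
    return e_in / total if total else 0.0


def expand_by_g(adj, seed):
    """Greedily add the best shell node while G strictly increases (hypothetical helper)."""
    community = {seed}
    current = metric_g(adj, community)
    while True:
        shell = {u for v in community for u in adj[v]} - community
        if not shell:
            break
        best = max(sorted(shell), key=lambda u: metric_g(adj, community | {u}))
        candidate = metric_g(adj, community | {best})
        if candidate <= current:
            break
        community.add(best)
        current = candidate
    return community


# Toy graph (an assumption): two triangles joined by the bridge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(expand_by_g(GRAPH, 0), metric_g(GRAPH, {0, 1, 2}))
```

On the toy graph, G rises from 0 to 1/3 to 3/4 as the first triangle is completed and would drop to 4/7 if node 3 were added, so the expansion stops at {0, 1, 2}.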
In addition, Kanawati et al. [57] explore and evaluate the combination of different local modularity functions to identify the node-centric communities in weighted networks. Various ensemble-based approaches are proposed and implemented, including approaches based on ensemble rankings and ensemble clustering. Experiments are conducted on synthetic and real networks (social, co-purchase, football) with ground-truth communities and show that ensemble ranking-based approaches outperform other approaches considering ARI and NMI values. The largest dataset that the authors use is a synthetic one by the LFR method [106] with 2000 nodes.
Zhao et al. [58] propose an approach for LCD via edge weighting. More specifically, they first design a new measure of node similarity considering the weights of neighbouring nodes. They also develop an edge weighting method based on this similarity measure. Then, they define a new goodness metric, Closeness-Isolation (CI), to quantify the quality of the local community by integrating the edge weights. In this algorithm, they discover the local community by giving priority to those shell nodes that have maximum similarity with the current local community. The CI metric is defined as the ratio between the sum of the weights of all edges in the community and the sum of the weights of all edges with one node in the community and the other in the shell. The suggested approach has been evaluated with both synthetic and real networks (social, football, co-purchase) and achieves good performance in terms of precision, recall and F1 score, compared to [17], [44], and [51]. However, the real networks used are quite small. The largest dataset that this method is tested on is NCAA football network [9] with 179 nodes.
Liu et al. [59] are motivated by the fact that no algorithm guarantees finding an optimal local community structure, and they develop a method for LCD with a given starting node. As its first step, this method finds the node adjacent to the starting node that is most similar to it and forms the initial local community C from these two nodes. The similarity is calculated based on the number of common neighbours between the two nodes. Then, the connectivity degree of the nodes neighbouring C is calculated, and the node whose connectivity degree to C is maximum is added if it increases the modified local modularity measure. The modification concerns the exceptional case in which a community has no external edges. More precisely, the local modularity proposed by Luo et al. [44] is defined as follows:
\[ M = \frac{E_{in}}{E_{out}} \]
where $E_{in}$ is the number of edges with two endpoints in C and $E_{out}$ is the number of edges with one endpoint in C. Liu et al. [59] modify this local modularity so that it remains defined when $E_{out} = 0$. The proposed method was tested on synthetic and real networks (social, football, co-purchase, politics) and, in terms of precision, recall and F1 score, exhibited good results compared to [17] and [44]. The largest dataset that this method is tested on is the Amazon.com network collected by Luo et al. [44] with 585283 nodes.
Besides, Interdonato et al. [60] propose the first method for LCD in multilayer networks (ML-LCD). Multilayer networks model multiple but different interactions among nodes. For example, in social computing, a person often has multiple accounts in different social networks, and in fact, nowadays it has become important to link distributed user profiles of the same user from multiple platforms. Therefore, in the presented work, a greedy algorithm is proposed to find a community shared by all networks. The authors provide three definitions of the objective function of the ML-LCD problem, corresponding to different ways to incorporate intra-layer and cross-layer topological features. They also provide the source code of the method, which is also tested on seven real networks (transportation, biological, collaboration and social). The suggested approach was compared with both single-layer LCD methods ([45] and [13]) and multilayer global community detection methods. The results show that the method successfully detects multilayer local communities. The largest dataset that this method is tested on is the DBLP network [123] with 83901 nodes.
Moreover, Luo et al. [61] propose two LCD algorithms, DMF_M and DMF_R, based on the local modularity metrics suggested by Luo et al. [44] and Clauset [17], respectively. Both algorithms consider the characteristics of the local community during its creation process: they divide the detection process into three stages and employ a different dynamic membership function for each stage to find local communities. Experiments on synthetic and real networks (social, co-purchase, football) show that the detected local communities are of high quality compared to the methods of Luo et al. [44] and Clauset [17], in terms of precision, recall and F1 score. However, the proposed method is only applicable to undirected networks, and the algorithm is sensitive to the seed node's position. The largest dataset that this method is tested on is the Amazon.com network [113] with 334863 nodes.
Given a node that belongs to multiple communities, the approach of Ni et al. [62] focuses on finding the communities to which it belongs according to local information. The authors propose a framework named LOCD for overlapping communities that works in three steps: first, a group of nodes in different communities to which the node belongs is identified, then the representative members of the group are selected, and finally the communities to which they belong are determined. Experiments with synthetic and real networks (co-purchase) lead to high-quality communities considering precision, recall and F1 score, compared with methods like MULTICOM [75]. LOCD is shown to be simpler and more efficient than MULTICOM. The largest dataset that this method is tested on is the Amazon.com network [113] with 334863 nodes.
Furthermore, Li et al. [63] investigate the LCD problem in multi-layer networks with respect to the trust relation. The framework proposed by the authors is called MTLCD and is based on the selection of nodes to be included in social network communities according to the trust relation. Users who are trusted can influence others according to their preferences, since they want to connect with influential and trustworthy people. Experiments with real multilayer networks (biological, satellite, social, mobile network) show that the proposed algorithm is highly competitive for multilayer social networks compared to other methods in terms of modularity. However, in nonsocial networks, where the trust relation between nodes cannot be computed well, the results deteriorate as the social trust increases. The largest dataset that this method is tested on is the Mobile QQ Zone network [124] with 562062 nodes.
An LCD algorithm based on local modularity density (LCDMD) is proposed by Guo et al. [64], with a view to limiting the sensitivity to seed node selection and the problem of unstable communities. More specifically, the algorithm divides the process of local community formation into a core area detection phase and a local community expansion phase according to the density of the community tightness based on the Jaccard coefficient. In the core area detection phase, the modularity density is used to ensure the quality of the communities. In the local community expansion phase, the influence of the nodes and the similarity between the nodes and the local community are used to determine the boundary nodes to reduce the sensitivity to the selection of the starting nodes. The source code of the proposed algorithm is publicly available. Experiments with synthetic and real networks (social, co-purchase, word, collaboration, transportation) lead to highly accurate and stable communities in terms of precision, recall, F1 score and conductance, compared to several methods like [14], [17], [44], [116], and [75]. The largest dataset that this method is tested on is the roadNet-CA [125] with 1965206 nodes.
More recently, the Hint Enhancement Framework (HEF) has been proposed by Baltsou et al. [65]. In particular, this framework provides an efficient method for detecting higher-quality local communities of predefined important nodes called hints. In many cases the choice of seed(s) incorporates external knowledge that attaches to these nodes an additional importance for their community. This knowledge may be derived from a domain expert or may arise from the network's side information, and it constitutes the motivation for this work. HEF applies a two-step procedure to discover the community of hints: 1) it modifies the network by enhancing the hints using reweighting or rewiring strategies to materialize the preference, and 2) it applies local community detection algorithms to the modified network from step 1. The proposed method is applied on both unweighted and weighted networks. Extensive experiments with synthetic and real networks (social, collaboration, biological) demonstrate the accuracy of the framework in detecting local communities compared with local methods like [36], [46], [126] and a global one [108], considering precision, recall and F1 score. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes.
Shang et al. [66] propose an interesting approach to extend local communities, with a view to solving the problem of poor algorithm results caused by low-quality seeds. The suggested algorithm is called HSEI and is based on higher-order structure and edge information. More specifically, first, different ways of selecting the first node to join a local community are used depending on the motif degree of the seed. Second, a new motif-based modularity function is proposed to extend the local community so that the extended community is more tightly connected. A new motif-based central node of the community is defined to extend the central part of the local community. For the edge of the community and the area with sparse connections, the edge information is used to mine the membership strength between nodes and communities to obtain more complete members of the local community. They experimented on various real networks including social, collaboration, co-purchase and www networks. The largest dataset that this method is tested on is the LiveJournal network [113] with 3997962 nodes. The approach was compared with others like [13], [37], [61], and [40] in terms of precision, recall and F1 score, and outperformed the other methods even when the seed nodes were considered of low quality.
Further research papers that belong to this class include Kudvelka et al. [50], Cui et al. [52], Chang et al. [55] which are not analyzed due to the similarity with other, previously discussed works.

2) NON-GREEDY TECHNIQUES
This group includes algorithms that find local communities in ways other than greedily adding a node at each step. Such techniques require a stopping criterion to define the boundary of the community. Conductance is a widely used metric for determining the boundary of a community. Methods that use conductance try to minimize it, since it measures the fraction of edges that leave the community. In its standard form, the conductance of a community C is defined as [101]:
\[ \phi(C) = \frac{|\{(u, v) \in E : u \in C, v \notin C\}|}{\min(\mathrm{vol}(C), \mathrm{vol}(V \setminus C))} \]
where E is the edge set, V is the node set, and vol(C) is the sum of the degrees of the nodes in C. Conductance minimization is an NP-hard problem [127]; thus, non-greedy techniques try to approximate a solution (e.g., [81]). The techniques used in non-greedy LCD are categorized into Flow-Propagation (FP) and Random-Walk (RW) based ones.

a: FLOW-PROPAGATION BASED TECHNIQUES
The methods of this category work as follows: the seed node emits a stream that shares flow with the adjacent nodes, i.e. its neighbours. Each of these nodes stores a flow to pass on to its neighbours but may also return a part of the flow to the first node. The probability p(x) that a node x belongs to the same community as the seed node s is proportional to the flow stored at x. This probability is expressed in terms of d(x), the shortest path distance between node x and s, ρ, the average ratio of local links to node degree, and n(x), the neighbours of node x.
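Since conductance serves as the boundary criterion for the non-greedy techniques discussed in this section, a short Python sketch of its computation may be helpful. It assumes the standard definition (cut edges divided by the smaller of the two volumes, where the volume of a node set is the sum of its degrees), an unweighted undirected graph stored as an adjacency dictionary, and the convention of returning 1.0 when one side has zero volume; the names conductance and GRAPH are illustrative.

```python
def conductance(adj, community):
    """Cut edges divided by the smaller of the two volumes (sum of degrees)."""
    community = set(community)
    cut = sum(1 for v in community for u in adj[v] if u not in community)
    vol_in = sum(len(adj[v]) for v in community)
    vol_out = sum(len(neighbours) for neighbours in adj.values()) - vol_in
    smaller = min(vol_in, vol_out)
    # Assumed convention: an empty or all-encompassing set gets the worst value.
    return cut / smaller if smaller else 1.0


# Toy graph (an assumption): two triangles joined by the bridge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(GRAPH, {0, 1, 2}), conductance(GRAPH, {0, 1}))
```

On the toy graph, the triangle {0, 1, 2} has one cut edge and volume 7, giving conductance 1/7, whereas the incomplete set {0, 1} has two cut edges and volume 4, giving 1/2.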
The work of Orecchia et al. [67] presents the first local graph-partitioning algorithm that combines flow and spectral methods. The authors propose this combination to achieve better results when searching for low conductance cuts. In particular, they show how to locally find an $\tilde{O}(1/\gamma)$ approximation to the conductance given a starting set that overlaps the cut by a γ fraction. Their approach can be generalised to weighted networks. Although this method is theoretically efficient, it relies on a complicated variation of Dinic's algorithm [128], which is difficult to implement in practice.
The FlowPro algorithm is proposed by Panagiotakis et al. [68] with a view to providing a useful community detection tool for an ordinary user of a social network, who cannot know the entire graph structure. More precisely, in each iteration of the main process of the algorithm, the starting node propagates a flow that is shared among its neighbors. Each node can store a portion of the received flow, propagate it to its neighbors, and send it back to the starting node. When the algorithm converges, the flow stored in the nodes belonging to the community of the initial node is usually higher than the flow stored in the rest of the nodes of the graph, since the stored flow of a node is proportional to p(x), and thus forms the desired community. The algorithm does not require access to the entire network to proceed. FlowPro has the additional ability to remove and add edges to s to increase d(x) for nodes x that do not belong to C(s) (e.g., removing bridges) or decrease d(x) for nodes x that do belong to C(s). This feature improves the convergence and performance of FlowPro, as experiments with many synthetic (208 synthetic networks in total) and real networks (collaboration, social, www) have shown: FlowPro detected communities more accurately and efficiently than [109]. Besides, FlowPro seems to cope better with the seed-invalid problem than methods like [109]. The source code of the proposed method is also provided. The largest dataset that this method is tested on is the WWW network with 325730 nodes. The proposed method was also tested on networks with low overlapping communities and can be extended to weighted networks.
Veldt et al. [69] introduce the SimpleLocal algorithm, which begins with a reference set reflecting an important part of the network and seeks a nearby set with better conductance. Their method is simple to implement and can take advantage of many maximum flow algorithms [129]. Experiments were conducted on real networks (collaboration, biological) and accurately discovered local communities considering precision and recall. However, the suggested method relies on repeatedly solving numerous maximum flow problems, without taking advantage of the similarity between consecutive flow problems. An improvement on the SimpleLocal algorithm, called FlowSeed, is presented in Veldt et al. [70] by the same authors. In this work, the authors develop a framework that allows users to place strict constraints and soft penalties on excluding specified seeds from the result set, depending on the user's level of confidence for whether or not each node should belong to the result set. Experiments on real networks (social, co-purchase) show the robustness of the proposed method compared to others like [71], [77], and [69], in terms of F1 score and execution time. The largest dataset that these methods were tested on is an MRI network [130] with approximately 18000000 nodes.

b: RANDOM-WALK BASED TECHNIQUES
The general idea of RW-based methods is that if a given network has a community structure, a random walker should be trapped in a community for a relatively long time before leaving it. This arises from the high density of edges within communities and the sparse connections between communities. In the algorithms in this category, the paths of random walkers are repeatedly sampled around the starting node(s) until a convergent probability distribution for the visits of random walkers to the nodes is obtained. Then, this probability distribution is considered as a measure of the similarity between the starting node and its neighbors. Nodes with higher similarity values are more likely to belong to the community of the seed [131]. Most algorithms in this category can be classified into three main groups: 1) PageRank, 2) Heat-kernel and 3) Local spectral based.

c: PAGERANK BASED TECHNIQUES
Personalized PageRank (PPR) is based on a random walk with restart and is the most commonly used score for non-greedy LCD. Given a parameter α ∈ (0, 1), it is assumed that a random walk $X_0, X_1, \ldots$ starts uniformly at random from the starting set S and at each step moves from a node u to a node v with probability $\alpha A_{uv} / d_u$ and restarts from a node of S selected uniformly at random with probability 1 − α. A is the adjacency matrix of the network and $d_u$ is the degree of node u. For all t ≥ 0 and v ∈ V it holds:
\[ P(X_{t+1} = v) = \alpha \sum_{u \in V} \frac{A_{uv}}{d_u} P(X_t = u) + (1 - \alpha) \frac{\mathbb{1}_S(v)}{|S|} \]
where $\mathbb{1}_S(v)$ equals 1 if v ∈ S and 0 otherwise. There is a unique stationary distribution p for the Markov chain $(X_t)_{t \geq 0}$, which is the limiting distribution of $X_t$ when t → ∞. This distribution satisfies the linear system:
\[ p_v = \alpha \sum_{u \in V} \frac{A_{uv}}{d_u} p_u + (1 - \alpha) \frac{\mathbb{1}_S(v)}{|S|} \]
The vector p is known as the Personalized PageRank (PPR) associated with the seed set S. However, solving the above linear system is computationally inefficient [75]. Therefore, methods for approximating PPR have been proposed in the literature. In practice, experimental results show that a few iterations of the above fixed point equation are sufficient to obtain a very good ordering of the nodes [74]. In general, the methods in this category attempt to solve the conductance minimization problem locally, where the running time depends only on the volume of the output set.
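To make the fixed point iteration concrete, the following Python sketch runs a few iterations of the PPR update on an unweighted, undirected graph and then extracts a community with a sweep over the resulting node ordering, keeping the prefix of lowest conductance. It is only an illustration under stated assumptions: the default values α = 0.85 and three iterations, the ordering by raw (not degree-normalized) score, the standard conductance definition, and the names personalized_pagerank, conductance, sweep_cut and GRAPH are not taken from any of the surveyed methods.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iterations=3):
    """A few iterations of p <- alpha * P^T p + (1 - alpha) * uniform(seeds)."""
    seeds = set(seeds)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(restart)  # X_0 is uniform on the seed set
    for _ in range(iterations):
        nxt = {v: (1.0 - alpha) * restart[v] for v in adj}
        for u, neighbours in adj.items():
            if neighbours:
                share = alpha * p[u] / len(neighbours)  # alpha * A_uv / d_u
                for v in neighbours:
                    nxt[v] += share
        p = nxt
    return p


def conductance(adj, nodes):
    """Cut edges divided by the smaller of the two volumes (assumed definition)."""
    nodes = set(nodes)
    cut = sum(1 for v in nodes for u in adj[v] if u not in nodes)
    vol = sum(len(adj[v]) for v in nodes)
    rest = sum(len(neighbours) for neighbours in adj.values()) - vol
    return cut / min(vol, rest) if min(vol, rest) else 1.0


def sweep_cut(adj, scores):
    """Best-conductance prefix of the nodes ordered by decreasing score."""
    order = sorted(adj, key=lambda v: (-scores[v], v))
    prefixes = [set(order[:i]) for i in range(1, len(order))]
    return min(prefixes, key=lambda s: conductance(adj, s))


# Toy graph (an assumption): two triangles joined by the bridge 2-3.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = personalized_pagerank(GRAPH, seeds={0})
print(sweep_cut(GRAPH, scores))
```

Each iteration only pushes probability mass one hop further from the seed set, which is why a small number of iterations keeps the computation local.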
Andersen et al. [71] suggested a method called PageRank Nibble that improves the work of Spielman et al. [132]. More specifically, they suggest a local partitioning algorithm using a variation of PageRank with a specified starting distribution and show that the ordering of the nodes produced by a PageRank vector reveals a cut with small conductance.
An evaluation of different variants of the PageRank method can be found in Kloumann et al. [74] focusing on heuristics that one can control in practice. In particular, the authors show that standard PageRank performs better than degree-normalized, personalized PageRank (DN PageRank). DN PageRank is adopted by several competing PageRank-based community detection methods, including the PageRank Nibble method by Andersen et al. [71]. Their approach is evaluated using real networks with ground-truth communities (collaboration, co-purchase, social) and the authors conclude that almost all PageRank performance improvements result from only two or three iterations of the PageRank update rule. The largest dataset that this method is tested on is the youtube network [113] with 1134890 nodes. However, the highest value of recall is achieved when a large proportion of target community nodes is used as seeds, e.g., 10% in the case of the collaboration network.
A different approach is proposed by Yin et al. [14], where the Motif-based Approximate Personalized PageRank (MAPPR) algorithm is presented as an adaptation of the classical Approximate Personalized PageRank (APPR) method. The aim of this work is to find the local community to which the seed node belongs and which has the minimal motif conductance, a generalization of the conductance metric tailored to network motifs. The authors consider as a motif any small connected graph (such as a triangle). Experiments on both synthetic and real networks (collaboration, co-purchase, social) show that the proposed framework MAPPR successfully detects local communities compared to the ground truth, in terms of precision, recall and F1 score. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes. The suggested method can be applied to directed networks too.
Moreover, Hollocou et al. [75] propose MULTICOM, a method to detect multiple, possibly overlapping local communities around a given seed set. MULTICOM finds multiple local communities by iteratively finding new seed sets and determining local communities on that basis. The algorithm, whose implementation is publicly available, consists of three main steps. In the first step, the local algorithm is used to find a community for each node in the seed set. In the second step, the scoring function is used to assign a vector to each node in the network. Finally, the obtained vectors are clustered using the DBSCAN algorithm to obtain multiple communities. The algorithm is tested on both synthetic and real networks (social, co-purchase, collaboration) and compared to other approaches like [71], [133], and [13] in terms of F1 score. The largest dataset that this method is tested on is the YouTube network [113] with 1134890 nodes. However, in the suggested method the number of communities detected is highly related to the input parameters, making it less efficient than other methods like the approach of Ni et al. [62].
Similar research papers to the ones described above include Gharan et al. [72] and Zhu et al. [73].

d: HEAT-KERNEL BASED TECHNIQUES
The heat kernel is another type of graph diffusion that is useful for discovering communities near a starting node. The heat kernel method uses the Taylor series expansion of the exponential function of the transition matrix.
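As an illustration of the Taylor series formulation, the sketch below evaluates a truncated heat kernel vector h = e^{-t} * sum_k (t^k / k!) P^k s, where P is the random-walk transition matrix and s is the indicator vector of the seed. The temperature t, the number of terms, and the function name are assumptions made for this example; practical methods avoid touching every node in each step.

```python
import math


def heat_kernel_vector(adj, seed, t=3.0, n_terms=30):
    """Truncated Taylor approximation of the heat kernel diffusion from `seed`.

    adj maps each node to the set of its neighbours (undirected, no isolates).
    """
    nodes = list(adj)
    walk = {u: 0.0 for u in nodes}   # distribution of the walk after k steps
    walk[seed] = 1.0
    heat = {u: 0.0 for u in nodes}
    coeff = math.exp(-t)             # e^{-t} * t^k / k! for k = 0
    for k in range(n_terms + 1):
        for u in nodes:
            heat[u] += coeff * walk[u]
        # One random-walk step: every node splits its mass among neighbours.
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            if walk[u]:
                share = walk[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        walk = nxt
        coeff *= t / (k + 1)         # advance the coefficient to term k + 1
    return heat
```

Because the coefficients are Poisson weights, long walks are damped much faster than in PageRank, so the diffusion stays concentrated near the seed; ranking nodes by heat[u] divided by degree then yields an ordering that can be swept for a low-conductance set.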
After theoretically analyzing the heat kernel diffusion property in a previous work [133], Chung et al. [76] propose a randomized Monte Carlo method to estimate the diffusion. More specifically, they present an algorithm that takes as input a graph and a boundary condition and outputs a vector that is a good approximation to the solution of the linear SDD (Symmetric Diagonally Dominant) system with boundary conditions. The suggested method can be generalised to weighted networks too.
In addition, Kloster et al. [77] are the first to propose a deterministic method for computing a heat kernel diffusion in a graph. More precisely, the authors present a relaxation method (hk-relax) to solve a linear system for estimating the heat kernel diffusion, where the heat value of each node represents the probability of association. Thus, it constitutes a useful approach compared to similar ones like [76]. They show, after an experimental evaluation with synthetic and real networks (collaboration, social, co-purchase), that the heat kernel outperforms the personalized PageRank of [71], with higher values in the F1 score. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes. They also provide the source code of their approach.
A parallel version of heat kernel algorithms, among others, is presented by Shun et al. [78], which besides being tested on the large network of Friendster [113] with 65608366 nodes, is also tested on a very large network from Yahoo [134] with 1413511391 nodes.

e: LOCAL SPECTRAL BASED TECHNIQUES
Algorithms in this category apply spectral techniques to detect local communities. In particular, these methods require first obtaining the embedding of the nodes and then applying the vector clustering method to the embedding matrix.
Mahoney et al. [79] suggest a locally-biased analogue of the second eigenvector that can be used to compute an approximation to the best partition near an input seed set. They also provide an empirical evaluation of their method called LocalSpectral, on synthetic and real networks (social, collaboration) and show that it can be applied to finding locally-biased sparse cuts around an input node set in small social and information networks. The largest dataset that this method is tested on is a collaboration network [135] with 379 nodes.
In addition, the LEMON (Local Expansion via Minimum One Norm) algorithm is presented by Li et al. [13]. The algorithm is based on the extraction of a sparse vector y in the span of the so-called local spectral subspace of the graph around the seed set S. Its source code is also publicly available for use. The previously mentioned vector y is used as a scoring function. The LEMON algorithm is an iterative algorithm, since the nodes with the highest scores are used to expand the seed set S and find a new vector y. The iteration stops when the conductance starts to increase. Experiments with both synthetic and real networks (co-purchase, collaboration, social) demonstrate the effectiveness and efficiency of LEMON in finding communities with high accuracy, compared to local methods like [77], [136], and [74], and certain global community detection methods, considering the F1 score. The largest dataset that this method is tested on is the Orkut network [113] with 3072441 nodes. An interesting insight concerns how the size of the seed set affects the algorithm's performance. In general, as the seed set size increases, the seed expansion algorithm performs better. Nevertheless, in LEMON only a small proportion of the target community nodes is needed in order to accurately detect a local community, i.e., 2 to 3 nodes.
Moreover, a method for local overlapping community detection is proposed by He et al. [80]. The method is called LOSP and its source code is publicly available. More precisely, the authors suggest extracting the local community by searching a sparse vector from the local spectral subspaces using l1 norm optimization. Experiments with real-world networks (co-purchase, collaboration, social) show that LOSP outperforms methods like [77] and [13] in terms of F1 score. In contrast to LEMON [13], which needs more than one seed node to execute, LOSP can achieve high F1 score values even with a single seed node. The largest dataset that this method is tested on is the LiveJournal network [113] with 3997962 nodes. An extension of this work is also proposed by the same authors in He et al. [85]. However, the quality of the communities detected by the suggested method is sensitive to the position of the seed node in the community.
In addition, an LCD method based on network motifs (LCD-Motif) is proposed by Zhang et al. [86], which incorporates higher-order network information. LCD-Motif uses the local expansion of a seed set to identify the local community with minimal motif conductance, a generalization of the conductance metric for network motifs. Unlike PageRank-like diffusion methods, LCD-Motif finds the community by searching for a sparse vector in the span of the local spectra such that the seeds lie in its support. The authors evaluate their approach on both synthetic and real networks (citation, biological, social, food chain) with good and comparable results to several state-of-the-art approaches like [13], [74], [77], [136], and [14] in terms of precision, recall and F1 score. The largest dataset that this method is tested on is the LiveJournal network [113] with 3997962 nodes.
Finally, Shi et al. [88] propose a Locally-Biased Spectral Approximation (LBSA) approach to identify all latent members of a local community from very few seed members, which improves on the algorithm suggested by Li et al. [13]. In addition, this method is the first to apply the Lanczos iteration [137], a classical method for computing eigenvalues (the spectra of the matrix), to the local community detection problem. They also use the heat kernel as a sampling method instead of detecting communities directly. Experimental results with several synthetic and real networks (co-purchase, collaboration, social) show that the proposed method outperforms state-of-the-art LCD algorithms like [71], [77], [80], and [13] in terms of F1 score and Jaccard index. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes. The source code of their method is provided.
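Since most of the works above report precision, recall, F1 score, or the Jaccard index against a ground-truth community, a small sketch of these set-based measures may be useful. The function names are chosen for illustration only; individual papers may average the scores over many seeds or communities in different ways.

```python
def precision_recall_f1(detected, truth):
    """Set-based precision, recall and F1 of a detected community."""
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)  # correctly recovered members
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def jaccard_index(detected, truth):
    """Size of the intersection divided by the size of the union."""
    detected, truth = set(detected), set(truth)
    union = detected | truth
    return len(detected & truth) / len(union) if union else 1.0
```

For example, a detected set {1, 2, 3, 4} compared with a ground-truth community {3, 4, 5, 6, 7, 8} shares two nodes, which gives a precision of 0.5, a recall of 1/3, an F1 score of 0.4, and a Jaccard index of 0.25.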

B. TECHNIQUES FOR DYNAMIC/TEMPORAL NETWORKS
The methods for LCD in dynamic/temporal networks can be divided into two groups: 1) Snapshot model and 2) Temporal Smoothness. In the first group, snapshots of the evolving network are available at different points in time. At each of these points in time (timestamps), several changes (additions/deletions of nodes/edges) may occur. Because of these changes, the communities may also change. The algorithms belonging to this category focus on updating the communities in each snapshot considering the community structure of the previous snapshot. Temporal smoothing methods focus on finding local communities over time and updating them after each atomic change, i.e., addition/deletion of nodes/edges.
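The relation between the two input models can be illustrated with a short sketch that converts a sequence of snapshots into the stream of atomic edge changes on which temporal smoothness methods operate. The representation of a snapshot as a list of undirected edges and the function names are assumptions made for this example.

```python
def _normalize(edges):
    """Represent undirected edges as sorted tuples so (u, v) equals (v, u)."""
    return {tuple(sorted(e)) for e in edges}


def snapshot_delta(prev_edges, curr_edges):
    """Return (added, removed) edges between two consecutive snapshots."""
    prev, curr = _normalize(prev_edges), _normalize(curr_edges)
    return sorted(curr - prev), sorted(prev - curr)


def atomic_changes(snapshots):
    """Yield (timestamp, operation, edge) triples for a list of snapshots."""
    prev = []
    for t, edges in enumerate(snapshots):
        added, removed = snapshot_delta(prev, edges)
        for e in removed:
            yield (t, "remove", e)
        for e in added:
            yield (t, "add", e)
        prev = edges
```

A snapshot-based method would update the community once per timestamp using the whole delta, whereas a temporal smoothness method would react to each yielded change individually.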

1) SNAPSHOT MODEL
Takaffoli et al. [90] propose an incremental community mining approach called incremental L metric, in order to detect more stable communities compared to previous approaches. The main contribution of their work is to adapt the static L-metric approach [138] to compute dynamic communities. Community mining starts at each snapshot with the communities found in the previous one. The communities found in different snapshots are then matched based on their similarity (i.e., the L-metric) and grouped as instances of the communities evolving over time. The main assumption of the L-metric is that a community has fewer connections from its edge nodes to the unknown part of the graph. Although the experiments were only performed with a real network, they resulted in meaningful communities compared to other methods like [139] in terms of a modified modularity metric. The largest dataset that this method is tested on is the Enron email network [125] with 87273 nodes.
Furthermore, in Zakrzewska et al. [11] a dynamic algorithm is proposed to expand the seed set by updating the fitness score of each snapshot incrementally, focusing on efficiency. The work focuses on maintaining a community centered around the seed. Therefore, it is necessary to track the order in which the nodes were added. Thus, the proposed method ensures that the order of fitness scores remains monotonically increasing and then restarts the algorithm. The process is finished when there is no node whose addition to the community increases the fitness score. The choice of fitness score metric is left to the user. Experiments with real networks (social, email) show that the proposed method is faster than re-computation approaches and that the performance improvement is greatest when low-latency updates are required. The evaluation considers precision, recall, the ratio of the fitness scores obtained by the dynamic algorithm to those obtained by re-computation, and the ratio of the sizes of the communities output by the two methods. However, the suggested algorithm is not time-efficient. The largest dataset that this method is tested on is the DBLP network [113] with 317080 nodes.
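The static building block of such an approach, greedy seed expansion that stops when no node increases the fitness score, can be sketched as follows. This is not the incremental algorithm of [11]: the default fitness function (internal degree divided by total degree) is only one possible user choice, the function names are hypothetical, and the incremental bookkeeping across snapshots is omitted.

```python
def fitness(adj, community):
    """Fraction of the community's total degree that stays inside it."""
    k_in = k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1   # each internal edge is counted from both ends
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total if total else 0.0


def greedy_expand(adj, seeds, score=fitness):
    """Add the best-scoring neighbour until no addition improves the score."""
    community = set(seeds)
    order = list(seeds)     # remembers the order in which nodes were added
    current = score(adj, community)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best_node, best_score = None, current
        for v in sorted(frontier):
            s = score(adj, community | {v})
            if s > best_score:
                best_node, best_score = v, s
        if best_node is None:
            break           # no neighbour increases the fitness score
        community.add(best_node)
        order.append(best_node)
        current = best_score
    return community, order
```

Keeping the insertion order is what allows a dynamic variant to roll the community back to the first node affected by a change and to resume the expansion from there, rather than recomputing everything from the seed.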
An interesting approach is presented by Nathan et al. [92]. Their main goal is to tie together community detection and centrality by studying how personalized centrality metrics can be used for LCD in dynamic networks. The authors are the first to update the personalized centrality vector each time the network changes, and then determine the new local community based on the updated centrality vector. More specifically, they combine Katz and PageRank centrality values to obtain local communities of a network. Katz centrality values count the number of weighted walks in a network starting from a node, penalizing longer walks with a user-selected parameter α. PageRank, as mentioned earlier, is also a walk-based centrality metric that assigns a high score to nodes visited by a large number of random walks in the network. Once the personalized Katz or PageRank centrality scores are calculated, the local community is formed from the nodes with the highest centrality scores. The main drawback of this method is that it requires a priori knowledge about the size of the community. Experiments with synthetic and real networks (social, email) show that the proposed method results in similar high-quality communities as static re-computation approaches, but is more efficient in terms of both time and number of iterations required. The largest dataset that this method is tested on is the YouTube network [140] with 3223589 nodes.
Moreover, the work by DiTursi et al. [93] aims to find communities with stable membership over time, where members interact mainly with each other rather than with the rest of the network. To model this intuition, the authors propose the temporal conductance measure, an extension of the already well-known conductance metric. Thus, they suggest a method called PHASR to find the temporal community with the lowest conductance. Evaluations on synthetic and real networks (road, communication, www) show that the proposed algorithm scales better than existing alternatives like [71] and [90] and achieves significant runtime reduction. The largest dataset that this method is tested on is a preferential attachment synthetic network [141] with 15000 nodes. PHASR is related to [90], with the difference that PHASR discovers communities with lower conductance, since it considers the full timeline as opposed to considering only consecutive time steps.
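The personalized Katz scoring used in the centrality-based approach of Nathan et al. [92] discussed above can be illustrated on a static graph with the following sketch. It uses a truncated sum over walk lengths, which is an assumption of this example, as are the function names and the parameter values; the dynamic update of the scores after each network change is not shown. The sum converges only if α is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def personalized_katz(adj, seed, alpha=0.1, max_len=20):
    """Sum over k of alpha^k * (number of walks of length k from seed to node)."""
    scores = {u: 0.0 for u in adj}
    walks = {seed: 1.0}  # weighted walk counts for the current walk length
    for _ in range(max_len):
        nxt = {}
        for u, w in walks.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + alpha * w  # extend each walk by one edge
        for v, w in nxt.items():
            scores[v] += w
        walks = nxt
    return scores


def top_k_community(scores, seed, size):
    """Seed plus the highest-scoring other nodes, up to `size` nodes in total."""
    ranked = sorted((u for u in scores if u != seed),
                    key=lambda u: scores[u], reverse=True)
    return {seed} | set(ranked[:size - 1])
```

The `size` argument makes the drawback noted above explicit: the community is obtained by cutting the ranking at a length that has to be known or guessed in advance.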
Javadi et al. [94] suggest using the concept of leader nodes to detect dynamic communities, as they assume that communities are formed around them. They define the node with the highest degree of centrality as the leader. Experiments have shown that their method can effectively detect communities in both real (social, communication) and artificial dynamic networks. However, as the size of the dynamic networks increases, the time efficiency decreases dramatically. The largest dataset that this method is tested on is the Enron email network [125] with 87273 nodes. Leader nodes have also been used by Gao et al. [91] to detect a network's community structure with their proposed method called EvoLeaders.
In addition, the method of Guo et al. [95] first finds the starting nodes of the community using a metric called the local fitness of the nodes. Next, a static algorithm is used to obtain communities in the initial snapshot of the network. Finally, node contribution is proposed to incrementally reveal communities in non-initial snapshots of the network with a suggested method called DyCDNC. The authors perform experiments on synthetic and real networks (email, routers) to demonstrate the accuracy of their method for detecting local communities compared to methods like [142] and [143] considering modularity and NMI. The largest dataset that this method is tested on is the Enron email network [125] with 87273 nodes.
More recently, Papadopoulos et al. [96] extended the PHASR [93] algorithm to conform to the Apache Spark engine distributed processing standard. An approximation method for computing the personalised PageRank vector in the refinement step is also proposed. Performance evaluation results with synthetic and real networks (communication, web, road) have shown that the proposed approach is scalable by increasing the degree of parallelism.
Since a starting node can belong to multiple communities, in [97] Liu et al. extend their static algorithm [16] to the dynamic network to solve this problem. In fact, they are the first to propose a method for multiple LCD in dynamic networks, called HqsDMLCD. Their method achieves comparable results to the static method in real networks (co-purchase, collaboration, social), in terms of F1 score, conductance, community coverage and number of detected communities. The largest dataset that this method is tested on is the LiveJournal network [113] with 3997962 nodes.

2) TEMPORAL SMOOTHNESS
Hu et al. [98] propose a local algorithm (LDM-CET) for dynamic community detection that focuses only on the part of the network that changes at each time step, since network data usually does not change dramatically in a short time. First, a static community detection algorithm is executed, and then a personalized PageRank approach is applied. The starting nodes are selected based on a strategy that tracks the behavior of dynamic communities that construct a partial evolution graph. In general, the PageRank approach is chosen because it runs fast. However, the chosen strategy for finding seed nodes is impractical because it spends a lot of time checking whether a node is a local-minimal conductance node or not. Experimental results with synthetic and real networks (co-authorship) show that the proposed method tracks the community structure well when a network does not change dramatically, in a more efficient way than the compared methods like [139]. The largest network that the authors use to evaluate their method is a synthetic network generated by the LFR method [106] with 1000000 nodes.
An interesting approach is presented by Rigi et al. [99], where a method for detecting local communities inspired by geometric active contours is proposed. Geometric active contours are widely used in machine vision to detect objects in 2D images. They are known for their speed, autonomous and unsupervised nature, and ability to track dynamic objects. The proposed model introduces and uses the derivative-based concepts of curvature and gradient of the boundary of a connected subgraph in networks. Then, a velocity function based on curvature and gradient is proposed to determine whether the boundary of a community should evolve to include a neighbouring candidate. A framework is proposed to approximate derivatives in graphs. This framework is tested on real networks (social) and leads to local communities with high accuracy compared to [30] and [79] in terms of conductance, precision, recall and F1 score. The largest network that the authors use to evaluate their method is the Facebook graph FB-JHK of Johns Hopkins University [111] with 5180 nodes.
In addition, the L-MEGA method is suggested by Fu et al. [100], which is based on motif-based clustering. The authors use the multi-linear PageRank vector by edge filtering and motif push operation, and then apply an incremental sweep cut to obtain the local community. Experimental analysis of synthetic and real networks (rating, communication, contact) shows that the proposed method can detect high-quality communities in an efficient manner, compared to both static and dynamic state-of-the-art methods like [144] and [14], considering conductance and triangle density (i.e., the ratio of triangles in the returned local community). The largest network that the authors use to evaluate their method is the contact interaction network [145] with 10972 nodes.
Finally, based on static local seeding, Hu et al. [42] propose a dynamic local seeding algorithm to handle dynamic networks by considering only the nodes affected by network changes. More precisely, the authors suggest a technique to update the centrality values of nodes involved in network change. Experiments with synthetic networks show that the proposed method, called D-LM, is quite fast, but not when many nodes are affected by changes. The authors use the F1 score, coverage and modularity in order to evaluate the suggested method. The largest network that the authors use to evaluate their method is a synthetic one by the LFR method [106] with 1000000 nodes.

C. TECHNIQUES FOR GRAPH STREAMS
As far as the authors are aware, there are only a few papers in the literature on LCD in graph streams, since the specific problem has engaged the scientific community only in the last two years.
More specifically, Liakos et al. [101] are the first to propose a streaming graph community detection algorithm, which they call CoEuS (source code is publicly available). The algorithm aims to expand seed sets from nodes to communities under the constraints posed by the streaming model, which dictates that only a single access to the stream is possible and the working memory is limited. The proposed approach has been evaluated on synthetic and real networks (social, co-purchase, collaboration) and, considering F1 score, leads to results comparable to those of non-streaming local community detection methods like [13] and [80], which utilize the entire graph structure. The largest dataset that they experiment with is Friendster [113] with 65608366 nodes.
Moreover, Baltsou et al. [102], and its extension by Christopoulos et al. [146], describe a framework that strengthens the vicinity of the seed set (called anchors) by exploiting the fact that the seed set is of central importance for the evolving community. Informally, the anchors are considered the core of their community because they contribute in some way to its formation and definition, regardless of their topological properties within the network. The suggested multi-step framework first applies a static algorithm to discover the initial anchor's community and then, for each incoming edge change in the influence range of the anchor, updates the anchor's community. To discover the most stable anchor's community, the authors suggest using a node rewarding method. That is, for each update, the stable edges in the anchor's influence range are rewarded by a weight increase. Experiments are conducted on synthetic datasets along with three proposed rewarding methods, and the results are compared with the case where no rewarding method is used. The authors' findings indicate that all three dynamic methods with rewards outperform the dynamic method without rewards in terms of recall, precision and F1 score. The largest network that the authors use to evaluate their method is a synthetic one by the RDyn generator [147] with 5000 nodes.
Tables 4-12 contain all publications discussed in the present survey, separated by their class. Each Table contains the references of a class together with the name of the proposed algorithm (if any), some important features, and the availability of the source code.

V. TOOLS AND DATASETS USED FOR LOCAL COMMUNITY DETECTION

A. TOOLS
Since network analysis is a very important research topic in recent years, many tools have been developed to help researchers uncover important properties of networks. As can be inferred, these tools are not specifically designed for detecting local communities, but for detecting communities in general. Of course, researchers can use such tools for LCD, and here we present several that are either packages based on graphical user interfaces (GUIs) or libraries in scripting/programming languages [23], [148]. In Table 13 we present in alphabetical order the most popular tools used in network community detection, along with their type (GUI or library), the platform requirements to be installed and their licenses.

1) GRAPHICAL USER INTERFACES
As can be deduced, GUIs are easier to use than scripting/programming languages because they consist of a graphical interface that guides the user. Here we present the most commonly used ones, which are also well documented.
a: GEPHI [149] It is a free open source platform for interactive network visualization. Users can interact and manipulate structures, colors, and shapes to discover hidden properties of the graph. Communities can be detected using the Louvain algorithm [150]. The Louvain algorithm attempts to greedily optimize modularity by randomly moving nodes in multiple layers from one community to another.
b: CYTOSCAPE [151] This is an open source software platform for visualizing complex networks and integrating them with any type of attribute data.
c: GRAPHVIZ [152] It is an open source graph visualization software. It contains various graph designs for displaying networks in interactive mode.
d: SocNetV [153] It is a free open source software for social network analysis and visualization. This tool provides various metrics for graph and network cohesion, as well as numerous layout models. It also implements community detection algorithms such as triad and clique census. Finally, it offers famous datasets for social network analysis for use.
e: PAJEK [154] It is a software program for analysis and visualization of very large networks, which is free for non-commercial use. It helps to calculate various network metrics and detect community structures. Communities are found using the Louvain and Visualization Of Similarities (VOS) methods. It also includes several network layouts.
f: CFinder [155] It constitutes a free software for finding and visualizing overlapping network communities, based on the Clique Percolation Method (CPM) [156]. CFinder uses spring layouts to visualize graphs.
g: VISONE [157] It is a free tool for social network analysis and visualization. It implements the Louvain algorithm for community discovery.
h: NetMiner [158] It is not an open source tool. NetMiner can be used to analyze and visualize network data and implements several algorithms to detect community structures such as Edge betweenness, Blondel, Eigenvector, Label propagation and modularity. In addition, it supports a wide range of visualization layouts.
i: NodeXL [159] It is a tool for displaying graph data, performing network analysis, and exploring networks visually. It supports multiple social network data providers that import graph data into a spreadsheet format.

2) SCRIPTING/PROGRAMMING LANGUAGES LIBRARIES
Scripting/programming language libraries require knowledge of the particular language to be used. Therefore, they may be more difficult to learn than GUIs, but they are much more powerful and extensible.
a: networkX [160] It is a Python package for creating, editing and studying the structure, dynamics and functions of complex networks. It contains many community detection algorithms such as Louvain, k-cores, label propagation and others.
b: IGRAPH [161] It is a library for network analysis and visualization in R, Python, and C/C++. It contains several algorithms for community detection, such as Edge betweenness, Infomap, Label propagation, Louvain method and Random walk method among others.
c: Neo4j [162] It is a platform for graph data that also provides many tools for network analysis. More specifically, the Neo4j library includes several community detection algorithms such as Louvain, Label Propagation and others. It also includes many useful metrics such as centralities, similarities, etc.
d: [163] It is a library for network analysis. It contains interfaces for both Python and C++. It also provides a large number of different network datasets, including temporal ones.
e: GUNROCK [164] It is currently the most powerful CUDA graph processing library designed specifically for the GPU. Many methods, useful for community detection, are implemented and executed in optimal time, such as Louvain, PageRank Nibble and others. However, it only supports static graph datasets and not dynamic ones.
f: SENTENCETRANSFORMERS [165] It is a Python framework for state-of-the-art sentence, text and image embeddings. It can be used along with GPU for very fast implementation. Among others, it implements a local community detection algorithm that is tuned for large datasets (50k sentences in less than 5 seconds). The user can also specify the minimal size of a local community.
Among the works included in this survey, only a few provide their source code. More precisely, 20% of the works make their algorithm available online for anyone to use. Figure 4 shows the source code availability of the works presented in this survey.

B. DATASETS
There is a plethora of datasets that are used to experiment with proposed approaches. However, there are certain datasets that are very commonly used by researchers. In particular, in the LCD-related literature, we found that most papers use the same network datasets for their experiments. The main reason for this is that it allows a fair comparison with competing methods. Another reason is that there are not many well-described datasets that simultaneously share certain characteristics, e.g., ground truth communities. In Table 14 we present the most commonly used synthetic and real-world datasets in the works for LCD along with their usage percentages.
In most works, concerning LCD, researchers choose to experiment with both synthetic and real network datasets. More specifically, about 78% of the papers presented here use both synthetic and real networks, while 22% use only real networks. As can be deduced, there is no work that uses only synthetic networks.
As for synthetic datasets, about 59% use the LFR benchmark network [106]. The remaining works use one of the following: Dynamic Benchmark Network Generator [169], Girvan-Newman network generator [9], R-MAT generator [170], Stochastic Block Model [171], preferential attachment synthetic networks [141], dynamic network generator based on Markovian evolution [172], Ulam networks [173] and the RDYN graph benchmark [147].
Regarding the real-world datasets, they are of different domains, although most of them are social networks. The DBLP [113] website contains an extensive list of research papers from the field of computer science. The dataset is a collaboration network where each node denotes a researcher and each edge represents the existence of a collaborative work. Amazon [44] represents a co-purchase network in which nodes refer to products sold on the Amazon website, and the edge between two nodes indicates that they are frequently purchased together. Football [9] describes American football games between division IA colleges during the Fall 2000 regular season. The nodes represent football teams, and an edge between two nodes indicates that a game occurred between the two teams. The remaining datasets are social networks. More specifically, Karate [166] is a network in which each node represents a member of Zachary's karate club, and an edge indicates that two members are friends. Youtube [167] is another social network where each node represents a user and each edge represents a friendship between two users. LiveJournal [113] is another social network where edges indicate that two users have formed a friendship. Polbooks [103] is a network of books about US politics. Each node represents a book, and an edge between two books indicates that they are often bought together. Dolphins [168] is a social network of frequent interactions between dolphins living in New Zealand. The nodes represent dolphins and the edges indicate that the corresponding dolphins have frequent contact. Besides, Orkut [113] is also a social network where users make friends with each other. Finally, Friendster [113] is an online gaming network that started as a social networking site where users could form friendships with other users. Figure 5 shows the usage percentage of real networks used in the literature of this survey.

VI. INSIGHTS AND FUTURE DIRECTIONS
To suggest the best performing methods to the reader, it is necessary to take into consideration several of their aspects. Thus, we have to consider whether a method is tested against state-of-the-art approaches and shown to perform better in terms of specific metrics. Besides, it is a plus if a method is tested on several real datasets of different types. Furthermore, it is also important for a method to provide its source code, so that researchers can compare their methods with the existing ones.

References of LCD on static networks. Non-greedy, random-walk based techniques using heat kernel.

TABLE 9. References of LCD on static networks. Non-greedy, random-walk based techniques using local spectral.
The best performing methods that focus on node selection, in terms of accurately identifying local communities, are those that propose starting the LCD process from maximal cliques rather than a single seed node, proposed by Fanrong et al. [34] and Hamann et al. [36]. However, the method proposed by Hamann et al. [36] might be the best overall approach, since it is extensively tested on many different datasets and compared to several other state-of-the-art approaches. Furthermore, the suggested method is suitable for (un)weighted and overlapping networks. Moreover, the source code is publicly available.
Considering greedy methods that focus on a quality metric, probably the best performing method is the one proposed by Guo et al. [64]. With a view to limiting sensitivity to seed node selection and the instability of communities, the authors suggest an algorithm that uses a simple quality metric based on the Jaccard coefficient. The proposed method is publicly available, and the authors conducted many experiments with both real and synthetic networks in order to compare it with several existing methods.
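To give a rough idea of what a Jaccard-based quality metric can look like, the following Python sketch scores the nodes on the boundary of a community by the Jaccard coefficient between their closed neighbourhood and the current community. This is not the algorithm of Guo et al. [64]; the toy graph, the function names `jaccard` and `best_candidate`, and the selection rule are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_candidate(adj, community):
    """Return (node, score) for the boundary node most similar to the community.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    community: set of nodes currently in the local community.
    Hypothetical selection rule, used here for illustration only.
    """
    # Boundary: nodes outside the community adjacent to at least one member.
    boundary = set().union(*(adj[v] for v in community)) - community
    if not boundary:
        return None, 0.0
    # Compare each candidate's closed neighbourhood with the community.
    scores = {u: jaccard(adj[u] | {u}, community) for u in boundary}
    best = max(sorted(scores), key=scores.get)  # sorted() makes ties deterministic
    return best, scores[best]


# Toy graph (assumed): two triangles joined by the edge 2-3, plus a pendant node 6.
adj = {
    0: {1, 2}, 1: {0, 2, 6}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}, 6: {1},
}
print(best_candidate(adj, {0, 1}))  # (2, 0.5)
```

In this toy example, node 2 shares two of the four nodes in the union of its closed neighbourhood and the community, whereas the pendant node 6 shares only one of three, so node 2 is the preferred candidate.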
As for non-greedy methods, random-walk based methods seem to perform better than flow-based methods, especially when the seed set is small compared with the community to which it belongs. The random-walk based method proposed by Shi et al. [88] performs very well compared to other state-of-the-art methods. The authors also provide the source code of the proposed approach and conducted experiments with several real and synthetic datasets.
Concerning temporal networks, the methods proposed by DiTursi et al. [93] and Fu et al. [100] seem to perform more effectively and efficiently than other state-of-the-art approaches. Both methods are extensively compared with others on synthetic and real networks of various types, and detect high-quality communities (in terms of conductance and triangle density) in an efficient manner.
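For readers unfamiliar with the two quality measures mentioned above, the following Python sketch computes them for a node set S of a static, undirected, unweighted graph. Conductance follows the standard definition cut(S) / min(vol(S), vol(V \ S)). For triangle density we assume one common normalization, namely the number of triangles inside S divided by the number of node triples in S; the cited works may define it differently, and the toy graph and function names are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def conductance(adj, S):
    """Edges leaving S divided by the smaller of the two volumes (sum of degrees)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 0.0


def triangle_density(adj, S):
    """Assumed definition: triangles inside S divided by the number of triples in S."""
    S = set(S)
    if len(S) < 3:
        return 0.0
    triangles = sum(
        1
        for a, b, c in combinations(sorted(S), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return triangles / comb(len(S), 3)


# Toy graph (assumed): two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(conductance(adj, {0, 1, 2}), triangle_density(adj, {0, 1, 2}))
```

A low conductance together with a high triangle density, as obtained for the set {0, 1, 2} in this toy graph, indicates a set that is well separated from the rest of the graph and internally cohesive.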
Another aspect that we discuss here concerns the scalability of LCD methods. In general, one of the reasons that local approaches are preferred over global ones is that the size of the network does not play an important role in the community detection process, since the researcher is interested only in a specific region of the network, i.e., the region around the seed nodes. Thus, the time needed to detect a community is proportional to the size of the community [14], [33], [34]. However, mostly in diffusion approaches, the run-time as well as the quality of the detected local communities may depend on other characteristics of the network, such as density and community diameter [25], [39], [68], [100]. For instance, a simple method such as a short random walk may be efficient on a dataset with small-diameter communities, but may fail to reach a large part of a community in a sparser graph with a larger diameter. In contrast, a more complex approach such as a local spectral method might maintain consistent community quality across these graphs, but become orders of magnitude slower as the graph size increases.
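The effect of diameter on a short random walk can be illustrated with a small Python sketch. It propagates the probability mass of a lazy random walk (the walker stays in place with probability 1/2) for a fixed number of steps and counts how many nodes receive any mass. The two graphs, a complete graph and a path, are extreme stand-ins for a small-diameter and a large-diameter community; they, the step count, and the function names are assumptions chosen for illustration and do not correspond to any specific method surveyed here.

```python
def lazy_walk(adj, seed, steps):
    """Distribution of a lazy random walk started at `seed` after `steps` steps."""
    p = {seed: 1.0}
    for _ in range(steps):
        nxt = {}
        for u, mass in p.items():
            nbrs = adj[u]
            # Keep half of the mass at u (all of it if u has no neighbours).
            nxt[u] = nxt.get(u, 0.0) + (0.5 * mass if nbrs else mass)
            # Spread the other half uniformly over the neighbours of u.
            for v in nbrs:
                nxt[v] = nxt.get(v, 0.0) + 0.5 * mass / len(nbrs)
        p = nxt
    return p


def path_graph(n):
    """Path 0 - 1 - ... - (n-1): sparse, diameter n - 1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def complete_graph(n):
    """Complete graph on n nodes: dense, diameter 1."""
    return {i: set(range(n)) - {i} for i in range(n)}


steps = 3
reached_dense = len(lazy_walk(complete_graph(20), 0, steps))
reached_sparse = len(lazy_walk(path_graph(20), 0, steps))
print(reached_dense, reached_sparse)  # 20 4
```

With the same three steps, the walk covers all 20 nodes of the dense graph but only the 4 nodes within distance three of the seed on the path, which mirrors the dependence on diameter described above.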
The future directions of local community detection may be determined, to some extent, by current trends in the field. More specifically, as the available amount of data becomes larger and more mutable, there is a need to analyze rapidly changing large networks effectively and efficiently. Thus, researchers in the domain should probably focus on techniques designed for dynamic networks and for graph streams. Another observed trend concerns multilayer networks. Since social networks play a vital role in user data analysis, it is important to link the distributed profiles of the same user across multiple platforms. In such cases, multilayer networks are used as a modeling tool. We believe that local community detection approaches for multilayer networks will attract more and more interest from researchers.
Finally, deep learning has been gaining popularity in recent years, as it can be used to solve many problems in various fields such as speech recognition, natural language processing, and image classification. One important reason for this is the superior performance of deep learning techniques in processing large amounts of data compared to traditional machine learning methods. In problems related to community detection, graph convolutional networks are used as a semi-supervised method. More specifically, given several communities of a network as training data, the aim is to discover more communities in the same network [174]. Today, there are few works on LCD with graph convolutional networks [175], [176], but in the future such methods may attract the interest of researchers, although they should focus on unsupervised methods.
VOLUME 10, 2022

VII. CONCLUSION
Local community detection has been a very active area of research in recent years. A variety of approaches have been proposed, but not all of them are applicable to every problem. Motivated by the abundance of practical applications of local community detection, we suggest a classification of the local community detection techniques proposed in the literature, focusing on the techniques of the last decade. This classification is not absolute, as other separation criteria are possible. Our goal is to help researchers decide which approach is best for their particular problem, based on the type of network and the local community detection method they wish to use. In addition, we discuss tools that are commonly used in the detection of local communities. These tools are either GUIs or scripting/programming languages. Furthermore, we present a list of the most commonly used synthetic and real datasets in the literature so that researchers can test their approaches and compare them to others.
As can be concluded from the present survey, no LCD approach can be said to be the best or to fit all cases. Rather, researchers must choose the most appropriate method for the problem at hand, depending on the nature and characteristics of the network and the constraints that they might place on the setting.
In the future, it would be very interesting to extend the present study with an experimental comparison of local community detection algorithms. This would give any interested researcher a more thorough and complete overview of the available options for LCD problems.