Path Querying in Graph Databases: A Systematic Mapping Study

Path querying refers to the evaluation of path queries in a graph database. New research in this topic is crucial for the development of graph database systems as path queries are associated with relevant use-cases and application domains. The aim of this article is to identify and establish what is currently known about path querying in graph databases. To achieve this, we conducted a systematic mapping study (SMS) in which we explored four digital libraries and collected research papers published from 1970 to 2022. These articles were filtered, classified and analyzed to extract quantitative and qualitative information which is presented in this article. Additionally, we provide a concise description of keywords, use-cases and application domains associated with path querying in graph databases.


I. INTRODUCTION
In graph theory, the term Path Querying refers to the process of searching for specific paths within a graph or hierarchical data structure. Recall that, in its simplest form, a graph is an abstract structure consisting of nodes and edges, such that each edge connects a pair of nodes; and a path is a sequence of nodes and edges connecting a pair of nodes.
In graph data management, Path Querying refers to the evaluation of path queries in a graph database, i.e. a type of database in which the schema and the data are represented as a graph or a generalization of graphs [1]. A path query searches for paths that satisfy given criteria, involving constraints over labels (types), properties, or combinations thereof [2].
The research around path querying is crucial for the development of graph database systems [3] because path queries are associated with use-cases in various application domains. For instance, in social networks, the shortest paths between two users are used to model search ranking functions [4], to analyze the characteristics of the communities [5], to find influential users [6], or to study the behavior of information spreading [7]. In banking and financial services, path querying is used for fraud detection [8], [9]. In bioinformatics, path queries are used for alignment of proteins [10], for analysis of cellular networks [11], and to extract meaningful pathways from protein interaction networks [12].
The associate editor coordinating the review of this manuscript and approving it for publication was Vlad Diaconita.
In line with the above, the community of graph data management has emphasized the significance of path querying [3], and existing graph query languages are adding new features associated with paths. In particular, SQL/PGQ [13] (the fragment of the standard ISO SQL:2023 that allows querying graphs on top of relational databases) and GQL [14] (the proposal of standard query language for graph databases) include features such as returning paths (instead of tables of solutions), multiple path semantics (to avoid infinite solutions), and grouping of path query solutions.
Although there are several works that review the advancements in querying graph databases (see Table 1), we could not find a specific study about path querying in graph databases.
This article presents a systematic mapping study (SMS) about path querying, with a focus limited to graph databases. In this sense, we identify, categorize, and analyze existing literature relevant to the research topic. Additionally, we present a comprehensive organization and description of key terms and concepts related to path querying (e.g. ''path pattern'' or ''path discovery''). Our goal was to provide a clear definition for each term, with the intention of providing future researchers with easily accessible information on important concepts associated with path querying in graph databases.
The rest of the paper is organized as follows: the related work is presented in Section II; the research method is presented in Section III; the preparation phase is presented in Section III-A; the gathering phase is presented in Section III-B; the analysis phase is presented in Section IV; our review of research topics is presented in Section V; finally, the conclusions are presented in Section VI.

II. RELATED WORK
The review of the related work that is presented in this section was divided into three groups: graph databases, graph queries, and graph query languages. Each group includes articles that provide surveys, reviews, revisions, comparisons (similar to a systematic mapping study), or analysis based on the topic. The articles are presented in chronological order.

A. GRAPH DATABASES
In 2008, Angles and Gutierrez [1] presented a systematic review of graph database models. Their research discusses various data models and query languages that form the foundation of modern graph database systems. While this survey mentions significant articles on path querying, it does not offer an analysis of these articles.
In 2012, Angles [15] presented a comprehensive comparison of different graph database models. This research encompasses an examination of data models, query languages, and their overall characteristics. It also entails a general analysis of fundamental path queries that were supported by the available query languages at that particular time.
In 2015, Wang et al. [16] published a review of graph database systems and their applications. The study provides general information regarding query languages and query processing techniques used by the analyzed systems.
In 2021, Rabuzin and Šestak [17] presented a comprehensive overview of graph database systems, discussing their history, current state, and potential developments. The article includes an analysis of trends in this field and offers recommendations for future improvements to such systems.
In 2021, Timón-Reina et al. [18] conducted a literature review to investigate the development, efficiency, and utilization of graph database systems. The authors aim to apply graph-oriented technologies in the field of bio-medicine.

B. GRAPH QUERIES
In 2011, Barcelo et al. [19] analyzed the foundations of querying graph patterns. The authors identify the key features of patterns (i.e. a sub-graph with variables and regular expressions). They investigated standard graph queries and provided exact descriptions of data and combined complexity for various pattern classes.
In 2013, Bagan et al. [20] investigated the evaluation of regular path queries, and studied additional restrictions. Furthermore, they proposed a classification scheme for regular languages according to their complexity.
In 2018, Bonifati and Dumbrava [21] conducted a study on various types of graph queries. They analyzed query languages that offer a well-balanced combination of theoretical principles and practical applications. The article also provides a review of diverse query evaluation techniques, encompassing the concepts of ''Query approximation'' and ''Query learning''.
In 2021, Wang et al. [22] published a survey about typical graph queries with attributes. They developed a taxonomy based on the inputs and outputs of these queries. Their research encompasses a collection of various query types organized by domain, and for each query, they provide an examination of its meaning and the algorithms used to resolve it.

C. GRAPH QUERY LANGUAGES
In 2012, Wood [23] presented a brief review of graph query languages and discussed the core functionalities provided by these languages, including subgraph matching, finding nodes connected by paths, comparing and returning paths, aggregation, node creation, and approximate matching and ranking. The author also reviews theoretical results related to expressive power and complexity of evaluating graph queries.
In 2017, Angles et al. [24] presented a review of graph data models and practical graph query languages. The primary focus of this article was on graph pattern queries and navigational queries (also known as path queries), along with a detailed discussion on various semantics and complexity results. Additionally, the article provided several examples to highlight the syntactic variations between different query languages commonly used in this field.
In 2018, Angles et al. [2] defined the concept ''graph query language'' as the way to retrieve or extract data which have been modeled as a graph and whose structure is defined by a graph data model. This article presents a summary of the main types of regular path queries, along with their associated challenges and key applications.
In 2021, Sharma et al. [25] published the results of a systematic literature review on graph query languages. The purpose of this article is to review the main features, current methods, and the types of graph patterns supported by the existing query languages.
After reviewing the literature, we have found multiple articles that discuss graph databases, graph queries and graph query languages, including systematic reviews and surveys. However, we did not find any research specifically focused on path querying. Therefore, conducting a systematic mapping study on path querying in graph databases would be a valuable starting point for a comprehensive review and detailed analysis of this field. Additionally, the findings presented in this article highlight unexplored features and could inspire the creation of new query languages.

III. RESEARCH METHOD
The development of this article is based on the notion of Systematic Mapping Study (SMS), which is a research method that involves an extensive review of primary studies in a specific topic area with the objective of identifying the available evidence on the topic [26]. The process entails conducting a literature search to determine which topics have been previously addressed and where the research has been published [27]. The result of a systematic mapping study is a collection of papers related to the topic area, organized according to a classification. Therefore, a mapping study offers an overview of the range of the subject area and enables the identification of research gaps and trends [27].
Based on the proposal made by Petersen et al. [28], our systematic mapping study follows a process comprising three phases:
• Preparation phase, where several parameters and criteria to guide the systematic mapping are defined. In essence, research objectives, research questions, bibliographical sources, search terms, and selection and relevance criteria are defined in this phase.
• Gathering phase, which is oriented towards gathering relevant research papers according to the parameters and criteria defined in the preparation phase. In essence, we search, gather, and select candidate works. Then, we classify the works and select the relevant ones based on their classification. Finally, we gather new search terms and repeat this phase with the new terms.
• Analysis phase, which involves extracting information and knowledge from the research papers obtained during the gathering phase. The research objectives and research questions defined in the preparation phase guide this phase. The result of this phase is a report that includes both quantitative and qualitative analyses.
In the following sections, we present the results of applying the phases described above.

A. PREPARATION PHASE
In this section, we will outline the research objectives, research questions, bibliographical sources, search terms, general selection criteria, and relevance criteria.

1) RESEARCH OBJECTIVES
The main goal of the systematic mapping described in this article is to gather, categorize, and analyze the majority of research studies that are concerned with ''path querying'' in the field of graph data management. The specific aims of this project are:
• Provide statistical information about the research papers related to path querying in graph databases;
• Identify topics and research lines in the area of path querying, and raise awareness about their level of development;
• Identify application domains and use cases where path querying is applied.

2) DEFINITION OF RESEARCH QUESTIONS
The systematic mapping presented in this article has been designed to address the following research questions:
RQ1. What terms associated with ''path querying'' in graph databases are the most influential?
This question aims to identify the most important terms associated with the notion of ''path querying'' in graph databases. Such terms will be collected from the articles found in the bibliography, and those with the highest frequency of use will be considered influential.
RQ2. What kinds of research papers have been developed in the context of ''path querying'' in graph databases?
This question will deliver information about the types of research papers found in the literature. We consider two classifications: based on depth (define, implement, optimize, learn, or compare) and based on scope (use case, practical, survey or theoretical).
RQ3. How has the research related to ''path querying'' in graph databases evolved?
This question aims to trace the evolution of the associated terms over time. Specifically, we would like to know the periods of time with more and less activity for each associated term. It is also useful to rank the associated terms according to their level of activity.
RQ4. Where are the research papers related to ''path querying'' in graph databases published?
This question aims to find out the scientific events (conferences and workshops) and journals in which research on path querying in graph databases has been published. It is helpful to know where to publish new research related to both topics.
RQ5. What query languages allow ''path querying''?
This question aims to discover which query languages support operations for path querying. We are interested in specific aspects such as language syntax, language semantics, expressiveness, and computational complexity.
RQ6. What are the application domains for ''path querying''?
We would like to gather information about the application domains and the specific use-cases related to path querying in graph databases. We want to understand the challenges that have been addressed and resolved by using path queries and graph databases.

4) SEARCH TERMS
In order to define the search terms, we conducted an initial exploration of the bibliographic sources by using the keyword ''Path''. As shown in Table 2, this search returned a high number of articles, in particular for Springer. After

5) GENERAL SELECTION CRITERIA
A research paper is considered a ''candidate paper'' if it satisfies the following conditions:
• Publication year: Later than 1970;
• Language: English;
• Article type: Journal, conference or workshop;
• Knowledge areas: Computer Science, Data Management, Graph Databases.

6) RELEVANCE CRITERIA
We considered four levels of relevance:
• Highly Relevant: A research paper that proposes, develops or compares fundamental elements related to path querying in graph databases, including data models, query languages, algorithms, methods or strategies.
• Moderately Relevant: A research paper that compares models and methods related to path querying in graph databases, or describes their application in specific application domains.
• Somewhat Relevant: A research paper that is not related to path querying in graph databases; however, the models and methods presented in the paper can be a source of inspiration for the area of path querying.
• Not Relevant: A research paper that corresponds to the search terms, but is not related to the field of data management.
A paper will be considered a ''Relevant Work'' if it fits into one of the top two levels defined above.

B. GATHERING PHASE
This section presents the results of the gathering phase of our systematic mapping. It includes the search of research papers, the selection of candidate works, the classification of papers, the selection of relevant papers, and the review of search terms.

1) SEARCH AND GATHERING OF WORKS
The search of papers depends on the search engine provided by each bibliographic source. Next we present the search string used in each bibliographic source.

2) SELECTION OF CANDIDATE WORKS
The search of the bibliographic sources with the selected keywords yielded a total of 9193 works. To ensure accurate search results, we checked that each paper included the appropriate search term in its title, abstract, or list of keywords. We then screened these articles and discarded those that did not have a direct relationship with ''Path Querying'' and ''Graph Databases'', such as articles related to mathematics (path algorithms), operations management (path optimization), and internet networks (shortest paths), among others.
In Table 3, we show the number of candidate research papers discovered in each bibliographic source. The sources returned 305 papers in total, of which 67 are shared among the sources, leaving 238 unique papers. While Scopus provided the largest number of papers, some of them are also accessible in the other bibliographic sources. Furthermore, the list of research papers was supplemented with two papers suggested by the authors of this paper.
Additionally, in Table 4 we show the number of works for each search term and bibliographic source. The term ''Path query'' has the highest number of results, particularly in Scopus. Conversely, the term ''Path manipulation'' did not yield any results.

3) CLASSIFICATION OF CANDIDATE WORKS
In order to provide a general comparison of the candidate papers, we defined the following three classifications:
• Keyword-based classification: Each candidate paper was related to one or more search terms listed in Section III-A4.
• Work type classification:
- Theoretical: An article that presents a formal development of the concepts, theories and methods related to a research topic.
- Practical: An article that describes a specific model or method which can be applied to an abstract or real problem.
- Use case: An article that shows the use of a model or method in a specific application domain.
- Survey: An article that compares the models and methods related to a research topic.
- Comparison: An article that compares operations, languages and methods associated with path querying in graph databases.
These classifications will be used later (Section IV) in order to address the research questions.

4) SELECTION OF RELEVANT WORKS
In this step, each candidate paper was reviewed in order to determine its relevance according to the criteria defined in Section III-A6. Figure 1 shows that, for a total of 238 articles, 64 (27%) were considered Highly Relevant, 29 (12%) as Moderately Relevant, 69 (29%) as Somewhat Relevant, and 76 (32%) as Not Relevant. Based on the inclusion criteria defined in Section III-A6, 93 (39%) papers were selected for the next step (Highly Relevant and Moderately Relevant), while 145 (61%) were excluded (Somewhat Relevant and Not Relevant).
Note that the papers selected in this step represent less than half of the papers chosen in the previous stage. This indicates that the search terms are broad and apply to multiple knowledge areas, resulting in a significant number of unrelated papers during the search process.

5) REVIEW OF SEARCH TERMS
The relevant papers were reviewed in order to find new search terms. To do this, for each article, we read the ''Abstract'', ''Keywords'' and ''Introduction'' sections.
This step resulted in six additional search terms: ''Finding paths'', ''Navigational Query Language'', ''Path properties'', ''Path query evaluation'', ''Path sequence'' and ''Path traversal''. According to our methodology, the gathering phase was repeated by using the above search terms. After the search step we obtained 1759 works, of which 18 were selected as candidates, and 4 were selected as relevant.
The second examination of search terms yielded no additional search terms; therefore, the total number of papers acquired during the Gathering Phase was 97. This total includes 93 articles obtained in round 1, plus 4 articles obtained in round 2.

IV. ANALYSIS PHASE
In this section, we will address the research questions that were proposed during the Preparation phase. The answers provided here are derived from the 97 research papers that were selected during the gathering phase.

A. WHAT TERMS ASSOCIATED WITH PATH QUERYING IN GRAPH DATABASES ARE THE MOST INFLUENTIAL? (RQ1)
The terms associated with path querying in graph databases, which were obtained during the Gathering phase, are listed in Table 5. It is evident that the term ''Path Query'' has the highest number of selected papers, while the term ''Path manipulation'' is not mentioned in the literature. It should be noted that the number of selected papers is less than half of the total number of papers found. It is worth recalling that some papers were excluded during the selection phase as they were focused on different research areas such as database optimization, semi-structured databases, and specific application domains.

B. WHAT KINDS OF RESEARCH PAPERS HAVE BEEN DEVELOPED IN THE CONTEXT OF ''PATH QUERYING'' IN GRAPH DATABASES? (RQ2)
In order to answer this question, we will use the ''Paper Type'' and ''Paper Focus'' classifications used during the gathering phase. In the first case, a research paper can be classified as Theoretical, Practical, Use-case or Survey. In the second case, we have five classes: Definition, Implementation, Optimization, Learning and Comparison. The number of papers in each classification is shown in Figure 2. The specific references for both categories are compiled in Table 7 and Table 8.
The papers that are labeled as ''Theoretical'' propose various methods for representing, storing, and querying paths. These methods also include analysis of their expressiveness and computational complexity. On the other hand, the papers categorized as ''Practical'' describe the implementation of methods and algorithms used for storing and querying paths. These papers also provide empirical evaluations of their effectiveness. The class ''Use-case'' includes papers describing the use of paths in different application domains, including workflow provenance [33], road networks [34], [35] and communication networks [36]. In [22], the single paper classified as ''Survey'', the authors propose a taxonomy of attributed graph queries, including an analysis of semantics and algorithmic motivations behind such queries. In some cases, a work can be classified in more than one category. For example, Bai et al. [37] present theoretical and practical results related to the optimization of path queries.
Regarding the classification based on the focus of the paper, most of the research is centered around implementing methods for path querying, followed by optimization techniques.The use of machine learning techniques to enhance the evaluation of path queries is a relatively new research area.

C. HOW HAS THE RESEARCH RELATED TO ''PATH QUERYING'' IN GRAPH DATABASES EVOLVED? (RQ3)
In this case, we analyzed the popularity of the research terms (associated with the relevant papers) in different periods of time, as shown in Figure 3. Starting in 1970, we can see a significant occurrence of terms like ''path expression'' and ''path algebra'', with a quiet period at the end of the eighties. The term ''path query'' began to gain popularity during the nineties, remained relevant between 2010 and 2015, and is still popular to this day. The term ''path expression'' also gained popularity in the nineties due to the research on semi-structured databases and the development of XML technologies.
From 2010 to 2020, there was a significant emphasis on ''path query languages''. Specifically, researchers concentrated on studying regular path queries (RPQs), which serve as the fundamental method for expressing path queries [38]. The term ''path discovery'' also gained popularity during this period due to its association with the utilization of machine learning techniques in identifying network connections.
In recent years, the three most popular terms are ''path querying'', ''path discovery'' and ''path pattern''. The last term has gained interest because it poses a challenge for future graph query languages.

D. WHERE ARE THE RESEARCH PAPERS RELATED TO ''PATH QUERYING'' IN GRAPH DATABASES PUBLISHED? (RQ4)
The venues and journals where the research papers related to path querying in graph databases have been published are shown in Figure 4 (the name associated with each acronym is given in Appendix 9). We discovered papers in significant conferences such as the International Conference on Data Engineering (ICDE), the Symposium on Principles of Database Systems (PODS), and the International Conference on Management of Data (SIGMOD).
Figure 5 shows the distribution of the research papers according to the type of publication venue. We found that most of the papers were published in conferences, followed by journals, and a small fraction in workshops. This indicates that the field is dynamic and evolving, as conferences are more suitable for presenting new ideas and receiving feedback.

E. WHAT GRAPH QUERY LANGUAGES ALLOW ''PATH QUERYING''? (RQ5)
The notion of path querying has been considered in several query languages, and the specific features depend on the data management approach and the corresponding data model. In particular, we can identify graph query languages like G-CORE [39] and Cypher [40], Semantic Web oriented query languages like SPARQL [41], and Web oriented query languages like GraphQL [42]. The specific path-oriented features supported by theoretical and practical graph query languages can be reviewed in papers like [24] and [43].
The functionalities for querying paths and their subsequent manipulation are supported by only two graph query languages: G-CORE and Cypher. G-CORE defines a data model where paths are first-class citizens (i.e. paths can be created and stored), and its query language supports a powerful and tractable set of regular path queries, but the functions for path creation are very simple. Cypher [40] supports some types of path queries (e.g. shortest path, fixed and variable length path), and the resulting paths are represented as a sequence of nodes and edges, in such a way that paths can be manipulated by using list-oriented functions (e.g. create, combine and divide).
A Social Network is represented as a graph where the nodes represent people and the edges represent different types of relationships (e.g. family and friendship). Path operations are used to determine relevant people (e.g. influencers or bridges), relevant groups (clustering), possible connections, and distance between people.
In a graph representing a road network, each node represents an intersection (e.g. a city), and each edge (u, v) represents a road segment that enables traversal from u to v. A common use-case that involves path operations is to find the optimal route between places, considering distance and cost efficiency.

V. REVIEW OF RESEARCH TOPICS
Based on the search terms used in the previous phases, we defined a collection of research topics grouped in four categories (see the relevant papers obtained during the gathering phase). The aim of this section is to offer easily accessible information regarding important concepts associated with path querying in graph databases.

A. GENERAL TOPIC

1) PATH PROBLEM
The term ''path problem'' can be seen as a unified framework for problems found in different fields [60]. In graph theory, the most studied path problem is the shortest path problem [61]. In the context of data management, a typical path problem pertains to effective resource management during the computation of the shortest path. This involves minimizing the amount of the path stored in memory, facilitating reduced memory usage and the utilization of dynamic paths [62]. Additionally, path problems can also be associated with a range of enumeration and optimization problems that focus on generating or comparing paths in graphs [63].

2) SHORTEST PATH PROBLEM
Consider a graph in which each edge is associated with a weight or cost. Given a pair of nodes (s, t), the shortest path from s to t is a path such that no other path from s to t has a lower total weight [64]. This is a highly researched problem in graph theory, and two well-known algorithms for solving it are the Bellman-Ford algorithm and Dijkstra's algorithm [61].
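As an illustration of the definition above, the following is a minimal Python sketch of Dijkstra's algorithm. It is not taken from any of the surveyed papers; the adjacency-list encoding, the function name `dijkstra_shortest_path` and the toy graph `roads` are assumptions made for this example, and edge weights are assumed to be non-negative.

```python
import heapq

def dijkstra_shortest_path(graph, source, target):
    """Return (cost, path) for a cheapest path, or (inf, []) if none exists.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    Weights are assumed to be non-negative.
    """
    dist = {source: 0}      # best known cost from source to each node
    prev = {}               # predecessor on the best known path
    heap = [(0, source)]    # priority queue ordered by tentative cost
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue        # stale queue entry
        settled.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Rebuild the path by walking predecessors back to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Hypothetical weighted graph used only for this example.
roads = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 6)],
    "b": [("t", 2)],
    "t": [],
}
cost, path = dijkstra_shortest_path(roads, "s", "t")
print(cost, path)
```

In this toy graph the direct options s-b-t (cost 7) and s-a-t (cost 8) are both more expensive than the three-edge path s-a-b-t, which shows that the shortest path by weight is not necessarily the one with the fewest edges.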

3) PATH RETRIEVAL
Given a graph G and a query Q that searches for a path P between two nodes, the term ''path retrieval'' refers to the process of finding a set of paths that satisfy the query Q on G [65]. Path retrieval enables users to determine if a path exists between a pair of nodes, and if so, it returns the paths that connect those nodes [66]. The term path retrieval is used in various application domains. For example, in transportation networks, path retrieval refers to the algorithms used to retrieve optimal paths [67].

4) PATH EVALUATION
Path evaluation is defined as the evaluation of a path with the purpose of obtaining its cost, which can be its length (the number of edges between its endpoints) or some other property. Some of the properties that can be assessed for a path include normalization, monotonicity, relocation, homogeneity, and additivity [68].
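The two kinds of cost mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a path as a list of nodes, the helper names `path_length` and `path_cost`, and the `distance` property are assumptions made for this example and do not come from [68].

```python
def path_length(path):
    """Length of a path given as a list of nodes: its number of edges."""
    return max(len(path) - 1, 0)

def path_cost(path, edge_property):
    """Sum a numeric edge property along the path.

    `edge_property` maps each edge (u, v) to a number, e.g. a distance.
    """
    return sum(edge_property[(u, v)] for u, v in zip(path, path[1:]))

# Hypothetical edge property used only for this example.
distance = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4}
p = ["a", "b", "c", "d"]
print(path_length(p), path_cost(p, distance))
```

Under this encoding, the cost of a path built by joining two paths that share an endpoint is the sum of the two costs, since each edge contributes exactly once.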

5) PATH FINDING
This term is used to describe the process of finding a path between any two nodes in a graph. The result depends on the algebra used and can be either a single path or a set of paths. To explore this concept, Manger [63] investigated various algebras for finding paths in graphs. Depending on the selected algebra, it can return all possible paths, only simple paths (without repeated nodes), or only elementary paths (without repeated edge labels).
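To make the notion of a set of paths concrete, the sketch below enumerates all paths without repeated nodes between two nodes using an iterative depth-first search. It is a plain graph traversal, not the algebraic formulation studied in [63]; the function name `simple_paths` and the toy graph are assumptions made for this example.

```python
def simple_paths(graph, source, target):
    """Yield every path from source to target that repeats no node.

    `graph` maps each node to a list of successor nodes.
    """
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:          # enforce the no-repeated-node rule
                stack.append((nxt, path + [nxt]))

# Hypothetical directed graph with a cycle between "a" and "b".
g = {"a": ["b", "c"], "b": ["c", "a"], "c": ["d"], "d": []}
found = sorted(simple_paths(g, "a", "d"))
print(found)
```

Without the membership test on `path`, the cycle between "a" and "b" would make the set of all possible paths infinite, which is why restricted path sets are often preferred in practice.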

6) PATH PLANNING
This concept is usually used in the context of navigation devices, location-based services (GPS), and autonomous vehicles [34]. Seen as a graph, the nodes represent points of interest, and the edges represent the roads that connect those points. This graph has properties, such as time or distance, and the objective is to obtain the optimal path by minimizing one of these properties [35]. In traffic path planning for vehicle navigation, selecting the path with minimum travel time is usually the main objective [35]. The most commonly used algorithms are Dijkstra's algorithm and the A-star algorithm [35].
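The following Python sketch illustrates A-star on a small road graph. It is a generic textbook formulation, not the implementation used in [35]; the coordinates, the road lengths and the function name `a_star` are assumptions made for this example. The heuristic is the straight-line distance to the goal, which never overestimates the remaining cost as long as every road is at least as long as the straight line between its endpoints.

```python
import heapq
import math

def a_star(coords, roads, start, goal):
    """Return (distance, path) for a shortest route, or (inf, []) if none.

    coords: node -> (x, y) position, used only by the heuristic
    roads:  node -> list of (neighbor, road_length) pairs
    """
    def h(n):
        # Straight-line distance to the goal (admissible under the
        # assumption stated above).
        return math.dist(coords[n], coords[goal])

    best = {start: 0.0}                           # best known cost so far
    frontier = [(h(start), 0.0, start, [start])]  # (f = g + h, g, node, path)
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best.get(node, math.inf):
            continue                              # stale entry
        for nxt, length in roads.get(node, []):
            ng = g + length
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return math.inf, []

# Hypothetical points of interest and road segments.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
roads = {
    "A": [("B", 1.0), ("C", 2.0)],
    "B": [("D", 1.5), ("C", 1.0)],
    "C": [("D", 1.0)],
    "D": [],
}
dist, route = a_star(coords, roads, "A", "D")
print(dist, route)
```

Replacing `h` with a function that always returns 0 turns this sketch into Dijkstra's algorithm; the heuristic only changes the order in which nodes are explored, not the cost of the route returned.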

7) PATH EXPRESSION QUERY PROCESSING
It refers to the application of processing techniques for computing path queries. The available methods include forward traversal, for path expressions that involve a selection operation at the start of the traversal, and reverse traversal, for path expressions that involve access to the end of the path traversal [69].

8) PATH QUERY EVALUATION
It concerns the methods and techniques used to evaluate path queries. There are two approaches. The first is called direct evaluation, where the path is traversed at query time by exploiting algorithms like A* or depth-first search [70]. The second is based on pre-processing the graph in order to obtain reachability information that makes the evaluation more efficient, at the cost of requiring more disk space [71].
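The trade-off between the two approaches can be sketched for the simplest path query, reachability. The code below is a minimal illustration under assumptions of our own (an in-memory adjacency list and a full reachability index); it does not reproduce the techniques of [70] or [71], and the function names are hypothetical.

```python
def reachable_direct(graph, source, target):
    """Direct evaluation: depth-first search performed at query time."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def build_reachability_index(graph):
    """Pre-processing: store, for every node, the set of nodes it reaches.

    Each node is considered to reach itself (path of length zero).
    """
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    index = {}
    for n in nodes:
        stack, seen = [n], {n}
        while stack:
            for nxt in graph.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        index[n] = seen
    return index

# Hypothetical graph used only for this example.
g = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
index = build_reachability_index(g)
print(reachable_direct(g, "a", "c"), "c" in index["a"])
```

With the index, each query becomes a set lookup, but the index can hold up to one entry per pair of nodes, which is the extra storage cost mentioned above.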

9) PATH DISCOVERY
This term is equivalent to ''path querying''. It is defined as the search for one or more paths between a pair of nodes [72]. According to Hong et al. [73], it is one of the most important search queries for large dynamic graphs representing complex networks, such as social networks, website networks, and telephone networks, among others. The algorithms most commonly used to solve it are related to the search for the shortest path.

10) SHORTEST PATH DISCOVERY
This is a generalization of the shortest path problem (see above). Given a directed edge-weighted graph, this term concerns the task of finding the shortest path for fixed source and target nodes when the edge weights are initially unknown but can be queried. Querying the cost of an edge is expensive, and hence the goal is to minimize the total number of edge cost queries executed [72].
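To illustrate the setting, the sketch below runs a Dijkstra-style search that asks for the cost of an edge only at the moment it needs it, and counts how many cost queries were issued. This is a naive baseline written for illustration, not the method proposed in [72]; the oracle `query_cost`, the function name and the toy data are assumptions, and edge costs are assumed to be non-negative.

```python
import heapq
import math

def discover_shortest_path(adjacency, query_cost, source, target):
    """Return (cost, path, number_of_cost_queries).

    adjacency:  node -> list of successors (the structure is known up front)
    query_cost: hypothetical oracle called as query_cost(u, v); assumed expensive
    """
    queries = 0
    dist, prev = {source: 0}, {}
    heap, settled = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            break                 # edges leaving unsettled nodes are never queried
        for v in adjacency.get(u, []):
            if v in settled:
                continue          # a settled node cannot improve: skip the query
            queries += 1
            nd = d + query_cost(u, v)
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return math.inf, [], queries
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1], queries

# Hidden costs, visible to the algorithm only through the oracle.
hidden = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 5, ("b", "t"): 1}
adjacency = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
cost, path, queries = discover_shortest_path(
    adjacency, lambda u, v: hidden[(u, v)], "s", "t")
print(cost, path, queries)
```

In this example the edge (b, t) is never queried, because the target is settled before node b is expanded; the graph has four edges but only three cost queries are needed.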

B. PATH REPRESENTATION
1) PATH EXPRESSION
In the most general sense, a path expression is a way to specify a path from a source element to a target element in a database. For example, in the relational model, a path expression specifies a path from a source relation to a target relation rather than using some type of join operation [46]. In semi-structured query languages, a path expression represents a sequence of steps from a source node (e.g. the root node of the data tree) [74] to a target node in the data tree. In the context of graph query languages, a path expression represents a path from a source node to a target node in a data graph [75]. A path expression is also called a regular path expression.

2) PATH PATTERN
A path pattern is a path extended with conditions on its nodes, edges and properties. Such conditions are commonly represented using regular expressions. There are several semantics for evaluating path patterns, including arbitrary path semantics, shortest path semantics, no-repeated-node semantics, and no-repeated-edge semantics [24]. For instance, an evaluation under no-repeated-node semantics returns the set of all paths (also called simple paths) where each node appears at most once [52], [76].
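The difference between no-repeated-node and no-repeated-edge semantics can be made concrete with a small enumeration. This is a minimal sketch, not a method from the cited works: the function `enumerate_paths`, the edge-triple layout, and the toy graph are hypothetical, and the regular-expression conditions of a path pattern are omitted to keep the focus on the semantics.

```python
def enumerate_paths(edges, source, target, semantics="simple"):
    """Enumerate paths from source to target as lists of edge identifiers.

    edges: list of (edge_id, src, dst) triples (hypothetical layout).
    semantics: "simple" forbids repeated nodes,
               "trail" forbids repeated edges.
    """
    out = {}
    for eid, s, d in edges:
        out.setdefault(s, []).append((eid, d))
    results = []

    def extend(node, path, seen_nodes, seen_edges):
        if node == target:
            results.append(list(path))
        for eid, nxt in out.get(node, []):
            if semantics == "simple" and nxt in seen_nodes:
                continue
            if semantics == "trail" and eid in seen_edges:
                continue
            path.append(eid)
            extend(nxt, path, seen_nodes | {nxt}, seen_edges | {eid})
            path.pop()

    extend(source, [], {source}, set())
    return results

# Toy graph with a cycle between b and c: a -> b, b -> c, c -> b, b -> d
EDGES = [("e1", "a", "b"), ("e2", "b", "c"), ("e3", "c", "b"), ("e4", "b", "d")]
```

On this toy graph, the no-repeated-node evaluation from a to d returns only the path e1, e4, while the no-repeated-edge evaluation also returns e1, e2, e3, e4, which visits b twice. Under arbitrary path semantics the cycle could be repeated any number of times, so the set of matching paths would be infinite.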

3) PATH SEQUENCE
Given a graph $G$ and a distinguished source node $s$, we use $P(s, v)$ to denote the existence of a regular expression $P$ which allows to obtain all paths from $s$ to any other node $v$ in $G$. A path sequence is a sequence of expressions $(P_1, v_1, w_1), (P_2, v_2, w_2), \ldots, (P_l, v_l, w_l)$ such that $P_i$ is an unambiguous path expression of type $(v_i, w_i)$. A path sequence can be used to solve the single-source path expression problem for any source [77].

4) PATH TRAVERSAL
It is defined as the route between pairs of nodes in a graph, obtained by traversing the corresponding edges. In another context, it is recognized as one of the strengths of object-oriented query processing, given that it allows information to be retrieved through concepts related to pointer navigation [69].

5) PROPERTY PATH
This term was introduced in SPARQL 1.1, the second version of the standard query language for RDF databases. A property path is a triple pattern (i.e. a tuple of subject, predicate and object) where the predicate is a regular expression, representing an arbitrary length path [78]. A property path allows different types of navigational queries to be expressed [79], [80]. In [81], Losemann and Martens study the complexity of regular expressions and property paths in SPARQL.

6) REGULAR PATH PATTERN
Defined as a path pattern expressed through the use of regular expressions. A regular path pattern allows various repetitive patterns and restrictions to be expressed in the same query [52].

7) SHORTEST PATH
Given a graph G, the shortest path between a pair of nodes x and y is the path with the smallest number of edges, where the number of edges is the length of the path. The term ''Short path'' is used with the same meaning [82].

C. PATH QUERYING
1) PATH QUERY
It is a fundamental graph query paradigm, which aims to test for the existence of a path between pairs of nodes, and also to return such paths as query results [83]. Path queries consist of finding all nodes that can be reached by traversing a path whose labels form a word in a regular expression over an alphabet of labels [84]. Hellings [85] states that a path query q is specified by the use of a language L which contains all the traces of the path of interest.

2) GENERAL PATH EXPRESSION QUERY
A general path expression query is a type of query that is commonly used in object-oriented database management systems (OODBMS) [86], and consists of more than two classes of node objects connected through a relation. This kind of query can be built from multiple two-class path expressions. According to Taniar [69], forward and reverse traversals can be applied to a general path expression query, and an important point to improve is the optimization of this type of query, allowing an indeterminate number of classes to be connected.

3) NAVIGATIONAL QUERY
Navigational queries correspond to property path queries that use recursive operators such as Kleene star (*) or Kleene plus (+) [87]. These queries are well known in the context of graph databases [88], [89]. One of the most basic ways to include navigation in graph queries is to start with a basic graph pattern and augment it with navigational primitives like regular expressions [90]. Navigational queries are one of the key differences between graph and relational databases. Relational databases are not designed to handle this type of query, whereas graph databases are specifically designed to handle them efficiently. Additionally, there has been extensive research on query languages that can express complex navigational patterns [91].

4) PATH PATTERN QUERY
A path pattern is a directed labeled path with conditions which represent regular expressions. The result of a path pattern query is the set of all simple paths (i.e. paths that traverse each node only once [20]) satisfying the given conditions. Also known as path pattern matching, this notion is very important in the management and mining of data with graph structures. Today, many emergent algorithms depend directly or indirectly on the effective computation of paths between two nodes. There are three problems to be solved for the resolution of path pattern queries: how to express clearly and concisely a path query with complex constraints; how to deal with large scale data; and how to make full use of the centric computing platforms [37], [52], [76].

5) PROPERTY PATH QUERY
It is related to searching for paths in RDF graphs. This is achieved through the introduction of property path operators (see Property Path). These queries tend to return numerous results [88].

6) REACHABILITY QUERY
Given a pair of nodes (a, b), where a is the start node and b is the end node, a reachability query looks for the existence of a path from a to b. If such a path exists, then we can say that b is reachable from a. A reachability query is considered a fundamental query in graphs [92], [93], [94].

7) REGULAR PATH QUERY
A regular path query (RPQ) returns a set of pairs of nodes connected in the database through a path conforming to a regular expression [95]. Formally, an RPQ is defined as an expression $x \xrightarrow{L} y$, where $L$ is a regular language over the labels of a labeled graph, and $x$, $y$ are the start and end nodes, respectively. The result of an RPQ is a path that complies with the expression $L$ [96]. RPQs are fundamental for expressing path queries in graph query languages [97].
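One common way to evaluate an RPQ is to search the product of the graph and an automaton for the regular language. The following is a minimal sketch under stated assumptions: the automaton is given directly as a transition table instead of being compiled from a regular expression, and the names `eval_rpq`, `EDGES`, and `NFA` as well as the toy data are hypothetical.

```python
from collections import deque

def eval_rpq(edges, nfa, start_state, accepting):
    """Return all pairs (x, y) joined by a path whose labels the NFA accepts.

    edges: list of (src, label, dst) triples.
    nfa:   dict mapping (state, label) to a set of successor states.
    """
    adj = {}
    nodes = set()
    for s, label, d in edges:
        adj.setdefault(s, []).append((label, d))
        nodes.update((s, d))
    answers = set()
    for x in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(x, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((x, node))
            for label, nxt in adj.get(node, []):
                for nstate in nfa.get((state, label), ()):
                    if (nxt, nstate) not in seen:
                        seen.add((nxt, nstate))
                        queue.append((nxt, nstate))
    return answers

# Toy labeled graph.
EDGES = [
    ("ann", "knows", "bob"),
    ("bob", "knows", "carl"),
    ("carl", "worksAt", "acme"),
]
# Hand-built automaton for the expression knows+ worksAt.
NFA = {
    (0, "knows"): {1},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}
```

Here `eval_rpq(EDGES, NFA, 0, {2})` returns the pairs (ann, acme) and (bob, acme). Because each (node, state) pair is visited at most once per start node, the search terminates even when the graph contains cycles.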

8) REGULAR SIMPLE PATH
A regular simple path query (RSPQ) is similar to a regular path query (RPQ), but it does not allow going through the same node multiple times, i.e. an RSPQ passes at most once through each node. The use of this type of query is desirable in scenarios such as transportation networks, metabolic networks, DNA matching and wireless networks [20].

9) CONTEXT-FREE PATH QUERIES
Context-free path queries (CFPQs) are defined as an extension of regular path queries (RPQs), where context-free grammars can be used as constraints in navigational graph queries. CFPQs are very powerful and more expressive than traditional RPQs, but their computation in large graphs requires high processing resources [98]. CFPQs are used in model verification, bioinformatics and parser construction [85].
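To illustrate why CFPQs are more expressive than RPQs, the sketch below evaluates a query whose labels must form a word of the shape a^n b^n, a language that no regular expression can describe. It is a simple worklist procedure in the spirit of relational CFPQ evaluation, written for clarity and not for efficiency; the grammar encoding, the function `eval_cfpq`, and the toy graph are assumptions for this example. The grammar is assumed to be in a binary normal form (rules of the form A -> terminal or A -> B C).

```python
def eval_cfpq(edges, terminal_rules, binary_rules, start):
    """Return all pairs (x, y) joined by a path derivable from `start`.

    edges:          list of (src, label, dst) triples.
    terminal_rules: dict label -> list of nonterminals A with A -> label.
    binary_rules:   dict (B, C) -> list of nonterminals A with A -> B C.
    """
    facts = set()  # (nonterminal, x, y): some x-to-y path derives from it
    for x, label, y in edges:
        for nt in terminal_rules.get(label, []):
            facts.add((nt, x, y))
    worklist = list(facts)
    while worklist:
        n, x, y = worklist.pop()
        new = []
        for m, u, v in list(facts):
            if u == y:  # n-path x->y followed by m-path y->v
                for head in binary_rules.get((n, m), []):
                    new.append((head, x, v))
            if v == x:  # m-path u->x followed by n-path x->y
                for head in binary_rules.get((m, n), []):
                    new.append((head, u, y))
        for fact in new:
            if fact not in facts:
                facts.add(fact)
                worklist.append(fact)
    return {(x, y) for n, x, y in facts if n == start}

# Grammar for S -> a S b | a b, rewritten in binary normal form:
#   S -> A B,  S -> A S1,  S1 -> S B,  A -> a,  B -> b
TERMINALS = {"a": ["A"], "b": ["B"]}
BINARY = {("A", "B"): ["S"], ("A", "S1"): ["S"], ("S", "B"): ["S1"]}
# Toy graph: 1 -a-> 2 -a-> 3 -b-> 4 -b-> 5
EDGES = [(1, "a", 2), (2, "a", 3), (3, "b", 4), (4, "b", 5)]
```

On the toy chain, `eval_cfpq(EDGES, TERMINALS, BINARY, "S")` returns the pairs (2, 4) and (1, 5), which correspond to the balanced words ab and aabb. The loop rescans all known facts for every new fact, which reflects the high processing cost mentioned above.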

10) CONJUNCTIVE REGULAR PATH QUERIES
A conjunctive regular path query (CRPQ) expresses the existence of patterns in a graph database that satisfy a series of regular restrictions [91]. CRPQs are not expressive enough for many natural query tasks, and need to be extended to increase their expressive power [99]. CRPQs cannot output paths and they cannot express relations between paths [75].

11) SHORTEST PATH QUERY
Defined as path queries that are resolved using shortest path semantics, that is, by returning paths of minimal length that satisfy a constraint imposed by the query. Current graph query languages allow returning a single shortest path or all shortest paths [24]. There are several algorithms for computing shortest path queries; one of the most used is Dijkstra's algorithm, as it performs well for a variety of cases [100].
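The distinction between returning a single shortest path and returning all shortest paths can be sketched with a breadth-first search that records every predecessor lying on a minimum-length route. This is a minimal sketch for unweighted graphs without parallel edges, where length is the number of edges; the function name `all_shortest_paths` and the toy graph are hypothetical, and query constraints on labels are left out.

```python
from collections import deque

def all_shortest_paths(graph, source, target):
    """Return every minimum-length path from source to target as a node list.

    graph: dict mapping a node to a list of successor nodes.
    """
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                # First time v is reached: this is its shortest distance.
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another predecessor on an equally short route.
                preds[v].append(u)
    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [p + [node] for pred in preds[node] for p in build(pred)]

    return build(target)

# Toy graph with two equally short routes from s to t and one longer route.
G = {"s": ["a", "b", "c"], "a": ["t"], "b": ["t"], "c": ["d"], "d": ["t"], "t": []}
```

A language feature that returns a single shortest path would correspond to picking any one element of the result, for example the first.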

12) PATH QUERY LANGUAGE
It is a language that allows the expression of path queries. Typically, a path query is achieved through the use of regular expressions [101], [102]. A path query language has various properties, such as compositionality, which means that the output of a path query can be used as the input in another query [36].

13) NAVIGATIONAL QUERY LANGUAGE
It is a language used to navigate graph structures, i.e. it includes operations to move from one node to an adjacent one in a graph, recursively when necessary. A navigational query language is usually based on binary relational algebras (like XPath) or on regular expressions (like RPQs) [80]. Some navigational query languages have a common disadvantage: they are very suitable for expressing relevant properties of the underlying topology of a graph database, but not for expressing how the topology interacts with the data. Two languages (walk logic and regular expressions with memory) have been proposed to overcome this problem [89].

14) PATH ALGEBRA
A path algebra concerns a set of operators and rules which allow the manipulation of paths [103], [104]. Manger [63] defines a path algebra equipped with two binary operations, join and product. Such algebra can be used to find paths in a graph, specifically, to identify one path between any pair of nodes.

15) PATH MANIPULATION
The term ''path manipulation'' is not defined in the literature. In the area of data management, the term data manipulation is related not only to the action of querying a database, but also to modify and update data in a database [105]. Hence, we define ''path manipulation'' as the operations to query, add (insert), delete and modify (update) paths in a database.

16) PATH OPERATION
Informally, a path operation can be defined as an action that can be applied over a path or collection of paths. Examples of path operations are adding a path to a graph, deleting a path from a graph and joining two paths [106]. Current graph database systems are studying and implementing basic graph operations. For example, Cypher (the query language of Neo4j [40]) provides functions to create a path from a set of elements, combine two paths, create sub-paths from a path, and return the elements of a path.
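The kinds of operations listed above (combining two paths, extracting a sub-path, returning the elements of a path) can be sketched with a small value type. The class `Path` and its methods `join`, `subpath`, and `length` are hypothetical names chosen for this illustration; they do not reproduce the API of Cypher or of any other system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    """A path as a node sequence plus the labels of the connecting edges."""
    nodes: tuple   # (n0, n1, ..., nk)
    labels: tuple  # (l1, ..., lk); labels[i] is the edge nodes[i] -> nodes[i+1]

    def __post_init__(self):
        if len(self.nodes) != len(self.labels) + 1:
            raise ValueError("a path with k edges must have k + 1 nodes")

    def length(self):
        """Number of edges in the path."""
        return len(self.labels)

    def join(self, other):
        """Concatenate two paths that meet at a common node."""
        if self.nodes[-1] != other.nodes[0]:
            raise ValueError("the first path must end where the second starts")
        return Path(self.nodes + other.nodes[1:], self.labels + other.labels)

    def subpath(self, start, end):
        """Return the sub-path made of the edges at positions start..end-1."""
        if not 0 <= start <= end <= self.length():
            raise ValueError("sub-path bounds out of range")
        return Path(self.nodes[start:end + 1], self.labels[start:end])

p = Path(("a", "b"), ("knows",))
q = Path(("b", "c", "d"), ("knows", "worksAt"))
r = p.join(q)
```

Here `r` is the path a, b, c, d with three edges, `r.nodes` and `r.labels` return its elements, and `r.subpath(1, 3)` recovers `q`. Making the type immutable means that each operation produces a new path, which fits the compositional use of paths as query inputs and outputs.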

D. APPLICATION DOMAINS AND USE CASES
1) COMMUNICATION NETWORK
A communication network is comprised of a group of hosts (machines or devices) that provide services to the users, as well as a subnetwork that connects the hosts, enabling communication between them (users and hosts). Networks of this type typically exhibit characteristics of being extensive, ever-changing, and intricate. A communication network can be modeled as a graph in which machines are depicted as nodes, and each pair of connected machines is represented by an edge [36], [49].

2) FINDING PATH
The concept relates to finding the best path between two nodes. Usually, it relies on reliable path-finding algorithms. Path finding is used in multiple areas, such as GPS navigation, auto routing systems, artificial intelligence, and computer games, among others. One of the most commonly used algorithms to solve path finding is Dijkstra's algorithm and its variants [47], [55].

3) ROAD NETWORK
Defined as a network of roads that can be modeled as a graph, where the nodes represent a set of road intersections and the edges represent a set of road segments between two neighboring junctions. More specifically, a taxi road network includes routes executed by different vehicles over different hours, where the time and the start and end points (nodes) of a taxi ride are saved. This information is later used to calculate optimal routes for taxi drivers [55]. Some related use cases are:
• Route query: It is defined as a path query usually used on maps, where the goal is to obtain a path to follow from a starting point to a destination point [100]. For example, in traffic networks a route query consists of finding a path between two locations. In social networks, a route query implies the search for the closest relationships (e.g. friendship) between two persons [107]. As another example, in the context of a geographical network, where the nodes represent places in a city and the edges represent the roads connecting the places, a common use-case is the search for paths allowing an efficient use of resources [108]. Several graph algorithms are used for route searching, including divide & conquer, breadth-first search, and depth-first search, among others.
• Taxi trajectory: It means using a graph to represent streets, intersections, and locations in a city, and using paths to represent taxi trips [55].

4) SEMANTIC NETWORK
A semantic network is a graph structure used to represent knowledge by connecting nodes and arcs in specific patterns [109]. The nodes in a semantic network, referred to as units, represent an idea, event, situation, or any other object that typically has a complex structure. On the other hand, the directed links (edges) represent the relationships between the units [110]. In order to design semantic networks effectively, it is crucial to maintain a balance. This is because if the number of node types or link types increases, although the network becomes more expressive, it becomes increasingly challenging to implement inference engines on this type of graph [44].

5) SOCIAL NETWORK
A social network can be defined as a finite set of actors and relationships that link the actors [52], [53]. A social network can be seen as an attributed graph where the users are the nodes, their profiles are attributes of nodes, and the relationships are represented by the edges. Numerous queries can be performed on a social network, such as searching for a connection between two individuals or analyzing interactions among people [22]. A related use case is:
• Social network analysis (SNA): Imagine a social network graph, where nodes symbolize actors and edges symbolize the relationships between these actors. Social network analysis focuses on solving intricate graph queries, such as determining the most influential actors and discovering the central actors within the network. These specific scenarios are addressed by utilizing graph measures like Page-Rank, Hubs, Centrality, Density, and others [111], [112], [113]. One important study related to SNA involves the examination of a network's structure. The objective of this study is to develop a better understanding of how the structure forms and which mechanisms influence the diffusion of information, contagion, community identification, and cohesive subgroup formation [114].

6) SOFTWARE LIFE CYCLE ACTIVITIES
A software life cycle can be represented by a directed graph where the nodes are activities and the edges represent implications between activities. The advantage of representing a software life cycle as a graph is the possibility of using path expressions to represent specific cycles of software development [115].

7) ROUTING
This is a combinatorial problem in which we seek the best possible path from one point to another. The best path could be determined by various factors, such as length, time, and others. For example, given a road network containing cities and roads, a common routing query is to find the optimal path between the cities [116]. Another use case for routing is the search for an optimal path for information routing on a communication network. Routing problems are usually solved by using shortest path algorithms [117]. Some related use cases are:
• Route searching: In general, this concept is related to problems that require searching all the routes between two or more nodes in a graph. Route searching is a use-case in geographical information systems navigation, computer games, and virtual reality, among others [47].
• Traffic Routing: It pertains to a set of routing problems in which the objective is to discover the most efficient route between two points [116]. For example, given a graph where the nodes represent geographical locations, paths are useful to study congestion and security problems [56]. Another example is a graph of broadcast information between computers, where the computers or network devices are the nodes, and the connections between them are the edges. In this context, a use-case concerns the distribution of the traffic throughout the network by using the best possible path [117].

VI. CONCLUSION
This study was carried out to identify and establish the current state of knowledge on path querying in graph databases. To conduct the study using a standardized framework, we performed a systematic mapping study (SMS) in which we reviewed four digital libraries and collected research papers published from 1970 to 2022. After the gathering phase, we obtained a total of 97 relevant papers among the 17 selected research keywords. The results provide both quantitative and qualitative information on path querying in graph databases. Our quantitative analysis shows that most of the research focuses on path query, especially from 2000 to 2022. However, there is little research on specific areas like path manipulation, where we found no related work. Another interesting finding concerns the year when the first related paper was published, which was around the 1970s. The topic gained popularity at the end of the 1990s, when the three most popular terms were ''path querying'', ''path discovery'' and ''path pattern''. The last term has gained interest because it poses a challenge for future graph query languages.
On the other hand, our qualitative analysis reveals that most of the research is about implementing methods for path querying, with optimization techniques as the second most common topic. We also found papers in significant conferences such as ICDE, PODS and SIGMOD. Another interesting point is related to query languages that use paths. In particular, we identified some graph query languages like G-CORE [39] and Cypher [40], Semantic Web oriented query languages like SPARQL [41], and Web oriented query languages like GraphQL [42].
For future research on the topic of graph querying, we suggest that investigation could focus on areas such as path manipulation, path indexing, path query optimization, machine learning for path querying, and others. This suggestion is based on the ongoing work on the standardization of graph query languages, which is now a global standards project carried out by the GQL standards committee [118].
Finally, we trust that this article can serve as a central source for bibliographic citations, providing quick and easy access to research papers related to path querying. Moreover, it can encourage further research on path manipulation, which includes the development of concepts such as path algebra, path operations, path insertion, path deletion, path updating, path construction, and more.

TABLE 2. Research papers found in each bibliographic source, using only the ''Path'' keyword.

quickly reading the returned articles, which included the title, abstract, and keywords list, we defined the following list of initial search terms: Path querying, Path manipulation, Path query, Path query language, Path operation, Path expression, Path algebra, Path pattern, Path evaluation, Path discovery, Path finding, and Path retrieval. Additionally, we discarded research papers related to the following terms: Computer path (files), Management operations, Network analysis, Optimization, Path constraint, Path database, Path data model, Path language, Path logic, Path syntax, and Printers.

FIGURE 2. Research papers classified by type and focus.

FIGURE 3. Popularity of search terms by year of publication.

FIGURE 4. Journals and conferences where articles related to path querying in graph databases have been published.

FIGURE 5. Research papers distribution by venue.

TABLE 1. Comparison with related works.

TABLE 3. Candidate research papers found in each bibliographic source. A total of 238 unique papers were found (where 67 articles are shared among the sources). The column ''Repeated'' indicates the number of results shared with other digital libraries.

TABLE 4. Number of research papers organized by search term and bibliographic source.

TABLE 5. Terms related to path querying in graph databases. The table also indicates the number of found papers and the number of selected papers for each term.

Table 6): General terms, Path representation, Path querying and Application domain (with use cases). For each research topic, we presented a short description based on

FIGURE 6. Number in each application domain.

TABLE 6. topics (grouped in four categories) associated with path querying and path manipulation.

TABLE 7. Classification of research papers based on their type.

TABLE 8. Classification of research papers based on their orientation.