Cohesive Ridesharing Group Queries in Geo-Social Networks

Ridesharing has gained much attention as a solution for mitigating societal, environmental, and economic problems. For example, commuters can reduce traffic jams by sharing their rides with others. Despite its many advantages, the proliferation of ridesharing also raises some crucial issues. One of them is ridesharing with strangers, which can make riders feel uncomfortable or distrustful. Another complication is the high latency of ridesharing group search, because users want to receive the results of their requests in a short time. Despite continuous efforts by academia and industry, these issues remain. In this paper, to resolve these obstacles, we define a new problem, the <inline-formula> <tex-math notation="LaTeX">$\ell $ </tex-math></inline-formula><italic>-cohesive</italic> <inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula><italic>-ridesharing group</italic> (<inline-formula> <tex-math notation="LaTeX">$\ell m$ </tex-math></inline-formula><italic>-CRG</italic>) query, which retrieves a cohesive ridesharing group by considering spatial, social, and temporal information. The problem is based on three underlying assumptions: people tend to rideshare with socially connected friends, people are willing to walk but not too far, and optimizing the search for good groups is essential for both drivers and passengers. In our ridesharing framework, queries are processed by efficiently taking geo-social network data into account. For this purpose, we propose an efficient method for processing the queries using a new concept, the <italic>exact</italic> <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula><italic>-friend set</italic>, together with an efficient update scheme. Moreover, we further improve our method by utilizing an <italic>inverted timetable</italic> (<italic>ITT</italic>), which captures crucial time information. Specifically, we devise <italic>time-constrained and incremental personalized-proximity search</italic> (<italic>TIPS</italic>). Finally, the performance of the proposed method is evaluated by extensive experiments on several data sets.


I. INTRODUCTION
With the development of GPS-enabled smart devices, ridesharing has attracted attention as a solution for resolving crucial societal, environmental, and economic problems [1]-[9]. For example, drivers can alleviate traffic congestion by sharing the empty seats of their vehicles, and they can use ridesharing services to conveniently find ridesharing groups. Uber, a pioneer in the ridesharing industry, provides the service through various riding options such as Express Pool [10]. Other examples are DiDi [11], BlaBlaCar [12], and Lyft [13]. These services have brought benefits in various areas. According to a recent study [14], the potential reduction in traffic jams is as high as 31%. The decrease in carbon dioxide (CO2) emissions is another major advantage. In addition, ridesharing can reduce the costs of fuel and parking.
The associate editor coordinating the review of this manuscript and approving it for publication was Gang Mei.
Although many people have already taken advantage of ridesharing services, significant challenges remain. One of the problems is the discomfort of traveling with strangers, which is aggravated by untrustworthy riders. This is especially important for vulnerable or young people, tourists in an unfamiliar place, and travelers on a long journey. Unfortunately, commercial services still struggle to address this problem. In addition, most existing studies of ridesharing have been weak in forming a cohesive group of a driver and passengers, because they have mainly focused on other factors such as the locations of users, the driving fare, and the optimized route.
The use of social networks, which have been investigated in academia and industry for a decade, is a reasonable approach for considering the cohesion of ridesharing members [14]-[21]. A graph consisting of vertices and edges is effective in expressing the closeness between users; in many studies, an edge between two users represents their friendship. An increase in the minimum number of edges (hops) between two users intuitively means a weaker friendship. Specifically, subgraph models such as the clique, club, and plex have been studied to capture the cohesion of social groups [22]-[24].
However, research on cohesion-related ridesharing is in its infancy. For example, user profiles in social networks were used for matching drivers and passengers [15], but this is insufficient for organizing cohesive groups. Although the study in [17] utilized social network data to consider the harmony of ridesharing members, it either assumed that riders arrive at a departure point in time or filtered out, before processing queries, the people who cannot arrive in time. Furthermore, that study only focused on direct friendship (1-hop friends). In other words, its social model could be somewhat restrictive for the passengers and the driver. Another study [18] also considered social network data but checked the social aspect with its own ranking function rather than a subgraph model. Moreover, that study dealt with the problem of finding the top-k taxis based on trip requests in road networks.
In this paper, we propose a new problem, the $\ell$-Cohesive $m$-Ridesharing Group ($\ell m$-CRG) query, which retrieves a cohesive ridesharing group based on the available timeslots, locations, and friendship of the group members, and the number of empty seats. As the ridesharing model for our study, we adopt slugging [25], which is a simple but potent ridesharing model. One of the slugging rules is that drivers never pick up new passengers after departure and en route to the destination. The simplicity of this model makes it effective in reducing traffic congestion, and this form of ridesharing has therefore become more popular. Another considerable benefit of slugging, beyond cost saving, is access to the High-Occupancy Vehicle (HOV) lane [26].
Remark 1: Slugging was originally developed for commuters, not entrepreneurs. Nowadays, however, some companies (e.g., Uber and BlaBlaCar) adopt slugging to provide various options to users. Thus, other service providers could also set their own policy on driving fares regardless of our method.
Figure 1(a) depicts an example of slugging in a ridesharing service. For effective ridesharing, passengers enroll their requests in the ridesharing service system and then wait for the ridesharing request of a driver. When a driver wants to take on some passengers, the driver announces the itinerary to the potential passengers by enrolling information about the trip. If their itineraries are the same or similar, they board the vehicle together at the departure point of the driver. After arriving at the driver's destination, the passengers alight and go to their respective destinations. Figure 1(b) shows each step of Express Pool, which is Uber's latest ridesharing (slugging) model [10].
Each of the following scenarios explains the need for a social boundary:
Scenario 1 (Commute With Acquaintances): In the morning or evening, commuters usually rideshare with colleagues or with someone who travels in a similar area or direction. This helps to mitigate traffic congestion during rush hour by decreasing the number of vehicles on the road. Moreover, ridesharing gives them a chance to make friends. Sometimes, however, people want to ride only with acquaintances or with somebody with whom they have already shared a ride.
Scenario 2 (Road Trip on Trust): Nowadays, a road trip is one of the widespread ways to travel by automobile. To save money on a road trip, many people use ridesharing services if the directions of their trips are the same or similar. However, some people have concerns over safety when ridesharing. Although many ridesharing services are keen to solve the safety problem, sufficient measures have not yet been taken.
The contributions of this paper are as follows:
• We define a new type of ridesharing query, the $\ell m$-CRG query, which finds a ridesharing group with the aim of resolving trust- and comfort-related problems. The query considers the number of empty seats, the departure and arrival points, the social relationships of the ridesharing members, and their available timeslots, so as to support ridesharing services efficiently.
• We propose a new concept, the exact $n$-friend set, made up of only the vertices connected by a minimum of $n$ edges. We additionally design an update method for the exact $n$-friend set using the $n$-social counting filter, and an $\ell m$-CRG query processing method.
• We develop an enhanced query processing method, time-constrained and incremental personalized-proximity search (TIPS), which consists of cell-time pruning and bidirectional cell-time traversing and uses the inverted timetable (ITT) containing the time information of riders.
• We conduct experiments on various data sets to evaluate the performance of our methods. Further, by varying the query parameters, we demonstrate the performance of the proposed method for supporting ridesharing services.
VOLUME 8, 2020
The remainder of this paper is structured as follows: Section II introduces the preliminaries of our study, including a ridesharing system, a problem definition, and a basic solution. Then, a novel concept and a new approach for processing $\ell m$-CRG queries are presented in Section III. For optimization, we propose a further improved method in Section IV. Section V discusses the performance of our methods, and Section VI reviews related work. Finally, Section VII presents the conclusions and possible future work.

II. PRELIMINARIES
We introduce the ridesharing system architecture in Section II-A and the problem definition in Section II-B. Additionally, we discuss a naïve approach to processing the $\ell m$-CRG query in Section II-C. Table 1 summarizes the notations commonly used throughout this paper.

A. RIDESHARING SERVICE SYSTEM
In our ridesharing system (Figure 2), service users choose their roles by specifying and sending ridesharing requests. The storage and index of the ridesharing system are updated by an update module based on the requests of passengers. Each passenger request contains the available (preferred) timeslot and itinerary. The search space of the query processor is formed by all passengers' requests. Meanwhile, $\ell m$-CRG queries are created according to the requests of drivers. Each driver's request includes departure and arrival times, an acceptable social boundary, an itinerary, and the number of available seats. After receiving the requests, the ridesharing system creates and processes a query for optimally matching a driver and passengers. The result is returned to the driver and passengers after the query is processed. Concretely, the answer consists of a driver, the passengers, an itinerary, and departure and arrival times. In this paper, we focus on organizing a ridesharing group owing to its importance in ridesharing services. Note that the search space is constantly changed by the requests of passengers, and its size is usually city- or country-scale.

B. PROBLEM STATEMENT
In this work, we define the $\ell m$-CRG query on the basis of three underlying assumptions: people tend to rideshare with socially connected friends, people are willing to walk but not too far, and optimizing the search for good groups is essential for both drivers and passengers.
Given an undirected social graph $G = \{V, E\}$, a vertex $v$ ($u$) $\in V$ represents a passenger (a driver), i.e., a data object, and an edge $e(v, v') \in E$ denotes the friendship between two data objects $v$ and $v'$. Intuitively, a smaller number of edges between two vertices indicates a closer friendship. Each $v$ has its own itinerary, where $p_v^d$ and $p_v^a$ represent the departure and arrival points, and its possible timeslot $t_v = [t_v^d : t_v^a]$, where $t_v^d$ and $t_v^a$ denote the earliest possible time of departure and the latest possible time of arrival (in short, departure and arrival times). For simplicity, we set the granularity of a timeslot to one hour in this paper, but it can always be changed. For instance, $[1 : 2]$ unambiguously indicates a timeslot from 01:00 to 02:59.
To determine group cohesion, we define a new notion, the $\ell$-cohesive group, which is similar to but slightly more relaxed than the club [22], one of the popular subgroup models in the social network analysis literature.
Definition 1 ($\ell$-Cohesive Group): Given an undirected social graph $G = \{V, E\}$, an $\ell$-cohesive group is a subgraph in which the minimum social distance $d_s(v, v')$ between every pair of members $v$ and $v'$ is less than or equal to an acceptable social boundary $\ell$, an integer with $\ell > 0$.
However, not every $\ell$-cohesive group can be a candidate answer to the $\ell m$-CRG query, because a car has a limited number of seats. In other words, only groups of $m$ members can be the result of a ridesharing request if the number of seats is $m$.
Definition 2 ($\ell$-Cohesive $m$-Group): Given an undirected social graph $G = \{V, E\}$ and the number of seats $m$, an $\ell$-cohesive $m$-group $G_m^{(\ell)}$ is an $\ell$-cohesive group that consists of $m$ members.
Example 1: In the social network of Figure 3, there are several other groups, but we mainly describe the groups including a driver $u$ because $u$ must be a member of a result group. Group (a) is composed of the (direct or 1-hop) friends $v_1$, $v_3$, and $u$, and the maximum $d_s(v, v')$ of groups (b) and (c) is 2 ($\le \ell$). Thus, the three groups can be candidates for the final answer. However, group (d) cannot be a result group because $v_2$ is connected to $v_4$ through three edges. Even though group (e) has a proper $d_s(v, v')$, $u$ is excluded from this group. The driver $u$ has to be included in the final group because only $u$ can drive the car. Furthermore, $v_5$ can never be a member of a candidate group because $v_5$ has no acceptable social relationship with $u$ in this social network.
Not only the driver and the passengers but also the passengers of an $\ell$-cohesive $m$-group themselves are linked together within $\ell$ edges. One may think of extreme cases based on the six degrees of separation [27], e.g., the case in which the number of edges between every pair of members is exactly $\ell$. However, such cases are rare thanks to the property of the group structure. In other words, members of a result group naturally share some popular friends [28], [29]. Furthermore, the uncommon case could also be acceptable, especially to someone who, for privacy reasons, wants to travel with slightly trustworthy strangers. The important thing is that each member must be linked to the others through friends or friends of a friend. This is why we choose this subgraph model for grouping the driver and passengers in our ridesharing system. Note that we can also form groups consisting only of people connected by exactly $\ell$ edges by simply using the exact $n$-friend set in Section III-A, although we do not explain this in this paper.
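The cohesion test behind Definitions 1 and 2 can be sketched in Python as a bounded breadth-first search. This is a minimal illustration rather than the authors' implementation: the function names (`build_adj`, `social_distance`, `is_cohesive_m_group`) and the toy graph are hypothetical, and the sketch assumes that the social distance is measured in the whole graph $G$ and that a valid group has exactly $m$ members including the driver.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def social_distance(adj, src, dst, limit):
    """Return d_s(src, dst) if it is at most `limit`, else None (bounded BFS)."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == limit:
            continue  # do not expand beyond the social boundary
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return None


def is_cohesive_m_group(adj, members, ell, m, driver):
    """Assumed reading: m members, driver included, all pairs within ell hops."""
    if len(members) != m or driver not in members:
        return False
    return all(
        social_distance(adj, a, b, ell) is not None
        for a, b in combinations(sorted(members), 2)
    )


# Hypothetical toy graph for illustration (not the graph of Figure 3).
TOY = build_adj(
    [("u", "v1"), ("u", "v3"), ("v1", "v3"), ("v1", "v2"), ("v3", "v4")]
)
```

With $\ell = 2$ and $m = 3$, the group {u, v1, v3} passes in this toy graph, whereas {u, v2, v4} fails because v2 and v4 are three edges apart, and a group without the driver is rejected outright.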
Moreover, the available times of drivers and passengers do not always match. Specifically, passengers enroll their time information in the ridesharing service system before a driver's request arrives. After receiving the ridesharing request, the ridesharing system searches for the result of the request by checking whether the available (usually departure and arrival) time of the driver is included in those of the passengers.
Throughout the paper, we also use the notation $t_u \subseteq t_v$ to denote that $t_v^d \le t_u^d$ and $t_u^a \le t_v^a$ hold at once. Note that the available timeslot does not mean fixed departure and arrival times, so group members can make their own plans by considering the travel. The arrival times of drivers are generally decided by the car navigation system they use.
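The temporal check described above reduces to an interval-containment test. The following Python sketch is only illustrative: the function name `satisfies_temporal` and the tuple representation `(departure, arrival)` of a timeslot are assumptions, and the passenger timeslots in the usage note are made-up values rather than those of Figure 4.

```python
def satisfies_temporal(t_u, t_v):
    """Return True if the driver's timeslot t_u lies inside the passenger's t_v.

    Both timeslots are (departure_hour, arrival_hour) tuples, i.e., the check
    is t_v^d <= t_u^d and t_u^a <= t_v^a.
    """
    dep_u, arr_u = t_u
    dep_v, arr_v = t_v
    return dep_v <= dep_u and arr_u <= arr_v


# Driver timeslot [11 : 13], as in the running example of the text.
T_U = (11, 13)
```

For instance, a hypothetical passenger with timeslot (10, 14) passes the check, while one who can depart only at 12 or who must arrive by 12 does not.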
Example 2: Figure 4 describes the available timeslots of five passengers and a driver $u$. Their social network is shown in Figure 3. Passengers $v_1$, $v_3$, and $v_4$ can ride in the car together because their timeslots fully cover the timeslot $t_u = [11 : 13]$ of $u$. However, the possible departure time of $v_2$ is later than that of $u$, so it is difficult for $v_2$ to join the final group.
Though $v_5$ can arrive at the departure point before 11, his/her arrival time is before that of the driver, i.e., $t_{v_5}^a < t_u^a$. Furthermore, s/he has an insufficient friendship with the others.
In our ridesharing model, passengers may need to travel a short distance from their positions to the departure point of a driver because the driver never picks up or drops off anyone while driving. Intuitively, a longer distance between the locations of a passenger and a driver means a smaller chance of successfully matching a ridesharing group. For this purpose, we need to know the movement cost of a passenger and of an $\ell$-cohesive $m$-group.
Definition 4 (Movement Cost of a Passenger): Given a driver $u$ and a passenger $v$, the movement cost $MC(v)$ of $v$ is defined as:
$$MC(v) = d_g(p_v^d, p_u^d) + d_g(p_u^a, p_v^a),$$
where $d_g(a, b)$ indicates the shortest geographical distance, such as the Euclidean or road network distance, between $a$ and $b$.
The cost is one of the well-known metrics in ridesharing and can be easily replaced by the distance on road networks [17], [19]. Meanwhile, it is pointless for passengers to get into a car if their own movement costs are higher than the shortest distance between their departure and arrival points. In other words, our ridesharing system must filter out such passengers because self-travel is better than ridesharing in that case.
Definition 6 (Spatial Condition): The spatial condition to be satisfied by every passenger $v$ is formally presented as
$$MC(v) \le d_g(p_v^d, p_v^a).$$
As for another spatial aspect, we also define the movement threshold $h$ to practically apply our methods to real-world services; that is, a passenger whose $MC(v)$ exceeds $h$ can never be a member of a query answer. For example, system managers or the driver can decide on the value of $h$ by considering their environments because it may differ by region or time. We experimentally discuss the effect of $h$ in the section on performance evaluation.
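As a rough illustration of the spatial checks, the Python sketch below computes a movement cost and applies the two filters discussed above. It rests on stated assumptions: $MC(v)$ is taken to be the passenger's travel to the driver's departure point plus the travel from the driver's arrival point to the passenger's destination, $d_g$ is taken to be the Euclidean distance, and the names `d_g`, `movement_cost`, and `satisfies_spatial` are hypothetical.

```python
import math


def d_g(a, b):
    """Geographical distance; Euclidean here, a road-network distance also fits."""
    return math.dist(a, b)


def movement_cost(p_u_dep, p_u_arr, p_v_dep, p_v_arr):
    """Assumed MC(v): walk to the driver's departure point plus walk from the
    driver's arrival point to the passenger's own destination."""
    return d_g(p_v_dep, p_u_dep) + d_g(p_u_arr, p_v_arr)


def satisfies_spatial(p_u_dep, p_u_arr, p_v_dep, p_v_arr, h):
    """Ridesharing must not cost more than self-travel, nor exceed threshold h."""
    mc = movement_cost(p_u_dep, p_u_arr, p_v_dep, p_v_arr)
    return mc <= d_g(p_v_dep, p_v_arr) and mc <= h
```

For a driver traveling from (0, 0) to (10, 0) and a passenger traveling from (0, 3) to (10, 4), the movement cost is 3 + 4 = 7, which is below the passenger's own trip length of about 10.05, so the passenger passes when $h = 8$ and is filtered out when $h = 5$.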
Example 3: Figure 5 shows an example of an $\ell m$-CRG query considering the friendship and timeslots in Figures 3 and 4. The passenger $v_5$ is filtered out because $v_5$ is insufficiently close to $u$, as shown in the previous (friendship) example. The next smallest movement cost is that of $v_2$. However, $v_2$ is also filtered out due to the temporal condition. Consequently, the final group of the query is $\{u, v_1, v_4\}$.
A brute-force approach for processing the $\ell m$-CRG query is described below. First, we construct all $G_m^{(\ell)}$. Consequently, an improved method is necessary to tackle the drawback of this exhaustive approach.
Remark 2: Though some may be concerned that social network (graph) data are insufficient for perfectly ensuring safety and comfort, an increasing number of social-aware ridesharing approaches have been investigated [14]-[21], [30]. One of the main reasons is that it is easy to build a social graph, i.e., the friendships of users, without strenuous effort. Existing systems and services can be equipped with the graph by matching previously grouped members, as in [16]. Another practical method is to cooperate with other social media that users have already joined. In addition, many valuable graph-related works are still being proposed, so social-aware ridesharing will have further chances to improve with the latest graph algorithms.

C. NAÏVE SOLUTION
A naïve method (called Naive) for processing the $\ell m$-CRG query utilizes a simple branch-and-bound technique. The main concept of Naive is to expand its search space continually until a query answer is found. In other words, we gradually seek the nearest $p_v^d$ to $p_u^d$ and the nearest $p_{v'}^a$ to $p_u^a$ in turn. Note that $v$ and $v'$ are not necessarily the same.
Given an $\ell m$-CRG query, we find the nearest $p_v^d$ to $p_u^d$. Conceptual partitioning (CPM) [31], which is a commonly used method for processing nearest neighbor queries, is adopted for incrementally finding those points. Specifically, in dynamic environments, CPM conceptually divides a regular grid plane $C$ into various-sized rectangles, each of which encloses at least one grid cell $c \in C$. It helps a query processor efficiently reduce the number of cell accesses, which is directly related to the performance of the search algorithms.
Whenever we find a $p_v^d$, we check the friendship between the corresponding $v$ and $u$ using breadth-first search, one of the standard graph algorithms, because a final ridesharing group consists only of vertices $v$ such that $d_s(v, u) \le \ell$. We also check the spatial and temporal conditions at the same time. If the checked $v$ meets these conditions, we insert $v$ into the checked list of the departure side. Next, the same work is executed on the arrival side by repeatedly retrieving $p_v^a$. The whole procedure above is repeated until a query answer is found or, if there is no query result, until the entire search space has been explored. During the iteration, we try to create $G_m^{(\ell)}$ by checking the social condition among the candidate passengers if the number of vertices, counting $u$ and the candidates $v$, is greater than or equal to $m$. Then, if the current $m - 1$ passengers satisfy the social condition, we calculate $MC(G_m^{(\ell)})$. When there is no chance of making a better group, Naive stops before exploring the whole area and returns the current $G_m^{(\ell)}$ as the query answer.

III. THE PROPOSED METHOD
We first propose a new concept of an exact n-friend set in Section III-A, its update method using n-social counting filter in Section III-B, and incremental personalized-proximity search (IPS) in Section III-C.

A. EXACT n-FRIEND SET
Although we can always find a query answer using Naive, it has some drawbacks. First, when finding $p_v^d$ after checking the corresponding $v$ of $p_v^a$, we can avoid redundant breadth-first searches by adopting the notion of the $\ell$-hop friend list [32]. This is especially effective in checking the social relationship among the passengers, i.e., $G_m^{(\ell)}$. However, the search performance highly depends on the method for creating each $V_v^{(\ell)}$. Every $F_v^{(n)}$ must be maintained by every $v$ prior to the query processing. When propagating friend lists, we combine several sets.
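One way to picture the exact $n$-friend sets is as the layers of a breadth-first search. The Python sketch below is an illustration under assumptions, not the authors' construction: it takes $F_v^{(n)}$ to be the set of vertices whose minimum number of edges from $v$ is exactly $n$, and the $\ell$-hop friend list to be the union of $F_v^{(1)}, \dots, F_v^{(\ell)}$. The function names and the toy graph are hypothetical.

```python
def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def exact_friend_sets(adj, v, max_n):
    """Return {n: F_v^(n)} for n = 1..max_n using BFS layers."""
    layers = {}
    seen = {v}
    previous = {v}
    for n in range(1, max_n + 1):
        layer = set()
        for w in previous:
            for x in adj.get(w, ()):
                if x not in seen:
                    layer.add(x)  # first reached at depth n, so distance is n
        seen |= layer
        layers[n] = layer
        previous = layer
    return layers


def hop_friend_list(adj, v, ell):
    """Assumed l-hop friend list: every vertex within ell edges of v."""
    friends = set()
    for layer in exact_friend_sets(adj, v, ell).values():
        friends |= layer
    return friends


# Hypothetical toy graph for illustration only.
TOY = build_adj(
    [("v1", "v2"), ("v1", "v3"), ("v1", "v4"), ("v1", "v5"),
     ("v2", "v6"), ("v3", "v6"), ("v4", "v6"), ("v3", "v8")]
)
```

In this toy graph, the direct friends of v1 are v2 to v5, and its exact 2-friend set is {v6, v8}; no vertex appears in more than one layer because each vertex is assigned to the first depth at which it is reached.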

B. UPDATE OF EXACT n-FRIEND SET USING n-SOCIAL COUNTING FILTER
A major downside of the exact $n$-friend set is its costly update. When some users break social connections or opt out of the service, their friend lists have to be changed or removed. Then, all the exact $n$-friend sets of the relevant vertices also need to be updated. However, a basic method that looks up each friend list is clearly inefficient. In particular, the update performance degrades badly as the number of changed friends grows.
To tackle the issue, we devise an $n$-social counting filter $S^{(n)}$, which records how many edges are connected with each vertex in the exact $n$-friend set. With it, we can decide whether the exact $n$-friend sets must be revised. The gist is that we do not need to modify an exact $n$-friend set if other connections still remain.
The integer $n$ is the same $n$ used in the exact $n$-friend set. Furthermore, $S_v^{(n)}$ has to be created at the same time as $F_v^{(n)}$.
Example 4: Figure 6 depicts an example of a 2-social counting filter. The 2-social counting filters of $v_1$ and $v_3$ are shown in Figure 6(a). Now, suppose an edge $e(v_1, v_3)$ is removed, as presented in Figure 6(b). If $v_1$ had only the friend $v_3$, we would remove $v_6$ and $v_8$ from $F_{v_1}^{(2)}$. In this case, however, $v_1$ has other friends such as $v_2$, $v_4$, and $v_5$, so $v_6$ can remain in $F_{v_1}^{(2)}$ thanks to $e(v_2, v_6)$ and $e(v_4, v_6)$. On the other hand, we have to delete $v_8$ from $F_{v_1}^{(2)}$ because there are no other edges between $V_{v_1}^{(1)}$ and $v_8$. We can easily figure out the number of connected edges using $S_{v_1}^{(2)}$ instead of traversing all friends.
Someone might think that $v_3$ and $v_4$ are entries of $V_v^{(2)}$ or $F_v^{(2)}$, and that $v_6$, $v_7$, $v_8$, and $v_9$ are members of $V_v$, but that is wrong because $\ell$ or $n$ is decided by the minimum number of edges according to Definition 8.
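The counting idea in the example can be sketched for the special case $n = 2$. The Python code below is a simplified illustration, not the authors' Algorithms 1 and 2: it assumes the 2-social counting filter of $v$ stores, for each vertex in $F_v^{(2)}$, the number of its edges to direct friends of $v$, and it only updates the sets owned by $v$ when one of $v$'s own edges is deleted. The function names and the toy graph (loosely modeled on the example, since Figure 6 itself is not available here) are hypothetical.

```python
def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def build_filters(adj, v):
    """Return (F1, F2, S2): direct friends, exact 2-friends, and edge counters."""
    f1 = set(adj[v])
    s2 = {}
    for w in f1:
        for x in adj[w]:
            if x != v and x not in f1:
                s2[x] = s2.get(x, 0) + 1  # one more edge from F1 to x
    return f1, set(s2), s2


def delete_direct_edge(adj, v, w, f1, f2, s2):
    """Delete e(v, w) and repair v's F1, F2, and S2 without a full rebuild."""
    adj[v].discard(w)
    adj[w].discard(v)
    f1.discard(w)
    for x in adj[w]:
        if x in s2:
            s2[x] -= 1          # x loses its link through w
            if s2[x] == 0:      # no other direct friend reaches x
                del s2[x]
                f2.discard(x)
    # w itself may still be exactly two hops away through another friend.
    links = sum(1 for x in adj[w] if x in f1)
    if links:
        s2[w] = links
        f2.add(w)
```

In the toy graph used for testing, deleting $e(v_1, v_3)$ lowers the counter of $v_6$ from 3 to 2, so $v_6$ stays, while the counter of $v_8$ drops to zero and $v_8$ is removed. Deeper levels ($n > 2$) and the sets owned by other vertices would need the recursive propagation that the text describes.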
Algorithm 1 describes a method for updating the friend list together with the exact $n$-friend set. When receiving an update command udt, the function UpdateFriList is executed for deleting or inserting an edge. A simple update of the social graph just modifies $V_v^{(1)}$ and $V_{v'}^{(1)}$ with respect to $e(v, v')$, but we have to invoke UpdateEnS twice per udt to update $S^{(n)}$ as well.
UpdateEnS, detailed in Algorithm 2, mainly consists of two functions, DeleteSCF and InsertSCF. Although we only need DeleteSCF when the udt is the deletion of $e(v, v')$, the insertion of $e(v, v')$ can involve a more complex process because we need to check whether $v'$ is already linked with $v$ through some edges. According to Definitions 8 and 9, $v'$ cannot be contained in more than one group such as $V_v^{(1)}$ and $F_v^{(2)}$. If it is already contained in one, we remove $v'$ from the related group before invoking InsertSCF (Lines 1-13). In DeleteSCF, we recursively traverse the exact friend sets and remove a vertex $f$ when its counter is zero, which means the minimum number of edges between $v$ and $f$ is no longer $i$ (Lines 14-24). On the other hand, a similar process is executed when the udt is the insertion of $e(v, v')$, by considering the revision of $S^{(j)}$ and $F^{(j)}$; the insertion is propagated to every relevant $f$ (Lines 25-35).

C. INCREMENTAL PERSONALIZED-PROXIMITY SEARCH (IPS)
We introduce IPS, a novel method for processing the $\ell m$-CRG query. Given an $\ell m$-CRG query, we incrementally seek the two nearest cells $c$ and $c'$ covering the departure and arrival points of candidate passengers. That is, we do not need to examine cells that do not cover at least one point of a passenger in $V_u^{(\ell)}$, which is obtained from the $\ell$-hop friend list and the exact $n$-friend set. After that, whenever a new $p_v^d$ or $p_v^a$ is found, we attempt to construct a new $G_m^{(\ell)}$ and check whether the group can be a result group. This procedure continues until we are sure that there is no answer or that the current group is optimal.
In the example, $v_1$ with $MC(v_1)$ is inserted into a candidate list ordered by $MC(v)$. Next, we search $c_{84}$ as a check of the arrival side, and then insert $v_2$ into the candidate list. The first element in the candidate list is now $v_2$, not $v_1$, because $MC(v_2) < MC(v_1)$. The above procedure continues until $v_6$ is inserted into the list. The two points $p_{v_7}^d$ and $p_{v_8}^a$ are next in order, but we can terminate the algorithm before searching $c_6$ and $c_{25}$.
Although the incremental manner of this method is similar to that of Naive, we can further avoid several unnecessary computations and data accesses through the proposed approach. We describe the pseudocode of the query processing method in Algorithm 3. First, we initialize the basic components. Then, we construct $V_u^{(\ell)}$ and, by invoking FindCoverCell, insert the cells covering the departure and arrival points of its members into the min-heaps DH and AH, respectively (Lines 5-6). That is, the min-heaps contain all cells related to candidate passengers after FindCoverCell finishes. Subsequently, the algorithm continues until the two heaps have no cells, because empty heaps mean that we have checked all cells or data objects. If the heaps are not empty, two cells $c$ and $c'$ are first popped from DH and AH. We can terminate the algorithm if the gap between bestCost and candiCost is less than or equal to the sum of $d_g(p_u^d, c)$ and $d_g(p_u^a, c')$ (Lines 7-11). This is proved by Theorem 1 at the end of this section.
The above condition can never be satisfied in the first iteration because both bestCost and candiCost are initialized to infinity. Thus, we search all $p_v^d$ and $p_v^a$ in $c$ and $c'$ by invoking the function SearchCellData. From that, we obtain a min-heap VH containing the passengers ($\in V_u^{(\ell)}$) that meet the temporal and spatial conditions (Lines 12-13). If VH has at least one entry, we repeatedly pop the $v$ with the minimum $MC(v)$ from VH until VH is empty. We can terminate this iteration early by checking whether $MC(v)$ is greater than or equal to the gap between bestCost and candiCost, which is justified by Lemma 1 below. If this holds, the iteration can be stopped because the current VH is guaranteed to contain no candidate $v$ with a better (lower) $MC(v)$. If not, we invoke the function ComputeGroup, which tries to organize a candidate group by checking the relationships using each $V_v^{(\ell)}$. We repeat this procedure until DH and AH are empty or the condition of Theorem 1 is satisfied. We finally return RS as the answer to the $\ell m$-CRG query.

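Lemma 1: Let bestCost be the movement cost of the current optimal $\ell$-cohesive $m$-group including $u$, and let candiCost be the minimum sum of the movement costs of $m - 2$ vertices in the $MC(v)$-ordered min-heap CL. Then, suppose that $v_j$ has just been popped from the $MC(v)$-ordered min-heap VH. If $MC(v_j)$ is greater than or equal to the gap between bestCost and candiCost, no group formed with $v_j$ or with any passenger remaining in VH has a movement cost lower than bestCost.
Proof: Suppose $p_v^d$ and $p_{v'}^a$ are the next nearest points covered by $c$ and $c'$. Then, the movement cost of any unexamined passenger is at least $d_g(p_u^d, c) + d_g(p_u^a, c')$. It means that there is no other group with a lower $MC(G_m^{(\ell)})$, according to Lemma 1.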
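The two early-termination tests can be sketched in Python as follows. This is an illustration under an assumed reading of the bounds: the search may stop once a lower bound on the movement cost of any unseen passenger reaches the gap bestCost − candiCost. The function names `can_stop_cells` and `promising_passengers` are hypothetical, and VH is modeled as a `heapq` list of `(MC(v), v)` pairs.

```python
import heapq
import math


def can_stop_cells(best_cost, candi_cost, dist_dep_cell, dist_arr_cell):
    """Cell-level test: unseen passengers cost at least the two cell distances."""
    if math.isinf(best_cost) or math.isinf(candi_cost):
        return False  # no complete group yet, so nothing can be pruned
    return best_cost - candi_cost <= dist_dep_cell + dist_arr_cell


def promising_passengers(vh, best_cost, candi_cost):
    """Passenger-level test: pop from VH while a better group is still possible."""
    kept = []
    while vh:
        mc, v = heapq.heappop(vh)
        bounded = not (math.isinf(best_cost) or math.isinf(candi_cost))
        if bounded and mc >= best_cost - candi_cost:
            break  # v and everything left in VH cannot improve bestCost
        kept.append((mc, v))
    return kept
```

With bestCost = 10 and candiCost = 7, the gap is 3, so only passengers with a movement cost below 3 are still worth passing to a group-construction step such as ComputeGroup, and the cell search can stop once the two nearest unexplored cells are at a combined distance of at least 3 from the driver's endpoints.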

IV. OPTIMIZATIONS
In Section IV-A, we suggest ITT for improving the performance of our method. Furthermore, TIPS, an enhanced query processing method, is proposed with cell-time pruning and bidirectional cell-time traversing in Section IV-B.

A. INVERTED TIMETABLE
We propose the ITT for processing $\ell m$-CRG queries efficiently. Although our previous method can deal with the queries efficiently, there is still room to improve its performance because it does not consider time information, which plays a key role in ridesharing services. In other words, it is essential to handle time information as well as location and social aspects in our ridesharing system.
When indexing all points $p_v^d$ and $p_v^a$, we also construct the inverted timetable $ITT_c \in ITT$ of each cell $c$ considering the time factors of each $v$, i.e., $t_v^d$ and $t_v^a$. Therefore, each $c$ has its own $ITT_c$ after indexing all data objects. The elements of column $T$ of each $ITT_c$ are arranged in ascending order. Note that the column $T$ can be divided into not only 24 rows but also any other number of rows according to the system environment. An example is shown in Figure 8(b).
The update of the ITT proceeds together with the update of the locations of passengers, so the overall steps are similar to those of a normal grid index. The only difference lies in dealing with the time information, which is easily handled.
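A per-cell inverted timetable can be sketched in Python as two maps from an hour $T$ to the passengers whose departure (or arrival) point falls in the cell at that hour. This is an assumed layout for illustration only: the class `InvertedTimetable`, the helper `grid_cell`, the cell size, and the passenger record format `(p_dep, p_arr, t_dep, t_arr)` are all hypothetical.

```python
from collections import defaultdict


class InvertedTimetable:
    """Assumed ITT_c layout: T -> V_c^d(T) and T -> V_c^a(T) for one cell."""

    def __init__(self):
        self.dep = defaultdict(set)  # departure hour -> passengers
        self.arr = defaultdict(set)  # arrival hour -> passengers

    def departure_rows(self):
        """Rows of the departure column in ascending order of T."""
        return sorted((t, vs) for t, vs in self.dep.items() if vs)

    def arrival_rows(self):
        """Rows of the arrival column in ascending order of T."""
        return sorted((t, vs) for t, vs in self.arr.items() if vs)


def grid_cell(point, cell_size=10.0):
    """Map a point to a grid cell id (hypothetical uniform grid)."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def build_itts(passengers, cell_of=grid_cell):
    """Index every passenger into the ITT of its departure and arrival cells."""
    itts = defaultdict(InvertedTimetable)
    for v, (p_dep, p_arr, t_dep, t_arr) in passengers.items():
        itts[cell_of(p_dep)].dep[t_dep].add(v)
        itts[cell_of(p_arr)].arr[t_arr].add(v)
    return itts
```

Updating a passenger then amounts to removing the passenger from the old cell's rows and inserting it into the new ones, which mirrors an ordinary grid-index update with one extra key for the time.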

B. TIME-CONSTRAINED IPS (TIPS)
In this part, we present two types of strategies to prune unnecessary search, and an efficient query processing method taking advantage of the ITT.

1) CELL-TIME PRUNING
We can effectively prune specific cells if it is guaranteed that the cells do not cover the departure or arrival points of passengers who satisfy the temporal condition (Definition 3). We know whether a cell $c$ can be pruned by simply comparing $t_u^d$ (of a driver $u$) with the minimum $t_v^d$ (of the passengers $v$) whose $p_v^d$ is covered by $c$. This filters out the cases in which no $v$ satisfies the temporal condition, namely when the minimum $t_v^d$ in $c$ is greater than $t_u^d$. Thus, we do not need to search this $c$. The case of the arrival time is similar: if the maximum $t_v^a$ is lower than $t_u^a$ for the $p_v^a$ in $c$, the arrival-side search is also unnecessary. We formally describe the above concept using the cell-timeslot: $t_c = [t_c^d : t_c^a]$ denotes the cell-timeslot, where $t_c^d$ is the departure cell-time and $t_c^a$ is the arrival cell-time such that
$$t_c^d = \min\{t_v^d \mid p_v^d \text{ is covered by } c\}, \qquad t_c^a = \max\{t_v^a \mid p_v^a \text{ is covered by } c\}.$$
To summarize, given an $\ell m$-CRG query, we can skip the search of a cell if $t_u^d < t_c^d$ while processing the departure-side cell search. The arrival-side cell search can also be omitted if $t_c^a < t_u^a$.
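The pruning rule can be sketched in Python as two comparisons against a cell-timeslot. The sketch assumes the departure cell-time is the minimum departure time and the arrival cell-time is the maximum arrival time among the passengers indexed in the cell; the function names and the convention of using `None` for an empty side are hypothetical.

```python
def cell_timeslot(dep_times, arr_times):
    """Return (t_c^d, t_c^a): earliest departure and latest arrival in a cell."""
    t_dep_c = min(dep_times) if dep_times else None
    t_arr_c = max(arr_times) if arr_times else None
    return t_dep_c, t_arr_c


def prune_departure_side(t_u, t_c):
    """Skip the cell if even its earliest passenger departs after the driver."""
    t_dep_c = t_c[0]
    return t_dep_c is None or t_u[0] < t_dep_c


def prune_arrival_side(t_u, t_c):
    """Skip the cell if even its latest passenger must arrive before the driver."""
    t_arr_c = t_c[1]
    return t_arr_c is None or t_arr_c < t_u[1]
```

For a driver with timeslot (11, 13), a cell whose passengers depart at 9 and 12 is kept on the departure side because its cell-time 9 does not exceed 11, whereas a cell whose passengers depart at 12 and 14 is skipped.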

2) BIDIRECTIONAL CELL-TIME TRAVERSING
We traverse a set of passengers if a cell covers some departure or arrival points of passengers satisfying the temporal condition. In doing so, we can efficiently traverse the passengers by utilizing the inverted timetable of the cell. When traversing, for example, passengers whose departure points $p_v^d$ are in a cell $c$, we only have to check the sets $V_c^d(T)$ for $T$ from $t_c^d$ to $t_u^d$. The other $V_c^d(T)$, i.e., those with $t_u^d < T$, are intuitively pruned because the elements of the column $T$ of $ITT_c$ are arranged in ascending order of time. For searching arrival-side passengers, we explore $V_c^a(T)$ in reverse, i.e., for $T$ from $t_c^a$ down to $t_u^a$.
Example 7: Consider the inverted timetable shown in Figure 8. When processing an $\ell m$-CRG query, if the current search is for the departure side, we traverse the table from $V_{c_1}^d(0)$ to $V_{c_1}^d(t_u^d)$. On the other hand, the search starts from $V_{c_1}^a(23)$ (and proceeds to $V_{c_1}^a(t_u^a)$) if the current procedure is for the arrival side. By doing so, we can avoid irrelevant computation, i.e., the sets $V_{c_1}^d(T)$ and $V_{c_1}^a(T)$ with $t_u^d < T < t_u^a$.

3) QUERY PROCESSING
We propose TIPS as the final method for processing $\ell m$-CRG queries. The above strategies, i.e., cell-time pruning and bidirectional cell-time traversing, are applied before the passengers of each cell are traversed in Algorithm 3. Thus, for ease of explanation, we simply replace the function SearchCellData with a function SearchCellTimeData. For the departure-side search, $T$ starts from the departure cell-time $t^d_c$ (Lines 2-3). We can prune this search if $T$ is greater than $t^d_u$. If not, each $v \in V^d_c(T)$ is examined against a checked set $CS$ to determine whether we have already seen $v$, i.e., in the arrival-side search (Lines 4-6). If $v$ has not been examined yet, we insert $v$ into $CS$. Otherwise, we check whether $v$ meets the other conditions to be a member of $G^{(\ell)}_m$. We insert $v$ into $VH$ if $v$ satisfies these conditions, and continue the above procedure until $T$ is equal to the departure time $t^d_u$ of $u$ (Lines 7-11). The arrival-side search proceeds in a similar manner, except that we decrease $T$ from $t^a_c$ to $t^a_u$ (Lines 12-21). Finally, we return the completed $(VH, CS)$.
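A simplified Python sketch of the procedure described above is given below. It is not the authors' Algorithm; the function name `search_cell_time_data`, the dictionary-based timetable columns, the use of plain sets for $VH$ and $CS$, and the reduction of the "other conditions" to membership in the $\ell$-hop friend list are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


def search_cell_time_data(
    dep_cols: Dict[int, List[Hashable]],   # assumed layout of V^d_c(T)
    arr_cols: Dict[int, List[Hashable]],   # assumed layout of V^a_c(T)
    t_d_u: int,
    t_a_u: int,
    l_hop_friends: Set[Hashable],          # stands in for V^(l)_u
    vh: Optional[Set[Hashable]] = None,
    cs: Optional[Set[Hashable]] = None,
) -> Tuple[Set[Hashable], Set[Hashable]]:
    vh = set() if vh is None else vh
    cs = set() if cs is None else cs

    def visit(v: Hashable) -> None:
        if v not in cs:
            cs.add(v)            # first encounter: only mark v as checked
        elif v in l_hop_friends:
            vh.add(v)            # seen from the other side and socially close

    # Departure side: ascending T, pruned as soon as T exceeds t_d_u.
    for T in sorted(dep_cols):
        if T > t_d_u:
            break
        for v in dep_cols[T]:
            visit(v)

    # Arrival side: descending T, pruned as soon as T drops below t_a_u.
    for T in sorted(arr_cols, reverse=True):
        if T < t_a_u:
            break
        for v in arr_cols[T]:
            visit(v)

    return vh, cs
```

In this sketch a passenger becomes a candidate only after being reached from both the departure side and the arrival side, which mirrors the role of the checked set in the description above.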

V. EXPERIMENTAL EVALUATION
In this section, we describe the data sets and basic settings for the experiments in Section V-A. Then, from Section V-B to Section V-F, we report performance evaluations through extensive experiments on query processing time, area of search space, group matching rate, update time, and voluminous data sets.

A. DATA SETS AND SETTINGS
We used four data sets, namely BE, GE, BA, and GA. All data sets were sampled from the real data sets Brightkite and Gowalla [33]. Specifically, BE and GE denote the data sets for the area of Europe extracted from the original data sets (e.g., see Figure 9, where an edge connects a departure point and an arrival point), and BA and GA represent the data sets for the area of America. The space of every data set has been normalized to a square of 100 on a side. We regard two points among the check-ins of each vertex as its departure and arrival points. Furthermore, the check-in times (hours) of the selected points are considered the available departure and arrival times. Table 2 details the statistics of the real data sets. We next evaluate the performance of our methods. All experiments were conducted on a machine with a 3.40 GHz Intel Core i5 CPU and 16 GB of memory, and all methods were implemented in Python. We maintained the data sets in main memory and stored the locations of data objects in an index structure of size 250 × 250. We measured the average performance over 200 experiments with random queries. We evaluate our methods by varying four types of parameters; Table 3 lists the parameter settings.
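As an illustration of the setting above, the following sketch maps a point of the normalized space to a cell, assuming that the 250 × 250 index is a uniform grid over the square of side 100. The function name `cell_of` and the uniform-grid assumption are ours and are not confirmed by the description of the index.

```python
from typing import Tuple


def cell_of(x: float, y: float, side: float = 100.0, grid: int = 250) -> Tuple[int, int]:
    """Return the (column, row) index of the grid cell containing (x, y)."""
    if not (0.0 <= x <= side and 0.0 <= y <= side):
        raise ValueError("point lies outside the normalized space")
    # Points on the upper boundary are assigned to the last cell.
    i = min(int(x * grid / side), grid - 1)
    j = min(int(y * grid / side), grid - 1)
    return i, j
```

Under this assumption, each cell is a square of side 100 / 250 = 0.4 in the normalized space.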
In addition, we utilize only the exact 2-friend set $F^{(2)}_v$, i.e., $n = 2$, for the following reasons. A larger $n$ ($F^{(3)}_v$, etc.) needs considerable space, and the size $|V^{(\ell)}_v|$ of the $\ell$-hop friend list grows rapidly, especially from $\ell = 3$ [34]. We can observe a similar tendency in the real data sets. Figure 10 shows the average size of the sampled $\ell$-hop friend lists for varying $\ell$. The size of each $\ell$-hop friend list is similar and manageable only up to $\ell = 2$.
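The growth of the $\ell$-hop friend list can be examined with a depth-limited breadth-first search. The sketch below is only an illustration; the function name `l_hop_friends` and the adjacency-dictionary representation of the social graph are assumptions.

```python
from collections import deque
from typing import Dict, Hashable, Set


def l_hop_friends(adj: Dict[Hashable, Set[Hashable]],
                  source: Hashable, l: int) -> Dict[Hashable, int]:
    """Return {vertex: hop distance} for all vertices within l hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == l:
            continue  # do not expand beyond the social boundary
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    del dist[source]  # the query vertex itself is not a friend
    return dist
```

Calling `len(l_hop_friends(adj, v, l))` for increasing `l` gives the list size $|V^{(\ell)}_v|$ whose average is the kind of quantity plotted in Figure 10.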
Since there is no existing method for processing the $\ell m$-CRG query exactly, we compare the algorithms discussed in this work with slightly modified geo-social algorithms:
• SER - This approach was newly implemented based on the insight of a similar geo-social ridesharing method [17]. We took its main aspects: departure/arrival R-trees and social-awareness. Additional information is given in Section VI-C and the Appendix.
• NCF - The core functions of this competitor are built on the method of [32], which efficiently finds not just the $k$ nearest neighbors but the top-$k$ nearest friends on a social grid index. Even though the original aim of [32] is not to find groups, finding the nearest friend is significant for processing the $\ell m$-CRG query. That is, the performance highly depends on finding the nearest friend. Thus, we carefully extend the method to find ridesharing groups.
• IPS - This algorithm in Section III-C uses the exact $n$-friend set for processing the $\ell m$-CRG query. When making the $\ell$-hop friend list, this approach gains efficiency through the preprocessing of the exact $n$-friend set.
• TIPS - This method in Section IV-B3 is our final method, which is the same as IPS except that it adds cell-time pruning and bidirectional cell-time traversing. Hence, this method can effectively avoid irrelevant graph traversing and unnecessary cell searches.
Note that we could not plot the result of SER in every graph because the performance gap between SER and the other methods is too large. Instead, we compare the CPU time in Table 4. To state the result in advance, the other algorithms clearly outperform SER in our problem.

B. EXPERIMENT FOR QUERY PROCESSING TIME
We report the running time of the experiments with various parameters and data sets in Figure 11. First, Figure 11(a) depicts the effect of varying the acceptable social boundary $\ell$ on BE. Although the processing times of the three methods increase with $\ell$, TIPS always outperforms the other methods because it effectively avoids unnecessary cell searches and $\ell$-hop friend checks. These tendencies are nearly identical to those of the tests conducted on GE, BA, and GA. According to these results, the overall performance of the methods depends heavily on $\ell$. Next, we plot the effect of varying the number of empty seats $m$ on BE in Figure 11(b). As in the previous experiment, TIPS performs better than the others. In addition, when $m$ grows, the performance of all methods degrades because a larger $m$ forces the methods to search more departure and arrival points.
Third, Figure 11(c) reports the effect of varying the movement threshold $h$ on BE. In real-world services, potential passengers may not be willing to travel far from their own points to a departure point. In this case, drivers can specify $h$ according to their situations. To apply this concept, each algorithm simply filters out the passengers whose movement costs are higher than $h$ when it checks the other conditions (e.g., the spatial and social conditions). Thus, we evaluate the proposed methods by varying $h$ from 1% of the entire width to infinity. Note that the default $h$ in the remaining experiments is each driver's own $d_g(p^d_u, p^a_u)$. In this experiment as well, the best algorithm is TIPS and the worst is NCF. When $h$ increases, the elapsed time decreases because all methods can find an optimal group earlier. In other words, with a small $h$, many people find it difficult to reach the departure location of a driver on time. Another interesting case is when there is no limit on $h$, i.e., $h$ is infinity. Contrary to expectation, the query processing is fairly time-consuming because we have to consider a greater part of the vertices whenever candidate groups are made and checked. Surprisingly, TIPS remains stable, which means that many passengers cannot meet the time constraints of $u$.
Lastly, Figure 11(d) shows the experiments conducted with varying timeslot $t_u$ on BE. So far, all experiments have used each driver's own timeslot; in other words, the default $t_u$ is not fixed. On some occasions, however, the driving time is always fixed, for example, for school buses. This is why we conducted these tests by classifying the queries into 12 timeslots. This lets us capture the basic trends even though the time distribution does not cover all cases. As expected, the best method is TIPS and the worst method is NCF, as in the other experiments. Furthermore, all methods take much longer at [00:01], [20:21], and [22:23]. One of the reasons is that few people log their locations at those times, so our algorithms spend much time finding optimal groups or terminating their procedures.
We conducted the same tests on GE, BA and GA in Figures 11(e)-(p). All trends are similar to the tests on BE but the average time for processing queries is larger due to the increase of data size. Consequently, the gap would be remarkable as the data size increases. Different spatial, temporal and social distributions affect the result as well.
Additionally, we compare the performance of SER with that of TIPS; the results are in Table 4. For compactness, we report how much more time SER needs on average to find query answers for each value of every parameter. As in the previous tests, the largest gap appears when $\ell$ changes. Also, the efficiency of SER degrades as the size of the data sets increases.

C. EXPERIMENT FOR AREA OF SEARCH SPACE
To study the effectiveness of the pruning techniques, we compare the area of the search space. Note that the whole area is 10,000 because of the normalization of the spatial space. Figure 12 shows the extent of the search space with diverse data sets and parameters, as in the previous experiments. We describe only NCF and TIPS for this experiment since the results of NCF and IPS are exactly equal in these tests. It is also not worth checking the result of SER since it is a technique based on two data-driven structures; that is, the sum of its two search spaces is extremely large due to overlaps of the rectangles containing data points. Figure 12(a) illustrates the effect of varying $\ell$ on BE. When $\ell$ is 1 or 2, the search spaces of both NCF and TIPS are quite small and similar because the number of pruned cells is still tiny. However, when $\ell$ exceeds 2, the gap between the results of the two methods becomes much larger. Although the outcome of TIPS also changes, its rate of increase is far lower because NCF can hardly utilize time information for pruning the search space, i.e., the cells that cover $\ell$-hop friends in our study. In other words, the pruning technique of TIPS works well when the search space gets larger with the increase in the number of $\ell$-hop friends.
Next, the results of the experiments on BE with varying $m$ are shown in Figure 12(b). When $m$ is 2, the search space of both methods is quite small because $m = 2$ means finding just the nearest $\ell$-hop friend of $u$. That is, the final friend is likely to be around the query locations. As $m$ increases, the search spaces of the two methods grow because the algorithms keep investigating the space until they find $m - 1$ passengers.
Third, Figure 12(c) shows the results for varying $h$ on BE. We can observe that the area of the search space falls as $h$ increases. That is, a larger $h$ yields better performance. The gap between the results of the two methods gradually narrows because our method finds candidate passengers and a result group more easily. Additionally, when $h$ is infinity, the outcomes of the two methods are similar because all passengers satisfying the other constraints can be candidates in terms of the time constraint.
Finally, Figure 12(d) reports the results for varying $t_u$. A notable tendency in this experiment is the large amount of space explored at night. The result is closely related to the tests in Figures 11(d), (h), (l), and (p). The Europe data sets are more affected by the driving timeslot than the America data sets.
The same experiments on GE, BA, and GA are depicted in Figures 12(e)-(p). As we might expect, the trends are similar to those of the tests on BE except in some cases. For example, the test for varying $\ell$ on GA shows an interesting result: unlike in the test on BE, the search space of TIPS gets smaller after $\ell$ reaches 3. This is because, for larger $\ell$, the $\ell$-hop friends of $u$ suffice to form an optimal ridesharing group before more cells are checked than in the experiment with $\ell = 3$. This outcome is most noticeable when $\ell$ is 5.

D. EXPERIMENT FOR GROUP MATCHING RATE
We report the group matching rate with diverse parameters in Figure 13. Note that the rate differs across data sets because our methods find an exact answer whenever an optimal group exists. Since the default $m$ is 4, we regard a returned group as a failure if its size is under 4. Moreover, the results of all methods are the same, so we conducted these tests by varying the data sets. Figure 13(a) illustrates the experimental results for varying $\ell$. Intuitively, it is hard to group four members (default $m = 4$) from a 1-hop friend list because all members have to be connected directly. Therefore, the rate is significantly worse when $\ell$ is 1. As $\ell$ increases, the rate of all methods grows as well. Interestingly, the overall group matching rates on BE and GE are smaller than those on BA and GA. The reason could be that passengers in America and Europe have different social graphs or spatial distributions regardless of the data size.
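The success criterion stated above can be written down directly. In the following sketch, the function name `group_matching_rate` and the representation of query results as a list of returned group sizes are assumptions made for illustration.

```python
from typing import Sequence


def group_matching_rate(group_sizes: Sequence[int], m: int = 4) -> float:
    """Fraction of queries whose returned group has at least m members."""
    if not group_sizes:
        raise ValueError("at least one query result is required")
    matched = sum(1 for size in group_sizes if size >= m)
    return matched / len(group_sizes)
```

For example, if four queries return groups of sizes 4, 3, 4, and 0 with the default $m = 4$, two of them count as successes and the rate is 0.5.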
Next, Figure 13(b) describes the results for varying $m$. When $m$ is larger, the rate decreases because the size of the $\ell$-hop friend list is fixed. That is, it is hard for all methods to find optimal groups from the limited number of candidates. Recall the experiments in Figures 11 and 12 for the correlation.
Then, Figure 13(c) displays the effect of varying $h$. Intuitively, $h$ is a strong constraint on the matching rate because it filters many passengers out, so the result is clearly affected by $h$. Furthermore, the data size and type also affect the group matching rate; in contrast with the previous tests, the result on GE is better than those of the other experiments. Another interesting observation is that GE shows a considerably higher matching rate than BE even though the two data sets were extracted from the same region. Thus, the correlation between $h$ and GE is quite high. Lastly, the results of the tests with varying $t_u$ are depicted in Figure 13(d). The overall outcome is similar to that of the other experiments we discussed, but the matching rate at night on BE and GE is especially low. One of the reasons could be that too few friends of the driver stay near the driver at that time.

E. EXPERIMENT FOR UPDATE TIME OF EXACT n-FRIEND SET
In our ridesharing system, the social update is more crucial than the spatial update because a simple grid update is sufficient to support the movement of objects. Therefore, we focus on the social update, i.e., insertion and deletion. Figure 14 shows the experiments for the update of the exact $n$-friend set. Since there are two types of updates, we conducted the tests separately and set $n$ to 2, as discussed in Section V-A. When a friend $v'$ of $v$ is created, we have to check $V^{(1)}_v$ to modify $F^{(2)}_v$. For the deletion of an edge $e(v, v')$, $V^{(1)}_v$ needs to be inspected again because the deletion affects $F^{(2)}_v$. We compare our method in Section III-B, namely SCF, with a basic method called Basic, which updates $F^{(2)}_v$ by simply searching each $V^{(1)}_v$. As expected, SCF always outperforms Basic, and the execution time increases as the size of the update or the data set increases. Furthermore, the deletion of $e(v, v')$ takes longer because we check whether every friend of the deleted vertex also has to be eliminated. If there is no edge between $v$ and $v'$, we get rid of the corresponding vertices.
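To make the update problem concrete, the sketch below recomputes the 2-hop sets of the affected vertices after an edge insertion or deletion, in the spirit of a straightforward recomputation. It assumes that the exact 2-friend set of $v$ consists of the vertices at exactly two hops from $v$; this reading, the function names, and the adjacency-dictionary representation are assumptions, and the sketch is neither SCF nor the authors' Basic implementation.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def exact_two_friend_set(adj: Graph, v: Hashable) -> Set[Hashable]:
    """Vertices reachable in two hops that are neither v nor direct friends."""
    direct = adj.get(v, set())
    two_hop: Set[Hashable] = set()
    for w in direct:
        two_hop |= adj.get(w, set())
    return two_hop - direct - {v}


def _refresh(adj: Graph, f2: Dict[Hashable, Set[Hashable]],
             v: Hashable, w: Hashable) -> None:
    # Only the endpoints and their direct friends can see a changed 2-hop set.
    affected = {v, w} | adj.get(v, set()) | adj.get(w, set())
    for x in affected:
        f2[x] = exact_two_friend_set(adj, x)


def insert_edge(adj: Graph, f2: Dict[Hashable, Set[Hashable]],
                v: Hashable, w: Hashable) -> None:
    adj.setdefault(v, set()).add(w)
    adj.setdefault(w, set()).add(v)
    _refresh(adj, f2, v, w)


def delete_edge(adj: Graph, f2: Dict[Hashable, Set[Hashable]],
                v: Hashable, w: Hashable) -> None:
    adj.setdefault(v, set()).discard(w)
    adj.setdefault(w, set()).discard(v)
    _refresh(adj, f2, v, w)
```

On a path a-b-c, the 2-hop set of a is {c}; inserting the edge (a, c) empties it, and deleting that edge restores it.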

F. EXPERIMENT FOR VOLUMINOUS DATA SETS
Finally, we evaluate the performance of each method with varying data sizes. In practice, it seems hard to gather instantaneous itinerary data of passengers beyond the size of the data sets in the preceding experiments because the search space of every query is rebuilt just before the query is processed. Nonetheless, this simple experiment is valuable for situations in which many requests must be supported in a short time, such as when a big concert or a crucial football match is over. For the experiments, we generated synthetic data sets of various sizes. Departure and arrival points are uniformly distributed, and the social graphs are generated on the basis of the Barabási-Albert model [28]. The numbers of vertices range from 60 k to 300 k, and the numbers of edges range from 300 k to 2.4 M. Figure 15 shows the execution time with varying data sizes on the synthetic data sets. TIPS always achieves the best running time, followed by IPS, as expected. The gaps between the methods become larger as the data size increases because the number of edges in particular considerably affects the performance, as we observed in Figures 11(a), (e), (i), and (m). Understandably, SER is hard to compare due to its impractical performance.
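A synthetic data set of this kind can be produced along the following lines. The sketch uses a standard preferential-attachment construction of the Barabási-Albert model in plain Python; the function names, the parameter `m_edges` (edges attached per new vertex), and the seeding are assumptions, and the generator actually used for the experiments is not specified.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def barabasi_albert(n: int, m_edges: int, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return the edge list of a preferential-attachment graph on n vertices."""
    if not 1 <= m_edges < n:
        raise ValueError("require 1 <= m_edges < n")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    targets = list(range(m_edges))   # the first new vertex links to the initial ones
    repeated: List[int] = []         # each vertex appears once per incident edge
    for new in range(m_edges, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m_edges)
        chosen = set()
        while len(chosen) < m_edges:  # degree-proportional sampling, distinct targets
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def synthetic_trips(n: int, side: float = 100.0,
                    seed: Optional[int] = None) -> List[Tuple[Point, Point]]:
    """Uniformly distributed (departure, arrival) point pairs in the square."""
    rng = random.Random(seed)
    return [((rng.uniform(0, side), rng.uniform(0, side)),
             (rng.uniform(0, side), rng.uniform(0, side))) for _ in range(n)]
```

This construction yields exactly (n − m_edges) × m_edges edges, so, for example, 60,000 vertices with `m_edges = 5` give 299,975 edges, which is on the order of the smallest size reported above.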

VI. RELATED WORK
We review related studies in this part. Research on ridesharing is introduced in Section VI-A, and geo-social studies are described in Section VI-B. Lastly, social-aware ridesharing is reviewed in Section VI-C.

A. RIDESHARING
Ridesharing has been a promising transportation system for reducing not only traffic issues but also other related problems. The work [1] divided the ridesharing problem into two views, called Join-based Ride Sharing and Search-based Ride Sharing. An approximation algorithm and a best-first approach were proposed to tackle the problems with the enhancement of shared routes. Also, a taxi-sharing system, which deals with real-time requests, was developed in [3]. The system includes taxi-searching and scheduling algorithms, and in particular considers vehicle capacity, time window, and monetary constraints. As another system, [7] introduced Xhare-a-Ride, which especially targets in-memory indexing. The authors proposed and exploited clusters composed of landmarks for efficient ridesharing. Recently, [9] generalized ridesharing to a new problem, Unified Route Planning for Shared Mobility. In particular, the proposed multi-objective function makes the problem feasible.
In addition to traditional concepts, the latest research on ridesharing has pursued other interesting objectives. In [35], a fair pricing model for drivers and riders was designed to increase the revenue of the service provider. Its Auction-based Price-Aware Real-time ride-sharing framework satisfies temporal and monetary constraints. The authors focused on a real-time environment as well. Another work [36] introduced Utility-aware Ridesharing on Road networks for maximizing the satisfaction of riders with worthwhile travel. The utility is composed of vehicles, riders, and trajectories in the work. Next, the study [37] proposed the Activity-Based Ridesharing Algorithm to raise the rate of group matching by utilizing activity information. Specifically, its activity-based algorithm creates a pool of alternative destinations for effective ridesharing. The work has shown its effectiveness and efficiency through comparison with trip-based approaches. Moreover, another study [38] tackled one of the ridesharing issues from the privacy point of view.

B. GEO-SOCIAL QUERIES
Social media has given researchers and developers opportunities to study geo-social queries in the past few years. In [39], a Circle of Friend Query searches for the $k - 1$ friends of a query user. The query was defined with the concept of diameter and the score of a ranking function. In another work [40], a geo-social query processing framework was proposed. The framework encompasses primitive queries, called Range Friends and Nearest Friends queries, which consider directed friends. The authors developed simple but effective algorithms for processing the queries on centralized and distributed machines.
Other fundamental geo-social queries and their processing methods on the Social Grid Index (SGI) were studied in [32], [41]. Specifically, $\ell$-Close Range Friends queries retrieve all people who are connected to an input user within a specified social boundary and lie within a spatial range. Similarly, $k$-Nearest $\ell$-Close Friend ($k\ell$-NCF) queries find the $k$ nearest neighbors among the $\ell$-hop friends of a query user. The authors developed several algorithms with an efficient update method for the SGI. Recall that one of the baselines is a group version of [32] because our social model, i.e., $k$-club, can be effectively decomposed by $k\ell$-NCF queries. Social-aware spatial algorithms have been generalized as group queries [40], [42]-[44]. Most works adopt popular graph models or make their own ranking functions for calculating and comparing the scores of every group. Additionally, a density-based spatial clustering algorithm using geo-social network data [45] was proposed. Also, geo-social group queries can be considered a special case of attributed graph or community search problems [46]-[48], which are widely exploited in the fields of medicine, science, and engineering.

C. SOCIAL-AWARE RIDESHARING
With the development of ridesharing services, the related crime rate is rising. Although an increasing number of works have adopted social-awareness to mitigate the problem, they are still at an initial stage [17]-[21].
The work [17] searches for socially aware groups for ridesharing services by proposing and processing Social-aware Ridesharing Group queries. Specifically, the trip of each ridesharing member has to be similar to that of the driver, and social-awareness is defined by the $k$-core, which is one of the graph models. According to the definition of the model, every member of the group has at least $k$ direct friends among the members. Furthermore, the authors also suggested the Social-Info R-tree, which accelerates their processing method.
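The $k$-core condition described above can be checked with a few lines of Python. This is only an illustration of the graph model, not the query processing of [17]; the function name `is_k_core_group` and the adjacency-dictionary representation are assumptions.

```python
from typing import Dict, Hashable, Iterable, Set


def is_k_core_group(adj: Dict[Hashable, Set[Hashable]],
                    group: Iterable[Hashable], k: int) -> bool:
    """True if every member has at least k direct friends inside the group."""
    members = set(group)
    return all(len(adj.get(v, set()) & members) >= k for v in members)
```

Note that this condition constrains the number of direct friends inside the group, whereas the social boundary $\ell$ used in our problem constrains the hop distance between members.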
Additionally, the research [18] concentrated on top-$k$ taxi searches in road networks. In the study, riders can select their preferred taxi among the top-$k$ taxis. In particular, the authors proposed Top-$k$ Social-aware Taxi Ridesharing queries, which return the top-$k$ taxis meeting a trip request based on their ranking function.
The study [19] also finds ridesharing groups by defining Community aware Ridesharing Group queries. Contrary to other works, the authors utilized not the friendship of users but their relationships at the community level because their objective includes avoiding privacy threats. That is, at least $m$ members in a formed group have to share at least $k$ communities. This concept is a modified $k$-core, so we put it in the same category.
Another work [20] covered a problem called Assignment of Requests to Offers for practical ridesharing in an environment similar to that of [18]. The authors proposed cost functions for evaluating social distance and common interest through an undirected social network and a modified Jaccard similarity. The problem is tackled by a linear programming method and heuristic techniques. [...] model for controlling an acceptable social bound. We summarize the related works and compare their differences in Table 5.

VII. CONCLUSION AND FUTURE WORK
Nowadays, ridesharing services are being rapidly developed owing to their benefits in many fields. However, the cohesion-related problem of ridesharing groups also arises. For cohesive ridesharing, we define a new problem, the $\ell m$-CRG query, which takes into account spatial, social, and temporal aspects. We also develop a series of algorithms for efficiently matching a driver and passengers in our framework. Furthermore, the efficiency of the proposed methods is demonstrated through experiments on several data sets.
In the future, we will study an enhanced ridesharing system that takes into account other aspects such as the shortest route and the interests of riders. Our method could be applied to various applications and environments through the study of spatial networks, obstacles on the road, and location uncertainty. Furthermore, other ridesharing models can be a promising topic for future research.

APPENDIX
SER is one of the $\ell m$-CRG query processing methods used for evaluating the performance. Its main concept is to encapsulate the R-tree and social-awareness together in a structure called the Social-Equipped R-tree (SER). For a fair comparison with the others, SER exploits the same concept of $\ell$-hop friends for social processing, and the whole process is devised similarly to the proposed method while considering the properties of [17].
The pseudo code of the main query processing is in Algorithm 5, and that of core functions is described in Algorithm 6.

ACKNOWLEDGMENT
This work is based on the dissertation of C. Shim conducted at Korea University [50].