Discovery of Loose Group Companion From Trajectory Data Streams

The widespread use of location tracking devices has generated a high volume of spatio-temporal data in the form of trajectories. Extracting useful knowledge from these trajectory data can contribute to many real-world applications, such as traffic monitoring and weather forecasting. A main task of trajectory data analysis is tracking the movement patterns of object groups. Existing algorithms that study the evolving structure of moving object trajectories have high computational complexity, particularly when tracking loose group companions. To address this problem, we describe a framework that tracks loose group companions over trajectory data streams in an incremental manner, which reduces computational time. A loose group companion is a group of moving objects that travels together, where some members are allowed to leave at some timestamps. A crucial part of our framework is micro-group based loose group companion discovery: it follows a moving object group and then incrementally detects the loose group companions. We validated our techniques using two real vehicle data sets and one synthetic data set. Our approach was, on average, 45% faster than previous algorithms.


I. INTRODUCTION
The increasing availability of location tracking technologies, including GPS, WLAN networks, Radio Frequency Identification, and mobile phones, has enabled tracking of a variety of moving objects, for example, vehicles, animals, and people. This results in large volumes of spatial and temporal data in the form of trajectories. Therefore, moving object tracking applications generally need to handle trajectories as they arrive at the server, for immediate analysis and return of results.
Past work has discussed various kinds of object group movement patterns, e.g., a group of m moving objects that travel together for k time intervals. Examples of this pattern are flocks [1], [2], convoys [3], [4], swarms [5], platoons [6], and gatherings [7], [8]. To identify these patterns, we need to find moving object clusters efficiently.
The associate editor coordinating the review of this manuscript and approving it for publication was Hocine Cherifi.
Some existing clustering algorithms find clusters from the whole trajectory database, but this leads to high computational complexity in pattern discovery. To address this problem, Tang et al. described a buddy-based clustering algorithm to accelerate traveling companion discovery. However, in this clustering algorithm, the detection of splits and merges of buddies leads to high computational complexity, because the trajectory of each moving object changes in every snapshot (timestamp) and every buddy center needs to be updated in every snapshot [9], [10]. Therefore, we developed a micro-group based clustering algorithm for discovering traveling companions that reduces running time. We have published details of this algorithm [11].
There is a wide range of applications of trajectory data mining, such as path discovery, destination prediction, and movement behavior analysis for an individual or a group of moving objects. For example, in traffic management systems, early discovery of moving object groups in vehicle trajectories can help avoid traffic congestion and can identify regular routes. A commuter can discover a traveler with the same route to share a carpool. In biology, moving object group discovery supports scientists studying animal migration. In weather forecasting, scientists can forecast the weather in advance by analyzing bird migration.
Although there are many existing applications of trajectory mining, our focus is to discover moving object group behavior in traffic management systems. In the real world, moving object trajectories change rapidly, influenced by external factors. For example, in a university, groups of students move together to their destination; at random times, some students leave the group to buy snacks. In traffic monitoring, a group of vehicles travels together along a trip; some vehicles temporarily leave the group to refuel at a petrol station or to use a car park. Given these natural behaviors of object movement, an inflexible requirement in traveling companion discovery that the same members stay together along the whole trip can miss some unusual patterns.
Tang et al. [10] introduced a loose companion as a member of a group of objects that move together for a specific period, but where the member can temporarily leave the group. Their buddy-based loose companion discovery still has high computational time complexity in the buddy-based clustering and the candidate extension. Thus, we focus on finding loose group companions efficiently from trajectory data streams. To address these problems, we adopt our micro-group based clustering and introduce pruning strategies in the candidate extension for loose group companion discovery. This paper extends an efficient traveling companion discovery framework [11] by using micro-group based clustering in the discovery of loose group companions. Our main contributions are:
• introducing loose group companion discovery over evolving trajectory data streams;
• formulating the corresponding loose group companion discovery algorithm based on the micro-group structure; and
• evaluating effectiveness and efficiency on both real and synthetic datasets.
Fig. 1 illustrates the difference between traveling companion and loose companion discovery over the data stream.
Each snapshot lasts for 1 minute. If we set the candidate size threshold δ s = 3 and the candidate duration threshold δ t = 4 minutes, then we have only three members, {o 2 , o 3 , o 4 }, as traveling companions. This means we find at least three members of the moving object group that travel together for at least 4 minutes. We miss members o 1 and o 5 because they leave temporarily. So, we set the leave time threshold δ l = 2, and let o j .τ be the period that o j leaves the group movement. Then, o 1 leaves for 1 minute (i.e., o 1 .τ ≤ δ l ) and o 5 leaves for 2 minutes (i.e., o 5 .τ ≤ δ l ), based on their distance calculation in some snapshots. After this, loose companion discovery returns five members: {o 1 , o 2 , o 3 , o 4 , o 5 }.
In the rest of this paper, we describe related work in section II. The problem formulation is given in section III, and the incremental loose group companion discovery process is discussed in section IV. The experiments are described in section V and discussed in section VI. Finally, we conclude and describe future work in section VII.

II. RELATED WORK
We review previous works on trajectory data clustering and various kinds of moving object group pattern discovery from trajectory data.

A. TRAJECTORY CLUSTERING
Lee's TRACLUS used a sub-trajectory clustering approach to detect common sub-trajectories. In essence, it first partitioned a trajectory into a set of sub-trajectory line segments and then found groups of density-connected sub-trajectories [12]. Birant et al.'s ST-DBSCAN [13], a density-based clustering algorithm based on DBSCAN [14], discovered clusters using non-spatial, spatial, and temporal properties of moving objects. Because the distribution of moving object data points changes with time, Amini et al. developed LeaDen-Stream [15]. To reduce time complexity, they chose proper micro-cluster leaders in an online phase and sent them to an offline phase to form the final clusters.
Silva et al. developed ClUstering Trajectory Stream (CUTiS), which maintains groups of moving object trajectories in an incremental structure [16]. To handle missed clusters in density peaks clustering (DPC), Du et al. [17] proposed density peaks clustering based on k nearest neighbors (DPC-KNN), which adopts the idea of k nearest neighbors (KNN) into DPC for local density computation. Moreover, Du et al. [18] also proposed a new density-adaptive metric for DPC to tackle the assumptions of local consistency (nearby points are likely to have a similar local density) and global consistency (points in the same high-density area are likely to have the same label).

B. GROUP PATTERN MINING
Many researchers have described various kinds of group patterns (flock, convoy, swarm, platoon, and gathering) to extract information from trajectory data streams. The earliest group pattern is the flock pattern [1], which captures a group of moving objects that stay within a circular region of fixed radius for consecutive timestamps. The authors of [2] sped up the discovery of flock patterns using geometric techniques, e.g., plane sweeping, binary signatures, and inverted indices. However, the circular shape of a flock cannot capture the actual moving object group. So, Jeung et al. used a convoy pattern that has an arbitrarily shaped density-based cluster; these moving object groups move together for at least k consecutive timestamps [3]. To avoid missed and invalid convoys, Yoon and Shahabi developed the Valid convoy Discovery Algorithm (VCoDA) to detect valid convoys. They first discovered partially connected convoys and then validated the density-connectedness of group members to retrieve accurate convoys [4].
However, both flocks and convoys have a rigid constraint of continuous timestamps. Li et al. developed a time-relaxed movement pattern for moving object groups, called the swarm pattern, and mined closed swarms with their ObjectGrowth method [5]. On the other hand, both the very relaxed timestamp constraint of the swarm pattern and the strict requirement of consecutive time in convoy discovery result in undesired patterns in real applications. So, Li et al. designed the platoon pattern, which considers locally consecutive time constraints [6]. Wang and Lim [19] defined a group pattern as a group of users that stay within a distance threshold of one another for at least a minimum time. To mine such group patterns, they developed AGP and VG-growth, derived from the Apriori and FP-growth algorithms. Wang et al. [20] extended this to the uncertain group pattern, a group pattern discovered from uncertain trajectories. Since the search space for group patterns is extremely large, they designed an efficient pattern mining algorithm based on pruning. Fan et al. designed a parallel framework to discover co-movement patterns as a combination of the flock, convoy, swarm, group, and platoon patterns [21].
Naserian et al. formed the loose traveling companion pattern (LTCP) from individual trajectory data, to avoid a strict requirement to retain the same members during a group's lifetime, in the area of Guangzhou Baiyun International Airport [22]. They also extended this work with further optimized algorithms to detect the LTCP more efficiently [23]. The main difference between LTCP and our approach is that members in LTCP leave as sub-groups. To detect such sub-groups, they used a hierarchical clustering method; hence, there was an obstacle to discovering LTCP with a density-based clustering algorithm. The advantage of our loose group companion over LTCP is that our approach allows members to leave as individuals or as a sub-group.
Zheng et al. introduced a gathering model, which is a congregation of moving objects that lasts for a certain period, in the form of traffic congestion [7], [8]. Zhang et al. retrieved the gathering pattern by developing a gathering and retrieving algorithm (GR) based on the Bron-Kerbosch maximum clique discovery algorithm [24]. Xian designed gathering pattern discovery in a parallel, distributed fashion with batch and streaming models [25]. To avoid the rigid requirement on the number of participants in the gathering pattern, Lan et al. designed an evolving group pattern as a dense group of moving objects and proposed a discovery framework that efficiently detects evolving groups using a sliding window technique [26]. Also, Chen et al. defined a congregate group pattern to capture various congregations from trajectory data, in the form of a dense group that allows members to join or leave the group at any time as long as some members remain [27]. Zhou et al. [28] modeled multi-behavior and single-behavior periodic mining by developing periodic pattern detection algorithms based on spatial and temporal multi-granularity considerations. Feng and Zhu surveyed trajectory data mining techniques related to group pattern behavior analysis [29]. Table 1 summarizes characteristics of the related pattern discovery approaches and our approach.

III. PROBLEM DEFINITION
In this section, we formulate the definitions of the problem addressed by our system. Table 2 lists the terms used in this paper.

Definition 1 (Trajectory Data Stream): A trajectory data stream S consists of a sequence of snapshots, S = {s 1 , s 2 , . . . , s i , . . .}.
Definition 2 (Snapshot): Each snapshot s i contains a set of moving objects, O s i = {(o 1 , x 1 , y 1 ), (o 2 , x 2 , y 2 ), . . . , (o n , x n , y n )}, where each object o n is associated with the location (x n , y n ).
Definition 3 (Micro-Group): Given a snapshot moving object set O s i , a micro-group distance threshold ε, and a micro-group size threshold γ , a micro-group g i is defined as a small group of moving objects g i = {o 1 , o 2 , . . . , o n } with one representative object o 1 (denoted R[g i ]) and member objects {o 2 , . . . , o n }, where every member object lies within distance ε of the representative object and size(g i ) ≥ γ .
Definition 4 (Snapshot Cluster): Let C s i = {c 1 s i , c 2 s i , . . . , c m s i } be the snapshot cluster set, where c j s i is the j-th cluster at snapshot s i . Each c j s i includes a set of micro-groups and objects.
Definition 5 (Loose Group Companion Candidate): Given a candidate size threshold δ s , a candidate duration threshold δ t , and a candidate leave time threshold δ l , let R = {r 1 , r 2 , . . . , r i , . . . , r v } be the loose group companion candidate set, where r i is a loose group companion candidate if:
• r i contains at least δ s density-connected members, i.e., size(r i ) ≥ δ s ;
• the members of r i are density connected among themselves for a certain time, where duration(r i ) < δ t ; and
• ∀o j ∈ r i , letting o j .τ be the period that o j leaves the group, o j .τ ≤ δ l . Here the leave period of an object is the total number of snapshots in which it leaves the group. For example, in Fig. 1, the leave period of object o 2 is 2 snapshots (minutes), i.e., o 2 .τ = 2.
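The conditions of Definition 3 can be checked directly. The following is a minimal sketch (not the authors' implementation), assuming objects are represented as (x, y) tuples and using the micro-group conditions as stated above:

```python
import math

def is_micro_group(rep, members, eps, gamma):
    """Definition 3 sketch: a micro-group is valid when every member lies
    within eps of the representative object and the group (including the
    representative) contains at least gamma objects."""
    group_size = 1 + len(members)
    if group_size < gamma:
        return False
    # Every member object must be within the distance threshold.
    return all(math.dist(rep, m) <= eps for m in members)
```

For instance, with ε = 1 and γ = 2, a representative at (0, 0) with one member at (0.5, 0) forms a valid micro-group, while a member at (2, 0) does not.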

Definition 6 (Loose Group Companion):
Given a candidate size threshold δ s , a candidate duration threshold δ t , a candidate leave time threshold δ l , and the loose group companion candidate set R = {r 1 , r 2 , . . . , r i , . . . , r v }, let Q = {q 1 , q 2 , . . . , q i , . . . , q w } be the loose group companion set, where q i is a loose group companion if:
• the duration of the loose group companion candidate r i is at least the candidate duration threshold δ t ; r i is then denoted as a loose group companion q i , i.e., duration(q i ) ≥ δ t .
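Definition 6 amounts to a duration filter over the candidate set. A minimal sketch, assuming each candidate is a dict with a "duration" field (a representation we introduce here for illustration):

```python
def promote(candidates, delta_t):
    """Definition 6 sketch: candidates whose duration has reached the
    duration threshold delta_t become loose group companions."""
    return [r for r in candidates if r["duration"] >= delta_t]
```

With δ t = 4, a candidate of duration 4 is promoted while one of duration 2 remains a candidate.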

IV. LOOSE GROUP COMPANION DISCOVERY FRAMEWORK
In this section, we describe the incremental discovery of loose group companions from the trajectory data stream. The pseudo-code is shown in Algorithm 1. Before describing the detailed steps of the algorithm, we note that, because the sampling rate in trajectory data varies, we apply linear interpolation as a preprocessing step to fill in missing points in the trajectory data stream. There are three primary phases in loose group companion discovery: 1) a clustering phase to represent moving object groups, 2) candidate extension to detect loose group companions, and 3) candidate creation. To reduce computational time in the clustering phase, we apply the micro-group based clustering algorithm that we proposed previously [11]. Moreover, to avoid the overhead of intersecting every candidate with every cluster in candidate extension, we define the following lemmas and a definition.
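The interpolation preprocessing step can be sketched as follows. This is an illustrative sketch, assuming trajectory points are (t, x, y) tuples; the paper does not specify its exact interpolation code:

```python
def interpolate_point(p_prev, p_next, t):
    """Linearly interpolate a missing (x, y) location at time t between
    two known trajectory points (t_prev, x, y) and (t_next, x, y)."""
    t0, x0, y0 = p_prev
    t1, x1, y1 = p_next
    frac = (t - t0) / (t1 - t0)  # fraction of the way from p_prev to p_next
    return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
```

For example, halfway between (0, 0, 0) and (10, 10, 10), the interpolated location at t = 5 is (5, 5).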
Lemma 1: If the number of objects remaining in r i after intersecting with one of the current snapshot clusters is less than δ s , then further intersections with the remaining clusters cannot generate any candidate of the required member size.
Proof 1: By definition, clusters do not overlap: the position of each object appears only once in a single snapshot and belongs to exactly one cluster. So, if more than size(r i ) − δ s objects appear in the intersected result, then the number of objects remaining in r i is less than δ s .
Lemma 2: Symmetrically to Lemma 1, if the number of members remaining in the cluster c i s i after intersecting with the candidate r i is less than δ s , then intersecting it with the remaining candidates also cannot generate any candidate of the required member size.
Proof 2: If more than size(c i s i ) − δ s objects appear in the intersected result, then the remaining size of the cluster c i s i is less than δ s .
Lemma 3: If the size of the intersected candidate members is less than δ s (i.e., size(v) < δ s ), then it cannot generate any candidate that satisfies δ s .
Definition 7 (Closed Cluster): For each snapshot cluster c i s i ∈ C s i , if the cluster c i s i satisfies the candidate size threshold, i.e., size(c i s i ) ≥ δ s , then c i s i is said to be a closed cluster.
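Definition 7 is a simple size filter. A minimal sketch, assuming clusters are represented as sets of object ids:

```python
def closed_clusters(clusters, delta_s):
    """Definition 7 sketch: keep only the clusters that satisfy the
    candidate size threshold delta_s."""
    return [c for c in clusters if len(c) >= delta_s]
```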
Algorithm 1 works as follows. For the first snapshot, when the new moving objects arrive, the system performs micro-group creation with Algorithm 2. To create a micro-group g i , the system randomly selects the representative object among all objects that have enough neighboring objects; that is, the number of objects in g i must be greater than or equal to γ . This process is repeated until all moving objects are marked as visited. The list U keeps the remaining objects that are not included in any micro-group.
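The micro-group creation step can be sketched as follows. This is our reading of the description above, not the authors' Algorithm 2; objects are (x, y) tuples, and the seed parameter is an assumption added for reproducibility:

```python
import math
import random

def create_micro_groups(objects, eps, gamma, seed=0):
    """Micro-group creation sketch: repeatedly pick an unvisited object as
    a candidate representative; if it has enough eps-neighbors to reach the
    size threshold gamma, form a micro-group, otherwise put it in U."""
    rng = random.Random(seed)
    unvisited = set(objects)
    groups, U = [], []
    while unvisited:
        rep = rng.choice(sorted(unvisited))  # random representative pick
        neighbors = [o for o in unvisited
                     if o != rep and math.dist(o, rep) <= eps]
        if len(neighbors) + 1 >= gamma:
            groups.append((rep, neighbors))
            unvisited -= set(neighbors) | {rep}
        else:
            U.append(rep)          # not enough neighbors: goes to list U
            unvisited.remove(rep)
    return groups, U
```

With ε = 1 and γ = 2, the objects (0, 0) and (0.5, 0) form one micro-group, while an isolated object at (5, 5) ends up in U.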
For subsequent snapshots, the system maintains the micro-group structure with Algorithm 3. First, the system gets the micro-group set from the previous snapshot. It then updates the micro-groups based on changes in their locations by detecting evolution events such as survive, appear, disappear, split, and merge.
After getting a micro-group set G s i , the system creates the clusters based on the micro-groups and the list U with Algorithm 4. There are two cases: (1) if the distance between the representative objects of two micro-groups is no more than the sum of their radii, then these micro-groups become members of the same cluster, as shown in Fig. 2(a); the radius of each micro-group is the distance from the representative object to its farthest member object. (2) if the distance between a member object o j of a micro-group and an object o k in the list U is ≤ ε, then the system adds o k to the cluster, as shown in Fig. 2(b). In this way, the system discovers clusters in the form of a density-connected structure.
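Condition (1), the micro-group merging test, can be sketched directly from the description above. This is an illustrative sketch, assuming representatives and members are (x, y) tuples:

```python
import math

def micro_groups_connected(rep_a, members_a, rep_b, members_b):
    """Condition (1) sketch: two micro-groups join the same cluster when
    the distance between their representatives is at most the sum of their
    radii (radius = distance from the representative to its farthest
    member object)."""
    radius_a = max((math.dist(rep_a, m) for m in members_a), default=0.0)
    radius_b = max((math.dist(rep_b, m) for m in members_b), default=0.0)
    return math.dist(rep_a, rep_b) <= radius_a + radius_b
```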
The candidate extension phase is performed for consecutive snapshots. For the current snapshot s i , the system first gets the loose group companion candidate set R from R new of the previous snapshot s i−1 (line 8). It then performs candidate extension by intersecting the candidates with

Algorithm 4 Micro-Group Based Clustering
Input: G s i , U , ε
Output: C s i
1 while the micro-group g i ∈ G s i is unvisited do
2   initialize the cluster c j s i with g i and set g i as visited;
3   for each g k ∈ G s i do
4     if dist(R[g i ], R[g k ]) ≤ radius(g i ) + radius(g k ) then
5       add g k to c j s i and set g k as visited;

the current snapshot clusters from C s i (lines 9-25). To reduce time complexity, the algorithm intersects only the candidates and clusters that satisfy δ s (lines 10-15). The intersected members are assigned to a temporary list, v. If there are enough common members, i.e., size(v) ≥ δ s , then a new candidate r i is derived by updating the leave periods of any member objects that leave the group and updating the duration (lines 16-19). If the new candidate r i satisfies δ t , then it is labeled as a loose group companion q i and added to the loose group companion set, Q (lines 20-22). Otherwise, r i is added to the temporary candidate list, R new , for the candidate extension in the next snapshot. The final phase is candidate creation (lines 26-29). For the first snapshot, the system initializes all closed clusters (Definition 7) as loose group companion candidates. For subsequent snapshots, the system creates loose group companion candidates from the closed clusters that are not extensions of existing candidates.
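The candidate extension with pruning can be sketched as follows. This is a simplified sketch of our reading of the step: candidates and clusters are sets of object ids; durations, leave-time bookkeeping, and the symmetric pruning of Lemma 2 are omitted for brevity:

```python
def extend_candidates(candidates, clusters, delta_s):
    """Candidate extension sketch with pruning: intersect each candidate
    with the snapshot clusters, skipping work that cannot reach the
    candidate size threshold delta_s."""
    extended = []
    for r in candidates:
        remaining_r = set(r)
        for c in clusters:
            if len(remaining_r) < delta_s:
                break                  # Lemma 1: skip remaining clusters
            if len(c) < delta_s:
                continue               # only closed clusters qualify
            v = remaining_r & set(c)   # common members
            remaining_r -= v           # clusters do not overlap
            if len(v) >= delta_s:      # Lemma 3 check
                extended.append(v)
    return extended
```

For example, a candidate {1,...,6} intersected with clusters {1, 2, 3, 7} and {4, 5, 6, 8} at δ s = 3 yields two extended candidates, {1, 2, 3} and {4, 5, 6}.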
Example 1: Let δ s = 5, δ t = 3, δ l = 1, and γ = 2. The running procedure of incremental loose group companion discovery based on Algorithm 1 is illustrated in Fig. 3. Each snapshot lasts for 1 minute. In the first snapshot s 1 , the algorithm finds the cluster c 1 s 1 using Algorithm 2 and initializes it as a new loose group companion candidate r 1 (candidate creation).
At snapshot s 2 , the system maintains the clusters using Algorithm 3, gets two snapshot clusters c 1 s 2 and c 2 s 2 , and performs the candidate extension with them. First, the candidate r 1 is intersected with c 1 s 2 . Since the sizes of r 1 and c 1 s 2 are at least δ s (i.e., size(r 1 ) = 8 ≥ δ s and size(c 1 s 2 ) = 5 ≥ δ s ), the intersection is performed between them, resulting in v = {g 2 , g 3 , o 7 }. Its member size is at least δ s (i.e., size(v) = 5 ≥ δ s ). Then, the system updates r 1 as a new candidate by updating its duration to 2 minutes and the leave times of the group members (i.e., g 1 .τ = 1 and o 8 .τ = 1), and retains them in r 1 since their leave times are at most δ l . For the intersection of r 1 and c 2 s 2 : in our system, clusters do not overlap; in essence, each object appears only once in a single snapshot and belongs to exactly one cluster. The system removes the members of the intersected result v from r 1 (i.e., r 1 − v = {g 1 , o 8 }), since the members in the intersection cannot appear again elsewhere. To avoid an unnecessary intersection between r 1 and c 2 s 2 , the size of the remaining members r 1 = {g 1 , o 8 } is validated. Since size(r 1 ) < δ s , the intersection with r 1 cannot generate any candidate of the required size (Lemma 1). Hence, the system skips the intersection between r 1 and c 2 s 2 . Then, the cluster c 2 s 2 is initialized as a new candidate r 2 . Both new candidates r 1 and r 2 are kept in R new for the next snapshot.
In the next snapshot s 3 , the system first gets the snapshot cluster set C s 3 = {c 1 s 3 , c 2 s 3 } and the candidate set R = {r 1 , r 2 }. The micro-group g 1 and the member o 8 rejoin the cluster c 1 s 3 . First, r 1 is intersected with c 1 s 3 , giving v = {g 1 , g 2 , g 3 , o 7 , o 8 }. Then, it is updated as a new candidate r 1 . At this point, according to the duration threshold δ t , the candidate r 1 is output as a loose group companion q 1 . As in the previous snapshot, the system skips the intersection with c 2 s 3 .
In the case of r 2 , it would ordinarily be intersected with both c 1 s 3 and c 2 s 3 . Nevertheless, most of the members of c 1 s 3 already appear in the intersected result v. After removing the intersected members from c 1 s 3 (i.e., c 1 s 3 − v = ∅), the number of remaining members of c 1 s 3 is less than δ s (Lemma 2). So, the system skips this intersection to reduce running time. Next, the system performs the intersection between r 2 and c 2 s 3 and obtains an updated r 2 by extending its duration.

V. EXPERIMENT
In this section, we evaluate the efficiency and effectiveness of our system against the baseline pattern discovery methods explained below.
Dataset and Parameter Settings: We used two real trajectory datasets, Trucks (D1) [30] and T-Drive (D3) [31], and one synthetic dataset (D2) [32], with parameters listed in Table 3. 'Obj#' indicates the number of moving objects, and 'Record#' refers to the number of GPS points (i.e., locations). D1 contains 63,794 GPS points of 50 trucks. The default parameter values are selected for high effectiveness and acceptable efficiency. The micro-group distance threshold, ε, is set based on the gathering pattern discovery [7] that uses the T-Drive dataset; that dataset is also taxicab trajectory data, collected around the Beijing metropolitan area.
Environment: Experiments used an Intel Core i7-6500 CPU at 3.59 GHz with 8.00 GB of RAM and the Windows 10 operating system. All algorithms were implemented in Java using Eclipse Kepler with JDK 1.7.

A. EFFECTIVENESS COMPARISON
In this section, we evaluate our algorithm's effectiveness on the discovered loose group companions, using precision and recall. The retrieved results of SC are used as ground truth, since SC is based on the commonly used baseline DBSCAN clustering algorithm. Here, precision and recall are defined as follows: Precision: the proportion of true group companions over all retrieved results of the algorithm.
Recall: the proportion of true group companions over the ground truth.
The evaluation was conducted by varying the candidate size threshold (δ s ) and candidate duration threshold (δ t ) over the real dataset D1. Since TC is a sub-trajectory clustering method, it is not affected by the values of δ s and δ t , so we did not include it in the effectiveness comparison. Fig. 4 shows the effectiveness (i.e., precision and recall) comparison based on δ s . The precision and recall of MU and BU are very close, since the companion groups that they output are nearly identical. They improved on SW by ∼0.4 in precision and ∼0.5 in recall, and on CI by ∼0.3 in precision and ∼0.2 in recall. More than 40% of CI's loose group companions were useless, because CI includes many redundant groups and cannot count group companions accurately, since it does not use a candidate pruning strategy. SW generates closed swarm patterns, i.e., groups of objects moving together that frequently meet at specific times. Moreover, about half of the discovered closed swarms were not output as loose group companions; thus SW reports <0.5 in precision and recall.
At δ s = 10, the number of loose group companions of MU is 205, and the number of loose group companions of SC, used as ground truth, is 154. The true companions are the groups common to both; there are 144 of them. The precision of MU is thus 0.702 (i.e., 144/205), and the recall of MU is 0.935 (i.e., 144/154). Fig. 5 shows precision and recall under variation of δ t . As δ t increased, precision also increased for all algorithms.
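These figures follow directly from the definitions in this section; a quick check with the reported counts (205 retrieved, 154 in the ground truth, 144 in common):

```python
def precision_recall(n_retrieved, n_ground_truth, n_true):
    """Precision: true companions over all retrieved results.
    Recall: true companions over the ground truth."""
    return n_true / n_retrieved, n_true / n_ground_truth

# MU at delta_s = 10, using the counts reported in the text.
p, r = precision_recall(205, 154, 144)
```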
The reason is that true loose group companions travel together for a long time. Overall, for precision, MU and BU were better than CI and SW. For recall, CI was nearly identical to MU and BU, because CI also retrieved the same true companions; thus the variation of δ t had little effect on the resulting companions. For SW, recall increased gradually with δ t : SW could output the true loose group companions only when we set very large δ t values. Nevertheless, its recall was still low compared with the others.

B. EFFICIENCY IN CLUSTERING
The efficiency of our micro-group based clustering (MC) method was compared with the buddy-based clustering algorithm (BC) [9], [10], TraClus (TC) [12], and DBSCAN (DC) [14]. In BC, the buddy center is updated separately with every object entering or leaving; the splitting and merging of buddies in evolving trajectories makes BC computationally costly. In DC, the reconstruction of the index for each object in each snapshot causes the computational cost. In this evaluation, the two parameter values for DC (i.e., the maximum neighborhood radius, eps, and the minimum number of points in an eps-neighborhood, minPts) are set to the same values as ε and γ . Since TC cannot handle the trajectories incrementally, it discovered the cluster groups from the whole database. Fig. 6 compares the running time of all clustering methods on the different datasets. Overall, the running time of MC is the lowest on all datasets.

C. EFFICIENCY IN PATTERN DISCOVERY
To compare efficiency in pattern discovery, we evaluated running times and space costs for all datasets. The running times for SW and TC are the cost of the pattern discovery process over the whole database, because these methods cannot output results incrementally. The space cost is measured as the number of objects (#) included in the candidate sets. The results for TC are therefore not included in the space comparison, because it can only group the sub-trajectories and does not store any candidates for pattern discovery. Fig. 7 depicts the efficiency (time and space) for all datasets under the default settings. Overall, SW's cost is the highest, because the ObjectGrowth algorithm discovers the pattern from the whole database, i.e., not in an incremental manner. CI's running time is higher than SC's because the intersection is computed for every pair of candidates and clusters in the candidate extension.
However, the DBSCAN clustering algorithm used in SC is the slowest component. Thus, Tang et al. developed a buddy-based clustering algorithm and applied it in the discovery of loose companions (BU) [10]. The improvement of BU over SC is shown in Fig. 7. Due to the evolving nature of the trajectories, BU still has a long running time. Therefore, in MU, we tracked the loose group companions using our micro-group based clustering approach and the pruning strategies in candidate extension. Compared with BU, MU reduced running times by an average of 45% over all datasets.

D. EFFECT OF CANDIDATE SIZE THRESHOLD
Fig. 8 compares running time and space usage under variation of the candidate size threshold, δ s , on datasets D1 and D3. As δ s increased, the running time of all algorithms declined, because fewer clusters qualified as candidates; the exception is TC, whose time is not affected by δ s . As δ s increased, the number of intersections between clusters and candidates was reduced, so the cluster and candidate matching mechanism became more effective. In SW, even though the number of clusters was reduced, the running time decreased only slightly, because full tracking of the ObjectGrowth pattern is slow. On the other hand, increasing δ s directly affected the candidate size; thus, the space to store the candidates was also significantly reduced as δ s increased. In summary, our MU used less time and space than the baseline pattern discovery methods.

E. EFFECT OF CANDIDATE DURATION THRESHOLD
We also investigated the influence of the candidate duration threshold, δ t , on running time and space. Fig. 9 shows the effects of δ t on datasets D1 and D3. The running time increased gradually with δ t , because candidate members are maintained for a longer time, i.e., until they reach δ t . Across the range of δ t , the time and space of MU were the lowest.
F. EFFECT OF CANDIDATE LEAVE TIME THRESHOLD
Fig. 10 depicts the effect of the candidate leave time threshold, δ l , on time and space for D1 and D3. SW and TC were not included in this evaluation, since SW does not allow members to leave the group and TC can only discover clusters. Under δ l , a member may leave temporarily; objects in the candidate set are then not removed immediately but are maintained in a buffer until they violate δ l . So, Fig. 10 shows that as δ l increased, time and space also increased. With increased δ l , our MU approach still performed better than the baselines.

VI. DISCUSSION

A. PARAMETER SETTINGS
In the previous section, we described extensive experiments demonstrating the efficiency of our approach. Parameter value selection plays an essential role in the evaluation of both efficiency and effectiveness. So, in this section, we explain how to set the value of each threshold parameter. There are three fundamental thresholds: 1) the candidate size threshold (δ s ), 2) the candidate duration threshold (δ t ), and 3) the candidate leave time threshold (δ l ).

1) CANDIDATE SIZE THRESHOLD (δ s )
We set the range of δ s based on two primary considerations: first, the size of the groups we want to handle; second, the quality of the groups (i.e., effectiveness) we want to achieve. In Fig. 4, δ s = 15 gives high precision and recall, so we selected this value as the default for δ s . Note that this parameter affects the performance of all discovery approaches: as δ s increased, both time and space decreased significantly.

2) CANDIDATE DURATION THRESHOLD (δ t )
We chose this parameter to control the 'coherence' of groups, i.e., how long a group must stay together. Generally speaking, δ t did not significantly affect time or space for any algorithm. However, setting δ t too short led to low precision, as shown in Fig. 5, making it challenging to detect loose group companions: most group members moved together for a long time and only left temporarily at particular times. Based on this, we chose the default δ t = 15, which achieved acceptable precision and recall.

3) CANDIDATE LEAVE TIME THRESHOLD (δ l )
If we set the leave time threshold δ l = 0, members are not allowed to leave the group at any time, and the result is a strict traveling companion. If the leave time threshold is too large, the loose companion is not tracked accurately: in essence, a member may temporarily join another group and return later. Thus, increasing δ l degraded the quality of the algorithm's results.

B. DATA SETTING
In real applications, moving objects report their positions at different timestamps. These differing timestamps make it difficult to track objects meaningfully and accurately as loose group companions. So, we set a fixed time interval and generated a snapshot for each time step, producing equal-length snapshots. For example, if an object reports its position to the server every 10 seconds, then we set each snapshot to 10 seconds.
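The snapshot construction above can be sketched as a simple bucketing step. This is an illustrative sketch under assumed names: the record format `(object_id, timestamp, x, y)` and the function `to_snapshots` are not from the paper.

```python
from collections import defaultdict

def to_snapshots(reports, snapshot_len):
    """Bucket raw (object_id, timestamp, x, y) reports into fixed-length snapshots.

    Returns {snapshot_index: {object_id: [(x, y), ...]}}. An object may end up
    with zero or several points in a snapshot; those cases are handled separately
    (missing points and point redundancy).
    """
    snapshots = defaultdict(lambda: defaultdict(list))
    for obj, t, x, y in reports:
        idx = int(t // snapshot_len)   # which snapshot this report falls into
        snapshots[idx][obj].append((x, y))
    return snapshots

# usage: 10-second snapshots, as in the example above
snaps = to_snapshots([(1, 3, 0.0, 0.0), (1, 12, 1.0, 1.0), (2, 5, 2.0, 2.0)], 10)
```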
In our experimental evaluation, the sampling rate differed across datasets. The truck dataset consists of 50 concrete delivery trucks heading to construction sites around the Athens metropolitan area in Greece; some trucks reported their positions every 30 seconds and some every minute, so we set one minute per snapshot. The T-Drive dataset was collected from GPS devices on taxis in Beijing, China, with sampling times varying from 3 to 10 minutes; we set 5 minutes per snapshot. The synthetic dataset was generated with the GSTD data generator [32] at one-minute intervals.
With a fixed period per snapshot, point redundancy and missing point problems occurred in some snapshots. Point redundancy occurred when a single object reported multiple positions within one snapshot; to resolve it, we computed a common point by averaging the reported points. Objects also sometimes failed to report their positions due to lost connections or device faults; to handle these missing points, we used linear interpolation to fill the gaps. Fig. 11 shows an example of redundant and missing points occurring in the trajectories of three objects over three snapshots.
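The two repairs described above, averaging redundant points and linearly interpolating missing ones, can be sketched as follows. The function names and the 2-D point representation are assumptions for illustration only.

```python
def average_point(points):
    """Collapse multiple reports within one snapshot into a single mean point."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def interpolate_gap(p_prev, p_next, k, n):
    """Linearly interpolate the k-th of n missing snapshots (1 <= k <= n)
    between the last known point p_prev and the next known point p_next."""
    frac = k / (n + 1)
    return (p_prev[0] + frac * (p_next[0] - p_prev[0]),
            p_prev[1] + frac * (p_next[1] - p_prev[1]))

# usage: two redundant reports collapse to their mean;
# one missing snapshot between (0, 0) and (4, 4) is filled at the midpoint
merged = average_point([(0.0, 0.0), (2.0, 2.0)])
filled = interpolate_gap((0.0, 0.0), (4.0, 4.0), 1, 1)
```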

VII. CONCLUSION AND FUTURE WORK
We described an improved strategy for tracking loose group companions that avoids the strict requirement that members stay in a single group. Since running the whole discovery process from the database can incur high computational complexity, we designed our algorithms to operate incrementally. Moreover, due to the evolving nature of the trajectory data stream, the cost of discovering loose companions with the baseline discovery algorithms remained high. We therefore first developed a micro-group based clustering method to reduce the time complexity of the clustering phase; it outperformed the baselines when moving objects frequently changed their locations.
We then defined a new micro-group based loose companion discovery algorithm with candidate pruning strategies, which further improved the efficiency of loose group companion discovery. As a result, our micro-group based discovery was about 45% faster than Tang et al.'s buddy-based loose companion discovery.
In traffic monitoring systems, this research helps authorities understand the routes of groups of vehicles and avoid traffic congestion by detecting loose group companions in advance. In future work, we will investigate more challenging moving object group patterns drawn from real-world traffic scenarios for use in traffic monitoring systems.