Efficient Retrieval of Top-k Weighted Triangles on Static and Dynamic Spatial Data

Due to the proliferation of location-based services, spatial data analysis becomes more and more important. We consider graphs consisting of spatial points, where each point has edges to its nearby points and the weight of each edge is the distance between the corresponding points, as they have been receiving attention as spatial data analysis tools. We focus on triangles in such graphs and address the problem of retrieving the top-k weighted spatial triangles. This problem is computationally challenging, because the number of triangles in a graph is generally huge and enumerating all of them is not feasible. To overcome this challenge, we propose an algorithm that returns the exact result efficiently. We moreover consider two dynamic data models: (i) fully dynamic data that allow arbitrary point insertions and deletions and (ii) streaming data in a sliding-window model. They often appear in location-based services. The results of our experiments on real datasets show the efficiency of our algorithms for static and dynamic data.


A. MOTIVATION AND CHALLENGE
Given a set P of spatial points and a distance threshold r, a spatial neighbor graph of P consists of a set of vertices that correspond to the points in P and a set of edges, where an edge is created between two points iff the distance between them is not larger than r, and the weight of this edge is that distance. Graph-based structures provide intuitive relationships between spatial points, so techniques that mine patterns (i.e., sub-graphs) from spatial neighbor graphs are often required. In graph contexts, triangles receive particular attention, as the triangle is one of the simplest yet most important primitive sub-graph patterns and has many applications [17], [22]. For instance, spatial triangles can be utilized in group search [14], co-location pattern mining [34], and urban planning [12], [13]. Note that the number of triangles in a spatial neighbor graph is generally huge. It is not feasible to enumerate all of them, and the output size should be controllable (by a user-specified parameter k) [13], [17]. In spatial databases, given a subset of points in P, the cohesiveness of the subset is a factor in measuring its importance [14], [34].
The above applications and observations motivate us to address the problem of retrieving the top-k weighted spatial triangles. The weight of the triangle formed by points p_x, p_y, and p_z is defined as dist(p_x, p_y) + dist(p_y, p_z) + dist(p_x, p_z), where dist(·, ·) measures the Euclidean distance between two points; this weight takes the cohesiveness into account. Then, given P and k, this problem retrieves the k spatial triangles with the minimum weight among all triangles in the spatial neighbor graph of P. For example, this problem formulation yields the following observation:
EXAMPLE 1. We solved the above problem on a real dataset Places, a set of POIs in the U.S.A., by setting k = 100, and found some co-location patterns. First, we observed an intuitive pattern: industrial and precision manufacturing facilities exist near machine shops. Second, we observed that ⟨dentist, psychologist, consultant⟩ appears multiple times in the top-k triangles, suggesting that (psychological) consulting services tend to exist near clinics (or hospitals). In addition, we found that consultant services tend to exist near capital and risk management services, such as investment and stocks, in the top-k result.
VOLUME 4, 2016
As seen above, this problem helps analysts and experts mine (i) relationships between points and (ii) patterns/knowledge hidden in spatial datasets, and it also helps them decide where to open a new store (or service).
However, this problem is computationally challenging. A straightforward approach is to enumerate all triangles and then output the k triangles with the minimum weight. The number of triangles in the spatial neighbor graph can grow cubically with the dataset size (up to n(n − 1)(n − 2)/6 triangles for n points), suggesting the infeasibility of this approach. To alleviate this cost, DHL [17], which is a heuristic algorithm and was proposed originally for graph databases, can be used. DHL needs to sort edges in order of weight, because it greedily accesses the edges in this order to avoid enumerating triangles with large weights. However, if we employ DHL, we face the substantial time incurred by sorting the large number of edges of the spatial neighbor graph of P.

B. CONTRIBUTION
To solve the above issues, we propose an efficient algorithm that returns the exact answer. We observe that a subgraph of the spatial neighbor graph, which usually contains the top-k weighted triangles, can be built offline. From this partial graph, for each point p ∈ P, we can obtain a small-weight triangle containing p in O(1) time offline. These n triangles provide a tight threshold for the top-k result, which helps filter unnecessary points and triangles, accelerating online computation. Thanks to these observations, our algorithm needs neither to build the exact spatial neighbor graph nor to sort all edges.
We moreover consider insertions of new points and deletions of existing points, because this case often appears in location-based services [2], [10]. In this case, the top-k result may change, thus we need to efficiently update the result whenever we have an update (insertion or deletion). We show that our filtering idea for static data is still effective for dynamic data. Furthermore, we consider a sliding-window model for applications that focus only on recently generated points [2], [3], [19], [20]. We also design an efficient and exact algorithm for this case.
We summarize our main contributions below.
• We address the problem of retrieving the top-k weighted spatial triangles. To the best of our knowledge, this is the first work to tackle this problem in spatial databases.
• We propose a simple yet efficient algorithm for solving this problem exactly.
• We show how to deal with fully dynamic data to efficiently update the top-k result.
• We design an efficient and exact algorithm for monitoring the top-k result under a sliding-window model.
• We conduct experiments on real datasets, and the results show that (i) our solution for static data is up to three orders of magnitude faster than a baseline algorithm and (ii) our solutions for dynamic data can quickly update the top-k result.
This article significantly extends our conference paper [26]. Compared with that paper, this article provides
• more detailed explanations of our solution for static data with examples and pseudocode,
• an exact algorithm for fully dynamic data,
• an exact algorithm for streaming data in a sliding-window model,
• detailed performance statistics of our solution for static data,
• experimental results of our solutions for dynamic data, and
• a survey of related work.

C. ORGANIZATION
The rest of this article is organized as follows. Section II introduces preliminary information. We present our solutions for static data, fully dynamic data, and sliding-window data, in Sections III, IV, and V, respectively. We report our experimental results in Section VI. We review related work in Section VII. Finally, in Section VIII, we conclude this article.

II. PROBLEM DEFINITION
Let P be a set of spatial (or geo-location) points in a Euclidean space. A spatial point p ∈ P has 2-dimensional coordinates in R^2. The Euclidean distance between p and p′ is denoted by dist(p, p′). Given a distance threshold r, we can build a spatial neighbor graph of P defined below:
DEFINITION 1 (SPATIAL NEIGHBOR GRAPH). Given a set P of points and a distance threshold r, the spatial neighbor graph of P is an undirected graph consisting of a set of vertices that correspond to the points in P and a set of edges where an edge is created between p_i and p_j iff dist(p_i, p_j) ≤ r. The edge between p_i and p_j is represented as e_{i,j} and has a weight w(e_{i,j}) where w(e_{i,j}) = dist(p_i, p_j).
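Definition 1 can be illustrated with a minimal brute-force sketch. The function name `spatial_neighbor_graph` and the sample coordinates are hypothetical, and the quadratic pair scan is only meant to make the definition concrete; it is not the construction used by the proposed algorithm.

```python
from itertools import combinations
from math import dist


def spatial_neighbor_graph(points, r):
    """Return {(i, j): weight} for every pair i < j with dist(p_i, p_j) <= r."""
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if d <= r:  # edge condition of Definition 1
            edges[(i, j)] = d  # the edge weight is the Euclidean distance
    return edges


# Hypothetical toy input: three nearby points and one far-away point.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (10.0, 10.0)]
edges = spatial_neighbor_graph(points, r=6.0)
print(sorted(edges))
```

With these inputs, only the three pairs among the first three points are within distance r, so the far-away point stays isolated.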
In the spatial neighbor graph, there are triangles consisting of three points fully connected to each other. We define their weight:
DEFINITION 2 (WEIGHT OF A TRIANGLE). Given a triangle △_{x,y,z} consisting of three points p_x, p_y, and p_z, the weight of this triangle, w(△_{x,y,z}), is:
w(△_{x,y,z}) = dist(p_x, p_y) + dist(p_y, p_z) + dist(p_x, p_z). (1)
Section III addresses the problem defined as follows:
DEFINITION 3 (TOP-K WEIGHTED TRIANGLE RETRIEVAL PROBLEM). Given a set P of points, an output size k, and a distance threshold r, this problem is to retrieve at most k triangles in the spatial neighbor graph of P with the minimum weight.

TABLE 1. Notations used frequently in this article.
Notation          Meaning
p                 a 2-dimensional (geo-spatial) point
P                 a set of n points
dist(p, p′)       the Euclidean distance between p and p′
△_{x,y,z}         the triangle formed by p_x, p_y, and p_z
w(△_{x,y,z})      the weight of △_{x,y,z}
k                 a result size
r                 a distance threshold
B                 a batch size
T                 a set of triangles
τ                 a weight threshold
θ                 an edge weight threshold
R                 a top-k result set
N(p)              a set of neighbors of p
W                 a sliding window size
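For reference, the straightforward baseline implied by Definitions 2 and 3 can be sketched as follows. This is the enumerate-everything approach that the article argues is infeasible at scale; the function name and sample data are hypothetical.

```python
import heapq
from itertools import combinations
from math import dist


def topk_triangles_bruteforce(points, k, r):
    """Enumerate all triangles of the spatial neighbor graph, keep the k lightest."""
    triangles = []
    for x, y, z in combinations(range(len(points)), 3):
        dxy = dist(points[x], points[y])
        dyz = dist(points[y], points[z])
        dxz = dist(points[x], points[z])
        if max(dxy, dyz, dxz) <= r:  # all three edges must exist in the graph
            triangles.append((dxy + dyz + dxz, (x, y, z)))  # Equation (1)
    return heapq.nsmallest(k, triangles)  # at most k triangles


# Hypothetical toy input with two well-separated clusters.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(topk_triangles_bruteforce(pts, k=1, r=6.0))
```

Because Definition 3 asks for at most k triangles, the function simply returns fewer when the graph contains fewer than k triangles.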
We assume that r is reasonably specified so that there are many triangles in the graph. When P is a dynamic set of spatial points, the top-k result has to be updated. This problem, which we address in Section IV, is formally defined as follows:
DEFINITION 4. Given a dynamic set P of points, an output size k, and a distance threshold r, this problem is to monitor (or update) at most k triangles in the spatial neighbor graph of P with the minimum weight, whenever P has updates (insertions and/or deletions of points).
Last, when P is a set of streaming points, a sliding-window model, which takes only the most recent W points into account, is usually employed [2], [3], [19], [20]. Section V assumes this case and addresses the following problem.
DEFINITION 5 (TOP-K WEIGHTED TRIANGLE MONITORING PROBLEM ON A SLIDING-WINDOW MODEL). Given a set P of streaming points, an output size k, a window size W, and a distance threshold r, this problem is to monitor (or update) at most k triangles in the spatial neighbor graph of P_W with the minimum weight, where P_W contains the W most recently generated points in P.
Table 1 summarizes the notations used frequently in this article.

III. OUR SOLUTION FOR STATIC DATA
This section presents our proposed solution. Section III-A introduces our main idea. In Sections III-B and III-C, we detail our offline and online algorithms, respectively.

A. MAIN IDEA
To efficiently output the result, pruning points that do not contribute to the top-k result is important. Assume that triangle △_{x,y,z} is included in the top-k result. From Equation (1) and Definition 3, it is intuitively seen that, for p_x, the points p_y and p_z would be (two of) the t nearest neighbors (t-NNs) of p_x, where t is a small constant. This suggests that the top-k triangles can be retrieved from the t-NN graph and that building the exact spatial neighbor graph of P is not necessary. In other words, we can obtain the result from a graph that is sparser than the spatial neighbor graph.
Now assume that we have the t-NN graph of P. Then, for each p ∈ P, we can enumerate offline a promising triangle having p, i.e., the triangle formed by p and its 2 nearest neighbors. Even if these triangles are not included in the top-k result, they have small weights, leading to a tight threshold for online computation that helps prune unnecessary points (and triangles). Our algorithm is designed based on the above ideas and consists of a one-time offline computation and an online computation.

B. OFFLINE PROCESSING
Algorithm 1 describes our offline algorithm. The objectives of this offline processing are to (i) build a B-NN graph of P, where B ≥ 3 is a batch size, and (ii) enumerate triangles with small weights. The batch size B is tuned empirically, and we show in Section VI-A that a small constant (e.g., 10) is enough. We use p.E to denote the set of edges held by a point p ∈ P. Given P and B, for each p_x ∈ P, we compute the B-NNs of p_x in P \ {p_x} by using a kd-tree [6]. The B-NNs are maintained in p_x.E and sorted in ascending order of weight (i.e., distance). Moreover, for each p_x ∈ P, we compute the triangle △_{x,y,z}, where p_y and p_z are respectively the NN and 2-NN of p_x. This triangle is maintained in T, so T has at most n triangles (we remove duplicated triangles). Last, we sort the triangles in T in ascending order of weight.
Remark. Our offline algorithm needs O(n^{1.5}) time [26]. Let s_avg be the average number of edges held by each point. Building the spatial neighbor graph of P incurs O(n(√n + s_avg)) time. Our offline algorithm is hence cheaper, and it is general to any k and r.

Algorithm 1 OFFLINE PROCESSING
Require: P (set of points) and B (batch size)
1: T ← ∅ // a set of triangles
2: for each p_x ∈ P do
3:   p_x.E ← B-NNs of p_x in P \ {p_x} // edges are sorted in ascending order of weight
4:   T ← T ∪ {△_{x,y,z}} where p_y and p_z are respectively the NN and 2-NN of p_x
5: end for
6: Sort the triangles △ ∈ T in ascending order of w(△)
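The offline phase can be sketched as below. This is a minimal illustration under stated assumptions: a brute-force sort stands in for the kd-tree search that the article uses, the input has at least three points, and the names `offline_processing`, `nn`, and `pts` are hypothetical.

```python
from math import dist


def offline_processing(points, B):
    """Build sorted B-NN lists and one seed triangle per point (B >= 2 here)."""
    n = len(points)
    nn = {}     # nn[x] = [(weight, neighbor), ...] in ascending order of weight
    seeds = {}  # canonical triangle -> weight; the dict removes duplicates
    for x in range(n):
        # Brute-force neighbor search; the article uses a kd-tree instead.
        ranked = sorted((dist(points[x], points[y]), y) for y in range(n) if y != x)
        nn[x] = ranked[:B]
        (dxy, y), (dxz, z) = nn[x][0], nn[x][1]  # NN and 2-NN of p_x
        seeds[tuple(sorted((x, y, z)))] = dxy + dxz + dist(points[y], points[z])
    T = sorted((w, tri) for tri, w in seeds.items())  # ascending order of weight
    return nn, T


pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
nn, T = offline_processing(pts, B=3)
print(T)
```

Note that the seed triangles are not checked against r at this stage, which is what keeps the offline output reusable for any k and r; the r check happens online.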

C. ONLINE PROCESSING
To efficiently retrieve the top-k weighted spatial triangles, we consider the edge access order. Let τ be an intermediate threshold of the top-k result (i.e., the weight of the k-th triangle in the intermediate result). From τ and the triangle inequality, we can obtain an edge weight threshold θ that every edge of a top-k weighted spatial triangle has to satisfy. That is, any triangle that has an edge with weight larger than θ does not have to be enumerated. We exploit this observation along with the triangles in T and the B-NN graph obtained offline.
Algorithm 2 overviews our online algorithm. Let P_cand be the set of points that may form top-k triangles, with P_cand = P at initialization. Our online algorithm has the following steps:
1) We first initialize the top-k result R and the threshold τ from the n triangles obtained offline in DETERMINE-THRESHOLD(P_cand, r). Then, from τ, we compute a threshold θ for edges. As seen later, any edges with weights larger than θ cannot form top-k triangles.
2) (If necessary, we update the B-NN graph by increasing B.) In REDUCE-CANDIDATES(P_cand, i, θ), we remove from P_cand the points that no longer have edges satisfying θ.
3) For each point in P_cand, we additionally enumerate triangles that could be in the top-k result and update R if necessary.
4) We repeat steps 2 and 3 until P_cand = ∅, and then R is returned.
• Step 1. Recall that T is a sorted set of triangles obtained offline. Each triangle in T is formed by a point p, its NN, and its 2-NN. (We remove all triangles in T that have edges with weights larger than r.) In DETERMINE-THRESHOLD(P_cand, r), we initialize R by the first k triangles in T, and τ is the weight of the k-th triangle. Let △_{x,y,z} be the k-th triangle, i.e., τ = w(△_{x,y,z}). We then compute the threshold θ for edges from τ; θ is used in the next step.
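As a worked illustration of why an edge threshold can be derived from τ, the following uses only the triangle inequality and Equation (1). It shows a sufficient bound of τ/2; treating this as the value of θ is an assumption of this sketch, since the exact expression used by the authors is given in [26].

```latex
% Triangle inequality for the edge e_{y,z} of a triangle with vertices p_x, p_y, p_z:
\begin{align*}
w(e_{y,z}) &\le w(e_{x,y}) + w(e_{x,z}) \\
\Rightarrow\quad w(\triangle_{x,y,z})
  &= w(e_{x,y}) + w(e_{x,z}) + w(e_{y,z}) \;\ge\; 2\, w(e_{y,z}).
\end{align*}
% By symmetry the same bound holds for every edge e of the triangle, hence
\[
  w(e) > \frac{\tau}{2} \;\Longrightarrow\; w(\triangle_{x,y,z}) > \tau .
\]
```

So a triangle containing any edge heavier than τ/2 cannot beat the current k-th triangle, which is the kind of pruning condition that θ provides.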
Algorithm 2 ONLINE PROCESSING
Require: P (set of points), k (output size), r (distance threshold), B (batch size), and T (a set of triangles)
Ensure: R (set of k triangles with the minimum weight)
1 R ← ENUMERATE-TRIANGLES(P_cand, r, i)
15: Execute lines 4-6
16:

• Step 2. We next filter unnecessary points in P_cand by using θ. Let p_{x_j} be the j-th NN of p_x. Consider the i-th iteration of REDUCE-CANDIDATES(P_cand, i, θ). For p_x ∈ P_cand, if w(e_{x,x_{i+2}}) > θ, triangles including e_{x,x_{i+2}} can be ignored. (Recall that the NN and 2-NN were considered in the offline processing.)
PROPOSITION 1. For a point p_x ∈ P_cand, if w(e_{x,x_{i+2}}) > θ, any triangles that have e_{x,x_{i+2}} cannot be the top-k weighted spatial triangles.
Proof. See [26]. □
From this observation, we see that, if w(e_{x,x_{i+2}}) > θ, all unseen triangles having p_x do not have to be enumerated and p_x can be safely removed from P_cand. REDUCE-CANDIDATES(P_cand, i, θ) does this point removal.
The triangles enumerated offline have small weights in practice, as they are based on the NN and 2-NN. Therefore, τ and θ are tight even when i is small, and we can effectively reduce the size of P_cand in early iterations.
EXAMPLE 3. We use Figure 2 to illustrate our point filtering. Assume that DETERMINE-THRESHOLD(P_cand, r) returns the triangle formed by the red edges, and θ is obtained as depicted in this figure. Focus on p_x and p_y, which are drawn in green and blue, respectively. The edge between p_x (p_y) and its 3-NN is drawn in the same color, and its weight is shown in the right part of this figure. We have w(e_{x,x_3}) > θ and w(e_{y,y_3}) > θ, so unseen triangles that have p_x or p_y cannot be in the top-k result. Therefore, we can remove p_x and p_y from P_cand.
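The point filtering of step 2 can be sketched as follows. The function name `reduce_candidates`, the layout of `nn` (a sorted list of `(weight, neighbor)` pairs per point), and the toy values are assumptions of this sketch; it also assumes every list already holds at least i + 2 neighbors, whereas the article enlarges B when needed.

```python
def reduce_candidates(cand, nn, i, theta):
    """Keep p_x only if the edge to its (i+2)-th NN weighs at most theta."""
    kept = set()
    for x in cand:
        # nn[x] is sorted by weight, so the (i+2)-th NN sits at 0-based index i+1.
        weight_to_next_nn = nn[x][i + 1][0]
        if weight_to_next_nn <= theta:
            kept.add(x)
        # Otherwise every unseen triangle of p_x uses an edge heavier than theta
        # (Proposition 1), so p_x is dropped.
    return kept


# Hypothetical neighbor lists for two points.
nn = {
    0: [(1.0, 1), (2.0, 2), (3.0, 3)],
    1: [(1.0, 0), (2.5, 2), (9.0, 3)],
}
print(reduce_candidates({0, 1}, nn, i=1, theta=5.0))
```

In this toy run, point 1 is removed because its 3-NN edge (weight 9.0) exceeds θ, mirroring the situation in Example 3.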
• Step 3. After filtering unnecessary points in the above step, we enumerate triangles that may become the top-k result in ENUMERATE-TRIANGLES(P_cand, r, i). Consider the i-th iteration of this step. For each p_x ∈ P_cand, we enumerate the triangles formed by p_x, p_{x_{i+2}}, and p_{x_j}, where j ∈ {1, ..., i + 1}, while updating the top-k result R, τ, and θ.
We access p_{x_j} in the order p_{x_1}, ..., p_{x_{i+1}}. It is important to notice that w(e_{x,x_j}) + w(e_{x,x_{i+2}}) then increases monotonically with j. When we have w(e_{x,x_j}) + w(e_{x,x_{i+2}}) ≥ τ, triangles with these edges cannot be in the top-k result, so we can stop enumerating triangles without losing correctness.
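The early termination in step 3 can be sketched for a single point as follows. The function name and the toy data are hypothetical, and, unlike the article's procedure, this sketch keeps τ fixed during the loop instead of tightening it as better triangles are found.

```python
from math import dist


def enumerate_for_point(points, nn_x, x, i, r, tau):
    """Triangles (p_x, p_{x_{i+2}}, p_{x_j}), j = 1..i+1, with early stop."""
    found = []
    w_far, far = nn_x[i + 1]            # edge to the (i+2)-th NN of p_x
    if w_far > r:
        return found                    # this edge is not in the neighbor graph
    for w_near, near in nn_x[:i + 1]:   # j = 1..i+1 in ascending order of weight
        if w_near + w_far >= tau:
            break                       # the two-edge sum only grows with j
        w_third = dist(points[near], points[far])
        total = w_near + w_far + w_third
        if w_third <= r and total < tau:
            found.append((total, tuple(sorted((x, near, far)))))
    return found


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]
nn_0 = [(1.0, 1), (1.0, 2), (2.0, 3)]   # sorted neighbors of point 0
print(enumerate_for_point(points, nn_0, x=0, i=1, r=3.0, tau=10.0))
```

With a loose τ both candidate triangles are reported, while with τ = 3.0 the loop stops at the first j because the two known edges already sum to τ.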
Analysis. Let n_i be the size of P_cand at the i-th iteration of step 2. In addition, let n′_i be the size of P_cand at the i-th iteration of step 3, and let I be the number of iterations of step 3. The time complexity of our online algorithm is expressed in terms of n_i, n′_i, and I. (The detail appears in [26].) In Section VI-A, we show that our algorithm has a small n′_i and I in practice, yielding a running time that scales linearly with n. This suggests that our algorithm practically beats any approaches that build the spatial neighbor graph of P, as they need at least Ω(n^{1+ϵ}) time where ϵ > 0.

IV. OUR SOLUTION FOR FULLY DYNAMIC DATA
We next consider the case where P is subject to updates (point insertions and deletions), and address the problem defined in Definition 4. In this case, the top-k result R may change because of the update of P. Below, we consider how to minimize the result update cost while keeping the answer correct, and show that our approach in Section III-C can deal with point insertions and deletions flexibly. Hereinafter, we assume that the top-k result R is initialized by our algorithm in the previous section.

A. INSERTION CASE
Assume that we have a new point p_x. It is important to note that the triangles that can newly enter the top-k result are limited to the ones having p_x. We use this observation to incrementally update the top-k result.
1) Given p_x, we run a range search on a kd-tree, where the query point is p_x and the radius is r, to update the B-NN graph. (Observe that the points whose B-NNs may be updated exist within distance r from p_x, due to the constraint of r.) For each point p_y in this range search result, if p_x becomes a new B-NN of p_y, we add p_x into the edge set p_y.E. Also, for p_x, we make p_x.E from this range search result.
2) We next consider the triangle △_{x,x_1,x_2}. If w(e_{x,x_1}) > θ or w(e_{x,x_2}) > θ, the weights of new triangles having p_x are larger than τ. Hence, we terminate the update.
3) Otherwise, we run lines 7-16 of Algorithm 2 by setting P_cand = {p_x}.
The main cost of this case is incurred by the range search, which needs O(√n + s) time, where s is the size of the range search result. The second operation needs O(1) time. The third operation incurs a cost that is negligible compared with the range search, because it needs only a few iterations in practice.
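The early-exit test in the second operation can be sketched as below. This is an illustration under assumptions: a linear scan replaces the kd-tree range search, θ is passed in as a given number, and the function name `insertion_may_change_topk` is hypothetical.

```python
from math import dist


def insertion_may_change_topk(points, new_point, r, theta):
    """Return False when the new point provably cannot enter the top-k result."""
    # Linear-scan stand-in for the range search with radius r.
    near = sorted(d for d in (dist(new_point, p) for p in points) if d <= r)
    if len(near) < 2:
        return False  # fewer than two neighbors: no triangle contains the new point
    # near[0] <= near[1], so testing the 2-NN edge covers both edges of the
    # triangle formed with the NN and 2-NN.
    return near[1] <= theta


existing = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(insertion_may_change_topk(existing, (0.1, 0.1), r=2.0, theta=0.5))
print(insertion_may_change_topk(existing, (0.1, 0.1), r=2.0, theta=1.0))
```

Only when the function returns True does the more expensive enumeration with P_cand = {p_x} need to run.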

B. DELETION CASE
We next assume that a point p_x is removed from P. This point removal leads to one of two cases.
No triangles are removed from R. If no triangles having p_x are in the top-k result, it is trivial to see that the top-k result does not change. In this case, we simply remove the edges corresponding to p_x from the B-NN graph.
Some triangles are removed from R. In this case, we need to update the top-k result. Note that this case is essentially the same as the static case, because R has fewer than k triangles. Therefore, to update R, we update the B-NN graph, update R via DETERMINE-THRESHOLD(P, r), and then verify R through lines 4-16 of Algorithm 2.
Clearly, the former case needs O(1) time. The cost of the latter case is the same as our online algorithm in Section III-C. It is intuitively seen that the latter case rarely occurs for datasets with a large n. This implies that the amortized update cost for a deletion can come close to the former cost.
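The two deletion cases can be sketched as a small dispatcher. The names `handle_deletion` and `recompute` are hypothetical; `recompute` stands for the refill-and-verify procedure described above and is supplied by the caller.

```python
def handle_deletion(R, removed, recompute):
    """R: list of (weight, (x, y, z)). Returns (new R, whether R had to be rebuilt)."""
    survivors = [(w, tri) for w, tri in R if removed not in tri]
    if len(survivors) == len(R):
        return R, False                  # cheap case: the top-k result is untouched
    return recompute(survivors), True    # costly case: refill and verify R


R = [(3.0, (0, 1, 2)), (4.5, (2, 3, 4))]
# Trivial stand-in for the recomputation, for demonstration only.
same, rebuilt = handle_deletion(R, removed=9, recompute=lambda s: s)
print(same, rebuilt)
```

The sketch only makes the control flow explicit; the cost difference between the two branches is what drives the amortized argument in the text.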

V. OUR SOLUTION FOR SLIDING-WINDOW MODEL
This section addresses the problem in Definition 5. Different from the fully dynamic case in Section IV, we need to consider insertion and deletion at the same time in the sliding-window model, because a window slide removes the oldest point and inserts a new point. Therefore, under this model, the top-k result has to be updated when 1) the weights of triangles having a new point are less than the threshold of the current top-k result and 2) the removed point has triangles included in the current top-k result. To efficiently deal with these cases, we maintain the following triangle for each point p ∈ P_W. (Recall that P_W is the set of points in the current window.)
DEFINITION 6 (△^min_p). Consider a point p ∈ P_W. △^min_p represents the triangle that has the minimum weight among the set of triangles having p but not being included in the top-k result.

Algorithm 3
Remove all such triangles from R
3: end if
4: for each p_y ∈ N(p_x) such that p_x ∈ △^min

Although △^min_p may be updated when the window slides, it supports efficient top-k result update. For case 1), we can focus only on points p having w(△^min_p) < τ (recall that τ is the threshold of the top-k result) and can ignore the other points. For case 2), by adding △^min_p into an intermediate top-k result, we can obtain a tight τ, which also supports pruning unnecessary points. Since we maintain only a single triangle △^min_p for each point p ∈ P_W, the space complexity is only O(W). Below, we show how to maintain △^min_p when p_x is removed from and is added to the window.
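Definition 6 can be made concrete with a brute-force sketch. The function name `delta_min` and the toy points are hypothetical, and the exhaustive pair scan is for illustration only; the article instead reuses the step 3 enumeration of the static algorithm.

```python
from itertools import combinations
from math import dist


def delta_min(points, p, r, topk):
    """Lightest triangle through point p (all edges <= r) that is not in topk."""
    best = None
    others = [q for q in range(len(points)) if q != p]
    for y, z in combinations(others, 2):
        tri = tuple(sorted((p, y, z)))
        if tri in topk:
            continue  # triangles already in the top-k result are excluded
        d = (dist(points[p], points[y]),
             dist(points[p], points[z]),
             dist(points[y], points[z]))
        if max(d) <= r:
            cand = (sum(d), tri)
            if best is None or cand < best:
                best = cand
    return best  # None when p has no qualifying triangle


window = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
print(delta_min(window, 0, r=4.0, topk={(0, 1, 2)}))
```

Here the lightest triangle through point 0 is already in the top-k result, so the second lightest one is returned.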

A. DEALING WITH REMOVED POINT
When p_x is removed from the window, we check whether p_x is contained in △^min_y for some point p_y ∈ P_W. Let N(p_x) be the set of neighbors of p_x. If p_x ∈ △^min_y for a point p_y ∈ P_W, we have to update △^min_y. To achieve this, we need to enumerate triangles containing p_y, which can be done by essentially the same operation as step 3 of our algorithm for static data (see Section III-C).
Algorithm 3 describes how to deal with p_x when it is removed from the window. We first remove invalid triangles from the current top-k result R. Then, we update △^min_y for each p_y ∈ N(p_x) in the way explained above, which corresponds to UPDATE-△^min(P_W, p_y).

B. DEALING WITH NEW POINT
When a new point p_x is inserted into the window, we evaluate whether triangles having p_x and p_y can be △^min_y for each p_y ∈ N(p_x). (We retrieve N(p_x) through a range search.) Algorithm 4 describes this procedure. We first compute △^min_x. Then, for each p_y ∈ N(p_x), we update △^min_y if necessary. Triangles are enumerated in the same way as in Section V-A.

C. TOP-K RESULT UPDATE
Recall that we need to update the top-k result R when (i) w(△^min_x) < τ, where p_x is a new point, and (ii) triangles having p_y, where p_y is a removed point, are included in R.

Algorithm 5 UPDATE-TOP-k
Require: P_W, r, T (set of W triangles with △^min), and R (the current top-k result)
Ensure: R
1: Run Algorithms 3 and 4 in order
2: Sort the triangles △^min ∈ T in ascending order of w(△^min)
3: l ← k − |R|
4: if l > 0 then R ← R ∪ {l triangles with the smallest weight in T}
5: end if
6: △_{x,y,z} ← triangle with the k-th smallest weight in R
7: τ ← w(△_{x,y,z})
8: i ← 1
9: while i ≤ l do

Taking this fact into account, we update R in the following three steps, which are summarized in Algorithm 5.
1) We first remove invalid triangles from R in Algorithm 3. Then, if |R| < k, we add the (k − |R|) triangles △^min with the minimum w(△^min) to R to obtain an intermediate top-k result with a (probably) tight threshold.
2) If △^min_z was inserted into R in the previous step, there may exist other triangles △ having p_z such that w(△) < τ. We hence enumerate such triangles and update △^min_z and R in lines 11-12.
3) Last, due to the update of τ, there may exist other points p ∈ P_W such that w(△^min_p) < τ. If so, we do the same operations as in the second step for p in lines 16-20.
The number of triangles enumerated in Algorithm 5 cannot be bounded (and the worst case can be similar to our static algorithm) because it depends on data distributions. Nevertheless, it is small in practice because the top-k result does not change frequently. In Section VI-C, we show that Algorithm 5 does not reach the worst case in our experiments.
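The first step above (refilling R from the maintained △^min triangles and reading off τ) can be sketched as follows. The function name `refill_topk` and the sample triangles are hypothetical, and the sketch covers only lines 3-7 of Algorithm 5, not the subsequent enumeration.

```python
import heapq


def refill_topk(R, delta_mins, k):
    """R: list of (weight, triangle). delta_mins: iterable of (weight, triangle)."""
    missing = k - len(R)
    if missing > 0:
        present = {tri for _, tri in R}
        # A set drops duplicates, e.g. one triangle serving as delta-min of two points.
        fresh = {(w, tri) for w, tri in delta_mins if tri not in present}
        R = R + heapq.nsmallest(missing, fresh)
    R = sorted(R)
    # tau is the weight of the k-th triangle; it is unbounded while |R| < k.
    tau = R[k - 1][0] if len(R) >= k else float("inf")
    return R, tau


current = [(1.0, (0, 1, 2))]
dmins = [(5.0, (3, 4, 5)), (2.0, (0, 1, 3)), (2.0, (0, 1, 3)), (9.0, (6, 7, 8))]
print(refill_topk(current, dmins, k=3))
```

The resulting τ is only an intermediate threshold: lighter triangles through the newly added points may still exist, which is what steps 2 and 3 check.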

D. OPTIMIZATION
Assume that, for a point p_a ∈ P_W, △^min_a is formed by p_a, p_b, and p_c. The same triangle may then also be maintained as △^min_b and △^min_c. Such duplicates degrade the performance of Algorithm 5. To avoid such redundancy, we employ a directed spatial neighbor graph.

DEFINITION 7 (DIRECTED SPATIAL NEIGHBOR GRAPH). Assume that points in P_W are maintained by the generation order, and o(p_i) ≺ o(p_j) shows that p_i was generated before p_j. Then, given P_W and r, in the directed spatial neighbor graph of P_W, there is a directed edge e_{i,j} from p_i to p_j if and only if dist(p_i, p_j) < r and o(p_i) ≺ o(p_j).
From this definition, hereinafter, N(p_i) is also re-defined as the set of points p_j such that dist(p_i, p_j) < r and o(p_i) ≺ o(p_j). Below, we present why this structure can remove the redundancy.
When p_x is removed. Recall that the sliding-window model removes the oldest point, so it is important to notice that o(p_x) ≺ o(p) for every p ∈ P_W. We then see that p_x ∉ N(p) and that △^min_p never contains p_x for every p ∈ P_W. Therefore, when p_x is removed, we do not have to update △^min_p for any p ∈ P_W.
EXAMPLE 4. We explain this observation by using Figure 3. Figure 3(a) illustrates a directed spatial neighbor graph consisting of P_W = {p_1, p_2, p_3, p_4, p_5, p_6}, where o(p_i) ≺ o(p_j) for i < j. Now assume that the window slides and p_1 is removed. As shown in Figure 3(b), the other points do not have directed edges to p_1, so N(p) does not change. Hence, △^min_p also does not change.
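The effect of the directed graph on removals can be checked with a small sketch. The function name `directed_neighbors` and the sample window are hypothetical; points are assumed to be stored oldest first, so a smaller index means an earlier generation time.

```python
from math import dist


def directed_neighbors(window, r):
    """N(i) = points generated after point i that lie within distance r of it."""
    n = len(window)
    return {
        i: [j for j in range(i + 1, n) if dist(window[i], window[j]) < r]
        for i in range(n)
    }


# Index 0 is the oldest point, i.e. the one removed by the next window slide.
window = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
N = directed_neighbors(window, r=2.0)
print(N)
print(all(0 not in neighbors for neighbors in N.values()))
```

Because the oldest point never appears in any neighbor set, removing it leaves every N(p), and hence every maintained △^min_p, untouched.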
When p_y is added. In this case, we update the directed spatial neighbor graph by using a range search. Then, for each p_x ∈ P_W such that dist(p_x, p_y) < r, we update △^min_y by enumerating triangles that have both p_x and p_y (if necessary). Note that we have △^min_y ≠ △^min_z for p_y and p_z such that o(p_y) ≺ o(p_z), since N(p_z) does not contain p_y.
Top-k result update. Thanks to the above optimization, we have no duplication w.r.t. △^min and thus can avoid unnecessary triangle enumerations. We incorporate this optimization into Algorithms 3-5.

VI. EXPERIMENT
For experiments, we used an Ubuntu machine equipped with a 3.6GHz Intel Core i9-9900K CPU and 128GB RAM. In addition, all algorithms were compiled by g++ 9.3.0 with the -O3 flag and ran in single-thread mode.

A. EVALUATION ON STATIC DATA
This section evaluates our algorithm for static data. We compared it with DHL [17], which can compute the exact answer from the spatial neighbor graph of P. As mentioned in Sections I-A and VII, DHL is the only existing algorithm that can deal with our problem. For DHL, we used the original implementation.
Dataset. We used two large real datasets, CaStreet and Places, to investigate how efficiently our algorithm runs on large datasets. Recall that one of our objectives is to design an efficient (and exact) algorithm for the problem defined in Definition 3. CaStreet consists of the minimum bounding rectangles of road segments in the U.S.A. We used the bottom-left and upper-right points, and its cardinality is 4,499,454. Places consists of the geo-locations of public places in the U.S.A., and its cardinality is 9,356,750.
Parameter. We set n = 1,000,000 (via random sampling), k = 100, and r = 0.01 by default.
Impact of r. Note that as r increases, the number of neighbors also increases. Table 2 shows the result of our experiment with varying r. The computation time of our algorithm is essentially the same even when we use a larger r than the default one. This result shows the robustness of our algorithm against r.
Offline time. We report the offline time of our algorithm at the default parameter. On CaStreet and Places, our offline algorithm took 21.05 and 26.54 seconds, respectively. Since our offline algorithm is general for any k and r, the offline time is reasonable. (In fact, even when the offline computation is included, our algorithm returned the answer in less time than DHL.)
Impact of n. Figure 4 studies the scalability of our algorithm to the dataset cardinality n. Our algorithm scales linearly with n, while DHL is superlinear w.r.t. n. This clarifies the advantage of our algorithm. When we used all points of CaStreet and Places, our algorithm was 2807 and 6193 times faster than DHL, respectively. To understand the linear scalability of our algorithm, we investigated the size of P_cand and the number of triangles enumerated in each iteration. Table 3 shows the result on Places when n = 1,000,000 and n = 9,356,750. (We omit the result on CaStreet, because it is similar to the one in Table 3.) It is important to note that the numbers of iterations and triangles enumerated are both very small. This also clarifies the effectiveness of our idea. Recall that the time complexity of our online algorithm depends on n_i, n′_i, and I (see Section III-C). In practice, I and n′_i are sufficiently small. In addition, when i ≥ 2, n_i = n′_{i−1}, so n_i is also sufficiently small. Because n_1 = n, the cost is practically linear in n, which makes it clear why we have the linear scalability.

B. EVALUATION ON FULLY DYNAMIC DATA
This section evaluates our solution for fully dynamic data. Because no existing works have addressed this problem so far, we compared our solution with our static algorithm that computes the result from scratch whenever we have an update.
Dataset. We used the same datasets as the ones in Section VI-A, and we used 1,000,000 points for initialization.
Workload. We used 10,000 updates as a workload. This workload consisted of (1 − α) × 10,000 insertions and α × 10,000 deletions. (When an update was a deletion, we removed a random point in P.) To investigate the result update efficiency of our solution, we conducted experiments with varying α (i.e., deletion rate). We set k = 100.
Result. We measured the time to complete the workload, and Figure 5 depicts the result. Due to the incremental update, our algorithm for dynamic data, which is represented by "OurDynamic", completes the workload significantly faster than the algorithm for static data (represented by "OurStatic"). For example, in the case of CaStreet and α = 0.1, OurDynamic completes the workload in about 400 seconds. Its average time per update is hence about 40 milliseconds, whereas that of OurStatic is about 9000 milliseconds. When α is larger, the performance difference becomes larger. We see that, as α increases, OurDynamic needs less time to complete the workload. In most deletion cases, the top-k result did not change, meaning that OurDynamic incurs only O(1) time in each of these cases. These cases became more frequent as the deletion rate increased, and thus the total time became shorter.
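The per-update figure quoted above follows from simple arithmetic on the reported (approximate) numbers. The ratio on the right is derived here only for orientation and inherits the "about" of both inputs.

```latex
\[
  \frac{400\ \text{s}}{10{,}000\ \text{updates}} = 0.04\ \text{s} = 40\ \text{ms per update},
  \qquad
  \frac{9000\ \text{ms}}{40\ \text{ms}} \approx 225 .
\]
```

That is, for CaStreet with α = 0.1, the incremental algorithm is roughly two orders of magnitude faster per update than recomputing from scratch.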

C. EVALUATION ON SLIDING-WINDOW MODEL
Last, we evaluate our algorithms for the sliding-window model. No existing work addresses this problem either, so we compared our algorithms with our static algorithm that computes the result from scratch whenever the window slides. We use "Ours", "Ours-Opt", and "Static" to respectively denote Algorithm 5 without the optimization in Section V-D, Algorithm 5 with the optimization, and the static algorithm.
Dataset. We used the same datasets as in Section VI-B.
Workload. After the first W points were contained in the window, we ran 10,000 slides. We set r = 0.1 and k = 100.
Result. We measured the total time to process 10,000 window slides. Figure 6 shows the results for different window sizes. We observe that a larger W requires more time, yet Ours and Ours-Opt keep the update time short. When the window size is 1,000,000, Ours is about 700 times faster than Static on CaStreet and about 10,000 times faster on Places. Furthermore, even when the window size is 2 million, Ours-Opt needs only 82 [msec] and 2 [msec] on average to update the top-k result per slide on CaStreet and Places, respectively, suggesting that it scales well to large window sizes. Ours-Opt is also always faster than Ours, thanks to the optimization.

VII. RELATED WORK
This section reviews existing works that relate to the problem of retrieving the top-k weighted spatial triangles.
Graph-based Spatial Data Mining. A graph is a simple yet effective structure for representing relationships between data. Spatial points usually have relationships if they are located close to each other. Therefore, graph-based spatial data mining has been receiving attention. (Note that our work differs from works on road networks, e.g., [8], because these assume that graphs are given and P is constrained by the road networks.) The work in [12] considers spatial pattern matching. Given P and a query that is a graph pattern, it finds all subsets of P that match the query. Different from our problem, spatial pattern matching requires users to specify a sub-graph pattern. Because the graph structure of P is not known in advance, specifying a concrete query is not an easy task. Moreover, the query result size is not controllable. The work in [13] considers a top-k version of spatial pattern matching, but it still has the former drawback. Spatial maximal cliques in the spatial neighbor graph of P are considered in [34]. Since a triangle is a 3-clique, this problem is similar to ours. The authors of [34] found that finding a spatial maximal clique corresponds to finding a maximal convex polygon. Their solution is based on this observation, and they do not consider the weight of polygons. Therefore, their technique cannot be employed for finding the top-k weighted spatial triangles.
Given a set W of location-based service providers and a set U of users with locations, [27] tackled the bipartite matching between W and U. Unlike the above works that try to "mine interesting sub-graphs", this problem focuses on "building a graph". Recently, [30] designed a system for multicore processors that builds spatial proximity graphs (e.g., a k-NN graph and a Delaunay graph) from a given set P of points. It also supports other operations, such as clustering and computing minimum spanning trees on spatial proximity graphs. However, retrieving the top-k weighted spatial triangles is not supported, and we are the first to study this problem in spatial databases.
Spatial Data Analysis. Because spatial point analysis is well known to be important, much effort has been devoted to developing query processing techniques, machine-learning models [23], [29], and systems [32], [33]. Below, we review some examples of analytical techniques.
The problem of maximizing range sum queries was addressed in [11]. Given a rectangle, this problem finds the location of the rectangle that maximizes the weight of points enclosed by the rectangle. A streaming version of this problem was also considered in [2], [3]. Such location selection problems have been extensively studied, e.g., in [15]. The interaction between spatial points was addressed in [4]. Some works considered spatial data visualization. In [16], to achieve interactive visualization of spatial points, the authors proposed an efficient algorithm that incrementally updates the visualization result from the previous one. Moreover, [7] proposed an efficient bounding technique for kernel density visualization.
Triangle Enumeration/Counting. Because the problem of triangle enumeration/counting is one of the classic problems in graph databases, many works tackled it. State-of-the-art algorithms for static and dynamic graphs can be found in [1], [18], [24], [25], [31]. Unfortunately, existing works for graph databases generally assume unweighted graphs and do not consider any ranking of triangles.
Similar to our work, DHL addresses the problem of retrieving the top-k weighted triangles in graph databases. It was originally proposed for weighted graphs, so it can handle our problem once the spatial neighbor graph of P is built. (DHL originally retrieves the k triangles with the maximum weight, but it is straightforward to adapt it to triangles with the minimum weight.) However, because of the overhead incurred by handling the spatial neighbor graph, DHL is significantly outperformed by our algorithm.

VOLUME 4, 2016

TAKAHIRO HARA received the B.E., M.E., and Dr.E. degrees in Information Systems Engineering from Osaka University, Osaka, Japan, in 1995, 1997, and 2000, respectively. Currently, he is a full Professor of the Department of Multimedia Engineering, Osaka University. His research interests include distributed databases, peer-to-peer systems, mobile networks, and mobile computing systems. He is a distinguished scientist of ACM, a senior member of IEEE, and a member of three other learned societies.