Understanding the Intrinsic Characteristics of Spatial Partitioning in Distributed Spatial Join | IEEE Conference Publication | IEEE Xplore

Abstract:

Spatial join has become a frequently used yet resource-intensive operation in geospatial applications, driven by the increasing volume and complexity of geospatial data. With Hadoop and Spark becoming the de facto standard platforms for distributed computing, scalable spatial data processing is primarily achieved by partitioning the input space to form parallel units on these platforms. Effective spatial data partitioning is critical for task parallelization and load balancing, but it faces significant challenges due to data skew and the geometric and topological complexity of spatial objects, particularly in supporting spatial joins. This paper examines the interplay among query performance, spatial data partitioning, query types, and data and system characteristics. We qualitatively and quantitatively analyze the features of representative partitioning algorithms that impact overall query performance. Building on these analyses, we propose a data sampling-based approach for selecting optimized partitioning strategies. Extensive experiments on large and complex datasets using MapReduce frameworks validate the correctness of our analysis and the effectiveness of our optimization approach.
Date of Conference: 15-18 December 2024
Date Added to IEEE Xplore: 16 January 2025
Conference Location: Washington, DC, USA
