TY - JOUR
T1 - Spatial Independent Range Sampling
AU - Xie, Dong
AU - Phillips, Jeff M.
AU - Matheny, Michael
AU - Li, Feifei
N1 - Funding Information:
8 CONCLUSION We studied the uniform and weighted spatial independent range sampling (SIRS) problem aiming at retrieving random samples with independence over points residing in minimum bounding rectangles (MBRs). We designed concise index structures with careful data layout based on various space decomposition strategies, and proposed novel algorithms for both uniform and weighted SIRS queries with low theoretical complexity and excellent practical performance, showing orders of magnitude improvement over existing baselines. Overall, we propose using KDS, which better handles dynamic updates and avoids adversarial cases. In future work though, we will explore in more details on various choices of tree structures, space decomposition strategies and fanout, as well as how to adapt the combined sampling LSM trees to different workloads with updates. Acknowledgment. Dong Xie was supported by a Microsoft Research Ph.D Fellowship. Jeff M. Phillips thanks his support from NSF CCF-1350888, CNS-1514520, IIS-1619287, IIS-1816149, CNS-1564287, CFS-1953350, and an award from Visa Research. Feifei Li thanks his support from NSF CNS-1514520, IIS-1619287, IIS-1801446, and IIS-1816149.
Publisher Copyright:
© 2021 ACM.
PY - 2021
Y1 - 2021
N2 - Thanks to the wide adoption of GPS-equipped devices, the volume of collected spatial data is exploding. To achieve interactive exploration and analysis over big spatial data, people are willing to trade off accuracy for performance through approximation. As a foundation in many approximate algorithms, data sampling now requires more flexibility and better performance. In this paper, we study the spatial independent range sampling (SIRS) problem aiming at retrieving random samples with independence over points residing in a query region. Specifically, we have designed concise index structures with careful data layout based on various space decomposition strategies. Moreover, we propose novel algorithms for both uniform and weighted SIRS queries with low theoretical cost and complexity as well as excellent practical performance. Last but not least, we demonstrate how to support data updates and trade-offs between different sampling methods in practice. According to comprehensive evaluations conducted on real-world datasets, our methods achieve orders of magnitude performance improvement against baselines derived by existing works.
AB - Thanks to the wide adoption of GPS-equipped devices, the volume of collected spatial data is exploding. To achieve interactive exploration and analysis over big spatial data, people are willing to trade off accuracy for performance through approximation. As a foundation in many approximate algorithms, data sampling now requires more flexibility and better performance. In this paper, we study the spatial independent range sampling (SIRS) problem aiming at retrieving random samples with independence over points residing in a query region. Specifically, we have designed concise index structures with careful data layout based on various space decomposition strategies. Moreover, we propose novel algorithms for both uniform and weighted SIRS queries with low theoretical cost and complexity as well as excellent practical performance. Last but not least, we demonstrate how to support data updates and trade-offs between different sampling methods in practice. According to comprehensive evaluations conducted on real-world datasets, our methods achieve orders of magnitude performance improvement against baselines derived by existing works.
UR - http://www.scopus.com/inward/record.url?scp=85108979225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108979225&partnerID=8YFLogxK
U2 - 10.1145/3448016.3452806
DO - 10.1145/3448016.3452806
M3 - Conference article
AN - SCOPUS:85108979225
SN - 0730-8078
SP - 2023
EP - 2035
JO - Proceedings of the ACM SIGMOD International Conference on Management of Data
JF - Proceedings of the ACM SIGMOD International Conference on Management of Data
T2 - 2021 International Conference on Management of Data, SIGMOD 2021
Y2 - 20 June 2021 through 25 June 2021
ER -