TY - JOUR
T1 - AB-tree
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
AU - Zhao, Zhuoyue
AU - Xie, Dong
AU - Li, Feifei
N1 - Funding Information:
Zhuoyue Zhao thanks the support for NSF IIS-1816149 and a Google PhD Fellowship. The authors appreciate the valuable feedback from the anonymous PVLDB reviewers for improving this manuscript.
Publisher Copyright:
© 2022, VLDB Endowment.
PY - 2022
Y1 - 2022
N2 - There has been an increasing demand for real-time data analytics. Approximate Query Processing (AQP) is a popular option for that because it can use random sampling to trade some accuracy for lower query latency. However, the state-of-the-art AQP system either relies on scan-based sampling algorithms to draw samples, which can still incur a non-trivial cost of table scan, or creates samples of the database in a preprocessing step, which are hard to update. The alternative is to use the aggregate B-tree indexes to support both random sampling and updates in database with logarithmic time. However, to the best of our knowledge, it is unknown how to design an aggregate B-tree to support highly concurrent random sampling and updates, due to the difficulty of maintaining the aggregate weights correctly and efficiently with concurrency. In this work, we identify the key challenges to achieve high concurrency and present AB-tree, an index for highly concurrent random sampling and update operations. We also conduct extensive experiments to show its efficiency and efficacy in a variety of workloads.
AB - There has been an increasing demand for real-time data analytics. Approximate Query Processing (AQP) is a popular option for that because it can use random sampling to trade some accuracy for lower query latency. However, the state-of-the-art AQP system either relies on scan-based sampling algorithms to draw samples, which can still incur a non-trivial cost of table scan, or creates samples of the database in a preprocessing step, which are hard to update. The alternative is to use the aggregate B-tree indexes to support both random sampling and updates in database with logarithmic time. However, to the best of our knowledge, it is unknown how to design an aggregate B-tree to support highly concurrent random sampling and updates, due to the difficulty of maintaining the aggregate weights correctly and efficiently with concurrency. In this work, we identify the key challenges to achieve high concurrency and present AB-tree, an index for highly concurrent random sampling and update operations. We also conduct extensive experiments to show its efficiency and efficacy in a variety of workloads.
UR - http://www.scopus.com/inward/record.url?scp=85134365174&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134365174&partnerID=8YFLogxK
U2 - 10.14778/3538598.3538606
DO - 10.14778/3538598.3538606
M3 - Conference article
AN - SCOPUS:85134365174
SN - 2150-8097
VL - 15
SP - 1835
EP - 1847
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 9
Y2 - 5 September 2022 through 9 September 2022
ER -