TY - GEN
T1 - FishStore
T2 - 2019 International Conference on Management of Data, SIGMOD 2019
AU - Xie, Dong
AU - Chandramouli, Badrish
AU - Li, Yinan
AU - Kossmann, Donald
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/6/25
Y1 - 2019/6/25
N2 - The last decade has witnessed a huge increase in data being ingested into the cloud, in forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into storage in raw form, indexed ad-hoc using range indices, or cooked into analytics-friendly columnar formats. None of these solutions is able to handle modern requirements on storage: making the data available immediately for ad-hoc and streaming queries while ingesting at extremely high throughputs. This paper builds on recent advances in parsing and indexing techniques to propose FishStore, a concurrent latch-free storage layer for data with flexible schema, based on multi-chain hash indexing of dynamically registered predicated subsets of data. We find predicated subset hashing to be a powerful primitive that supports a broad range of queries on ingested data and admits a high-performance concurrent implementation. Our detailed evaluation on real datasets and queries shows that FishStore can handle a wide range of workloads and can ingest and retrieve data at an order of magnitude lower cost than state-of-the-art alternatives.
UR - http://www.scopus.com/inward/record.url?scp=85069429781&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069429781&partnerID=8YFLogxK
U2 - 10.1145/3299869.3319896
DO - 10.1145/3299869.3319896
M3 - Conference contribution
AN - SCOPUS:85069429781
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1711
EP - 1728
BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 30 June 2019 through 5 July 2019
ER -