TY - GEN
T1 - Simba: Efficient In-Memory Spatial Analytics
T2 - 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
AU - Xie, Dong
AU - Li, Feifei
AU - Yao, Bin
AU - Li, Gefei
AU - Zhou, Liang
AU - Guo, Minyi
N1 - Funding Information:
Feifei Li and Dong Xie were supported in part by NSF grants 1200792, 1302663, 1443046. Bin Yao, Gefei Li, Liang Zhou, and Minyi Guo were supported by the National Basic Research Program (973 Program, No. 2015CB352403), and the Scientific Innovation Act of STCSM (No. 13511504200, 15JC1402400). Feifei Li and Bin Yao were also supported in part by NSFC grant 61428204.
Publisher Copyright:
© 2016 ACM.
PY - 2016/6/26
Y1 - 2016/6/26
N2 - Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared against other spatial analytics systems.
AB - Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared against other spatial analytics systems.
UR - http://www.scopus.com/inward/record.url?scp=84979663010&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979663010&partnerID=8YFLogxK
U2 - 10.1145/2882903.2915237
DO - 10.1145/2882903.2915237
M3 - Conference contribution
AN - SCOPUS:84979663010
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1071
EP - 1085
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 26 June 2016 through 1 July 2016
ER -