CORD: Parallelizing Query Processing Across Multiple Computational Storage Devices

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Query processing on large-scale scientific datasets often suffers from performance bottlenecks due to significant data transfers between storage nodes and applications in decoupled distributed storage environments. This issue is particularly pronounced in high-selectivity queries where unnecessary data is transferred between the storage plane and the compute plane. To tackle this challenge, we introduce the integration of SmartSSDs, functioning as Computational Storage Devices (CSDs), into the storage layer. By offloading simple filter-projection operations to these CSDs, we significantly reduce data transfer bottlenecks, leading to lower query latency and higher throughput. Our novel framework, CORD (parallelizing query processing across multiple Computational stORage Devices), facilitates parallel query execution across multiple CSDs while considering data locality. CORD is compatible with any decoupled storage system equipped with CSDs. Our extensive empirical evaluation demonstrates that CORD achieves up to 93× speedup for high-selectivity queries compared to the traditional (compute-plane) execution strategy and offers a further 1.64× speedup in cases of uneven data distribution. Additionally, we present two optimizations for batch query processing. Results from our experiments with 4 CSDs reveal substantial performance improvements provided by the optimizations embedded in CORD.
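
The offloading idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not CORD's implementation: `SimulatedCSD`, `make_devices`, `pushdown_query`, and `host_side_query` are hypothetical names, the "devices" are in-memory Python objects, and threads stand in for parallel execution across CSDs. It shows how running the same filter-projection on each device's local partition returns the same answer as filtering on the compute plane while moving far fewer rows.

```python
from concurrent.futures import ThreadPoolExecutor


class SimulatedCSD:
    """Hypothetical stand-in for one CSD holding a local partition of rows."""

    def __init__(self, rows):
        self.rows = rows

    def filter_project(self, predicate, columns):
        # Runs "on the device": only matching rows and requested columns leave it.
        return [tuple(row[c] for c in columns) for row in self.rows if predicate(row)]


def make_devices(n_devices=4, rows_per_device=1000):
    # Assumed synthetic data: each row has an id and a temp value in 0..99.
    devices = []
    for d in range(n_devices):
        start = d * rows_per_device
        rows = [{"id": i, "temp": i % 100} for i in range(start, start + rows_per_device)]
        devices.append(SimulatedCSD(rows))
    return devices


def pushdown_query(devices, predicate, columns):
    # Each device processes only its own partition; results are merged in device order.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        parts = pool.map(lambda dev: dev.filter_project(predicate, columns), devices)
        return [r for part in parts for r in part]


def host_side_query(devices, predicate, columns):
    # Baseline: ship every row to the compute plane, then filter and project there.
    transferred = [row for dev in devices for row in dev.rows]
    result = [tuple(row[c] for c in columns) for row in transferred if predicate(row)]
    return result, len(transferred)


if __name__ == "__main__":
    devices = make_devices()
    keep = lambda row: row["temp"] >= 99  # query that keeps a small fraction of rows
    pushed = pushdown_query(devices, keep, ["id", "temp"])
    baseline, moved = host_side_query(devices, keep, ["id", "temp"])
    print(len(pushed), "rows returned by pushdown;", moved, "rows moved by the baseline")
```

In this toy setup the pushdown path returns only the matching rows from each device, whereas the baseline moves every row before filtering. The real system's speedups depend on hardware and data layout that this sketch does not model.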

Original language: English (US)
Title of host publication: Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1141-1153
Number of pages: 13
Edition: 2025
ISBN (Electronic): 9798331532376
State: Published - 2025
Event: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025 - Milan, Italy
Duration: Jun 3 2025 → Jun 7 2025

Conference

Conference: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Country/Territory: Italy
City: Milan
Period: 6/3/25 → 6/7/25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
