TY - GEN
T1 - IMC-SORT
T2 - 30th Great Lakes Symposium on VLSI, GLSVLSI 2020
AU - Li, Zheyu
AU - Challapalle, Nagadastagiri
AU - Ramanathan, Akshay Krishna
AU - Narayanan, Vijaykrishnan
N1 - Funding Information:
This work was supported in part by Semiconductor Research Corporation (SRC) Center for Research in Intelligent Storage and Processing in Memory (CRISP).
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/9/7
Y1 - 2020/9/7
N2 - Processing-in-memory (PIM) architectures have gained significant importance as an alternative to von Neumann architectures for alleviating the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy consumption improvements for various emerging and widely used workloads such as deep neural networks, graph analytics, databases, and computational genomics. In this work, we propose IMC-Sort, an in-memory parallel sorting architecture that uses the hybrid memory cube (HMC) to accelerate sort workloads. Sorting is a fundamental and widely used algorithm in applications such as databases, networking, and data analytics. The IMC-Sort architecture augments the HMC memory system by incorporating a custom sorting network in the logic layer of each HMC vault. IMC-Sort uses an optimized folded bitonic sort and merge network to sort input sequences of arbitrary length at each vault and an optimized address mapping mechanism to distribute the input data across HMC vaults. The sorted results of the individual vaults are also merged using the vaults' sorting networks, which communicate with other vaults through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8× and 1.1× speedups and 375.5× and 13.6× energy savings compared to a widely used CPU implementation and a state-of-the-art near-memory custom sort accelerator, respectively.
AB - Processing-in-memory (PIM) architectures have gained significant importance as an alternative to von Neumann architectures for alleviating the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy consumption improvements for various emerging and widely used workloads such as deep neural networks, graph analytics, databases, and computational genomics. In this work, we propose IMC-Sort, an in-memory parallel sorting architecture that uses the hybrid memory cube (HMC) to accelerate sort workloads. Sorting is a fundamental and widely used algorithm in applications such as databases, networking, and data analytics. The IMC-Sort architecture augments the HMC memory system by incorporating a custom sorting network in the logic layer of each HMC vault. IMC-Sort uses an optimized folded bitonic sort and merge network to sort input sequences of arbitrary length at each vault and an optimized address mapping mechanism to distribute the input data across HMC vaults. The sorted results of the individual vaults are also merged using the vaults' sorting networks, which communicate with other vaults through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8× and 1.1× speedups and 375.5× and 13.6× energy savings compared to a widely used CPU implementation and a state-of-the-art near-memory custom sort accelerator, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85091308340&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091308340&partnerID=8YFLogxK
U2 - 10.1145/3386263.3407581
DO - 10.1145/3386263.3407581
M3 - Conference contribution
AN - SCOPUS:85091308340
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 45
EP - 50
BT - GLSVLSI 2020 - Proceedings of the 2020 Great Lakes Symposium on VLSI
PB - Association for Computing Machinery
Y2 - 7 September 2020 through 9 September 2020
ER -