TY - GEN
T1 - Enhancing Parallelism in Commercial PIM DRAM with LUT-Based Design
AU - Panigrahi, Prapti
AU - Biswas, Rudra
AU - Devic, Alexandar
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/6/29
Y1 - 2025/6/29
AB - Real-world industrial Processing-in-Memory (PIM) devices, such as UPMEM PIM-DIMM, Samsung HBM-PIM, and SK Hynix AiM, integrate computation within memory modules to alleviate data transfer bottlenecks. However, area density constraints often limit the uniform deployment of compute units across all dies. Opportunistically, we propose a Multi-Mode PIM design that exploits non-compute dies by storing precomputed results in look-up tables. Our approach achieves high parallelism for low-precision computations using otherwise idle dies, while maintaining minimal area overhead. In our evaluation, the Multi-Mode PIM achieves up to 19.3× speedup over HBM-PIM and 26.1× over BLIMP for low-precision workloads, and up to 2.12× and 3.98×, respectively, for mixed-precision workloads.
UR - https://www.scopus.com/pages/publications/105017801381
U2 - 10.1145/3716368.3735260
DO - 10.1145/3716368.3735260
M3 - Conference contribution
AN - SCOPUS:105017801381
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 399
EP - 400
BT - GLSVLSI 2025 - Proceedings of the Great Lakes Symposium on VLSI 2025
PB - Association for Computing Machinery
T2 - 35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025
Y2 - 30 June 2025 through 2 July 2025
ER -