TY - GEN
T1 - CAPE: A Content-Addressable Processing Engine
T2 - 27th Annual IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
AU - Caminal, Helena
AU - Yang, Kailin
AU - Srinivasa, Srivatsa
AU - Ramanathan, Akshay Krishna
AU - Al-Hawaj, Khalid
AU - Wu, Tianshu
AU - Narayanan, Vijaykrishnan
AU - Batten, Christopher
AU - Martinez, Jose F.
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/2
Y1 - 2021/2
N2 - Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. The content-addressable parallel processing paradigm (CAPP) from the seventies is an in-situ PIM architecture that leverages content-addressable memories to realize bit-serial arithmetic and logic operations, via sequences of search and update operations over multiple memory rows in parallel. In this paper, we set out to investigate whether the concepts behind classic CAPP can be used successfully to build an entirely CMOS-based, general-purpose microarchitecture that can deliver manyfold speedups while remaining highly programmable. We conduct a full-stack design of a Content-Addressable Processing Engine (CAPE), built out of dense push-rule 6T SRAM arrays. CAPE is programmable using the RISC-V ISA with standard vector extensions. Our experiments show that CAPE achieves an average speedup of 14 (up to 254) over an area-equivalent (slightly under 9 mm² at 7 nm) out-of-order processor core with three levels of caches.
AB - Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. The content-addressable parallel processing paradigm (CAPP) from the seventies is an in-situ PIM architecture that leverages content-addressable memories to realize bit-serial arithmetic and logic operations, via sequences of search and update operations over multiple memory rows in parallel. In this paper, we set out to investigate whether the concepts behind classic CAPP can be used successfully to build an entirely CMOS-based, general-purpose microarchitecture that can deliver manyfold speedups while remaining highly programmable. We conduct a full-stack design of a Content-Addressable Processing Engine (CAPE), built out of dense push-rule 6T SRAM arrays. CAPE is programmable using the RISC-V ISA with standard vector extensions. Our experiments show that CAPE achieves an average speedup of 14 (up to 254) over an area-equivalent (slightly under 9 mm² at 7 nm) out-of-order processor core with three levels of caches.
UR - http://www.scopus.com/inward/record.url?scp=85105022896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105022896&partnerID=8YFLogxK
U2 - 10.1109/HPCA51647.2021.00054
DO - 10.1109/HPCA51647.2021.00054
M3 - Conference contribution
AN - SCOPUS:85105022896
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 557
EP - 569
BT - Proceedings - 27th IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
PB - IEEE Computer Society
Y2 - 27 February 2021 through 1 March 2021
ER -