TY - JOUR
T1 - Cramming More Weight Data Onto Compute-in-Memory Macros for High Task-Level Energy Efficiency Using Custom ROM With 3984-kb/mm² Density in 65-nm CMOS
AU - Yin, Guodong
AU - Chen, Yiming
AU - Zhou, Mufeng
AU - Tang, Wenjun
AU - Lee, Mingyen
AU - Yang, Zekun
AU - Liao, Tianyu
AU - Du, Xirui
AU - Narayanan, Vijaykrishnan
AU - Yang, Huazhong
AU - Jia, Hongyang
AU - Liu, Yongpan
AU - Li, Xueqing
N1 - Publisher Copyright:
© 1966-2012 IEEE.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Owing to the mature process and low access energy, static random-access memory (SRAM) has become a promising candidate for compute-in-memory (CiM) acceleration of multiply-accumulate (MAC) operations. However, SRAM-based CiM cells have rather low density and thus very limited total on-chip memory capacity. This fact, unfortunately, results in undesired weight data reload operations from the off-chip dynamic random-access memory (DRAM) in data-intensive scenarios and may even tarnish the energy efficiency of CiM at the task level. Therefore, exploration toward higher density CiM in CMOS is critical to ensure truly high energy efficiency in practice. Aligned with the goal of ultrahigh density, this article presents the first one-transistor (1T) multi-level-cell (MLC) read-only memory (ROM) CiM macro for multi-bit MAC. The highlights of the proposed ROM CiM techniques include: 1) multi-source-driven (MSD) 1T-MLC ROM; 2) charge-domain capacitor sharing (CDCS) for ultrahigh CiM memory density; and 3) ROM-based transfer-learning architectures to provide flexible support of different tasks with minor accuracy degradation. These techniques are demonstrated with a fabricated 2-Mb 1T-MLC ROM CiM macro for 8 b × 8 b MAC computing. This macro features a record-high cell density of 0.096 µm²/bit and a macro weight density of 3984 kb/mm² in a 65-nm pure CMOS technology. It also achieves 3.8× to 55.3× lower energy consumption per image inference than the state-of-the-art CiM macros when considering the possible DRAM access.
AB - Owing to the mature process and low access energy, static random-access memory (SRAM) has become a promising candidate for compute-in-memory (CiM) acceleration of multiply-accumulate (MAC) operations. However, SRAM-based CiM cells have rather low density and thus very limited total on-chip memory capacity. This fact, unfortunately, results in undesired weight data reload operations from the off-chip dynamic random-access memory (DRAM) in data-intensive scenarios and may even tarnish the energy efficiency of CiM at the task level. Therefore, exploration toward higher density CiM in CMOS is critical to ensure truly high energy efficiency in practice. Aligned with the goal of ultrahigh density, this article presents the first one-transistor (1T) multi-level-cell (MLC) read-only memory (ROM) CiM macro for multi-bit MAC. The highlights of the proposed ROM CiM techniques include: 1) multi-source-driven (MSD) 1T-MLC ROM; 2) charge-domain capacitor sharing (CDCS) for ultrahigh CiM memory density; and 3) ROM-based transfer-learning architectures to provide flexible support of different tasks with minor accuracy degradation. These techniques are demonstrated with a fabricated 2-Mb 1T-MLC ROM CiM macro for 8 b × 8 b MAC computing. This macro features a record-high cell density of 0.096 µm²/bit and a macro weight density of 3984 kb/mm² in a 65-nm pure CMOS technology. It also achieves 3.8× to 55.3× lower energy consumption per image inference than the state-of-the-art CiM macros when considering the possible DRAM access.
UR - http://www.scopus.com/inward/record.url?scp=85177066872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85177066872&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2023.3326955
DO - 10.1109/JSSC.2023.3326955
M3 - Article
AN - SCOPUS:85177066872
SN - 0018-9200
VL - 59
SP - 1912
EP - 1925
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 6
ER -