TY - GEN
T1 - ZEBRA
T2 - 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024
AU - Chen, Yiming
AU - Yin, Guodong
AU - Zhong, Hongtao
AU - Lee, Mingyen
AU - Yang, Huazhong
AU - George, Sumitha
AU - Narayanan, Vijaykrishnan
AU - Li, Xueqing
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deploying a lightweight quantized model in compute-in-memory (CIM) can cause significant accuracy degradation due to a reduced signal-to-noise ratio (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach that exploits bitwise zero patterns to compress computation with ultra-high resilience against noise from circuit non-idealities and other sources. First, ZEBRA provides a cross-level design that exploits value-adaptive zero-bit patterns to dramatically improve performance under robust 8-bit quantization. Second, ZEBRA presents a multi-level local computing unit circuit design that implements the bitwise sparsity pattern, boosting area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA achieves <1.0% accuracy loss on CIFAR-10/100 under typical noise, whereas conventional CIM designs suffer >10% accuracy loss. This robustness yields much more stable accuracy for high-parallelism inference on large models in practice.
AB - Deploying a lightweight quantized model in compute-in-memory (CIM) can cause significant accuracy degradation due to a reduced signal-to-noise ratio (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach that exploits bitwise zero patterns to compress computation with ultra-high resilience against noise from circuit non-idealities and other sources. First, ZEBRA provides a cross-level design that exploits value-adaptive zero-bit patterns to dramatically improve performance under robust 8-bit quantization. Second, ZEBRA presents a multi-level local computing unit circuit design that implements the bitwise sparsity pattern, boosting area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA achieves <1.0% accuracy loss on CIFAR-10/100 under typical noise, whereas conventional CIM designs suffer >10% accuracy loss. This robustness yields much more stable accuracy for high-parallelism inference on large models in practice.
UR - http://www.scopus.com/inward/record.url?scp=85189300419&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189300419&partnerID=8YFLogxK
U2 - 10.1109/ASP-DAC58780.2024.10473851
DO - 10.1109/ASP-DAC58780.2024.10473851
M3 - Conference contribution
AN - SCOPUS:85189300419
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 153
EP - 158
BT - ASP-DAC 2024 - 29th Asia and South Pacific Design Automation Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 January 2024 through 25 January 2024
ER -