Abstract
Attention-based models such as Transformers represent the state of the art for various machine learning (ML) tasks. However, their superior performance comes at the cost of substantial memory requirements and limited data reuse opportunities. Processing in Memory (PIM) is a promising approach to accelerating Transformer models because of its massive parallelism, low data movement costs, and high memory bandwidth utilization. Existing PIM accelerators, however, lack support for algorithmic optimizations such as dynamic token pruning, which can significantly improve the efficiency of Transformers. We identify two challenges to enabling dynamic token pruning on PIM-based architectures: the lack of an in-memory top-k token selection mechanism and the memory underutilization caused by pruning. To address these challenges, we propose PRIMATE, a software-hardware co-designed PIM framework based on High Bandwidth Memory (HBM). On the hardware side, we make minor modifications to conventional HBM to enable Transformer model computation and top-k selection. On the software side, we introduce a pipelined mapping scheme and an optimization framework that maximize throughput and efficiency. Compared with the current state-of-the-art PIM accelerator for Transformers, PRIMATE achieves a 30.6× improvement in throughput, a 29.5× improvement in space efficiency, and 4.3× better energy efficiency.
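The abstract does not say how tokens are scored before the top-k selection step. As a rough illustration of what dynamic top-k token pruning computes, the following is a minimal software sketch assuming a commonly used heuristic: each token's importance is the attention probability it receives, summed over all queries. The function names (`token_importance`, `prune_top_k`) and the scoring rule are assumptions for illustration only; this is not PRIMATE's in-memory top-k mechanism.

```python
import math
from typing import List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_importance(scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed heuristic: importance of key token j is the attention
    probability it receives, summed over all query rows.
    scores[i][j] is the raw attention logit from query i to key j."""
    probs = [softmax(row) for row in scores]
    n = len(probs[0])
    return [sum(p[j] for p in probs) for j in range(n)]


def prune_top_k(tokens: Sequence[str], scores: Sequence[Sequence[float]], k: int) -> List[str]:
    """Keep the k most important tokens, preserving their original order."""
    if k <= 0:
        return []
    imp = token_importance(scores)
    # Rank token indices by importance (a software sort stands in for a
    # hardware top-k unit); Python's sort is stable, so ties keep index order.
    ranked = sorted(range(len(tokens)), key=lambda j: imp[j], reverse=True)
    keep = sorted(ranked[:k])  # restore sequence order for the next layer
    return [tokens[j] for j in keep]


# Usage: four tokens, with "cat" receiving the most attention.
tokens = ["[CLS]", "the", "cat", "sat"]
scores = [
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
]
print(prune_top_k(tokens, scores, 2))
```

Restoring the original order of the surviving tokens keeps the pruned sequence aligned with positional information for later layers; the number of kept tokens `k` is chosen per input at run time, which is what makes the pruning dynamic.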
| Original language | English (US) |
|---|---|
| Title of host publication | ASP-DAC 2024 - 29th Asia and South Pacific Design Automation Conference, Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 557-563 |
| Number of pages | 7 |
| ISBN (Electronic) | 9798350393545 |
| State | Published - 2024 |
| Event | 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024, Incheon, Korea, Republic of. Duration: Jan 22 2024 → Jan 25 2024 |
Publication series
| Name | Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC |
|---|---|
Conference
| Conference | 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Incheon |
| Period | 1/22/24 → 1/25/24 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 7: Affordable and Clean Energy
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
- Computer Science Applications
- Computer Graphics and Computer-Aided Design
Fingerprint
Dive into the research topics of 'PRIMATE: Processing in Memory Acceleration for Dynamic Token-pruning Transformers'. Together they form a unique fingerprint.