PRIMATE: Processing in Memory Acceleration for Dynamic Token-pruning Transformers

Yue Pan, Minxuan Zhou, Chonghan Lee, Zheyu Li, Rishika Kushwah, Vijaykrishnan Narayanan, Tajana Rosing

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Attention-based models such as Transformers represent the state of the art for various machine learning (ML) tasks. Their superior performance is often overshadowed by substantial memory requirements and low data reuse opportunities. Processing in Memory (PIM) is a promising solution to accelerate Transformer models due to its massive parallelism, low data movement costs, and high memory bandwidth utilization. Existing PIM accelerators lack support for algorithmic optimizations like dynamic token pruning that can significantly improve the efficiency of Transformers. We identify two challenges to enabling dynamic token pruning on PIM-based architectures: the lack of an in-memory top-k token selection mechanism and the memory underutilization problem caused by pruning. To address these challenges, we propose PRIMATE, a software-hardware co-design PIM framework based on High Bandwidth Memory (HBM). We introduce minor hardware modifications to conventional HBM to enable Transformer model computation and top-k selection. For software, we introduce a pipelined mapping scheme and an optimization framework for maximum throughput and efficiency. PRIMATE achieves a 30.6× improvement in throughput, a 29.5× improvement in space efficiency, and 4.3× better energy efficiency compared to the current state-of-the-art PIM accelerator for Transformers.
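
As a rough illustration of what top-k token selection means in dynamic token pruning, the sketch below scores each token by the total attention it receives and keeps the k highest-scoring tokens in their original order. This is a minimal software sketch under assumptions: the abstract does not state the scoring rule or the in-memory selection mechanism PRIMATE uses, and the function names (`token_importance`, `top_k_tokens`, `prune`) and the sample attention matrix are hypothetical.

```python
import heapq


def token_importance(attn):
    """Score each token by the attention it receives.

    attn is a row-stochastic matrix (rows = queries, columns = keys).
    Column sums are an assumed scoring rule, not taken from the paper.
    """
    n_keys = len(attn[0])
    return [sum(row[j] for row in attn) for j in range(n_keys)]


def top_k_tokens(scores, k):
    """Return indices of the k highest-scoring tokens in sequence order."""
    keep = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(keep)


def prune(tokens, attn, k):
    """Keep only the top-k tokens; later layers would see the shorter sequence."""
    keep = top_k_tokens(token_importance(attn), k)
    return [tokens[i] for i in keep]


# Hypothetical five-token input with made-up attention probabilities.
tokens = ["[CLS]", "the", "movie", "was", "great"]
attn = [
    [0.40, 0.05, 0.20, 0.05, 0.30],
    [0.30, 0.10, 0.30, 0.10, 0.20],
    [0.25, 0.05, 0.30, 0.05, 0.35],
    [0.30, 0.10, 0.20, 0.10, 0.30],
    [0.30, 0.05, 0.25, 0.05, 0.35],
]
kept = prune(tokens, attn, k=3)
print(kept)  # ['[CLS]', 'movie', 'great']
```

After pruning, the sequence shrinks from five tokens to three, so any memory region sized for the full sequence is only partly occupied; this is one way to picture the underutilization problem the abstract mentions.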

Original language: English (US)
Title of host publication: ASP-DAC 2024 - 29th Asia and South Pacific Design Automation Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 557-563
Number of pages: 7
ISBN (Electronic): 9798350393545
State: Published - 2024
Event: 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024 - Incheon, Korea, Republic of
Duration: Jan 22 2024 to Jan 25 2024

Publication series

Name: Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Conference

Conference: 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024
Country/Territory: Korea, Republic of
City: Incheon
Period: 1/22/24 to 1/25/24

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
