Conventional processors suffer high access latency and power dissipation under the memory-bandwidth demands of data-intensive workloads such as machine learning and analytics. In-memory computing support across various memory technologies has delivered substantial performance and energy improvements for such workloads by alleviating the repeated accesses and data movement between the CPU and storage. While many processing in-memory (PIM) works have been proposed to efficiently compute dot products using Kirchhoff's law, such solutions are unsuitable for many analytics workloads whose working data is too large and too sparse to store efficiently in memory. This article focuses on the peripheral circuit design for diode-selected crossbars and configures the compute-embedded fabric to efficiently compute sparse matrix-vector multiplication (SpMV). On average, our proposed end-to-end SpMV accelerator achieves $7.7\times $ speedup and $4.9\times $ energy savings compared to the state-of-the-art Fulcrum.
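For readers unfamiliar with the kernel being accelerated, the following is a minimal software sketch of SpMV over a matrix in compressed sparse row (CSR) form. It is purely illustrative: the abstract describes a hardware crossbar accelerator, not this loop, and CSR is assumed here only as a common sparse storage format.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is stored in CSR form:
    values  - nonzero entries, row by row
    col_idx - column index of each nonzero
    row_ptr - start offset of each row in values (length = n_rows + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate only the nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y
```

The sparsity of real analytics matrices means most of the work above is irregular, pointer-chasing memory traffic, which is exactly the access pattern that strains conventional dense crossbar PIM designs.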
Original language: English (US)
Number of pages: 11
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
State: Published - Dec 1 2021
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Electrical and Electronic Engineering