Abstract
The performance and energy costs of coordinating and performing data movement have led to proposals adding compute units and/or specialized access units to the memory hierarchy. However, current on-chip offload models are restricted to fixed compute and access pattern types, which limits software-driven optimizations and the applicability of such an offload interface to heterogeneous accelerator resources. This paper presents a computation offload interface for multi-core systems augmented with distributed on-chip accelerators. With energy efficiency as the primary goal, we define mechanisms to identify offload partitioning, create a low-overhead execution model to sequence these fine-grained operations, and evaluate a set of workloads to identify the complexity needed to achieve distributed near-data execution. We demonstrate that our model and interface, combining features of dataflow in parallel with near-data processing engines, can be profitably applied to memory hierarchies augmented with either specialized compute substrates or lightweight near-memory cores. We differentiate the benefits stemming from each of the following: elevating data access semantics, near-data computation, inter-accelerator coordination, and compute/access logic specialization. Experimental results indicate geometric-mean (energy efficiency improvement; speedup; data movement reduction) values of (3.3×; 1.59×; 2.4×), (2.46×; 1.43×; 3.5×), and (1.46×; 1.65×; 1.48×) compared to an out-of-order processor, a monolithic accelerator with centralized accesses, and a monolithic accelerator with decentralized accesses, respectively. Evaluating both lightweight core and CGRA fabric implementations highlights model flexibility and quantifies the benefits of compute specialization for energy efficiency and speedup at 1.23× and 1.43×, respectively.
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 |
| Publisher | IEEE Computer Society |
| Pages | 1160-1177 |
| Number of pages | 18 |
| ISBN (Electronic) | 9781665462723 |
| State | Published - 2022 |
| Event | 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 - Chicago, United States Duration: Oct 1 2022 → Oct 5 2022 |
Publication series
| Name | Proceedings of the Annual International Symposium on Microarchitecture, MICRO |
|---|---|
| Volume | 2022-October |
| ISSN (Print) | 1072-4451 |
Conference
| Conference | 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 |
|---|---|
| Country/Territory | United States |
| City | Chicago |
| Period | 10/1/22 → 10/5/22 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
Title: An architecture interface and offload model for low-overhead, near-data, distributed accelerators