TY - GEN
T1 - Design space for scaling-in general purpose computing within the DDR DRAM hierarchy for map-reduce workloads
AU - Rai, Siddhartha Balakrishna
AU - Sivasubramaniam, Anand
AU - Kumar, Adithya
AU - Rengasamy, Prasanna Venkatesh
AU - Narayanan, Vijaykrishnan
AU - Akel, Ameen
AU - Eilert, Sean
N1 - Funding Information:
This work was supported in part by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA and NSF grants 1909004, 1714389, 1912495, 1629915, 1629129, 1763681.
Publisher Copyright:
© 2021 ACM.
PY - 2021/5/11
Y1 - 2021/5/11
N2 - This paper conducts a design space exploration of placing general-purpose RISCV cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware (where? how many? how to interface?) and software (how to place data? how to map computations?) choices for placing these cores within the rank, chip, and bank of the DIMM slots to take advantage of the locality vs. parallelism trade-offs. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale in these workloads into the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark workloads to show the pros and cons of different hardware placement and software mapping strategies. Results show that bank-level RISCV cores can provide tremendous speedup (up to 363X) for the offloadable parts of these applications, amounting to 14X speedup overall in some applications. Even in the non-amenable applications, we get at least a 31% performance boost for the entire application. To realize this, we incur an area overhead of 4% at the bank level and an increase in temperature of < 4°C over the chip, averaged over all applications.
AB - This paper conducts a design space exploration of placing general-purpose RISCV cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware (where? how many? how to interface?) and software (how to place data? how to map computations?) choices for placing these cores within the rank, chip, and bank of the DIMM slots to take advantage of the locality vs. parallelism trade-offs. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale in these workloads into the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark workloads to show the pros and cons of different hardware placement and software mapping strategies. Results show that bank-level RISCV cores can provide tremendous speedup (up to 363X) for the offloadable parts of these applications, amounting to 14X speedup overall in some applications. Even in the non-amenable applications, we get at least a 31% performance boost for the entire application. To realize this, we incur an area overhead of 4% at the bank level and an increase in temperature of < 4°C over the chip, averaged over all applications.
UR - http://www.scopus.com/inward/record.url?scp=85106059208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106059208&partnerID=8YFLogxK
U2 - 10.1145/3457388.3458661
DO - 10.1145/3457388.3458661
M3 - Conference contribution
AN - SCOPUS:85106059208
T3 - Proceedings of the 18th ACM International Conference on Computing Frontiers 2021, CF 2021
SP - 113
EP - 123
BT - Proceedings of the 18th ACM International Conference on Computing Frontiers 2021, CF 2021
PB - Association for Computing Machinery, Inc
T2 - 18th ACM International Conference on Computing Frontiers 2021, CF 2021
Y2 - 11 May 2021 through 13 May 2021
ER -