TY - JOUR
T1 - Exploring energy scalability in Coprocessor-Dominated Architectures for dark silicon
AU - Zheng, Qiaoshi
AU - Goulding-Hotta, Nathan
AU - Ricketts, Scott
AU - Swanson, Steven
AU - Taylor, Michael Bedford
AU - Sampson, Jack
N1 - Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014/7
Y1 - 2014/7
N2 - As chip designers face the prospect of increasingly dark silicon, there is increased interest in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and this will require deploying coprocessors in large numbers. Recent work has shown that automatically generated application-specific coprocessors can greatly improve energy efficiency, but it is not clear that current techniques will scale to Coprocessor-Dominated Architectures (CoDAs) with hundreds or thousands of coprocessors. We show that scaling CoDAs to include very large numbers of coprocessors is challenging because of the energy cost of interconnects, the memory system, and leakage. These overheads grow with the number of coprocessors and, left unchecked, will squander the energy gains that coprocessors can provide. The article presents a detailed study of energy costs across a wide range of tiled CoDA designs and shows that careful choice of cache configuration, tile size, coarse-grain power management and transistor implementation can limit the growth of these overheads. For multithreaded workloads, designers must also take care to avoid excessive contention for coprocessors, which can significantly increase energy consumption. The results suggest that, for CoDAs that target larger workloads, amortizing shared overheads via multithreading can provide up to 3.8× reductions in energy per instruction, retaining much of the 5.3× potential of smaller designs.
AB - As chip designers face the prospect of increasingly dark silicon, there is increased interest in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and this will require deploying coprocessors in large numbers. Recent work has shown that automatically generated application-specific coprocessors can greatly improve energy efficiency, but it is not clear that current techniques will scale to Coprocessor-Dominated Architectures (CoDAs) with hundreds or thousands of coprocessors. We show that scaling CoDAs to include very large numbers of coprocessors is challenging because of the energy cost of interconnects, the memory system, and leakage. These overheads grow with the number of coprocessors and, left unchecked, will squander the energy gains that coprocessors can provide. The article presents a detailed study of energy costs across a wide range of tiled CoDA designs and shows that careful choice of cache configuration, tile size, coarse-grain power management and transistor implementation can limit the growth of these overheads. For multithreaded workloads, designers must also take care to avoid excessive contention for coprocessors, which can significantly increase energy consumption. The results suggest that, for CoDAs that target larger workloads, amortizing shared overheads via multithreading can provide up to 3.8× reductions in energy per instruction, retaining much of the 5.3× potential of smaller designs.
UR - http://www.scopus.com/inward/record.url?scp=84905978850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905978850&partnerID=8YFLogxK
U2 - 10.1145/2584657
DO - 10.1145/2584657
M3 - Article
AN - SCOPUS:84905978850
SN - 1539-9087
VL - 13
JO - Transactions on Embedded Computing Systems
JF - Transactions on Embedded Computing Systems
IS - 4 SPEC. ISSUE
M1 - 130
ER -