TY - JOUR
T1 - Trifecta
T2 - A nonspeculative scheme to exploit common, data-dependent subcritical paths
AU - Ndai, Patrick
AU - Rafique, Nauman
AU - Thottethodi, Mithuna
AU - Ghosh, Swaroop
AU - Bhunia, Swarup
AU - Roy, Kaushik
N1 - Funding Information:
Manuscript received February 19, 2008; revised July 16, 2008. First published April 28, 2009; current version published December 23, 2009. The work of P. Ndai, S. Ghosh, and K. Roy was supported by the Focused Center Research Program and the work of N. Rafique and M. Thottethodi was supported in part by NSF Award CCF–0702612.
PY - 2010/1
Y1 - 2010/1
N2 - Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta - an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20% (12.7%) for an in-order (InO) processor configuration.
AB - Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta - an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20% (12.7%) for an in-order (InO) processor configuration.
UR - http://www.scopus.com/inward/record.url?scp=73249132776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=73249132776&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2008.2007491
DO - 10.1109/TVLSI.2008.2007491
M3 - Article
AN - SCOPUS:73249132776
SN - 1063-8210
VL - 18
SP - 53
EP - 65
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 1
M1 - 4895686
ER -