TY - GEN
T1 - Efficient complex operators for irregular codes
AU - Sampson, Jack
AU - Venkatesh, Ganesh
AU - Goulding-Hotta, Nathan
AU - Garcia, Saturnino
AU - Swanson, Steven
AU - Taylor, Michael Bedford
PY - 2011
Y1 - 2011
N2 - Complex "fat operators" are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators containing several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce EDP by 2x and area by 35% relative to c-cores. They are up to 2.5x faster than a general-purpose processor and reduce energy consumption by up to 8x for a variety of irregular applications including several SPECINT benchmarks.
AB - Complex "fat operators" are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators containing several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce EDP by 2x and area by 35% relative to c-cores. They are up to 2.5x faster than a general-purpose processor and reduce energy consumption by up to 8x for a variety of irregular applications including several SPECINT benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=79955407416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955407416&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2011.5749754
DO - 10.1109/HPCA.2011.5749754
M3 - Conference contribution
AN - SCOPUS:79955407416
SN - 9781424494323
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 491
EP - 502
BT - Proceedings - 17th International Symposium on High-Performance Computer Architecture, HPCA 2011
T2 - 17th International Symposium on High-Performance Computer Architecture, HPCA 2011
Y2 - 12 February 2011 through 16 February 2011
ER -