Exploiting Staleness for Approximating Loads on CMPs

Prasanna Venkatesh Rengasamy, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das

Research output: Contribution to journal, Conference article, peer-review

7 Scopus citations


Coherence misses are an important factor in limiting the scalability of multi-threaded shared memory applications on chip multiprocessors (CMPs) that are envisaged to contain dozens of cores in the imminent future. This paper proposes a novel approach to tackling this problem by leveraging the growingly important paradigm of approximate computing. Many applications are either tolerant to slight errors in the output or if stringent, have in-built resiliency to tolerate some errors in the execution. The approximate computing paradigm suggests breaking conventional barriers of mandating stringent correctness on the hardware, allowing more flexibility in the performance-power-reliability design space. Taking the multi-threaded applications in the SPLASH-2 benchmark suite, we note that nearly all these applications have such inherent resiliency and/or tolerance to slight errors in the output. Based on this observation, we propose to approximate coherence-related load misses by returning stale values, i.e., the version at the time of the invalidation. We show that returning such values from the invalidated lines already present in d-L1 offers only limited scope for improvement since those lines get evicted fairly soon due to the high pressure on d-L1. Instead, we propose a very small (8 lines) Stale Victim Cache (SVC), to hold such lines upon d-L1 eviction. While this does offer significant improvement, there is the possibility of data getting very stale in such a structure, making it highly sensitive to the choice of what data to keep, and for how long. To address these concerns, we propose to time-out these lines from the SVC to limit their staleness in a mechanism called SVC+TB. We show that SVC+TB provides as much as 28.6% speedup in some SPLASH-2 applications, with an average speedup between 10-15% across the entire suite, becoming comparable to an ideal execution that does not incur coherence misses. Further, the consequent approximations have little impact on the correctness, allowing all of them to complete. There were no errors, because of inherent application resilience, in eleven applications, and the maximum error was at most 0.08% across the entire suite.
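
The mechanism described in the abstract can be illustrated with a toy software model. The following is only a minimal sketch under stated assumptions, not the hardware design from the paper: the class name `StaleVictimCache`, the FIFO replacement policy, the choice to measure staleness from the invalidation time, and the `timeout` value are all hypothetical. Only the 8-line capacity and the idea of timing out stale lines come from the abstract.

```python
from collections import OrderedDict


class StaleVictimCache:
    """Toy model of a small buffer for invalidated lines evicted from d-L1."""

    def __init__(self, capacity=8, timeout=1000):
        self.capacity = capacity    # 8 lines, as stated in the abstract
        self.timeout = timeout      # hypothetical staleness bound, in cycles
        self.lines = OrderedDict()  # addr -> (stale_value, invalidated_at)

    def insert(self, addr, stale_value, invalidated_at):
        """Store an invalidated line when it is evicted from d-L1."""
        if addr in self.lines:
            del self.lines[addr]
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # FIFO replacement (assumption)
        self.lines[addr] = (stale_value, invalidated_at)

    def lookup(self, addr, now):
        """Return the stale value for a coherence miss, or None.

        None means the load must be serviced by the normal coherence path.
        """
        entry = self.lines.get(addr)
        if entry is None:
            return None
        value, invalidated_at = entry
        if now - invalidated_at > self.timeout:
            del self.lines[addr]  # too stale: time the line out
            return None
        return value


svc = StaleVictimCache(capacity=8, timeout=100)
svc.insert(0x40, stale_value=7, invalidated_at=10)
fresh_enough = svc.lookup(0x40, now=50)   # within the bound: stale value returned
timed_out = svc.lookup(0x40, now=200)     # beyond the bound: falls back to a real miss
```

In this sketch, a smaller `timeout` trades fewer approximated loads for a tighter bound on how old a returned value can be, which mirrors the sensitivity to "what data to keep, and for how long" noted in the abstract.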

Original language: English (US)
Article number: 7429318
Pages (from-to): 343-354
Number of pages: 12
Journal: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
State: Published - 2015
Event: 24th International Conference on Parallel Architecture and Compilation, PACT 2015 - San Francisco, United States
Duration: Oct 18 2015 to Oct 21 2015

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

