Compiler directed network-on-chip reliability enhancement for chip multiprocessors

Ozcan Ozturk, Mahmut Kandemir, Mary J. Irwin, Sri H.K. Narayanan

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Chip multiprocessors (CMPs) are expected to be the building blocks for future computer systems. While architecting these emerging CMPs is a challenging problem on its own, programming them is even more challenging. As the number of cores accommodated in chip multiprocessors increases, network-on-chip (NoC) type communication fabrics are expected to replace traditional point-to-point buses. Most of the prior software related work so far targeting CMPs focus on performance and power aspects. However, as technology scales, components of a CMP are being increasingly exposed to both transient and permanent hardware failures. This paper presents and evaluates a compiler-directed power-performance aware reliability enhancement scheme for network-on-chip (NoC) based chip multiprocessors (CMPs). The proposed scheme improves on-chip communication reliability by duplicating messages traveling across CMP nodes such that, for each original message, its duplicate uses a different set of communication links as much as possible (to satisfy performance constraint). In addition, our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the NoC (to satisfy power constraint). Our results show that the proposed approach is very effective in improving on-chip network reliability, without causing excessive power or performance degradation. In our experiments, we also evaluate the performance oriented and energy oriented versions of our compiler-directed reliability enhancement scheme, and compare it to two pure hardware based fault tolerant routing schemes.

Original languageEnglish (US)
Pages (from-to)85-94
Number of pages10
JournalACM SIGPLAN Notices
Volume45
Issue number4
DOIs
StatePublished - Apr 2010

All Science Journal Classification (ASJC) codes

  • General Computer Science

Fingerprint

Dive into the research topics of 'Compiler directed network-on-chip reliability enhancement for chip multiprocessors'. Together they form a unique fingerprint.

Cite this