TY - GEN
T1 - Mechanisms for bounding vulnerabilities of processor structures
AU - Soundararajan, Niranjan Kumar
AU - Parashar, Angshuman
AU - Sivasubramaniam, Anand
PY - 2007
Y1 - 2007
N2 - Concern for the increasing susceptibility of processor structures to transient errors has led to several recent research efforts that propose architectural techniques to enhance reliability. However, real systems are typically required to satisfy hard reliability budgets, and barring expensive full-redundancy approaches, none of the proposed solutions treat any reliability budgets or bounds as hard constraints. Meeting vulnerability bounds requires monitoring vulnerabilities of processor structures and taking appropriate actions whenever these bounds are violated. This mandates treating reliability as a first-order microarchitecture design constraint, while optimizing performance as long as reliability requirements are satisfied. This paper makes three key contributions towards this goal: (i) we present a simple infrastructure to monitor and provide upper bounds on the vulnerabilities of key processor structures at cycle-level fidelity; (ii) we propose two distinct control mechanisms - throttling and selective redundancy - to proactively and/or reactively bound the vulnerabilities to any limit specified by the system designer; (iii) within this framework, we propose a novel adaptation of Out-of-Order Commit for vulnerability reduction, which automatically provides additional leverage for the control mechanisms to boost performance while remaining within the reliability budget.
AB - Concern for the increasing susceptibility of processor structures to transient errors has led to several recent research efforts that propose architectural techniques to enhance reliability. However, real systems are typically required to satisfy hard reliability budgets, and barring expensive full-redundancy approaches, none of the proposed solutions treat any reliability budgets or bounds as hard constraints. Meeting vulnerability bounds requires monitoring vulnerabilities of processor structures and taking appropriate actions whenever these bounds are violated. This mandates treating reliability as a first-order microarchitecture design constraint, while optimizing performance as long as reliability requirements are satisfied. This paper makes three key contributions towards this goal: (i) we present a simple infrastructure to monitor and provide upper bounds on the vulnerabilities of key processor structures at cycle-level fidelity; (ii) we propose two distinct control mechanisms - throttling and selective redundancy - to proactively and/or reactively bound the vulnerabilities to any limit specified by the system designer; (iii) within this framework, we propose a novel adaptation of Out-of-Order Commit for vulnerability reduction, which automatically provides additional leverage for the control mechanisms to boost performance while remaining within the reliability budget.
UR - http://www.scopus.com/inward/record.url?scp=35348914300&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35348914300&partnerID=8YFLogxK
U2 - 10.1145/1250662.1250725
DO - 10.1145/1250662.1250725
M3 - Conference contribution
AN - SCOPUS:35348914300
SN - 1595937064
SN - 9781595937063
T3 - Proceedings - International Symposium on Computer Architecture
SP - 506
EP - 515
BT - ISCA'07
T2 - ISCA'07: 34th Annual International Symposium on Computer Architecture
Y2 - 9 June 2007 through 13 June 2007
ER -