TY - GEN
T1 - Quantifying thread vulnerability for multicore architectures
AU - Oz, Isil
AU - Topcuoglu, Haluk Rahmi
AU - Kandemir, Mahmut
AU - Tosun, Oguz
PY - 2011
Y1 - 2011
N2 - Continuously shrinking transistor sizes and the aggressive low-power operating modes employed by modern architectures tend to increase transient error rates. Concurrently, multicore machines are dominating the architectural spectrum in various application domains. These two trends require a fresh look at the resiliency of multithreaded applications against transient errors from a software perspective. In this paper, we propose and evaluate a new metric called the Thread Vulnerability Factor (TVF). A distinguishing characteristic of TVF is that its calculation for a given thread (which is typically one of the threads of a multithreaded application) depends not only on its own code, but also on the code of the threads that share data with that thread. As a result, we decompose the TVF of a thread into two complementary parts: local and remote. While the former captures the TVF induced by the code of the target thread, the latter represents the vulnerability impact of the threads that interact with the target thread. We quantify the local and remote TVF values for three architectural components (register file, ALUs, and caches) using a set of four multithreaded applications. Our experimental evaluation shows that TVF values tend to increase as the number of cores increases, which means the system becomes more vulnerable as the core count rises. We also discuss how TVF values and execution cycles together can be used to explore performance-reliability tradeoffs in multicores at the source-code level.
AB - Continuously shrinking transistor sizes and the aggressive low-power operating modes employed by modern architectures tend to increase transient error rates. Concurrently, multicore machines are dominating the architectural spectrum in various application domains. These two trends require a fresh look at the resiliency of multithreaded applications against transient errors from a software perspective. In this paper, we propose and evaluate a new metric called the Thread Vulnerability Factor (TVF). A distinguishing characteristic of TVF is that its calculation for a given thread (which is typically one of the threads of a multithreaded application) depends not only on its own code, but also on the code of the threads that share data with that thread. As a result, we decompose the TVF of a thread into two complementary parts: local and remote. While the former captures the TVF induced by the code of the target thread, the latter represents the vulnerability impact of the threads that interact with the target thread. We quantify the local and remote TVF values for three architectural components (register file, ALUs, and caches) using a set of four multithreaded applications. Our experimental evaluation shows that TVF values tend to increase as the number of cores increases, which means the system becomes more vulnerable as the core count rises. We also discuss how TVF values and execution cycles together can be used to explore performance-reliability tradeoffs in multicores at the source-code level.
UR - http://www.scopus.com/inward/record.url?scp=79954985942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79954985942&partnerID=8YFLogxK
U2 - 10.1109/PDP.2011.75
DO - 10.1109/PDP.2011.75
M3 - Conference contribution
AN - SCOPUS:79954985942
SN - 9780769543284
T3 - Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
SP - 32
EP - 39
BT - Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
T2 - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
Y2 - 9 February 2011 through 11 February 2011
ER -