TY - GEN
T1 - Parallelism-Based Techniques for Slowing Down Soft Error Propagation
AU - Ozturk, Zuhal
AU - Topcuoglu, Haluk Rahmi
AU - Kandemir, Mahmut Taylan
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Vulnerability of soft errors initiates various fault tolerance techniques on modern computing systems which can be implemented at hardware and software layers. While the fault tolerance techniques can improve the reliability, they introduce additional costs which may not be tolerable for some systems. There are several studies in the literature that target to reduce such additional costs. In this study, we monitor the soft error propagation throughout the execution and propose simple and relatively inexpensive methods to slow down the error propagation curves. Matrix multiplication is considered as the target multi-threaded application where we utilize parallelization-based versions including changing the number of threads and loop parallelization options. The fault injection experiments reveal that the utilized methods reshape the error propagation curves effectively. They can reshape the error propagation at runtime, where switching between different versions during operation helps balance reliability and performance and use the limited resources more efficiently at the same time.
UR - http://www.scopus.com/inward/record.url?scp=85145354277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145354277&partnerID=8YFLogxK
U2 - 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927870
DO - 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927870
M3 - Conference contribution
AN - SCOPUS:85145354277
T3 - Proceedings of the 2022 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2022
BT - Proceedings of the 2022 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2022
A2 - Fortino, Giancarlo
A2 - Gravina, Raffaele
A2 - Guerrieri, Antonio
A2 - Savaglio, Claudio
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Dependable, Autonomic and Secure Computing, 20th IEEE International Conference on Pervasive Intelligence and Computing, 7th IEEE International Conference on Cloud and Big Data Computing, 2022 IEEE International Conference on Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2022
Y2 - 12 September 2022 through 15 September 2022
ER -