TY - GEN
T1 - REMAP
T2 - 2017 International Symposium on Memory Systems, MEMSYS 2017
AU - Tavana, Mohammad Khavari
AU - Ziabari, Amir Kavyan
AU - Arjomand, Mohammad
AU - Kandemir, Mahmut
AU - Das, Chita
AU - Kaeli, David
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/10/2
Y1 - 2017/10/2
AB - Even given PCM's attractive features that include high scalability and lower power, write endurance remains a critical issue that impedes the move for this technology to replace DRAM in main memory systems. The wear-out problem is further exacerbated by advances in future technologies, where cell sizes are reduced and process variation increases. When using PCMs, worn-out cells are permanently stuck at either '0' or '1'. Successful adoption of PCM requires recovery from multiple stuck-at faults in a data block. Current error correction schemes for PCMs have limited capabilities to tolerate faults. In this paper we propose REMAP to improve the reliability of PCMs so that they can tolerate a large number of hard faults. In contrast to previous schemes, REMAP uses all the metadata space for replacing faulty bits - error detection and location information is not needed. The detection and location of failed memory cells are identified by read verification and an extra write operation. Despite tolerating many hard errors, employing REMAP can negatively impact cell lifetime due to the extra writes. We propose solutions to alleviate this problem and increase memory lifetime significantly. REMAP performs write endurance localization using both static and dynamic partitioning. Additionally, fault location caching is used to avoid the extra write overhead. Given the error correction capabilities of REMAP, we consider using it as a second layer of defense that is combined with other schemes. Our evaluation, which includes both Monte Carlo and trace-driven simulation, shows that REMAP is capable of boosting the PCM lifetime by 56% on average (up to 78%) as compared to our baseline.
UR - http://www.scopus.com/inward/record.url?scp=85033575666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85033575666&partnerID=8YFLogxK
U2 - 10.1145/3132402.3132421
DO - 10.1145/3132402.3132421
M3 - Conference contribution
AN - SCOPUS:85033575666
T3 - ACM International Conference Proceeding Series
SP - 385
EP - 398
BT - MEMSYS 2017 - Proceedings of the International Symposium on Memory Systems
PB - Association for Computing Machinery
Y2 - 2 October 2017 through 5 October 2017
ER -