TY - GEN
T1 - Boosting Access Parallelism to PCM-Based Main Memory
AU - Arjomand, Mohammad
AU - Kandemir, Mahmut T.
AU - Sivasubramaniam, Anand
AU - Das, Chita R.
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/8/24
Y1 - 2016/8/24
AB - Despite its promise as a DRAM main memory replacement, Phase Change Memory (PCM) has high write latencies, which can be a serious detriment to its widespread adoption. Apart from slowing down a write request, the consequent high latency can also keep other chips of the same rank that are not involved in this write idle for long periods. Several practical considerations make it difficult to allow subsequent reads and/or writes to be served concurrently from the same chips during the long-latency write. This paper proposes and evaluates several novel mechanisms - reconstructing data from error correction bits instead of waiting for chips currently busy to serve a read, rotating word mappings across chips of a PCM rank, and rotating the mapping of error detection/correction bits across these chips - to overlap several reads with an ongoing write (RoW) and even a write with an ongoing write (WoW). The paper also presents the micro-architectural enhancements needed to implement these mechanisms without significantly changing the current interfaces. The resulting PCM access parallelism (PCMap) system, incorporating these enhancements, boosts the intra-rank-level parallelism during such writes from a very low baseline value of 2.4 to average and maximum values of 4.5 and 7.4, respectively (out of a maximum of 8.0), across a wide spectrum of both multi-programmed and multi-threaded workloads. This boost in parallelism results in average IPC improvements of 15.6% and 16.7% for the multi-programmed and multi-threaded workloads, respectively.
UR - http://www.scopus.com/inward/record.url?scp=84988375039&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988375039&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2016.66
DO - 10.1109/ISCA.2016.66
M3 - Conference contribution
AN - SCOPUS:84988375039
T3 - Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016
SP - 695
EP - 706
BT - Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd International Symposium on Computer Architecture, ISCA 2016
Y2 - 18 June 2016 through 22 June 2016
ER -