TY - GEN
T1 - MDACache
T2 - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
AU - George, Sumitha
AU - Liao, Minli Julie
AU - Jiang, Huaipan
AU - Kotra, Jagadish B.
AU - Kandemir, Mahmut T.
AU - Sampson, Jack
AU - Narayanan, Vijaykrishnan
N1 - Funding Information:
This work is supported in part by NSF grants 1317560,1822923, 1439021, 1629915, 1626251, 1629129, 1763681, 1526750, 1439057,1500848
Funding Information:
This work is supported in part by NSF grants 1317560,1822923, 1439021, 1629915,1626251,1629129,1763681,1526750,1439057,1500848andSemi-conductor Research Corporation JUMP CRISP. *Equal contribution by first & second author.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/12
Y1 - 2018/12/12
N2 - For several emerging memory technologies, a natural formulation of memory arrays (cross-point) provides nearly symmetric access costs along multiple (e.g., both row and column) dimensions in contrast to the row-oriented nature of most DRAM and SRAM implementations, producing a Multi-Dimensional-Access (MDA) memory. While MDA memories can directly support applications with both row and column preferences, most modern processors do not directly access either the rows or columns of memories: memory accesses proceed through a cache hierarchy that abstracts many of the physical features that supply the aforementioned symmetry. To reap the full benefits of MDA memories, a co-design approach must occur across software memory layout, the mapping between the physical and logical organization of the memory arrays, and the cache hierarchy itself in order to efficiently express, convey, and exploit multidimensional access patterns. In this paper, we describe a taxonomy for different ways of connecting row and column preferences at the application level to an MDA memory through an MDA cache hierarchy and explore specific implementations for the most plausible design points. We extend vectorization support at the compiler level to provide the necessary information to extract preferences and provide compatible memory layouts, and evaluate the tradeoffs among multiple cache designs for the MDA memory systems. Our results indicate that both logically 2-D caching using physically 1-D SRAM structures and on-chip physically 2-D caches can both provide significant improvements in performance over a traditional cache system interfacing with an MDA memory, reducing execution time by 72% and 65%, respectively. We then explore the sensitivity of these benefits as a function of the working-set to cache capacity ratio as well as to MDA technology assumptions
AB - For several emerging memory technologies, a natural formulation of memory arrays (cross-point) provides nearly symmetric access costs along multiple (e.g., both row and column) dimensions in contrast to the row-oriented nature of most DRAM and SRAM implementations, producing a Multi-Dimensional-Access (MDA) memory. While MDA memories can directly support applications with both row and column preferences, most modern processors do not directly access either the rows or columns of memories: memory accesses proceed through a cache hierarchy that abstracts many of the physical features that supply the aforementioned symmetry. To reap the full benefits of MDA memories, a co-design approach must occur across software memory layout, the mapping between the physical and logical organization of the memory arrays, and the cache hierarchy itself in order to efficiently express, convey, and exploit multidimensional access patterns. In this paper, we describe a taxonomy for different ways of connecting row and column preferences at the application level to an MDA memory through an MDA cache hierarchy and explore specific implementations for the most plausible design points. We extend vectorization support at the compiler level to provide the necessary information to extract preferences and provide compatible memory layouts, and evaluate the tradeoffs among multiple cache designs for the MDA memory systems. Our results indicate that both logically 2-D caching using physically 1-D SRAM structures and on-chip physically 2-D caches can both provide significant improvements in performance over a traditional cache system interfacing with an MDA memory, reducing execution time by 72% and 65%, respectively. We then explore the sensitivity of these benefits as a function of the working-set to cache capacity ratio as well as to MDA technology assumptions
UR - http://www.scopus.com/inward/record.url?scp=85060029775&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060029775&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2018.00073
DO - 10.1109/MICRO.2018.00073
M3 - Conference contribution
AN - SCOPUS:85060029775
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 841
EP - 854
BT - Proceedings - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
PB - IEEE Computer Society
Y2 - 20 October 2018 through 24 October 2018
ER -