TY - GEN
T1 - A novel migration-based NUCA design for chip multiprocessors
AU - Kandemir, Mahmut
AU - Li, Feihui
AU - Irwin, Mary Jane
AU - Son, Seung Woo
PY - 2008
Y1 - 2008
N2 - Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP-based systems with NUCA-type L2 caches, this paper proposes and evaluates a novel data migration algorithm for parallel applications. The goal of this migration scheme is to determine a suitable location for each data block within a large L2 space at any given point during execution. A unique characteristic of the proposed scheme is that it models the problem of optimal data placement in the L2 cache space as a two-dimensional post office placement problem, presents a practical architectural implementation of this model, and gives a detailed evaluation of the proposed implementation. In our experimental evaluation, we also compare our approach to a previously proposed NUCA management scheme using applications from the specomp suite, oltp, specjbb, and specweb. These experiments show that our migration approach improves average L2 access latency by about 35%, on average, over the previous migration scheme, and that these L2 latency savings translate, on average, to a 9.5% improvement in IPC (instructions per cycle). We also observed during our experiments that both the careful initial placement of data (which itself triggers migrations within the L2 space) and subsequent migrations (due to interprocessor data sharing) play an important role in achieving our performance improvements.
AB - Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP-based systems with NUCA-type L2 caches, this paper proposes and evaluates a novel data migration algorithm for parallel applications. The goal of this migration scheme is to determine a suitable location for each data block within a large L2 space at any given point during execution. A unique characteristic of the proposed scheme is that it models the problem of optimal data placement in the L2 cache space as a two-dimensional post office placement problem, presents a practical architectural implementation of this model, and gives a detailed evaluation of the proposed implementation. In our experimental evaluation, we also compare our approach to a previously proposed NUCA management scheme using applications from the specomp suite, oltp, specjbb, and specweb. These experiments show that our migration approach improves average L2 access latency by about 35%, on average, over the previous migration scheme, and that these L2 latency savings translate, on average, to a 9.5% improvement in IPC (instructions per cycle). We also observed during our experiments that both the careful initial placement of data (which itself triggers migrations within the L2 space) and subsequent migrations (due to interprocessor data sharing) play an important role in achieving our performance improvements.
UR - http://www.scopus.com/inward/record.url?scp=70350746352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350746352&partnerID=8YFLogxK
U2 - 10.1109/SC.2008.5216918
DO - 10.1109/SC.2008.5216918
M3 - Conference contribution
AN - SCOPUS:70350746352
SN - 9781424428359
T3 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
BT - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
T2 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Y2 - 15 November 2008 through 21 November 2008
ER -