TY - GEN
T1 - Ring data location prediction scheme for Non-Uniform cache architectures
AU - Akioka, Sayaka
AU - Li, Feihui
AU - Malkowski, Konrad
AU - Raghavan, Padma
AU - Kandemir, Mahmut
AU - Irwin, Mary Jane
PY - 2008/12/1
Y1 - 2008/12/1
N2 - Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-Uniform Cache Architecture (NUCA) is one of the proposed solutions for reducing the average access latency in such cache designs. While most of the prior NUCA work focuses on data placement, data replacement, and migration-related issues, this paper studies the problem of data search (access) in NUCA. In our architecture, we arrange sets of banks with equal access latency into rings. Our Last Access Based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternative approaches: searching all rings in parallel, and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes, while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme, and 53.8% relative to the parallel lookup scheme.
AB - Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-Uniform Cache Architecture (NUCA) is one of the proposed solutions for reducing the average access latency in such cache designs. While most of the prior NUCA work focuses on data placement, data replacement, and migration-related issues, this paper studies the problem of data search (access) in NUCA. In our architecture, we arrange sets of banks with equal access latency into rings. Our Last Access Based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternative approaches: searching all rings in parallel, and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes, while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme, and 53.8% relative to the parallel lookup scheme.
UR - http://www.scopus.com/inward/record.url?scp=62349087394&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=62349087394&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2008.4751936
DO - 10.1109/ICCD.2008.4751936
M3 - Conference contribution
AN - SCOPUS:62349087394
SN - 9781424426584
T3 - 26th IEEE International Conference on Computer Design 2008, ICCD
SP - 693
EP - 698
BT - 26th IEEE International Conference on Computer Design 2008, ICCD
T2 - 26th IEEE International Conference on Computer Design 2008, ICCD
Y2 - 12 October 2008 through 15 October 2008
ER -