TY - GEN
T1 - ACIC: Admission-Controlled Instruction Cache
T2 - 29th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2023
AU - Wang, Yunjin
AU - Chang, Chia-Hao
AU - Sivasubramaniam, Anand
AU - Soundararajan, Niranjan
N1 - Funding Information:
We thank the anonymous reviewers for their helpful feedback and suggestions. This work was supported in part by NSF grants 1912495, 1909004, 1714389, 1629915, 1629129, 1763681, a grant from Intel Corporation, and by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The front-end bottleneck in datacenter workloads has come under increased scrutiny, with the growing code footprint, the involvement of numerous libraries and OS services, and the unpredictability in the instruction stream. Our examination of these workloads points to burstiness in accesses to instruction blocks, which has also been observed in data accesses [61]. Such burstiness is largely due to spatial and short-duration temporal localities, which LRU fails to recognize and optimize for when a single cache caters to both forms of locality. Instead, we incorporate a small i-Filter, as in previous works [29], [49], to separate spatial from temporal accesses. However, a simple separation does not suffice, and we additionally need to predict whether the block will continue to have temporal locality after the burst of spatial locality. This combination of i-Filter and temporal locality predictor constitutes our Admission-Controlled Instruction Cache (ACIC). ACIC outperforms a number of state-of-the-art pollution reduction techniques (replacement algorithms, bypassing mechanisms, victim caches), providing a 1.0223 speedup on average over a baseline LRU-based conventional i-cache (bridging over half of the gap between LRU and OPT) across several datacenter workloads.
AB - The front-end bottleneck in datacenter workloads has come under increased scrutiny, with the growing code footprint, the involvement of numerous libraries and OS services, and the unpredictability in the instruction stream. Our examination of these workloads points to burstiness in accesses to instruction blocks, which has also been observed in data accesses [61]. Such burstiness is largely due to spatial and short-duration temporal localities, which LRU fails to recognize and optimize for when a single cache caters to both forms of locality. Instead, we incorporate a small i-Filter, as in previous works [29], [49], to separate spatial from temporal accesses. However, a simple separation does not suffice, and we additionally need to predict whether the block will continue to have temporal locality after the burst of spatial locality. This combination of i-Filter and temporal locality predictor constitutes our Admission-Controlled Instruction Cache (ACIC). ACIC outperforms a number of state-of-the-art pollution reduction techniques (replacement algorithms, bypassing mechanisms, victim caches), providing a 1.0223 speedup on average over a baseline LRU-based conventional i-cache (bridging over half of the gap between LRU and OPT) across several datacenter workloads.
UR - http://www.scopus.com/inward/record.url?scp=85151755240&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151755240&partnerID=8YFLogxK
U2 - 10.1109/HPCA56546.2023.10071033
DO - 10.1109/HPCA56546.2023.10071033
M3 - Conference contribution
AN - SCOPUS:85151755240
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 165
EP - 178
BT - 2023 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2023 - Proceedings
PB - IEEE Computer Society
Y2 - 25 February 2023 through 1 March 2023
ER -