The last line of defense in the cache hierarchy before going to off-chip memory is very critical in chip multiprocessors (CMPs) from both the performance and power perspectives. This paper investigates different organizations for this last line of defense (assumed to be L2 in this paper) towards reducing off-chip memory accesses. We evaluate the trade-offs between private L2 and address-interleaved shared L2 designs, noting their individual benefits and drawbacks. The possible imbalance between the L2 demands across the CPUs favors a shared L2 organization, while the interference between these demands can favor a private L2 organization. We propose a new architecture, called Shared Processor-Based Split L2, that captures the benefits of these two organizations, while avoiding many of their drawbacks. Using several applications from the SPEC OMP suite and a commercial benchmark, Specjbb, on a complete system simulator, we demonstrate the benefits of this shared processor-based L2 organization. Our results show as much as 42.50% improvement in IPC over the private organization (with 11.52% on the average), and as much as 42.22% improvement over the shared inter-leaved organization (with 9.76% on the average).
|Number of pages
|IEEE High-Performance Computer Architecture Symposium Proceedings
|Published - 2004
|Proceedings - 10th International Symposium on High Performance Computer Architecture - Madrid, Spain
Duration: Feb 14 2004 → Feb 18 2004
All Science Journal Classification (ASJC) codes
- Hardware and Architecture