TY - GEN
T1 - Multilayer cache partitioning for multiprogram workloads
AU - Kandemir, Mahmut
AU - Prabhakar, Ramya
AU - Karakoy, Mustafa
AU - Zhang, Yuanrui
PY - 2011
Y1 - 2011
AB - We present a fully automated, model-based, multilayer cache partitioning scheme for multiprogram workloads running on multicore machines. As opposed to prior efforts, this scheme partitions shared caches at multiple layers simultaneously in a coordinated fashion. The scheme has two objectives. First, it tries to satisfy the specified quality of service (QoS) values for all applications by partitioning the shared cache hierarchy across them. Second, it distributes the remaining excess cache capacity (if any) across applications such that a global performance metric is maximized. Our experimental analysis shows that the proposed multilayer partitioning scheme yields, on average, a 33.1% improvement (on the weighted speedup metric) over the next best-performing scheme and is very successful in satisfying the QoS requirements of applications. We also show that partitioning each layer in isolation cannot generate the benefits obtained through our coordinated partitioning scheme. In addition, we observed that the difference between our scheme and an optimal scheme (one that derives the best dynamic partitions) was less than 15% for all the workloads tested and 6.6% on average.
UR - http://www.scopus.com/inward/record.url?scp=80052386002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052386002&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23400-2_13
DO - 10.1007/978-3-642-23400-2_13
M3 - Conference contribution
AN - SCOPUS:80052386002
SN - 9783642233999
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 130
EP - 141
BT - Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings
T2 - 17th International Conference on Parallel Processing, Euro-Par 2011
Y2 - 29 August 2011 through 2 September 2011
ER -