TY - GEN
T1 - Improving shared cache behavior of multithreaded object-oriented applications in multicores
AU - Kandemir, Mahmut
AU - Srikantaiah, Shekhar
AU - Son, Seung Woo
PY - 2011/12/1
Y1 - 2011/12/1
N2 - Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. These savings in cache misses translate in turn to average execution time improvement of 11.9%.
AB - Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. These savings in cache misses translate in turn to average execution time improvement of 11.9%.
UR - http://www.scopus.com/inward/record.url?scp=84855762740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84855762740&partnerID=8YFLogxK
U2 - 10.1109/ICCAD.2011.6105315
DO - 10.1109/ICCAD.2011.6105315
M3 - Conference contribution
AN - SCOPUS:84855762740
SN - 9781457713989
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
SP - 118
EP - 125
BT - 2011 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2011
T2 - 2011 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2011
Y2 - 7 November 2011 through 10 November 2011
ER -