TY - GEN
T1 - POSTER
T2 - 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017
AU - Kislal, Orhan
AU - Kotra, Jagadish
AU - Tang, Xulong
AU - Kandemir, Mahmut Taylan
AU - Jung, Myoungsoo
PY - 2017/10/31
Y1 - 2017/10/31
N2 - Employing an on-chip network in a manycore system (to improve scalability) makes the latencies of data accesses issued by a core non-uniform, which significantly impacts application performance. This paper presents a compiler strategy that exposes architecture information to the compiler to enable optimized computation-to-core mapping. Our scheme takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. Our experiments with 12 multi-threaded applications reveal that, on average, our approach reduces the on-chip network latency in a 6x6 manycore system by 49.5% in the case of private LLCs and 52.7% in the case of shared LLCs. These improvements translate to execution time improvements of 14.8% and 15.2% for the private-LLC and shared-LLC systems, respectively.
AB - Employing an on-chip network in a manycore system (to improve scalability) makes the latencies of data accesses issued by a core non-uniform, which significantly impacts application performance. This paper presents a compiler strategy that exposes architecture information to the compiler to enable optimized computation-to-core mapping. Our scheme takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. Our experiments with 12 multi-threaded applications reveal that, on average, our approach reduces the on-chip network latency in a 6x6 manycore system by 49.5% in the case of private LLCs and 52.7% in the case of shared LLCs. These improvements translate to execution time improvements of 14.8% and 15.2% for the private-LLC and shared-LLC systems, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85040525879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040525879&partnerID=8YFLogxK
U2 - 10.1109/PACT.2017.20
DO - 10.1109/PACT.2017.20
M3 - Conference contribution
AN - SCOPUS:85040525879
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 138
EP - 139
BT - Proceedings - 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 September 2017 through 13 September 2017
ER -