TY - GEN
T1 - Locality-aware distributed loop scheduling for chip multiprocessors
AU - Xue, L.
AU - Kandemir, M.
AU - Chen, G.
AU - Li, F.
AU - Ozturk, O.
AU - Ramanarayanan, R.
AU - Vaidyanathan, B.
PY - 2007
Y1 - 2007
N2 - Chip multiprocessors are becoming increasingly popular in the embedded domain since they have important advantages over their single-core counterparts from the parallelism, power efficiency, validation, and verification perspectives. However, extracting maximum performance from these multiprocessors requires compiler support in the form of effective code parallelization. The goal of this paper is to present and experimentally evaluate a locality-aware dynamic loop scheduling strategy that implements both locality-aware loop iteration distribution across parallel processors and dynamic load balancing at runtime. This hybrid scheme has been implemented and tested along with four other previously proposed loop scheduling schemes, including a locality-aware one. Our experimental analysis reveals that the proposed approach generates better results than all other scheduling schemes (static or dynamic) tested. Our results also show that the improvements brought by the proposed scheduling scheme are consistent across experiments with different values of our major simulation parameters such as the number of processors and cache size per processor.
AB - Chip multiprocessors are becoming increasingly popular in the embedded domain since they have important advantages over their single-core counterparts from the parallelism, power efficiency, validation, and verification perspectives. However, extracting maximum performance from these multiprocessors requires compiler support in the form of effective code parallelization. The goal of this paper is to present and experimentally evaluate a locality-aware dynamic loop scheduling strategy that implements both locality-aware loop iteration distribution across parallel processors and dynamic load balancing at runtime. This hybrid scheme has been implemented and tested along with four other previously proposed loop scheduling schemes, including a locality-aware one. Our experimental analysis reveals that the proposed approach generates better results than all other scheduling schemes (static or dynamic) tested. Our results also show that the improvements brought by the proposed scheduling scheme are consistent across experiments with different values of our major simulation parameters such as the number of processors and cache size per processor.
UR - http://www.scopus.com/inward/record.url?scp=48349122237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=48349122237&partnerID=8YFLogxK
U2 - 10.1109/VLSID.2007.97
DO - 10.1109/VLSID.2007.97
M3 - Conference contribution
AN - SCOPUS:48349122237
SN - 0769527620
SN - 9780769527628
T3 - Proceedings of the IEEE International Conference on VLSI Design
SP - 251
EP - 256
BT - Proceedings - 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems
T2 - 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems, VLSID'07
Y2 - 6 January 2007 through 10 January 2007
ER -