TY - GEN
T1 - Application mapping for chip multiprocessors
AU - Chen, Guangyu
AU - Li, Feihui
AU - Son, S. W.
AU - Kandemir, M.
PY - 2008
Y1 - 2008
AB - The problem attacked in this paper is one of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step implements a virtual processor-to-physical processor mapping. The goal of this mapping is to ensure that threads expected to communicate frequently with each other are assigned to neighboring processors as much as possible. In the third step, data elements are mapped to memories attached to CMP nodes. The main objective of this mapping is to place a given data item in a node that is close to the nodes that access it. The last step of our approach determines the paths (between memories and processors) along which data travel in an energy-efficient manner. In this paper, we describe in detail the compiler algorithms we implemented and present an experimental evaluation of the framework. In our evaluation, we test the entire framework as well as the impact of omitting some of its steps. This experimental analysis shows that the proposed framework reduces the energy consumption of our applications significantly (by 27.41% on average over a purely performance-oriented application mapping strategy) as a result of improved locality of data accesses.
UR - http://www.scopus.com/inward/record.url?scp=51549094430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51549094430&partnerID=8YFLogxK
U2 - 10.1109/DAC.2008.4555892
DO - 10.1109/DAC.2008.4555892
M3 - Conference contribution
AN - SCOPUS:51549094430
SN - 9781605581156
T3 - Proceedings - Design Automation Conference
SP - 620
EP - 625
BT - Proceedings of the 45th Design Automation Conference, DAC
T2 - 45th Design Automation Conference, DAC
Y2 - 8 June 2008 through 13 June 2008
ER -