TY - JOUR
T1 - Automatic parallel code generation for NuFFT data translation on multicores
AU - Zhang, Yuanrui
AU - Liu, Jun
AU - Kultursay, Emre
AU - Kandemir, Mahmut
AU - Pitsianis, Nikos
AU - Sun, Xiaobai
N1 - Funding Information:
This research is supported in part by NSF grants #1017882, #0963839, CNS #0720645, CCF #0811687, and CCF #0702519, a grant from DARPA (W911NF-05-1-0248) and a grant from Microsoft Corporation.
PY - 2012/4
Y1 - 2012/4
N2 - The nonuniform FFT (NuFFT) is widely used in many applications. Focusing on the most time-consuming part of the NuFFT computation, the data translation step, in this paper, we develop an automatic parallel code generation tool for data translation targeting emerging multicores. The key components of this tool are two scalable parallelization strategies, namely, the source-driven parallelization and the target-driven parallelization. Both these strategies employ equally sized geometric tiling and binning to improve data locality while trying to balance workloads across the cores through dynamic task allocation. They differ in the partitioning and scheduling schemes used to guarantee mutual exclusion in data updates. This tool also consists of a code generator and a code optimizer for the data translation. We evaluated our tool on a commercial multicore machine for both 2D and 3D inputs under different sample distributions with large data set sizes. The results indicate that both parallelization strategies have good scalability as the number of cores and the number of dimensions of data space increase. In particular, the target-driven parallelization outperforms the other when samples are nonuniformly distributed. The experiments also show that our code optimizations can bring about 32% to 43% performance improvement to the data translation step of NuFFT.
AB - The nonuniform FFT (NuFFT) is widely used in many applications. Focusing on the most time-consuming part of the NuFFT computation, the data translation step, in this paper, we develop an automatic parallel code generation tool for data translation targeting emerging multicores. The key components of this tool are two scalable parallelization strategies, namely, the source-driven parallelization and the target-driven parallelization. Both these strategies employ equally sized geometric tiling and binning to improve data locality while trying to balance workloads across the cores through dynamic task allocation. They differ in the partitioning and scheduling schemes used to guarantee mutual exclusion in data updates. This tool also consists of a code generator and a code optimizer for the data translation. We evaluated our tool on a commercial multicore machine for both 2D and 3D inputs under different sample distributions with large data set sizes. The results indicate that both parallelization strategies have good scalability as the number of cores and the number of dimensions of data space increase. In particular, the target-driven parallelization outperforms the other when samples are nonuniformly distributed. The experiments also show that our code optimizations can bring about 32% to 43% performance improvement to the data translation step of NuFFT.
UR - http://www.scopus.com/inward/record.url?scp=84862159436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862159436&partnerID=8YFLogxK
U2 - 10.1142/S021812661240004X
DO - 10.1142/S021812661240004X
M3 - Article
AN - SCOPUS:84862159436
SN - 0218-1266
VL - 21
JO - Journal of Circuits, Systems and Computers
JF - Journal of Circuits, Systems and Computers
IS - 2
M1 - 1240004
ER -