TY - GEN
T1 - Accelerating the Non-uniform Fast Fourier Transform using FPGAs
AU - Kestur, Srinidhi
AU - Park, Sungho
AU - Irick, Kevin M.
AU - Narayanan, Vijaykrishnan
PY - 2010
Y1 - 2010
N2 - We present an FPGA accelerator for the Non-uniform Fast Fourier Transform (NuFFT), a technique for reconstructing images from arbitrarily sampled data. We accelerate the compute-intensive interpolation step of the NuFFT Gridding algorithm by implementing it on an FPGA. To ensure efficient memory performance, we present a novel FPGA implementation of Geometric Tiling-based sorting of the arbitrary samples. The convolution is then performed by a novel Data Translation architecture composed of a multi-port local memory, a dynamic coordinate generator, and a plug-and-play kernel pipeline. Our implementation uses single-precision floating point and has been ported to the BEE3 platform. Experimental results show that our FPGA implementation achieves fairly high performance without sacrificing flexibility across data sizes and kernel functions. We demonstrate up to 8x speedup and up to 27 times higher performance-per-watt over a comparable CPU implementation, and up to 20% higher performance-per-watt than a relevant GPU implementation.
AB - We present an FPGA accelerator for the Non-uniform Fast Fourier Transform (NuFFT), a technique for reconstructing images from arbitrarily sampled data. We accelerate the compute-intensive interpolation step of the NuFFT Gridding algorithm by implementing it on an FPGA. To ensure efficient memory performance, we present a novel FPGA implementation of Geometric Tiling-based sorting of the arbitrary samples. The convolution is then performed by a novel Data Translation architecture composed of a multi-port local memory, a dynamic coordinate generator, and a plug-and-play kernel pipeline. Our implementation uses single-precision floating point and has been ported to the BEE3 platform. Experimental results show that our FPGA implementation achieves fairly high performance without sacrificing flexibility across data sizes and kernel functions. We demonstrate up to 8x speedup and up to 27 times higher performance-per-watt over a comparable CPU implementation, and up to 20% higher performance-per-watt than a relevant GPU implementation.
UR - http://www.scopus.com/inward/record.url?scp=77954260127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954260127&partnerID=8YFLogxK
U2 - 10.1109/FCCM.2010.13
DO - 10.1109/FCCM.2010.13
M3 - Conference contribution
AN - SCOPUS:77954260127
SN - 9780769540566
T3 - Proceedings - IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2010
SP - 19
EP - 26
BT - Proceedings - IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2010
T2 - 18th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2010
Y2 - 2 May 2010 through 4 May 2010
ER -