TY - JOUR
T1 - FPGA architecture for 2D discrete Fourier transform based on 2D decomposition for large-sized data
AU - Yu, Chi Li
AU - Kim, Jung Sub
AU - Deng, Lanping
AU - Kestur, Srinidhi
AU - Narayanan, Vijaykrishnan
AU - Chakrabarti, Chaitali
N1 - Funding Information:
Acknowledgements: This work is supported in part by a grant from DARPA W911NF-05-1-0248. In addition, the authors gratefully acknowledge the help of Dr. Nikos Pitsianis and Dr. Xiaobai Sun of Duke University.
PY - 2011/7
Y1 - 2011/7
N2 - Applications based on Discrete Fourier Transforms (DFT) are extensively used in several areas of signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have utilized this scheme for high-performance implementations of 2D DFT. However, architectures based on RC decomposition are not efficient for large input size data due to memory bandwidth constraints. In this paper, we propose an efficient architecture to implement 2D DFT for large-sized input data based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the inherent parallelism due to the algorithm decomposition and by utilizing the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to enable maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K × 2K input size, the proposed architecture is 1.96 times faster than an RC-decomposition-based implementation under the same memory constraints, and also outperforms other existing implementations.
AB - Applications based on Discrete Fourier Transforms (DFT) are extensively used in several areas of signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have utilized this scheme for high-performance implementations of 2D DFT. However, architectures based on RC decomposition are not efficient for large input size data due to memory bandwidth constraints. In this paper, we propose an efficient architecture to implement 2D DFT for large-sized input data based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the inherent parallelism due to the algorithm decomposition and by utilizing the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to enable maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K × 2K input size, the proposed architecture is 1.96 times faster than an RC-decomposition-based implementation under the same memory constraints, and also outperforms other existing implementations.
UR - http://www.scopus.com/inward/record.url?scp=79956369368&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956369368&partnerID=8YFLogxK
U2 - 10.1007/s11265-010-0500-y
DO - 10.1007/s11265-010-0500-y
M3 - Article
AN - SCOPUS:79956369368
SN - 1939-8018
VL - 64
SP - 109
EP - 122
JO - Journal of Signal Processing Systems
JF - Journal of Signal Processing Systems
IS - 1
ER -