TY - GEN
T1 - A reconfigurable accelerator for neuromorphic object recognition
AU - Sabarad, Jagdish
AU - Kestur, Srinidhi
AU - Park, Mi Sun
AU - Dantara, Dharav
AU - Narayanan, Vijaykrishnan
AU - Chen, Yang
AU - Khosla, Deepak
PY - 2012
Y1 - 2012
N2 - Advances in neuroscience have enabled researchers to develop computational models of auditory, visual and learning perceptions in the human brain. HMAX, which is a biologically inspired model of the visual cortex, has been shown to outperform standard computer vision approaches for multi-class object recognition. HMAX, while computationally demanding, can be potentially applied in various applications such as autonomous vehicle navigation, unmanned surveillance and robotics. In this paper, we present a reconfigurable hardware accelerator for the time-consuming S2 stage of the HMAX model. The accelerator leverages spatial parallelism, dedicated wide data buses with on-chip memories to provide an energy efficient solution to enable adoption into embedded systems. We present a systolic array-based architecture which includes a run-time reconfigurable convolution engine which can perform multiple variable-sized convolutions in parallel. An automation flow is described for this accelerator which can generate optimal hardware configurations for a given algorithmic specification and also perform run-time configuration and execution seamlessly. Experimental results on Virtex-6 FPGA platforms show 5X to 11X speedups and 14X to 33X higher performance-per-Watt over a CNS-based implementation on a Tesla GPU.
UR - http://www.scopus.com/inward/record.url?scp=84859945430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859945430&partnerID=8YFLogxK
U2 - 10.1109/ASPDAC.2012.6165067
DO - 10.1109/ASPDAC.2012.6165067
M3 - Conference contribution
AN - SCOPUS:84859945430
SN - 9781467307727
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 813
EP - 818
BT - ASP-DAC 2012 - 17th Asia and South Pacific Design Automation Conference
T2 - 17th Asia and South Pacific Design Automation Conference, ASP-DAC 2012
Y2 - 30 January 2012 through 2 February 2012
ER -