TY - GEN
T1 - SNN-ANN Hybrid Networks for Embedded Multimodal Monocular Depth Estimation
AU - Tumpa, Sadia Anjum
AU - Devulapally, Anusha
AU - Brehove, Matthew
AU - Kyubwa, Espoir
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Monocular depth estimation is a crucial task in many embedded vision systems, with numerous applications in autonomous driving, robotics, and augmented reality. Traditional methods often rely solely on frame-based approaches, which struggle in dynamic scenes, while event-based cameras offer complementary high temporal resolution, though they lack spatial resolution and context. We propose a novel embedded multimodal monocular depth estimation framework using a hybrid spiking neural network (SNN) and artificial neural network (ANN) architecture. This framework leverages a custom accelerator, TransPIM, for efficient transformer deployment, enabling real-time depth estimation on embedded systems. Our approach combines the advantages of both frame-based and event-based cameras: the SNN extracts low-level features and generates sparse representations from events, which are then fed, together with frame-based input, into an ANN that estimates depth. The SNN-ANN hybrid architecture allows for efficient processing of both RGB and event data, achieving competitive performance across different accuracy metrics in depth estimation on the standard benchmark MVSEC and DENSE datasets. To make the framework accessible to embedded systems, we deploy it on TransPIM, achieving a 9× speedup and 183× lower energy consumption compared to standard GPUs, opening up new possibilities for various embedded system applications.
AB - Monocular depth estimation is a crucial task in many embedded vision systems, with numerous applications in autonomous driving, robotics, and augmented reality. Traditional methods often rely solely on frame-based approaches, which struggle in dynamic scenes, while event-based cameras offer complementary high temporal resolution, though they lack spatial resolution and context. We propose a novel embedded multimodal monocular depth estimation framework using a hybrid spiking neural network (SNN) and artificial neural network (ANN) architecture. This framework leverages a custom accelerator, TransPIM, for efficient transformer deployment, enabling real-time depth estimation on embedded systems. Our approach combines the advantages of both frame-based and event-based cameras: the SNN extracts low-level features and generates sparse representations from events, which are then fed, together with frame-based input, into an ANN that estimates depth. The SNN-ANN hybrid architecture allows for efficient processing of both RGB and event data, achieving competitive performance across different accuracy metrics in depth estimation on the standard benchmark MVSEC and DENSE datasets. To make the framework accessible to embedded systems, we deploy it on TransPIM, achieving a 9× speedup and 183× lower energy consumption compared to standard GPUs, opening up new possibilities for various embedded system applications.
UR - http://www.scopus.com/inward/record.url?scp=85206207310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85206207310&partnerID=8YFLogxK
U2 - 10.1109/ISVLSI61997.2024.00045
DO - 10.1109/ISVLSI61997.2024.00045
M3 - Conference contribution
AN - SCOPUS:85206207310
T3 - Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
SP - 198
EP - 203
BT - 2024 IEEE Computer Society Annual Symposium on VLSI
A2 - Thapliyal, Himanshu
A2 - Becker, Jurgen
PB - IEEE Computer Society
T2 - 2024 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2024
Y2 - 1 July 2024 through 3 July 2024
ER -