TY - GEN
T1 - Fast and Resource-Efficient Ultrasound Segmentation Using FPGAs
AU - Kang, Joseph
AU - Al-Qurri, Ahmed
AU - Almekkawy, Mohamed
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Ultrasound image segmentation can benefit from the use of field-programmable gate arrays (FPGAs) because the task is time-sensitive and frequently runs on power-constrained devices. In this paper, we demonstrate the feasibility of deploying neural networks on FPGAs for ultrasound image segmentation. To this end, we aggressively compressed a U-Net model, reducing the parameter count by over 99%, from approximately 31 million to under 60,000. The resulting model was converted into hardware description language code using high-level synthesis. Training was performed on the Cardiac Acquisitions for Multi-structure Ultrasound Segmentation dataset using the Keras Python library. Despite the drastic reduction in model size, the network maintained reasonable segmentation performance, achieving a Dice coefficient of 0.7352. The model was synthesized using the hls4ml framework, targeting the XCU250-FIGD2104-2L-E FPGA. In terms of inference latency, the FPGA implementation achieved speedups of over 50×, 13×, and 13× compared to Google Colab's central processing unit, T4 graphics processing unit, and tensor processing unit v2-8, respectively. While lookup table, flip-flop, and random access memory usage remained within acceptable limits, the high number of multiplications in the network caused digital signal processing usage to exceed the available capacity of this specific FPGA, indicating the need for further architectural optimization.
UR - https://www.scopus.com/pages/publications/105021830193
U2 - 10.1109/IUS62464.2025.11201783
DO - 10.1109/IUS62464.2025.11201783
M3 - Conference contribution
AN - SCOPUS:105021830193
T3 - IEEE International Ultrasonics Symposium, IUS
BT - 2025 IEEE International Ultrasonics Symposium, IUS 2025
PB - IEEE Computer Society
T2 - 2025 IEEE International Ultrasonics Symposium, IUS 2025
Y2 - 15 September 2025 through 18 September 2025
ER -