TY - JOUR
T1 - Parallel SRP-PHAT for GPUs
AU - Lee, Taewoo
AU - Chang, Sukmoon
AU - Yook, Dongsuk
N1 - Publisher Copyright: © 2015 Elsevier Ltd. All rights reserved.
PY - 2016/6/4
Y1 - 2016/6/4
AB - The steered response power phase transform (SRP-PHAT) is one of the most widely used algorithms for sound source localization. Because it must examine a large number of candidate sound source locations, conventional SRP-PHAT approaches may not be usable in real time. To overcome this problem, a previous effort parallelized the SRP-PHAT on graphics processing units (GPUs). However, the full capacity of the GPU was not exploited because on-chip memory usage was not addressed. In this paper, we propose GPU-based parallel algorithms for the SRP-PHAT in both the frequency domain and the time domain. The proposed methods optimize the memory access patterns of the SRP-PHAT and use the on-chip memory efficiently. As a result, the proposed methods achieve speedups of 1276 times in the frequency domain and 80 times in the time domain compared to CPU-based algorithms, and of 1.5 times in the frequency domain and 6 times in the time domain compared to conventional GPU-based methods.
UR - http://www.scopus.com/inward/record.url?scp=84930958953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930958953&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2015.05.002
DO - 10.1016/j.csl.2015.05.002
M3 - Article
AN - SCOPUS:84930958953
SN - 0885-2308
VL - 35
SP - 1
EP - 13
JO - Computer Speech and Language
JF - Computer Speech and Language
ER -