TY - GEN
T1 - Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures
AU - Ryoo, Jihyun
AU - Arunachalam, Meena
AU - Khanna, Rahul
AU - Kandemir, Mahmut T.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/5/9
Y1 - 2018/5/9
N2 - Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels, which in turn requires efficient implementations of the frequently-used ML kernels on state-of-the-art multicores and many-cores, acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, namely, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: the NVIDIA GPU and the Intel Xeon Phi (both KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate various optimizations that can be applied to both the GPU and the Xeon Phi, as well as optimizations that are specific to either the GPU or the Xeon Phi. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by using both general-purpose and accelerator-specific optimizations, one can achieve average speedups ranging from 0.49x to 3.48x (training) and from 1.43x to 9.41x (classification) on the Xeon Phi series, compared to 0.05x-0.60x (training) and 1.61x-6.32x (classification) achieved by the GPU version, both over the standard host-only system.
AB - Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels, which in turn requires efficient implementations of the frequently-used ML kernels on state-of-the-art multicores and many-cores, acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, namely, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: the NVIDIA GPU and the Intel Xeon Phi (both KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate various optimizations that can be applied to both the GPU and the Xeon Phi, as well as optimizations that are specific to either the GPU or the Xeon Phi. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by using both general-purpose and accelerator-specific optimizations, one can achieve average speedups ranging from 0.49x to 3.48x (training) and from 1.43x to 9.41x (classification) on the Xeon Phi series, compared to 0.05x-0.60x (training) and 1.61x-6.32x (classification) achieved by the GPU version, both over the standard host-only system.
UR - http://www.scopus.com/inward/record.url?scp=85047936348&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047936348&partnerID=8YFLogxK
U2 - 10.1109/ISQED.2018.8357279
DO - 10.1109/ISQED.2018.8357279
M3 - Conference contribution
AN - SCOPUS:85047936348
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
SP - 144
EP - 150
BT - 2018 19th International Symposium on Quality Electronic Design, ISQED 2018
PB - IEEE Computer Society
T2 - 19th International Symposium on Quality Electronic Design, ISQED 2018
Y2 - 13 March 2018 through 14 March 2018
ER -