TY - GEN
T1 - Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications
AU - Ryoo, Jihyun
AU - Fan, Mengran
AU - Tang, Xulong
AU - Jiang, Huaipan
AU - Arunachalam, Meena
AU - Naveen, Sharada
AU - Kandemir, Mahmut T.
PY - 2019/12
Y1 - 2019/12
N2 - The ever-growing complexity and popularity of machine learning and deep learning applications have created an urgent need for effective and efficient support for these applications on contemporary computing systems. In this paper, we thoroughly analyze four DNN algorithms on three widely used architectures (CPU, GPU, and Xeon Phi). The DNN algorithms we choose for evaluation are i) Unet, for biomedical image segmentation, based on a Convolutional Neural Network (CNN); ii) NMT, for neural machine translation, based on a Recurrent Neural Network (RNN); and iii) ResNet-50 and iv) DenseNet, both for image processing and based on CNNs. The ultimate goal of this paper is to answer four fundamental questions: i) do different DNN networks exhibit similar behavior on a given execution platform? ii) does a given DNN network exhibit different behaviors across different platforms? iii) for the same execution platform and the same DNN network, do different execution phases exhibit different behaviors? and iv) are the current major general-purpose platforms tuned sufficiently well for different DNN algorithms? Motivated by these questions, we conduct an in-depth investigation of running DNN applications on modern systems. Specifically, we first identify the most time-consuming functions (hotspot functions) across the different networks and platforms. Next, we characterize their performance bottlenecks and discuss them in detail. Finally, we port selected hotspot functions to a cycle-accurate simulator and use the results to direct architectural optimizations that better support DNN applications.
AB - The ever-growing complexity and popularity of machine learning and deep learning applications have created an urgent need for effective and efficient support for these applications on contemporary computing systems. In this paper, we thoroughly analyze four DNN algorithms on three widely used architectures (CPU, GPU, and Xeon Phi). The DNN algorithms we choose for evaluation are i) Unet, for biomedical image segmentation, based on a Convolutional Neural Network (CNN); ii) NMT, for neural machine translation, based on a Recurrent Neural Network (RNN); and iii) ResNet-50 and iv) DenseNet, both for image processing and based on CNNs. The ultimate goal of this paper is to answer four fundamental questions: i) do different DNN networks exhibit similar behavior on a given execution platform? ii) does a given DNN network exhibit different behaviors across different platforms? iii) for the same execution platform and the same DNN network, do different execution phases exhibit different behaviors? and iv) are the current major general-purpose platforms tuned sufficiently well for different DNN algorithms? Motivated by these questions, we conduct an in-depth investigation of running DNN applications on modern systems. Specifically, we first identify the most time-consuming functions (hotspot functions) across the different networks and platforms. Next, we characterize their performance bottlenecks and discuss them in detail. Finally, we port selected hotspot functions to a cycle-accurate simulator and use the results to direct architectural optimizations that better support DNN applications.
UR - http://www.scopus.com/inward/record.url?scp=85080144759&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080144759&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2019.00034
DO - 10.1109/HiPC.2019.00034
M3 - Conference contribution
T3 - Proceedings - 26th IEEE International Conference on High Performance Computing, HiPC 2019
SP - 205
EP - 214
BT - Proceedings - 26th IEEE International Conference on High Performance Computing, HiPC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th Annual IEEE International Conference on High Performance Computing, HiPC 2019
Y2 - 17 December 2019 through 20 December 2019
ER -