TY - GEN
T1 - Optimizing Word2Vec performance on multicore systems
AU - Rengasamy, Vasudevan
AU - Fu, Tao Yang
AU - Lee, Wang Chien
AU - Madduri, Kamesh
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/11/12
Y1 - 2017/11/12
N2 - The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53× faster than the original multithreaded Word2Vec implementation and 1.28× faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
AB - The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53× faster than the original multithreaded Word2Vec implementation and 1.28× faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
UR - http://www.scopus.com/inward/record.url?scp=85040109693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040109693&partnerID=8YFLogxK
U2 - 10.1145/3149704.3149768
DO - 10.1145/3149704.3149768
M3 - Conference contribution
AN - SCOPUS:85040109693
T3 - Proceedings of IA3 2017: 7th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of IA3 2017
PB - Association for Computing Machinery, Inc
T2 - 7th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2017
Y2 - 12 November 2017 through 17 November 2017
ER -