Optimizing Word2Vec performance on multicore systems

Vasudevan Rengasamy, Tao Yang Fu, Wang Chien Lee, Kamesh Madduri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53× faster than the original multithreaded Word2Vec implementation and 1.28× faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
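
For readers unfamiliar with SGNS, the following is a minimal single-threaded sketch of one stochastic gradient step for a (center word, context word) pair with a few negative samples. It illustrates the general SGNS update only; it does not implement the context combining optimization, which the abstract does not describe. The function names (`sgns_step`, `demo`), the learning rate, the vector dimension, and the toy vocabulary indices are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sgns_step(w_in, w_out, center, context, negatives, lr=0.05):
    """Apply one SGD update for a (center, context) pair and return its loss.

    w_in and w_out are lists of input and output vectors (lists of floats).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1; each sampled negative has label 0.
    for target, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[target]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = lr * (label - score)
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # accumulate with the pre-update output vector
            u[i] += g * v[i]       # update the output vector in place
    # Apply the accumulated gradient to the center word's input vector.
    for i in range(len(v)):
        v[i] += grad_v[i]
    return loss


def demo(steps=200, dim=8, vocab=5, seed=0):
    """Repeat the same toy training pair and return (first loss, last loss)."""
    rng = random.Random(seed)
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(vocab)]
    w_out = [[0.0] * dim for _ in range(vocab)]
    losses = [sgns_step(w_in, w_out, 0, 1, [2, 3, 4]) for _ in range(steps)]
    return losses[0], losses[-1]
```

Calling `demo()` trains word 0 against context word 1 with words 2, 3, and 4 as fixed negatives, and the loss on that pair falls as the step is repeated. A real implementation would draw negatives from a unigram-based distribution and run many such updates in parallel across threads, which is where multicore optimizations of the kind discussed in the paper apply.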

Original language: English (US)
Title of host publication: Proceedings of IA3 2017
Subtitle of host publication: 7th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450351362
State: Published - Nov 12 2017
Event: 7th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2017 - Denver, United States
Duration: Nov 12 2017 to Nov 17 2017

Publication series

Name: Proceedings of IA3 2017: 7th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis

Other

Other: 7th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2017
Country/Territory: United States
City: Denver
Period: 11/12/17 to 11/17/17

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Modeling and Simulation
  • Hardware and Architecture
