TY - GEN
T1 - Cluster vector space model
T2 - 67th Annual Conference and Expo of the Institute of Industrial Engineers 2017
AU - Julaiti, Juxihong
AU - Kumara, Soundar
PY - 2017/1/1
Y1 - 2017/1/1
N2 - The Word Vector Space Model (WVSM), widely used in text analytics, provides an elegant way to enable computers to understand natural languages by converting words into numbers. With impressive results on syntactic and semantic tasks in natural language processing (NLP), WVSM is widely used in search engines, information retrieval systems, and text classification. However, because the basic elements of the feature space are words, the model has high dimensionality and is at risk of overfitting. An advanced prediction system with multiple models can easily incur a longer training time under WVSM. In this paper, a Cluster Vector Space Model (CVSM) based on vector quantization is used for dimensionality reduction. This method transforms a given word vector space into a much smaller cluster vector space. The results indicate that the CVSM, with less than 1% of the original feature size, works at least as well as the WVSM in binary classification problems; in multi-class classification problems, with less than 1% of the original feature size, CVSM improves the performance of the decision tree model.
AB - The Word Vector Space Model (WVSM), widely used in text analytics, provides an elegant way to enable computers to understand natural languages by converting words into numbers. With impressive results on syntactic and semantic tasks in natural language processing (NLP), WVSM is widely used in search engines, information retrieval systems, and text classification. However, because the basic elements of the feature space are words, the model has high dimensionality and is at risk of overfitting. An advanced prediction system with multiple models can easily incur a longer training time under WVSM. In this paper, a Cluster Vector Space Model (CVSM) based on vector quantization is used for dimensionality reduction. This method transforms a given word vector space into a much smaller cluster vector space. The results indicate that the CVSM, with less than 1% of the original feature size, works at least as well as the WVSM in binary classification problems; in multi-class classification problems, with less than 1% of the original feature size, CVSM improves the performance of the decision tree model.
UR - http://www.scopus.com/inward/record.url?scp=85030989592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030989592&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85030989592
T3 - 67th Annual Conference and Expo of the Institute of Industrial Engineers 2017
SP - 428
EP - 433
BT - 67th Annual Conference and Expo of the Institute of Industrial Engineers 2017
A2 - Nembhard, Harriet B.
A2 - Coperich, Katie
A2 - Cudney, Elizabeth
PB - Institute of Industrial Engineers
Y2 - 20 May 2017 through 23 May 2017
ER -