TY - JOUR
T1 - Exploiting the value of class labels on high-dimensional feature spaces
T2 - topic models for semi-supervised document classification
AU - Soleimani, Hossein
AU - Miller, David J.
N1 - Publisher Copyright: © 2017, Springer-Verlag London.
PY - 2019/5/1
Y1 - 2019/5/1
N2 - We propose a class-based mixture of topic models for classifying documents using both labeled and unlabeled examples (i.e., in a semi-supervised fashion). Most topic models incorporate documents’ class labels by generating them after generating the words. In these models, the training class labels have little effect on the estimated topics, as they are effectively treated as just another word among a huge set of word features. In this paper, we propose to increase the influence of class labels on topic models by generating the words in each document conditioned on the class label. We show that our specific generative process improves classification performance with a small loss in test set log-likelihood. Within our framework, we provide a principled mechanism to control the contributions of the class labels and the word space to the likelihood function. Experiments show that our approach achieves better classification accuracy than some standard semi-supervised and supervised topic models.
UR - http://www.scopus.com/inward/record.url?scp=85020303160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020303160&partnerID=8YFLogxK
U2 - 10.1007/s10044-017-0629-4
DO - 10.1007/s10044-017-0629-4
M3 - Article
AN - SCOPUS:85020303160
SN - 1433-7541
VL - 22
SP - 299
EP - 309
JO - Pattern Analysis and Applications
JF - Pattern Analysis and Applications
IS - 2
ER -