Sparse topic models by parameter sharing

Hossein Soleimani, David J. Miller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a sparse Bayesian topic model, based on parameter sharing, for modeling text corpora. In Latent Dirichlet Allocation (LDA), each topic models all words, even though many words are not topic-specific, i.e., they have similar occurrence frequencies across different topics. We obtain a sparser representation by introducing a universal shared model, used by each topic to model the subset of words that are not topic-specific. A Bernoulli random variable is associated with each word under every topic, determining whether that word is modeled topic-specifically, with a free parameter, or by the shared model, with a common parameter. Our experiments show that this model achieves sparser topic presence in documents and higher test likelihood than LDA.
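To make the switching mechanism concrete, below is a minimal generative sketch in Python/NumPy of how such a model could produce a document. This is only an illustration of the abstract's description, not the authors' model specification or inference procedure: the Bernoulli prior rho, the Dirichlet priors, and all variable names (shared, u, specific, beta, theta) are assumptions introduced here for exposition.

    import numpy as np

    rng = np.random.default_rng(0)

    V, K = 1000, 10    # vocabulary size, number of topics
    rho = 0.2          # assumed Bernoulli prior: prob. a word is topic-specific

    # Universal shared word distribution, common to all topics
    shared = rng.dirichlet(np.ones(V))

    # Binary switches: u[k, v] = 1 means word v is modeled
    # topic-specifically under topic k; 0 means it uses the shared model
    u = rng.binomial(1, rho, size=(K, V))

    # Topic-specific free parameters (used only where u == 1)
    specific = rng.dirichlet(np.ones(V), size=K)

    # Each topic's word distribution: topic-specific parameter where the
    # switch is on, shared parameter otherwise; renormalize so each row
    # is a valid probability distribution
    beta = np.where(u == 1, specific, shared)
    beta /= beta.sum(axis=1, keepdims=True)

    # Generate one document as in LDA, using the sparsified topics
    alpha = np.full(K, 0.1)
    theta = rng.dirichlet(alpha)       # document's topic proportions
    words = []
    for _ in range(50):
        z = rng.choice(K, p=theta)     # draw a topic assignment
        words.append(rng.choice(V, p=beta[z]))  # draw a word from that topic

The intuition behind the sparsity claim follows from this structure: words whose switches are off fall back on a single parameter shared across all topics, so each topic carries free parameters only for its genuinely topic-specific words.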

Original language: English (US)
Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Editors: Mamadou Mboup, Tulay Adali, Eric Moreau, Jan Larsen
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479936946
DOIs
State: Published - Nov 14 2014
Event: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France
Duration: Sep 21 2014 – Sep 24 2014

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Other

Other: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014
Country/Territory: France
City: Reims
Period: 9/21/14 – 9/24/14

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Signal Processing
