Topic and trend detection in text collections using latent Dirichlet allocation

Levent Bolelli, Şeyda Ertekin, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

85 Scopus citations

Abstract

Algorithms that automatically mine distinct topics in document collections have become increasingly important, both because of their applications in many fields and because of the rapid growth in the number of documents across domains. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments, and the topics discovered in each segment are propagated to influence topic discovery in subsequent segments. Our experimental results on a collection of academic papers from the CiteSeer repository show that the segmented topic model can effectively detect distinct topics and their evolution over time.
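The abstract does not specify how topics are propagated from one time segment to the next. The sketch below shows one plausible reading, built entirely on our own assumptions. It fits plain LDA with collapsed Gibbs sampling in each segment. It then adds the topic-word counts from segment t, scaled by a hypothetical `carry_weight`, to the Dirichlet prior over words for segment t+1. Because topic k in the next segment inherits its prior from topic k in the previous one, topic identities stay aligned across segments, which is what makes tracking a topic over time possible. The names `gibbs_lda`, `segmented_lda`, `carry_weight`, and `top_words` are illustrative and are not the authors' code.

```python
import random


def gibbs_lda(docs, vocab_size, num_topics, alpha, beta_prior, iters, rng):
    """Collapsed Gibbs sampling for LDA with an asymmetric word prior.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    beta_prior: num_topics x vocab_size Dirichlet pseudo-counts, one row per
    topic (this is where counts from an earlier segment can be injected).
    Returns the topic-word count matrix after sampling.
    """
    K, V = num_topics, vocab_size
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # tokens per topic
    ndk = [[0] * K for _ in docs]       # document-topic counts
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)        # random initial assignment
            zd.append(k)
            nkw[k][w] += 1; nk[k] += 1; ndk[d][k] += 1
        z.append(zd)
    beta_sum = [sum(row) for row in beta_prior]
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                nkw[k][w] -= 1; nk[k] -= 1; ndk[d][k] -= 1  # remove token
                # p(z = t | rest) ∝ (n_dt + alpha)(n_tw + beta_tw) / (n_t + sum_w beta_tw)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta_prior[t][w])
                    / (nk[t] + beta_sum[t])
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                nkw[k][w] += 1; nk[k] += 1; ndk[d][k] += 1  # re-add token
    return nkw


def segmented_lda(segments, vocab_size, num_topics, alpha=0.1, beta=0.01,
                  carry_weight=0.5, iters=50, seed=0):
    """Fit LDA segment by segment, in temporal order.

    Hypothetical propagation rule: the prior for segment t+1 is the base
    beta plus carry_weight times the topic-word counts learned in segment t.
    """
    rng = random.Random(seed)
    prior = [[beta] * vocab_size for _ in range(num_topics)]
    results = []
    for docs in segments:
        nkw = gibbs_lda(docs, vocab_size, num_topics, alpha, prior, iters, rng)
        results.append(nkw)
        prior = [[beta + carry_weight * c for c in row] for row in nkw]
    return results


def top_words(nkw_row, n):
    """Word ids with the highest counts in one topic's row."""
    return sorted(range(len(nkw_row)), key=lambda w: -nkw_row[w])[:n]


if __name__ == "__main__":
    # Two toy time segments over a 6-word vocabulary.
    segments = [
        [[0, 1, 2, 0], [3, 4, 3]],
        [[0, 2, 2], [4, 5, 5, 3]],
    ]
    for t, nkw in enumerate(segmented_lda(segments, 6, 2, iters=20, seed=1)):
        print(f"segment {t}:", [top_words(row, 3) for row in nkw])
```

Comparing each topic's top words across segments shows how that topic drifts over time. A larger `carry_weight` makes topics more stable from one segment to the next, while a smaller one lets each segment's data dominate.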

Original language: English (US)
Title of host publication: Advances in Information Retrieval - 31st European Conference on IR Research, ECIR 2009, Proceedings
Pages: 776-780
Number of pages: 5
DOIs
State: Published - 2009
Event: 31st European Conference on Information Retrieval, ECIR 2009 - Toulouse, France
Duration: Apr 6 2009 to Apr 9 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5478 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 31st European Conference on Information Retrieval, ECIR 2009
Country/Territory: France
City: Toulouse
Period: 4/6/09 to 4/9/09

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
