Costco: Robust content and structure constrained clustering of networked documents

Su Yan, Dongwon Lee, Alex Hai Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Connectivity analysis of networked documents provides high-quality link structure information, which is usually lost in a content-based learning system. It is well known that combining links and content has the potential to improve text analysis. However, exploiting link structure is non-trivial because links are often noisy and sparse. Besides, it is difficult to balance the term-based content analysis and the link-based structure analysis to reap the benefit of both. We introduce a novel networked document clustering technique that integrates the content and link information in a unified optimization framework. Under this framework, a novel dimensionality reduction method called COntent & STructure COnstrained (Costco) Feature Projection is developed. In order to extract robust link information from sparse and noisy link graphs, two link analysis methods are introduced. Experiments on benchmark data and diverse real-world text corpora validate the effectiveness of the proposed methods.

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 12th International Conference, CICLing 2011, Proceedings
Pages: 289-300
Number of pages: 12
Edition: PART 2
State: Published - 2011
Event: 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011 - Tokyo, Japan
Duration: Feb 20 2011 to Feb 26 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6609 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011
Country/Territory: Japan
City: Tokyo
Period: 2/20/11 to 2/26/11

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
