Rule-based Word Clustering for Text Classification

Hui Han, Eren Manavoglu, C. Lee Giles, Hongyuan Zha

Research output: Contribution to journal, Conference article, peer-reviewed

12 Scopus citations


This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and from the orthographic properties of the words. Besides significant dimensionality reduction, our experiments show that such rule-based word clustering improves the overall accuracy of extracting bibliographic fields from references by 8%, and the class-specific performance on the line classification of document headers by 18.32% on average.
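As a rough illustration of the idea in the abstract, the sketch below maps words to cluster labels using small lookup sets (standing in for domain databases) and regular expressions (standing in for orthographic rules). It is not the authors' implementation: the set contents, the label names such as `:month:`, and the functions `cluster_token` and `cluster_line` are all hypothetical, and the rules here ignore context, whereas the paper describes its method as context-dependent.

```python
import re

# Hypothetical stand-ins for domain databases; entries are illustrative only.
MONTHS = {"january", "february", "jan", "feb", "july"}
AFFILIATION_WORDS = {"university", "institute", "college"}


def cluster_token(word: str) -> str:
    """Map a word to a cluster label, or return it lowercased if no rule fires."""
    lowered = word.lower().strip(".,;:()")
    # Database rules: membership in a domain word list.
    if lowered in MONTHS:
        return ":month:"
    if lowered in AFFILIATION_WORDS:
        return ":affiliation:"
    # Orthographic rules: the shape of the token itself.
    if re.fullmatch(r"(19|20)\d{2}", lowered):
        return ":year:"
    if re.fullmatch(r"\d+", lowered):
        return ":digits:"
    if re.fullmatch(r"[A-Z]\.?", word.strip(",;:()")):
        return ":initial:"
    return lowered


def cluster_line(line: str) -> list[str]:
    """Replace each whitespace-separated word with its cluster label."""
    return [cluster_token(w) for w in line.split()]


if __name__ == "__main__":
    print(cluster_line("C. Lee Giles, July 2003"))
```

Collapsing many distinct surface forms (every year, every month name, every initial) into one label each is what would shrink the feature space that a downstream classifier sees, which is the dimensionality reduction the abstract refers to.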

Original language: English (US)
Pages (from-to): 445-446
Number of pages: 2
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: SPEC. ISS.
State: Published - 2003
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada
Duration: Jul 28 2003 to Aug 1 2003

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Hardware and Architecture

