Semantic clustering for a functional text classification task

Thomas Lippincott, Rebecca Passonneau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a semantic clustering method designed to address shortcomings in the common bag-of-words document representation for functional semantic classification tasks. The method uses WordNet-based distance metrics to construct a similarity matrix, and expectation maximization to find and represent clusters of semantically related terms. Using these clusters as features for machine learning helps maintain performance across distinct, domain-specific vocabularies while reducing the size of the document representation. We present promising results along these lines, and evaluate several algorithms and parameters that influence machine learning performance. We discuss limitations of the study and future work for optimizing and evaluating the method.
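The idea of using term clusters as features can be illustrated with a minimal Python sketch. It covers only the final mapping step: the term-to-cluster table, the cluster names, and the two sample sentences are hypothetical and written by hand, standing in for clusters that the described method would derive from WordNet-based similarity and expectation maximization. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical term-to-cluster assignments, written by hand for illustration.
# In the described method these would come from WordNet-based similarity
# followed by expectation maximization; neither step is reproduced here.
TERM_TO_CLUSTER = {
    "car": "c_vehicle",
    "truck": "c_vehicle",
    "automobile": "c_vehicle",
    "doctor": "c_medical",
    "physician": "c_medical",
    "nurse": "c_medical",
}


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cluster_features(text, term_to_cluster=TERM_TO_CLUSTER):
    """Sum token counts per cluster; tokens without a cluster are dropped."""
    features = Counter()
    for token, count in bag_of_words(text).items():
        cluster = term_to_cluster.get(token)
        if cluster is not None:
            features[cluster] += count
    return features


# Two documents that share a topic but use different vocabulary.
doc_a = "the physician drove a truck"
doc_b = "the doctor drove an automobile"

features_a = cluster_features(doc_a)
features_b = cluster_features(doc_b)
```

In this toy setup the two documents have different bag-of-words vectors but identical cluster features, and each document is described by two cluster counts instead of five token counts. Dropping unclustered tokens is a simplification made for this sketch only.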

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
Pages: 509-522
Number of pages: 14
State: Published - 2009
Event: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: Mar 1 2009 → Mar 7 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country/Territory: Mexico
City: Mexico City
Period: 3/1/09 → 3/7/09

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
