TY - GEN
T1 - Semantic clustering for a functional text classification task
AU - Lippincott, Thomas
AU - Passonneau, Rebecca
PY - 2009
Y1 - 2009
N2 - We describe a semantic clustering method designed to address shortcomings in the common bag-of-words document representation for functional semantic classification tasks. The method uses WordNet-based distance metrics to construct a similarity matrix, and expectation maximization to find and represent clusters of semantically related terms. Using these clusters as features for machine learning helps maintain performance across distinct, domain-specific vocabularies while reducing the size of the document representation. We present promising results along these lines, and evaluate several algorithms and parameters that influence machine learning performance. We discuss limitations of the study and future work for optimizing and evaluating the method.
AB - We describe a semantic clustering method designed to address shortcomings in the common bag-of-words document representation for functional semantic classification tasks. The method uses WordNet-based distance metrics to construct a similarity matrix, and expectation maximization to find and represent clusters of semantically related terms. Using these clusters as features for machine learning helps maintain performance across distinct, domain-specific vocabularies while reducing the size of the document representation. We present promising results along these lines, and evaluate several algorithms and parameters that influence machine learning performance. We discuss limitations of the study and future work for optimizing and evaluating the method.
UR - http://www.scopus.com/inward/record.url?scp=67650541343&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650541343&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-00382-0_41
DO - 10.1007/978-3-642-00382-0_41
M3 - Conference contribution
AN - SCOPUS:67650541343
SN - 3642003818
SN - 9783642003813
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 509
EP - 522
BT - Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
T2 - 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Y2 - 1 March 2009 through 7 March 2009
ER -