Abstract
This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus. We describe three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways. We then combine two of the knowledge-based models with a corpus-based model that classifies unknown words using contextual information. Experiments show that the knowledge-based models outperform previous methods on the same task, but the use of contextual information does not further improve performance.
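The abstract's premise is that an unknown word's semantic category can be inferred from the categories of its component characters. The sketch below illustrates that idea with a simple character-voting baseline. For each character, it estimates a category distribution from known thesaurus words containing that character. It then ranks candidate categories for an unknown word by summing those distributions over the word's characters. This is an illustration only and is not one of the three models described in the paper. The thesaurus format (a dict from word to a list of category labels), the function names, and the toy categories `Person` and `Place` are all hypothetical.

```python
from collections import Counter, defaultdict

def build_char_category_counts(thesaurus):
    """For each character, count how many known words of each category contain it.

    `thesaurus` is a hypothetical dict mapping a known word to its category labels.
    """
    counts = defaultdict(Counter)
    for word, categories in thesaurus.items():
        for ch in set(word):  # count each character at most once per word
            for cat in categories:
                counts[ch][cat] += 1
    return counts

def classify_unknown_word(word, counts):
    """Rank categories by summing each character's normalized category distribution."""
    scores = Counter()
    for ch in word:
        char_counts = counts.get(ch)
        if not char_counts:
            continue  # character never seen in the thesaurus: contributes nothing
        total = sum(char_counts.values())
        for cat, n in char_counts.items():
            scores[cat] += n / total
    # Highest-scoring category first; empty list if no character was known.
    return [cat for cat, _ in scores.most_common()]

# Hypothetical toy thesaurus with made-up category labels.
TOY_THESAURUS = {
    "医生": ["Person"],
    "学生": ["Person"],
    "医院": ["Place"],
    "学校": ["Place"],
}

counts = build_char_category_counts(TOY_THESAURUS)
print(classify_unknown_word("医学生", counts))  # ['Person', 'Place']
```

In this toy example, 生 appears only in `Person` words, which tips the unseen word 医学生 toward `Person`. The other two characters split evenly between the two categories. A baseline like this ignores character position and word structure. Capturing that kind of relationship more carefully is what the paper's knowledge-based models are described as doing.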
Original language | English (US) |
---|---|
Pages | 188-195 |
Number of pages | 8 |
State | Published - 2007 |
Event | Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States. Duration: Apr 22, 2007 → Apr 27, 2007 |
Other
Other | Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 |
---|---|
Country/Territory | United States |
City | Rochester, NY |
Period | 4/22/07 → 4/27/07 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language