Abstract
This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus, Cilin (Mei et al. 1984). We present three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways, and combine two of them with a corpus-based model that uses contextual information to classify unknown words. Experiments show that the combined knowledge-based model outperforms previous methods on the same task, but the use of contextual information does not further improve performance.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 99-128 |
| Number of pages | 30 |
| Journal | International Journal of Corpus Linguistics |
| Volume | 13 |
| Issue number | 1 |
| State | Published - 2008 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language
Title: Hybrid models for sense guessing of Chinese unknown words