Abstract
This paper introduces a rule-based, context-dependent word clustering method, with rules derived from various domain databases and from the orthographic properties of the word text. Our experiments show that, in addition to reducing dimensionality significantly, such rule-based word clustering improves the overall accuracy of extracting bibliographic fields from references by 8%, and improves the class-specific performance of line classification on document headers by 18.32% on average.
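The abstract does not list the actual rules or databases, so the following is only a minimal sketch of what rule-based, context-dependent word clustering might look like. The lexicon contents, the cluster labels (such as `:year:` and `:page:`), and the function names `cluster_word` and `cluster_tokens` are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical domain lexicons standing in for the paper's domain databases,
# whose contents are not given in the abstract.
LEXICONS = {
    "month": {"january", "february", "march", "july", "august"},
    "publisher": {"springer", "elsevier"},
}

# Tokens that, when they precede a number, suggest a page number (assumed rule).
PAGE_CUES = {"pp", "pages"}


def cluster_word(word, prev_word=None):
    """Map a word to a cluster label using illustrative rules."""
    core = word.strip(".,;:()").lower()

    # Orthographic rule on digits, refined by the preceding word (context).
    if re.fullmatch(r"\d+", core):
        if prev_word is not None and prev_word.strip(".,").lower() in PAGE_CUES:
            return ":page:"
        if re.fullmatch(r"(19|20)\d{2}", core):
            return ":year:"
        return ":digit:"

    # Lexicon (database) rules.
    for label, words in LEXICONS.items():
        if core in words:
            return f":{label}:"

    # Remaining orthographic rules.
    if re.fullmatch(r"[A-Z]\.", word):
        return ":initial:"
    if word[:1].isupper():
        return ":capitalized:"

    # No rule fired: keep the lowercased word as its own cluster.
    return core


def cluster_tokens(tokens):
    """Cluster each token, passing the previous token as context."""
    return [
        cluster_word(tok, tokens[i - 1] if i > 0 else None)
        for i, tok in enumerate(tokens)
    ]
```

In this sketch, many distinct surface forms (every four-digit year, every capitalized surname) collapse into a handful of labels, which is the sense in which such clustering reduces the dimensionality of the feature space.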
| Original language | English (US) |
|---|---|
| Pages (from-to) | 445-446 |
| Number of pages | 2 |
| Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
| Issue number | SPEC. ISS. |
| State | Published - 2003 |
| Event | Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada |
| Duration | Jul 28 2003 → Aug 1 2003 |
All Science Journal Classification (ASJC) codes
- Management Information Systems
- Hardware and Architecture