Abstract
Scientific literature is increasingly available on the World Wide Web. This paper considers the matching of citations found in different papers in order to autonomously construct a citation index from papers in electronic format. Citation indices of scientific literature have traditionally been constructed manually, partly because it can be difficult to determine autonomously whether two citations refer to the same paper (citations can be written in many different formats). We present four algorithms for autonomous citation matching, based respectively on edit-distance computation, word matching, word and phrase matching, and subfield extraction. The word and phrase matching algorithm obtains the lowest error rate, and the subfield algorithm is the most computationally efficient. We quantitatively compare the accuracy and efficiency of the algorithms on a number of datasets.
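To make the first two families of approaches concrete, the following is a minimal Python sketch of an edit-distance similarity and a word-overlap similarity between two citation strings. It is an illustration only, not the algorithms evaluated in the paper: the function names, the normalization step, the scoring formulas, and the example citations are all assumptions introduced here.

```python
import re


def normalize(citation):
    """Lowercase a citation and split it into alphanumeric word tokens.

    Assumption: punctuation and case carry no matching signal.
    """
    return re.findall(r"[a-z0-9]+", citation.lower())


def edit_distance(a, b):
    """Levenshtein distance between two strings, using a two-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(c1, c2):
    """Edit distance scaled to [0, 1]; 1.0 means identical after normalization."""
    s1 = " ".join(normalize(c1))
    s2 = " ".join(normalize(c2))
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s1, s2) / longest


def word_similarity(c1, c2):
    """Fraction of the shorter citation's distinct words found in the other."""
    w1, w2 = set(normalize(c1)), set(normalize(c2))
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / min(len(w1), len(w2))


if __name__ == "__main__":
    # Hypothetical citations: the same made-up paper in two formats.
    first = "A. Author and B. Writer. An example paper title. Example Journal, 1999."
    second = "Author, A., Writer, B. (1999). An Example Paper Title. Example Journal."
    print("edit similarity:", round(edit_similarity(first, second), 3))
    print("word similarity:", round(word_similarity(first, second), 3))
```

In a sketch like this, two citations would be treated as referring to the same paper when a score exceeds a chosen threshold. The word-based score ignores field order, which is one plausible reason reordered citation formats are easier to match with word-based methods than with character-level edit distance.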
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the International Conference on Autonomous Agents |
Pages | 392-393 |
Number of pages | 2 |
State | Published - 1999 |
Event | Proceedings of the 1999 3rd International Conference on Autonomous Agents - Seattle, WA, USA (May 1 1999 → May 5 1999) |
All Science Journal Classification (ASJC) codes
- General Engineering