Learning metadata from the evidence in an on-line citation matching scheme

Isaac G. Councill, Huajing Li, Ziming Zhuang, Sandip Debnath, Levent Bolelli, Wang Chien Lee, Anand Sivasubramaniam, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Scopus citations

Abstract

Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for scientific literature such as CiteSeer and Google Scholar. Although several solutions have been offered for citation matching in large bibliographic databases, they typically require expensive batch clustering operations that must be run offline. Large digital libraries containing citation information can reduce maintenance costs and provide new services by processing citation data efficiently online, resolving document citation relationships as new records become available. Additionally, information found in citations can be used to supplement document metadata. This requires generating a canonical citation record by merging variant citation subfields into a unified "best guess" from which to draw information. Citation information must also be merged with other information sources to provide a complete document record. This paper outlines a system and algorithms for online citation matching and canonical metadata generation. A Bayesian framework is employed to build the ideal citation record for a document; the framework has the added advantages of fusing information from disparate sources and increasing the system's resilience to erroneous data.
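The abstract does not spell out the model, so the following is only a rough illustration of what Bayesian fusion of variant citation subfields can look like, not the authors' method. The Python sketch picks a canonical value for each field by computing a posterior over the observed candidate values. It assumes a uniform prior over candidates, independent sources, and a hypothetical per-source reliability `r` (the probability that a source reports the true value), with errors spread evenly over the other candidates. The names `fuse_field`, `canonical_record`, the source labels, and the reliability numbers are all illustrative assumptions.

```python
import math
from collections import defaultdict


def fuse_field(observations, reliability, default_reliability=0.7):
    """Choose a canonical value for one field (illustrative sketch, not the paper's model).

    observations: list of (source, value) pairs for the same field.
    reliability: dict mapping source -> assumed P(source reports the true value).
    Returns (best_value, posterior), where posterior maps each candidate to a probability.
    """
    candidates = sorted({value for _, value in observations})
    if not candidates:
        raise ValueError("no observations for this field")
    if len(candidates) == 1:
        return candidates[0], {candidates[0]: 1.0}

    k = len(candidates)
    log_scores = {}
    for candidate in candidates:
        # A uniform prior adds the same constant to every score, so it is omitted.
        score = 0.0
        for source, observed in observations:
            r = reliability.get(source, default_reliability)
            r = min(max(r, 1e-6), 1.0 - 1e-6)  # keep log() finite
            # Assumed likelihood if `candidate` is the true value: the source is
            # correct with prob r, otherwise it reports one of the other k-1 values uniformly.
            p = r if observed == candidate else (1.0 - r) / (k - 1)
            score += math.log(p)
        log_scores[candidate] = score

    # Normalize in log space (log-sum-exp) to obtain posterior probabilities.
    peak = max(log_scores.values())
    z = sum(math.exp(s - peak) for s in log_scores.values())
    posterior = {c: math.exp(s - peak) / z for c, s in log_scores.items()}
    best = max(candidates, key=lambda c: posterior[c])
    return best, posterior


def canonical_record(citations, reliability):
    """Merge variant records, given as (source, {field: value}) pairs, field by field."""
    by_field = defaultdict(list)
    for source, record in citations:
        for field, value in record.items():
            if value and value.strip():
                by_field[field].append((source, value.strip()))
    return {field: fuse_field(obs, reliability)[0] for field, obs in by_field.items()}


if __name__ == "__main__":
    # Hypothetical variant records for one paper from three sources.
    citations = [
        ("parsed_citation_1", {"title": "Learning metadata from the evidence", "year": "2005"}),
        ("parsed_citation_2", {"title": "Learning metadata from the evidence", "year": "2005"}),
        ("document_header", {"title": "Learning metadata from the evidence", "year": "2006"}),
    ]
    reliability = {"parsed_citation_1": 0.6, "parsed_citation_2": 0.6, "document_header": 0.99}
    print(canonical_record(citations, reliability))
    # prints {'title': 'Learning metadata from the evidence', 'year': '2006'}
```

Under these assumptions a single high-reliability source (here a hypothetical header extracted from the paper itself) outweighs two noisier citation parses, which is one way a Bayesian merge can make the canonical record resilient to erroneous data. Treating sources as independent is a simplification; two parses of the same citation string may well share errors.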

Original language: English (US)
Title of host publication: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006
Subtitle of host publication: Opening Information Horizons, JCDL '06
Pages: 276-285
Number of pages: 10
State: Published - 2006
Event: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06, Chapel Hill, NC, United States
Duration: Jun 11, 2006 to Jun 15, 2006

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2006
ISSN (Print): 1552-5996

All Science Journal Classification (ASJC) codes

  • General Engineering