A comparison of on-line computer science citation databases

Vaclav Petricek, Ingemar J. Cox, Hui Han, Isaac G. Councill, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

This paper examines the differences and similarities between two on-line computer science citation databases, DBLP and CiteSeer. The database entries in DBLP are inserted manually, while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer's autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers. Despite their differences, the two databases exhibit citation distributions that are similar to each other and significantly different from those reported in previous analysis of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 438-449
Number of pages: 12
State: Published - 2005
Event: 9th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2005 - Vienna, Austria
Duration: Sep 18 2005 to Sep 23 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3652 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2005
Country/Territory: Austria
City: Vienna
Period: 9/18/05 to 9/23/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
