Persistence of information on the web: Analyzing citations contained in research articles

Steve Lawrence, Frans Coetzee, Eric Glover, Gary Flake, David Pennock, Bob Krovetz, Finn Nielsen, Andries Kruger, Lee Giles

Research output: Contribution to conference › Paper › peer-review

9 Scopus citations

Abstract

We analyze the persistence of information on the web, looking at the percentage of invalid URLs contained in academic articles within the CiteSeer (ResearchIndex) database. The average number of URLs per paper increased from 0.06 in 1993 to 1.6 in 1999. We found that a significant percentage of URLs are now invalid, ranging from 23% for 1999 articles to 53% for 1994 articles. We also found that for almost all of the invalid URLs, it was possible to locate the information (or highly related information) in an alternate location, primarily with the use of search engines. However, the ability to relocate missing information varied according to search experience and effort expended. Citation practices suggest that more information may be lost in the future unless these practices are improved. We discuss persistent URL standards and their usage, and give recommendations for citing URLs in research articles as well as for finding the new location of invalid URLs.

Original language: English (US)
Pages: 235-242
Number of pages: 8
State: Published - 2000
Event: 9th International Conference on Information and Knowledge Management (CIKM 2000) - McLean, VA, United States
Duration: Nov 10 2000 → …

All Science Journal Classification (ASJC) codes

  • General Business, Management and Accounting
  • General Decision Sciences
