Abstract
The need for fast and wide dissemination of research results has led to a new trend in which more and more authors post their documents to personal or group Web spaces so that others can easily access and download them. Similarly, more and more researchers search online for documents of interest on the Web instead of visiting libraries. Currently, to locate and download an online copy of a particular document D, one typically (1) queries search engines with the citation information and browses through the returned web pages (e.g., the author's homepage) to see if any contains D, or (2) uses the search facilities of an individual digital library (e.g., CiteSeer, e-Print) to look for D and, if it is not found, repeats the search in another digital library. However, scheme (1) requires human browsing to reach the final online copy, while scheme (2) suffers from incomplete coverage. To remedy these shortcomings, in this paper we present a system, named PaSE, which can effectively locate online copies (e.g., PDF or PS) of scientific documents using citation information. We consider a myriad of alternatives in crawling and parsing the Web to arrive at the right document quickly, and present a preliminary experimental study. Using some of the best alternatives that we have identified, we show that PaSE can locate online copies of documents more accurately and conveniently than human users would, at the cost of longer search time.
Original language | English (US) |
---|---|
Pages (from-to) | 408-418 |
Number of pages | 11 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 3334 |
State | Published - 2004 |
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- General Computer Science