Abstract
We describe a utility-based feedback control model and its applications within CiteSeerX, an open access digital library search engine and the new version of CiteSeer. CiteSeerX leverages user-based feedback to correct metadata and reformulate the citation graph. New documents are automatically crawled for indexing using a focused crawler. The URLs of ingested documents are automatically inspected to provide feedback to a whitelist filter, which automatically selects high-quality crawl seed URLs. Changes in a paper's citation count, together with its download history, serve as an indicator of ill-conditioned metadata that needs correction. We believe that these feedback mechanisms effectively improve the overall metadata quality and save computational resources. Although these mechanisms are used in the context of CiteSeerX, we believe they can be readily transferred to other similar systems.
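As an illustration of the whitelist feedback loop mentioned in the abstract, the following is a minimal sketch and not the CiteSeerX implementation. It assumes that seed quality is judged per host, and that a host qualifies when enough of its crawled documents were ingested. The function name `update_whitelist` and the thresholds `min_ingested` and `min_ratio` are hypothetical and chosen only for this example.

```python
from collections import Counter
from urllib.parse import urlsplit


def update_whitelist(crawled_urls, ingested_urls, min_ingested=2, min_ratio=0.5):
    """Return the set of hosts whose documents were ingested often enough.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    a host qualifies when at least `min_ingested` of its documents were
    ingested and the ingested/crawled ratio is at least `min_ratio`.
    """
    # Count crawled and ingested documents per host.
    crawled = Counter(urlsplit(u).netloc for u in crawled_urls)
    ingested = Counter(urlsplit(u).netloc for u in ingested_urls)

    whitelist = set()
    for host, n_ingested in ingested.items():
        # Guard against ingested URLs that are missing from the crawl log.
        n_crawled = max(crawled[host], n_ingested)
        if n_ingested >= min_ingested and n_ingested / n_crawled >= min_ratio:
            whitelist.add(host)
    return whitelist


if __name__ == "__main__":
    crawled = [
        "http://a.edu/p1.pdf", "http://a.edu/p2.pdf", "http://a.edu/p3.pdf",
        "http://b.com/x.pdf", "http://b.com/y.pdf",
        "http://b.com/z.pdf", "http://b.com/w.pdf",
    ]
    ingested = ["http://a.edu/p1.pdf", "http://a.edu/p2.pdf", "http://b.com/x.pdf"]
    print(update_whitelist(crawled, ingested))
```

In this sketch the ingestion outcome feeds back into seed selection: hosts that rarely yield ingestible documents drop out of the whitelist, so later crawls spend fewer resources on them.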
Original language | English (US) |
---|---|
State | Published - 2014 |
Event | 9th International Workshop on Feedback Computing - Philadelphia, United States; Duration: Jun 17 2014 → Jun 20 2014 |
Conference
Conference | 9th International Workshop on Feedback Computing |
---|---|
Country/Territory | United States |
City | Philadelphia |
Period | 6/17/14 → 6/20/14 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Computer Science Applications
- Software
- Artificial Intelligence
- Modeling and Simulation