ParsCit: An open-source CRF reference string parsing package

Isaac G. Councill, C. Lee Giles, Min-Yen Kan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

264 Scopus citations

Abstract

We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. At the core of ParsCit is a trained conditional random field (CRF) model used to label the token sequences in the reference string. A heuristic model wraps this core with added functionality to identify reference strings from a plain text file, and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We compare ParsCit on three distinct reference string datasets and show that it compares well with other previously published work.

Original language: English (US)
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 661-667
Number of pages: 7
ISBN (Electronic): 2951740840, 9782951740846
State: Published - 2008
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: May 28 2008 to May 30 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Other

Other: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country/Territory: Morocco
City: Marrakech
Period: 5/28/08 to 5/30/08

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education
