Curve separation for line graphs in scholarly documents

Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Line graphs are abundant in scholarly papers. They are usually generated from a data table that cannot be accessed. One important step in an automated pipeline for extracting this data is curve separation: segmenting the pixels of a plot into individual curves. Previous work in this domain has focused on raster graphics extracted from scholarly PDFs, whereas most scholarly plots are embedded as vector graphics. We report a system that extracts these plots as SVG images and show that this improves both the accuracy (90%) and the scalability (5-8 seconds) of curve separation.
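The abstract does not describe the algorithm, so the following is only a minimal sketch of why vector input can simplify curve separation. It assumes each curve in an extracted SVG plot is drawn as stroked `<polyline>` or `<path>` elements with a distinct stroke color or dash pattern, and groups those elements by style using Python's standard `xml.etree.ElementTree`. The function names `separate_curves`, `_style_key`, and `_points` are hypothetical. The sketch ignores styles inherited from parent `<g>` elements, path commands other than absolute moveto/lineto, and the axes and gridlines that a real pipeline would need to filter out. It is not the authors' method.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "{http://www.w3.org/2000/svg}"
# Plain decimal numbers only; exponent notation is not supported in this sketch.
NUM = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)")


def _style_key(elem):
    """Return a (stroke, dasharray) key; inline style overrides attributes."""
    style = dict(
        part.split(":", 1) for part in elem.get("style", "").split(";") if ":" in part
    )
    style = {k.strip(): v.strip() for k, v in style.items()}
    stroke = style.get("stroke", elem.get("stroke", "none"))
    dash = style.get("stroke-dasharray", elem.get("stroke-dasharray", "none"))
    return (stroke, dash)


def _points(elem):
    """Extract (x, y) vertices from a polyline or an absolute M/L path."""
    tag = elem.tag.replace(SVG_NS, "")
    if tag == "polyline":
        raw = elem.get("points", "")
    elif tag == "path":
        raw = elem.get("d", "")
        # Assumption: only absolute moveto (M) and lineto (L) are handled;
        # any other command letter causes the element to be skipped.
        if set(re.findall(r"[A-Za-z]", raw)) - {"M", "L"}:
            return []
    else:
        return []
    nums = [float(n) for n in NUM.findall(raw)]
    return list(zip(nums[0::2], nums[1::2]))


def separate_curves(svg_text):
    """Group stroked line elements into curves keyed by stroke style."""
    root = ET.fromstring(svg_text)
    curves = defaultdict(list)
    for elem in root.iter():
        pts = _points(elem)
        key = _style_key(elem)
        if pts and key[0] != "none":
            curves[key].extend(pts)
    return dict(curves)


if __name__ == "__main__":
    demo = """<svg xmlns="http://www.w3.org/2000/svg">
      <polyline points="0,0 10,5 20,3" stroke="red" fill="none"/>
      <path d="M0 10 L10 12 L20 15" style="stroke:blue;fill:none"/>
    </svg>"""
    for key, pts in separate_curves(demo).items():
        print(key, pts)
```

Grouping by stroke style is the simplest signal a vector plot offers: in raster images, curves that share pixels must be disentangled, whereas in SVG each drawn segment already carries its own color and dash attributes.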

Original language: English (US)
Title of host publication: JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 277-278
Number of pages: 2
ISBN (Electronic): 9781450342292
State: Published - Sep 1 2016
Event: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016 - Newark, United States
Duration: Jun 19 2016 - Jun 23 2016

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2016-September
ISSN (Print): 1552-5996

Other

Other: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Country/Territory: United States
City: Newark
Period: 6/19/16 - 6/23/16

All Science Journal Classification (ASJC) codes

  • General Engineering
