TY - GEN
T1 - Curve separation for line graphs in scholarly documents
AU - Choudhury, Sagnik Ray
AU - Wang, Shuting
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/9/1
Y1 - 2016/9/1
N2 - Line graphs are abundant in scholarly papers. They are usually generated from a data table, and that data cannot be accessed. One important step in an automated data extraction pipeline is the curve separation problem: segmenting the pixels into separate curves. Previous work in this domain has focused on raster graphics extracted from scholarly PDFs, whereas most scholarly plots are embedded as vector graphics. We report a system to extract these plots as SVG images and show how that can improve both the accuracy (90%) and the scalability (5-8 seconds) of the curve separation problem.
AB - Line graphs are abundant in scholarly papers. They are usually generated from a data table, and that data cannot be accessed. One important step in an automated data extraction pipeline is the curve separation problem: segmenting the pixels into separate curves. Previous work in this domain has focused on raster graphics extracted from scholarly PDFs, whereas most scholarly plots are embedded as vector graphics. We report a system to extract these plots as SVG images and show how that can improve both the accuracy (90%) and the scalability (5-8 seconds) of the curve separation problem.
UR - http://www.scopus.com/inward/record.url?scp=84989889341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84989889341&partnerID=8YFLogxK
U2 - 10.1145/2910896.2925469
DO - 10.1145/2910896.2925469
M3 - Conference contribution
AN - SCOPUS:84989889341
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 277
EP - 278
BT - JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Y2 - 19 June 2016 through 23 June 2016
ER -