Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents

Saurabh Kataria, William Brouwer, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

14 Scopus citations

Abstract

Two-dimensional (2-D) plots in digital documents on the web are an important but largely under-utilized source of information. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks such as axis labels and legends, and identifies the data points in the figure. It also extracts the units appearing in the axis labels and segments the legends to identify the individual lines in the legend, the different symbols, and their associated text explanations. Our algorithm also performs the challenging task of effectively separating overlapping text and data points. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
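The abstract only outlines the pipeline. The following is a minimal, hypothetical sketch of two of the steps it names (axis identification and data-point identification), not the authors' method. It assumes a binarized raster where 1 marks a dark pixel, locates the axes as the row and column holding the longest continuous run of dark pixels, treats each connected blob of dark pixels inside the plot area as one data point, and maps pixel centroids to data values by linear interpolation between two reference ticks. The function names `find_axes`, `data_point_centroids`, and `pixel_to_data`, the input format, and all three heuristics are illustrative assumptions; text-block extraction, unit extraction, legend segmentation, and the separation of overlapping text and data points are not covered.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical input format: a binarized plot image as a list of rows,
# where 1 marks a dark (foreground) pixel and 0 marks background.
Image = List[List[int]]


def longest_run(values: List[int]) -> int:
    """Length of the longest run of consecutive dark pixels in a row or column."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best


def find_axes(img: Image) -> Tuple[int, int]:
    """Assumed heuristic: the x-axis is the row, and the y-axis the column,
    with the longest continuous dark run. Ties favour the lowest row and
    the leftmost column."""
    height, width = len(img), len(img[0])
    x_row = max(range(height), key=lambda r: (longest_run(img[r]), r))
    y_col = max(
        range(width),
        key=lambda c: (longest_run([img[r][c] for r in range(height)]), -c),
    )
    return x_row, y_col


def data_point_centroids(img: Image, x_row: int, y_col: int) -> List[Tuple[float, float]]:
    """Treat each 4-connected blob of dark pixels strictly inside the plot area
    (above the x-axis, right of the y-axis) as one data point and return its
    (column, row) centroid in pixel coordinates."""
    width = len(img[0])
    seen = set()
    centroids = []
    for r in range(x_row):
        for c in range(y_col + 1, width):
            if not img[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < x_row and y_col < nc < width
                            and img[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            cx = sum(p[1] for p in pixels) / len(pixels)
            cy = sum(p[0] for p in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids


def pixel_to_data(p: float, p0: float, p1: float, v0: float, v1: float) -> float:
    """Linear map from a pixel coordinate to a data value, given two reference
    ticks at pixels p0 and p1 with values v0 and v1 (e.g. read from tick labels)."""
    return v0 + (p - p0) * (v1 - v0) / (p1 - p0)


if __name__ == "__main__":
    plot = [
        [1, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    x_row, y_col = find_axes(plot)
    print(x_row, y_col)  # 5 0
    for cx, cy in data_point_centroids(plot, x_row, y_col):
        # Assumed calibration: column 0 -> x = 0, column 7 -> x = 70;
        # row 5 -> y = 0, row 0 -> y = 10.
        print(pixel_to_data(cx, 0, 7, 0.0, 70.0), pixel_to_data(cy, 5, 0, 0.0, 10.0))
    # 35.0 7.0
    # 60.0 4.0
```

On real figures, text blocks and legends inside the plot area would have to be removed first, because their pixels would otherwise be counted as data points; the abstract treats that separation as a distinct, challenging step.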

Original language: English (US)
Title of host publication: Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
Publisher: AAAI Press
Pages: 1169-1174
Number of pages: 6
ISBN (Electronic): 9781577353683
State: Published - 2008
Event: 23rd AAAI Conference on Artificial Intelligence, AAAI 2008 - Chicago, United States
Duration: Jul 13 2008 to Jul 17 2008

Publication series

Name: Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008

Conference

Conference: 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
Country/Territory: United States
City: Chicago
Period: 7/13/08 to 7/17/08

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
