Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents

Saurabh Kataria, William Browuer, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceedingConference contribution

22 Scopus citations

Abstract

Two dimensional plots (2-D) in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks like axes-labels and legends and identifies data points in the figure. It also extracts the units appearing in the axes labels and segments the legends to identify the different lines in the legend, the different symbols and their associated text explanations. Our algorithm also performs the challenging task of separating out overlapping text and data points effectively. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.

Original languageEnglish (US)
Title of host publicationAAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference
Pages1169-1174
Number of pages6
StatePublished - Dec 24 2008
Event23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08 - Chicago, IL, United States
Duration: Jul 13 2008Jul 17 2008

Publication series

NameProceedings of the National Conference on Artificial Intelligence
Volume2

Other

Other23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08
Country/TerritoryUnited States
CityChicago, IL
Period7/13/087/17/08

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents'. Together they form a unique fingerprint.

Cite this