TY - GEN
T1 - Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents
AU - Kataria, Saurabh
AU - Browuer, William
AU - Mitra, Prasenjit
AU - Giles, C. Lee
N1 - Funding Information:
This work was supported in part by the National Science Foundation.
Publisher Copyright:
Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2008
Y1 - 2008
AB - Two-dimensional (2-D) plots in digital documents on the web are an important but largely under-utilized source of information. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of each figure, extracts text blocks such as axis labels and legends, and identifies the data points in the figure. It also extracts the units appearing in the axis labels and segments the legends to identify the individual lines in each legend, the different symbols, and their associated text explanations. Our algorithm also effectively performs the challenging task of separating overlapping text and data points. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84978325054&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978325054&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84978325054
T3 - Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
SP - 1169
EP - 1174
BT - Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
PB - AAAI Press
T2 - 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
Y2 - 13 July 2008 through 17 July 2008
ER -