Automatic extraction of data from bar charts

Rabah A. Al-Zaidy, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

46 Scopus citations

Abstract

Scientific charts are an effective tool for visualizing numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts on the web has made it inevitable for search engines to include them as indexed content. However, queries based only on the textual data used to tag the images can limit query results. Many studies address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
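The abstract does not spell out the heuristics involved. As an illustration only, one common step in recovering values from a bar chart is to calibrate the y-axis from two recognized tick labels and interpolate the pixel row of each bar top. The sketch below assumes a linear axis and assumes that tick positions and bar tops have already been located by earlier image-processing and text-recognition steps; the function names and sample coordinates are hypothetical and are not taken from the paper.

```python
def pixel_to_value(y_pixel, tick_a, tick_b):
    """Map a pixel row to a data value by linear interpolation.

    tick_a and tick_b are (pixel_row, value) pairs for two y-axis ticks,
    for example as recovered by OCR of the axis labels (an assumption here).
    """
    (ya, va), (yb, vb) = tick_a, tick_b
    if ya == yb:
        raise ValueError("ticks must lie on different pixel rows")
    return va + (y_pixel - ya) * (vb - va) / (yb - ya)


def bar_values(bar_tops, tick_a, tick_b):
    """Convert the pixel row of each bar's top edge to a data value."""
    return [pixel_to_value(y, tick_a, tick_b) for y in bar_tops]


# Hypothetical chart: pixel row 400 is labelled 0, pixel row 100 is labelled 30.
values = bar_values([250, 100, 340], (400, 0.0), (100, 30.0))
print(values)
```

Image rows grow downward, so a taller bar has a smaller pixel row; the interpolation handles this without special casing because the two calibration ticks carry the direction of the axis.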

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450338493
State: Published - Oct 7 2015
Event: 8th International Conference on Knowledge Capture, K-CAP 2015 - Palisades, United States
Duration: Oct 7 2015 to Oct 10 2015

Publication series

Name: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015

Other

Other: 8th International Conference on Knowledge Capture, K-CAP 2015
Country/Territory: United States
City: Palisades
Period: 10/7/15 to 10/10/15

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Information Systems
  • Software
