ChartReader: Automatic Parsing of Bar-Plots

Chinmayee Rane, Seshasayee Mahadevan Subramanya, Devi Sandeep Endluri, Jian Wu, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Scientific figures such as bar graphs are a critical part of scientific research and a predominant method used to represent trends and relationships in data. However, manually interpreting and extracting information from graphs is often tedious. Because data consumption has grown exponentially over the past few decades, there is a need for automated data inference from these bar graphs. ChartReader is a fully automated end-to-end framework that extracts data from bar graphs in scientific research papers, focusing on process engineering and environmental science journals. ChartReader uses a deep learning-based classifier to determine the chart type of a given chart image. We then develop novel heuristic methods for analyzing scientific figures (text detection, pixel grouping, object detection) and address key challenges such as axis detection, legend parsing, and label detection. Our framework achieves 98% and 68% accuracy in parsing x-axis and y-axis ticks, respectively. It achieves 83% accuracy in parsing legends and 42% accuracy in parsing data values in the testing corpus. We compare the proposed method with state-of-the-art methods and discuss its limitations.
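The abstract does not describe how detected pixel positions are converted into data values. As an illustration only, and not the authors' method, the sketch below shows one common approach: fit a linear pixel-to-value mapping from the parsed y-axis ticks, then apply it to the pixel row of a bar's top edge. The function names `fit_axis` and `bar_value` and the sample tick coordinates are hypothetical, and a linear (non-logarithmic) axis is assumed.

```python
from typing import List, Tuple


def fit_axis(ticks: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * pixel_row + intercept.

    `ticks` holds (pixel_row, tick_value) pairs, e.g. obtained from text
    detection on the y-axis labels (hypothetical input, assumed linear axis).
    """
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two ticks to fit an axis")
    mean_p = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in ticks)
    if var_p == 0:
        raise ValueError("ticks must lie on distinct pixel rows")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in ticks)
    slope = cov / var_p
    intercept = mean_v - slope * mean_p
    return slope, intercept


def bar_value(top_row: float, slope: float, intercept: float) -> float:
    """Map the pixel row of a bar's top edge to a data value."""
    return slope * top_row + intercept


# Image rows grow downward, so larger tick values sit on smaller rows.
sample_ticks = [(400.0, 0.0), (300.0, 10.0), (200.0, 20.0), (100.0, 30.0)]
slope, intercept = fit_axis(sample_ticks)
estimated = bar_value(250.0, slope, intercept)
print(round(estimated, 3))
```

Fitting over all ticks rather than just two makes the mapping less sensitive to a single misread tick label, which matters when y-axis tick parsing is imperfect.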

Original language: English (US)
Title of host publication: Proceedings - 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science, IRI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 318-325
Number of pages: 8
ISBN (Electronic): 9781665438759
State: Published - 2021
Event: 22nd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2021 - Virtual, Online, United States
Duration: Aug 10 2021 - Aug 12 2021

Publication series

Name: Proceedings - 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science, IRI 2021

Conference

Conference: 22nd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2021
Country/Territory: United States
City: Virtual, Online
Period: 8/10/21 - 8/12/21

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
