TY - GEN
T1 - ChartReader
T2 - 22nd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2021
AU - Rane, Chinmayee
AU - Subramanya, Seshasayee Mahadevan
AU - Endluri, Devi Sandeep
AU - Wu, Jian
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Scientific figures such as bar graphs are a critical part of scientific research and a predominant method used to represent trends and relationships in data. However, manually interpreting and extracting information from graphs is often tedious. Since data consumption has exponentially evolved over the past few decades, there is a need for automated data inference from these bar graphs. ChartReader presents a fully automated end-to-end framework that extracts data from bar graphs in scientific research papers focusing on process engineering and environmental science journals. ChartReader uses a deep learning-based classifier to determine the chart type of a given chart image. We then develop novel heuristic methods for analyzing scientific figures (text detection, pixel grouping, object detection) and address prime challenges like axis detection, legend parsing, and label detection. Our framework achieves 98% and 68% accuracy in parsing x-axis and y-axis ticks, respectively. It achieves 83% accuracy in parsing legends and 42% accuracy in parsing data values in the testing corpus. We compare the proposed method with state-of-the-art methods and address its limitations.
UR - http://www.scopus.com/inward/record.url?scp=85123456092&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123456092&partnerID=8YFLogxK
U2 - 10.1109/IRI51335.2021.00050
DO - 10.1109/IRI51335.2021.00050
M3 - Conference contribution
AN - SCOPUS:85123456092
T3 - Proceedings - 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science, IRI 2021
SP - 318
EP - 325
BT - Proceedings - 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science, IRI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 August 2021 through 12 August 2021
ER -