TY - GEN
T1 - Automatic extraction of data from bar charts
AU - Al-Zaidy, Rabah A.
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/10/7
Y1 - 2015/10/7
N2 - Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
AB - Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
UR - http://www.scopus.com/inward/record.url?scp=84997523778&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84997523778&partnerID=8YFLogxK
U2 - 10.1145/2815833.2816956
DO - 10.1145/2815833.2816956
M3 - Conference contribution
AN - SCOPUS:84997523778
T3 - Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
BT - Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
PB - Association for Computing Machinery, Inc
T2 - 8th International Conference on Knowledge Capture, K-CAP 2015
Y2 - 7 October 2015 through 10 October 2015
ER -