Automatic summary generation for scientific data charts

Rabah A. Al-Zaidy, Sagnik Ray Choudhury, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations


Scientific charts on the web, whether as images or embedded in digital documents, contain valuable information that is not fully available to information retrieval tools. The information used to describe these charts is typically extracted from the image metadata rather than from the information the graphic was originally designed to express. This study focuses on the problem of understanding digital charts found in scholarly documents and inferring useful textual information from their graphical components. We present an approach that automatically reads chart data, specifically from bar charts, and provides the user with a textual summary of the chart. The proposed method follows a knowledge discovery approach that relies on a versatile graph representation of the chart. This representation is derived by analyzing a chart's original data values, from which useful features are extracted. The data features are in turn used to construct a semantic graph. To generate a summary, the semantic graph of the chart is mapped to appropriately crafted protoforms, which are constructs based on fuzzy logic. We verify the effectiveness of our framework by conducting experiments on bar charts extracted from over 1,000 PDF documents. Our preliminary results show that, under certain assumptions, 83% of the produced summaries provide plausible descriptions of the bar charts.
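As an illustration of the protoform idea mentioned in the abstract, the following is a minimal sketch of a fuzzy linguistic summary of the form "Q bars are S" (for example, "Most bars are high"). It is not the authors' implementation: the function names (`ramp`, `truth_degree`, `summarize_bars`), the membership breakpoints, and the min-max normalization are all assumptions chosen for illustration, and the semantic-graph step described in the abstract is omitted entirely.

```python
from typing import Callable, Sequence, Tuple

Membership = Callable[[float], float]


def ramp(lo: float, hi: float) -> Membership:
    """Membership function rising linearly from 0 at `lo` to 1 at `hi`."""
    def mu(x: float) -> float:
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


def truth_degree(values: Sequence[float], summarizer: Membership,
                 quantifier: Membership) -> float:
    """Truth of the protoform 'Q bars are S' for the given bar heights."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all bars are equal
    normalized = [(v - lo) / span for v in values]
    # Proportion of bars that satisfy the summarizer, counted fuzzily.
    proportion = sum(summarizer(v) for v in normalized) / len(normalized)
    return quantifier(proportion)


def summarize_bars(values: Sequence[float]) -> Tuple[str, float]:
    """Return the best-fitting sentence and its truth degree."""
    # Hypothetical breakpoints; a real system would tune or learn these.
    summarizers = {
        "low": lambda x: 1.0 - ramp(0.2, 0.5)(x),
        "high": ramp(0.5, 0.8),
    }
    quantifiers = {
        "Few": lambda p: 1.0 - ramp(0.1, 0.4)(p),
        "Most": ramp(0.3, 0.8),
    }
    candidates = [
        (truth_degree(values, s, q), f"{q_name} bars are {s_name}")
        for q_name, q in quantifiers.items()
        for s_name, s in summarizers.items()
    ]
    truth, text = max(candidates, key=lambda c: c[0])
    return text, truth


if __name__ == "__main__":
    print(summarize_bars([10, 10, 10, 10, 0]))
```

Each candidate sentence receives a truth degree in [0, 1], and the sketch simply returns the highest-scoring one. A fuller pipeline would presumably select protoforms based on the extracted chart features rather than by exhaustive scoring.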

Original language: English (US)
Title of host publication: WS-16-01
Subtitle of host publication: Artificial Intelligence Applied to Assistive Technologies and Smart Environments; WS-16-02: AI, Ethics, and Society; WS-16-03: Artificial Intelligence for Cyber Security; WS-16-04: Artificial Intelligence for Smart Grids and Smart Buildings; WS-16-05: Beyond NP; WS-16-06: Computer Poker and Imperfect Information Games; WS-16-07: Declarative Learning Based Programming; WS-16-08: Expanding the Boundaries of Health Informatics Using AI; WS-16-09: Incentives and Trust in Electronic Communities; WS-16-10: Knowledge Extraction from Text; WS-16-11: Multiagent Interaction without Prior Coordination; WS-16-12: Planning for Hybrid Systems; WS-16-13: Scholarly Big Data: AI Perspectives, Challenges, and Ideas; WS-16-14: Symbiotic Cognitive Systems; WS-16-15: World Wide Web and Population Health Intelligence
Publisher: AI Access Foundation
Number of pages: 6
ISBN (Electronic): 9781577357599
State: Published - 2016
Event: 30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States
Duration: Feb 12 2016 – Feb 17 2016

Publication series

Name: AAAI Workshop - Technical Report
Volume: WS-16-01 - WS-16-15


Other: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Engineering(all)

