TY - GEN
T1 - Scalable algorithms for scholarly figure mining and semantics
AU - Choudhury, Sagnik Ray
AU - Wang, Shuting
AU - Giles, C. Lee
N1 - Funding Information:
We gratefully acknowledge partial support from the National Science Foundation and Qatar Foundation.
Publisher Copyright:
© 2016 ACM.
PY - 2016/6/26
Y1 - 2016/6/26
N2 - Most scholarly papers contain one or more figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively little attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (containing sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or other; and 4. Automatic processing of line graphs to generate a textual summary. Each step generates a metadata file with richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80%.
UR - http://www.scopus.com/inward/record.url?scp=85045211577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045211577&partnerID=8YFLogxK
U2 - 10.1145/2928294.2928305
DO - 10.1145/2928294.2928305
M3 - Conference contribution
AN - SCOPUS:85045211577
SN - 9781450342995
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT - Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
A2 - Gruenwald, Le
A2 - Groppe, Sven
PB - Association for Computing Machinery
T2 - 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
Y2 - 1 July 2016
ER -