TY - GEN
T1 - A hybrid approach to discover semantic hierarchical sections in scholarly documents
AU - Tuarob, Suppawong
AU - Mitra, Prasenjit
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/20
Y1 - 2015/11/20
N2 - Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand the semantics of what is different in different sections of documents, such as what was in the introduction, methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, 96% accuracy (average) on standard section recognition, and 95.51% in accuracy in the section positioning task.
UR - http://www.scopus.com/inward/record.url?scp=84962602612&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962602612&partnerID=8YFLogxK
DO - 10.1109/ICDAR.2015.7333927
M3 - Conference contribution
AN - SCOPUS:84962602612
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 1081
EP - 1085
BT - 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
PB - IEEE Computer Society
T2 - 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Y2 - 23 August 2015 through 26 August 2015
ER -