A hybrid approach to discover semantic hierarchical sections in scholarly documents

Suppawong Tuarob, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

36 Scopus citations

Abstract

Scholarly documents are usually composed of sections, each of which serves a different purpose by conveying specific context. The ability to automatically identify sections would allow us to understand the semantics of each part of a document, such as what the introduction covers, the methodologies used, experimental types, trends, etc. We propose a set of hybrid algorithms to 1) automatically identify section boundaries, 2) recognize standard sections, and 3) build a hierarchy of sections. Our algorithms achieve an F-measure of 92.38% in section boundary detection, 96% average accuracy on standard section recognition, and 95.51% accuracy in the section positioning task.

Original language: English (US)
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 1081-1085
Number of pages: 5
ISBN (Electronic): 9781479918058
State: Published - Nov 20 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: Aug 23 2015 → Aug 26 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Other

Other: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 8/23/15 → 8/26/15

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
