TY - GEN
T1 - Learning to extract semantic structure from documents using multimodal fully convolutional neural networks
AU - Yang, Xiao
AU - Yumer, Ersin
AU - Asente, Paul
AU - Kraley, Mike
AU - Kifer, Daniel
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
AB - We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
UR - http://www.scopus.com/inward/record.url?scp=85044271360&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044271360&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.462
DO - 10.1109/CVPR.2017.462
M3 - Conference contribution
AN - SCOPUS:85044271360
T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
SP - 4342
EP - 4351
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Y2 - 21 July 2017 through 26 July 2017
ER -