TY - GEN
T1 - Learning to read irregular text with attention mechanisms
AU - Yang, Xiao
AU - He, Dafang
AU - Zhou, Zihan
AU - Kifer, Daniel
AU - Giles, C. Lee
N1 - Funding Information: We gratefully acknowledge partial support from NSF grant CCF 1317560 and a hardware grant from NVIDIA.
PY - 2017
Y1 - 2017
N2 - We present a robust end-to-end neural-based model to attentively recognize text in natural images. In particular, we focus on accurately identifying irregular (perspectively distorted or curved) text, which has not been well addressed in the previous literature. Previous research on text reading often works with regular (horizontal and frontal) text and does not adequately generalize to processing text with perspective distortion or curving effects. Our work proposes to overcome this difficulty by introducing two learning components: (1) an auxiliary dense character detection task that helps to learn text-specific visual patterns, and (2) an alignment loss that provides guidance to the training of an attention model. We show with experiments that these two components are crucial for achieving fast convergence and high classification accuracy for irregular text recognition. Our model outperforms previous work on two irregular-text datasets, SVT-Perspective and CUTE80, and is also highly competitive on several regular-text datasets containing primarily horizontal and frontal text.
UR - http://www.scopus.com/inward/record.url?scp=85031934691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031934691&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2017/458
DO - 10.24963/ijcai.2017/458
M3 - Conference contribution
AN - SCOPUS:85031934691
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 3280
EP - 3286
BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
A2 - Sierra, Carles
PB - International Joint Conferences on Artificial Intelligence
T2 - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Y2 - 19 August 2017 through 25 August 2017
ER -