Abstract
Scene text detection (STD) and recognition (STR) methods have improved greatly in recent years, with synthetic training data playing an important role. For the text detection task, however, a model trained solely on large-scale synthetic data performs significantly worse than one trained on even a few real-world samples, whereas state-of-the-art text recognition can be achieved by training on synthetic data alone [10]. This highlights the limitations of relying only on large-scale synthetic data for scene text detection. In this work, we propose the first learning-based, data-driven text synthesis engine for the scene text detection task. Our text synthesis engine is decomposed into two modules: 1) a location module that learns the distribution of text locations on the image plane, and 2) an appearance module that translates the text-inserted images into realistic-looking ones that are essentially indistinguishable from real-world scene text images. Evaluated on the ICDAR 2015 Incidental Scene Text dataset [15], our synthetic data outperforms that of previous text synthesis methods.
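The two-module decomposition described above can be illustrated with a minimal sketch. All class names, interfaces, and the toy Gaussian location model below are assumptions for illustration only; the paper's learned modules (and its image-to-image appearance translation, likely GAN-based) are far richer.

```python
import numpy as np


class LocationModule:
    """Toy stand-in for the paper's location module.

    Fits a single 2D Gaussian to observed text-box centers and samples
    new placements from it; the actual learned distribution would be
    much more expressive.
    """

    def fit(self, centers):
        centers = np.asarray(centers, dtype=float)
        self.mean = centers.mean(axis=0)
        self.cov = np.cov(centers, rowvar=False)
        return self

    def sample(self, n, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        return rng.multivariate_normal(self.mean, self.cov, size=n)


def appearance_module(image):
    """Placeholder for the appearance module (identity here).

    The real module would translate the crude composite into a
    realistic-looking scene text image.
    """
    return image


def synthesize(background, text_patch, loc_model, rng=None):
    """Paste a text patch at a sampled location, then refine appearance."""
    h, w = text_patch.shape[:2]
    y, x = loc_model.sample(1, rng)[0].astype(int)
    # Clamp so the patch stays inside the image bounds.
    y = int(np.clip(y, 0, background.shape[0] - h))
    x = int(np.clip(x, 0, background.shape[1] - w))
    out = background.copy()
    out[y:y + h, x:x + w] = text_patch
    return appearance_module(out)
```

A usage sketch: fit the location model on box centers harvested from real images, then call `synthesize` with a background crop and a rendered text patch to produce one labeled training sample.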
Original language | English (US) |
---|---|
State | Published - 2020 |
Event | 30th British Machine Vision Conference, BMVC 2019 - Cardiff, United Kingdom Duration: Sep 9 2019 → Sep 12 2019 |
Conference
Conference | 30th British Machine Vision Conference, BMVC 2019 |
---|---|
Country/Territory | United Kingdom |
City | Cardiff |
Period | 9/9/19 → 9/12/19 |
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition