
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification (PAMI 2018)

Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
IEEE Transactions on Pattern Analysis and Machine Intelligence
End-to-end recognition results on IC15. Red boxes are detected by TextBoxes. Green polygons are the rectified detections.

 


Abstract

A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

Method

Structure of the rectification network.
Structure of the basic text recognition network.
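As a rough illustration of what one decoding step of an attentional sequence-to-sequence recognizer does, the sketch below computes attention weights over the encoder features and a weighted summary ("glimpse") of them. It uses generic additive attention; this page does not state the exact scoring function, so the form of the score and the names `attention_step`, `W_h`, `W_s`, and `w` are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def attention_step(enc, state, W_h, W_s, w):
    """One additive-attention step (hypothetical helper, not the authors' code).

    enc   : (n, d_h) encoder feature sequence from the rectified image
    state : (d_s,)   previous decoder state
    W_h   : (d_h, d_a), W_s : (d_s, d_a), w : (d_a,) learned projections
    """
    scores = np.tanh(enc @ W_h + state @ W_s) @ w  # (n,) one score per position
    alpha = softmax(scores)                        # attention weights, sum to 1
    glimpse = alpha @ enc                          # (d_h,) weighted feature summary
    return alpha, glimpse

# Toy usage with random values standing in for learned parameters.
rng = np.random.default_rng(0)
n, d_h, d_s, d_a = 6, 8, 5, 4
enc = rng.normal(size=(n, d_h))
state = rng.normal(size=d_s)
alpha, glimpse = attention_step(
    enc, state,
    rng.normal(size=(d_h, d_a)),
    rng.normal(size=(d_s, d_a)),
    rng.normal(size=d_a),
)
```

In a full decoder, the glimpse would be fed together with the previous character into a recurrent cell that predicts the next character; that part is omitted here.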
Some typical rectifications: loosely bounded text (a), oriented or perspectively distorted text (b)(c), and curved text (d).
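The abstract describes the rectification as a Thin-Plate Spline (TPS) transformation. A minimal NumPy sketch of a standard 2-D TPS fit is given below, assuming control points placed along the top and bottom edges of the rectified image and a matching set of points on the input image. In ASTER the input-image points are predicted by a network; here they are hand-made, and the helper names (`edge_control_points`, `fit_tps`, `apply_tps`) and the point count are hypothetical.

```python
import numpy as np

def tps_kernel(a, b):
    # Radial basis phi(r) = r^2 log r, written as 0.5 * d2 * log(d2), with phi(0) = 0.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return 0.5 * d2 * np.log(np.where(d2 == 0, 1.0, d2))

def edge_control_points(n_per_edge):
    # Evenly spaced points on the top (y=0) and bottom (y=1) edges of a unit image.
    xs = np.linspace(0.0, 1.0, n_per_edge)
    top = np.stack([xs, np.zeros_like(xs)], axis=1)
    bottom = np.stack([xs, np.ones_like(xs)], axis=1)
    return np.vstack([top, bottom])

def fit_tps(dst_ctrl, src_ctrl):
    # Solve for TPS parameters mapping rectified-image points to input-image points.
    k = dst_ctrl.shape[0]
    P = np.hstack([np.ones((k, 1)), dst_ctrl])
    A = np.zeros((k + 3, k + 3))
    A[:k, :k] = tps_kernel(dst_ctrl, dst_ctrl)
    A[:k, k:] = P
    A[k:, :k] = P.T
    b = np.zeros((k + 3, 2))
    b[:k] = src_ctrl
    return np.linalg.solve(A, b)  # rows: k kernel weights, then 3 affine terms

def apply_tps(params, dst_ctrl, pts):
    # Map arbitrary rectified-image points to sampling locations on the input image.
    U = tps_kernel(pts, dst_ctrl)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return np.hstack([U, P]) @ params

# Toy example: the text in the input image bends along an arc.
dst_ctrl = edge_control_points(4)                  # fixed points on the output
src_ctrl = dst_ctrl.copy()
src_ctrl[:, 1] += 0.2 * np.sin(np.pi * dst_ctrl[:, 0])  # hand-made "curved" points
params = fit_tps(dst_ctrl, src_ctrl)

# Sampling grid over the rectified image, mapped back to input coordinates.
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 3))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
sample_pts = apply_tps(params, dst_ctrl, grid)
```

The rectified image would then be produced by sampling the input image at `sample_pts` (for example with bilinear interpolation), which keeps the whole pipeline differentiable and is what allows training without control-point annotations.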
 

 


 

Results

Rectified images and recognition results by RARE and by ASTER.
Recognition errors are marked by red characters.

 

 

End-to-end results comparison. “End-to-End” and “Word-Spotting” are two different measures.
“Strong”, “Weak”, and “Generic” denote different lexicons.

 

BibTeX

@article{shi2018aster,
  title={ASTER: An Attentional Scene Text Recognizer with Flexible Rectification},
  author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2018},
  publisher={IEEE}
}

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018YFB1004600), NSFC 61733007, and NSFC 61573160.

 

 
