
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 [pdf]

Abstract

Previous deep learning based state-of-the-art scene text detection methods can be roughly classified into two categories. The first category treats scene text as a type of general object and follows the general object detection paradigm, localizing scene text by regressing the text box locations, but it is troubled by the arbitrary orientations and large aspect ratios of scene text. The second one segments text regions directly, but mostly needs complex post-processing. In this paper, we present a method that combines the ideas of the two types of methods while avoiding their shortcomings. We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions. In the inference stage, candidate boxes are generated by sampling and grouping corner points, which are further scored by segmentation maps and suppressed by NMS. Compared with previous methods, our method can handle long oriented text naturally and does not need complex post-processing. Experiments on ICDAR2013, ICDAR2015, MSRA-TD500, MLT and COCO-Text demonstrate that the proposed algorithm achieves better or comparable results in both accuracy and efficiency. Based on VGG16, it achieves an F-measure of 84.3% on ICDAR2015 and 81.5% on MSRA-TD500.

 

Method

Overview of our method: Given an image, the network outputs corner points and segmentation maps by corner detection and position-sensitive segmentation. Then candidate boxes are generated by sampling and grouping corner points. Finally, those candidate boxes are scored by segmentation maps and suppressed by NMS.
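To make the data flow concrete, here is a minimal Python sketch of this inference pipeline. Every callable it takes (corner detector, segmentation predictor, grouping, scoring, and NMS functions) and the threshold value are hypothetical placeholders standing in for the corresponding components, not the released implementation.

# Illustrative sketch of the inference pipeline; all callables passed in are
# hypothetical placeholders for the components described in the overview.
def detect_text(image, detect_corners, predict_seg_maps,
                group_corners, score_box, nms, score_threshold=0.7):
    # One forward pass yields corner points and position-sensitive segmentation maps.
    corners = detect_corners(image)       # candidate corner points of text boxes
    seg_maps = predict_seg_maps(image)    # position-sensitive segmentation maps

    # Sample and group corner points into rotated candidate boxes.
    candidates = group_corners(corners)

    # Score each candidate box with the segmentation maps, filter, and run NMS.
    scored = [(box, score_box(box, seg_maps)) for box in candidates]
    kept = [(box, s) for box, s in scored if s >= score_threshold]
    return nms(kept)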
Network Architecture: The network contains three parts: a backbone, corner point detectors, and a position-sensitive segmentation predictor. The backbone is adapted from DSSD. The corner point detectors are built on multiple feature layers (blocks in pink), and the position-sensitive segmentation predictor shares some features (pink blocks) with the corner point detectors.
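The following PyTorch sketch shows how the three parts could fit together. The channel widths, number of feature levels, and default-box count k are illustrative assumptions, and the simple strided-convolution backbone merely stands in for the DSSD-style VGG16 backbone; it is a sketch of the design, not the paper's exact configuration.

# Schematic sketch of the three-part architecture: backbone, corner detector
# heads on several feature levels, and a shared position-sensitive segmentation
# branch. Sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class CornerTextDetector(nn.Module):
    def __init__(self, num_levels=4, channels=256, k=5):
        super().__init__()
        # Stand-in backbone for the DSSD-style VGG16 + deconvolution backbone.
        self.backbone = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else channels, channels, 3, stride=2, padding=1)
             for i in range(num_levels)])
        # Corner detector heads: per feature level, scores and offsets for
        # k default boxes x 4 corner types (top-left, top-right, bottom-right, bottom-left).
        self.score_heads = nn.ModuleList(
            [nn.Conv2d(channels, k * 4 * 2, 1) for _ in range(num_levels)])
        self.offset_heads = nn.ModuleList(
            [nn.Conv2d(channels, k * 4 * 4, 1) for _ in range(num_levels)])
        # Position-sensitive segmentation head: 4 maps, one per relative position,
        # predicted from the shared feature maps.
        self.seg_head = nn.Conv2d(channels, 4, 1)

    def forward(self, x):
        features, scores, offsets = [], [], []
        for conv in self.backbone:
            x = torch.relu(conv(x))
            features.append(x)
        for f, sh, oh in zip(features, self.score_heads, self.offset_heads):
            scores.append(sh(f))
            offsets.append(oh(f))
        seg_maps = torch.sigmoid(self.seg_head(features[0]))
        return scores, offsets, seg_maps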
Label generation for corner point detection and position-sensitive segmentation.
(a) Corner points are redefined and represented by squares (boxes in white, red, green, blue) whose side length is set to the short side of the text bounding box R (yellow box). (b) The corresponding ground truth of R in (a) for position-sensitive segmentation.
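As a rough illustration of this labeling scheme, the NumPy sketch below assumes an axis-aligned box for readability (the paper handles rotated boxes R): each corner becomes a square whose side equals the short side of R, and the position-sensitive ground truth splits R into a 2x2 grid of bins.

# Simplified label generation sketch (axis-aligned assumption; the paper's
# boxes R may be rotated). Function names are illustrative, not the released code.
import numpy as np

def corner_squares(box):
    """box = (xmin, ymin, xmax, ymax); returns 4 squares, one per corner type."""
    xmin, ymin, xmax, ymax = box
    side = min(xmax - xmin, ymax - ymin)          # short side of the text box R
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return [(cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)
            for cx, cy in corners]                # TL, TR, BR, BL corner squares

def position_sensitive_gt(box, height, width):
    """Four binary maps; map i covers the i-th relative position (bin) of R."""
    xmin, ymin, xmax, ymax = [int(round(v)) for v in box]
    xmid, ymid = (xmin + xmax) // 2, (ymin + ymax) // 2
    maps = np.zeros((4, height, width), dtype=np.uint8)
    bins = [(xmin, ymin, xmid, ymid),   # top-left bin
            (xmid, ymin, xmax, ymid),   # top-right bin
            (xmid, ymid, xmax, ymax),   # bottom-right bin
            (xmin, ymid, xmid, ymax)]   # bottom-left bin
    for i, (x0, y0, x1, y1) in enumerate(bins):
        maps[i, y0:y1, x0:x1] = 1
    return maps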

 

Overview of the scoring process.
The yellow boxes in (a) are candidate boxes. (b) shows the predicted segmentation maps. We generate the instance segment (c) of each candidate box by assembling the segmentation maps. Scores are calculated by averaging over the instance segment regions.

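A simplified, axis-aligned NumPy sketch of this scoring step is given below (the paper adapts rotated position-sensitive average pooling): the candidate box is split into a 2x2 grid, each bin reads from its corresponding segmentation map, and the score is the mean response over the assembled instance segment.

# Simplified scoring sketch under an axis-aligned assumption; the released
# method handles rotated candidate boxes.
import numpy as np

def score_candidate(box, seg_maps):
    """box = (xmin, ymin, xmax, ymax); seg_maps has shape (4, H, W) in [0, 1]."""
    xmin, ymin, xmax, ymax = [int(round(v)) for v in box]
    xmid, ymid = (xmin + xmax) // 2, (ymin + ymax) // 2
    bins = [(0, xmin, ymin, xmid, ymid),   # top-left bin -> map 0
            (1, xmid, ymin, xmax, ymid),   # top-right bin -> map 1
            (2, xmid, ymid, xmax, ymax),   # bottom-right bin -> map 2
            (3, xmin, ymid, xmid, ymax)]   # bottom-left bin -> map 3
    responses = []
    for i, x0, y0, x1, y1 in bins:
        region = seg_maps[i, y0:y1, x0:x1]
        if region.size:
            responses.append(region.mean())
    # Averaging over the assembled instance segment gives the box score.
    return float(np.mean(responses)) if responses else 0.0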
 

Results

Examples of detection results. From left to right, the columns show results on ICDAR2015, ICDAR2013, MSRA-TD500, MLT, and COCO-Text.
Results on ICDAR2015. ∗ means multi-scale; † indicates that the base network of the model is not VGG16.
Results on ICDAR2013.
∗ means multi-scale; † indicates that the base network of the model is not VGG16. Note that the methods in the top three rows are evaluated under the "ICDAR2013" evaluation protocol.
Results on MSRA-TD500. † indicates that the base network of the model is not VGG16.
Results on COCO-Text. ∗ means multi-scale.
Results on MLT. ∗ means multi-scale.

BibTeX

@inproceedings{lyu2018multi,
  title={Multi-oriented scene text detection via corner localization and region segmentation},
  author={Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={7553--7563},
  year={2018}
}
