ZTE Communications, 2022, Vol. 20, Issue (4): 89-95. DOI: 10.12142/ZTECOM.202204011
• Research Paper •
MEI Junjun1,2, GUAN Tao1,2, TONG Junwen1,2
Received: 2021-12-28
Online: 2022-12-31
Published: 2022-12-30
About author: MEI Junjun is a chief R&D engineer at ZTE Corporation in the field of audio and video. His research covers the overall architecture of the integrated video cloud network and key technologies such as computer vision, audio and video coding, and audio and video transmission. He has presided over the R&D and design of a number of system solutions.
MEI Junjun, GUAN Tao, TONG Junwen. Label Enhancement for Scene Text Detection[J]. ZTE Communications, 2022, 20(4): 89-95.
Method | Precision/% | Recall/% | F-measure/% |
---|---|---|---|
ResNet-18 | 84.7 | 77.0 | 80.6 |
ResNet-18 + Dis | 86.5 | 80.6 | 83.5 |
ResNet-18 + Dis + Bor | 88.1 | 79.9 | 83.8 |
ResNet-50 | 90.5 | 77.9 | 83.7 |
ResNet-50 + Dis | 90.9 | 80.6 | 85.4 |
ResNet-50 + Dis + Bor | 93.8 | 81.7 | 87.3 |
Table 1 Ablation study results with different settings on MSRA-TD500 dataset
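The F-measure column can be cross-checked against the precision and recall columns. The sketch below assumes the F-measure is the standard harmonic mean of precision and recall, which the page itself does not state; the names `f_measure`, `TABLE_1`, and `max_deviation` are illustrative only. Because the reported precision and recall are already rounded to one decimal place, the recomputed value may differ from the reported one by up to about 0.1.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from Table 1, all in percent.
TABLE_1 = {
    "ResNet-18": (84.7, 77.0, 80.6),
    "ResNet-18 + Dis": (86.5, 80.6, 83.5),
    "ResNet-18 + Dis + Bor": (88.1, 79.9, 83.8),
    "ResNet-50": (90.5, 77.9, 83.7),
    "ResNet-50 + Dis": (90.9, 80.6, 85.4),
    "ResNet-50 + Dis + Bor": (93.8, 81.7, 87.3),
}


def max_deviation(rows: dict) -> float:
    """Largest absolute gap between recomputed and reported F-measure."""
    return max(abs(f_measure(p, r) - f) for p, r, f in rows.values())


if __name__ == "__main__":
    for name, (p, r, f) in TABLE_1.items():
        print(f"{name}: recomputed {f_measure(p, r):.2f}, reported {f}")
```

Under this assumption every row of Table 1 agrees with the reported value to within rounding; the same check can be applied to the rows of the other result tables.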
Method | Precision/% | Recall/% | F-measure/% |
---|---|---|---|
TextSnake | 82.7 | 74.5 | 78.4 |
ATRR | 80.9 | 76.2 | 78.5 |
Mask TextSpotter | 82.5 | 75.6 | 78.6 |
TextField | 81.2 | 79.9 | 80.6 |
LOMO* | 87.6 | 79.3 | 83.3 |
CRAFT | 87.6 | 79.9 | 83.6 |
CSE | 81.4 | 79.1 | 80.2 |
PSENet-1s | 84.0 | 78.0 | 80.9 |
TextFuseNet-ResNet-50 | 83.2 | 87.5 | 85.3 |
DB-ResNet-50 (800) | 87.1 | 82.5 | 84.7 |
Ours-ResNet-50 (800) | 89.1 | 82.4 | 85.6 |
Table 2 Detection results on Total-Text dataset
Method | Precision/% | Recall/% | F-measure/% |
---|---|---|---|
Text-CNN | 71 | 61 | 69 |
DeepReg | 77 | 70 | 74 |
RRPN | 82 | 68 | 74 |
RRD | 87 | 73 | 79 |
MCN | 88 | 79 | 83 |
PixelLink | 83 | 73.2 | 77.8 |
Corner | 87.6 | 76.2 | 81.5 |
TextSnake | 83.2 | 73.9 | 78.3 |
Scene text detection with bootstrapping and semantics-aware text border techniques | 83.0 | 77.4 | 80.1 |
MSR | 87.4 | 76.7 | 81.7 |
CRAFT | 88.2 | 78.2 | 82.9 |
SAE | 84.2 | 81.7 | 82.9 |
DB-ResNet-50 (736) | 91.5 | 79.2 | 84.9 |
An accurate segmentation-based detector | 88.8 | 83.5 | 86.1 |
Ours-ResNet-50 (736) | 93.8 | 81.7 | 87.3 |
Table 3 Detection results on MSRA-TD500 dataset
Method | Precision/% | Recall/% | F-measure/% |
---|---|---|---|
CTPN | 74.0 | 52.0 | 61.0 |
Corner | 94.1 | 70.7 | 80.7 |
PSENet-1s | 86.9 | 84.5 | 85.7 |
TextBoxes++ | 87.8 | 78.5 | 82.9 |
PixelLink | 85.5 | 82.0 | 83.7 |
LOMO* | 91.3 | 83.5 | 87.2 |
An accurate segmentation-based detector | 90.0 | 85.1 | 87.5 |
CRAFTS | 89.0 | 85.3 | 87.1 |
DB-ResNet50 (1 152) | 91.8 | 83.2 | 87.3 |
An end-to-end trainable network (ResNet50) | 89.3 | 85.7 | 87.5 |
Ours-ResNet50 (1 152) | 92.4 | 83.8 | 87.8 |
Table 4 Detection results on ICDAR-2015 dataset
1 | LIAO M H, ZOU Z S, WAN Z Y, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion [J]. IEEE transactions on pattern analysis and machine intelligence, 2022, early access. DOI: 10.1109/TPAMI.2022.3155612 |
2 | WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 9336–9345 |
3 | XU N, LIU Y P, GENG X. Label enhancement for label distribution learning [J]. IEEE transactions on knowledge and data engineering, 2021, 33(4): 1632–1643. DOI: 10.1109/TKDE.2019.2947040 |
4 | GENG X, YIN C, ZHOU Z H. Facial age estimation by learning from label distributions [J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(10): 2401–2412. DOI: 10.1109/TPAMI.2013.51 |
5 | TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network [C]//European Conference on Computer Vision. Springer, 2016: 56–72. DOI: 10.1007/978-3-319-46484-8_4 |
6 | LIAO M, SHI B, BAI X. TextBoxes: a fast text detector with a single deep neural network [C]//Thirty-First AAAI Conference on Artificial Intelligence. AAAI, 2017. DOI: 10.1609/aaai.v31i1.11196 |
7 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(6): 1137–1149. DOI: 10.1109/TPAMI.2016.2577031 |
8 | MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals [J]. IEEE transactions on multimedia, 2018, 20(11): 3111–3122. DOI: 10.1109/TMM.2018.2818020 |
9 | LIU Y L, JIN L W. Deep matching prior network: toward tighter multi-oriented text detection [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3454–3461. DOI: 10.1109/CVPR.2017.368 |
10 | LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector [J]. IEEE transactions on image processing, 2018, 27(8): 3676–3690. DOI: 10.1109/TIP.2018.2825107 |
11 | ZHOU X, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2642–2651. DOI: 10.1109/CVPR.2017.283 |
12 | HE W H, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection [C]//IEEE International Conference on Computer Vision. IEEE, 2017: 745–753. DOI: 10.1109/ICCV.2017.87 |
13 | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015: 3431–3440. DOI: 10.1109/CVPR.2015.7298965 |
14 | ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 4159–4167. DOI: 10.1109/CVPR.2016.451 |
15 | YAO C, BAI X, SANG N, et al. Scene text detection via holistic, multi-channel prediction [EB/OL]. (2016-07-05)[2021-06-01]. |
16 | DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation [C]//Thirty-Second AAAI Conference on Artificial Intelligence. AAAI, 2018. DOI: 10.1609/aaai.v32i1.12269 |
17 | GAO B-B, XING C, XIE C-W, et al. Deep label distribution learning with label ambiguity [J]. IEEE transactions on image processing, 2017, 26(6): 2825–2838. DOI: 10.1109/TIP.2017.2689998 |
18 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 770–778. DOI: 10.1109/CVPR.2016.90 |
19 | NAYEF N, YIN F, BIZID I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT [C]//14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IAPR, 2017: 1454–1459. DOI: 10.1109/ICDAR.2017.237 |
20 | CHNG C K, CHAN C S. Total-text: a comprehensive dataset for scene text detection and recognition [C]//14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IAPR, 2017: 935–942. DOI: 10.1109/ICDAR.2017.157 |
21 | YAO C, BAI X, LIU W Y, et al. Detecting texts of arbitrary orientations in natural images [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012: 1083–1090. DOI: 10.1109/CVPR.2012.6247787 |
22 | KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading [C]//13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015: 1156–1160. DOI: 10.1109/ICDAR.2015.7333942 |
23 | LYU P Y, YAO C, WU W H, et al. Multi-oriented scene text detection via corner localization and region segmentation [C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 7553–7563. DOI: 10.1109/CVPR.2018.00788 |
24 | LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes [C]//European Conference on Computer Vision. Springer, 2018: 20–36. DOI: 10.1007/978-3-030-01216-8_2 |
25 | WANG X B, JIANG Y Y, LUO Z B, et al. Arbitrary shape scene text detection with adaptive text region representation [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 6442–6451. DOI: 10.1109/CVPR.2019.00661 |
26 | LYU P Y, LIAO M H, YAO C, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes [C]//European Conference on Computer Vision. Springer, 2018: 67–83. DOI: 10.1007/978-3-030-01264-9_5 |
27 | XU Y C, WANG Y K, ZHOU W, et al. TextField: learning a deep direction field for irregular scene text detection [J]. IEEE transactions on image processing, 2019, 28(11): 5566–5579. DOI: 10.1109/TIP.2019.2900589 |
28 | ZHANG C Q, LIANG B R, HUANG Z M, et al. Look more than once: an accurate detector for text of arbitrary shapes [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 10544–10553. DOI: 10.1109/CVPR.2019.01080 |
29 | BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 9357–9366. DOI: 10.1109/CVPR.2019.00959 |
30 | LIU Z C, LIN G S, YANG S, et al. Towards robust curve text detection with conditional spatial expansion [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 7261–7270. DOI: 10.1109/CVPR.2019.00744 |
31 | YE J, CHEN Z, LIU J H, et al. TextFuseNet: scene text detection with richer fused features [C]//Twenty-Ninth International Joint Conference on Artificial Intelligence. IJCAI, 2020: 516–522. DOI: 10.24963/ijcai.2020/72 |
32 | HE T, HUANG W L, QIAO Y, et al. Text-attentional convolutional neural network for scene text detection [J]. IEEE transactions on image processing, 2016, 25(6): 2529–2541. DOI: 10.1109/TIP.2016.2547588 |
33 | LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 5909–5918. DOI: 10.1109/CVPR.2018.00619 |
34 | LIU Z, LIN G, YANG S, et al. Learning Markov clustering networks for scene text detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018. DOI: 10.1109/CVPR.2018.00725 |
35 | XUE C H, LU S J, ZHAN F N. Accurate scene text detection through border semantics awareness and bootstrapping [C]//European Conference on Computer Vision. Springer, 2018: 355–372. DOI: 10.1007/978-3-030-01270-0_22 |
36 | XUE C H, LU S J, ZHANG W. MSR: multi-scale shape regression for scene text detection [EB/OL]. (2019-01-09)[2021-06-01]. |
37 | TIAN Z T, SHU M, LYU P Y, et al. Learning shape-aware embedding for scene text detection [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 4229–4238. DOI: 10.1109/CVPR.2019.00436 |
38 | LIU X, ZHOU G J, ZHANG R, et al. An accurate segmentation-based scene text detector with context attention and repulsive text border [C]//IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2020: 2344–2352. DOI: 10.1109/CVPRW50498.2020.00283 |
39 | BAEK Y, SHIN S, BAEK J, et al. Character region attention for text spotting [C]//European Conference on Computer Vision. Springer, 2020: 504–521. DOI: 10.1007/978-3-030-58526-6_30 |
40 | QIN S, BISSACCO A, RAPTIS M, et al. Towards unconstrained end-to-end text spotting [C]//IEEE International Conference on Computer Vision. IEEE, 2019: 4704–4714. DOI: 10.1109/ICCV.2019.00480 |