ZTE Communications ›› 2022, Vol. 20 ›› Issue (4): 89-95. DOI: 10.12142/ZTECOM.202204011

• Research Paper •

Label Enhancement for Scene Text Detection

MEI Junjun1,2, GUAN Tao1,2, TONG Junwen1,2

  1. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China
  2. ZTE Corporation, Shenzhen 518057, China
  • Received: 2021-12-28 Online: 2022-12-31 Published: 2022-12-30
  • About author:
    MEI Junjun is a chief R&D engineer at ZTE Corporation in the field of audio and video. His research covers the overall architecture of the integrated video cloud network and key technologies such as computer vision, audio and video coding, and audio and video transmission. He has presided over the R&D and design of a number of system solutions.
    GUAN Tao (guan.tao@zte.com.cn) is a senior system architect at ZTE Corporation, mainly engaged in the architecture design and algorithm research of video systems and industrial digital systems. He has participated in standards organizations, initiated and drafted a number of communication standards, and applied for more than 20 national invention patents.
    TONG Junwen received his BE and ME degrees in control science and engineering from Nanjing University, China in 2017 and 2020, respectively. He now works at ZTE Corporation. His current research interests include object detection, semantic segmentation, and optical character recognition in industrial scenarios.
  • Supported by:
    ZTE Industry-University-Institute Cooperation Funds (HC-CN-20200717012)


Segmentation-based scene text detection has drawn a great deal of attention because its pixel-level predictions can describe text instances of arbitrary shape. However, most segmentation-based methods rely on complex post-processing to separate text instances that lie close to each other, which makes inference time-consuming. This paper proposes a label enhancement method that constructs two kinds of training labels for segmentation-based scene text detection. Label distribution learning (LDL) is used to overcome the problem that training on purely shrunk text labels may lead to sub-optimal detection performance. Experimental results on three benchmarks demonstrate that the proposed method consistently improves detection performance without sacrificing inference speed.

Key words: label enhancement, scene text detection, semantic segmentation
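
To make the contrast between a pure shrunk label and a softer, distribution-like label concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the two kinds of labels are built, so everything here is an assumption for illustration: the text region is simplified to an axis-aligned box, the shrink offset uses the form D = A(1 - r^2)/L that is common in shrink-based detectors, and the names `shrink_offset` and `make_labels` as well as the linear decay of the soft label are hypothetical.

```python
def shrink_offset(width, height, ratio=0.4):
    """Shrink offset D = A * (1 - r^2) / L for a box of the given size.

    A is the box area, L its perimeter, r the shrink ratio (assumed value).
    """
    area = width * height
    perimeter = 2 * (width + height)
    return area * (1 - ratio ** 2) / perimeter


def make_labels(img_w, img_h, box, ratio=0.4):
    """Build two label maps for one axis-aligned text box (x0, y0, x1, y1).

    hard: 1 inside the shrunk box, 0 elsewhere (a pure shrunk label).
    soft: 1 inside the shrunk box, decaying linearly to 0 at the original
          border (one possible soft label; an assumption, not the paper's).
    """
    x0, y0, x1, y1 = box
    d = shrink_offset(x1 - x0, y1 - y0, ratio)
    hard = [[0] * img_w for _ in range(img_h)]
    soft = [[0.0] * img_w for _ in range(img_h)]
    for y in range(img_h):
        for x in range(img_w):
            cx, cy = x + 0.5, y + 0.5  # pixel centre
            # Distance from the pixel centre to the nearest edge of the
            # original box; non-positive means the pixel is outside the box.
            dist = min(cx - x0, x1 - cx, cy - y0, y1 - cy)
            if dist <= 0:
                continue
            soft[y][x] = min(1.0, dist / d)
            hard[y][x] = 1 if dist >= d else 0
    return hard, soft


if __name__ == "__main__":
    hard, soft = make_labels(14, 10, (2, 2, 12, 8))
    for row in soft:
        print(" ".join(f"{v:.1f}" for v in row))
```

In this sketch the hard map discards every pixel between the shrunk kernel and the original border, whereas the soft map keeps a graded signal there, which is the kind of information a pure shrunk label loses.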