ZTE Communications ›› 2020, Vol. 18 ›› Issue (2): 74-82. DOI: 10.12142/ZTECOM.202002009

• Research Paper

Crowd Counting for Real Monitoring Scene

LI Yiming1, LI Weihua2, SHEN Zan3, NI Bingbing1   

  1. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Video Production Line, ZTE Corporation, Chongqing 401121, China
  3. Institute of Technology, Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200120, China
  • Received: 2019-01-17; Online: 2020-06-25; Published: 2020-08-07
  • About the authors:
    LI Yiming received the B.S. degree in information engineering from Shanghai Jiao Tong University, China in 2018. Since 2018, he has been pursuing his M.S. degree at the Institute of Image Communication and Network Engineering of Shanghai Jiao Tong University. His research interests include crowd counting in dense scenes, as well as image enhancement, segmentation, and texture recognition in materials science.
    LI Weihua received the B.S. degree in information engineering from Southwest University, China in 1996. He is currently responsible for VSS product planning at ZTE Corporation.
    SHEN Zan received the B.S. and M.S. degrees in electronics and information engineering from Shanghai Jiao Tong University, China in 2016 and 2019, respectively. He interned at Tencent Youtu Lab in 2018. Since graduating, he has worked at Ping An Technology (Shenzhen) Co., Ltd. His research interests include, but are not limited to, deep learning, computer vision, and machine learning. He has published one technical paper at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Supported by:
    ZTE Industry-University-Institute Cooperation Funds

Abstract:

Crowd counting is a challenging computer vision task because real-world scenes are riddled with unfavourable factors such as severe occlusion, perspective distortion, and diverse crowd distributions. Recent state-of-the-art methods based on convolutional neural networks (CNNs) mitigate these factors through multi-scale feature fusion or by selecting the optimal features with a front-end switch network. These methods regress the crowd density map with an L2 loss, which is known to produce averaged, blurry results and degrades the accuracy of both the crowd count and the predicted spatial distribution. To tackle these problems, we exploit the strength of generative adversarial networks (GANs) in image generation and propose a novel crowd counting model based on conditional GANs that predicts high-quality density maps from crowd images. Furthermore, we propose a new regularizer that improves accuracy in extremely crowded scenes. Extensive experiments on four major crowd counting datasets demonstrate that the proposed approach outperforms recent state-of-the-art methods.
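The abstract's core technical claims (the count is read off a regressed density map, and L2 regression of that map tends toward averaged, blurry outputs) can be illustrated with a small pure-Python sketch. It assumes the common convention of building a ground-truth density map by placing one normalized Gaussian at each annotated head, so the map sums to the head count, and it pairs the L2 term with a generic pix2pix-style adversarial term as one way a conditional GAN generator can be trained. All names (`density_map`, `generator_loss`, `expected_l2`), the kernel width `sigma`, and the weight `lam` are hypothetical; the paper's actual network, loss weights, and new regularizer are not described in this abstract and are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def density_map(heads: Sequence[Tuple[int, int]], height: int, width: int,
                sigma: float = 1.0) -> Grid:
    """Sum of one Gaussian per annotated head at (row, col).

    Assumption: each kernel is renormalized over the grid so it contributes
    exactly 1, which makes the map's total equal the number of heads.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        kernel = [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(map(sum, kernel))
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / total
    return dmap


def count(dmap: Grid) -> float:
    """The crowd count is the integral (here, the sum) of the density map."""
    return sum(map(sum, dmap))


def l2_loss(pred: Grid, target: Grid) -> float:
    """Pixel-wise mean squared error between two density maps."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2 for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n


def generator_loss(d_fake: Sequence[float], pred: Grid, target: Grid,
                   lam: float = 10.0) -> float:
    """Hypothetical pix2pix-style generator objective: adversarial + lam * L2.

    d_fake holds discriminator probabilities D(x, G(x)) in (0, 1) that the
    generated map is real; lam is an illustrative weight, not the paper's.
    """
    adv = -sum(math.log(p) for p in d_fake) / len(d_fake)
    return adv + lam * l2_loss(pred, target)


# Ambiguous ground truth: one head, equally likely at column 2 or column 6.
a = density_map([(4, 2)], 9, 9)
b = density_map([(4, 6)], 9, 9)
avg = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def expected_l2(pred: Grid) -> float:
    """Expected L2 loss over the two equally likely targets."""
    return (l2_loss(pred, a) + l2_loss(pred, b)) / 2


print(round(count(a), 6))                       # 1.0
print(expected_l2(avg) < expected_l2(a))        # True: the averaged map wins
print(max(map(max, avg)) < max(map(max, a)))    # True: but its peak is flatter
```

Because expected squared error is minimized by the mean of the plausible targets, the L2-optimal prediction here is the average of the two sharp maps: it still sums to one head, but the head appears as two half-height blobs. An adversarial term penalizes such outputs whenever the discriminator can distinguish them from real density maps, which is the motivation for combining the two losses.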

Key words: crowd counting, density, generative adversarial network