ZTE Communications, 2020, Vol. 18, Issue (2): 74-82. DOI: 10.12142/ZTECOM.202002009
• Research Paper •

Crowd Counting for Real Monitoring Scene
LI Yiming1, LI Weihua2, SHEN Zan3, NI Bingbing1
Received: 2019-01-17
Online: 2020-06-25
Published: 2020-08-07
About the authors:

LI Yiming received the B.S. degree in information engineering from Shanghai Jiao Tong University, China in 2018. Since 2018, he has been pursuing the M.S. degree at the Institute of Image Communications and Network Engineering of Shanghai Jiao Tong University. His research interests include crowd counting in dense scenes, as well as image enhancement, segmentation, and texture recognition in materials science.

LI Weihua received the B.S. degree in information engineering from Southwest University, China in 1996. He is currently responsible for VSS product planning at ZTE Corporation.

SHEN Zan received the B.S. and M.S. degrees in electronics and information engineering from Shanghai Jiao Tong University, China in 2016 and 2019, respectively. He interned at Tencent Youtu Lab in 2018. Since graduation, he has worked at Ping An Technology (Shenzhen) Co., Ltd. His research interests include, but are not limited to, deep learning, computer vision, and machine learning. He has published one technical paper at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
LI Yiming, LI Weihua, SHEN Zan, NI Bingbing. Crowd Counting for Real Monitoring Scene [J]. ZTE Communications, 2020, 18(2): 74-82.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202002009
Figure 2 Architecture of the proposed Crowd Counting Network for Real Monitoring Scene (RMSN): the top row shows the structure of the generator G_large, the middle row shows the structure of the generator G_small, and the bottom row shows the discriminators D_large and D_small, which share the same structure.
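The exact layer configuration appears only in the figure itself, so the following is a minimal PyTorch sketch of the layout the caption describes: two encoder-decoder generators that regress density maps at different scales, and two identically structured discriminators that judge (image, density map) pairs. All channel widths, depths, and kernel sizes below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of the two-generator / two-discriminator layout in Figure 2.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder generator that regresses a density map from an image."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

class Discriminator(nn.Module):
    """Scores (image, density map) pairs; D_large and D_small share this structure."""
    def __init__(self, in_ch: int = 4):  # 3 image channels + 1 density channel
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),  # patch-wise real/fake scores
        )

    def forward(self, img: torch.Tensor, dmap: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([img, dmap], dim=1))

# One generator/discriminator pair per scale, as in the figure.
G_large, G_small = Generator(), Generator()
D_large, D_small = Discriminator(), Discriminator()
```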
Table 1 Comparison of errors for training with different losses

| Objective | Part A MAE | Part A MSE | Part B MAE | Part B MSE | WorldExpo'10 AMAE |
|---|---|---|---|---|---|
|  | 95.8 | 149.4 | 24.1 | 36.4 | 9.95 |
|  | 83.2 | 131.3 | 18.4 | 28.8 | 8.48 |
|  | 75.7 | 102.7 | 17.2 | 27.4 | 7.5 |
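The metrics in Tables 1-5 are the standard counting errors: an image's estimated count is the sum over its predicted density map, MAE is the mean absolute counting error over the test set, and, by the usual crowd-counting convention, MSE denotes the root of the mean squared counting error. AMAE in Table 1 is presumably the MAE averaged over the five WorldExpo'10 test scenes (compare the Average column of Table 3). A minimal NumPy sketch under these assumptions, with variable names of our choosing:

```python
# NumPy sketch of the evaluation metrics reported in Tables 1-5.
import numpy as np

def counting_errors(pred_maps, gt_counts):
    """Return (MAE, MSE) over a test set.

    pred_maps: iterable of predicted density maps (2-D arrays); an image's
               estimated count is the sum over its density map.
    gt_counts: ground-truth head counts, one per image.
    Note: following the usual crowd-counting convention, "MSE" here is the
    root of the mean squared counting error.
    """
    pred = np.array([m.sum() for m in pred_maps], dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.abs(pred - gt).mean()
    mse = np.sqrt(((pred - gt) ** 2).mean())
    return mae, mse
```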
Table 2 Comparison of RMSN with three other state-of-the-art CNN-based methods on the ShanghaiTech dataset

| Methods | Part A MAE | Part A MSE | Part B MAE | Part B MSE |
|---|---|---|---|---|
| The approach in Ref. [3] | 181.8 | 277.7 | 32.0 | 49.8 |
| MCNN [1] | 110.2 | 173.2 | 26.4 | 41.3 |
| Switch-CNN [2] | 90.4 | 135.0 | 21.6 | 33.4 |
| The proposed RMSN | 86.2 | 145.4 | 17.2 | 27.4 |
Figure 4 Two test images sampled from the ShanghaiTech Part A dataset (from left to right, the four columns show the test images, the ground-truth density maps, our estimated density maps, and the estimates of the multi-column convolutional neural network (MCNN) [1]).
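The paper does not restate its ground-truth recipe here, but a common construction for the ground-truth density maps shown in Figure 4 places a unit impulse at each annotated head position and blurs it with a Gaussian, so that the map integrates to the true count. The sketch below uses a fixed, assumed bandwidth; geometry-adaptive kernels (as in MCNN [1]) instead scale the bandwidth with the local head spacing.

```python
# Common ground-truth density map construction (fixed-sigma variant).
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, shape, sigma=4.0):
    """Build a ground-truth density map from (row, col) head annotations."""
    dmap = np.zeros(shape, dtype=np.float32)
    for r, c in head_points:
        # Clamp annotations to the image bounds before placing the impulse.
        dmap[min(int(r), shape[0] - 1), min(int(c), shape[1] - 1)] += 1.0
    # Gaussian blurring spreads each impulse while preserving the total count.
    return gaussian_filter(dmap, sigma)
```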
Table 3 Comparison of RMSN with four other state-of-the-art CNN-based methods on the WorldExpo'10 dataset (MAE per scene)

| Methods | Scene 1 | Scene 2 | Scene 3 | Scene 4 | Scene 5 | Average |
|---|---|---|---|---|---|---|
| The approach in Ref. [3] | 9.8 | 14.1 | 14.3 | 22.2 | 3.7 | 12.9 |
| MCNN [1] | 3.4 | 20.6 | 12.9 | 13.0 | 8.1 | 11.6 |
| Switch-CNN [2] | 4.4 | 15.7 | 10.0 | 11.0 | 5.9 | 9.4 |
| CP-CNN [21] | 2.9 | 14.7 | 10.5 | 10.4 | 5.8 | 8.9 |
| The proposed RMSN | 4.1 | 14.05 | 9.6 | 11.8 | 2.9 | 8.49 |
Table 4 Comparative results on the UCF_CC_50 dataset

| Methods | MAE | MSE |
|---|---|---|
| The approach in Ref. [28] | 419.5 | 541.6 |
| The approach in Ref. [3] | 467.0 | 498.5 |
| MCNN [1] | 377.6 | 509.1 |
| Switch-CNN [2] | 318.1 | 439.2 |
| CP-CNN [21] | 295.8 | 320.9 |
| The proposed RMSN | 291.0 | 404.6 |
Table 5 Comparative results on the UCSD dataset

| Methods | MAE | MSE |
|---|---|---|
| Kernel Ridge Regression [12] | 2.16 | 7.45 |
| Cumulative Attribute Regression | 2.07 | 6.86 |
| The approach in Ref. [3] | 1.60 | 3.31 |
| Switch-CNN [2] | 1.62 | 2.10 |
| The proposed RMSN | 1.47 | 1.98 |
Figure 5 A test video sampled from the UCSD dataset (from left to right and top to bottom, the four images show the real-time source frame, the density map, the velocity map, and the retention map).
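The monitoring outputs in Figure 5 all derive from the estimated density map. Because the density map integrates to the crowd count, the count inside any monitored region is just a masked sum; statistics of such counts across frames are presumably what the velocity and retention maps visualize. A minimal sketch with a hypothetical helper of our own naming, not an API from the paper:

```python
# Hypothetical helper: region-level counting from an estimated density map.
import numpy as np

def region_count(dmap: np.ndarray, mask: np.ndarray) -> float:
    """People count inside a region of interest, given a density map and a
    boolean mask of the same shape marking the monitored region."""
    return float(dmap[mask].sum())
```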
References

[1] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016: 589–597. DOI: 10.1109/cvpr.2016.70
[2] SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017. DOI: 10.1109/cvpr.2017.429
[3] ZHANG C, LI H, WANG X, et al. Cross-scene crowd counting via deep convolutional neural networks [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA, 2015: 833–841. DOI: 10.1109/cvpr.2015.7298684
[4] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks [J]. Advances in neural information processing systems, 2014(3): 2672–2680
[5] MIRZA M, OSINDERO S. Conditional generative adversarial nets [EB/OL]. (2014-11-06) [2018-10-12]
[6] CHEN X, DUAN Y, HOUTHOOFT R, et al. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets [C]//Conference and Workshop on Neural Information Processing Systems. Barcelona, Spain, 2016
[7] ISOLA P, ZHU J-Y, ZHOU T, et al. Image-to-image translation with conditional adversarial networks [EB/OL]. (2016-09-21) [2018-10-12]
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany, 2015: 234–241. DOI: 10.1007/978-3-319-24574-4_28
[9] LIN Z L, DAVIS L S. Shape-based human detection and segmentation via hierarchical part-template matching [J]. IEEE transactions on pattern analysis and machine intelligence, 2010, 32(4): 604–618. DOI: 10.1109/tpami.2009.204
[10] WANG M, WANG X. Automatic adaptation of a generic pedestrian detector to a specific traffic scene [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Colorado Springs, USA, 2011. DOI: 10.1109/cvpr.2011.5995698
[11] WU B, NEVATIA R. Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors [C]//Tenth IEEE International Conference on Computer Vision (ICCV'05). Beijing, China, 2005: 90–97. DOI: 10.1109/iccv.2005.74
[12] AN S J, LIU W Q, VENKATESH S. Face recognition using kernel ridge regression [C]//IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA, 2007: 1110–1116. DOI: 10.1109/cvpr.2007.383105
[13] CHAN A B, LIANG Z-S J, VASCONCELOS N. Privacy preserving crowd monitoring: counting people without people models or tracking [C]//IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA, 2008: 1–7. DOI: 10.1109/cvpr.2008.4587569
[14] CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting [C]//British Machine Vision Conference. Surrey, UK, 2012. DOI: 10.5244/c.26.21
[15] KONG D, GRAY D, TAO H. A viewpoint invariant approach for crowd counting [C]//International Conference on Pattern Recognition. Hong Kong, China, 2006. DOI: 10.1109/icpr.2006.197
[16] BANSAL A, VENKATESH K S. People counting in high density crowds from still images [EB/OL]. (2015-07-30) [2018-10-12]
[17] RABAUD V, BELONGIE S J. Counting crowded moving objects [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, USA, 2006: 705–711. DOI: 10.1109/cvpr.2006.92
[18] BROSTOW G J, CIPOLLA R. Unsupervised Bayesian detection of independent motion in crowds [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, USA, 2006: 594–601. DOI: 10.1109/cvpr.2006.320
[19] WANG C, ZHANG H, YANG L, et al. Deep people counting in extremely dense crowds [C]//ACM International Conference on Multimedia. Brisbane, Australia, 2015. DOI: 10.1145/2733373.2806337
[20] BOOMINATHAN L, KRUTHIVENTI S S, BABU R V. CrowdNet: a deep convolutional network for dense crowd counting [C]//ACM Conference on Multimedia. Vienna, Austria, 2016. DOI: 10.1145/2964284.2967300
[21] SINDAGI V A, PATEL V M. Generating high-quality crowd density maps using contextual pyramid CNNs [C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017
[22] LI C, WAND M. Precomputed real-time texture synthesis with Markovian generative adversarial networks [C]//European Conference on Computer Vision. Amsterdam, The Netherlands, 2016: 702–716. DOI: 10.1007/978-3-319-46487-9_43
[23] PATHAK D, KRAHENBUHL P, DONAHUE J, et al. Context encoders: feature learning by inpainting [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 2536–2544. DOI: 10.1109/cvpr.2016.278
[24] JOHNSON J, ALAHI A, LI F F. Perceptual losses for real-time style transfer and super-resolution [C]//European Conference on Computer Vision. Amsterdam, The Netherlands, 2016: 694–711. DOI: 10.1007/978-3-319-46475-6_43
[25] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks [J]. Science, 2006, 313(5786): 504–507. DOI: 10.1126/science.1127647
[26] ZHANG H, SINDAGI V, PATEL V M. Image de-raining using a conditional generative adversarial network [EB/OL]. (2017-01-21) [2018-10-12]
[27] SHEN Z, XU Y, NI B B, et al. Crowd counting via adversarial cross scale consistency pursuit [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA, 2018: 5245–5254. DOI: 10.1109/cvpr.2018.00550
[28] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images [C]//IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 2547–2554. DOI: 10.1109/cvpr.2013.329