ZTE Communications, 2019, Vol. 17, Issue 3: 15-22. DOI: 10.12142/ZTECOM.201903004
GUO Da, ZHENG Qingfang, PENG Xiaojiang, LIU Ming
Received: 2019-06-11
Online: 2019-09-29
Published: 2019-12-06
About the authors:

GUO Da received the B.Eng. degree from the Computer Engineering College, Jimei University, China in 2018. He is currently a master's student at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. His research focuses on face detection and recognition based on deep learning.

ZHENG Qingfang received the B.S. degree in civil engineering from Shanghai Jiaotong University, China in 2002 and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2008. He is currently the chief scientist of video technology with ZTE Corporation. His research interests include computer vision, multimedia retrieval, and image/video processing, with a special focus on low-power embedded applications and large-scale cloud applications.

PENG Xiaojiang (xj.peng@siat.ac.cn) received his Ph.D. from the School of Information Science and Technology, Southwest Jiaotong University, China in 2014. He is currently an associate professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. He was a postdoctoral researcher in the LEAR Team, INRIA, France, working with Prof. Cordelia Schmid from 2015 to 2016, and a postdoctoral researcher at Idiap Institute, Switzerland from 2016 to 2017. He serves as a reviewer for IJCV, TMM, TIP, CVPR, ICCV, AAAI, IJCAI, FG, Image and Vision Computing, IEEE Signal Processing Letters, Neurocomputing, etc. His research focuses on action recognition and detection, face recognition, facial emotion analysis, and deep learning.

LIU Ming received the M.Sc. degree from Harbin Engineering University, China in 2011. He is currently a senior engineer with ZTE Corporation. His research interests include object detection, tracking, and recognition.
GUO Da, ZHENG Qingfang, PENG Xiaojiang, LIU Ming. Face Detection, Alignment, Quality Assessment and Attribute Analysis with Multi-Task Hybrid Convolutional Neural Networks[J]. ZTE Communications, 2019, 17(3): 15-22.
Figure 1. The pipeline of the proposed Multi-Task Hybrid Convolutional Neural Network (MHCNN). It consists of an SSD-based face detector for high-accuracy detection performance and a T-CNN for detection refinement and multi-task face analysis.
Table 1 Comparison of our MHCNN with other methods on FDDB
Methods | Recall |
---|---|
Cascade CNN [ | 85.67% |
ACF-multiscale [ | 86.08% |
YAN et al. [ | 86.15% |
Faster R-CNN [ | 96.10% |
S3FD [ | 98.37% |
MHCNN | 98.66% |
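
Recall in Table 1 is the fraction of annotated faces that a detector finds. The following is a minimal sketch of that computation, not the FDDB evaluation protocol itself: it assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, a greedy one-to-one matching, and an IoU threshold of 0.5. The function names and the sample boxes are hypothetical and chosen only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth faces matched, using one detection per face."""
    unused = list(detections)
    matched = 0
    for gt in ground_truth:
        # Greedy choice: the remaining detection that overlaps this face most.
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)  # a detection may explain only one face
    return matched / len(ground_truth) if ground_truth else 0.0


# Illustrative data: the first face is found, the second is missed.
gt_boxes = [(10, 10, 50, 50), (100, 100, 140, 140)]
det_boxes = [(12, 12, 52, 52), (300, 300, 340, 340)]
print(recall(gt_boxes, det_boxes))
```

Greedy matching keeps the sketch short; an evaluation tool would typically also rank detections by confidence and report recall at a fixed number of false positives.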
Table 2 Ablation study of T-CNN on the FaceA dataset
Methods (task setting) | Accuracy of Sunglasses (Threshold = 0.5) | Accuracy of Mask (Threshold = 0.5) |
---|---|---|
T-CNN (sunglasses) | 76.14% | ---- |
T-CNN (sunglasses + landmarks) | 76.57% | ---- |
T-CNN (masks) | ---- | 83.30% |
T-CNN (masks + landmarks) | ---- | 85.90% |
T-CNN (sunglasses + masks + landmarks) | 98.70% | 99.35% |
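
The accuracies in Table 2 are reported at a decision threshold of 0.5. As a minimal sketch of what that means, assuming the attribute branch outputs one score in [0, 1] per face and the labels are 0 or 1, a face is counted as correct when its thresholded score equals its label. The function name and the sample values below are hypothetical.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Share of samples whose thresholded score equals the 0/1 label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(scores)


# Illustrative sunglasses scores: two of the four faces are classified correctly.
scores = [0.91, 0.12, 0.55, 0.40]
labels = [1, 0, 0, 1]
print(accuracy_at_threshold(scores, labels))
```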
Table 3 Evaluation on the FaceQ dataset
Methods | Accuracy of Face Quality (Best threshold) |
---|---|
LBP+SVM | 78.52% |
T-CNN | 81.86% |
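
Table 3 reports accuracy at the best threshold rather than at a fixed one. How that threshold was selected is not stated here; one common approach, sketched below under that assumption, is to sweep every observed score as a candidate threshold and keep the one with the highest accuracy. The function name and the sample quality scores are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy over candidate thresholds."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    # Candidates: every distinct score, plus infinity for "predict all low quality".
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum(int(s >= t) == y for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # strict comparison keeps the lowest best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


# Illustrative quality scores: label 1 is high quality, label 0 is low quality.
quality_scores = [0.2, 0.35, 0.6, 0.8]
quality_labels = [0, 0, 1, 1]
print(best_threshold(quality_scores, quality_labels))
```

A threshold chosen this way on the test set gives an optimistic estimate; selecting it on a held-out validation split is the more conservative design.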
Figure 8. False positives on the FaceQ test set. Faces with higher scores are assigned to the high-quality class, and those with lower scores to the low-quality class.
[1] | VIOLA P, JONES M J . Robust Real-Time Face Detection[J]. International Journal of Computer Vision, 2004,57(2):137-154. DOI: 10.1023/B:VISI.0000013087.49260.fb |
[2] | MATHIAS M, BENENSON R, PEDERSOLI M , et al. Face Detection Without Bells and Whistles [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 720-735. DOI: 10.1007/978-3-319-10593-2_47 |
[3] | ZHU C, ZHENG Y, LUU K , et al. CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection[M]. Deep Learning for Biometrics. Cham, Switzerland: Springer, 2017: 57-79 |
[4] | JIANG H, LEARNED-MILLER E . Face Detection with the Faster R-CNN [C]//12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). Washington DC, USA, 2017: 650-657. DOI: 10.1109/FG.2017.82 |
[5] | LI H, LIN Z, SHEN X , et al. A Convolutional Neural Network Cascade for Face Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 5325-5334. DOI: 10.1109/CVPR.2015.7299170 |
[6] | YANG S, LUO P, LOY C C , et al. From Facial Parts Responses to Face Detection: A Deep Learning Approach [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 3676-3684. DOI: 10.1109/ICCV.2015.419 |
[7] | ZHANG K, ZHANG Z, LI Z , et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks[J]. IEEE Signal Processing Letters, 2016,23(10):1499-1503. DOI: 10.1109/LSP.2016.2603342 |
[8] | ZHANG S, ZHU X, LEI Z , et al. Faceboxes: A CPU Real-Time Face Detector with High Accuracy [C]//2017 IEEE International Joint Conference on Biometrics (IJCB). Denver, Colorado, USA, 2017: 1-9. DOI: 10.1109/BTAS.2017.8272675 |
[9] | NAJIBI M, SAMANGOUEI P, CHELLAPPA R , et al. SSH: Single Stage Headless Face Detector [C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 4875-4884. DOI: 10.1109/ICCV.2017.522 |
[10] | LIU W, ANGUELOV D, ERHAN D , et al. SSD: Single Shot Multibox Detector [C]//European Conference on Computer Vision. Amsterdam, The Netherlands, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2 |
[11] | REN S, HE K, GIRSHICK R , et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [C]//Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 91-99. DOI: 10.1109/TPAMI.2016.2577031 |
[12] | ZHANG S, ZHU X, LEI Z , et al. S3FD: Single Shot Scale-Invariant Face Detector [C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 192-201. DOI: 10.1109/ICCV.2017.30 |
[13] | GLOROT X, BENGIO Y . Understanding the Difficulty of Training Deep Feedforward Neural Networks [C]//13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, 2010: 249-256. |
[14] | SUN Y, WANG X, TANG X . Deep Convolutional Network Cascade for Facial Point Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3476-3483. DOI: 10.1109/CVPR.2013.446 |
[15] | ZHU X, LEI Z, LIU X , et al. Face Alignment Across Large Poses: A 3D Solution [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 146-155. DOI: 10.1109/CVPR.2016.23 |
[16] | FENG Z H, KITTLER J, AWAIS M , et al. Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks [C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 2235-2245. DOI: 10.1109/CVPR.2018.00238 |
[17] | ZHUANG C, ZHANG S, LEI Z , et al. FLDet: A CPU Real-Time Joint Face and Landmark Detector [C]// IAPR International Conference on Biometrics (ICB). Crete, Greece, 2019 |
[18] | BHARADWAJ S, VATSA M, SINGH R . Can Holistic Representations be Used for Face Biometric Quality Assessment? [C]//IEEE International Conference on Image Processing. Melbourne, Australia, 2013: 2792-2796. DOI: 10.1109/ICIP.2013.6738575 |
[19] | OJALA T, PIETIKÄINEN M, MÄENPÄÄ T . Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,24(7):971-987. DOI: 10.1109/TPAMI.2002.1017623 |
[20] | DALAL N, TRIGGS B . Histograms of Oriented Gradients for Human Detection [C]//International Conference on Computer Vision & Pattern Recognition (CVPR'05). San Diego, USA, 2005,1:886-893. DOI: 10.1109/CVPR.2005.177 |
[21] | HERNANDEZ-ORTEGA J, GALBALLY J, FIERREZ J , et al. FaceQnet: Quality Assessment for Face Recognition Based on Deep Learning [DB/OL]. (2019-04-03). |
[22] | NASROLLAHI K, MOESLUND T B . Face Quality Assessment System in Video Sequences [C]//European Workshop on Biometrics and Identity Management. Roskilde, Denmark, 2008: 10-18. DOI: 10.1007/978-3-540-89991-4_2 |
[23] | LIU Z, LUO P, WANG X , et al. Deep Learning Face Attributes in the Wild [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 3730-3738. DOI: 10.1109/ICCV.2015.425 |
[24] | HAN H, JAIN A K, WANG F , et al. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018,40(11):2597-2609. DOI: 10.1109/TPAMI.2017.2738004 |
[25] | RANJAN R, SANKARANARAYANAN S, CASTILLO C D , et al. An All-in-One Convolutional Neural Network for Face Analysis [C]//12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). Washington DC, USA, 2017: 17-24. DOI: 10.1109/FG.2017.137 |
[26] | ZHANG Z, LUO P, LOY C C , et al. Learning Deep Representation for Face Alignment with Auxiliary Attributes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015,38(5):918-930. DOI: 10.1109/TPAMI.2015.2469286 |
[27] | BEST-ROWDEN L, JAIN A K . Learning Face Image Quality from Human Assessments[J]. IEEE Transactions on Information Forensics and Security, 2018,13(12):3064-3077. DOI: 10.1109/TIFS.2018.2799585 |
[28] | ZHANG L, CHU R, XIANG S , et al. Face Detection Based on Multi-Block LBP Representation [C]//International Conference on Biometrics. Seoul, South Korea, 2007: 11-18. DOI: 10.1007/978-3-540-74549-5_2 |
[29] | ZHU Q, YEH M C, CHENG K T , et al. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). New York, USA, 2006,2:1491-1498. DOI: 10.1109/CVPR.2006.119 |
[30] | PHAM M T, GAO Y, HOANG V D D , et al. Fast Polygonal Integration and its Application in Extending Haar-Like Features to Improve Object Detection [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 942-949. DOI: 10.1109/CVPR.2010.5540117 |
[31] | YAN J, LEI Z, WEN L , et al. The Fastest Deformable Part Model for Object Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 2497-2504. DOI: 10.1109/CVPR.2014.320 |
[32] | ZHU X, RAMANAN D . Face Detection, Pose Estimation, and Landmark Localization in the Wild [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Rhode Island, USA, 2012: 2879-2886. DOI: 10.1109/CVPR.2012.6248014 |
[33] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E . Imagenet Classification with Deep Convolutional Neural Networks [C]//Advances in Neural Information Processing Systems. Lake Tahoe, USA, 2012: 1097-1105. DOI: 10.1145/3065386 |
[34] | GIRSHICK R, DONAHUE J, DARRELL T , et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 580-587. DOI: 10.1109/CVPR.2014.81 |
[35] | GIRSHICK R . Fast R-CNN [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1440-1448. DOI: 10.1109/ICCV.2015.169 |
[36] | TANG X, DU D K, HE Z , et al. Pyramidbox: A Context-Assisted Single Shot Face Detector [C]//European Conference on Computer Vision (ECCV). Munich, Germany, 2018: 797-813. DOI: 10.1007/978-3-030-01240-3_49 |
[37] | ZHANG Z, LUO P, LOY C C , et al. Facial Landmark Detection by Deep Multi-Task Learning [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 94-108. DOI: 10.1007/978-3-319-10599-4_7 |
[38] | CHEN D, REN S, WEI Y , et al. Joint Cascade Face Detection and Alignment [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 109-122. DOI: 10.1007/978-3-319-10599-4_8 |
[39] | SIMONYAN K, ZISSERMAN A . Very Deep Convolutional Networks for Large-Scale Image Recognition [DB/OL]. (2014-09-04). |
[40] | YANG S, LUO P, LOY C C , et al. Wider Face: A Face Detection Benchmark [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 5525-5533. DOI: 10.1109/CVPR.2016.596 |
[41] | YANG B, YAN J, LEI Z , et al. Aggregate Channel Features for Multi-View Face Detection [C]//IEEE International Joint Conference on Biometrics. Clearwater, USA, 2014: 1-8. DOI: 10.1109/BTAS.2014.6996284 |
[42] | YAN J, ZHANG X, LEI Z , et al. Face Detection by Structural Models[J]. Image and Vision Computing, 2014,32(10):790-799. DOI: 10.1016/j.imavis.2013.12.004 |
[43] | MARKUS N, FRLJAK M, PANDZIC I S , et al. Object Detection with Pixel Intensity Comparisons Organized in Decision Trees [DB/OL]. (2013-05-20). |
[44] | LI H, LIN Z, BRANDT J , et al. Efficient Boosted Exemplar-Based Face Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1843-1850. DOI: 10.1109/CVPR.2014.238 |
[45] | LI J, ZHANG Y . Learning Surf Cascade for Fast and Accurate Object Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3468-3475. DOI: 10.1109/CVPR.2013.445 |