Face Detection, Alignment, Quality Assessment and Attribute Analysis with Multi-Task Hybrid Convolutional Neural Networks

doi:10.12142/ZTECOM.201903004

ZTE Communications, 2019, Vol. 17, Issue (3): 15-22

Special Topic

GUO Da^1,2, ZHENG Qingfang^3,4, PENG Xiaojiang^1,2, LIU Ming^3,4

1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. ZTE Corporation, Shenzhen, Guangdong 518057, China
4. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen, Guangdong 518057, China

Received: 2019-06-11; Online: 2019-09-29; Published: 2019-12-06
About the authors:

GUO Da received the B.Eng. degree from the Computer Engineering College, Jimei University, China in 2018. He is currently a master's student at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. His research focuses on deep-learning-based face detection and recognition.

ZHENG Qingfang received the B.S. degree in civil engineering from Shanghai Jiao Tong University, China in 2002 and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2008. He is currently the chief scientist of video technology at ZTE Corporation. His research interests include computer vision, multimedia retrieval, and image/video processing, with a special focus on low-power embedded applications and large-scale cloud applications.

PENG Xiaojiang (xj.peng@siat.ac.cn) received his Ph.D. degree from the School of Information Science and Technology, Southwest Jiaotong University, China in 2014. He is currently an associate professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. He was a postdoctoral researcher at the Idiap Institute, Switzerland from 2016 to 2017, and a postdoctoral researcher in the LEAR Team, INRIA, France, working with Prof. Cordelia Schmid, from 2015 to 2016. He serves as a reviewer for IJCV, TMM, TIP, CVPR, ICCV, AAAI, IJCAI, FG, Image and Vision Computing, IEEE Signal Processing Letters, Neurocomputing, etc. His research focuses on action recognition and detection, face recognition, facial emotion analysis, and deep learning.

LIU Ming received the M.Sc. degree from Harbin Engineering University, China in 2011. He is currently a senior engineer at ZTE Corporation. His research interests include object detection, tracking, and recognition.

Abstract

This paper proposes a universal framework, termed the Multi-Task Hybrid Convolutional Neural Network (MHCNN), for joint face detection, facial landmark detection, facial quality assessment, and facial attribute analysis. MHCNN consists of a high-accuracy single stage detector (SSD) and an efficient tiny convolutional neural network (T-CNN) for joint face detection refinement, alignment, and attribute analysis. Although SSD face detectors achieve promising results, we find that applying a tiny CNN to their detections further improves the detection scores and bounding boxes. Through multi-task training, our T-CNN predicts five facial landmarks, facial quality scores, and facial attributes such as wearing sunglasses and wearing a mask. Since no public dataset provides the facial quality and facial attribute annotations we need, we contribute two datasets, namely FaceQ and FaceA, collected from the Internet. Experiments show that MHCNN achieves face detection performance comparable to the state of the art on the Face Detection Data Set and Benchmark (FDDB), and obtains reasonable results on AFLW, FaceQ, and FaceA.
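The abstract does not state the T-CNN training objective. As a rough illustration only, the sketch below assumes a weighted sum of per-task losses in the spirit of MTCNN [7]: binary cross-entropy for the face, quality, sunglasses, and mask outputs, mean squared error for the box offsets and the ten landmark coordinates, and a per-sample rule that skips any task the sample's source dataset does not annotate (for example, a FaceA image carries no quality label). The task names, the loss choices, and the `WEIGHTS` values are all hypothetical.

```python
import math
from typing import Any, Dict, Optional, Sequence


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, e.g. over 4 box offsets or 10 landmark coordinates."""
    assert len(pred) == len(target)
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


# Hypothetical task weights; the paper does not publish its weighting.
WEIGHTS = {"face": 1.0, "box": 0.5, "landmarks": 0.5,
           "quality": 1.0, "sunglasses": 1.0, "mask": 1.0}
REGRESSION_TASKS = {"box", "landmarks"}


def multitask_loss(pred: Dict[str, Any], labels: Dict[str, Optional[Any]]) -> float:
    """Weighted sum of per-task losses. A task whose label is missing (None or
    absent) is skipped, so samples from datasets that annotate only some tasks
    can be mixed in one batch."""
    total = 0.0
    for task, weight in WEIGHTS.items():
        y = labels.get(task)
        if y is None:
            continue  # this sample has no annotation for the task
        if task in REGRESSION_TASKS:
            total += weight * mse(pred[task], y)
        else:
            total += weight * bce(pred[task], y)
    return total


# Example: a FaceA-style sample labelled only for face / sunglasses / mask.
pred = {"face": 0.9, "box": [0.0] * 4, "landmarks": [0.1] * 10,
        "quality": 0.5, "sunglasses": 0.8, "mask": 0.2}
labels = {"face": 1, "sunglasses": 1, "mask": 0}
print(round(multitask_loss(pred, labels), 4))
```

Masking missing labels rather than inventing them is one common way to train a single network on datasets with disjoint annotations; whether MHCNN does this is not stated in the abstract.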

Key words: face detection, face alignment, facial attribute, CNN, multi-task training

GUO Da, ZHENG Qingfang, PENG Xiaojiang, LIU Ming. Face Detection, Alignment, Quality Assessment and Attribute Analysis with Multi-Task Hybrid Convolutional Neural Networks[J]. ZTE Communications, 2019, 17(3): 15-22.

Figures/Tables 11

Figure 1. The pipeline of the proposed Multi-Task Hybrid Convolutional Neural Network (MHCNN). It consists of an SSD-based face detector for high-accuracy detection performance and a T-CNN for detection refinement and multi-task face analysis.
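One way to read the refinement stage in Figure 1 is as a second-pass re-scoring and box regression applied to each SSD detection. The sketch below assumes MTCNN-style corner offsets normalised by the box width and height, and fuses the detector and T-CNN scores by a plain average; neither rule is given in the paper. `tcnn` is a hypothetical callable standing in for the network (a real T-CNN would run on the image crop, not on the box coordinates), and `fake_tcnn` is a toy stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2) in pixels
Offsets = Tuple[float, float, float, float]  # corner offsets relative to box size


@dataclass
class Detection:
    box: Box
    score: float


def refine_box(box: Box, offsets: Offsets) -> Box:
    """Shift each corner by an offset scaled by the box width/height (MTCNN-style, assumed)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)


def refine_detections(dets: List[Detection],
                      tcnn: Callable[[Box], Tuple[float, Offsets]],
                      keep_threshold: float = 0.5) -> List[Detection]:
    """Re-score and re-box each first-stage detection with a second-stage network."""
    refined = []
    for det in dets:
        face_prob, offsets = tcnn(det.box)     # hypothetical T-CNN call
        score = 0.5 * (det.score + face_prob)  # assumed fusion: plain average
        if score >= keep_threshold:
            refined.append(Detection(refine_box(det.box, offsets), score))
    return refined


def fake_tcnn(box: Box) -> Tuple[float, Offsets]:
    """Toy stand-in: always 90% face, shifts the box right by 10% of its width."""
    return 0.9, (0.1, 0.0, 0.1, 0.0)


dets = [Detection((10, 10, 110, 110), 0.7), Detection((200, 50, 240, 90), 0.05)]
for d in refine_detections(dets, fake_tcnn):
    print(d)
```

With the toy stub, the first detection is kept and shifted right by 10 pixels, while the low-scoring second detection falls below the keep threshold after fusion and is dropped.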

Figure 2. The architecture of the single stage detector (SSD)-based face detector.

Figure 3. The architecture of the tiny CNN (T-CNN).

Figure 4. Examples of our FaceA dataset.

Figure 5. Examples of our FaceQ dataset.

Figure 6. Discrete-score ROC curves on the FDDB dataset.

Table 1 Comparison of our MHCNN on FDDB

Methods	Recall
Cascade CNN [5]	85.67%
ACF-multiscale [41]	86.08%
YAN et al. [42]	86.15%
Faster R-CNN [11]	96.10%
S³FD [12]	98.37%
MHCNN	98.66%
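Table 1 does not state the operating point at which recall is measured; FDDB results are commonly quoted as the true positive rate at a fixed number of false positives over the whole dataset (for example, 1,000). The sketch below computes that kind of figure under simplifying assumptions: ground-truth faces are axis-aligned boxes (FDDB actually annotates ellipses), each detection is greedily matched to the unmatched ground truth with the highest IoU, and a match requires IoU above 0.5. It is not the official FDDB evaluation tool.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_fp(images: Sequence[Tuple[Sequence[Tuple[float, Box]], Sequence[Box]]],
                 max_fp: int, iou_thr: float = 0.5) -> float:
    """images: one (detections, ground_truths) pair per image, detections = [(score, box)].
    Returns the recall reached before the (max_fp + 1)-th false positive."""
    flags: List[Tuple[float, bool]] = []  # (score, is_true_positive), pooled over images
    num_gt = 0
    for dets, gts in images:
        num_gt += len(gts)
        matched = [False] * len(gts)
        for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
            best, best_j = 0.0, -1
            for j, gt in enumerate(gts):
                if not matched[j] and iou(box, gt) > best:
                    best, best_j = iou(box, gt), j
            if best > iou_thr:
                matched[best_j] = True
                flags.append((score, True))
            else:
                flags.append((score, False))
    tp = fp = 0
    for _, is_tp in sorted(flags, key=lambda f: f[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
    return tp / num_gt if num_gt else 0.0


# One toy image: two faces, one confident false alarm between the two hits.
images = [([(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.3, (20, 20, 30, 30))],
           [(0, 0, 10, 10), (20, 20, 30, 30)])]
print(recall_at_fp(images, max_fp=0), recall_at_fp(images, max_fp=1))
```

Sweeping `max_fp` over a range of values traces out a discrete ROC curve of the kind shown in Figure 6.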

Table 2 Ablation study of T-CNN on the FaceA dataset

Methods (task setting)	Sunglasses accuracy (threshold = 0.5)	Mask accuracy (threshold = 0.5)
T-CNN (sunglasses)	76.14%	----
T-CNN (sunglasses + landmarks)	76.57%	----
T-CNN (masks)	----	83.30%
T-CNN (masks + landmarks)	----	85.90%
T-CNN (sunglasses + masks + landmarks)	98.70%	99.35%
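The accuracies in Table 2 are for binary decisions made at a fixed threshold of 0.5 on the predicted probability. A minimal sketch of that metric follows; counting a probability exactly equal to the threshold as positive is an assumption, and the toy numbers are made up.

```python
from typing import Sequence


def accuracy_at_threshold(probs: Sequence[float], labels: Sequence[int],
                          threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction (prob >= threshold) matches the 0/1 label."""
    assert len(probs) == len(labels) > 0
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)


# Toy sunglasses predictions; the numbers are invented for illustration.
sunglasses_prob = [0.9, 0.4, 0.6, 0.2]
sunglasses_label = [1, 1, 0, 0]
print(accuracy_at_threshold(sunglasses_prob, sunglasses_label))  # 0.5
```

Each attribute head (sunglasses, mask) would be scored independently with its own probabilities and labels.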

Figure 7. False positives on the FaceA test set. The score of each image is the predicted probability of wearing a mask or wearing sunglasses.

Table 3 Evaluation on the FaceQ dataset

Methods	Face quality accuracy (best threshold)
LBP+SVM	78.52%
T-CNN	81.86%
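Table 3 instead reports accuracy at the best threshold. A simple way to obtain such a number is to try every distinct score as a candidate threshold and keep the one with the highest accuracy, as sketched below under the assumption that a score at or above the threshold means "high quality". The table does not say whether the threshold was chosen on held-out data; choosing it on the evaluation set itself, as this toy example does, gives an optimistic estimate.

```python
from typing import Sequence, Tuple


def best_threshold_accuracy(scores: Sequence[float],
                            labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximising accuracy of 'score >= t -> high quality (1)'."""
    assert len(scores) == len(labels) > 0
    # Every distinct score is a candidate; +inf covers "predict everything low quality".
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:  # keep the first (lowest) threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy quality scores and labels (1 = high quality); invented for illustration.
print(best_threshold_accuracy([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1]))  # (0.4, 0.75)
```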

Figure 8. False positives on the FaceQ test set. Faces with higher scores are assigned to the high-quality class, and those with lower scores to the low-quality class.

References 45

[1]	VIOLA P, JONES M J . Robust Real-Time Face Detection[J]. International Journal of Computer Vision, 2004,57(2):137-154. DOI: 10.1023/B:VISI.0000013087.49260.fb
[2]	MATHIAS M, BENENSON R, PEDERSOLI M , et al. Face Detection Without Bells and Whistles [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 720-735. DOI: 10.1007/978-3-319-10593-2_47
[3]	ZHU C, ZHENG Y, LUU K , et al. CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection[M]. Deep Learning for Biometrics. Cham, Switzerland: Springer, 2017: 57-79
[4]	JIANG H, LEARNED-MILLER E . Face Detection with the Faster R-CNN [C]//12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). Washington DC, USA, 2017: 650-657. DOI: 10.1109/FG.2017.82
[5]	LI H, LIN Z, SHEN X , et al. A Convolutional Neural Network Cascade for Face Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 5325-5334. DOI: 10.1109/CVPR.2015.7299170
[6]	YANG S, LUO P, LOY C C , et al. From Facial Parts Responses to Face Detection: A Deep Learning Approach [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 3676-3684. DOI: 10.1109/ICCV.2015.419
[7]	ZHANG K, ZHANG Z, LI Z , et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks[J]. IEEE Signal Processing Letters, 2016,23(10):1499-1503. DOI: 10.1109/LSP.2016.2603342
[8]	ZHANG S, ZHU X, LEI Z , et al. FaceBoxes: A CPU Real-Time Face Detector with High Accuracy [C]//2017 IEEE International Joint Conference on Biometrics (IJCB). Denver, Colorado, USA, 2017: 1-9. DOI: 10.1109/BTAS.2017.8272675
[9]	NAJIBI M, SAMANGOUEI P, CHELLAPPA R , et al. SSH: Single Stage Headless Face Detector [C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 4875-4884. DOI: 10.1109/ICCV.2017.522
[10]	LIU W, ANGUELOV D, ERHAN D , et al. SSD: Single Shot Multibox Detector [C]//European Conference on Computer Vision. Amsterdam, The Netherlands, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2
[11]	REN S, HE K, GIRSHICK R , et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [C]//Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 91-99. DOI: 10.1109/TPAMI.2016.2577031
[12]	ZHANG S, ZHU X, LEI Z , et al. S³FD: Single Shot Scale-Invariant Face Detector [C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 192-201. DOI: 10.1109/ICCV.2017.30
[13]	GLOROT X, BENGIO Y . Understanding the Difficulty of Training Deep Feedforward Neural Networks [C]//13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, 2010: 249-256.
[14]	SUN Y, WANG X, TANG X . Deep Convolutional Network Cascade for Facial Point Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3476-3483. DOI: 10.1109/CVPR.2013.446
[15]	ZHU X, LEI Z, LIU X , et al. Face Alignment Across Large Poses: A 3D Solution [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 146-155. DOI: 10.1109/CVPR.2016.23
[16]	FENG Z H, KITTLER J, AWAIS M , et al. Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks [C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 2235-2245. DOI: 10.1109/CVPR.2018.00238
[17]	ZHUANG C, ZHANG S, LEI Z , et al. FLDet: A CPU Real-Time Joint Face and Landmark Detector [C]//IAPR International Conference on Biometrics (ICB). Crete, Greece, 2019
[18]	BHARADWAJ S, VATSA M, SINGH R . Can Holistic Representations be Used for Face Biometric Quality Assessment? [C]//IEEE International Conference on Image Processing. Melbourne, Australia, 2013: 2792-2796. DOI: 10.1109/ICIP.2013.6738575
[19]	OJALA T, PIETIKÄINEN M, MÄENPÄÄ T . Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,24(7):971-987. DOI: 10.1109/TPAMI.2002.1017623
[20]	DALAL N, TRIGGS B . Histograms of Oriented Gradients for Human Detection [C]//International Conference on Computer Vision & Pattern Recognition (CVPR'05). San Diego, USA, 2005,1:886-893. DOI: 10.1109/CVPR.2005.177
[21]	HERNANDEZ-ORTEGA J, GALBALLY J, FIERREZ J , et al. FaceQnet: Quality Assessment for Face Recognition Based on Deep Learning [DB/OL]. (2019-04-03).
[22]	NASROLLAHI K, MOESLUND T B . Face Quality Assessment System in Video Sequences [C]//European Workshop on Biometrics and Identity Management. Roskilde, Denmark, 2008: 10-18. DOI: 10.1007/978-3-540-89991-4_2
[23]	LIU Z, LUO P, WANG X , et al. Deep Learning Face Attributes in the Wild [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 3730-3738. DOI: 10.1109/ICCV.2015.425
[24]	HAN H, JAIN A K, WANG F , et al. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018,40(11):2597-2609. DOI: 10.1109/TPAMI.2017.2738004
[25]	RANJAN R, SANKARANARAYANAN S, CASTILLO C D , et al. An All-in-One Convolutional Neural Network for Face Analysis [C]//12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). Washington DC, USA, 2017: 17-24. DOI: 10.1109/FG.2017.137
[26]	ZHANG Z, LUO P, LOY C C , et al. Learning Deep Representation for Face Alignment with Auxiliary Attributes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015,38(5):918-930. DOI: 10.1109/TPAMI.2015.2469286
[27]	BEST-ROWDEN L, JAIN A K . Learning Face Image Quality from Human Assessments[J]. IEEE Transactions on Information Forensics and Security, 2018,13(12):3064-3077. DOI: 10.1109/TIFS.2018.2799585
[28]	ZHANG L, CHU R, XIANG S , et al. Face Detection Based on Multi-Block LBP Representation [C]//International Conference on Biometrics. Seoul, South Korea, 2007: 11-18. DOI: 10.1007/978-3-540-74549-5_2
[29]	ZHU Q, YEH M C, CHENG K T , et al. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). New York, USA, 2006,2:1491-1498. DOI: 10.1109/CVPR.2006.119
[30]	PHAM M T, GAO Y, HOANG V D D , et al. Fast Polygonal Integration and its Application in Extending Haar-Like Features to Improve Object Detection [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 942-949. DOI: 10.1109/CVPR.2010.5540117
[31]	YAN J, LEI Z, WEN L , et al. The Fastest Deformable Part Model for Object Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 2497-2504. DOI: 10.1109/CVPR.2014.320
[32]	ZHU X, RAMANAN D . Face Detection, Pose Estimation, and Landmark Localization in the Wild [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA, 2012: 2879-2886. DOI: 10.1109/CVPR.2012.6248014
[33]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E . ImageNet Classification with Deep Convolutional Neural Networks [C]//Advances in Neural Information Processing Systems. Lake Tahoe, USA, 2012: 1097-1105. DOI: 10.1145/3065386
[34]	GIRSHICK R, DONAHUE J, DARRELL T , et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 580-587. DOI: 10.1109/CVPR.2014.81
[35]	GIRSHICK R . Fast R-CNN [C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1440-1448. DOI: 10.1109/ICCV.2015.169
[36]	TANG X, DU D K, HE Z , et al. PyramidBox: A Context-Assisted Single Shot Face Detector [C]//European Conference on Computer Vision (ECCV). Munich, Germany, 2018: 797-813. DOI: 10.1007/978-3-030-01240-3_49
[37]	ZHANG Z, LUO P, LOY C C , et al. Facial Landmark Detection by Deep Multi-Task Learning [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 94-108. DOI: 10.1007/978-3-319-10599-4_7
[38]	CHEN D, REN S, WEI Y , et al. Joint Cascade Face Detection and Alignment [C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 109-122. DOI: 10.1007/978-3-319-10599-4_8
[39]	SIMONYAN K, ZISSERMAN A . Very Deep Convolutional Networks for Large-Scale Image Recognition [DB/OL]. (2014-09-04).
[40]	YANG S, LUO P, LOY C C , et al. WIDER FACE: A Face Detection Benchmark [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 5525-5533. DOI: 10.1109/CVPR.2016.596
[41]	YANG B, YAN J, LEI Z , et al. Aggregate Channel Features for Multi-View Face Detection [C]//IEEE International Joint Conference on Biometrics. Clearwater, USA, 2014: 1-8. DOI: 10.1109/BTAS.2014.6996284
[42]	YAN J, ZHANG X, LEI Z , et al. Face Detection by Structural Models[J]. Image and Vision Computing, 2014,32(10):790-799. DOI: 10.1016/j.imavis.2013.12.004
[43]	MARKUS N, FRLJAK M, PANDZIC I S , et al. Object Detection with Pixel Intensity Comparisons Organized in Decision Trees [DB/OL]. (2013-05-20).
[44]	LI H, LIN Z, BRANDT J , et al. Efficient Boosted Exemplar-Based Face Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1843-1850. DOI: 10.1109/CVPR.2014.238
[45]	LI J, ZHANG Y . Learning SURF Cascade for Fast and Accurate Object Detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3468-3475. DOI: 10.1109/CVPR.2013.445
