ZTE Communications ›› 2019, Vol. 17 ›› Issue (3): 15-22. doi: 10.12142/ZTECOM.201903004

• Special Topic •

Face Detection, Alignment, Quality Assessment and Attribute Analysis with Multi-Task Hybrid Convolutional Neural Networks

GUO Da1,2, ZHENG Qingfang3,4, PENG Xiaojiang1,2, LIU Ming3,4   

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. ZTE Corporation, Shenzhen, Guangdong 518057, China
  4. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen, Guangdong 518057, China
  • Received: 2019-06-11 Online: 2019-09-29 Published: 2019-12-06
  • About author:
    GUO Da received the B.Eng. degree from the Computer Engineering College, JiMei University, China in 2018. He is currently a master's student at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. His research focuses on face detection and recognition based on deep learning.
    ZHENG Qingfang received the B.S. degree in civil engineering from Shanghai Jiaotong University, China in 2002 and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2008. He is currently the chief scientist of video technology with ZTE Corporation. His research interests include computer vision, multimedia retrieval, and image/video processing, with a special focus on low-power embedded applications and large-scale cloud applications.
    PENG Xiaojiang (xj.peng@siat.ac.cn) received his Ph.D. from the School of Information Science and Technology, Southwest Jiaotong University, China in 2014. He is currently an associate professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. He was a postdoctoral researcher with the LEAR Team, INRIA, France, working with Prof. Cordelia Schmid, from 2015 to 2016, and a postdoctoral researcher at Idiap Institute, Switzerland, from 2016 to 2017. He serves as a reviewer for IJCV, TMM, TIP, CVPR, ICCV, AAAI, IJCAI, FG, Image and Vision Computing, IEEE Signal Processing Letters, Neurocomputing, etc. His research focuses on action recognition and detection, face recognition, facial emotion analysis, and deep learning.
    LIU Ming received the M.Sc. degree from Harbin Engineering University, China in 2011. He is currently a senior engineer with ZTE Corporation. His research interests include object detection, tracking, and recognition.


This paper proposes a universal framework, termed the Multi-Task Hybrid Convolutional Neural Network (MHCNN), for joint face detection, facial landmark detection, facial quality assessment, and facial attribute analysis. MHCNN consists of a high-accuracy single-stage detector (SSD) and an efficient tiny convolutional neural network (T-CNN) that jointly performs face detection refinement, alignment, and attribute analysis. Although SSD face detectors achieve promising results, we find that applying a tiny CNN to their detections further improves the face scores and bounding boxes. Through multi-task training, our T-CNN predicts five facial landmarks, facial quality scores, and facial attributes such as wearing sunglasses and wearing masks. Since no public facial quality or facial attribute datasets meet our needs, we contribute two datasets, FaceQ and FaceA, collected from the Internet. Experiments show that MHCNN achieves face detection performance comparable to the state of the art on the Face Detection Data Set and Benchmark (FDDB) and obtains reasonable results on AFLW, FaceQ, and FaceA.
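
The two-stage flow described in the abstract (a detector proposes faces, then a small multi-task network rescores each box, adjusts it, and predicts landmarks, quality, and attributes) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the names `Refinement`, `Face`, `apply_offsets`, and `run_pipeline`, the box-relative offset and landmark parameterization, and the `keep_threshold` value are all assumptions introduced here, and the two networks are stand-in callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Point = Tuple[float, float]


@dataclass
class Refinement:
    """Hypothetical per-candidate outputs of the second-stage multi-task network."""
    score: float                  # refined face confidence in [0, 1]
    box_offsets: Box              # edge offsets, relative to candidate width/height
    landmarks: List[Point]        # five points, normalized to the candidate box
    quality: float                # facial quality score
    attributes: Dict[str, float]  # e.g. {"sunglasses": p, "mask": p}


@dataclass
class Face:
    """Final result for one kept face, in image coordinates."""
    box: Box
    score: float
    landmarks: List[Point]
    quality: float
    attributes: Dict[str, float]


def apply_offsets(box: Box, offsets: Box) -> Box:
    """Shift each box edge by an offset scaled by the box width or height."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)


def run_pipeline(
    image: Any,
    detect: Callable[[Any], List[Tuple[Box, float]]],
    refine: Callable[[Any, Box], Refinement],
    keep_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Face]:
    """Stage 1 proposes candidate boxes; stage 2 rescores and refines each one."""
    faces: List[Face] = []
    for box, _coarse_score in detect(image):
        r = refine(image, box)
        if r.score < keep_threshold:
            continue  # the second stage rejected this candidate
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        # Landmarks are predicted relative to the candidate crop; map them back.
        points = [(x1 + px * w, y1 + py * h) for px, py in r.landmarks]
        faces.append(
            Face(apply_offsets(box, r.box_offsets), r.score, points,
                 r.quality, r.attributes)
        )
    return faces
```

In this sketch the refined score from the second stage replaces the detector's score when deciding which candidates to keep, which is one plausible reading of "further improves the face scores"; other ways of combining the two scores are equally possible.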

Key words: face detection, face alignment, facial attribute, CNN, multi-task training