ZTE Communications ›› 2023, Vol. 21 ›› Issue (1): 64-71. DOI: 10.12142/ZTECOM.202301008

• Research Paper •

Ultra-Lightweight Face Animation Method for Ultra-Low Bitrate Video Conferencing

LU Jianguo1,2, ZHENG Qingfang1,2

  1. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China
  2. ZTE Corporation, Shenzhen 518057, China
  • Received: 2022-08-25 Online: 2023-03-25 Published: 2024-03-15
  • About the authors: LU Jianguo received his BS and MS degrees from Huazhong University of Science and Technology, China in 2017 and 2020, respectively. Since graduation, he has been working at ZTE Corporation. His research interests include computer vision, artificial intelligence, and augmented reality.
    ZHENG Qingfang (zheng.qingfang@zte.com.cn) received his BS degree from Shanghai Jiao Tong University, China in 2002, and his PhD degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences in 2008. He is now the chief scientist of cloud video products and deputy director of the Video Technology Committee at ZTE Corporation. His current research interests include video communication, computer vision, and artificial intelligence.

Abstract:

Video conferencing systems face a dilemma between smooth streaming and decent visual quality because traditional video compression algorithms fail to produce bitstreams with bitrates low enough for bandwidth-constrained networks. This paper proposes an ultra-lightweight face-animation-based method that enables a better video conferencing experience. The proposed method compresses high-quality upper-body videos at ultra-low bitrates and runs efficiently on mobile devices without high-end graphics processing units (GPUs). Moreover, a visual quality evaluation algorithm is used to avoid image degradation caused by extreme face poses and/or expressions, and a full-resolution image composition algorithm is used to reduce unnaturalness; together they help ensure a good user experience. Experiments show that the proposed method is efficient and can generate high-quality videos at ultra-low bitrates.

Key words: talking heads, face animation, video conferencing, generative adversarial network