ZTE Communications ›› 2024, Vol. 22 ›› Issue (4): 78-88.DOI: 10.12142/ZTECOM.202404011

• Research Papers •

Video Enhancement Network Based on CNN and Transformer

YUAN Lang1, HUI Chen1, WU Yanfeng1, LIAO Ronghua1, JIANG Feng2, GAO Ying3

  1. Harbin Institute of Technology, Harbin 150001, China
    2. Sichuan University of Science & Engineering, Zigong 643002, China
    3. ZTE Corporation, Shenzhen 518057, China
  • Received: 2023-07-29 Online: 2024-12-20 Published: 2024-12-03
  • About author:YUAN Lang (1190201114@stu.hit.edu.cn) is now a senior student at Harbin Institute of Technology (HIT), China. He is working at the Research Center of Intelligent Interface and Human-Computer Interaction, HIT. His research interests include deep learning, image coding, and compressive sensing.
    HUI Chen received his BS degree in software engineering from Yanshan University, China in 2017, and his MS degree in software engineering and PhD degree in computer science and technology from Harbin Institute of Technology, China in 2020 and 2024, respectively. He has been a visiting scholar with the College of Computing and Data Science, Nanyang Technological University, Singapore since 2023. He is currently with the School of Future Technology, Nanjing University of Information Science and Technology, China. His research interests include image compression, quality assessment, and multimedia security.
    WU Yanfeng is currently an undergraduate student at Harbin Institute of Technology (HIT), China. He has joined the Research Center of Intelligent Interface and Human-Computer Interaction, HIT. His research interests include machine learning and deep learning, with a current focus on video coding and compressive sensing.
    LIAO Ronghua is pursuing a master’s degree at the Department of Computing, Harbin Institute of Technology, China. His research interests include compressed sensing and video quality assessment.
    JIANG Feng received his BS, MS and PhD degrees from the Harbin Institute of Technology (HIT), China in 2001, 2003, and 2008, respectively, all in computer science. He is currently a professor with the Department of Computer Science, Sichuan University of Science & Engineering, China and a visiting scholar with the School of Electrical Engineering, Princeton University, USA. His research interests include computer vision, image and video processing, and pattern recognition.
    GAO Ying received her master’s degree in mathematics from Hohai University, China in 2011. Since then, she has been engaged in research on the Internet of Things (IoT), video surveillance, video transmission, and video coding, and has applied for over 100 patents in these fields. She is currently a chief engineer of standard pre-research at ZTE Corporation and a member of the State Key Laboratory of Mobile Network and Multimedia Technology in China, where she mainly focuses on the research and standardization of video coding technology.
  • Supported by:
    the Key R&D Program of China (2022YFC3301800); Sichuan Local Technological Development Program (24YRGZN0010); ZTE Industry-University-Institute Cooperation Funds (HC-CN-03-2019-12)

Abstract:

To enhance video quality degraded by encoding and decoding in video compression, this paper proposes a video quality enhancement framework based on local and non-local priors. Low-level features are first extracted by a single convolution layer and then processed by several conv-tran blocks (CTBs) to extract high-level features, which are finally transformed into a residual image. The reconstructed video frame is obtained by element-wise addition of the residual image and the original lossy video frame. Experiments show that the proposed Conv-Tran Network (CTN) effectively recovers the quality loss introduced by Versatile Video Coding (VVC) and further improves VVC's performance.
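The final reconstruction step in the abstract — adding the network's predicted residual back onto the lossy decoded frame — can be sketched as follows. This is a minimal NumPy illustration under assumed 8-bit frames, not the authors' implementation; the function name and toy values are illustrative.

```python
import numpy as np

def reconstruct_frame(lossy_frame: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Add the predicted residual to the lossy frame element-wise,
    then clip back to the valid 8-bit range."""
    # Widen to a signed type so the sum cannot wrap around before clipping.
    out = lossy_frame.astype(np.int16) + residual.astype(np.int16)
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy example: a 2x2 grayscale patch and a small signed correction residual.
lossy = np.array([[100, 200], [0, 255]], dtype=np.uint8)
residual = np.array([[5, -10], [-3, 7]], dtype=np.int16)
enhanced = reconstruct_frame(lossy, residual)
# Values that would fall outside [0, 255] (e.g. 0 - 3 or 255 + 7) are clipped.
```

Clipping after the addition matters: without it, the `uint8` sum would silently wrap (e.g. 255 + 7 becoming 6) instead of saturating at the valid pixel range.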

Key words: attention fusion mechanism, H.266/VVC, transformer, video coding, video quality enhancement