ZTE Communications, 2025, Vol. 23, Issue (3): 96-110. DOI: 10.12142/ZTECOM.202503011

• Research Papers •

M+MNet: A Mixed-Precision Multibranch Network for Image Aesthetics Assessment

HE Shuai1, LIU Limin1, WANG Zhanli2, LI Jinliang2, MAO Xiaojun2, MING Anlong1

  1. Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. ZTE Corporation, Shenzhen 518057, China
  • Received: 2024-11-13 Online: 2025-09-25 Published: 2025-09-11
  • About author: HE Shuai is a postdoctoral researcher in computer science at Beijing University of Posts and Telecommunications (BUPT), China. His research interests include image processing and image aesthetics assessment.
    LIU Limin is pursuing a master’s degree in computer science at Beijing University of Posts and Telecommunications (BUPT), China, with a research focus on image processing and image aesthetics assessment.
    WANG Zhanli is an engineer at ZTE Corporation, with a research focus on image processing.
    LI Jinliang is an engineer at ZTE Corporation, with a research focus on image processing.
    MAO Xiaojun is an engineer at ZTE Corporation, with a research focus on image processing and image quality assessment.
    MING Anlong (mal@bupt.edu.cn) received his PhD from Beijing University of Posts and Telecommunications (BUPT), China in 2008. He is currently a professor with the School of Computer Science, BUPT. His research interests include computer vision and robot vision.
  • Supported by:
    the National Natural Science Foundation of China (62502040); the ZTE Industry-University-Institute Cooperation Funds (IA20230700001)

Abstract:

We propose the Mixed-Precision Multibranch Network (M+MNet) to compensate for the neglect of background information in image aesthetics assessment (IAA) while providing strategies for overcoming the trade-off between training cost and performance. First, two exponentially weighted pooling methods are used to selectively boost the extraction of background and salient information during downsampling. Second, we propose Corner Grid, an unsupervised data augmentation method that leverages the diffusive characteristics of convolution to force the network to seek more relevant background information. Third, we perform mixed-precision training by switching the precision format, thus significantly reducing the time and memory consumed by data representation and transmission. Most of these methods, although designed specifically for IAA tasks, have also demonstrated generalizability to other IAA works. For performance verification, we develop a large-scale benchmark (the most comprehensive thus far) by comparing 17 methods with M+MNet on two representative datasets: the Aesthetic Visual Analysis (AVA) dataset and the FLICKR-Aesthetic Evaluation Subset (FLICKR-AES). M+MNet achieves state-of-the-art performance on all tasks.
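
As a rough illustration of what exponentially weighted pooling can look like, the following is a minimal sketch and not the formulation used in the paper, which the abstract does not give. It assumes softmax-style weights proportional to exp(beta * x) within each pooling window. The names `exp_weighted_pool`, `pool2d`, and the parameter `beta` are hypothetical. Under this assumption, a positive `beta` favors strong (salient) activations, a negative `beta` favors weak ones (used here as a stand-in for background emphasis), and `beta = 0` reduces to average pooling.

```python
import math


def exp_weighted_pool(window, beta=1.0):
    """Pool a flat list of activations with weights proportional to exp(beta * x).

    Assumption for illustration only: beta > 0 emphasizes large activations,
    beta < 0 emphasizes small ones, and beta == 0 gives the plain mean.
    """
    # Subtract the largest exponent so math.exp never overflows.
    shift = max(beta * x for x in window)
    weights = [math.exp(beta * x - shift) for x in window]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, window)) / total


def pool2d(feature_map, beta=1.0, k=2):
    """Apply exp_weighted_pool over non-overlapping k x k windows of a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - k + 1, k):
        pooled_row = []
        for c in range(0, cols - k + 1, k):
            window = [feature_map[r + i][c + j] for i in range(k) for j in range(k)]
            pooled_row.append(exp_weighted_pool(window, beta))
        pooled.append(pooled_row)
    return pooled
```

Calling `pool2d(fmap, beta=2.0)` and `pool2d(fmap, beta=-2.0)` on the same feature map would give two downsampled views, one biased toward high activations and one toward low activations, which is one plausible way to feed separate branches of a multibranch network.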

Key words: deep learning, image aesthetics assessment, multibranch network