Loading...

Table of Content

    11 September 2025, Volume 23 Issue 3
    Download the whole issue (PDF)
    The whole issue of ZTE Communications September 2025, Vol. 23 No.3
    2025, 23(3):  0. 
    Asbtract ( )   PDF (20152KB) ( )  
    Related Articles | Metrics
    Special Topic
    Special Topic on Security of Large Models
    2025, 23(3):  1-2.  doi:10.12142/ZTECOM.202503001
    Asbtract ( )   HTML ( )   PDF (343KB) ( )  
    References | Related Articles | Metrics
    Poison-Only and Targeted Backdoor Attack Against Visual Object Tracking
    GU Wei, SHAO Shuo, ZHOU Lingtao, QIN Zhan, REN Kui
    2025, 23(3):  3-14.  doi:10.12142/ZTECOM.202503002
    Asbtract ( )   HTML ( )   PDF (1597KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Visual object tracking (VOT), aiming to track a target object in a continuous video, is a fundamental and critical task in computer vision. However, the reliance on third-party resources (e.g., dataset) for training poses concealed threats to the security of VOT models. In this paper, we reveal that VOT models are vulnerable to a poison-only and targeted backdoor attack, where the adversary can achieve arbitrary tracking predictions by manipulating only part of the training data. Specifically, we first define and formulate three different variants of the targeted attacks: size-manipulation, trajectory-manipulation, and hybrid attacks. To implement these, we introduce Random Video Poisoning (RVP), a novel poison-only strategy that exploits temporal correlations within video data by poisoning entire video sequences. Extensive experiments demonstrate that RVP effectively injects controllable backdoors, enabling precise manipulation of tracking behavior upon trigger activation, while maintaining high performance on benign data, thus ensuring stealth. Our findings not only expose significant vulnerabilities but also highlight that the underlying principles could be adapted for beneficial uses, such as dataset watermarking for copyright protection.

    VOTI: Jailbreaking Vision-Language Models via Visual Obfuscation and Task Induction
    ZHU Yifan, CHU Zhixuan, REN Kui
    2025, 23(3):  15-26.  doi:10.12142/ZTECOM.202503003
    Asbtract ( )   HTML ( )   PDF (6551KB) ( )  
    Figures and Tables | References | Supplementary Material | Related Articles | Metrics

    In recent years, large vision-language models (VLMs) have achieved significant breakthroughs in cross-modal understanding and generation. However, the safety issues arising from their multimodal interactions become prominent. VLMs are vulnerable to jailbreak attacks, where attackers craft carefully designed prompts to bypass safety mechanisms, leading them to generate harmful content. To address this, we investigate the alignment between visual inputs and task execution, uncovering locality defects and attention biases in VLMs. Based on these findings, we propose VOTI, a novel jailbreak framework leveraging visual obfuscation and task induction. VOTI subtly embeds malicious keywords within neutral image layouts to evade detection, and breaks down harmful queries into a sequence of subtasks. This approach disperses malicious intent across modalities, exploiting VLMs’ over-reliance on local visual cues and their fragility in multi-step reasoning to bypass global safety mechanisms. Implemented as an automated framework, VOTI integrates large language models as red-team assistants to generate and iteratively optimize jailbreak strategies. Extensive experiments across seven mainstream VLMs demonstrate VOTI’s effectiveness, achieving a 73.46% attack success rate on GPT-4o-mini. These results reveal critical vulnerabilities in VLMs, highlighting the urgent need for improving robust defenses and multimodal alignment.

    From Function Calls to MCPs for Securing AI Agent Systems: Architecture, Challenges and Countermeasures
    WANG Wei, LI Shaofeng, DONG Tian, MENG Yan, ZHU Haojin
    2025, 23(3):  27-37.  doi:10.12142/ZTECOM.202503004
    Asbtract ( )   HTML ( )   PDF (1290KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    With the widespread deployment of large language models (LLMs) in complex and multimodal scenarios, there is a growing demand for secure and standardized integration of external tools and data sources. The Model Context Protocol (MCP), proposed by Anthropic in late 2024, has emerged as a promising framework. Designed to standardize the interaction between LLMs and their external environments, it serves as a “USB-C interface for AI”. While MCP has been rapidly adopted in the industry, systematic academic studies on its security implications remain scarce. This paper presents a comprehensive review of MCP from a security perspective. We begin by analyzing the architecture and workflow of MCP and identify potential security vulnerabilities across key stages including input processing, decision-making, client invocation, server response, and response generation. We then categorize and assess existing defense mechanisms. In addition, we design a real-world attack experiment to demonstrate the feasibility of tool description injection within an actual MCP environment. Based on the experimental results, we further highlight underexplored threat surfaces and propose future directions for securing AI agent systems powered by MCP. This paper aims to provide a structured reference framework for researchers and developers seeking to balance functionality and security in MCP-based systems.

    Dataset Copyright Auditing for Large Models: Fundamentals, Open Problems, and Future Directions
    DU Linkang, SU Zhou, YU Xinyi
    2025, 23(3):  38-47.  doi:10.12142/ZTECOM.202503005
    Asbtract ( )   HTML ( )   PDF (511KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper presents a survey of existing auditing solutions, categorizing them across key dimensions: data modality, model training stage, data overlap scenarios, and model access levels. We highlight major trends, including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.

    StegoAgent: A Generative Steganography Framework Based on GUI Agents
    SHEN Qiuhong, YANG Zijin, JIANG Jun, ZHANG Weiming, CHEN Kejiang
    2025, 23(3):  48-58.  doi:10.12142/ZTECOM.202503006
    Asbtract ( )   HTML ( )   PDF (1144KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Steganography is a technology that discreetly embeds secret information into the redundant space of a carrier, enabling covert communication. As generative models continue to advance, steganography has evolved from traditional modification-based methods to generative steganography, which includes generative linguistic and image based forms. However, while large model agents are rapidly emerging, no method has exploited the stable redundant space in their action processes. Inspired by this insightful observation, we propose a steganographic method leveraging large model agents, employing their actions to conceal secret messages. In this paper, we introduce StegoAgent, a generative steganography framework based on graphical user interface (GUI) agents, which effectively demonstrates the remarkable potential and effectiveness of large model agent-based steganographic methods.

    Review
    Analysis of Feasible Solutions for Railway 5G Network Security Assessment
    XU Hang, SUN Bin, DING Jianwen, WANG Wei
    2025, 23(3):  59-70.  doi:10.12142/ZTECOM.202503007
    Asbtract ( )   HTML ( )   PDF (502KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The Fifth Generation of Mobile Communications for Railways (5G-R) brings significant opportunities for the rail industry. However, alongside the potential and benefits of the railway 5G network are complex security challenges. Ensuring the security and reliability of railway 5G networks is therefore essential. This paper presents a detailed examination of security assessment techniques for railway 5G networks, focusing on addressing the unique security challenges in this field. In this paper, various security requirements in railway 5G networks are analyzed, and specific processes and methods for conducting comprehensive security risk assessments are presented. This study provides a framework for securing railway 5G network development and ensuring its long-term sustainability.

    Key Techniques and Challenges in NeRF-Based Dynamic 3D Reconstruction
    LU Ping, FENG Daquan, SHI Wenzhe, LI Wan, LIN Jiaxin
    2025, 23(3):  71-80.  doi:10.12142/ZTECOM.202503008
    Asbtract ( )   HTML ( )   PDF (675KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    This paper explores the key techniques and challenges in dynamic scene reconstruction with neural radiance fields (NeRF). As an emerging computer vision method, the NeRF has wide application potential, especially in excelling at 3D reconstruction. We first introduce the basic principles and working mechanisms of NeRFs, followed by an in-depth discussion of the technical challenges faced by 3D reconstruction in dynamic scenes, including problems in perspective and illumination changes of moving objects, recognition and modeling of dynamic objects, real-time requirements, data acquisition and calibration, motion estimation, and evaluation mechanisms. We also summarize current state-of-the-art approaches to address these challenges, as well as future research trends. The goal is to provide researchers with an in-depth understanding of the application of NeRFs in dynamic scene reconstruction, as well as insights into the key issues faced and future directions.

    Research Papers
    Real-Time 7-Core SDM Transmission System Using Commercial 400 Gbit/s OTN Transceivers and Network Management System
    CUI Jian, GU Ninglun, CHANG Cheng, SHI Hu, YAN Baoluo
    2025, 23(3):  81-88.  doi:10.12142/ZTECOM.202503009
    Asbtract ( )   HTML ( )   PDF (3531KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Space-division multiplexing (SDM) utilizing uncoupled multi-core fibers (MCF) is considered a promising candidate for next-generation high-speed optical transmission systems due to its huge capacity and low inter-core crosstalk. In this paper, we demonstrate a real-time high-speed SDM transmission system over a field-deployed 7-core MCF cable using commercial 400 Gbit/s backbone optical transport network (OTN) transceivers and a network management system. The transceivers employ a high noise-tolerant quadrature phase shift keying (QPSK) modulation format with a 130 Gbaud rate, enabled by optoelectronic multi-chip module (OE-MCM) packaging. The network management system can effectively manage and monitor the performance of the 7-core SDM OTN system and promptly report failure events through alarms. Our field trial demonstrates the compatibility of uncoupled MCF with high-speed OTN transmission equipment and network management systems, supporting its future deployment in next-generation high-speed terrestrial cable transmission networks.

    Antenna Parameter Calibration for Mobile Communication Base Station via Laser Tracker
    LI Junqiang, CHEN Shijun, FENG Yujie, FAN Jiancun, CHEN Qiang
    2025, 23(3):  89-95.  doi:10.12142/ZTECOM.202503010
    Asbtract ( )   HTML ( )   PDF (1386KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In the field of antenna engineering parameter calibration for indoor communication base stations, traditional methods suffer from issues such as low efficiency, poor accuracy, and limited applicability to indoor scenarios. To address these problems, a high-precision and high-efficiency indoor base station parameter calibration method based on laser measurement is proposed. We use a high-precision laser tracker to measure and determine the coordinate system transformation relationship, and further obtain the coordinates and attitude of the base station. In addition, we propose a simple calibration method based on point cloud fitting for specific scenes. Simulation results show that using common commercial laser trackers, we can achieve a coordinate correction accuracy of 1 cm and an angle correction accuracy of 0.25°, which is sufficient to meet the needs of wireless positioning.

    M+MNet: A Mixed-Precision Multibranch Network for Image Aesthetics Assessment
    HE Shuai, LIU Limin, WANG Zhanli, LI Jinliang, MAO Xiaojun, MING Anlong
    2025, 23(3):  96-110.  doi:10.12142/ZTECOM.202503011
    Asbtract ( )   HTML ( )   PDF (4795KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    We propose Mixed-Precision Multibranch Network (M+MNet) to compensate for the neglect of background information in image aesthetics assessment (IAA) while providing strategies for overcoming the dilemma between training costs and performance. First, two exponentially weighted pooling methods are used to selectively boost the extraction of background and salient information during downsampling. Second, we propose Corner Grid, an unsupervised data augmentation method that leverages the diffusive characteristics of convolution to force the network to seek more relevant background information. Third, we perform mixed-precision training by switching the precision format, thus significantly reducing the time and memory consumption of data representation and transmission. Most of our methods specifically designed for IAA tasks have demonstrated generalizability to other IAA works. For performance verification, we develop a large-scale benchmark (the most comprehensive thus far) by comparing 17 methods with M+MNet on two representative datasets: the Aesthetic Visual Analysis (AVA) dataset and FLICKR-Aesthetic Evaluation Subset (FLICKR-AES). M+MNet achieves state-of-the-art performance on all tasks.