ZTE Communications ›› 2025, Vol. 23 ›› Issue (3): 3-14.DOI: 10.12142/ZTECOM.202503002

• Special Topic •

Poison-Only and Targeted Backdoor Attack Against Visual Object Tracking

GU Wei1,2, SHAO Shuo1,2, ZHOU Lingtao3, QIN Zhan1,2, REN Kui1,2

  1. State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310027, China
    2. Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou 310051, China
    3.Shandong University, Jinan 250100, China
  • Received: 2025-07-23 Online: 2025-09-11 Published: 2025-09-11
  • About author:GU Wei is currently pursuing a master's degree at the School of Cyber Science and Technology and the State Key Laboratory of Blockchain and Data Security, Zhejiang University, China. Before that, he received a BE degree in computer science and technology from Zhuoyue Honors College, Hangzhou Dianzi University, China in 2023. His research interests include LLM security and AI safety.
    SHAO Shuo is currently pursuing a PhD degree at the School of Cyber Science and Technology and the State Key Laboratory of Blockchain and Data Security, Zhejiang University, China. Before that, he received a BE degree from the School of Computer Science and Technology, Central South University, China in 2022. His research interests include AI copyright protection, data protection, and LLM safety. He has published a series of papers in top-tier conferences and journals such as NDSS, ICLR, TIFS, and TDSC, and actively serves as a reviewer for NeurIPS, ICML, TCSVT, TII, and other leading venues.
    ZHOU Lingtao is currently pursuing a BE degree at Shandong University, China. His research interests include backdoor attacks and AI security.
    QIN Zhan (qinzhan@zju.edu.cn) is currently a tenured associate professor with both the College of Computer Science and Technology and the Institute of Cyberspace Research (ICSR) at Zhejiang University, China. He was an assistant professor at the Department of Electrical and Computer Engineering, the University of Texas at San Antonio, USA, after receiving his PhD degree from the Computer Science and Engineering department, State University of New York at Buffalo, USA in 2017. His current research interests include data security and privacy, secure computation outsourcing, artificial intelligence security, and cyber-physical security in the context of the Internet of Things. His work explores and develops novel security-sensitive algorithms and protocols for computation and communication in the general context of cloud and Internet devices.
    REN Kui is a professor and the dean of the School of Cyber Science and Technology at Zhejiang University. Before that, he was a SUNY Empire Innovation Professor at State University of New York at Buffalo, USA. He received his PhD degree in electrical and computer engineering from Worcester Polytechnic Institute, USA. His current research interests include data security, IoT security, AI security, and privacy. He received the Guohua Distinguished Scholar Award from Zhejiang University, IEEE CISTC Technical Recognition Award, SUNY Chancellor's Research Excellence Award, Sigma Xi Research Excellence Award, and NSF CAREER Award. He has published extensively in peer-reviewed journals and conferences and received the Test-of-Time Paper Award from IEEE INFOCOM and many Best Paper Awards from IEEE and ACM. He currently serves as Chair of SIGSAC of ACM China. He is a Fellow of IEEE, a Fellow of ACM, and a Clarivate Highly-Cited Researcher.
  • Supported by:
    the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (2024C01169); the National Natural Science Foundation of China (62441238); the National Natural Science Foundation of China (U2441240)

Abstract:

Visual object tracking (VOT), which aims to track a target object through a continuous video, is a fundamental and critical task in computer vision. However, the reliance on third-party resources (e.g., datasets) for training poses concealed threats to the security of VOT models. In this paper, we reveal that VOT models are vulnerable to a poison-only and targeted backdoor attack, in which the adversary can achieve arbitrary tracking predictions by manipulating only part of the training data. Specifically, we first define and formulate three variants of the targeted attack: size-manipulation, trajectory-manipulation, and hybrid attacks. To implement them, we introduce Random Video Poisoning (RVP), a novel poison-only strategy that exploits the temporal correlations within video data by poisoning entire video sequences. Extensive experiments demonstrate that RVP effectively injects controllable backdoors, enabling precise manipulation of tracking behavior upon trigger activation while maintaining high performance on benign data, thus ensuring stealth. Our findings not only expose significant vulnerabilities but also highlight that the underlying principles could be adapted for beneficial uses, such as dataset watermarking for copyright protection.
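To make the poison-only threat model concrete, the sketch below illustrates the kind of sequence-level poisoning the abstract describes: a random subset of whole training videos is stamped with a fixed trigger patch, and their bounding-box annotations are rewritten to encode a size manipulation (scaling) and a trajectory manipulation (a constant drift). This is an illustrative reconstruction, not the paper's actual RVP implementation; the function names (`poison_video`, `random_video_poisoning`), the trigger placement, and the parameters `scale`, `shift`, and `rate` are all assumptions for exposition.

```python
import numpy as np

def poison_video(frames, boxes, trigger, scale=0.5, shift=(20, 0)):
    """Stamp a trigger patch onto every frame of one video and rewrite its
    bounding-box labels (hybrid size + trajectory manipulation)."""
    th, tw = trigger.shape[:2]
    poisoned_frames, poisoned_boxes = [], []
    for frame, (x, y, w, h) in zip(frames, boxes):
        f = frame.copy()
        f[:th, :tw] = trigger                   # fixed-position trigger patch
        nx, ny = x + shift[0], y + shift[1]     # trajectory manipulation: drift
        nw, nh = w * scale, h * scale           # size manipulation: shrink
        poisoned_frames.append(f)
        poisoned_boxes.append((nx, ny, nw, nh))
    return poisoned_frames, poisoned_boxes

def random_video_poisoning(dataset, trigger, rate=0.1, rng=None):
    """Poison a random fraction of entire video sequences.
    Poison-only: the training code, loss, and model are untouched;
    only (frames, boxes) pairs in the dataset are modified."""
    rng = rng or np.random.default_rng(0)
    n_poison = int(len(dataset) * rate)
    idx = rng.choice(len(dataset), size=n_poison, replace=False)
    for i in idx:
        frames, boxes = dataset[i]
        dataset[i] = poison_video(frames, boxes, trigger)
    return dataset
```

Poisoning whole sequences, rather than isolated frames, is what lets the attack exploit temporal correlations: the tracker repeatedly sees the trigger co-occur with consistently manipulated boxes across consecutive frames, so the backdoor mapping is reinforced frame after frame.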

Key words: visual object tracking, backdoor attack, computer vision, data security, AI safety