ZTE Communications ›› 2025, Vol. 23 ›› Issue (3): 48-58. DOI: 10.12142/ZTECOM.202503006
• Special Topic •
SHEN Qiuhong, YANG Zijin, JIANG Jun, ZHANG Weiming, CHEN Kejiang
Received: 2025-06-23
Online: 2025-09-11
Published: 2025-09-11
Contact: CHEN Kejiang
About author: SHEN Qiuhong received her BS degree from the University of Science and Technology of China (USTC) in 2025. She is currently pursuing her MS degree at USTC. Her research interests include information hiding and multimedia security.
Cite this article: SHEN Qiuhong, YANG Zijin, JIANG Jun, ZHANG Weiming, CHEN Kejiang. StegoAgent: A Generative Steganography Framework Based on GUI Agents[J]. ZTE Communications, 2025, 23(3): 48-58.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202503006
| Dataset | TM | Pos | TM+Pos |
|---|---|---|---|
| Screenspot | 0.932 | 0.999 | 1 |
| Mind2web | 0.919 | 0.976 | 0.995 |
Table 1 Position prediction accuracy
| Dataset | Entropy (bits per token) | Capacity (bits per token) | Capacity (bits per sample) |
|---|---|---|---|
| Screenspot | 0.383 | 0.122 | 1.553 |
| Mind2web | 0.438 | 0.056 | 1.716 |
Table 2 Results of capacity evaluation
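The per-token figures in Table 2 allow a rough check of how much of the measured entropy ends up carrying payload. The sketch below is illustrative arithmetic only and is not part of StegoAgent: it assumes that "utilization" can be read as capacity divided by entropy (both in bits per token), and the names `TABLE_2`, `utilization`, and `RATIOS` are hypothetical.

```python
# Values copied from Table 2 (bits per token); all names here are illustrative.
TABLE_2 = {
    "Screenspot": {"entropy": 0.383, "capacity": 0.122},
    "Mind2web": {"entropy": 0.438, "capacity": 0.056},
}


def utilization(entropy_bits: float, capacity_bits: float) -> float:
    """Assumed definition: share of per-token entropy that carries payload."""
    if entropy_bits <= 0:
        raise ValueError("entropy must be positive")
    return capacity_bits / entropy_bits


# Ratio of embedded bits to available entropy for each dataset.
RATIOS = {
    name: utilization(row["entropy"], row["capacity"])
    for name, row in TABLE_2.items()
}

for name, ratio in RATIOS.items():
    print(f"{name}: {ratio:.1%} of the per-token entropy is used")
```

Under this assumed definition the ratio is roughly 0.32 for Screenspot and 0.13 for Mind2web. The table itself does not say why the two datasets differ, so the ratio is best read as a descriptive summary rather than an explanation.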
| Method | Mobile Text | Mobile Icon | Desktop Text | Desktop Icon | Web Text | Web Icon | Avg. |
|---|---|---|---|---|---|---|---|
| ShowUI | 0.791 | 0.672 | 0.763 | 0.614 | 0.804 | 0.592 | 0.706 |
| StegoAgent | 0.787 | 0.681 | 0.758 | 0.600 | 0.804 | 0.578 | 0.701 |
Table 3 Results of grounding capability evaluation (accuracy, expressed as a fraction)
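As a sanity check on the grounding results, the following sketch recomputes the Avg. column and the per-column gap between the two methods. It assumes that Avg. is the unweighted mean of the six category columns, which matches the reported values to three decimals but is not stated on the page. The names `COLUMNS`, `SHOWUI`, `STEGOAGENT`, and `DELTAS` are hypothetical.

```python
# Per-category accuracies copied from the grounding table (fractions, not %).
COLUMNS = [
    "Mobile Text", "Mobile Icon", "Desktop Text",
    "Desktop Icon", "Web Text", "Web Icon",
]
SHOWUI = [0.791, 0.672, 0.763, 0.614, 0.804, 0.592]
STEGOAGENT = [0.787, 0.681, 0.758, 0.600, 0.804, 0.578]


def mean(values):
    """Unweighted arithmetic mean (assumed to be how Avg. was computed)."""
    return sum(values) / len(values)


# Signed difference StegoAgent minus ShowUI, rounded to table precision.
DELTAS = {
    col: round(stego - base, 3)
    for col, base, stego in zip(COLUMNS, SHOWUI, STEGOAGENT)
}

print("ShowUI avg:", round(mean(SHOWUI), 3))
print("StegoAgent avg:", round(mean(STEGOAGENT), 3))
print("Largest absolute gap:", max(abs(d) for d in DELTAS.values()))
```

The recomputed means agree with the reported 0.706 and 0.701, and no single category differs by more than 0.014 between the two methods.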
| Method | Cross-Task Ele.Acc | Cross-Task Op.F1 | Cross-Task Step.SR | Cross-Domain Ele.Acc | Cross-Domain Op.F1 | Cross-Domain Step.SR | Cross-Website Ele.Acc | Cross-Website Op.F1 | Cross-Website Step.SR |
|---|---|---|---|---|---|---|---|---|---|
| ShowUI | 0.214 | 0.832 | 0.178 | 0.248 | 0.802 | 0.200 | 0.224 | 0.799 | 0.169 |
| StegoAgent | 0.212 | 0.832 | 0.179 | 0.244 | 0.802 | 0.196 | 0.226 | 0.799 | 0.170 |
Table 4 Results of navigation capability evaluation (Ele.Acc: element accuracy; Op.F1: operation F1; Step.SR: step success rate; values expressed as fractions)
Figure 5 Visualization of StegoAgent before and after steganography, where blue bounding boxes delineate the regions of UI elements annotated in the dataset, blue dots represent the coordinates generated by StegoAgent, and red dots indicate the original coordinates
[1] BARNI M. Steganography in digital media: principles, algorithms, and applications [J]. IEEE signal processing magazine, 2011, 28(5): 142–144. DOI: 10.1109/MSP.2011.941841
[2] PEVNÝ T, FRIDRICH J. Benchmarking for steganography [C]//International Workshop on Information Hiding (10th International Workshop). ISIH, 2008. DOI: 10.1007/978-3-540-88961-8_18
[3] REINEL T S, RAÚL R P, GUSTAVO I. Deep learning applied to steganalysis of digital images: a systematic review [J]. IEEE access, 2019, 7: 68970–68990
[4] KHEDDAR H, HEMIS M, HIMEUR Y, et al. Deep learning for steganalysis of diverse data types: a review of methods, taxonomy, challenges and future directions [J]. Neurocomputing, 2024, 581: 127528. DOI: 10.1016/j.neucom.2024.127528
[5] LIU J, KE Y, ZHANG Z, et al. Recent advances of image steganography with generative adversarial networks [J]. IEEE access, 2020, 8: 60575–60597
[6] ZHANG C Y, HE S L, QIAN J X, et al. Large language model-brained GUI agents: a survey [EB/OL]. (2024-11-27) [2025-06-01].
[7] LIU M L, SONG T T, LUO W Q, et al. Adversarial steganography embedding via stego generation and selection [J]. IEEE transactions on dependable and secure computing, 2023, 20(3): 2375–2389. DOI: 10.1109/TDSC.2022.3182041
[8] LI Q, MA B, FU X P, et al. Robust image steganography via color conversion [J]. IEEE transactions on circuits and systems for video technology, 2025, 35(2): 1399–1408. DOI: 10.1109/TCSVT.2024.3466961
[9] FAN Z X, CHEN K J, ZENG K, et al. Natias: neuron attribution-based transferable image adversarial steganography [J]. IEEE transactions on information forensics and security, 2024, 19: 6636–6649. DOI: 10.1109/TIFS.2024.3421893
[10] LI Z H, JIANG X H, DONG Y, et al. An anti-steganalysis HEVC video steganography with high performance based on CNN and PU partition modes [J]. IEEE transactions on dependable and secure computing, 2023, 20(1): 606–619. DOI: 10.1109/TDSC.2022.3140899
[11] HE S H, XU D W, YANG L, et al. Adaptive HEVC video steganography with high performance based on attention-net and PU partition modes [J]. IEEE transactions on multimedia, 2023, 26: 687–700. DOI: 10.1109/TMM.2023.3269663
[12] MAO X Y, HU X X, PENG W L, et al. From covert hiding to visual editing: robust generative video steganography [C]//The 32nd ACM International Conference on Multimedia. ACM, 2024: 2757–2765. DOI: 10.1145/3664647.3681149
[13] FILLER T, JUDAS J, FRIDRICH J. Minimizing additive distortion in steganography using syndrome-trellis codes [J]. IEEE transactions on information forensics and security, 2011, 6(3): 920–935. DOI: 10.1109/TIFS.2011.2134094
[14] LI W X, ZHANG W M, LI L, et al. Designing near-optimal steganographic codes in practice based on polar codes [J]. IEEE transactions on communications, 2020, 68(7): 3948–3962. DOI: 10.1109/TCOMM.2020.2982624
[15] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks [C]//The 28th International Conference on Neural Information Processing Systems. ACM, 2014
[16] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022: 10674–10685. DOI: 10.1109/CVPR52688.2022.01042
[17] PENG F, CHEN G F, LONG M. A robust coverless steganography based on generative adversarial networks and gradient descent approximation [J]. IEEE transactions on circuits and systems for video technology, 2022, 32(9): 5817–5829. DOI: 10.1109/TCSVT.2022.3161419
[18] DING J Y, CHEN K J, WANG Y F, et al. Discop: provably secure steganography in practice based on “distribution copies” [C]//IEEE Symposium on Security and Privacy (SP). IEEE, 2023: 2238–2255. DOI: 10.1109/SP46215.2023.10179287
[19] WITT D C S, SOKOTA S, KOLTER J Z, et al. Perfectly secure steganography using minimum entropy coupling [C]//The Eleventh International Conference on Learning Representations. ICLR, 2023: 1–14
[20] YANG Z J, CHEN K J, ZENG K, et al. Provably secure robust image steganography [J]. IEEE transactions on multimedia, 2023, 26: 5040–5053. DOI: 10.1109/TMM.2023.3330098
[21] HU X X, LI S, YING Q C, et al. Establishing robust generative image steganography via popular stable diffusion [J]. IEEE transactions on information forensics and security, 2024, 19: 8094–8108. DOI: 10.1109/TIFS.2024.3444311
[22] WANG Y F, PEI G, CHEN K J, et al. Sparsamp: efficient provably secure steganography based on sparse sampling [EB/OL]. [2025-06-01].
[23] LI K L, WU M Q. Effective GUI testing automation: developing an automated GUI testing tool [M]. Hoboken, USA: John Wiley & Sons, 2006
[24] RODRÍGUEZ-VALDÉS O, EJ VOS T, AHOV P, et al. 30 years of automated GUI testing: a bibliometric analysis [C]//The 14th International Conference Quality of Information and Communications Technology. CCIS, 2021: 473–488
[25] IVANČIĆ L, SUŠA VUGEC D, BOSILJ VUKŠIĆ V. Robotic process automation: systematic literature review [EB/OL]. [2025-06-01].
[26] GUR I, FURUTA H, HUANG A V, et al. A real-world webagent with planning, long context understanding, and program synthesis [EB/OL]. (2023-07-24) [2025-06-01].
[27] KIM G, BALDI P, MCALEER S. Language models can solve computer tasks [C]//The 37th International Conference on Neural Information Processing Systems. NIPS, 2023: 39648–39677
[28] LO R, SRIDHAR A, XU F, et al. Hierarchical prompting assists large language model on web navigation [C]//Proceedings of Findings of the Association for Computational Linguistics. EMNLP. Association for Computational Linguistics, 2023: 10217–10244. DOI: 10.18653/v1/2023.findings-emnlp.685
[29] LAI H Y, LIU X, IONG I L, et al. AutoWebGLM: a large language model-based web navigating agent [C]//The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2024: 5295–5306. DOI: 10.1145/3637528.3671620
[30] AGASHE S, HAN J Z, GAN S Y, et al. Agent S: an open agentic framework that uses computers like a human [C]//The Thirteenth International Conference on Learning Representations. ICLR, 2025
[31] NIU R L, LI J D, WANG S Q, et al. ScreenAgent: a vision language model-driven computer control agent [C]//Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2024: 6433–6441. DOI: 10.24963/ijcai.2024/711
[32] HE H L, YAO W L, MA K X, et al. WebVoyager: building an end-to-end web agent with large multimodal models [C]//Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). ACL, 2024: 6864–6890. DOI: 10.18653/v1/2024.acl-long.371
[33] IONG I L, LIU X, CHEN Y X, et al. OpenWebAgent: an open toolkit to enable web agents on large language models [C]//The 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). ACL, 2024: 72–81. DOI: 10.18653/v1/2024.acl-demos.8
[34] WANG B, LI G, LI Y. Enabling conversational interaction with mobile UI using large language models [C]//The 2023 CHI Conference on Human Factors in Computing Systems. ACM, 2023: 1–17. DOI: 10.1145/3544548.3580895
[35] ZHANG C, YANG Z, LIU J X, et al. AppAgent: multimodal agents as smartphone users [C]//Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, 2025: 1–20. DOI: 10.1145/3706598.3713600
[36] ZHANG C Y, LI L Q, HE S L, et al. UFO: a UI-focused agent for windows OS interaction [C]//The 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL, 2025: 597–622. DOI: 10.18653/v1/2025.naacl-long.26
[37] WU Z Y, HAN C C, DING Z C, et al. Os-copilot: towards generalist computer agents with self-improvement [EB/OL]. (2024-02-12) [2025-06-01].
[38] AGASHE S, WONG K, TU V, et al. Agent s2: a compositional generalist-specialist framework for computer use agents [EB/OL]. (2025-04-01) [2025-06-01].
[39] WANG Y Q, ZHANG H J, TIAN J Q, et al. Ponder & press: advancing visual GUI agent towards general computer control [C]//Findings of the Association for Computational Linguistics. ACL, 2025: 1461–1473
[40] LIN K Q H, LI L J, GAO D F, et al. ShowUI: one vision-language-action model for GUI visual agent [EB/OL]. (2024-11-26) [2025-06-01].
[41] LIU X, QIN B, LIANG D Z, et al. Autoglm: autonomous foundation agents for GUIs [EB/OL]. (2024-10-28) [2025-06-01].
[42] NING L B, LIANG Z R, JIANG Z H, et al. A survey of webagents: towards next-generation AI agents for web automation with large foundation models [C]//The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2025: 6140–6150
[43] BAI S, CHEN K Q, LIU X J, et al. Qwen 2.5-VL technical report [EB/OL]. (2025-02-19) [2025-06-01].
[44] CHENG K Z, SUN Q S, CHU Y G, et al. Seeclick: harnessing GUI grounding for advanced visual GUI agents [C]//The 62nd Annual Meeting of the Association for Computational Linguistics. ACL, 2024: 9313–9332. DOI: 10.18653/v1/2024.acl-long.505
[45] DENG X, GU Y, ZHENG B Y, et al. Mind2web: towards a generalist agent for the web [C]//The 37th International Conference on Neural Information Processing Systems. ACM, 2023: 28091–28114