SST-V: A Scalable Semantic Transmission Framework for Video

doi:10.12142/ZTECOM.202302010

ZTE Communications, 2023, Vol. 21, Issue (2): 70–79.

Special Topic


LIU Chenyao¹, GUO Jiejie², ZHANG Yimeng¹, XU Wenjun¹,³, LIU Yiming¹

¹ State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
² School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
³ Department of Mathematics and Theories, Peng Cheng Laboratory, Shenzhen 518066, China

Received: 2023-02-11; Online: 2023-06-13; Published: 2023-06-13
About the authors:

LIU Chenyao received her BE degree from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT), China in 2022. She is currently pursuing her PhD degree at the School of Artificial Intelligence, BUPT. Her research interests include semantic communication, video coding, and machine learning.

GUO Jiejie is currently pursuing her BE degree at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China. Her research interests include semantic communication, video coding, and artificial intelligence.

ZHANG Yimeng received her BE degree from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT), China in 2018. She is currently pursuing her PhD degree at the School of Artificial Intelligence, BUPT. Her research interests include semantic communication and intelligent resource allocation in emerging wireless applications. She is a graduate student member of IEEE.

XU Wenjun (wjxu@bupt.edu.cn) is a professor with the State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, China, and with Peng Cheng Laboratory, China. He received his PhD degree from Beijing University of Posts and Telecommunications in 2008. His research interests include artificial intelligence-driven networks, semantic communications, unmanned aerial vehicle communications and networks, and green communications and networking. He is an editor of China Communications and a senior member of IEEE.

LIU Yiming received her BE degree in communication engineering from Shanghai University, China in 2014, and her PhD degree in information and communication engineering from Beijing University of Posts and Telecommunications (BUPT), China in 2019. She was a visiting PhD student with The University of British Columbia, Canada in 2017 and 2018. She is currently an associate researcher with the School of Information and Communication Engineering, BUPT. Her research interests include next-generation wireless networks, semantic communication, edge intelligence, blockchain, and distributed ledger technology.
Supported by: the National Natural Science Foundation of China (62293485); the Fundamental Research Funds for the Central Universities (2022RC18)

Abstract


Emerging services in the sixth-generation (6G) communication system place increasingly stringent requirements on video transmission and raise new challenges. Semantic communication is envisioned as a promising solution to these challenges. This paper provides a highly efficient solution to video transmission by proposing a scalable semantic transmission algorithm, named the scalable semantic transmission framework for video (SST-V), which jointly considers semantic importance and channel conditions. Specifically, a semantic importance evaluation module is designed to extract more informative semantic features according to the estimated importance level, facilitating high-efficiency semantic coding. By further considering the channel condition, a cascaded-learning-based scalable joint semantic-channel coding algorithm is proposed, which autonomously adapts the semantic coding and channel coding strategies to the specific signal-to-noise ratio (SNR). Simulation results show that SST-V achieves better video reconstruction performance than the benchmark schemes while significantly reducing the transmission overhead.
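The idea of adapting coding to "the specific SNR" can be made concrete with a small simulation. The sketch below is a minimal illustration, not the SST-V implementation: it assumes a real-valued additive white Gaussian noise (AWGN) channel with unit average transmit power, a common setting in deep joint source-channel coding work, plus a hypothetical two-level policy (`select_level`, `LEVEL_THRESHOLDS_DB`) whose 10 dB threshold and SNR-to-level mapping are placeholders rather than values from the paper.

```python
import numpy as np


def awgn(symbols: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Pass coded symbols through a real AWGN channel at the given SNR (dB).

    The symbols are first scaled to unit average power, so the noise
    variance is simply 10 ** (-snr_db / 10).
    """
    x = symbols / np.sqrt(np.mean(symbols ** 2))  # unit average transmit power
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))
    return x + rng.normal(0.0, noise_std, size=x.shape)


# Hypothetical policy: (minimum SNR in dB, coding level), checked top to bottom.
# Placeholder values for illustration only; not taken from the paper.
LEVEL_THRESHOLDS_DB = [(10.0, 2), (float("-inf"), 1)]


def select_level(snr_db: float) -> int:
    """Return the first coding level whose SNR threshold is met."""
    for min_snr_db, level in LEVEL_THRESHOLDS_DB:
        if snr_db >= min_snr_db:
            return level
    raise ValueError("no level matched")  # unreachable with the -inf entry


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=10_000)  # stand-in for encoder output symbols
    for snr in (0.0, 5.0, 15.0):
        y = awgn(z, snr, rng)
        print(snr, select_level(snr), y.shape)
```

The `-inf` entry guarantees that some level is always selected, however poor the channel.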

Key words: scalable coding, semantic communication, video transmission

LIU Chenyao, GUO Jiejie, ZHANG Yimeng, XU Wenjun, LIU Yiming. SST-V: A Scalable Semantic Transmission Framework for Video[J]. ZTE Communications, 2023, 21(2): 70-79.

Figures/Tables 9

Figure 1 H.264 coding framework [12]

Figure 2 Semantic communication systems [17]

Figure 3 Framework of scalable semantic transmission for video

Figure 4 Framework of semantic feature extraction module

Figure 5 Semantic importance estimation (SIE) module structure

Figure 6 Scalable multi-level joint semantic-channel (S-JSC) coding architecture based on cascade learning

Figure 7 Reconstruction performance of the schemes with or without SIE at different coding levels

Figure 8 Examples of the reconstructed frames of the schemes with or without semantic importance estimation (SIE)
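Figures 4 and 5 describe semantic feature extraction followed by semantic importance estimation. As a rough illustration only, the sketch below replaces the learned SIE module with a hypothetical heuristic: each feature channel is scored by its mean absolute activation, the `k` highest-scoring channels are kept for transmission, and the receiver zero-fills the dropped channels. The function names, the scoring rule, and the zero-filling are assumptions for illustration, not the paper's method.

```python
from typing import Tuple

import numpy as np


def importance_scores(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned SIE module.

    features has shape (C, H, W); each channel is scored by its mean
    absolute activation.
    """
    return np.abs(features).mean(axis=(1, 2))


def select_top_k(features: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
    """Keep the k highest-scoring channels; return them and their indices."""
    scores = importance_scores(features)
    keep = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    keep = np.sort(keep)                 # restore original channel order
    return features[keep], keep


def scatter_back(selected: np.ndarray, keep: np.ndarray, num_channels: int) -> np.ndarray:
    """Receiver side: put received channels back in place, zero-fill the rest."""
    out = np.zeros((num_channels,) + selected.shape[1:], dtype=selected.dtype)
    out[keep] = selected
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Channels with increasing scale, so higher-index channels tend to score higher.
    feats = rng.normal(size=(8, 4, 4)) * np.arange(1, 9)[:, None, None]
    sent, idx = select_top_k(feats, k=3)
    restored = scatter_back(sent, idx, num_channels=feats.shape[0])
    print(idx, sent.shape, restored.shape)
```

Sorting the kept indices lets the decoder place channels without any extra reordering step; only the index list needs to accompany the selected features.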

Table 1 PSNR and MS-SSIM of different schemes

Number	Scheme	PSNR/dB	MS-SSIM/dB
1	Without SIE, fixed Level 1 (L1)	23.9376	6.7047
2	Without SIE, fixed Level 2 (L2)	27.3078	8.7436
3	With SIE, fixed Level 1 (SIE-L1)	26.0993	7.4541
4	With SIE, fixed Level 2 (SIE-L2)	28.90034	9.7549
5	Scalable multilevel coding without SIE	29.9359	11.6823
6 (SST-V)	Scalable multilevel coding with SIE	31.19078	14.3499
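Table 1 reports both metrics in dB. The table does not state how MS-SSIM was converted; the sketch below assumes the convention common in learned image and video coding, MS-SSIM(dB) = -10 log10(1 - MS-SSIM), and the standard PSNR definition with an 8-bit peak value of 255 (also an assumption here). It converts the Table 1 entries back to linear MS-SSIM and computes each scheme's PSNR gain over the fixed Level 1 baseline.

```python
import math


def msssim_to_db(ms_ssim: float) -> float:
    """Express an MS-SSIM value in (0, 1) in dB: -10 * log10(1 - MS-SSIM)."""
    return -10.0 * math.log10(1.0 - ms_ssim)


def db_to_msssim(value_db: float) -> float:
    """Inverse of msssim_to_db."""
    return 1.0 - 10.0 ** (-value_db / 10.0)


def psnr_db(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


# (PSNR/dB, MS-SSIM/dB) copied from Table 1; the dictionary keys are shorthand labels.
TABLE_1 = {
    "L1": (23.9376, 6.7047),
    "L2": (27.3078, 8.7436),
    "SIE-L1": (26.0993, 7.4541),
    "SIE-L2": (28.90034, 9.7549),
    "Scalable, no SIE": (29.9359, 11.6823),
    "SST-V": (31.19078, 14.3499),
}

if __name__ == "__main__":
    base_psnr, _ = TABLE_1["L1"]
    for name, (p, m) in TABLE_1.items():
        print(f"{name:>16}: PSNR gain {p - base_psnr:+.3f} dB, "
              f"MS-SSIM {db_to_msssim(m):.4f} (linear)")
```

Under this assumed convention, the SST-V entry of 14.3499 dB corresponds to a linear MS-SSIM of roughly 0.963 and the L1 entry of 6.7047 dB to roughly 0.786, while the PSNR gap between the two is about 7.25 dB.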

References 43

1	TU Y, CHEN W. A deep learning-based semantic communication system [J]. Mobile communications, 2021, 45(4): 91–94. DOI: 10.3969/j.issn.1006-1010.2021.04.015
2	CISCO. 2020 global networking trends report [EB/OL]. (2019-11-17) [2023-04-01].
3	WEAVER W, SHANNON C E. Recent contributions to the mathematical theory of communication [EB/OL]. [2023-02-01].
4	MORRIS C W. Foundations of the theory of signs [M]. Chicago, USA: The University of Chicago Press, 1938.
5	XIE H Q, QIN Z J, LI G Y, et al. Deep learning enabled semantic communication systems [J]. IEEE transactions on signal processing, 2021, 69: 2663–2675. DOI: 10.1109/TSP.2021.3071210
6	WEI H, XU W J, WANG F Y, et al. SemAudio: semantic-aware streaming communications for real-time audio transmission [C]//IEEE Global Communications Conference. IEEE, 2022: 3965–3970. DOI: 10.1109/GLOBECOM48099.2022.10001043
7	XU W J, ZHANG Y M, WANG F Y, et al. Semantic communication for the Internet of vehicles: a multiuser cooperative approach [J]. IEEE vehicular technology magazine, 2023, 18(1): 100–109. DOI: 10.1109/MVT.2022.3227723
8	LU G, OUYANG W L, XU D, et al. DVC: an end-to-end deep video compression framework [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 10998–11007. DOI: 10.1109/CVPR.2019.01126
9	WANG S X, DAI J C, LIANG Z J, et al. Wireless deep video semantic transmission [J]. IEEE journal on selected areas in communications, 2023, 41(1): 214–229. DOI: 10.1109/JSAC.2022.3221977
10	TUNG T Y, GÜNDÜZ D. DeepWiVe: deep-learning-aided wireless video transmission [J]. IEEE journal on selected areas in communications, 2022, 40(9): 2570–2583. DOI: 10.1109/JSAC.2022.3191354
11	HUANG B W, YAN X, ZHOU J J, et al. CSMCNet: scalable video compressive sensing reconstruction with interpretable motion estimation [EB/OL]. (2021-08-03) [2023-02-01]. arXiv: 2108.01522.
12	WIEGAND T, SULLIVAN G J, BJONTEGAARD G, et al. Overview of the H.264/AVC video coding standard [J]. IEEE transactions on circuits and systems for video technology, 2003, 13(7): 560–576. DOI: 10.1109/TCSVT.2003.815165
13	CARNAP R, BAR-HILLEL Y. An outline of a theory of semantic information [EB/OL]. [2023-02-01].
14	BAR-HILLEL Y, CARNAP R. Semantic information [J]. The British journal for the philosophy of science, 1953, 4(14): 147–157. DOI: 10.1093/bjps/iv.14.147
15	FLORIDI L. Outline of a theory of strongly semantic information [J]. Minds and machines, 2004, 14(2): 197–221. DOI: 10.1023/B:MIND.0000021684.50925.c9
16	KOLCHINSKY A, WOLPERT D H. Semantic information, autonomous agency and non-equilibrium statistical physics [J]. Interface focus, 2018, 8(6): 20180041. DOI: 10.1098/rsfs.2018.0041
17	ZHANG P, XU W J, GAO H, et al. Toward wisdom-evolutionary and primitive-concise 6G: a new paradigm of semantic communication networks [J]. Engineering, 2022, 8: 60–73. DOI: 10.1016/j.eng.2021.11.003
18	ZHONG Y X. A theory of semantic information [J]. China communications, 2017, 14(1): 1–17. DOI: 10.1109/CC.2017.7839754
19	RAO M, FARSAD N, GOLDSMITH A. Variable length joint source-channel coding of text using deep neural networks [C]//IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2018: 1–5. DOI: 10.1109/SPAWC.2018.8445924
20	BOURTSOULATZE E, BURTH KURKA D, GÜNDÜZ D. Deep joint source-channel coding for wireless image transmission [J]. IEEE transactions on cognitive communications and networking, 2019, 5(3): 567–579. DOI: 10.1109/TCCN.2019.2919300
21	KURKA D B, GÜNDÜZ D. DeepJSCC-f: deep joint source-channel coding of images with feedback [J]. IEEE journal on selected areas in information theory, 2020, 1(1): 178–193. DOI: 10.1109/JSAIT.2020.2987203
22	JALALPOUR Y, WANG L Y, FENG W C, et al. FID: frame interpolation and DCT-based video compression [C]//IEEE International Symposium on Multimedia (ISM). IEEE, 2021: 218–221. DOI: 10.1109/ISM.2020.00045
23	CHEN J W, HO C M. MM-ViT: multi-modal video transformer for compressed video action recognition [C]//IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2022: 786–797. DOI: 10.1109/WACV51458.2022.00086
24	LIN J P, LIU D, LI H Q, et al. M-LVC: multiple frames prediction for learned video compression [C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 3543–3551. DOI: 10.1109/CVPR42600.2020.00360
25	LI J, LI B, LU Y. Deep contextual video compression [EB/OL]. [2023-04-01].
26	LIU C, SUN H M, ZENG X Y, et al. Learned video compression with residual prediction and feature-aided loop filter [C]//IEEE International Conference on Image Processing (ICIP). IEEE, 2022: 1321–1325. DOI: 10.1109/ICIP46576.2022.9897989
27	ZHANG S P, MRAK M, HERRANZ L, et al. DVC-P: deep video compression with perceptual optimizations [C]//Proceedings of 2021 International Conference on Visual Communications and Image Processing (VCIP). IEEE, 2022: 1–5. DOI: 10.1109/VCIP53242.2021.9675350
28	YANG R, TIMOFTE R, VAN GOOL L. Advancing learned video compression with in-loop frame prediction [J]. IEEE transactions on circuits and systems for video technology, 2023, 33(5): 2410–2423. DOI: 10.1109/TCSVT.2022.3222418
29	HUANG D L, GAO F F, TAO X M, et al. Toward semantic communications: deep learning-based image semantic coding [J]. IEEE journal on selected areas in communications, 2023, 41(1): 55–71. DOI: 10.1109/JSAC.2022.3221999
30	DUAN Y P, LI M Z, WEN L J, et al. From object-attribute-relation semantic representation to video generation: a multiple variational autoencoder approach [C]//Proceedings of 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2022: 1–6. DOI: 10.1109/MLSP55214.2022.9943394
31	JIANG P W, WEN C K, JIN S, et al. Wireless semantic communications for video conferencing [J]. IEEE journal on selected areas in communications, 2023, 41(1): 230–244. DOI: 10.1109/JSAC.2022.3221968
32	CHEN B, WANG Z, LI B, et al. Interactive face video coding: a generative compression framework [EB/OL]. [2023-02-20].
33	CUI L Z, SU D Y, YANG S, et al. TCLiVi: transmission control in live video streaming based on deep reinforcement learning [J]. IEEE transactions on multimedia, 2020, 23: 651–663. DOI: 10.1109/TMM.2020.2985631
34	ELGAMAL T, SHI S, GUPTA V, et al. SiEVE: semantically encoded video analytics on edge and cloud [C]//Proceedings of 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2021: 1383–1388. DOI: 10.1109/ICDCS47774.2020.00182
35	WANG Y Q, XU J C, JI W. A feature-based video transmission framework for visual IoT in fog computing systems [C]//Proceedings of 2019 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS). IEEE, 2019: 1–8. DOI: 10.1109/ANCS.2019.8901872
36	YANG R, MENTZER F, VAN GOOL L, et al. Learning for video compression with hierarchical quality and recurrent enhancement [C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 6627–6636. DOI: 10.1109/CVPR42600.2020.00666
37	ZHANG B, QIN Z, LI Y. Semantic communications with variable-length coding for extended reality [EB/OL]. [2023-03-11].
38	RAPPAPORT T S. Wireless communications: principles and practice [M]. Upper Saddle River, USA: Prentice Hall PTR, 1996.
39	RANJAN A, BLACK M J. Optical flow estimation using a spatial pyramid network [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2720–2729. DOI: 10.1109/CVPR.2017.291
40	MARQUEZ E S, HARE J S, NIRANJAN M. Deep cascade learning [J]. IEEE transactions on neural networks and learning systems, 2018, 29(11): 5475–5485. DOI: 10.1109/TNNLS.2018.2805098
41	XUE T F, CHEN B A, WU J J, et al. Video enhancement with task-oriented flow [J]. International journal of computer vision, 2019, 127(8): 1106–1125. DOI: 10.1007/s11263-018-01144-2
42	WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE transactions on image processing, 2004, 13(4): 600–612. DOI: 10.1109/TIP.2003.819861
43	WANG Z, SIMONCELLI E P, BOVIK A C. Multiscale structural similarity for image quality assessment [C]//The 37th Asilomar Conference on Signals, Systems & Computers. IEEE, 2004: 1398–1402. DOI: 10.1109/ACSSC.2003.1292216

