ZTE Communications, 2026, Vol. 24, Issue 1: 34-44. DOI: 10.12142/ZTECOM.202601006
• Special Topic •
Zhao Jianchao1, Geng Zhaosen1, Li Zeyi2, Wang Pan3
Received: 2025-01-11
Online: 2026-03-25
Published: 2026-03-17
About the author: Zhao Jianchao is with the Cable Products Business Department, ZTE Corporation. He is engaged in the development and delivery of wireline and cable network solutions, with a focus on product implementation, service integration, and system deployment. His work involves coordinating product requirements with network operations and supporting the deployment of large-scale wireline network services. His professional interests include wireline network solutions, service delivery optimization, and operational support for carrier networks.

Cite this article as: Zhao Jianchao, Geng Zhaosen, Li Zeyi, Wang Pan. Key Technologies for AI-Driven Network Traffic Classification Workflow and Data Distribution Shift [J]. ZTE Communications, 2026, 24(1): 34-44.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202601006
| Aspect | Academia | Industry |
|---|---|---|
| Classification requirements | Modeling on training datasets for optimal performance | Comprehensive study of training/classification costs, efficiency, continuous operation, credibility, etc. |
| End-to-end TC | Focus on the modeling process before the TC model is deployed | Increased focus on monitoring optimization of TC models after deployment |
| Training data | Static/obsolete, well labeled, noise known | Continuous/changing, no/wrong labeling, noise unknown |
| Costs | Mostly unconcerned | Great concern |
| Data distribution shift | Mostly unconcerned | Great concern |
| Continuous learning | Mostly unconcerned | Great concern |
| Interpretability | Mostly unconcerned | Consideration |
| Computational complexity | Focus more on training fast | More concerned with reasoning fast |
Table 1 Comparison between academia and industry in AI-TC
| Flow Feature | Category | Description | Feature Calculation Method |
|---|---|---|---|
| Flow 5-tuple | Flow index | src/sp/dst/dp/protocol | Serialized regular preprocessing |
| TCP sliding window | TCP window | TCP flow-control parameters | Serialized regular preprocessing |
| TLS handshake packet information | TLS fingerprint | Handshake types, cipher suites, content types, key length, etc. | Serialized regular preprocessing |
| Packet length sequence | Packet-related | A sequence of packet lengths in the flow, which may include upstream, downstream, and bidirectional sequences as needed. | Packet length statistics (max/min/avg/std) |
| Packet arrival time sequence | Packet-related | A sequence of packet arrival times in the flow, which may include upstream, downstream, and bidirectional sequences as needed. | Packet time statistics (max/min/avg/std) |
| Flow length | Flow-related | Total number of flow bytes per unit of time, which may include upstream, downstream, and bidirectional counts as needed. | Multi-flow length statistics (max/min/avg/std) |
| Flow duration | Flow-related | TCP flow duration; UDP flow duration can be added if NP resources are sufficient. | Multi-flow duration statistics (max/min/avg/std) |
Table 2 A typical collection of network traffic features (partial)
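The per-flow statistics that Table 2 lists as feature calculation methods (max/min/avg/std over packet-length and arrival-time sequences) can be sketched as follows. This is a minimal stdlib illustration; the function names and the sample flow values are assumptions, not taken from the paper's implementation.

```python
import statistics

def sequence_features(values):
    """Max/min/avg/std summary over a numeric sequence, matching the
    (max/min/avg/std) feature calculation listed in Table 2."""
    vals = [float(v) for v in values]
    return {
        "max": max(vals),
        "min": min(vals),
        "avg": statistics.fmean(vals),
        "std": statistics.pstdev(vals),  # population standard deviation
    }

def inter_arrival_features(timestamps):
    """Same statistics over the gaps between consecutive packet arrivals."""
    ts = sorted(float(t) for t in timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sequence_features(gaps)

# Hypothetical flow: packet lengths in bytes, arrival times in seconds.
print(sequence_features([60, 1500, 1500, 52, 980]))
print(inter_arrival_features([0.00, 0.01, 0.03, 0.04, 0.10]))
```

In a production pipeline these statistics would be computed per direction (upstream, downstream, bidirectional) as the table notes, rather than over a single merged sequence.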
| App (Category) | Flows |
|---|---|
| QQ Music (Music) | 39 465 |
| LOL: Wild Rift (Game) | 19 841 |
| Naruto (Game) | 27 240 |
| Zhihu (Social) | 16 643 |
| Bilibili (Video) | 20 014 |
| Teamfight Tactics (Game) | 23 718 |
| iQIYI (Video) | 36 740 |
| TikTok (Video) | 16 640 |
| Honor of Kings (Game) | 42 734 |
| Background (Log) | 30 000 |
Table 3 Dataset and the corresponding number of flows
| App | Precision (Time Point 1) | Recall (Time Point 1) | F1-score (Time Point 1) | Support (Time Point 1) | Precision (Time Point 2) | Recall (Time Point 2) | F1-score (Time Point 2) | Support (Time Point 2) |
|---|---|---|---|---|---|---|---|---|
| QQ Music | 0.802 | 0.826 | 0.814 | 7 978 | 0.829 | 0.787 | 0.807 | 7 978 |
| Background | 0.716 | 0.698 | 0.707 | 6 106 | 0.709 | 0.698 | 0.704 | 6 106 |
| Bilibili | 0.984 | 0.951 | 0.968 | 3 979 | 0.985 | 0.950 | 0.967 | 3 979 |
| TikTok | 0.736 | 0.638 | 0.683 | 3 291 | 0.749 | 0.656 | 0.700 | 3 291 |
| Naruto | 0.732 | 0.880 | 0.799 | 5 472 | 0.688 | 0.897 | 0.779 | 5 472 |
| IQiyi | 0.859 | 0.871 | 0.865 | 7 300 | 0.861 | 0.843 | 0.852 | 7 300 |
| Honor of Kings | 0.865 | 0.812 | 0.838 | 8 653 | 0.841 | 0.819 | 0.830 | 8 653 |
| Zhihu | 0.800 | 0.791 | 0.795 | 3 250 | 0.841 | 0.767 | 0.802 | 3 250 |
| LOL: Wild Rift | 0.842 | 0.897 | 0.869 | 3 977 | 0.817 | 0.904 | 0.859 | 3 977 |
| Teamfight Tactics | 0.988 | 0.899 | 0.942 | 4 601 | 0.984 | 0.901 | 0.941 | 4 601 |
| Accuracy | | | 0.828 | 54 607 | | | 0.822 | 54 607 |
| Macro avg | 0.832 | 0.826 | 0.828 | 54 607 | 0.830 | 0.822 | 0.824 | 54 607 |
| Weighted avg | 0.831 | 0.828 | 0.828 | 54 607 | 0.827 | 0.822 | 0.822 | 54 607 |
Table 4 Classification results of network traffic at different times
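The per-class precision, recall, F1-score, and support figures in Table 4 follow the standard definitions (precision = TP/predicted positives, recall = TP/actual positives, F1 = their harmonic mean, support = actual positives). A minimal stdlib computation is sketched below; the toy labels are illustrative and are not the paper's data.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Per-class precision, recall, F1, and support, as tabulated in Table 4."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_n, true_n = Counter(y_pred), Counter(y_true)
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        precision = tp[cls] / pred_n[cls] if pred_n[cls] else 0.0
        recall = tp[cls] / true_n[cls] if true_n[cls] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[cls] = {"precision": precision, "recall": recall,
                        "f1-score": f1, "support": true_n[cls]}
    return metrics

# Toy labels for three of the apps in Table 3 (illustrative only).
y_true = ["QQ Music", "Bilibili", "TikTok", "QQ Music", "Bilibili", "TikTok"]
y_pred = ["QQ Music", "Bilibili", "QQ Music", "QQ Music", "Bilibili", "TikTok"]
print(per_class_metrics(y_true, y_pred))
```

In practice the same numbers come from scikit-learn's `classification_report`, whose output layout (per-class rows plus accuracy, macro avg, and weighted avg) matches Table 4.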
| Data | Methods | Drift Occurs | Distance (unitless) | Execution Time/s |
|---|---|---|---|---|
| One-month interval | Kolmogorov-Smirnov | √ | / | 0.062 |
| | Maximum mean discrepancy | √ | 0.192 245 841 | 1.385 |
| | Chi-squared | √ | / | 0.232 |
| | Cramér-von Mises | √ | / | 0.030 |
| | Least-squares density difference | √ | 0.239 282 097 | 0.398 |
| | Spot-the-diff | × | 0.551 293 545 | 1.293 |
| | Mixed-type tabular data | √ | / | 0.069 |
| One-day interval | Kolmogorov-Smirnov | × | / | 0.064 |
| | Maximum mean discrepancy | × | 0.000 526 399 | 1.449 |
| | Chi-squared | × | / | 0.145 |
| | Cramér-von Mises | × | / | 0.025 |
| | Least-squares density difference | √ | 0.003 137 094 | 0.391 |
| | Spot-the-diff | × | 0.054 495 389 | 1.286 |
| | Mixed-type tabular data | × | / | 0.069 |
Table 5 Experimental results of data drift detection for different time intervals
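Several of the detectors compared in Table 5 are two-sample tests; the Kolmogorov-Smirnov entry, for instance, measures the largest gap between the empirical CDFs of a reference window and a current window. A minimal stdlib sketch of that statistic follows; the threshold value and the sample data are illustrative assumptions, not the paper's configuration.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the ECDF gap can only peak at an observed value
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(f_a - f_b))
    return gap

def drift_detected(reference, current, threshold=0.1):
    """Flag drift when the KS distance exceeds a threshold (illustrative value)."""
    return ks_statistic(reference, current) > threshold

# Toy packet-length samples from two capture windows (illustrative values).
reference = [60, 120, 540, 1500, 1500, 980]
current = [60, 64, 70, 72, 80, 90]
print(ks_statistic(reference, current), drift_detected(reference, current))
```

A production detector would use the test's p-value (e.g., SciPy's `ks_2samp`) rather than a fixed distance threshold, which is what allows the one-month and one-day comparisons in Table 5 to give different drift verdicts.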