ZTE Communications, 2021, Vol. 19, Issue (4): 90-97. DOI: 10.12142/ZTECOM.202104010

• Research Paper •

Payload Encoding Representation from Transformer for Encrypted Traffic Classification

HE Hongye, YANG Zhiguo, CHEN Xiangning

  1. ZTE Corporation, Shenzhen 518057, China
  • Online: 2021-12-25 Published: 2022-01-04
  • About author: HE Hongye (he.hongye@zte.com.cn) received his M.S. degree from Central South University, China in 2018. He is currently an algorithm engineer working with ZTE Corporation. His research interests include artificial intelligence and network traffic identification.
    YANG Zhiguo received his M.S. degree from Hunan University, China in 2015. He is a senior software engineer at ZTE Corporation. His current research interests include Internet traffic identification and network security.
    CHEN Xiangning received his bachelor’s degree in communication engineering from Hunan University, China in 2004. He is a software engineer at ZTE Corporation. His research interests include big data technology and AI applications.

Abstract:

Traffic identification is becoming more important, yet more challenging, as encryption techniques develop rapidly. Unlike recent deep learning methods that apply image processing to encrypted traffic problems, in this paper we propose a method named Payload Encoding Representation from Transformer (PERT), which performs automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. Through traffic classification experiments on a public encrypted traffic data set and on our own captured Android HTTPS traffic, we show that the proposed method is clearly more effective than the compared baselines. To the best of our knowledge, this is the first work to address encrypted traffic classification with dynamic word embedding.

Key words: traffic identification, encrypted traffic classification, natural language processing, deep learning, dynamic word embedding