Payload Encoding Representation from Transformer for Encrypted Traffic Classification

doi:10.12142/ZTECOM.202104010

Abstract

Abstract:

Traffic identification becomes more important, yet more challenging as related encryption techniques are rapidly developing nowadays. Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems, in this paper, we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. By implementing traffic classification experiments on a public encrypted traffic data set and our captured Android HTTPS traffic, we prove the proposed method can achieve an obvious better effectiveness than other compared baselines. To the best of our knowledge, this is the first time the encrypted traffic classification with the dynamic word embedding has been addressed.

Key words: traffic identification, encrypted traffic classification, natural language processing, deep learning, dynamic word embedding

HE Hongye, YANG Zhiguo, CHEN Xiangning. Payload Encoding Representation from Transformer for Encrypted Traffic Classification[J]. ZTE Communications, 2021, 19(4): 90-97.

Figures/Tables 10

References 16

1	VELAN P, CERMAK M, CELEDA P, et al. A survey of methods for encrypted traffic classification and analysis [J]. International journal of network management, 2015, 25(5): 355–374. DOI: 10.1002/nem.1901 DOI
2	REZAEI S, LIU X. Deep learning for encrypted traffic classification: an overview [J]. IEEE communications magazine, 2019, 57(5): 76–81. DOI: 10.1109/MCOM.2019.1800819 DOI
3	DEVLIN J, CHANG M-W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]//Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, USA: Association for Computational Linguistics, 2019: 4171–4186. DOI: 10.18653/v1/N19-1423 DOI
4	JAVAID A, NIYAZ Q, SUN W Q, et al. A deep learning approach for network intrusion detection system [C]//Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies. Brussels, Belgium: ICST, 2016: 21–26. DOI: 10.4108/eai.3-12-2015.2262516 DOI
5	HOCHST J, BAUMGARTNER L, HOLLICK M, et al. Unsupervised traffic flow classification using a neural autoencoder [C]//42nd Conference on Local Computer Networks (LCN). Singapore, Singapore: IEEE, 2017: 523–526. DOI: 10.1109/LCN.2017.57 DOI
6	REZAEI S, LIU X. How to achieve high classification accuracy with just a few labels: a semi-supervised approach using sampled packets [EB/OL]. (2020-05-16)[2020-06-01].
7	WANG W, ZHU M, WANG J J, et al. End-to-end encrypted traffic classification with one-dimensional convolution neural networks [C]//IEEE International Conference on Intelligence and Security Informatics (ISI). Beijing, China: IEEE, 2017: 43–48. DOI: 10.1109/ISI.2017.8004872 DOI
8	LOTFOLLAHI M, SIAVOSHANI M J, ZADE R S H, et al. Deep packet: a novel approach for encrypted traffic classification using deep learning [J]. Soft computing, 2020, 24: 1999–2012. DOI: 10.1007/s00500-019-04030-2 DOI
9	LOPEZ-MARTIN M, CARRO B, SANCHEZ-ESGUEVILLAS A, et al. Network traffic classifier with convolutional and recurrent neural networks for internet of things [J]. IEEE access, 2017, 5: 18042–18050. DOI: 10.1109/ACCESS.2017.2747560 DOI
10	WANG W, SHENG Y Q, WANG J L, et al. HAST-IDS: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection [J]. IEEE access, 2017, 6: 1792–1806. DOI: 10.1109/ACCESS.2017.2780250 DOI
11	MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C]//International Conference on Learning Representation. Scottsdale, USA: ICLR, 2013
12	PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations [EB/OL]. (2018-03-22)[2020-06-01].
13	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. (2018-03-22)[2020-06-01].
14	BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model [J]. The journal of machine learning research, 2000, 3: 1137–1155
15	LAN Z Z, CHEN M D, GOODMAN S, et al. ALBERT: a lite BERT for self-supervised learning of language representations [EB/OL]. (2020-02-09)[2020-06-01].
16	DRAPER-GIL G, LASHKARI A H, MAMUN M S I, et al. Characterization of encrypted and VPN traffic using time-related features [C]//2nd International Conference on Information Systems Security and Privacy (ICISSP). Rome, Italy: INSTICC, 2016

Parameter	Value	Description
hidden_size	768	Vector size of the encoding outputs (embedding vectors)
num_hidden_layers	12	Number of encoders used in the encoding network
num_attention_heads	12	Number of attention heads used in the multi-head attention mechanism
intermediate_size	3 072	Size of the hidden vectors in FNN networks
input_length	128	Amount of tokenized bigrams used in a single packet

Parameter	Value	Description
hidden_size	768	Vector size of the encoding outputs (embedding vectors)
num_hidden_layers	12	Number of encoders used in the encoding network
num_attention_heads	12	Number of attention heads used in the multi-head attention mechanism
intermediate_size	3 072	Size of the hidden vectors in FNN networks
input_length	128	Amount of tokenized bigrams used in a single packet

Model	Precision	Recall	F1
ML-1^[16]	0.819 4	0.813 6	0.816 4
ML-2	0.890 1	0.889 6	0.889 8
CNN-1D^[7]	0.861 6	0.860 5	0.861 0
CNN-2D^[7]	0.842 5	0.842 0	0.842 2
HAST-I^[10]	0.875 7	0.872 9	0.874 2
HAST-II^[10]	0.850 2	0.842 7	0.840 9
PERT	0.932 7	0.932 2	0.932 3

Model	Precision	Recall	F1
ML-1^[16]	0.819 4	0.813 6	0.816 4
ML-2	0.890 1	0.889 6	0.889 8
CNN-1D^[7]	0.861 6	0.860 5	0.861 0
CNN-2D^[7]	0.842 5	0.842 0	0.842 2
HAST-I^[10]	0.875 7	0.872 9	0.874 2
HAST-II^[10]	0.850 2	0.842 7	0.840 9
PERT	0.932 7	0.932 2	0.932 3

Model	Precision	Recall	F1
ML-1^[16]	/	/	/
ML-2	0.735 1	0.733 5	0.732 1
CNN-1D^[7]	0.770 9	0.768 3	0.766 8
CNN-2D^[7]	0.768 4	0.765 9	0.764 3
HAST-I^[10]	0.820 1	0.818 5	0.816 7
HAST-II^[10]	0.792 4	0.781 3	0.782 6
PERT	0.904 2	0.900 3	0.900 7