Log Anomaly Detection Through GPT-2 for Large Scale Systems

doi:10.12142/ZTECOM.202303010

ZTE Communications, 2023, Vol. 21, Issue (3): 70-76. DOI: 10.12142/ZTECOM.202303010

• Research Papers •

JI Yuhe¹, HAN Jing², ZHAO Yongxin¹, ZHANG Shenglin¹, GONG Zican²

¹ Nankai University, Tianjin 300071, China
² ZTE Corporation, Shenzhen 518057, China

Received: 2022-12-08 Online: 2023-09-21 Published: 2023-03-22
About the authors:

JI Yuhe received his bachelor’s degree in software engineering from the College of Software, Nankai University, China in 2022. He is now pursuing his master’s degree at the School of Software, Nankai University. His research interests include anomaly detection and natural language processing.

HAN Jing (han.jing28@zte.com.cn) received her master’s degree from Nanjing University of Aeronautics and Astronautics, China. She has been with ZTE Corporation since 2000. She worked on 3G/4G key technologies from 2000 to 2016 and has been a technical director responsible for intelligent operation of cloud platforms and wireless networks since 2016. Her research interests include machine learning, data mining, and signal processing.

ZHAO Yongxin received her bachelor’s degree in software engineering from Nankai University, China in 2021. She is currently pursuing her master’s degree at the School of Software, Nankai University. Her research interests include anomaly detection and failure diagnosis.

ZHANG Shenglin received his BS degree in network engineering from the School of Computer Science and Technology, Xidian University, China in 2012 and his PhD degree in computer science from Tsinghua University, China in 2017. He is currently an associate professor with the College of Software, Nankai University, China. His current research interests include failure detection, diagnosis, and prediction for service management. He is an IEEE Member.

GONG Zican received his master’s degree in professional computing and artificial intelligence from the Australian National University in 2019. He has been a machine learning engineer at ZTE Corporation since 2020. His research interests include machine learning, professional computing, and system architecture.

Abstract

As the scale of software systems expands, maintaining their stable operation has become an extraordinary challenge. System logs are semi-structured text generated by logging statements in the source code, and they are an important data source for anomaly detection in software services. Existing log anomaly detection methods mainly focus on the statistical characteristics of logs, which makes it difficult for them to distinguish the semantic differences between normal and abnormal logs, and they perform poorly on real-world industrial log data. In this paper, we propose an unsupervised framework for log anomaly detection based on Generative Pre-trained Transformer 2 (GPT-2). We apply our approach to two industrial systems, and the experimental results on the two datasets show that it outperforms state-of-the-art approaches for log anomaly detection.

JI Yuhe, HAN Jing, ZHAO Yongxin, ZHANG Shenglin, GONG Zican. Log Anomaly Detection Through GPT-2 for Large Scale Systems[J]. ZTE Communications, 2023, 21(3): 70-76.

Figures/Tables 8

Templates	Euclidean Distance
httprequest except <*> permission denied	-
httprequest except <*> <*> permission denied	0.147 629 340 284 133 4
httprequest except <*> no such file or directory	0.595 852 332 701 891 4
httprequest except <*>	0.621 201 472 867 456 3
httprequest except EoF occurred in violation of protocol	0.838 852 193 154 771 3
httprequest except <*> connection reset by peer	0.880 359 580 380 884 6
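
The distances in the table above compare vector representations of log templates against a reference template. As a minimal sketch of the distance computation only, the Python below uses L2-normalized token-count vectors as a stand-in embedding. That embedding is an assumption made for illustration, not the representation used in the paper, so the sketch does not reproduce the reported values. The helper names `embed` and `euclidean` are hypothetical.

```python
import math
from collections import Counter


def embed(template, vocab):
    """Toy embedding (assumption): L2-normalized token counts over a fixed vocabulary."""
    counts = Counter(template.split())
    vec = [float(counts[tok]) for tok in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


templates = [
    "httprequest except <*> permission denied",
    "httprequest except <*> <*> permission denied",
    "httprequest except <*> no such file or directory",
    "httprequest except <*> connection reset by peer",
]
vocab = sorted({tok for t in templates for tok in t.split()})

# Distance of every other template from the first (reference) template.
reference = embed(templates[0], vocab)
distances = {t: euclidean(reference, embed(t, vocab)) for t in templates[1:]}
for t, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{d:.3f}  {t}")
```

Even with this crude stand-in, the template that differs from the reference only by an extra `<*>` wildcard comes out closest, which is the same ordering behavior the table illustrates.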

Dataset	Training Data	Number of Templates	Test Data (Normal)	Test Data (Anomalous)
Ada	6 626 865	599	7 911 944	2 648
Bob	7 021 577	84	1 067 850	904
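
The normal and anomalous test counts above imply a heavily imbalanced test set. As a small worked example, assuming the counts are numbers of test entries, the sketch below computes the anomalous fraction for each dataset. The helper `anomaly_rate` is hypothetical.

```python
# (normal, anomalous) test counts copied from the dataset table
test_counts = {"Ada": (7_911_944, 2_648), "Bob": (1_067_850, 904)}


def anomaly_rate(normal, anomalous):
    """Fraction of test entries that are labeled anomalous."""
    return anomalous / (normal + anomalous)


for name, (normal, anomalous) in test_counts.items():
    print(f"{name}: {anomaly_rate(normal, anomalous):.4%} anomalous")
```

Both fractions are below 0.1%, so even a small false-positive rate on normal entries can dominate the alarms, which is why precision is reported alongside recall.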

Approach	Ada Precision	Ada Recall	Ada F1S	Bob Precision	Bob Recall	Bob F1S
LogAnomaly	0.394	0.190	0.256	0.353	0.332	0.342
NeuralLog	0.297	0.354	0.323	0.638	0.872	0.736
Our method	0.738	1.00	0.850	0.857	1.00	0.923
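
F1S in the table above presumably denotes the F1-score, the harmonic mean of precision and recall. As a quick consistency check under that assumption, the sketch below recomputes it from the reported precision and recall values. The helper `f1_score` is hypothetical, and differences in the last digit can come from rounding of the reported inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table
reported = {
    ("LogAnomaly", "Ada"): (0.394, 0.190),
    ("LogAnomaly", "Bob"): (0.353, 0.332),
    ("NeuralLog", "Ada"): (0.297, 0.354),
    ("NeuralLog", "Bob"): (0.638, 0.872),
    ("Our method", "Ada"): (0.738, 1.00),
    ("Our method", "Bob"): (0.857, 1.00),
}
for (approach, dataset), (p, r) in reported.items():
    print(f"{approach:<11} {dataset}: F1 = {f1_score(p, r):.3f}")
```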

Approach	Ada Precision	Ada Recall	Ada F1S	Bob Precision	Bob Recall	Bob F1S
OM w/o SV & AS	0.128	0.835	0.222	0.510	0.940	0.661
OM w/o AS	0.427	1.00	0.598	0.718	1.00	0.836
OM w/o SV	0.627	0.807	0.705	0.833	0.940	0.883
OM	0.738	1.00	0.850	0.857	1.00	0.923
