ZTE Communications, 2022, Vol. 20, Issue (3): 77-84. DOI: 10.12142/ZTECOM.202203010

• Research Paper •

Alarm-Based Root Cause Analysis Based on Weighted Fault Propagation Topology for Distributed Information Network

LYU Xiaomeng1, CHEN Hao1, WU Zhenyu1, HAN Junhua2, GUO Huifeng2

  1. Engineering Research Center for Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. ZTE Corporation, Shenzhen 518057, China
  • Received: 2021-10-31 Online: 2022-09-13 Published: 2022-09-14
  • About author:
    LYU Xiaomeng (lvxiaomeng@bupt.edu.cn) is studying for her master’s degree at Beijing University of Posts and Telecommunications (BUPT), China, and received her bachelor’s degree in information and communication engineering from BUPT in 2019. Her main research interests include fault prediction and fault diagnosis. She has published one paper on disk fault prediction and two patents in AIOps.
    CHEN Hao is studying for his master’s degree at Beijing University of Posts and Telecommunications (BUPT), China, and received his bachelor’s degree from the Faculty of Information and Communication Engineering, BUPT, in 2020. His main research interests are fault recognition and prediction.
    WU Zhenyu received his BS and PhD degrees from Beijing University of Posts and Telecommunications (BUPT), China, in 2008 and 2013, respectively. He is currently an associate professor with the School of Information and Communication Engineering at BUPT. His research interests include AIOps, intelligent fault diagnostics, machine learning, and prognostics and health management (PHM) technology.
    HAN Junhua received his master’s degree from the Graduate School of the Chinese Academy of Sciences (now University of the Chinese Academy of Sciences) in 2005. He is currently an engineer at ZTE Corporation. His research interests include intelligent operation and maintenance, intelligent fault diagnosis, knowledge graphs, and graph neural networks.
    GUO Huifeng received her master’s degree from Huazhong University of Science and Technology (HUST), China. She is currently an engineer at ZTE Corporation. Her research interests include intelligent networks and fault management.
  • Supported by:
    ZTE Industry-University-Institute Cooperation Funds (HC-CN-20201120009)

Abstract:

Locating the root causes of faults is a persistent challenge in distributed information networks with complex structures. In this paper, we propose a novel root cause analysis (RCA) method based on a random walk over a weighted fault propagation graph. Unlike other RCA methods, it mines effective feature information related to root causes from offline alarms. This information is combined with online alarms and the graph relationships of the network structure to construct a weighted graph. As a result, the approach does not require operational experience and can be widely applied to different distributed networks. The proposed method can also be used in cases involving multiple fault locations. Experimental results show that the proposed approach achieves much better performance than three baseline methods, with at least 6% higher precision for root fault location. In addition, we explain how the value of the optimal parameter in the random walk algorithm influences the RCA results.
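
To make the idea of ranking root-cause candidates by a random walk on a weighted graph more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a generic random walk with restart (in the style of personalized PageRank), edges that point from an alarming node toward a suspected upstream cause, and hand-picked weights. The node names, the weights, the `restart` parameter, and the function `rank_root_causes` are all hypothetical; in the paper, the weights are derived from offline alarm features, online alarms, and the network structure.

```python
from typing import Dict, List, Tuple

# Hypothetical weighted graph: graph[u][v] is the weight of the edge from
# node u to a suspected upstream cause v (a larger weight means a more
# likely propagation path).
Graph = Dict[str, Dict[str, float]]


def rank_root_causes(graph: Graph, restart: float = 0.15,
                     iterations: int = 100) -> List[Tuple[str, float]]:
    """Rank nodes by the stationary visit probability of a random walk
    with restart. This is a generic sketch, not the paper's exact method."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    uniform = 1.0 / len(nodes)
    score = {n: uniform for n in nodes}

    for _ in range(iterations):
        # With probability `restart`, the walker jumps to a uniformly chosen node.
        nxt = {n: restart * uniform for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Node without outgoing edges: spread its mass uniformly.
                for n in nodes:
                    nxt[n] += (1 - restart) * score[u] * uniform
            else:
                # Otherwise follow outgoing edges in proportion to their weights.
                for v, w in out.items():
                    nxt[v] += (1 - restart) * score[u] * w / total
        score = nxt

    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative (made-up) topology: two services raise alarms that mostly
# point at router_1, and router_2 itself depends on router_1.
example_graph: Graph = {
    "service_a": {"router_1": 0.7, "router_2": 0.3},
    "service_b": {"router_1": 0.9, "router_2": 0.1},
    "router_2": {"router_1": 1.0},
}

ranking = rank_root_causes(example_graph)
for node, value in ranking:
    print(f"{node}: {value:.3f}")
```

In this toy topology the walker's probability mass accumulates on `router_1`, so it appears first in the ranking. The `restart` value here is only a stand-in for the kind of tunable parameter whose influence on RCA results the abstract mentions.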

Key words: distributed information network, alarm, graph, root cause analysis, random walk