ZTE Communications ›› 2019, Vol. 17 ›› Issue (2): 26-37.DOI: 10.12142/ZTECOM.201902005

• Special Topic •

Potential Off-Grid User Prediction System Based on Spark

LI Xuebing1,3, SUN Ying1,2, ZHUANG Fuzhen1,2, HE Jia1,2, ZHANG Zhao1,2, ZHU Shijun4, HE Qing1,2   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
    4. ZTE Corporation, Shenzhen, Guangdong 518057, China
  • Received:2018-03-13 Online:2019-06-11 Published:2019-11-14
  • About author:
    LI Xuebing received the M.S. degree from the College of Information Science and Engineering, Yanshan University, China. His research interests include machine learning and data mining. He is currently a recommender system engineer at Baidu.
    SUN Ying is a master's candidate at the Institute of Computing Technology, Chinese Academy of Sciences, China. She received the B.S. degree from Beijing Institute of Technology, China in 2017. Her research interests include machine learning and data mining. She has published two research papers in SIGKDD.
    ZHUANG Fuzhen (zhuangfuzhen@ict.ac.cn) received the B.S. degree in computer science from Chongqing University, China in 2006, and the Ph.D. degree in computer software and theory from the University of Chinese Academy of Sciences, China in 2011. He is currently an associate professor at the Institute of Computing Technology, Chinese Academy of Sciences, China. His research interests include machine learning, data mining, transfer learning, multi-task learning, and recommendation systems. He has published around 60 papers in various journals and conferences, such as IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, KDD, IJCAI, AAAI, ICDE, and WWW.
    HE Jia is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, China. Her research interests include machine learning, Bayesian nonparametric learning, and multi-view learning. She has published several papers at relevant research conferences, such as IJCAI, ICDM, ECML, and CIKM.
    ZHANG Zhao is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, China. He received the B.S. degree from Beijing Institute of Technology, China in 2015. His research interests include machine learning, data mining, and relational learning. He has published several papers in relevant research conferences and journals, such as EMNLP, CIKM, and Information Systems.
    ZHU Shijun received the B.E. degree in management science and engineering from the University of Science and Technology of China (USTC) in 2003. Working with the Wireless Big Data R&D Center of ZTE Corporation, he is responsible for the development of the smart optimization and planning system of wireless networks.
    HE Qing is a professor at the Institute of Computing Technology, Chinese Academy of Sciences (CAS), and also a professor at the University of Chinese Academy of Sciences, China. He received the B.S. degree from Hebei Normal University, China in 1985, and the M.S. degree from Zhengzhou University, China in 1987, both in mathematics. He received the Ph.D. degree in fuzzy mathematics and artificial intelligence from Beijing Normal University, China in 2000. He was with Hebei University of Science and Technology from 1987 to 1997. He is currently a doctoral supervisor at the Institute of Computing Technology, CAS. His interests include data mining, machine learning, classification, and fuzzy clustering.
  • Supported by:
    This work is supported by ZTE Industry-Academia-Research Cooperation, the National Key Research and Development Program of China under Grant No. 2017YFB1002104, the National Natural Science Foundation of China under Grant Nos. U1836206, U1811461, and 61773361, and the Project of Youth Innovation Promotion Association CAS under Grant No. 2017146.

Abstract:

With increasingly fierce competition among communication operators, accurately predicting potential off-grid users has become more and more important. Solving this problem requires considering the effectiveness of the learning algorithms, the efficiency of data processing, and other factors. In this paper, we therefore propose, from a practical application point of view, a potential off-grid user prediction system based on Spark, which covers data pre-processing, feature selection, model building, and effective display of results. Furthermore, within this system, we use the Spark parallel framework to improve the gcForest algorithm, a novel decision tree ensemble approach. The resulting parallel gcForest algorithm can be applied to practical problems such as off-grid prediction. Experiments on two real-world datasets demonstrate that the proposed prediction system can handle large-scale data for the off-grid user prediction problem and that the proposed parallel gcForest achieves satisfactory performance.

Key words: data mining, off-grid prediction, Spark, parallel computing, deep forest