ZTE Communications ›› 2013, Vol. 11 ›› Issue (2): 51-54. doi: 10.3969/j.issn.1673-5188.2013.02.008

• Research Paper •

Spam Filtering: Online Naive Bayes Based on TONE

Guanglu Sun1, Hongyue Sun2, Yingcai Ma3, and Yuewu Shen3   

    1. Research Institute of Information Technology, Tsinghua University, Beijing 100084, China;
    2. ZTE Corporation, Shenzhen 518057, China;
    3. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Received:2013-01-04 Online:2013-06-25 Published:2013-06-25
  • About author: Guanglu Sun (guanglu.sun@gmail.com) received his BS, MSc, and PhD degrees in computer science from Harbin Institute of Technology. He is a professor and deputy dean of the School of Computer Science and Technology, HUST. He is also the director of the university’s Information Security and Intelligent Technology Lab. Before joining HUST, he was a postdoctoral researcher at the computer department of Tsinghua University. Dr. Sun’s research interests include computer networks, information security, and machine learning. He has co-edited two books and published more than 30 papers. Dr. Sun is a senior member of the China Computer Federation, a member of the IEEE Computer Society, and a member of the ACM. He is also a member of the IEEE Technical Committee on Globecom and the IEEE Technical Committee on ICC.

    Hongyue Sun (sun.hongyue@zte.com.cn) received his MSc degree in electrical engineering from the University of Science and Technology of China (USTC). He is the director of ZTE’s deep packet inspection and analysis team. His research interests include 3G/4G, information security, big data, and machine learning. He has applied for five patents.

    Yingcai Ma received his BS degree in computer science from HUST, China. His research interests include spam filtering and machine learning.

    Yuewu Shen received his BS degree in computer science from Southeast China University. He received his MSc degree in computer science from Harbin University of Science and Technology and later joined Baidu as an R&D engineer. Mr. Shen’s research interests include spam filtering, machine learning, and feature engineering.
  • Supported by:
    This work is supported by the National Natural Science Foundation of China under Grant No. 60903083, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20092303120005, and the Research Fund of ZTE Corporation.

Abstract: The naive Bayes (NB) model has been used successfully to filter spam and is very accurate. However, there is still room for improvement. We use a train on or near error (TONE) method in online NB to enhance the performance of NB and reduce the number of training emails. We conducted an experiment to determine the performance of the improved algorithm by plotting (1-ROCA)% curves. The results show that the proposed method improves on the performance of the original NB.

Key words: spam filtering, online naive Bayes, train on or near error (TONE)
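
The abstract names the technique but does not spell out the update rule, so the following is only a minimal sketch of how TONE-style training might be combined with an online NB filter. It assumes the usual reading of TONE: the model is updated with a message only when it misclassifies that message or when the score falls within a margin of the decision threshold. The class name `OnlineNaiveBayesTONE`, the threshold of 0.5, the margin of 0.25, the Laplace smoothing, and the token lists are all assumptions made for illustration and are not taken from the paper.

```python
import math


class OnlineNaiveBayesTONE:
    """Hypothetical online NB spam filter with TONE-style selective training.

    Illustrative sketch only; parameter values are assumptions, not the
    settings used in the paper.
    """

    def __init__(self, threshold=0.5, margin=0.25, smoothing=1.0):
        self.threshold = threshold    # decision threshold on P(spam | message)
        self.margin = margin          # half-width of the "near error" band
        self.smoothing = smoothing    # Laplace smoothing constant
        self.token_counts = {"spam": {}, "ham": {}}
        self.total_tokens = {"spam": 0, "ham": 0}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocabulary = set()
        self.updates = 0              # number of messages actually trained on

    def _token_prob(self, token, label):
        # Smoothed estimate of P(token | label); +1 reserves mass for unseen tokens.
        vocab_size = len(self.vocabulary) + 1
        count = self.token_counts[label].get(token, 0)
        denom = self.total_tokens[label] + self.smoothing * vocab_size
        return (count + self.smoothing) / denom

    def spam_probability(self, tokens):
        # Accumulate log-odds of spam versus ham, then map to a probability.
        log_odds = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for token in tokens:
            log_odds += math.log(
                self._token_prob(token, "spam") / self._token_prob(token, "ham")
            )
        log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() well within range
        return 1.0 / (1.0 + math.exp(-log_odds))

    def _train(self, tokens, label):
        # Plain online NB update: increment counts for the true class.
        counts = self.token_counts[label]
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
            self.vocabulary.add(token)
        self.total_tokens[label] += len(tokens)
        self.doc_counts[label] += 1
        self.updates += 1

    def process(self, tokens, label):
        """Score a message, then train only on an error or a near error."""
        score = self.spam_probability(tokens)
        is_error = (score > self.threshold) != (label == "spam")
        is_near = abs(score - self.threshold) < self.margin
        if is_error or is_near:
            self._train(tokens, label)
            return score, True
        return score, False


if __name__ == "__main__":
    model = OnlineNaiveBayesTONE()
    stream = [
        (["cheap", "pills", "offer"], "spam"),
        (["meeting", "agenda", "notes"], "ham"),
    ] * 5
    for tokens, label in stream:
        score, trained = model.process(tokens, label)
        print(f"{label:4s} score={score:.3f} trained={trained}")
    print("messages:", len(stream), "updates:", model.updates)
```

On this toy stream the filter trains on the first spam and the first ham message and then skips the remaining eight, because their scores fall outside the margin on the correct side of the threshold. This is the sense in which selective training can reduce the number of training emails; the toy data says nothing about accuracy on a real corpus.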