ZTE Communications ›› 2013, Vol. 11 ›› Issue (3): 56-61. DOI: 10.3969/j.issn.1673-5188.2013.03.010

• Research Paper •

A Parallel Platform for Web Text Mining

Ping Lu1, Zhenjiang Dong1, Shengmei Luo1, Lixia Liu1, Shanshan Guan2, Shengyu Liu2, and Qingcai Chen2   

    1. ZTE Corporation, Nanjing 210000, China;
    2. Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China
  • Received: 2012-12-19 Online: 2013-09-25 Published: 2013-09-25
  • About the authors: Ping Lu is the president of the Communication Services R&D Institute for Cloud Computing and IT Operation, ZTE Corporation. He received his MS degree from Southeast University in 1996. His research interests include cloud computing, the internet of things, home networking, multimedia networking, and mobile networking.

    Zhenjiang Dong is the vice president of the Communication Services R&D Institute for Cloud Computing and IT Operation, ZTE Corporation. He received his MS degree from Harbin Institute of Technology in 1996. His research interests include cloud computing, multimedia networking, and mobile networking.

    Shengmei Luo graduated from Harbin Institute of Technology in 1996 and has been involved in telecommunication network and service development for many years. He is currently the chief architect at ZTE Corporation and a professor at Nanjing University of Posts and Telecommunications. He has been awarded a prize for scientific and technological progress and holds many patents. He has published a number of academic papers in core communication journals. He is a member of the China Cloud Computing Committee and has extensive experience in ICT domains.

    Lixia Liu is a senior engineer in the pre-research department at ZTE. She received her MS degree from Ocean University of China in 2008. Her research interests include natural language processing, text mining, data mining, machine learning, mathematical statistics, and cloud computing.

    Shanshan Guan is a graduate student in the Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen Graduate School). He received his BS degree in 2011 from Harbin Institute of Technology at Weihai. His research interests include text mining and machine learning.

    Shengyu Liu is a PhD student in the Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen Graduate School). He received his MS degree in 2011 from Harbin Institute of Technology. His research interests include text mining, natural language processing, information extraction, and machine learning.

    Qingcai Chen is a professor and PhD supervisor in the Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen Graduate School). He received his PhD degree in 2003 from Harbin Institute of Technology. His research interests include machine learning, pattern recognition, natural language processing, information retrieval, and speech processing. He has published about 50 papers in renowned academic journals and conference proceedings. He is a member of the IEEE Systems, Man and Cybernetics Society and a reviewer for IEEE Transactions on Systems, Man and Cybernetics.


Abstract: With user-generated content, anyone can be a content creator. This phenomenon has greatly increased the amount of information circulated online, and it is becoming harder to obtain required information efficiently. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface (MPI). We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and process this information using natural language processing and data-mining techniques.

Key words: natural language processing, text mining, massive data, parallel, web knowledge service
