MBGM: A Graph-Mining Tool Based on MapReduce and BSP
Zhenjiang Dong, Lixia Liu, Bin Wu, and Yang Liu
ZTE Communications    2014, 12 (4): 16-22.   DOI: 10.3969/j.issn.1673-5188.2014.04.003
This paper proposes an analytical mining tool for big graph data based on MapReduce and the bulk synchronous parallel (BSP) computing model. The tool is named the MapReduce and BSP based Graph-Mining tool (MBGM). The core of the mining system comprises four sets of parallel graph-mining algorithms programmed in the BSP parallel model and one set of data extraction-transformation-loading (ETL) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine optimized for cloud computing. Finally, a well-designed data-management function enables users to view, delete, and input data in the Hadoop distributed file system (HDFS). Experiments on artificial data show that the graph-mining algorithm components of MBGM are efficient.
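BSP organizes a graph computation as a sequence of supersteps: in each superstep every vertex processes the messages it received, updates its state, and sends messages that are delivered only after a global barrier. As a rough illustration of that model (not MBGM's actual code), here is a toy superstep-style PageRank in plain Python; the graph representation and damping constant are assumptions for the sketch.

```python
# Illustrative BSP-style PageRank: vertices exchange messages in supersteps,
# and a global barrier separates one superstep from the next. This is a toy
# sketch of the programming model, not MBGM's implementation.

DAMPING = 0.85

def bsp_pagerank(adjacency, num_supersteps=20):
    """adjacency: dict mapping vertex id -> list of out-neighbours."""
    n = len(adjacency)
    rank = {v: 1.0 / n for v in adjacency}

    for _ in range(num_supersteps):
        # Superstep: every vertex sends rank / out_degree to its neighbours.
        inbox = {v: [] for v in adjacency}
        for v, neighbours in adjacency.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    inbox[u].append(share)
        # Barrier: all messages are delivered before ranks are updated.
        rank = {v: (1 - DAMPING) / n + DAMPING * sum(msgs)
                for v, msgs in inbox.items()}
    return rank

if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(bsp_pagerank(toy_graph))
```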
Guest Editorial: Improving Performance of Cloud Computing and Big Data Technologies and Applications
Zhenjiang Dong
ZTE Communications    2014, 12 (4): 1-2.  
Cloud computing technology is changing the development and usage patterns of IT infrastructure and applications. Virtualized and distributed systems, together with unified management and scheduling, have greatly improved computing and storage. Management has become easier, and OAM costs have been significantly reduced. Cloud desktop technology is developing rapidly. With this technology, users can flexibly and dynamically use virtual machine resources, companies' efficiency in using and allocating resources is greatly improved, and information security is ensured. In most existing virtual cloud desktop solutions, computing and storage are bound together, and data is stored as image files. This limits the flexibility and expandability of systems and is insufficient for meeting customers' requirements in different scenarios.
In this era of big data, the annual growth rate of data in social networks, mobile communication, e-commerce, and the Internet of Things is more than 50%, and more than 80% of this data is unstructured. Therefore, it is imperative to develop an effective method for storing and managing big data and for querying and analyzing it in real time or quasi real time. HBase is a distributed data storage system operating in the Hadoop environment. It provides a highly expandable method and platform for big data storage and management. However, it supports only primary-key indexing, not non-primary-key indexing. As a result, the data query efficiency of HBase is low, and data cannot be queried in real time or quasi real time. For HBase operating in Hadoop, the capability of querying data according to non-primary keys is the most important and urgent requirement.
The graph data structure is suitable for most big data created in social networks. Graph data is more complex and difficult to understand than traditional linked-list or tree data, so processing and understanding graph data quickly and easily is of great significance and has become a hot topic in the industry.
Big data has a high proportion of video and image data, but most of this data is not utilized. Creating value with it has been a research focus in the industry. For example, traditional face localization and identification technology yields only a locally optimal solution and leaves considerable room for improvement in accuracy.
This special issue of ZTE Communications reflects the industry's efforts to improve the performance of cloud computing and big data technologies and applications. We invited four peer-reviewed papers based on projects supported by ZTE Industry-Academic-Research Cooperation Funds.
Hancong Duang et al. propose a disk mapping solution integrated with virtual desktop technology in “A New Virtual Disk Mapping Method for the Cloud Desktop Storage Client.” The virtual disk driver provides a user-friendly mode for accessing desktop data and a flexible cache space management mechanism. The file system filter driver intelligently checks I/O requests from upper-layer applications and synchronizes file access requests to the user's cloud storage service. Experimental results show that the read-write performance of the virtual disk mapping method with customizable local cache storage is almost the same as that of a local hard disk.
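The core idea behind such a client is a local cache that serves reads and forwards writes to the user's cloud storage. The sketch below illustrates that read-through / write-back behaviour in plain Python; the CloudStore stand-in and the naive eviction policy are hypothetical assumptions, not the paper's driver design.

```python
# Toy read-through / write-back cache: reads are served from a local cache
# when possible, writes are synchronized to cloud storage. The CloudStore
# interface and eviction policy are hypothetical.

class CloudStore:
    """Stand-in for the remote cloud storage client."""
    def __init__(self):
        self._blobs = {}

    def get(self, path):
        return self._blobs.get(path, b"")

    def put(self, path, data):
        self._blobs[path] = data


class CachedDisk:
    def __init__(self, store, max_entries=1024):
        self.store = store
        self.cache = {}
        self.max_entries = max_entries

    def read(self, path):
        # Serve from the local cache when possible, otherwise fetch and cache.
        if path not in self.cache:
            if len(self.cache) >= self.max_entries:
                self.cache.pop(next(iter(self.cache)))  # naive eviction
            self.cache[path] = self.store.get(path)
        return self.cache[path]

    def write(self, path, data):
        # Update the cache and synchronize the change to cloud storage.
        self.cache[path] = data
        self.store.put(path, data)
```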
“HMIBase: An Hierarchical Indexing System for Storing and Querying Big Data,” by Shengmei Luo et al., presents the design and implementation of a complete hierarchical indexing and query system called HMIBase. The system efficiently queries a single value or a range of values according to non-primary-key attributes and has good expandability. Test results on 10 million to 1 billion data records show that, regardless of whether the number of query results is large or small, HMIBase responds to cold and hot queries one to four levels faster than standard HBase and five to twenty times faster than the open-source Hindex system.
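HBase indexes rows only by their primary key, so a secondary index typically maps each non-key attribute value back to the row keys that contain it, turning value and range queries into key lookups. A minimal in-memory sketch of that idea follows; it is not HMIBase's hierarchical design, and the class and method names are illustrative.

```python
# Minimal secondary-index sketch: map a non-primary-key attribute value to
# the row keys that contain it, so value and range queries avoid full scans.
# This only illustrates the general idea; HMIBase's hierarchical design is richer.
from bisect import bisect_left, bisect_right
from collections import defaultdict

class SecondaryIndex:
    def __init__(self):
        self.value_to_rows = defaultdict(set)
        self._sorted_values = []          # kept sorted for range queries

    def put(self, row_key, value):
        if value not in self.value_to_rows:
            self._sorted_values.insert(bisect_left(self._sorted_values, value), value)
        self.value_to_rows[value].add(row_key)

    def query_value(self, value):
        return self.value_to_rows.get(value, set())

    def query_range(self, low, high):
        lo = bisect_left(self._sorted_values, low)
        hi = bisect_right(self._sorted_values, high)
        rows = set()
        for v in self._sorted_values[lo:hi]:
            rows |= self.value_to_rows[v]
        return rows

index = SecondaryIndex()
index.put("row1", 25)
index.put("row2", 42)
print(index.query_range(20, 30))   # -> {'row1'}
```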
In “MBGM: A Graph-Mining Tool Based on MapReduce and BSP,” Zhenjiang Dong et al. propose a MapReduce and BSP based Graph-Mining (MBGM) tool. The tool uses BSP model-based parallel graph-mining algorithms and MapReduce-based extraction-transformation-loading (ETL) algorithms, and a workflow engine optimized for cloud computing is designed for it. Experiments show that the graph-mining algorithm components in the MBGM tool, including PageRank, K-means, InDegree Count, and Closeness Centrality, have higher performance than the corresponding components of BC-PDM and BC-BSP.
In “Facial Landmark Localization by Gibbs Sampling,” Bofei Wang et al. present an optimized solution for key-point based face localization. Instead of the traditional gradient descent algorithm, this solution uses the Gibbs sampling algorithm, which converges easily and can reach the globally optimal solution for face localization based on key points, so the locally optimal solution is avoided. The posterior probability function used by the Gibbs sampling algorithm comprises a prior probability function and a likelihood function. The prior probability function is assumed to follow a Gaussian distribution and is learned from features after dimensionality reduction; the likelihood function is obtained through a local linear SVM algorithm. The LFW data set was used to test the system, and the results show that the accuracy of face localization is high.
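Gibbs sampling draws each variable in turn from its conditional distribution given the current values of the others, so the chain explores the joint posterior rather than following a single gradient path. The toy sampler below shows that loop structure on a bivariate Gaussian with correlation rho; it is only an illustration of the technique, not the paper's landmark model.

```python
# Generic Gibbs-sampling loop for a bivariate Gaussian with correlation rho:
# each coordinate is resampled from its conditional given the other. This
# illustrates the sampler the paper relies on, not its facial-landmark model.
import random

def gibbs_bivariate_normal(rho, num_samples=5000, burn_in=500):
    x, y = 0.0, 0.0
    cond_std = (1.0 - rho * rho) ** 0.5
    samples = []
    for i in range(num_samples + burn_in):
        # Sample x | y, then y | x, each from a 1-D Gaussian conditional.
        x = random.gauss(rho * y, cond_std)
        y = random.gauss(rho * x, cond_std)
        if i >= burn_in:
            samples.append((x, y))
    return samples

if __name__ == "__main__":
    pts = gibbs_bivariate_normal(rho=0.8)
    mean_x = sum(p[0] for p in pts) / len(pts)
    print(f"empirical mean of x: {mean_x:.3f}")  # should be close to 0
```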
I would like to thank all the authors for their contributions and all the reviewers who helped improve the quality of the papers.
Mobile Internet WebRTC and Related Technologies
Zhenjiang Dong, Congbing Li, Wei Wang, Da Lyu
ZTE Communications    2014, 12 (1): 46-51.   DOI: 10.3939/j.issn.1673-5188.2014.01.007
This paper describes an improved design for WebRTC technology. With this design, WebRTC communication on the client side, on the server side, and between the two sides is improved. HTML5 WebSocket, media negotiation and synthesis, network address translator (NAT)/firewall traversal, Session Initiation Protocol (SIP) signaling interaction, and P2P communication security are all used in this improved design. The solution solves the cross-browser compatibility problem of WebRTC applications, reduces reliance on client-side processing capability, and reduces bandwidth consumption. With this design, WebRTC also becomes more scalable.
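At the heart of WebRTC session setup is a signaling service that relays SDP offers/answers and ICE candidates between peers; in the improved design this exchange runs over HTML5 WebSocket. The sketch below shows a minimal in-memory signaling router with the transport abstracted to plain callbacks; the class and message format are illustrative assumptions, not the paper's protocol.

```python
# Toy signaling router: relays SDP offers/answers and ICE candidates between
# peers sharing a room id. A real deployment would run this over WebSocket;
# here the transport is reduced to callbacks for clarity.
from collections import defaultdict

class SignalingRouter:
    def __init__(self):
        self.rooms = defaultdict(dict)   # room_id -> {peer_id: send_callback}

    def join(self, room_id, peer_id, send_callback):
        self.rooms[room_id][peer_id] = send_callback

    def relay(self, room_id, sender_id, message):
        # Forward the message (e.g. {"type": "offer", "sdp": ...}) to every
        # other peer in the same room.
        for peer_id, send in self.rooms[room_id].items():
            if peer_id != sender_id:
                send(message)

# Usage sketch:
router = SignalingRouter()
router.join("room1", "alice", lambda m: print("to alice:", m))
router.join("room1", "bob", lambda m: print("to bob:", m))
router.relay("room1", "alice", {"type": "offer", "sdp": "..."})
```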
A Parallel Platform for Web Text Mining
Ping Lu, Zhenjiang Dong, Shengmei Luo, Lixia Liu, Shanshan Guan, Shengyu Liu, and Qingcai Chen
ZTE Communications    2013, 11 (3): 56-61.   DOI: 10.3969/j.issn.1673-5188.2013.03.010
With user-generated content, anyone can be a content creator. This phenomenon has greatly increased the amount of information circulating online, and it is becoming harder to efficiently obtain the required information. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface. We propose a parallel web text-mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and process this information using natural language processing and data-mining techniques.
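The parallelization pattern the platform relies on is the familiar one: split the corpus into chunks, process each chunk independently, then merge the partial results. A small sketch of that split using Python's multiprocessing as a stand-in for the Hadoop/MPI layers; the tokenization and worker count are illustrative assumptions.

```python
# Data-parallel term counting: the corpus is split into chunks that are
# processed independently, and the partial counts are merged, mirroring the
# map/reduce split the platform uses (multiprocessing stands in for Hadoop/MPI).
from collections import Counter
from multiprocessing import Pool

def count_terms(documents):
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def parallel_term_counts(documents, workers=4):
    chunk = max(1, len(documents) // workers)
    chunks = [documents[i:i + chunk] for i in range(0, len(documents), chunk)]
    with Pool(workers) as pool:
        partials = pool.map(count_terms, chunks)      # "map" phase
    total = Counter()
    for partial in partials:                          # "reduce" phase
        total.update(partial)
    return total

if __name__ == "__main__":
    docs = ["cloud computing and big data", "big data mining on the web"]
    print(parallel_term_counts(docs, workers=2).most_common(3))
```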
A Hadoop Performance Prediction Model Based on Random Forest
Zhendong Bei, Zhibin Yu, Huiling Zhang, Chengzhong Xu, Shenzhong Feng, Zhenjiang Dong, and Hengsheng Zhang
ZTE Communications    2013, 11 (2): 38-44.   DOI: 10.3969/j.issn.1673-5188.2013.02.006
MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system's configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that RFMS achieves a prediction accuracy of 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
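The modeling idea can be sketched with any random-forest library: configuration parameters are the input features and a measured performance metric is the regression target. The example below uses scikit-learn with synthetic data; the three parameter names and the runtime formula are assumptions standing in for the real Hadoop configurations and the 2000 observations described above, so it illustrates the approach rather than reproducing RFMS.

```python
# Random-forest performance model in the spirit of RFMS: configuration
# parameters -> predicted job runtime. Parameter names and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical parameters: map tasks, reduce tasks, sort buffer (MB).
X = rng.uniform([1, 1, 64], [64, 32, 512], size=(2000, 3))
# Synthetic "runtime" with some structure plus noise, standing in for real
# fine-grained performance observations.
y = 1000.0 / X[:, 0] + 400.0 / X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 2, 2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out configurations:", model.score(X_test, y_test))
```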
Android Apps: Static Analysis Based on Permission Classification
Zhenjiang Dong, Hui Ye, Yan Wu, Shaoyin Cheng, and Fan Jiang
ZTE Communications    2013, 11 (1): 62-66.  
Android has a strict permission management mechanism: any application that runs on the Android system must first obtain the permissions it needs. In this paper, we propose an efficient method of detecting malicious applications in the Android system. First, hundreds of permissions are classified into different groups. The application programming interfaces (APIs) associated with permissions that can interact with the outside environment are called sink functions; the APIs associated with the other permissions are called taint functions. We construct association tables for the block variables and function variables of each application. Malicious applications can then be detected by applying the static taint-propagation method to these tables.
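The detection idea can be illustrated with a toy taint-propagation pass: APIs tied to data-reading permissions act as taint sources, APIs tied to outward-facing permissions act as sinks, and any flow from a tainted variable into a sink is flagged. The sketch below is a simplification under hypothetical API and permission groupings, not the paper's association tables.

```python
# Toy static taint propagation: sensitive APIs taint the variables they
# produce, and a tainted argument reaching a sink API is reported as a leak.
# The API names and permission groupings below are hypothetical examples.

TAINT_SOURCES = {"getDeviceId": "READ_PHONE_STATE",
                 "getLastKnownLocation": "ACCESS_FINE_LOCATION"}
SINKS = {"sendTextMessage": "SEND_SMS",
         "openConnection": "INTERNET"}

def find_leaks(statements):
    """statements: list of (target_var, called_api, [argument_vars])."""
    tainted = set()
    leaks = []
    for target, api, args in statements:
        if api in TAINT_SOURCES:
            tainted.add(target)                   # variable now holds sensitive data
        elif any(a in tainted for a in args):
            if api in SINKS:
                leaks.append((api, SINKS[api]))   # tainted data reaches a sink
            else:
                tainted.add(target)               # propagate taint through the call
    return leaks

program = [("id", "getDeviceId", []),
           ("msg", "buildMessage", ["id"]),
           ("ok", "sendTextMessage", ["msg"])]
print(find_leaks(program))   # -> [('sendTextMessage', 'SEND_SMS')]
```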