SPBD: Streamlining Big-Data Processing in Cloud Environments

DOI: 10.3969/j.issn.1673-5188.2013.02.005

ZTE Communications, 2013, Vol. 11, Issue 2: 30-37

Special Topic

Tung Nguyen, Jingwen Zhang, and Weisong Shi

Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

Received: 2013-04-30; Online: 2013-06-25; Published: 2013-06-25
About the authors: Tung Nguyen (tnguyen@i-a-i.com) is a research scientist at Intelligent Automation Inc. He plays a key role in many projects on data-intensive distributed processing, cloud computing, and massive data analysis using topological features. Dr. Nguyen received his PhD degree from Wayne State University in 2012. He received his BS and MS degrees in computer science and engineering from Ho Chi Minh City University of Technology, Vietnam, in 2001 and 2006, respectively. His research interests include green computing, cloud computing, data-intensive computing, and the application of cloud computing to life sciences. He has published several papers on computer science and bioinformatics, including in the proceedings of OSDI and in the NPC, SUSCOM, and BMC Frontiers Genetics journals. He has also served as a peer reviewer for many conferences, including Euro-Par and CollaborateCom. His homepage is http://www.cs.wayne.edu/tung/.

Jingwen Zhang (jingwen.zhang@wayne.edu) received her BS degree in computer science from Xidian University, China. She is currently a PhD student in the Department of Computer Science, Wayne State University. Her research interests include cloud computing and big-data analysis.

Weisong Shi (weisong@wayne.edu) is an associate professor of computer science at Wayne State University. He received his BS degree in computer engineering from Xidian University in 1995 and his PhD degree in computer engineering from the Chinese Academy of Sciences in 2000. His research interests include computer systems, mobile computing, and cloud computing. Dr. Shi has published 120 peer-reviewed journal and conference papers and has an H-index of 24. He has served as program chair and technical program committee member of numerous international conferences, including WWW and ICDCS. In 2002, he received the NSF CAREER award for outstanding PhD dissertation (China). In 2009, he received the Career Development Chair Award of Wayne State University. He has also won Best Paper Awards at ICWE’04, IPDPS’05, HPCChina’12, and IISWC’12.

Abstract

Many applications, such as those in genomics, are designed to run on a single machine. This is not a problem if the input data set is small enough to fit into the memory of one powerful machine. However, the application and its algorithms are then limited by the capacity and performance of that machine, because the application cannot run in parallel, and a single machine cannot handle very large data sets. Recent research has combined cloud computing and MapReduce to store and process big data. Handling data in the cloud involves three main steps: 1) the user uploads the data, 2) the data is processed, and 3) the results are returned. Once the data reaches a certain scale, transmission time becomes the dominant factor; however, most research to date has focused only on reducing processing time. It is also generally assumed that the data is already stored in the cloud, an assumption that does not hold in many cases because many organizations now store their data locally. In this paper, we propose SPBD (pronounced “speed”) to minimize overall user wait time. We formulate overall processing time as an optimization problem and derive the optimal solution. When evaluated on our private cloud platform, SPBD reduces user wait time by up to 34% for a traditional WordCount application and by up to 31% for a metagenomic application.

Key words: big data, genomics, NGS, MapReduce, cloud
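
The abstract does not spell out SPBD's optimization model, so the following is only a hypothetical toy model of the trade-off it describes, not the authors' formulation. It assumes the input can be split into equal chunks that flow through the three steps (upload, process, return results) as a pipeline, so that transmission of one chunk overlaps processing of another. An assumed fixed per-chunk processing overhead (`overhead_s`, standing in for something like job start-up cost) penalizes using too many chunks. All function names, parameters, and values are illustrative assumptions, and the best chunk count is found by brute-force search rather than by the paper's derivation.

```python
"""Toy model of user wait time: upload -> process -> return results.

Hypothetical illustration only; this is NOT the SPBD formulation from the paper.
"""


def wait_time(data_gb, n_chunks, up_gbps, proc_gbps, down_gbps,
              result_ratio, overhead_s):
    """Makespan when the input is split into n equal chunks that flow through
    three pipelined stages: upload, process, and download of results.

    n_chunks == 1 reproduces the fully sequential case.
    overhead_s is an assumed fixed processing cost paid once per chunk.
    """
    chunk = data_gb / n_chunks
    stage_times = [
        chunk / up_gbps,                   # upload one chunk
        chunk / proc_gbps + overhead_s,    # process one chunk
        chunk * result_ratio / down_gbps,  # return that chunk's results
    ]
    stage_free = [0.0] * len(stage_times)  # time at which each stage is idle again
    for _ in range(n_chunks):
        ready = 0.0  # time the current chunk is ready to enter the next stage
        for s, duration in enumerate(stage_times):
            start = max(ready, stage_free[s])  # wait for both the chunk and the stage
            stage_free[s] = start + duration
            ready = stage_free[s]
    return stage_free[-1]


def best_chunk_count(max_chunks=200, **params):
    """Brute-force search for the chunk count that minimizes modeled wait time."""
    return min(
        ((n, wait_time(n_chunks=n, **params)) for n in range(1, max_chunks + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical workload: 100 GB input, 1 Gbit/s (0.125 GB/s) link in both
    # directions, 0.5 GB/s cluster throughput, results are 1% of the input,
    # and 20 s of fixed overhead per chunk.
    params = dict(data_gb=100, up_gbps=0.125, proc_gbps=0.5,
                  down_gbps=0.125, result_ratio=0.01, overhead_s=20)
    sequential = wait_time(n_chunks=1, **params)
    n, pipelined = best_chunk_count(**params)
    print(f"sequential: {sequential:.1f} s")
    print(f"pipelined with {n} chunks: {pipelined:.1f} s "
          f"({100 * (1 - pipelined / sequential):.1f}% shorter)")
```

With these made-up parameters the search settles on 30 chunks, and the modeled wait drops from 1028 s to about 827 s. Fewer chunks leave the upload link as an unhidden bottleneck, while more chunks make the per-chunk overhead dominate the processing stage, which is why an interior optimum exists in this toy model.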

Tung Nguyen, Jingwen Zhang, and Weisong Shi. SPBD: Streamlining Big-Data Processing in Cloud Environments[J]. ZTE Communications, 2013, 11(2): 30-37.