MapReduce in the Cloud: Data-Location-Aware VM Scheduling

ZTE Communications, 2013, Vol. 11, Issue 4: 18-26. DOI: 10.3939/j.issn.1673-5188.2013.04.003

Tung Nguyen and Weisong Shi

Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

Received: 2013-04-22 Online: 2013-12-25 Published: 2013-12-25
About the authors: Tung Nguyen (tnguyen@i-a-i.com) is a research scientist at Intelligent Automation Inc. He plays a key role in many projects on data-intensive distributed processing, cloud computing, and mass-data analysis using topological features. Dr. Nguyen received his PhD degree from Wayne State University in 2012. He received his BS and MS degrees in computer science and engineering from Ho Chi Minh City University of Technology, Vietnam, in 2001 and 2006, respectively. His research interests include green computing, cloud computing, data-intensive computing, and the application of cloud computing to life sciences. He has published several papers on computer science and bioinformatics, which have appeared in the proceedings of OSDI and in the NPC, SUSCOM, and BMC Frontiers Genetics journals. He has also been a peer reviewer for many conferences, including Euro-Par and CollaborateCom. His homepage is http://www.cs.wayne.edu/tung/

Weisong Shi (weisong@wayne.edu) is an associate professor of computer science at Wayne State University. He received his BS degree in computer engineering from Xidian University in 1995 and his PhD degree in computer engineering from the Chinese Academy of Sciences in 2000. His research interests include computer systems, mobile computing, and cloud computing. Dr. Shi has published 120 peer-reviewed journal and conference papers and has an H-index of 24. He has served as program chair and technical program committee member of numerous international conferences, including WWW and ICDCS. In 2002, he received the NSF CAREER award for outstanding PhD dissertation (China). In 2009, he received the Career Development Chair Award of Wayne State University. He has also won the Best Paper Award at ICWE’04, IPDPS’05, HPCChina’12, and IISWC’12.

Abstract

Hadoop, an open-source implementation of the MapReduce programming model, is increasingly deployed for data-intensive computing in the cloud. However, Hadoop was not originally designed to run transient jobs in which users need to move data back and forth between storage and computing facilities. As a result, Hadoop is inefficient and wastes resources when operating in the cloud. This paper discusses the inefficiency of MapReduce in the cloud. We study the causes of this inefficiency and propose a solution. The inefficiency mainly occurs during data movement. Transferring large data sets to computing nodes is very time-consuming and also violates the rationale of Hadoop, which is to move computation to the data. To address this issue, we developed a distributed cache system and a virtual machine scheduler. We show that our prototype can improve performance significantly when running different applications.

Key words: cloud, MapReduce, VM scheduling, data location, Hadoop

Tung Nguyen and Weisong Shi. MapReduce in the Cloud: Data-Location-Aware VM Scheduling[J]. ZTE Communications, 2013, 11(4): 18-26.

