MapReduce in the Cloud: Data-Location-Aware VM Scheduling

DOI: 10.3939/j.issn.1673-5188.2013.04.003

ZTE Communications, 2013, Vol. 11, Issue (4): 18-26. DOI: 10.3939/j.issn.1673-5188.2013.04.003

Special Topic

Tung Nguyen and Weisong Shi

Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

Received: 2013-04-22; Online: 2013-12-25; Published: 2013-12-25
About the authors: Tung Nguyen (tnguyen@i-a-i.com) is a research scientist at Intelligent Automation Inc. He plays a key role in many projects on data-intensive distributed processing, cloud computing, and mass-data analysis using topological features. Dr. Nguyen received his PhD degree from Wayne State University in 2012. He received his BS and MS degrees in computer science and engineering from Ho Chi Minh City University of Technology, Vietnam, in 2001 and 2006, respectively. His research interests include green computing, cloud computing, data-intensive computing, and the application of cloud computing to life sciences. He has published several papers on computer science and bioinformatics, with work appearing in the proceedings of OSDI and in NPC, SUSCOM, and BMC Frontiers Genetics journals. He has also been a peer reviewer for many conferences, including Euro-Par and CollaborateCom. His homepage is http://www.cs.wayne.edu/tung/

Weisong Shi (weisong@wayne.edu) is an associate professor of computer science at Wayne State University. He received his BS degree in computer engineering from Xidian University in 1995. He received his PhD degree in computer engineering from the Chinese Academy of Sciences in 2000. His research interests include computer systems, mobile computing, and cloud computing. Dr. Shi has published 120 peer-reviewed journal and conference papers and has an H-index of 24. He has been the program chair and technical program committee member of numerous international conferences, including WWW and ICDCS. In 2002, he received the NSF CAREER award for outstanding PhD dissertation (China). In 2009, he received the Career Development Chair Award of Wayne State University. He has also won the Best Paper Award at ICWE’04, IPDPS’05, HPCChina’12, and IISWC’12.

Abstract: We have witnessed the fast-growing deployment of Hadoop, an open-source implementation of the MapReduce programming model, for data-intensive computing in the cloud. However, Hadoop was not originally designed to run transient jobs in which users need to move data back and forth between storage and computing facilities. As a result, Hadoop is inefficient and wastes resources when operating in the cloud. This paper discusses the inefficiency of MapReduce in the cloud. We study the causes of this inefficiency and propose a solution. The inefficiency mainly occurs during data movement. Transferring large data sets to computing nodes is very time-consuming and also violates the rationale of Hadoop, which is to move computation to the data. To address this issue, we developed a distributed cache system and a virtual machine scheduler. We show that our prototype can improve performance significantly when running different applications.

Key words: cloud, MapReduce, VM scheduling, data location, Hadoop

Cite this article: Tung Nguyen and Weisong Shi. MapReduce in the Cloud: Data-Location-Aware VM Scheduling [J]. ZTE Communications, 2013, 11(4): 18-26.
