1. A Hadoop Performance Prediction Model Based on Random Forest
Zhendong Bei, Zhibin Yu, Huiling Zhang, Chengzhong Xu, Shenzhong Feng, Zhenjiang Dong, and Hengsheng Zhang
ZTE Communications    2013, 11 (2): 38-44.   DOI: 10.3969/j.issn.1673-5188.2013.02.006
MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned, which is both time-consuming and error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, predicts the performance of a Hadoop system from the system’s configuration parameters. RFMS is built from 2000 distinct fine-grained performance observations collected under different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that RFMS achieves a prediction accuracy of 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
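The general idea of predicting runtime from configuration parameters with a random forest can be illustrated with a minimal sketch. This is not the authors' RFMS implementation: it is a toy pure-Python forest of bagged regression trees, the three parameters used (`io.sort.mb`, `io.sort.factor`, `mapred.reduce.tasks`) are an illustrative choice not named in the abstract, and the runtime formula is invented purely to generate synthetic observations in place of real measurements.

```python
import random
import statistics

def sse(values):
    """Sum of squared deviations from the mean (the split-quality measure)."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, depth, max_depth, n_features, rng):
    """Grow one regression tree; each split looks at a random feature subset."""
    if depth >= max_depth or len(rows) < 4:
        return statistics.fmean(targets)  # leaf: mean observed runtime
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        # Skip the smallest value so both sides of the split are non-empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:  # every sampled feature was constant in this node
        return statistics.fmean(targets)
    _, f, threshold = best
    low = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    high = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    children = [
        build_tree([r for r, _ in part], [t for _, t in part],
                   depth + 1, max_depth, n_features, rng)
        for part in (low, high)
    ]
    return (f, threshold, children[0], children[1])

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves floats."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, max_depth=6, seed=0):
    """Train each tree on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n_features = max(1, (2 * len(rows[0])) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest

def predict(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return statistics.fmean([predict_tree(tree, row) for tree in forest])

def synthetic_runtime(sort_mb, sort_factor, reducers, rng):
    """HYPOTHETICAL cost formula, invented only to produce demo data."""
    return (600.0 / reducers + 5.0 * reducers + 4000.0 / sort_mb
            + 200.0 / sort_factor + rng.gauss(0, 2))

# Synthetic "observations": (io.sort.mb, io.sort.factor, mapred.reduce.tasks).
rng = random.Random(42)
configs = [(rng.choice([50, 100, 150, 200, 250]),
            rng.choice([10, 20, 50, 100]),
            rng.randint(1, 8)) for _ in range(300)]
runtimes = [synthetic_runtime(*c, rng) for c in configs]

forest = fit_forest(configs[:240], runtimes[:240])

# Accuracy here means 1 minus the mean relative error on held-out configs.
errors = [abs(predict(forest, c) - t) / t
          for c, t in zip(configs[240:], runtimes[240:])]
accuracy = 1.0 - statistics.fmean(errors)
print(f"mean prediction accuracy on held-out configs: {accuracy:.1%}")
```

Once such a model is trained, candidate configurations can be scored with `predict` instead of being run on a cluster, which is the sense in which a performance model supports automatic tuning. The accuracy printed by this sketch says nothing about the figures reported for RFMS, since the data here is synthetic.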