TY - JOUR
T1 - Noninvasive MapReduce Performance Tuning Using Multiple Tuning Methods on Hadoop
AU - Chen, Donghua
AU - Zhang, Runtong
AU - Qiu, Robin Guanghua
N1 - Funding Information:
Manuscript received January 28, 2020; revised June 18, 2020 and August 29, 2020; accepted August 30, 2020. Date of publication September 25, 2020; date of current version June 7, 2021. This work was supported in part by the National Natural Science Foundation of China through a key project under Grant 71532002 and in part by the National Social Science Foundation of China through a major project under Grant 18ZDA086. (Corresponding author: Runtong Zhang.) Donghua Chen is with the School of Information Technology and Management, University of International Business and Economics, Beijing 100029, China (e-mail: [email protected]).
Publisher Copyright:
© 2007-2012 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - There are more than 190 configuration parameters affecting the performance of MapReduce jobs on Hadoop. It is time-consuming and tedious for general users who have no deep knowledge of Hadoop configuration to tune the parameters of a MapReduce job for optimal performance. Therefore, a self-tuning system to improve MapReduce performance in an automated and efficient manner in a complicated Hadoop environment is needed. This article explores multiple tuning methods to improve tuning efficiency for MapReduce performance on Hadoop. The proposed Catla system employs succinct templates and proper schemes of MapReduce algorithms, which can be incorporated in facilitating the tuning and optimization of MapReduce performance. A comprehensive evaluation of the Catla system, with the support of multiple tuning approaches, is discussed in this article. Direct search-based and derivative-free optimization-based tuning techniques for improved efficiency and usability are evaluated using a series of tuning experiments. The experimental results reveal that our work can identify optimal Hadoop parameters for deployed MapReduce jobs in a noninvasive, flexible, automated, and comprehensive manner.
AB - There are more than 190 configuration parameters affecting the performance of MapReduce jobs on Hadoop. It is time-consuming and tedious for general users who have no deep knowledge of Hadoop configuration to tune the parameters of a MapReduce job for optimal performance. Therefore, a self-tuning system to improve MapReduce performance in an automated and efficient manner in a complicated Hadoop environment is needed. This article explores multiple tuning methods to improve tuning efficiency for MapReduce performance on Hadoop. The proposed Catla system employs succinct templates and proper schemes of MapReduce algorithms, which can be incorporated in facilitating the tuning and optimization of MapReduce performance. A comprehensive evaluation of the Catla system, with the support of multiple tuning approaches, is discussed in this article. Direct search-based and derivative-free optimization-based tuning techniques for improved efficiency and usability are evaluated using a series of tuning experiments. The experimental results reveal that our work can identify optimal Hadoop parameters for deployed MapReduce jobs in a noninvasive, flexible, automated, and comprehensive manner.
UR - http://www.scopus.com/inward/record.url?scp=85110826835&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110826835&partnerID=8YFLogxK
U2 - 10.1109/JSYST.2020.3022286
DO - 10.1109/JSYST.2020.3022286
M3 - Article
AN - SCOPUS:85110826835
SN - 1932-8184
VL - 15
SP - 2906
EP - 2917
JO - IEEE Systems Journal
JF - IEEE Systems Journal
IS - 2
M1 - 9205847
ER -