HybridMR: A hierarchical MapReduce scheduler for hybrid data centers

Bikash Sharma, Timothy Wood, Chita R. Das

Research output: Chapter in Book/Report/Conference proceedingConference contribution

57 Scopus citations

Abstract

Virtualized environments are attractive because they simplify cluster management, while facilitating cost-effective workload consolidation. As a result, virtual machines in public clouds or private data centers, have become the norm for running transactional applications like web services and virtual desktops. On the other hand, batch workloads like MapReduce, are typically deployed in a native cluster to avoid the performance overheads of virtualization. While both these virtual and native environments have their own strengths and weaknesses, we demonstrate in this work that it is feasible to provide the best of these two computing paradigms in a hybrid platform. In this paper, we make a case for a hybrid data center consisting of native and virtual environments, and propose a 2-phase hierarchical scheduler, called HybridMR, for the effective resource management of interactive and batch workloads. In the first phase, HybridMR classifies incoming MapReduce jobs based on the expected virtualization overheads, and uses this information to automatically guide placement between physical and virtual machines. In the second phase, HybridMR manages the run-time performance of MapReduce jobs collocated with interactive applications in order to provide best effort delivery to batch jobs, while complying with the Service Level Agreements (SLAs) of interactive applications. By consolidating batch jobs with over-provisioned foreground applications, the available unused resources are better utilized, resulting in improved application performance and energy efficiency. Evaluations on a hybrid cluster consisting of 24 physical servers and 48 virtual machines, with diverse workload mix of interactive and batch MapReduce applications, demonstrate that HybridMR can achieve up to 40% improvement in the completion times of MapReduce jobs, over the virtual-only case, while complying with the SLAs of interactive applications. Compared to the native-only cluster, at the cost of minimal performance penalty, HybridMR boosts resource utilization by 45%, and achieves up to 43% energy savings. These results indicate that a hybrid data center with an efficient scheduling mechanism can provide a cost-effective solution for hosting both batch and interactive workloads.

Original languageEnglish (US)
Title of host publicationProceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013
Pages102-111
Number of pages10
DOIs
StatePublished - 2013
Event2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013 - Philadelphia, PA, United States
Duration: Jul 8 2013Jul 11 2013

Publication series

NameProceedings - International Conference on Distributed Computing Systems

Other

Other2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013
Country/TerritoryUnited States
CityPhiladelphia, PA
Period7/8/137/11/13

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'HybridMR: A hierarchical MapReduce scheduler for hybrid data centers'. Together they form a unique fingerprint.

Cite this