TY - GEN
T1 - Phoenix
T2 - 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
AU - Thinakaran, Prashanth
AU - Gunasekaran, Jashwant Raj
AU - Sharma, Bikash
AU - Kandemir, Mahmut Taylan
AU - Das, Chita R.
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/13
Y1 - 2017/7/13
N2 - Today's datacenters are increasingly becoming diverse with respect to both hardware and software architectures in order to support a myriad of applications. These applications are also heterogeneous in terms of job response times and resource requirements (eg., Number of Cores, GPUs, Network Speed) and they are expressed as task constraints. Constraints are used for ensuring task performance guarantees/Quality of Service(QoS) by enabling the application to express its specific resource requirements. While several schedulers have recently been proposed that aim to improve overall application and system performance, few of these schedulers consider resource constraints across tasks while making the scheduling decisions. Furthermore, latencycritical workloads and short-lived jobs that typically constitute about 90% of the total jobs in a datacenter have strict QoS requirements, which can be ensured by minimizing the tail latency through effective scheduling. In this paper, we propose Phoenix, a constraint-aware hybrid scheduler to address both these problems (constraint awareness and ensuring low tail latency) by minimizing the job response times at constrained workers. We use a novel Constraint Resource Vector (CRV) based scheduling, which in turn facilitates reordering of the jobs in a queue to minimize tail latency. We have used the publicly available Google traces to analyze their constraint characteristics and have embedded these constraints in Cloudera and Yahoo cluster traces for studying the impact of traces on system performance. Experiments with Google, Cloudera and Yahoo cluster traces across 15,000 worker node cluster shows that Phoenix improves the 99th percentile job response times on an average by 1.9× across all three traces when compared against a state-of-the-art hybrid scheduler. Further, in comparison to other distributed scheduler like Hawk, it improves the 90th and 99th percentile job response times by 4.5× and 5× respectively.
AB - Today's datacenters are increasingly becoming diverse with respect to both hardware and software architectures in order to support a myriad of applications. These applications are also heterogeneous in terms of job response times and resource requirements (eg., Number of Cores, GPUs, Network Speed) and they are expressed as task constraints. Constraints are used for ensuring task performance guarantees/Quality of Service(QoS) by enabling the application to express its specific resource requirements. While several schedulers have recently been proposed that aim to improve overall application and system performance, few of these schedulers consider resource constraints across tasks while making the scheduling decisions. Furthermore, latencycritical workloads and short-lived jobs that typically constitute about 90% of the total jobs in a datacenter have strict QoS requirements, which can be ensured by minimizing the tail latency through effective scheduling. In this paper, we propose Phoenix, a constraint-aware hybrid scheduler to address both these problems (constraint awareness and ensuring low tail latency) by minimizing the job response times at constrained workers. We use a novel Constraint Resource Vector (CRV) based scheduling, which in turn facilitates reordering of the jobs in a queue to minimize tail latency. We have used the publicly available Google traces to analyze their constraint characteristics and have embedded these constraints in Cloudera and Yahoo cluster traces for studying the impact of traces on system performance. Experiments with Google, Cloudera and Yahoo cluster traces across 15,000 worker node cluster shows that Phoenix improves the 99th percentile job response times on an average by 1.9× across all three traces when compared against a state-of-the-art hybrid scheduler. Further, in comparison to other distributed scheduler like Hawk, it improves the 90th and 99th percentile job response times by 4.5× and 5× respectively.
UR - http://www.scopus.com/inward/record.url?scp=85027243807&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027243807&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2017.262
DO - 10.1109/ICDCS.2017.262
M3 - Conference contribution
AN - SCOPUS:85027243807
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 977
EP - 987
BT - Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
A2 - Lee, Kisung
A2 - Liu, Ling
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 June 2017 through 8 June 2017
ER -