TY - GEN
T1 - CASH
T2 - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
AU - Sharma, Aakash
AU - Dhakshinamurthy, Saravanan
AU - Kesidis, George
AU - Das, Chita R.
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
AB - Distributed data processing frameworks such as Hadoop, Tez, Spark, and Flink are widely used by public cloud tenants to run large-scale data analytics applications in domains including, but not limited to, content management, finance, and healthcare. These frameworks slice a job into a number of smaller tasks, which a job scheduler then executes on a multi-node compute cluster. When making scheduling decisions, the state-of-the-art schedulers employed in these frameworks assume that hardware resources such as CPU, disk I/O, and network I/O offer a fixed service rate. However, in a public cloud environment, many of these resources have burstable service rates. More specifically, the resources offer a guaranteed baseline service rate with an option to burst above that baseline by expending accumulated burst credits. Being unaware of this underlying hardware burstability, schedulers tend to make sub-optimal task placement decisions, which adversely affects job completion times and leads to higher deployment costs. In this paper, we propose CASH, a burst-credit-aware scheduler that is cognizant of the burst credits associated with the individual hardware resources in a public cloud cluster. Using coarse-grained task annotations that describe the burst credit demand of individual tasks, and by dynamically monitoring the credits of the underlying resources, CASH makes optimal task placement decisions. We prototype CASH on YARN, Hadoop, and Tez, and extensively evaluate it using both batch and streaming workloads. Our experimental results with CASH show that CPU-credit-based instances, such as AWS T3, are a viable, cost-effective alternative to self-managed offerings like Amazon EMR for running large-scale batch workloads. Furthermore, we demonstrate that CASH can accelerate streaming SQL queries on a large Hive database by up to 39.4%, leading to public cloud cost savings of up to 22%.
UR - http://www.scopus.com/inward/record.url?scp=85114898876&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114898876&partnerID=8YFLogxK
U2 - 10.1109/CCGrid51090.2021.00032
DO - 10.1109/CCGrid51090.2021.00032
M3 - Conference contribution
AN - SCOPUS:85114898876
T3 - Proceedings - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
SP - 227
EP - 236
BT - Proceedings - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
A2 - Lefevre, Laurent
A2 - Patterson, Stacy
A2 - Lee, Young Choon
A2 - Shen, Haiying
A2 - Ilager, Shashikant
A2 - Goudarzi, Mohammad
A2 - Toosi, Adel N.
A2 - Buyya, Rajkumar
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 May 2021 through 13 May 2021
ER -