TY - GEN
T1 - SplitServe
T2 - 21st International Middleware Conference, Middleware 2020
AU - Jain, Aman
AU - Urgaonkar, Bhuvan
AU - Baarzi, Ata F.
AU - Alfares, Nader
AU - Kesidis, George
AU - Kandemir, Mahmut
N1 - Publisher Copyright: © 2020 Association for Computing Machinery.
PY - 2020/12/7
Y1 - 2020/12/7
N2 - Due to their lower startup latencies and finer-grained pricing than virtual machines (VMs), Amazon Lambdas and other cloud functions (CFs) have been identified as ideal candidates for handling unexpected spikes in simple, stateless workloads. However, it is not immediately clear whether CFs would be similarly effective in autoscaling complex workloads involving significant state transfer across distributed application components. We have found that, through careful design, currently available CFs can indeed be useful even for complex workloads. To demonstrate this, we design and implement SplitServe, an enhancement of Apache Spark. If not enough executors on existing VMs are available for a newly arriving latency-sensitive job, SplitServe is able to use CFs to quickly bridge this shortfall in VMs, thereby avoiding the startup latencies of newly requested VMs. If desirable in terms of performance or cost, when newly requested VMs, or executors on existing VMs, do become available, SplitServe is able to move ongoing work from CFs to them. Our experimental evaluation of SplitServe using four different workloads (either on a mixture of VM-based executors and CFs or just CFs) shows that it improves execution time by up to (a) 55% for workloads with small to modest amounts of shuffling, and (b) 31% for workloads with large amounts of shuffling, when compared to VM-based autoscaling alone.
UR - http://www.scopus.com/inward/record.url?scp=85098494085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098494085&partnerID=8YFLogxK
U2 - 10.1145/3423211.3425695
DO - 10.1145/3423211.3425695
M3 - Conference contribution
AN - SCOPUS:85098494085
T3 - Middleware 2020 - Proceedings of the 2020 21st International Middleware Conference
SP - 236
EP - 250
BT - Middleware 2020 - Proceedings of the 2020 21st International Middleware Conference
PB - Association for Computing Machinery, Inc
Y2 - 7 December 2020 through 11 December 2020
ER -