TY - GEN
T1 - Paldia
T2 - 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
AU - Bhasi, Vivek M.
AU - Sharma, Aakash
AU - Mohanty, Shruti
AU - Kandemir, Mahmut Taylan
AU - Das, Chita R.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Among the variety of applications (apps) being deployed on serverless platforms, apps such as Machine Learning (ML) inference serving can achieve better performance by leveraging accelerators like GPUs. Yet, major serverless providers, despite having GPU-equipped servers, do not offer GPU support for their serverless functions. Given that serverless functions are already deployed on various generations of CPUs, extending this to various (typically more expensive) GPU generations can offer providers a greater range of hardware with which to serve incoming requests according to the functions and request traffic. Here, providers face the challenge of selecting hardware that strikes a well-proportioned trade-off between cost and performance. While recent works have attempted to address this, they often fail to do so because they overlook optimization opportunities arising from intelligently leveraging existing GPU sharing mechanisms. To address this, we devise a heterogeneous serverless framework, PALDIA, which uses a prudent hardware selection policy to acquire capable, cost-effective hardware and performs intelligent request scheduling on it to yield high performance and cost savings. Specifically, our scheduling algorithm employs hybrid spatio-temporal GPU sharing that intelligently trades off job queueing delays and interference, allowing the chosen cost-effective hardware to also be highly performant. We extensively evaluate PALDIA using 16 ML inference workloads with real-world traces on a 6-node heterogeneous cluster. Our results show that PALDIA significantly outperforms state-of-the-art works in terms of Service Level Objective (SLO) compliance (up to 13.3% higher) and tail latency (up to ∼50% lower), with cost savings of up to 86%.
UR - http://www.scopus.com/inward/record.url?scp=85198905180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85198905180&partnerID=8YFLogxK
DO - 10.1109/IPDPS57955.2024.00018
M3 - Conference contribution
AN - SCOPUS:85198905180
T3 - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
SP - 100
EP - 113
BT - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 May 2024 through 31 May 2024
ER -