TY - GEN
T1 - Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures
AU - Bhasi, Vivek M.
AU - Sharma, Aakash
AU - Jain, Rishabh
AU - Gunasekaran, Jashwant Raj
AU - Pattnaik, Ashutosh
AU - Kandemir, Mahmut Taylan
AU - Das, Chita
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/12/2
Y1 - 2024/12/2
N2 - Serverless platforms are supporting an increasing variety of applications (apps). Among these, apps such as Machine Learning (ML) inference serving can benefit significantly from leveraging accelerators like GPUs. Yet, major serverless providers, despite having GPU-equipped servers, do not offer GPU support for their serverless functions. While recent works have attempted to bridge this gap, they are agnostic to the capabilities of new-generation GPUs, thereby overlooking several performance optimization opportunities. To address this, we leverage unique features of newer NVIDIA GPU architectures (specifically, their Multi-Instance GPU (MIG) and Multi-Process Service (MPS) capabilities) to devise a serverless framework, Protean, that can guarantee a higher degree of Service Level Objective (SLO) compliance than that offered by state-of-the-art works. Moreover, Protean proposes to host its components on a combination of both on-demand (reliable) VMs and heavily discounted VMs to reduce costs to the end consumer, while offering high service availability. We extensively evaluate Protean using 22 ML inference workloads with real-world traces on an 8×A100 GPU cluster. Our results show that Protean significantly outperforms state-of-the-art works in terms of SLO compliance (up to ∼93% more) and tail latency (up to 82% less), while reducing cost by up to 70%. We also maintain reasonable tail latencies (< 200 ms) for best effort requests.
UR - https://www.scopus.com/pages/publications/85215529068
U2 - 10.1145/3652892.3700760
DO - 10.1145/3652892.3700760
M3 - Conference contribution
AN - SCOPUS:85215529068
T3 - Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference
SP - 211
EP - 224
BT - Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference
PB - Association for Computing Machinery, Inc
T2 - 25th ACM International Middleware Conference, Middleware 2024
Y2 - 2 December 2024 through 6 December 2024
ER -