
Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Serverless platforms are supporting an increasing variety of applications (apps). Among these, apps such as Machine Learning (ML) inference serving can benefit significantly from leveraging accelerators like GPUs. Yet major serverless providers, despite having GPU-equipped servers, do not offer GPU support for their serverless functions. While recent works have attempted to bridge this gap, they are agnostic to the capabilities of new-generation GPUs, thereby overlooking several performance optimization opportunities. To address this, we leverage unique features of newer NVIDIA GPU architectures (specifically, their Multi-Instance GPU (MIG) and Multi-Process Service (MPS) capabilities) to devise a serverless framework, Protean, that can guarantee a higher degree of Service Level Objective (SLO) compliance than that offered by state-of-the-art works. Moreover, Protean proposes hosting its components on a combination of on-demand (reliable) VMs and heavily discounted VMs to reduce costs to the end consumer while offering high service availability. We extensively evaluate Protean using 22 ML inference workloads with real-world traces on an 8×A100 GPU cluster. Our results show that Protean significantly outperforms state-of-the-art works in terms of SLO compliance (up to ∼93% more) and tail latency (up to 82% less), while reducing cost by up to 70%. We also maintain reasonable tail latencies (< 200 ms) for best-effort requests.

Original language: English (US)
Title of host publication: Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference
Publisher: Association for Computing Machinery, Inc
Pages: 211-224
Number of pages: 14
ISBN (Electronic): 9798400706233
State: Published - Dec 2 2024
Event: 25th ACM International Middleware Conference, Middleware 2024 - Hong Kong, Hong Kong
Duration: Dec 2 2024 to Dec 6 2024

Publication series

Name: Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference

Conference

Conference: 25th ACM International Middleware Conference, Middleware 2024
Country/Territory: Hong Kong
City: Hong Kong
Period: 12/2/24 to 12/6/24

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
