Cocktail: A Multidimensional Optimization for Model Serving in Cloud

Jashwant Raj Gunasekaran, Cyan Subhra Mishra, Prashanth Thinakaran, Bikash Sharma, Mahmut Taylan Kandemir, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

49 Scopus citations

Abstract

With a growing demand for adopting ML models for a variety of application services, it is vital that the frameworks serving these models are capable of delivering highly accurate predictions with minimal latency along with reduced deployment costs in a public cloud environment. Despite high latency, prior works in this domain are crucially limited by the accuracy offered by individual models. Intuitively, model ensembling can address the accuracy gap by intelligently combining different models in parallel. However, selecting the appropriate models dynamically at runtime to meet the desired accuracy with low latency at minimal deployment cost is a nontrivial problem. Towards this, we propose Cocktail, a cost-effective ensembling-based model serving framework. Cocktail comprises two key components: (i) a dynamic model selection framework, which reduces the number of models in the ensemble while satisfying the accuracy and latency requirements; and (ii) an adaptive resource management (RM) framework that employs a distributed proactive autoscaling policy to efficiently allocate resources for the models. The RM framework leverages transient virtual machine (VM) instances to reduce the deployment cost in a public cloud. A prototype implementation of Cocktail on the AWS EC2 platform and exhaustive evaluations using a variety of workloads demonstrate that Cocktail can reduce deployment cost by 1.45×, while providing a 2× reduction in latency and satisfying the target accuracy for up to 96% of the requests, when compared to state-of-the-art model-serving frameworks.

Original language: English (US)
Title of host publication: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022
Publisher: USENIX Association
Pages: 1041-1057
Number of pages: 17
ISBN (Electronic): 9781939133274
State: Published - 2022
Event: 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022 - Renton, United States
Duration: Apr 4 2022 → Apr 6 2022

Publication series

Name: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022

Conference

Conference: 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022
Country/Territory: United States
City: Renton
Period: 4/4/22 → 4/6/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Control and Systems Engineering
