TY - GEN
T1 - Implications of Public Cloud Resource Heterogeneity for Inference Serving
AU - Gunasekaran, Jashwant Raj
AU - Mishra, Cyan Subhra
AU - Thinakaran, Prashanth
AU - Kandemir, Mahmut Taylan
AU - Das, Chita R.
N1 - Funding Information:
Acknowledgement: This research was partially supported by NSF grants #1931531, #1955815, #1629129, #1763681, #1629915, #1908793, #1526750, and we thank the NSF Chameleon Cloud project CH-819640 for their generous compute grant.
Publisher Copyright:
© 2020 ACM.
PY - 2020/12/7
Y1 - 2020/12/7
N2 - We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning different application domains, including product recommendation systems, personal assistant devices, and facial recognition. These applications typically have diverse requirements in terms of accuracy and response latency, which can be satisfied by a myriad of ML models. However, the deployment cost of prediction serving primarily depends on the type of resources being procured, which are themselves heterogeneous in terms of provisioning latencies and billing complexity. Thus, it is strenuous for an inference serving system to choose from this confounding array of resource types and model types to provide low-latency and cost-effective inferences. In this work, we quantitatively characterize the cost, accuracy, and latency implications of hosting ML inferences on different public cloud resource offerings. Our evaluation shows that prior work does not solve the problem from both dimensions of model and resource heterogeneity. Hence, to address this problem holistically, we need to solve the issues that arise from combining both model and resource heterogeneity towards optimizing for application constraints. Towards this, we discuss the design implications of a self-managed inference serving system, which can optimize for application requirements based on public cloud resource characteristics.
AB - We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning different application domains, including product recommendation systems, personal assistant devices, and facial recognition. These applications typically have diverse requirements in terms of accuracy and response latency, which can be satisfied by a myriad of ML models. However, the deployment cost of prediction serving primarily depends on the type of resources being procured, which are themselves heterogeneous in terms of provisioning latencies and billing complexity. Thus, it is strenuous for an inference serving system to choose from this confounding array of resource types and model types to provide low-latency and cost-effective inferences. In this work, we quantitatively characterize the cost, accuracy, and latency implications of hosting ML inferences on different public cloud resource offerings. Our evaluation shows that prior work does not solve the problem from both dimensions of model and resource heterogeneity. Hence, to address this problem holistically, we need to solve the issues that arise from combining both model and resource heterogeneity towards optimizing for application constraints. Towards this, we discuss the design implications of a self-managed inference serving system, which can optimize for application requirements based on public cloud resource characteristics.
UR - http://www.scopus.com/inward/record.url?scp=85099607718&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099607718&partnerID=8YFLogxK
U2 - 10.1145/3429880.3430093
DO - 10.1145/3429880.3430093
M3 - Conference contribution
AN - SCOPUS:85099607718
T3 - WOSC 2020 - Proceedings of the 2020 6th International Workshop on Serverless Computing, Part of Middleware 2020
SP - 7
EP - 12
BT - WOSC 2020 - Proceedings of the 2020 6th International Workshop on Serverless Computing, Part of Middleware 2020
PB - Association for Computing Machinery, Inc
T2 - 6th International Workshop on Serverless Computing, WOSC 2020 - Part of Middleware 2020
Y2 - 7 December 2020 through 11 December 2020
ER -