TY - JOUR
T1 - SplitRPC
T2 - A {Control + Data} Path Splitting RPC Stack for ML Inference Serving
AU - Kumar, Adithya
AU - Sivasubramaniam, Anand
AU - Zhu, Timothy
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/6/19
Y1 - 2023/6/19
N2 - The growing adoption of hardware accelerators driven by their intelligent compiler and runtime system counterparts has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to characterize the overheads imposed by the RPC mechanism ('RPC tax') when serving them on accelerators. Conventional RPC implementations implicitly assume the host CPU services the requests, and we focus on expanding such works towards accelerator-based services. While SmartNIC based solutions work well for simple applications, serving complex ML models requires a more nuanced view to optimize both the data-path and the control/orchestration of these accelerators. We program commodity network interface cards (NICs) to split the control and data paths for effective transfer of control while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility in each of these paths, we design and implement SplitRPC-a {control + data} path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the datapath to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate that SplitRPC is effective in minimizing the RPC tax while providing significant gains in throughput and latency.
AB - The growing adoption of hardware accelerators driven by their intelligent compiler and runtime system counterparts has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to characterize the overheads imposed by the RPC mechanism ('RPC tax') when serving them on accelerators. Conventional RPC implementations implicitly assume the host CPU services the requests, and we focus on expanding such works towards accelerator-based services. While SmartNIC based solutions work well for simple applications, serving complex ML models requires a more nuanced view to optimize both the data-path and the control/orchestration of these accelerators. We program commodity network interface cards (NICs) to split the control and data paths for effective transfer of control while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility in each of these paths, we design and implement SplitRPC-a {control + data} path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the datapath to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate that SplitRPC is effective in minimizing the RPC tax while providing significant gains in throughput and latency.
UR - http://www.scopus.com/inward/record.url?scp=85164247407&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164247407&partnerID=8YFLogxK
U2 - 10.1145/3606376.3593571
DO - 10.1145/3606376.3593571
M3 - Article
AN - SCOPUS:85164247407
SN - 0163-5999
VL - 51
SP - 13
EP - 14
JO - Performance Evaluation Review
JF - Performance Evaluation Review
IS - 1
ER -