TY - GEN
T1 - LazyDP
T2 - 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
AU - Lim, Juntaek
AU - Kwon, Youngeun
AU - Hwang, Ranggi
AU - Maeng, Kiwan
AU - Suh, Edward
AU - Rhu, Minsoo
N1 - Publisher Copyright:
© 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/4/27
Y1 - 2024/4/27
N2 - Differential privacy (DP) is being widely employed in industry as a practical standard for privacy protection. While private training of computer vision and natural language processing applications has been studied extensively, the computational challenges of training recommender systems (RecSys) with DP have not been explored. In this work, we first present a detailed characterization of private RecSys training using DP-SGD, root-causing several of its performance bottlenecks. Specifically, we identify that DP-SGD's noise sampling and noisy gradient update stages suffer from severe compute and memory bandwidth limitations, respectively, causing significant performance overhead when training private RecSys. Based on these findings, we propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD. Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119× improvement in training throughput while also ensuring that mathematically equivalent, differentially private RecSys models are trained.
AB - Differential privacy (DP) is being widely employed in industry as a practical standard for privacy protection. While private training of computer vision and natural language processing applications has been studied extensively, the computational challenges of training recommender systems (RecSys) with DP have not been explored. In this work, we first present a detailed characterization of private RecSys training using DP-SGD, root-causing several of its performance bottlenecks. Specifically, we identify that DP-SGD's noise sampling and noisy gradient update stages suffer from severe compute and memory bandwidth limitations, respectively, causing significant performance overhead when training private RecSys. Based on these findings, we propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD. Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119× improvement in training throughput while also ensuring that mathematically equivalent, differentially private RecSys models are trained.
UR - http://www.scopus.com/inward/record.url?scp=85192135846&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192135846&partnerID=8YFLogxK
U2 - 10.1145/3620665.3640384
DO - 10.1145/3620665.3640384
M3 - Conference contribution
AN - SCOPUS:85192135846
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 616
EP - 630
BT - Summer Cycle
PB - Association for Computing Machinery
Y2 - 27 April 2024 through 1 May 2024
ER -