TY - GEN
T1 - QoS-Diff
T2 - 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
AU - Huo, Pingyi
AU - Sridhar, Ajay Narayanan
AU - Khan, Md Fahim Faysal
AU - Maeng, Kiwan
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/12/28
Y1 - 2024/12/28
N2 - Diffusion models are pivotal for generating high-quality images, yet they encounter latency and throughput challenges in data center environments, particularly in meeting stringent service level objectives (SLOs). This paper introduces the Quality of Service-Diffusion (QoS-Diff) framework tailored to optimize the diffusion model's performance in data center environments. QoS-Diff dynamically adjusts processing steps per query, enhancing throughput and reducing latency while ensuring SLO compliance. It utilizes latent vector entropy for step adjustment and incorporates a 'one last step' approach to enhance image quality. Additionally, the system integrates load balancing and CPU-GPU pipelining to boost throughput further. Extensive experiments show that QoS-Diff reduces SLO violations by over 5X, cuts latency by up to 2X, and increases throughput by 1.68X compared to the Stable Diffusion baseline.
AB - Diffusion models are pivotal for generating high-quality images, yet they encounter latency and throughput challenges in data center environments, particularly in meeting stringent service level objectives (SLOs). This paper introduces the Quality of Service-Diffusion (QoS-Diff) framework tailored to optimize the diffusion model's performance in data center environments. QoS-Diff dynamically adjusts processing steps per query, enhancing throughput and reducing latency while ensuring SLO compliance. It utilizes latent vector entropy for step adjustment and incorporates a 'one last step' approach to enhance image quality. Additionally, the system integrates load balancing and CPU-GPU pipelining to boost throughput further. Extensive experiments show that QoS-Diff reduces SLO violations by over 5X, cuts latency by up to 2X, and increases throughput by 1.68X compared to the Stable Diffusion baseline.
UR - https://www.scopus.com/pages/publications/85216227548
U2 - 10.1145/3696409.3700277
DO - 10.1145/3696409.3700277
M3 - Conference contribution
AN - SCOPUS:85216227548
T3 - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
BT - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
PB - Association for Computing Machinery, Inc
Y2 - 3 December 2024 through 6 December 2024
ER -