QoS-Diff: Adaptive Auto-tuning Framework for Low-latency Diffusion Model Inference

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Diffusion models are pivotal for generating high-quality images, yet they encounter latency and throughput challenges in data center environments, particularly in meeting stringent service level objectives (SLOs). This paper introduces the Quality of Service-Diffusion (QoS-Diff) framework tailored to optimize the diffusion model's performance in data center environments. QoS-Diff dynamically adjusts processing steps per query, enhancing throughput and reducing latency while ensuring SLO compliance. It utilizes latent vector entropy for step adjustment and incorporates a 'one last step' approach to enhance image quality. Additionally, the system integrates load balancing and CPU-GPU pipelining to boost throughput further. Extensive experiments show that QoS-Diff reduces SLO violations by over 5X, cuts latency by up to 2X, and increases throughput by 1.68X compared to the Stable Diffusion baseline.
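
The abstract names two ingredients of the step adjustment: an entropy measure over the latent vector and a per-query step count that respects the SLO. The following is only a minimal sketch of those two ideas in isolation, not the authors' implementation. The function names, the histogram-based entropy estimate, the bin count, and the step bounds are all assumptions made for illustration; the abstract does not say how QoS-Diff combines entropy with the latency budget.

```python
import math
from collections import Counter


def latent_entropy(latent, bins=16):
    """Shannon entropy (in bits) of a histogram over the latent values.

    Hypothetical estimator: the abstract does not specify how entropy is computed.
    """
    lo, hi = min(latent), max(latent)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin, zero entropy
    width = (hi - lo) / bins
    # Assign each value to a bin; the maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in latent)
    n = len(latent)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def steps_within_budget(budget_ms, step_ms, min_steps=10, max_steps=50):
    """Largest step count that fits the latency budget, clamped to [min_steps, max_steps].

    Hypothetical policy: min_steps acts as an assumed quality floor, so a very
    tight budget can still exceed the SLO.
    """
    affordable = int(budget_ms // step_ms)
    return max(min_steps, min(max_steps, affordable))


if __name__ == "__main__":
    print(latent_entropy([0.0, 1.0, 2.0, 3.0], bins=4))
    print(steps_within_budget(budget_ms=1000, step_ms=40))
```

A serving loop could call `steps_within_budget` once per query with the time remaining before its deadline, and consult `latent_entropy` between denoising steps; how the two signals are weighed against each other is left open here because the abstract does not describe it.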

Original language: English (US)
Title of host publication: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400712739
State: Published - Dec 28 2024
Event: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 - Auckland, New Zealand
Duration: Dec 3 2024 to Dec 6 2024

Publication series

Name: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024

Conference

Conference: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Country/Territory: New Zealand
City: Auckland
Period: 12/3/24 to 12/6/24

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
