Quantization for Bayesian Deep Learning: Low-Precision Characterization and Robustness

  • Jun Liang Lin
  • Ranganath Krishnan
  • Keyur Ruganathbhai Ranipa
  • Mahesh Subedar
  • Vrushabh Sanghavi
  • Meena Arunachalam
  • Omesh Tickoo
  • Ravishankar Iyer
  • Mahmut Taylan Kandemir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Bayesian deep learning is an emerging approach to building robust and trustworthy AI systems because it can produce reliable uncertainty estimates for neural networks. Bayesian neural networks must model a distribution over parameters and run multiple Monte Carlo forward passes, which leads to a larger model size and a significant increase in inference latency compared to deterministic models and poses challenges for practical deployment. Quantization is a technique that can reduce model size and speed up inference through low-precision computation. In this work, we propose and evaluate a quantization framework and workflow for Bayesian deep learning workloads, which leverages 8-bit integer (INT8) operations to accelerate inference on the 4th Gen Intel Xeon Scalable processor (formerly codenamed Sapphire Rapids). We demonstrate that our quantization workflow achieves a 6.9x inference throughput speedup on the ImageNet benchmark without sacrificing model accuracy or the quality of uncertainty estimates. Furthermore, we evaluate the effects of quantization on Bayesian neural networks with respect to generalizability, robustness against data drift, and uncertainty estimation capability on large-scale datasets, including a real-world safety-critical application. Our code has been integrated into an open-source project and is available on GitHub at https://github.com/IntelLabs/bayesian-torch.
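
As a rough illustration of the two ideas the abstract combines (averaging over Monte Carlo forward passes and representing weights with 8-bit integers), the following is a minimal pure-Python sketch. It is not the authors' workflow and does not use the bayesian-torch API. All function names are hypothetical, the model is assumed to be a single linear layer with independent Gaussian weights, and INT8 is only simulated by quantizing and immediately dequantizing each sampled weight vector rather than by running integer kernels.

```python
import math
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization; returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(x, mu, sigma, num_samples, rng, quantized=False):
    """Average class probabilities over Monte Carlo weight samples.

    mu[k] and sigma[k] hold the assumed Gaussian mean and standard
    deviation of the weights feeding class k (hypothetical toy model).
    """
    n_classes = len(mu)
    mean_probs = [0.0] * n_classes
    for _ in range(num_samples):
        logits = []
        for k in range(n_classes):
            # Draw one weight vector from the assumed posterior.
            w = [rng.gauss(m, s) for m, s in zip(mu[k], sigma[k])]
            if quantized:
                # Simulated INT8: quantize, then dequantize.
                w = dequantize(*quantize_int8(w))
            logits.append(sum(wi * xi for wi, xi in zip(w, x)))
        probs = softmax(logits)
        mean_probs = [a + p / num_samples for a, p in zip(mean_probs, probs)]
    return mean_probs


def predictive_entropy(probs):
    """Entropy of the averaged prediction, a common uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Calling `mc_predict` twice with identically seeded `random.Random` instances, once with `quantized=False` and once with `quantized=True`, gives a simple way to compare the averaged probabilities and `predictive_entropy` before and after the simulated quantization on this toy model.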

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE International Symposium on Workload Characterization, IISWC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 180-192
Number of pages: 13
ISBN (Electronic): 9798350303179
State: Published - 2023
Event: 26th IEEE International Symposium on Workload Characterization, IISWC 2023 - Gent, Belgium
Duration: Oct 1 2023 - Oct 3 2023

Publication series

Name: Proceedings - 2023 IEEE International Symposium on Workload Characterization, IISWC 2023

Conference

Conference: 26th IEEE International Symposium on Workload Characterization, IISWC 2023
Country/Territory: Belgium
City: Gent
Period: 10/1/23 - 10/3/23

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
