Synthetic Augmentation with Large-Scale Unconditional Pre-training

Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism to only add synthetic samples with high confidence of matching to target labels. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.

Original languageEnglish (US)
Title of host publicationMedical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings
EditorsHayit Greenspan, Hayit Greenspan, Anant Madabhushi, Parvin Mousavi, Septimiu Salcudean, James Duncan, Tanveer Syeda-Mahmood, Russell Taylor
PublisherSpringer Science and Business Media Deutschland GmbH
Pages754-764
Number of pages11
ISBN (Print)9783031438943
DOIs
StatePublished - 2023
Event26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023 - Vancouver, Canada
Duration: Oct 8 2023Oct 12 2023

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume14221 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023
Country/TerritoryCanada
CityVancouver
Period10/8/2310/12/23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Cite this