CONCURRENT ADVERSARIAL LEARNING FOR LARGE-BATCH TRAINING

Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You

Research output: Contribution to conference › Paper › peer-review

5 Scopus citations

Abstract

Large-batch training has become a widely used technique when training neural networks with a large number of GPU/TPU processors. As the batch size increases, stochastic optimizers tend to converge to sharp local minima, leading to degraded test performance. Current methods usually use extensive data augmentation as a remedy to increase the batch size, but we found that the performance gain brought by data augmentation decreases as the batch size grows. In this paper, we propose to leverage adversarial learning to increase the batch size in large-batch training. Despite being a natural choice for smoothing the decision surface and biasing towards a flat region, adversarial learning has not been successfully applied in large-batch training because it requires at least two sequential gradient computations at each step. To overcome this issue, we propose a novel Concurrent Adversarial Learning (ConAdv) method that decouples the sequential gradient computations in adversarial learning by utilizing stale parameters. Experimental results demonstrate that ConAdv can successfully increase the batch size of ResNet-50 training on ImageNet while maintaining high accuracy. This is the first work that successfully scales ResNet-50 training to a batch size of 96K.
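To illustrate the idea of decoupling the two gradient computations via stale parameters, the following is a minimal PyTorch sketch, not the authors' implementation: the adversarial perturbation is generated against a periodically refreshed stale snapshot of the model, so that step could in principle run concurrently with the update on the current weights. The perturbation size eps, the snapshot refresh interval, the FGSM-style sign step, and the toy model are illustrative assumptions.

# Sketch only (assumed setup, not the ConAdv paper's code): perturbations are
# computed on a stale parameter snapshot, removing the strict sequential
# dependence between the two gradient computations of adversarial training.
import copy
import torch
import torch.nn as nn

def conadv_style_step(model, stale_model, optimizer, loss_fn, x, y, eps=1e-2):
    # 1) Perturbation gradient on the *stale* snapshot (could overlap with the
    #    up-to-date model's computation in a concurrent implementation).
    x_adv = x.clone().detach().requires_grad_(True)
    stale_loss = loss_fn(stale_model(x_adv), y)
    grad_x, = torch.autograd.grad(stale_loss, x_adv)
    x_adv = (x + eps * grad_x.sign()).detach()  # FGSM-style step (assumption)

    # 2) Weight update of the current model on the adversarial examples.
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: refresh the stale snapshot every few steps (interval is arbitrary).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
stale_model = copy.deepcopy(model).eval()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    conadv_style_step(model, stale_model, optimizer, loss_fn, x, y)
    if step % 10 == 0:
        stale_model = copy.deepcopy(model).eval()  # update stale parameters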

Original language: English (US)
State: Published - 2022
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: Apr 25 2022 – Apr 29 2022

Conference

Conference: 10th International Conference on Learning Representations, ICLR 2022
City: Virtual, Online
Period: 4/25/22 – 4/29/22

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
