Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers

Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang

Research output: Contribution to conferencePaperpeer-review

297 Scopus citations

Abstract

Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely-used practice in relevant work assumes that a smaller-norm parameter or feature plays a less informative role at the inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN without the need of performing a computationally difficult and not-always-useful task of making high-dimensional tensors of CNN structured sparse. Our approach takes two stages: first to adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant, and then to prune those constant channels from the original neural network by adjusting the biases of their impacting layers such that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance.

Original languageEnglish (US)
StatePublished - Jan 1 2018
Event6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018May 3 2018

Conference

Conference6th International Conference on Learning Representations, ICLR 2018
Country/TerritoryCanada
CityVancouver
Period4/30/185/3/18

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers'. Together they form a unique fingerprint.

Cite this