Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu, Lifang He

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

66 Scopus citations

Abstract

Mixup (Zhang et al., 2017) is a recent data augmentation technique that linearly interpolates input examples and their corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper we explore: i) how to apply mixup to natural language processing tasks, since text data can hardly be mixed in its raw format; and ii) whether mixup is still effective in transformer-based learning models, e.g., BERT. To this end, we incorporate mixup into a transformer-based pre-trained architecture, named “mixup-transformer”, for a wide range of NLP tasks while keeping the whole training system end-to-end. We evaluate the proposed framework by running extensive experiments on the GLUE benchmark. Furthermore, we examine the performance of mixup-transformer in low-resource scenarios by reducing the training data by a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique for pre-trained language models, resulting in significant performance improvement for transformer-based models.
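
The interpolation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes mixup is applied to fixed-size sentence representations (for example, vectors produced by a transformer encoder), which the abstract does not specify, and it uses random vectors as stand-ins for those representations. The function name `mixup_batch`, the array shapes, and the Beta parameter of 0.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def mixup_batch(features, labels_onehot, lam, perm):
    """Linearly interpolate each example with a partner example.

    features:      (batch, dim) array; assumed to be sentence representations
    labels_onehot: (batch, num_classes) one-hot label array
    lam:           mixing coefficient in [0, 1]
    perm:          permutation of range(batch) that picks each example's partner
    """
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y


rng = np.random.default_rng(0)

# Stand-in data: 4 examples, 8-dimensional representations, 2 classes.
features = rng.normal(size=(4, 8))
labels = np.eye(2)[[0, 1, 1, 0]]

# Assumption: the coefficient is drawn from Beta(alpha, alpha) with alpha = 0.5.
lam = rng.beta(0.5, 0.5)
perm = rng.permutation(4)

mixed_x, mixed_y = mixup_batch(features, labels, lam, perm)
```

Because the same coefficient weights both the inputs and the labels, each mixed label row remains a valid probability distribution, and setting `lam` to 1.0 returns the original batch unchanged.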

Original language: English (US)
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics (ACL)
Pages: 3436-3440
Number of pages: 5
ISBN (Electronic): 9781952148279
State: Published - 2020
Event: 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: Dec 8 2020 to Dec 13 2020

Publication series

Name: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Conference

Conference: 28th International Conference on Computational Linguistics, COLING 2020
Country/Territory: Spain
City: Virtual, Online
Period: 12/8/20 to 12/13/20

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Theoretical Computer Science
