TY - GEN
T1 - Mixup-Transformer
T2 - 28th International Conference on Computational Linguistics, COLING 2020
AU - Sun, Lichao
AU - Xia, Congying
AU - Yin, Wenpeng
AU - Liang, Tingting
AU - Yu, Philip S.
AU - He, Lifang
N1 - Funding Information:
This work is supported in part by NSF under grants III-1763325, III-1909323, and SaTC-1930941.
Publisher Copyright:
© 2020 COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference. All rights reserved.
PY - 2020
Y1 - 2020
AB - Mixup (Zhang et al., 2017) is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper we explore: i) how to apply mixup to natural language processing tasks, since text data can hardly be mixed in its raw format; and ii) whether mixup is still effective in transformer-based learning models such as BERT. To this end, we incorporate mixup into a transformer-based pre-trained architecture, named “mixup-transformer”, for a wide range of NLP tasks while keeping the whole training system end-to-end. We evaluate the proposed framework by running extensive experiments on the GLUE benchmark. Furthermore, we examine the performance of mixup-transformer in low-resource scenarios by reducing the training data by a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique for pre-trained language models, resulting in significant performance improvements for transformer-based models.
UR - http://www.scopus.com/inward/record.url?scp=85149663227&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149663227&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85149663227
T3 - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
SP - 3436
EP - 3440
BT - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
A2 - Scott, Donia
A2 - Bel, Nuria
A2 - Zong, Chengqing
PB - Association for Computational Linguistics (ACL)
Y2 - 8 December 2020 through 13 December 2020
ER -
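
The abstract above describes mixup as linear interpolation of input examples and their labels, applied to transformer representations rather than to raw text. Below is a minimal, illustrative Python sketch of that interpolation step, not the authors' released code; the function name, the choice of mixing pooled [CLS]-style vectors, and the Beta(0.4, 0.4) mixing coefficient are assumptions made for the example.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_hidden_states(h, y, num_classes, alpha=0.4):
    """Interpolate sentence-level hidden states and one-hot labels.

    h: (batch, hidden_dim) pooled transformer representations (e.g. a [CLS]-style vector).
    y: (batch,) integer class labels.
    Returns mixed features and mixed soft labels. The mixing point and
    alpha value are illustrative assumptions, not the paper's exact settings.
    """
    lam = Beta(alpha, alpha).sample().item()                # mixing coefficient lambda
    perm = torch.randperm(h.size(0))                        # random pairing within the batch
    y_onehot = F.one_hot(y, num_classes).float()
    h_mix = lam * h + (1.0 - lam) * h[perm]                 # interpolate representations
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]   # interpolate labels
    return h_mix, y_mix

# Toy usage: mix stand-in pooled features for a 2-class task and train a
# linear classifier with a soft-label cross-entropy loss.
if __name__ == "__main__":
    h = torch.randn(8, 768)                                 # stand-in for BERT pooled vectors
    y = torch.randint(0, 2, (8,))
    classifier = torch.nn.Linear(768, 2)
    h_mix, y_mix = mixup_hidden_states(h, y, num_classes=2)
    logits = classifier(h_mix)
    loss = -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    loss.backward()
    print(float(loss))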