TY - JOUR
T1 - Unifying adversarial training algorithms with data gradient regularization
AU - Ororbia, Alexander G.
AU - Kifer, Daniel
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2017 Massachusetts Institute of Technology.
PY - 2017/4/1
Y1 - 2017/4/1
N2 - Many previous proposals for adversarial training of deep neural nets have included directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are actually all instances of optimizing a general, regularized objective we call DataGrad. Our proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad (which also has L1 and L2 flavors of regularization) outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training results in the most robust performance.
UR - http://www.scopus.com/inward/record.url?scp=85016035546&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016035546&partnerID=8YFLogxK
U2 - 10.1162/NECO_a_00928
DO - 10.1162/NECO_a_00928
M3 - Article
C2 - 28095194
AN - SCOPUS:85016035546
SN - 0899-7667
VL - 29
SP - 867
EP - 887
JO - Neural computation
JF - Neural computation
IS - 4
ER -