TY - GEN
T1 - CNNs with Compact Activation Function
AU - Wang, Jindong
AU - Xu, Jinchao
AU - Zhu, Jianqing
N1 - Funding Information:
Acknowledgment. The work of Jinchao Xu is supported in part by the National Science Foundation (Grant No. DMS-2111387). The work of Jianqing Zhu is supported in part by Beijing Natural Science Foundation (Grant No. Z200002). The work of Jindong Wang is supported in part by High Performance Computing Platform of Peking University.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The activation function plays an important role in neural networks. We propose to use the hat activation function, namely the first-order B-spline, as the activation function for CNNs, including MgNet and ResNet. Unlike commonly used activation functions such as ReLU, the hat function has compact support and no obvious spectral bias. Although spectral bias is thought to be beneficial for generalization, our classification experiments on the MNIST, CIFAR10/100, and ImageNet datasets show that MgNet and ResNet with the hat function still exhibit slightly better generalization performance than CNNs with the ReLU function. This indicates that CNNs without spectral bias can have good generalization capability. We also illustrate that although the hat function has a small activation area, which makes it more likely to induce the vanishing gradient problem, hat CNNs with various initialization methods still work well.
AB - The activation function plays an important role in neural networks. We propose to use the hat activation function, namely the first-order B-spline, as the activation function for CNNs, including MgNet and ResNet. Unlike commonly used activation functions such as ReLU, the hat function has compact support and no obvious spectral bias. Although spectral bias is thought to be beneficial for generalization, our classification experiments on the MNIST, CIFAR10/100, and ImageNet datasets show that MgNet and ResNet with the hat function still exhibit slightly better generalization performance than CNNs with the ReLU function. This indicates that CNNs without spectral bias can have good generalization capability. We also illustrate that although the hat function has a small activation area, which makes it more likely to induce the vanishing gradient problem, hat CNNs with various initialization methods still work well.
UR - http://www.scopus.com/inward/record.url?scp=85134349788&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134349788&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-08754-7_40
DO - 10.1007/978-3-031-08754-7_40
M3 - Conference contribution
AN - SCOPUS:85134349788
SN - 9783031087530
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 319
EP - 327
BT - Computational Science - ICCS 2022, 22nd International Conference, Proceedings
A2 - Groen, Derek
A2 - de Mulatier, Clélia
A2 - Krzhizhanovskaya, Valeria V.
A2 - Sloot, Peter M.A.
A2 - Paszynski, Maciej
A2 - Dongarra, Jack J.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd Annual International Conference on Computational Science, ICCS 2022
Y2 - 21 June 2022 through 23 June 2022
ER -