TY - GEN
T1 - Learning Neural Processes on the Fly
AU - Jung, Younghwa
AU - Yuan, Zhenyuan
AU - Seo, Seung Woo
AU - Zhu, Minghui
AU - Kim, Seong Woo
N1 - Funding Information:
This work was supported by the MOTIE (Ministry of Trade, Industry, and Energy) in Korea, under the Fostering Global Talents for Innovative Growth Program (P0008747) supervised by the Korea Institute for Advancement of Technology (KIAT), and in part by the National Research Foundation of Korea (NRF) through the Ministry of Science and ICT under Grant 2021R1A2C1093957. The Institute of Engineering Research at Seoul National University provided research facilities for this work.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep neural networks (DNNs) have performed impressively on a wide range of tasks, but they usually require a large number of training samples to achieve good performance. Thus, DNNs do not work well in low-data regimes because they tend to overfit a small dataset and make poor predictions. In contrast, shallow neural networks (SNNs) are generally robust to overfitting in low-data regimes and converge more quickly than DNNs, but they struggle to represent very complex systems. Hence, DNNs and SNNs are complementary, and combining their benefits can provide fast-learning capability with high asymptotic performance, as meta-learning does. However, aggregating heterogeneous methods with opposite properties is not trivial, as a naive combination can be inferior to each base method. In this paper, we propose a new algorithm, called anytime neural processes, that combines DNNs and SNNs and works in both low-data and high-data regimes. To combine heterogeneous models effectively, we propose a novel aggregation method based on a generalized product-of-experts and a winner-take-all gate network. Moreover, we discuss the theoretical basis of the proposed method. Experiments on a public dataset show that the proposed method achieves performance comparable to other state-of-the-art methods.
AB - Deep neural networks (DNNs) have performed impressively on a wide range of tasks, but they usually require a large number of training samples to achieve good performance. Thus, DNNs do not work well in low-data regimes because they tend to overfit a small dataset and make poor predictions. In contrast, shallow neural networks (SNNs) are generally robust to overfitting in low-data regimes and converge more quickly than DNNs, but they struggle to represent very complex systems. Hence, DNNs and SNNs are complementary, and combining their benefits can provide fast-learning capability with high asymptotic performance, as meta-learning does. However, aggregating heterogeneous methods with opposite properties is not trivial, as a naive combination can be inferior to each base method. In this paper, we propose a new algorithm, called anytime neural processes, that combines DNNs and SNNs and works in both low-data and high-data regimes. To combine heterogeneous models effectively, we propose a novel aggregation method based on a generalized product-of-experts and a winner-take-all gate network. Moreover, we discuss the theoretical basis of the proposed method. Experiments on a public dataset show that the proposed method achieves performance comparable to other state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85143835214&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143835214&partnerID=8YFLogxK
U2 - 10.1109/ICCE-Asia57006.2022.9954707
DO - 10.1109/ICCE-Asia57006.2022.9954707
M3 - Conference contribution
AN - SCOPUS:85143835214
T3 - 2022 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2022
BT - 2022 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2022
Y2 - 26 October 2022 through 28 October 2022
ER -