TY - JOUR
T1 - Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations
AU - Ororbia, Alexander
AU - Mali, Ankur
AU - Giles, C. Lee
AU - Kifer, Daniel
N1 - Funding Information:
Manuscript received December 10, 2018; revised August 8, 2019; accepted November 2, 2019. Date of publication January 20, 2020; date of current version October 6, 2020. This work was supported in part by the National Science Foundation. (Alexander Ororbia and Ankur Mali contributed equally to this work.) (Corresponding author: Alexander Ororbia.) A. Ororbia is with the Department of Computer Science, Rochester Institute of Technology, Rochester, NY 14623 USA (e-mail: ago@cs.rit.edu). A. Mali, C. L. Giles, and D. Kifer are with Pennsylvania State University, State College, PA 16801 USA. Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org. This article has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors. Digital Object Identifier 10.1109/TNNLS.2019.2953622
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models often relies on backpropagation through time (BPTT), which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of backpropagation itself does not permit the use of nondifferentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the parallel temporal neural coding network (P-TNCN), a biologically inspired model trained by the learning algorithm we call local representation alignment. It aims to resolve the difficulties that plague recurrent networks trained by BPTT. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure with other BPTT alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these alternatives on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new data sets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.
AB - Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models often relies on backpropagation through time (BPTT), which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of backpropagation itself does not permit the use of nondifferentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the parallel temporal neural coding network (P-TNCN), a biologically inspired model trained by the learning algorithm we call local representation alignment. It aims to resolve the difficulties that plague recurrent networks trained by BPTT. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure with other BPTT alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these alternatives on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new data sets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.
UR - http://www.scopus.com/inward/record.url?scp=85092679798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092679798&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2019.2953622
DO - 10.1109/TNNLS.2019.2953622
M3 - Article
C2 - 31976910
AN - SCOPUS:85092679798
SN - 2162-237X
VL - 31
SP - 4267
EP - 4278
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
M1 - 8963851
ER -