Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks

Ankur Mali, Alexander Ororbia, Daniel Kifer, Lee Giles

Research output: Contribution to journalConference articlepeer-review

2 Scopus citations


Artificial neural networks, such as recurrent neural networks (RNNs) and transformers, have empirically demonstrated impressive performance across many natural language processing tasks. However, automated text processing at a deeper and more interpretable level arguably requires extracting intricate patterns such as underlying grammatical structures. As a result, correctly interpreting a neural language model would require an understanding of linguistic structure through formal language theory. Nonetheless, there is often a discrepancy between theoretical and practical findings that restrict models informed by formal language theory in real-life scenarios. For instance, while learning context-free grammars (CFGs), existing neural models fall short because they lack appropriate memory structures. In this work, we investigate how learning algorithms affect the generalization ability of RNNs that are designed to learn context-free languages (CFGs) as well as their ability to encode hierarchical representations. To do so, we investigate a range of learning algorithms on complex, context-free languages such as the Dyck languages, with a focus on the RNN’s ability to generalize to longer sequences when processing a CFG. Our results demonstrate that a Long Short-term memory (LSTM) RNN equipped with second-order connections, trained with the sparse attentive backtracking (SAB) algorithm, performs stably across various Dyck languages and successfully emulates real-time-counter machines. We empirically show that RNNs without external memory are incapable of recognizing Dyck-2 languages, which require a stack-like structure. We finally investigate each learning algorithm’s performance on real-world language modeling tasks using the Penn Tree Bank and text8 benchmarks. We further investigate how an increase in model parameters affects each RNN’s stability and grammar recognition performance when trained using different learning algorithms.

Original languageEnglish (US)
Pages (from-to)154-175
Number of pages22
JournalProceedings of Machine Learning Research
StatePublished - 2021
Event15th International Conference on Grammatical Inference, ICGI 2020/21 - Virtual, Online
Duration: Aug 23 2021Aug 27 2021

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Cite this