Abstract
To learn complex formal grammars, recurrent neural networks (RNNs) require sufficient computational resources to ensure correct grammar recognition. One approach to expand model capacity is to couple an RNN to an external stack memory. Here, we introduce a “neural state” pushdown automaton (NSPDA), which consists of a discrete stack instead of an continuous one and is coupled to a neural network state machine. We empirically show its effectiveness in recognizing various context-free grammars (CFGs). First, we develop the underlying mechanics of the proposed higher order recurrent network and its manipulation of a stack as well as how to stably program its underlying pushdown automaton (PDA). We also introduce a noise regularization scheme for higher-order (tensor) networks and design an algorithm for improved incremental learning. Finally, we design a method for inserting grammar rules into a NSPDA and empirically show that this prior knowledge improves its training convergence time by an order of magnitude and, in some cases, leads to better generalization. The NSPDA is also compared to a classical analog stack neural network pushdown automaton (NNPDA) as well as a wide array of first and second-order RNNs with and without external memory, trained using different learning algorithms. Our results show that for the Dyck languages, prior rule-based knowledge is critical for optimization convergence and for ensuring generalization to longer sequences at test time. We observe that many RNNs with and without memory, but no prior knowledge, struggle to converge and generalize on complex and longer CFGs. Impact Statement—The use of machine learning methods in real world applications continues to grow. Real world applications often require such systems to use and understand temporal processes and to use memory structures. Context free grammars are important parts of many real world applications such as control systems, natural language processing, and software. This article addresses how well temporal machine learning models such as recurrent neural networks (RNNs) learn pushdown automata and their associated context free grammars by connecting to and learning to use an external stack memory. Showing how knowledge and memory structures can be incorporated in machine learning models gives insights as how prior knowledge can be effectively used for training and knowledge extraction. Such work aids in the explainability and verification of these machine learning methods and builds trust in their use.
Original language | English (US) |
---|---|
Pages (from-to) | 193-205 |
Number of pages | 13 |
Journal | IEEE Transactions on Artificial Intelligence |
Volume | 1 |
Issue number | 3 |
DOIs | |
State | Published - Dec 2020 |
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Artificial Intelligence