CompCog: Modeling syntactic priming in language production according to corpus data

  • Reitter, David T. (PI)

Project: Research project

Project Details


How do humans learn, understand, and produce language? What mental representations are necessary to compose natural-sounding sentences? Addressing such fundamental questions from cognitive linguistics is key to improving second-language learning in a nation that is becoming increasingly linguistically diverse, and to developing better natural-language computer interfaces. This project takes a big-data approach: it examines adaptation between speakers and authors in recorded conversations and large text databases in order to infer mental representations. It rests on the basic idea that adaptation indicates the presence of such mental structures. With this methodology, the researchers will use large text datasets as a keyhole into the human mind. They will create an unbiased and largely automatic way to evaluate computational models that describe how the mind achieves fast, fluent, near-perfect language production.
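The idea of reading adaptation off recorded dialogue can be illustrated with a minimal sketch. It assumes a hypothetical input format (a list of (speaker, construction) pairs in temporal order, with each dative sentence already labeled "PO" for prepositional object or "DO" for double object) and measures between-speaker priming as a raw difference in conditional probabilities. This is not the project's method; corpus studies typically fit regression models that also control for speaker, verb, and distance effects.

```python
from collections import Counter


def priming_table(dialogue):
    """Count (prime, target) construction pairs across adjacent turns.

    dialogue: hypothetical list of (speaker, construction) tuples in order.
    A pair is counted only when the target comes from a different speaker
    than the prime, i.e. between-speaker adaptation.
    """
    pairs = Counter()
    for (s1, c1), (s2, c2) in zip(dialogue, dialogue[1:]):
        if s1 != s2:
            pairs[(c1, c2)] += 1
    return pairs


def priming_effect(pairs, construction="PO", alternative="DO"):
    """P(target = construction | prime = construction)
    minus P(target = construction | prime = alternative)."""

    def p_target(prime):
        total = pairs[(prime, construction)] + pairs[(prime, alternative)]
        return pairs[(prime, construction)] / total if total else 0.0

    return p_target(construction) - p_target(alternative)


# Toy, invented dialogue for illustration only.
dialogue = [("A", "PO"), ("B", "PO"), ("A", "PO"),
            ("B", "DO"), ("A", "DO"), ("B", "DO")]
effect = priming_effect(priming_table(dialogue))
```

A positive value means speakers reuse the construction they just heard more often than they produce it after the alternative; in the toy dialogue above the effect is 2/3.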

The goal of the project is to develop a psycholinguistic, computational model that spells out precisely the steps and representations necessary for language production. Such models can be compared and improved incrementally because they are tested on large-scale language data. This project will develop a cognitive model of alignment in language production as found in natural dialogue in speech corpora. As a basis for alignment at the structural level, the model will explain and predict syntactic 'priming' effects. For example, 'The linguist gave the lab keys to his student' (prime) primes a listener to mirror its sentence structure with 'The student showed his results to the editor' (target), rather than '... showed the editor his results'. The model will simulate language production with general cognitive operations studied by cognitive psychology, such as cue-based memory retrieval. It will account for key characteristics of priming, including rapid decay, long-term persistence and convergence, lexical boost effects, and sensitivity to interference from intervening sentences. The model will be based on a cognitive framework, ACT-R, thereby integrating language processing with general quantitative and computational accounts of memory. A broad-coverage, lexicalized syntax formalism is used to account for real-life language data.
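How an ACT-R memory account can produce both rapid decay and long-term persistence can be sketched with ACT-R's standard base-level learning equation, B = ln(sum_j (now - t_j)^(-d)), plus a spreading-activation term and the logistic retrieval-probability function. This is an illustration of those general equations, not the project's model. The decay d = 0.5 is ACT-R's conventional default; the threshold, noise value, and the `spread` bonus standing in for a lexical boost are assumptions chosen for the example.

```python
import math

DECAY = 0.5  # ACT-R's conventional default for the decay parameter d


def base_level(use_times, now, d=DECAY):
    """ACT-R base-level activation: B = ln(sum_j (now - t_j) ** -d).

    Each past use of a syntactic chunk contributes a power-law decaying
    trace, so one recent prime fades fast (rapid decay), while many uses
    accumulate into a lasting baseline (long-term persistence).
    """
    lags = [now - t for t in use_times if t < now]
    if not lags:
        return float("-inf")  # never used before `now`
    return math.log(sum(lag ** (-d) for lag in lags))


def activation(use_times, now, spread=0.0, d=DECAY):
    """Total activation = base level + spreading activation from cues.

    `spread` is a hypothetical stand-in for sum_j W_j * S_ji, e.g. extra
    activation when the prime's head verb is repeated (lexical boost).
    """
    return base_level(use_times, now, d) + spread


def retrieval_probability(a, threshold=0.0, noise=0.25):
    """Logistic retrieval probability 1 / (1 + exp(-(a - tau) / s)).

    Threshold tau and noise s are illustrative values; the branch keeps
    math.exp from overflowing for very low activations.
    """
    z = (a - threshold) / noise
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For a construction used once at time 0, `base_level([0.0], 1.0)` is 0.0 while `base_level([0.0], 100.0)` is ln(0.1), about -2.30, so its retrieval probability drops sharply with distance from the prime; adding further uses or a positive `spread` raises it again.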

Effective start/end date: 7/15/15 to 12/31/16


  • National Science Foundation: $75,000.00

