CAREER: Large-Scale Exploration and Interpretation of Consumer-Oriented Legal Documents

Project: Research project

Project Details


People face large volumes of text in their everyday lives, and they must choose what to read and what to leave unread. To function in our information society, consumers must read and accept terms of service agreements, financial agreements, health care agreements, rental agreements, privacy policies, and other varieties of "fine print" to receive essential goods and services. These consumer-oriented legal documents (COLDs, for brevity) specify requirements, penalties, boundaries of acceptable use, options for recourse if something goes wrong, privacy practices, intellectual property stipulations, and many other important topics. People tend to accept COLDs without reading or understanding them, and the lack of understanding disempowers individuals and affects them unequally. This project will answer three related questions: (1) What recurring information structures and types of knowledge exist in COLDs? (2) What are the capabilities and limitations of text mining applied to automating extraction of information from COLDs? (3) To what extent do the contents of COLDs intersect with the interests and needs of consumers? Toward answering these questions, the project will develop natural language processing methods that support user engagement with typically unengaging but important text. Additionally, this project will introduce first-year undergraduates to research and encourage them to pursue STEM careers.

The project will advance knowledge by focusing on the following goals. First, the project will discover the availability and characteristics of common types of COLDs by creating large-scale corpora of them from online sources. These corpora will enable studying issues in availability, accessibility, navigation, and readability. Second, the project will develop methods for automated extraction of two particularly salient features of COLD text: choice points (statements in text that describe actions a reader can take potentially for their benefit) and outlier statements (statements that deviate from what is typical for the relevant type of COLD, differentiating a COLD from its peers and motivating acute attention). Third, the project will build browser extensions to explore how choice points and outlier statements affect people's engagement with COLDs' contents. To support further research, the project also will produce and disseminate an array of corpora, language models, and other tools for researchers in natural language processing and public policy.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Effective start/end date: 8/1/23 to 7/31/28


  • National Science Foundation: $556,397.00

