Improving the Efficiency of Militarized Interstate Dispute Data Collection using Automated Textual Analysis

Project: Research project

Project Details


The Militarized Interstate Disputes (MID) dataset is the most prominent and frequently used collection in the study of international conflict. Updating this dataset has become both labor-intensive and prohibitively expensive, resulting in a cycle of delayed updates and escalating costs when updates do occur. Maintaining the data on a continuing basis requires a smoother, smaller, more efficient, and less expensive methodology. This project proposes to use automated text analysis algorithms to identify news stories that contain codable militarized interstate actions. Identifying those stories is the most costly and labor-intensive part of the process. The goal is to produce a search process that closely emulates previous human data collection, allowing for more timely and regular updates of the MID dataset in the future.

This project will modify a search engine, TABARI (Textual Analysis By Augmented Replacement Instructions), that was developed as part of the Kansas Event Data System (KEDS) project, so that it identifies news reports containing information about events of interest to the MID project. The project will do this by experimenting on news reports for the year 2001 that have already been coded. We want to refine TABARI so that it finds the news stories that human coders previously identified without producing large numbers of irrelevant reports. Human coders will read the news stories identified by TABARI and either verify that they contain relevant material or determine that the events reported do not constitute militarized incidents. We expect to refine the search procedures so that the new routine can be applied to gathering information in the next broad MID data collection and in subsequent expansions.
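The refinement target described above, recovering the stories that human coders identified without returning many irrelevant ones, amounts to balancing recall against precision. The following is a minimal sketch of that evaluation loop, not TABARI itself: the regular-expression patterns, the sample stories, and the names `flag_story` and `evaluate` are all hypothetical and stand in for TABARI's dictionaries and the coded 2001 reports.

```python
import re

# Hypothetical action patterns for illustration only; these are NOT
# TABARI's dictionaries or the MID project's coding rules.
PATTERNS = [
    r"\btroops?\b.*\b(crossed|entered|massed)\b",
    r"\b(shelled|bombed|fired (on|at|upon))\b",
    r"\b(seized|blockaded?|mobilized)\b",
]
MILITARIZED = re.compile("|".join(PATTERNS), re.IGNORECASE)


def flag_story(text):
    """Return True if the story text matches any action pattern."""
    return MILITARIZED.search(text) is not None


def evaluate(stories, human_ids):
    """Compare automatically flagged stories against human-coded ones.

    stories:   dict mapping story id -> story text
    human_ids: set of story ids that human coders judged relevant
    Returns (flagged ids, recall, precision).
    """
    flagged = {sid for sid, text in stories.items() if flag_story(text)}
    hits = flagged & human_ids
    recall = len(hits) / len(human_ids) if human_ids else 0.0
    precision = len(hits) / len(flagged) if flagged else 0.0
    return flagged, recall, precision


# Invented sample data; real input would be the already-coded news reports.
stories = {
    "s1": "Border troops crossed the river and shelled a village.",
    "s2": "The finance ministers met to discuss trade tariffs.",
    "s3": "The navy seized a fishing vessel in disputed waters.",
    "s4": "Film critics bombed the new release in reviews.",
    "s5": "Warplanes violated the neighboring state's airspace.",
}
human_ids = {"s1", "s3", "s5"}

flagged, recall, precision = evaluate(stories, human_ids)
print(sorted(flagged), round(recall, 2), round(precision, 2))
```

In this toy sample the filter misses one relevant story (`s5`, which no pattern covers) and flags one irrelevant story (`s4`). Missed stories lower recall and irrelevant stories lower precision, which are the two kinds of error that iterative refinement against human-coded reports would aim to reduce.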

The broader significance of this project is that successful completion will provide the basis for an update of the MID dataset for the period 2002 through 2008 and nearly continuous updates thereafter. Because the MID data are so widely used by the scholarly community, frequent updates will benefit researchers addressing a wide range of research questions, including but not limited to the democratic peace, arms races, alliances, conflict management, diversionary uses of force, political geography and territorial disputes, crisis escalation, regional security, trade and interdependence, international norms, collective security, military strategy, Palestinian attitudes, intervention, and early warning systems. In particular, expanding the dataset to this new period will allow for stronger examination of issues regarding international conflict in the post-Cold War and post-9/11 world. Moreover, this project will illustrate the feasibility of this type of automated coding for other widely used datasets in the social sciences.

Effective start/end date: 7/15/07 to 12/31/09


  • National Science Foundation: $144,241.00

