Management of Complex Data Structures: Upgrades to EUGene data software

Project: Research project

Project Details


Since it was first made available in 1998, the software program EUGene has become a useful tool for scholars who analyze international conflict and crisis data, providing simple-to-use tools for dataset construction. The software development laid out in this proposal will further expand the EUGene program, broadening its application into a tool for developing international relations, comparative politics, and comparative political economy data. Continued program development will focus on a complete redesign of the data management engine to allow for the development of new, more flexible datasets and research designs that better fit contemporary research needs.

The program makes routine a set of data preparation tasks that are cumbersome and difficult, and keeps track of critical research design choices made by users. By freeing scholars from the need to perform technical data manipulations, it facilitates more advanced research and theorizing. As in the coding of previous versions, the PIs integrate graduate and undergraduate student programmers into the project and supervise contracted professional software engineers. EUGene will benefit both experienced researchers and undergraduate and graduate students in training.

In this expansion, the focus is on five key additions that will advance future scientific understanding and discovery while building on existing work. The software will be freely distributed with a point-and-click user interface. The development team will add five new program attributes to broaden the impact of the existing software:
1. Develop an observational format that relaxes the current restriction to monadic or dyadic data, allowing for k-adic research designs. Using dyadic data to analyze what are, in reality, k-adic events leads to model misspecification and, inevitably, substantial statistical bias.
2. Allow users to specify the unit of observation. Users will be able to choose the units, including the country, the IGO, the terrorist group, the MID, or any user-specified non-state actor. Users will also be able to select the temporal element, e.g. daily, weekly, monthly, quarterly, or annual data.
3. Develop an expanded sampling engine to accommodate the greatly increased population of observations the new data structures will generate.
4. Build in an expanded scaling and measurement engine that allows users to apply principal component analysis and factor analysis to build a wide range of scales and indices, such as NOMINATE and S scores.
5. Develop the data handling routines needed for spatial regression analysis, to control for spatial interdependence in the user's data.
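
The difference between dyadic and k-adic observations described in item 1 can be illustrated with a minimal Python sketch. It is not EUGene code: the function names, the actor labels, and the three-party event are hypothetical and chosen only for illustration. It enumerates non-directed k-ads from a list of actors and shows how a dyadic design would split a single three-party event into several separate rows.

```python
from itertools import combinations


def k_ads(actors, k):
    """Return every unordered group of k distinct actors (non-directed k-ads)."""
    return [tuple(group) for group in combinations(sorted(actors), k)]


def split_into_dyads(participants):
    """Return the dyads a purely dyadic design would record for one multi-party event."""
    return k_ads(participants, 2)


# Hypothetical actors and event, for illustration only.
states = ["A", "B", "C", "D"]
event_participants = ["A", "B", "C"]  # one event involving three actors

dyads = k_ads(states, 2)   # all pairs of states
triads = k_ads(states, 3)  # all groups of three states
event_as_dyads = split_into_dyads(event_participants)

print(f"{len(dyads)} dyads, {len(triads)} triads")
print("One three-party event becomes these dyadic rows:", event_as_dyads)
```

In a k-adic design the event is a single observation, the triad ("A", "B", "C"), whereas the dyadic design records it as three separate rows. The sketch also hints at why item 3 calls for an expanded sampling engine: the number of possible k-ads grows combinatorially with the number of actors and with k.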

In addition, EUGene 4.0 will include the most recent releases of an expanding variety of pre-existing datasets.

Effective start/end date: 9/15/11 to 8/31/14


  • National Science Foundation: $494,669.00

