EarthCube IA: Collaborative Proposal: Building Interoperable Cyberinfrastructure (CI) at the Interface between Paleogeoinformatics and Bioinformatics

  • Miller, Douglas Alan (PI)
  • Graham, Russell W. (CoPI)
  • Bills, Brian B. (CoPI)

Project: Research project

Project Details


Paleontologists provide data about the past distribution and diversity of life. These data are useful both to geologists, because they can help determine the age of rocks, reconstruct past environments, and constrain models of the Earth system; and to biologists interested in the evolutionary history of organisms and the behavior of ecological systems during past global changes. Currently, data about fossils are dispersed across thousands of scientific publications, and dozens of small to large databases, only some of which are publicly available via the Internet. Even publicly available databases can be difficult to access because each stores different kinds of data with different conventions, requiring researchers to individually harmonize searches and their outputs. This project brings together six paleobiological databases so that they share a single set of Internet-based commands by which researchers and the public can easily access fossil records from all of Earth history. By coordinating with other emerging efforts in geological and biological data sharing, best practices, and protocols, we ensure that data will be freely available to all, enabling new scientific syntheses and discovery, more powerful educational opportunities, and general exploration of the history of life on Earth.

The paleobiological sciences sit at the nexus between the geosciences and the biosciences, with close interdependencies in both domains. Within the geosciences, information about the past spatiotemporal distribution of organisms, species, and assemblages of species is essential to a wide array of allied disciplines: to sedimentologists and economic geologists studying facies relationships and employing biostratigraphic controls for correlating rock strata, to structural geologists and geophysicists seeking biogeographic constraints on reconstructions of former tectonic plate positions, to paleoclimatologists extracting paleoclimatic signals from paleoecological data, and to earth system modelers seeking to understand how biospheric dynamics have shaped, and continue to shape, the history of the Earth-Life system. Within the biosciences, the fossil record is essential for understanding how contemporary ecological systems are shaped by historical legacies of slow-acting processes, for testing climate-driven models of species distribution and diversity that are being used to project the impacts of 21st century climate change, for constraining phylogenetic models of species divergence and rates of evolution, and for understanding the fundamental drivers of biodiversity (i.e., species extinctions and originations). In an era of global change, when stewarding biodiversity is an urgent societal concern, conservation biologists, global change ecologists, and earth system scientists are all looking to the past to study the behavior of the Earth-Life system during rapid transitions. Paleobiological data are currently served by a wide array of databases that vary in structure, composition, temporal scales, and types of data and metadata. To conduct "global" or holistic analyses of the paleobiological record, it is necessary to retrieve data from a variety of these databases, each of which must be queried separately for the types of data needed.
The purpose of this project is to make six different paleobiological databases interoperable so that data from these and other databases can be queried through a common Application Programming Interface (API). Towards that end, five key records of North American Pleistocene lakes will be uploaded and become available through this integrative project. This project will also increase the interoperability between these paleobiological resources and contemporary databases of species distributions and diversity, enabling continuous time-series analyses (e.g., of biodiversity) from the beginning of life on Earth to today. Integration of the paleobiological databases with databases of the stratigraphic record (Macrostrat) will enhance the value of both types of data. New R packages will facilitate retrieval and analysis of data from all of the databases. Finally, this proposal establishes a Paleobiological Data Consortium, consisting of leaders of cyberinfrastructure resources in the paleobiosciences and allied disciplines, with the goal of sharing best practices and protocols among the geoinformatic and bioinformatic communities.
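
The idea of a common API over databases with different conventions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory "databases" (`DB_A`, `DB_B`), their field names, the `Occurrence` record, and the `query_all` function are hypothetical and do not describe the project's actual API, schemas, or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical harmonized record shared by all sources."""
    source: str
    taxon: str
    age_ma: float  # age in millions of years before present


# Hypothetical raw records: each source uses its own field names and units.
DB_A = [
    {"taxon_name": "Mammuthus", "age_ma": 0.02},
    {"taxon_name": "Bison", "age_ma": 0.01},
]
DB_B = [
    {"genus": "Mammuthus", "age_years_bp": 15000},
    {"genus": "Picea", "age_years_bp": 12000},
]


def from_db_a(rec: dict) -> Occurrence:
    # Source A already stores ages in millions of years.
    return Occurrence("db_a", rec["taxon_name"], float(rec["age_ma"]))


def from_db_b(rec: dict) -> Occurrence:
    # Source B stores ages in years before present; convert to Ma.
    return Occurrence("db_b", rec["genus"], rec["age_years_bp"] / 1e6)


# Registry pairing each source's records with its adapter.
SOURCES = {
    "db_a": (DB_A, from_db_a),
    "db_b": (DB_B, from_db_b),
}


def query_all(taxon: str, max_age_ma: float) -> List[Occurrence]:
    """Run one query against every source and return harmonized records."""
    results = []
    for raw_records, adapt in SOURCES.values():
        for rec in raw_records:
            occ = adapt(rec)
            if occ.taxon == taxon and occ.age_ma <= max_age_ma:
                results.append(occ)
    return results


if __name__ == "__main__":
    for occ in query_all("Mammuthus", max_age_ma=0.05):
        print(occ)
```

In this sketch the caller issues a single query and never sees the per-source field names or units; adding another source would only require one more adapter function and registry entry.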

Effective start/end date: 9/1/15 to 8/31/19


  • National Science Foundation: $199,895.00

