People and societies thrive best when they understand how the social and physical dynamics of their environment work, allowing them to respond appropriately. Natural scientists have built our understanding of the physical world, and that understanding has contributed to the development of technologies and practices that benefit human economies. For example, DNA sequencing enables deeper understanding of biological organisms; the consequences for human health, food production, and the understanding of evolutionary adaptation, among other areas, have been revolutionary and are still unfolding. The DNA sequence is the blueprint for an organism's anatomical structure (morphology) and function, but images capturing morphology are now much less prevalent than genetic data. Museums and researchers have been creating 3D digital images of natural history collections, and there are extensive 3D image data sets for some model organisms, but these data are mostly in closed collections and are generally unavailable or very difficult to access. This project aims to provide infrastructure to increase the accessibility of anatomical information, with a focus on 3D images. The project will create MorphoSource, the first open-access, web-enabled image archive accepting and serving high-resolution 3D scans of all organisms. Standardized descriptive tags will allow scientists to use this database to easily combine genetic and anatomical datasets for the first time, supporting the formulation of novel research questions. MorphoSource will link to other databases (such as iDigBio [www.idigbio.org]) that aggregate information on museum specimens from around the world. A shared resource will change the culture among researchers and museums, making collaborations between physically distant experts more feasible, and it will also open the linked research collections of museums to anyone with Internet access anywhere in the world.
Large data sets are prerequisites for many statistical and machine learning methods, so the resource will enable innovations in computational image analysis methods, fostering new types of collaborations that advance field-wide scientific understanding. The resource will track data use, enhancing reproducibility and also providing an objective metric of the value of individual data elements. Open access to the data linked through MorphoSource will enable anyone with Internet access to see the detailed anatomical evidence for theories such as evolution. Pilot work has shown that teachers and students eagerly consume this newly available information, with numbers already in the thousands. Positive results of this access include (1) providing a more intuitive type of raw data (compared to DNA sequence) for showing the public why some conclusions about evolutionary relationships were reached, (2) providing an 'interest metric' for the value of natural history museums and the collections they hold, and (3) increasing the community of people (including citizen scientists) who have access to the data required to make important discoveries by studying biological variation.
The specific plan for creating the repository for 3D data on all organisms is as follows. The primary goal is to restructure and improve a proof-of-concept database called MorphoSource. The restructuring will allow MorphoSource to meet the needs of a growing community of researchers and educators through massive upscaling, and to implement a novel approach for economically preserving data for the long term. To accomplish this, the MorphoSource server will be rebuilt to use the Fedora digital asset management architecture, which has been developed by library scientists to serve emerging needs related to the archiving and sharing of digital data. As part of this architecture upgrade, the data hosted on MorphoSource will be given an additional layer of protection through managing asynchronous copies in DuraCloud, a digital data preservation platform that leverages the Amazon cloud. This restructuring will allow the MorphoSource server architecture to be integrated with the Duke University Libraries repository infrastructure. MorphoSource will also be able to invite institutional communities to be consortium partners in support of data storage and to enact data preservation techniques that guarantee integrity and readability for the foreseeable future. Additional tools will (1) allow for rapid, automated ingestion of dozens to hundreds of datasets at once, (2) link MorphoSource with major biodiversity archives, and (3) provide in-browser visualization of 3D series of image slices, such as those generated by CT and MRI scanners. The plan includes ingesting thousands of high-quality legacy CT datasets from published studies, enabling their reuse and increasing the repeatability of studies. The project leaders plan to work directly with K-12 educators and students, and to design tools for them, to help them benefit from this resource.
These datasets and educational tools will be available to researchers and the public through the updated MorphoSource website at www.morphosource.org.
- Effective start/end date: 9/1/17 → 8/31/21
- National Science Foundation: $505,997.00