Project Details
Description
This project develops statistical tools for several types of data that have become increasingly common during the big data revolution in the sciences. The first type is manifold data: low-dimensional structures embedded in higher-dimensional spaces, which occur frequently in biomedical imaging studies. The second type consists of large numbers of predictors, such as those commonly found in genetic studies. The project aims to develop statistical tools for analyzing samples of manifolds or functions as outcomes alongside large numbers of scalar predictors. Because such data carry the potential for privacy breaches, the project also aims to develop privacy mechanisms that guarantee the anonymity of subjects when statistical summaries and analyses of these data are released. This work will have broad impacts in both statistics and bioinformatics.
The project develops statistical methods for complex high-dimensional data using techniques from functional data analysis (FDA). The research is divided into three areas: manifold data analysis (MDA), high-dimensional functional regression, and statistical disclosure control (SDC). MDA, the primary focus, combines FDA and manifold learning techniques to statistically analyze samples of manifolds. Methods for high-dimensional functional regression will be developed to handle large numbers of scalar predictors alongside functional or manifold outcomes; these tools will exploit smooth and sparse structures to improve statistical performance. Lastly, SDC methods for FDA will be developed. Privacy remains a central concern for researchers and for society as a whole, yet very little work has addressed functional data, which may contain substantial amounts of individual-level information. Functional data require carefully constructed privacy mechanisms that maintain anonymity while preserving the scientific value of the objects. All three areas are motivated by ADAPT, a joint anthropological project analyzing high-resolution 3D facial images, with data consisting of thousands of subjects, thousands of measurements per face, and hundreds of thousands of genetic markers. Its goal is to uncover the genetic architecture of the human face and to better understand the ancestry of different facial features. The developed methods will be applied to this example, their theoretical properties will be derived, and accompanying software will be provided.
| Status | Finished |
|---|---|
| Effective start/end date | 7/1/17 → 6/30/20 |
Funding
- National Science Foundation: $190,347.00