Funding supports the purchase of a share of a 128-node Lion-XL Linux cluster, a disk array, and supporting central server and network equipment dedicated to research in the statistical sciences at Pennsylvania State University. By distributing large-scale jobs across the Lion-XL Linux cluster, computing speed can be improved by two or more orders of magnitude over what can be achieved on any single machine. This facilitates the efficient implementation of computationally intensive algorithms for convex hull peeling of massive astronomical data, fitting 2-D multiresolution hidden Markov models to images, Monte Carlo simulations to assess the performance of covariate selection procedures, nonparametric maximum likelihood estimation of the number of genes in a cDNA library, and Markov chain Monte Carlo algorithms for spatiotemporal prediction of soil moisture fields. The disk array is used to store massive astronomical, environmental, genomic, and image data.
The funded equipment is used to support five applied statistical research projects:
1. Analysis of Massive Streaming Data in Astronomy (G.J. Babu). Methods are developed to efficiently obtain statistical summaries of data from the Two Micron All Sky Survey and the Sloan Digital Sky Survey. Obtaining such summaries is made challenging by the size of these data sets, which range up to tens of terabytes (tens of millions of millions of bytes).
2. Studying Digital Imagery of Ancient Paintings by Mixtures of Stochastic Models (J. Li). Methods are developed to profile the styles of Asian artists, providing a powerful tool to art historians for studying connections among artists or periods in art history.
3. Model Selection for Semi-parametric Regression Analysis (R. Li). Semi-parametric regression models have found many applications in medical research. Methods for model selection are required to determine what characteristics of individual patients significantly impact their health outcomes.
4. Mixture Modeling in EST Data (B. Lindsay). Methods are developed for estimating the number of expressed genes in a cDNA library. Such estimates are required to further our understanding of gene function.
5. Hierarchical Spatiotemporal Modeling of Soil Moisture. A statistical approach is developed for large-scale prediction of patterns of variation in soil moisture, based on borrowing strength from multiple data sources including field sampling, remote sensing, and soil maps. Such predictions are required for global circulation models of climate change, hydrological and contaminant transport models, and for decisions regarding land use, crop selection, and irrigation.
This work impacts not only the research of the investigator and his colleagues, but also that of their collaborators in astronomy, environmental sciences, genetics, and art history. Graduate students of the participating researchers play a central role in developing algorithms that exploit the distributed computing environment of the Lion-XL Linux cluster.
Effective start/end date: 8/1/03 → 7/31/04
- National Science Foundation: $65,347.00