Project Details

Description

The Pennsylvania State University at University Park is awarded a grant to study regional (primarily intra-chromosomal) variation and co-variation in the rates of different mutation types, using comparisons of completely sequenced mammalian genomes and human re-sequencing data. Mutations are the source of genetic variation in natural populations and provide the raw material for molecular evolution, yet the mechanisms of mutagenesis are not yet completely understood. First, the project will characterize the regional variation and co-variation of mutation types such as nucleotide substitutions, small insertions and deletions, and changes in microsatellite repeat number, examining their genomic co-occurrence and linear association at multiple scales. This will yield insight into which mutation types co-occur and the genomic scales at which their co-variation prevails. Second, the team will investigate potential causes of regional rate variation and co-variation for these mutation types, relating them simultaneously to genome landscape features with linear approaches at multiple scales. This will yield insight into the role of genomic features in explaining regional rate variation and co-variation for mutations of different types. Third, the team will assess the need for, and implement, non-linear analysis techniques and regression methods. Fourth, the results on mutation rates and co-occurrence will be used to improve computational predictions of functional regions through background corrections that exploit several mutation types simultaneously. These local corrections will be based on the rates of multiple mutation types and used to improve the performance of functional element prediction algorithms. Finally, the computational and statistical tools developed in this project will be implemented in readily accessible software suites in Galaxy, a free-standing genome analysis platform (http://galaxyproject.org).
Tools for detecting mutations, for estimating and apportioning mutation rates and genomic features in windows, and for applying multivariate, multi-scale and non-linear techniques will be integrated into a web-based platform that makes them readily available for the analysis of any sequenced genome. The resulting framework will be highly interactive, based on proven methodology, and easily accessible to other researchers and educators, with no need for programming experience. Graduate students working on this project will acquire interdisciplinary training in biology, computer science, and statistics. Undergraduate students from underrepresented groups, including women and minorities, will be recruited for the project through existing programs at Penn State.
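
As a rough illustration of what estimating rates in windows and examining linear association at multiple scales could look like, the following minimal Python sketch bins the positions of two mutation types into fixed-size windows and computes a Pearson correlation of the per-window rates at each window size. It is not the project's software: the function names, the use of non-overlapping windows, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt


def window_rates(positions, region_length, window_size):
    """Events per base pair in consecutive, non-overlapping windows.

    `positions` are 0-based coordinates of observed events (hypothetical input).
    The last window may be shorter than `window_size`.
    """
    n_windows = (region_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    rates = []
    for i, count in enumerate(counts):
        start = i * window_size
        length = min(window_size, region_length - start)
        rates.append(count / length)
    return rates


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def covariation_by_scale(pos_a, pos_b, region_length, window_sizes):
    """Correlation between two mutation types' windowed rates, per window size."""
    return {
        w: pearson(
            window_rates(pos_a, region_length, w),
            window_rates(pos_b, region_length, w),
        )
        for w in window_sizes
    }
```

Calling `covariation_by_scale(substitution_positions, indel_positions, region_length, [10_000, 100_000, 1_000_000])` with hypothetical position lists would return one correlation per window size, which is one simple way to see at which scale two mutation types co-vary most strongly.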

Status: Finished
Effective start/end date: 8/1/10 to 7/31/15

Funding

  • National Science Foundation: $627,700.00
