TY - GEN
T1 - Comparison of methods for meta-dimensional data analysis using in silico and biological data sets
AU - Holzinger, Emily R.
AU - Dudek, Scott M.
AU - Frase, Alex T.
AU - Fridley, Brooke
AU - Chalise, Prabhakar
AU - Ritchie, Marylyn D.
PY - 2012
Y1 - 2012
AB - Recent technological innovations have catalyzed the generation of a massive amount of data at various levels of biological regulation, including DNA, RNA and protein. Due to the complex nature of biology, the underlying model may only be discovered by integrating different types of high-throughput data to perform a "meta-dimensional" analysis. For this study, we used simulated gene expression and genotype data to compare three methods that show potential for integrating different types of data in order to generate models that predict a given phenotype: the Analysis Tool for Heritable and Environmental Network Associations (ATHENA), Random Jungle (RJ), and Lasso. Based on our results, we applied RJ and ATHENA sequentially to a biological data set that consisted of genome-wide genotypes and gene expression levels from lymphoblastoid cell lines (LCLs) to predict cytotoxicity. The best model consisted of two SNPs and two gene expression variables with an r-squared value of 0.32.
UR - http://www.scopus.com/inward/record.url?scp=84859150181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859150181&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-29066-4_12
DO - 10.1007/978-3-642-29066-4_12
M3 - Conference contribution
AN - SCOPUS:84859150181
SN - 9783642290657
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 134
EP - 143
BT - Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - 10th European Conference, EvoBIO 2012, Proceedings
T2 - 10th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, EvoBIO 2012
Y2 - 11 April 2012 through 13 April 2012
ER -