## Abstract

Studying the association between quantitative phenotypes (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can reach the millions, so evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, because large numbers of SNPs are correlated, a permutation procedure is preferred over simple Bonferroni correction for properly controlling the family-wise error rate and retaining mapping power, which dramatically increases the computational cost of an association study. In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. We derive an upper bound on the SNP-pair ANOVA test statistic, which can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term is based on the SNPs alone and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP-pairs.
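To make the computational cost concrete, the following is a minimal sketch of the brute-force baseline the abstract compares against: a one-way ANOVA F-statistic for every SNP-pair, plus a permutation procedure that estimates a family-wise threshold from the maximum statistic per permutation. It does not implement FastANOVA's upper bound or pruning, which the abstract does not specify in detail. The function names (`anova_f`, `pair_f`, `max_pair_f`, `permutation_threshold`), the grouping of individuals by joint genotype, and the choice of threshold index are assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def anova_f(groups, phenotype):
    """One-way ANOVA F-statistic of `phenotype` partitioned by `groups` labels."""
    n = len(phenotype)
    grand_mean = sum(phenotype) / n
    by_group = defaultdict(list)
    for label, value in zip(groups, phenotype):
        by_group[label].append(value)
    k = len(by_group)
    if k < 2 or n <= k:
        return 0.0  # test undefined; treated here as "no evidence" (assumption)
    ss_between = 0.0
    ss_within = 0.0
    for values in by_group.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in values)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pair_f(snp_a, snp_b, phenotype):
    """F-statistic for a SNP-pair: individuals are grouped by joint genotype."""
    return anova_f(list(zip(snp_a, snp_b)), phenotype)


def max_pair_f(snps, phenotype):
    """Brute force: test every SNP-pair and return the largest F-statistic."""
    return max(
        pair_f(snps[i], snps[j], phenotype)
        for i, j in combinations(range(len(snps)), 2)
    )


def permutation_threshold(snps, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Estimate a family-wise critical value from permuted phenotypes.

    Each permutation costs a full pass over all SNP-pairs, which is the
    expense the abstract refers to.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max_pair_f(snps, shuffled))
    maxima.sort(reverse=True)
    # One common convention: take the (alpha * n_perm)-th largest maximum.
    return maxima[max(int(alpha * n_perm) - 1, 0)]
```

With `m` SNPs and `K` permutations, this sketch evaluates on the order of `K * m * (m - 1) / 2` ANOVA tests, which is why pruning SNP-pairs through a shared upper bound, as the abstract describes, matters at genome scale.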

Original language | English (US) |
---|---|
Title of host publication | KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Pages | 821-829 |
Number of pages | 9 |
State | Published - 2008 |
Event | 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States. Duration: Aug 24 2008 → Aug 27 2008 |

### Publication series

Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|

### Other

Other | 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 |
---|---|
Country/Territory | United States |
City | Las Vegas, NV |
Period | 8/24/08 → 8/27/08 |

## All Science Journal Classification (ASJC) codes

- Software
- Information Systems