TY - JOUR
T1 - CATE
T2 - A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data
AU - Perera, Deshan
AU - Reisenhofer, Elsa
AU - Hussein, Said
AU - Higgins, Eve
AU - Huber, Christian D.
AU - Long, Quan
N1 - Publisher Copyright: © 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.
PY - 2023/8
Y1 - 2023/8
AB - Statistical tests for molecular evolution provide quantifiable insights into the selection pressures that govern a genome's evolution. Increasing sample sizes used for analysis leads to higher statistical power. However, this requires more computational nodes or longer computational time. CATE (CUDA Accelerated Testing of Evolution) is a computational solution to this problem comprised of two main innovations. The first is a file organization system coupled with a novel search algorithm and the second is a large-scale parallelization of algorithms using both graphical processing unit (GPU) and central processing unit. CATE is capable of conducting evolutionary tests such as Tajima's D, Fu and Li's, and Fay and Wu's test statistics, McDonald–Kreitman Neutrality Index, Fixation Index and Extended Haplotype Homozygosity. CATE is magnitudes faster than standard tools with benchmarks estimating it being on average over 180 times faster. For instance, CATE processes all 54,849 human genes for all 22 autosomal chromosomes across the five super populations present in the 1000 Genomes Project in less than 30 min while counterpart software took 3.62 days. This proven framework has the potential to be adapted for GPU-accelerated large-scale parallel analyses of many evolutionary and genomic analyses.
UR - http://www.scopus.com/inward/record.url?scp=85163646783&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163646783&partnerID=8YFLogxK
U2 - 10.1111/2041-210X.14168
DO - 10.1111/2041-210X.14168
M3 - Article
AN - SCOPUS:85163646783
SN - 2041-210X
VL - 14
SP - 2095
EP - 2109
JO - Methods in Ecology and Evolution
JF - Methods in Ecology and Evolution
IS - 8
ER -