TY - JOUR
T1 - Constrained maximum entropy models to select genotype interactions associated with censored failure times
AU - Yang, Aotian
AU - Miller, David
AU - Pan, Qing
N1 - Publisher Copyright: © 2018 World Scientific Publishing Europe Ltd.
PY - 2018/12/1
Y1 - 2018/12/1
N2 - We propose a novel screening method targeting genotype interactions associated with disease risks. The proposed method extends the maximum entropy conditional probability model to address disease occurrences over time. Continuous occurrence times are grouped into intervals. The model estimates the conditional distribution over the disease occurrence intervals given individual genotypes by maximizing the corresponding entropy subject to constraints linking genotype interactions to time intervals. The EM algorithm is employed to handle observations with uncertainty, for which the disease occurrence is censored. Stepwise greedy search is proposed to screen a large number of candidate constraints. The minimum description length is employed to select the optimal set of constraints. Extensive simulations show that five or so quantile-dependent intervals are sufficient to categorize disease outcomes into different risk groups. Performance depends on sample size, number of genotypes, and minor allele frequencies. The proposed method outperforms the likelihood ratio test, Lasso, and a previous maximum entropy method with only binary (disease occurrence, non-occurrence) outcomes. Finally, a GWAS study for type 1 diabetes patients is used to illustrate our method. Novel one-genotype and two-genotype interactions associated with neuropathy are identified.
UR - http://www.scopus.com/inward/record.url?scp=85058817698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058817698&partnerID=8YFLogxK
U2 - 10.1142/S0219720018400243
DO - 10.1142/S0219720018400243
M3 - Article
C2 - 30567478
AN - SCOPUS:85058817698
SN - 0219-7200
VL - 16
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
IS - 6
M1 - 18400243
ER -