TY - JOUR
T1 - A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis
AU - Lombard, Zané
AU - Park, Chungoo
AU - Makova, Kateryna D.
AU - Ramsay, Michèle
N1 - Funding Information:
We are thankful to Svitlana Tyekucheva for sharing R code for LDA, PCA and ROC analysis and for testing Galaxy tools, to Melissa Wilson Sayres for proofreading parts of the manuscript prior to publication, to Dan Blankenberg and the Galaxy team for assisting in integrating the tools developed here into Galaxy. MR and ZL acknowledge support from the South African National Bioinformatics Network (NBN), the South African National Research Foundation (NRF), the NHLS and the University of the Witwatersrand. CP and KM were supported by start-up funds from Penn State and the R01 HD056452 NIH grant to Laura Carrel.
PY - 2011/6/13
Y1 - 2011/6/13
N2 - Background: Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known.Results: The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification was achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA is integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR methods.Conclusions: The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlight an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR.Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi).
AB - Background: Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known.Results: The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification was achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA is integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR methods.Conclusions: The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlight an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR.Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi).
UR - http://www.scopus.com/inward/record.url?scp=79958254607&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79958254607&partnerID=8YFLogxK
U2 - 10.1186/1745-6150-6-30
DO - 10.1186/1745-6150-6-30
M3 - Article
C2 - 21668950
AN - SCOPUS:79958254607
SN - 1745-6150
VL - 6
JO - Biology Direct
JF - Biology Direct
M1 - 30
ER -