TY - GEN
T1 - Identifying interaction sites in "Recalcitrant" proteins
T2 - 11th Pacific Symposium on Biocomputing 2006, PSB 2006
AU - Terribilini, Michael
AU - Lee, Jae Hyung
AU - Yan, Changhui
AU - Jernigan, Robert L.
AU - Carpenter, Susan
AU - Honavar, Vasant
AU - Dobbs, Drena
PY - 2006
Y1 - 2006
N2 - Protein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses. We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to predict reliably the residues of a protein that directly contribute to specific binding events - without the requirement for structural information regarding either the protein or complexes in which it participates - can potentially generate new disease intervention strategies.
AB - Protein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses. We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to predict reliably the residues of a protein that directly contribute to specific binding events - without the requirement for structural information regarding either the protein or complexes in which it participates - can potentially generate new disease intervention strategies.
UR - http://www.scopus.com/inward/record.url?scp=33746487738&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746487738&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 17094257
AN - SCOPUS:33746487738
SN - 9812564632
SN - 9789812564634
T3 - Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006
SP - 415
EP - 426
BT - Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006
Y2 - 3 January 2006 through 7 January 2006
ER -