TY - JOUR
T1 - Identification of interface residues in protease-inhibitor and antigen-antibody complexes
T2 - A support vector machine approach
AU - Yan, Changhui
AU - Honavar, Vasant
AU - Dobbs, Drena
PY - 2004/6
Y1 - 2004/6
N2 - In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identification of surface residues from sequence information, this offers a promising approach to predict residues involved in protein-protein interactions from sequence information alone.
AB - In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identification of surface residues from sequence information, this offers a promising approach to predict residues involved in protein-protein interactions from sequence information alone.
UR - http://www.scopus.com/inward/record.url?scp=3142717479&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=3142717479&partnerID=8YFLogxK
U2 - 10.1007/s00521-004-0414-3
DO - 10.1007/s00521-004-0414-3
M3 - Article
AN - SCOPUS:3142717479
SN - 0941-0643
VL - 13
SP - 123
EP - 129
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 2
ER -