Data-driven generation of decision trees for motif-based assignment of protein sequences to functional families

Dake Wang, Xiangyun Wang, Vasant Honavar, Drena L. Dobbs

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Scopus citations

Abstract

This paper describes an approach to data-driven discovery of sequence motif-based models, in the form of decision trees, for assigning protein sequences to functional families. Unlike approaches that classify protein sequences based on the presence of a single motif, this method captures regularities that can be described in terms of the presence or absence of arbitrary combinations of motifs. A training set of sequences with known functions is used to automatically construct decision trees that capture regularities sufficient to assign the sequences to their respective functional families. The accuracy of the resulting decision tree classifiers is then evaluated on an independent test set. Experiments using several protein data sets indicate that the proposed approach matches or exceeds the classification accuracy of assigning protein sequences to functional families based on the presence of a single characteristic motif.
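The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which tree-induction algorithm, motif database, or data sets the authors used, so everything concrete here is an assumption: the motif patterns, the family names, the toy sequences, and the choice of an ID3-style information-gain criterion are all hypothetical and serve only to show how presence or absence of several motifs can be combined in one tree.

```python
import math
import re
from collections import Counter

# Hypothetical motifs written as regular expressions (illustrative only,
# not taken from the paper or from any motif database).
MOTIFS = {
    "m_gk": r"G.{4}GK[ST]",
    "m_cc": r"C.{2}C",
    "m_hh": r"H.{3}H",
}

def features(seq):
    """Map a sequence to motif presence/absence flags."""
    return {name: re.search(pat, seq) is not None for name, pat in MOTIFS.items()}

def entropy(labels):
    """Shannon entropy (bits) of a list of family labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, labels, motifs):
    """ID3-style induction over binary motif features (an assumed algorithm)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not motifs:
        return majority  # leaf: a single family, or no motifs left to test

    def gain(motif):
        split = {True: [], False: []}
        for row, label in zip(rows, labels):
            split[row[motif]].append(label)
        remainder = sum(len(s) / len(labels) * entropy(s) for s in split.values() if s)
        return entropy(labels) - remainder

    best = max(motifs, key=gain)
    rest = [m for m in motifs if m != best]
    node = {"motif": best}
    for value in (True, False):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # An empty branch falls back to the majority family at this node.
        node[value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx else majority
        )
    return node

def classify(tree, seq):
    """Walk the tree, testing one motif per internal node."""
    row = features(seq)
    while isinstance(tree, dict):
        tree = tree[row[tree["motif"]]]
    return tree

# Toy training set: made-up sequences and hypothetical family labels.
TRAIN = [
    ("MAGAAAAGKSLL", "family_A"),
    ("GTTTTGKTHAAAH", "family_A"),
    ("GAAAAGKSCAAC", "family_B"),
    ("CLLCGSSSSGKT", "family_B"),
    ("MCAACLL", "family_C"),
    ("MLLHAAAHLL", "family_C"),
]

rows = [features(seq) for seq, _ in TRAIN]
labels = [family for _, family in TRAIN]
tree = build_tree(rows, labels, list(MOTIFS))
```

In this toy data no single motif separates all three families: `m_gk` alone cannot tell `family_A` from `family_B`, and `m_cc` alone cannot tell `family_B` from `family_C`. The induced tree tests `m_gk` first and then `m_cc`, which is the kind of motif combination the abstract contrasts with single-motif assignment. A realistic evaluation would, as the abstract describes, measure accuracy on an independent test set rather than on the training sequences.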

Original language: English (US)
Title of host publication: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems and Technology, CBGIST 2001
Editors: C.H. Wu, P.P. Wang, J.T.L. Wang
Pages: 53-58
Number of pages: 6
State: Published - 2001
Event: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems and Technology, CBGIST 2001 - Durham, NC, United States
Duration: Mar 15 2001 → Mar 17 2001

Publication series

Name: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems and Technology, CBGIST 2001

Other

Other: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems and Technology, CBGIST 2001
Country/Territory: United States
City: Durham, NC
Period: 3/15/01 → 3/17/01

All Science Journal Classification (ASJC) codes

  • Agricultural and Biological Sciences (miscellaneous)
  • Genetics
  • Computer Science (miscellaneous)
