Feature extraction using clustering of protein

Isis Bonet, Yvan Saeys, Ricardo Grau Ábalo, María M. García, Robersy Sanchez, Yves Van De Peer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations


In this paper we investigate the use of a clustering algorithm as a feature extraction technique to find new features for representing protein sequences. In particular, our work focuses on predicting HIV protease resistance to drugs. We use a biologically motivated similarity function based on the contact energy of each amino acid and its position in the sequence. The performance measure takes into account both clustering reliability and classification validity. The k-means algorithm was used for clustering, and an SVM evaluated with 10-fold cross-validation was used for classification. The best results were obtained by reducing the initial set of 99 features to a lower-dimensional set of 36 to 66 features.
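
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions: the data are random numbers standing in for per-position contact energies, plain Euclidean distance replaces the paper's biologically motivated similarity function, and each new feature is taken to be the mean of the positions in one cluster (the abstract does not say how clusters are turned into features). The names `kmeans` and `extract_features` are hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def extract_features(X, k, seed=0):
    """Cluster the columns of X (sequence positions) and average each cluster.

    Averaging is an assumption made for this sketch, not the paper's rule.
    """
    columns = [list(col) for col in zip(*X)]  # one point per sequence position
    labels = kmeans(columns, k, seed=seed)
    used = sorted(set(labels))  # skip clusters that ended up empty
    reduced = []
    for row in X:
        feats = []
        for c in used:
            vals = [v for v, l in zip(row, labels) if l == c]
            feats.append(sum(vals) / len(vals))
        reduced.append(feats)
    return reduced, labels


# Synthetic stand-in data: 40 sequences, 99 positions (the initial feature count
# given in the abstract); the values are random, not real contact energies.
rng = random.Random(42)
X = [[rng.uniform(-1.0, 1.0) for _ in range(99)] for _ in range(40)]

X_reduced, labels = extract_features(X, k=36)
print(len(X[0]), "->", len(X_reduced[0]))
```

In the setting the abstract describes, the reduced matrix would then be passed to a classifier such as an SVM and scored with 10-fold cross-validation, with the number of clusters varied to find the best trade-off. That step is omitted here.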

Original language: English (US)
Title of host publication: Progress in Pattern Recognition, Image Analysis and Applications - 11th Iberoamerican Congress in Pattern Recognition, CIARP 2006, Proceedings
Publisher: Springer Verlag
Number of pages: 10
ISBN (Print): 3540465561, 9783540465560
State: Published - 2006
Event: 11th Iberoamerican Congress in Pattern Recognition, CIARP 2006 - Cancun, Mexico
Duration: Nov 14 2006 → Nov 17 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4225 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 11th Iberoamerican Congress in Pattern Recognition, CIARP 2006

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

