TY - GEN
T1 - Combining super-structuring and abstraction on sequence classification
AU - Silvescu, Adrian
AU - Caragea, Cornelia
AU - Honavar, Vasant
PY - 2009/12/1
Y1 - 2009/12/1
N2 - We present an approach to adapting the data representation used by a learner on sequence classification tasks. Our approach, which exploits the complementary strengths of super-structuring (constructing complex features by combining existing features) and abstraction (grouping similar features to generate more abstract features), yields smaller and, at the same time, accurate models. Super-structuring provides a way to increase the predictive accuracy of the learned models by enriching the data representation (and hence increases the complexity of the learned models), whereas abstraction helps reduce the number of model parameters by simplifying the data representation. The results of our experiments on two data sets drawn from macromolecular sequence classification applications show that adapting the data representation by combining super-structuring and abstraction makes it possible to construct predictive models that use a significantly smaller number of features (by one to three orders of magnitude) than those obtained using super-structuring alone, without sacrificing predictive accuracy. Our experiments also show that simplifying the data representation using abstraction yields better-performing models than those obtained using feature selection.
AB - We present an approach to adapting the data representation used by a learner on sequence classification tasks. Our approach, which exploits the complementary strengths of super-structuring (constructing complex features by combining existing features) and abstraction (grouping similar features to generate more abstract features), yields smaller and, at the same time, accurate models. Super-structuring provides a way to increase the predictive accuracy of the learned models by enriching the data representation (and hence increases the complexity of the learned models), whereas abstraction helps reduce the number of model parameters by simplifying the data representation. The results of our experiments on two data sets drawn from macromolecular sequence classification applications show that adapting the data representation by combining super-structuring and abstraction makes it possible to construct predictive models that use a significantly smaller number of features (by one to three orders of magnitude) than those obtained using super-structuring alone, without sacrificing predictive accuracy. Our experiments also show that simplifying the data representation using abstraction yields better-performing models than those obtained using feature selection.
UR - http://www.scopus.com/inward/record.url?scp=77951191376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951191376&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2009.130
DO - 10.1109/ICDM.2009.130
M3 - Conference contribution
AN - SCOPUS:77951191376
SN - 9780769538952
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 986
EP - 991
BT - ICDM 2009 - The 9th IEEE International Conference on Data Mining
T2 - 9th IEEE International Conference on Data Mining, ICDM 2009
Y2 - 6 December 2009 through 9 December 2009
ER -