Abstraction augmented Markov models

Cornelia Caragea, Adrian Silvescu, Doina Caragea, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

High accuracy sequence classification often requires the use of higher order Markov models (MMs). However, the number of MM parameters increases exponentially with the range of direct dependencies between sequence elements, thereby increasing the risk of overfitting when the data set is limited in size. We present abstraction augmented Markov models (AAMMs) that effectively reduce the number of numeric parameters of kth order MMs by successively grouping strings of length k (i.e., k-grams) into abstraction hierarchies. We evaluate AAMMs on three protein subcellular localization prediction tasks. The results of our experiments show that abstraction makes it possible to construct predictive models that use significantly smaller number of features (by one to three orders of magnitude) as compared to MMs. AAMMs are competitive with and, in some cases, significantly outperform MMs. Moreover, the results show that AAMMs often perform significantly better than variable order Markov models, such as decomposed context tree weighting, prediction by partial match, and probabilistic suffix trees.

Original languageEnglish (US)
Title of host publicationProceedings - 10th IEEE International Conference on Data Mining, ICDM 2010
Pages68-77
Number of pages10
DOIs
StatePublished - 2010
Event10th IEEE International Conference on Data Mining, ICDM 2010 - Sydney, NSW, Australia
Duration: Dec 14 2010Dec 17 2010

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)1550-4786

Other

Other10th IEEE International Conference on Data Mining, ICDM 2010
Country/TerritoryAustralia
CitySydney, NSW
Period12/14/1012/17/10

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Fingerprint

Dive into the research topics of 'Abstraction augmented Markov models'. Together they form a unique fingerprint.

Cite this