TY - GEN
T1 - Semi-supervised sequence classification using Abstraction Augmented Markov Models
AU - Caragea, Cornelia
AU - Silvescu, Adrian
AU - Caragea, Doina
AU - Honavar, Vasant
PY - 2010
Y1 - 2010
N2 - Supervised methods for learning sequence classifiers rely on the availability of large amounts of labeled data. However, in many applications, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in semi-supervised methods that can exploit large amounts of unlabeled data together with small amounts of labeled data. In this paper, we introduce a novel Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised learning. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs) (which do not take advantage of unlabeled data); and (ii) an expectation maximization (EM) based approach to semi-supervised training of MMs (which makes use of unlabeled data). The results of our experiments on three protein subcellular localization prediction tasks show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; and (ii) are more accurate than both the MMs and the EM based semi-supervised MMs.
AB - Supervised methods for learning sequence classifiers rely on the availability of large amounts of labeled data. However, in many applications, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in semi-supervised methods that can exploit large amounts of unlabeled data together with small amounts of labeled data. In this paper, we introduce a novel Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised learning. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs) (which do not take advantage of unlabeled data); and (ii) an expectation maximization (EM) based approach to semi-supervised training of MMs (which makes use of unlabeled data). The results of our experiments on three protein subcellular localization prediction tasks show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; and (ii) are more accurate than both the MMs and the EM based semi-supervised MMs.
UR - http://www.scopus.com/inward/record.url?scp=77958074609&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77958074609&partnerID=8YFLogxK
U2 - 10.1145/1854776.1854813
DO - 10.1145/1854776.1854813
M3 - Conference contribution
AN - SCOPUS:77958074609
SN - 9781450304382
T3 - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
SP - 257
EP - 264
BT - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
T2 - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Y2 - 2 August 2010 through 4 August 2010
ER -