Multinomial event model based abstraction for sequence and text classification

Dae Ki Kang, Jun Zhang, Adrian Silvescu, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

10 Scopus citations

Abstract

In many machine learning applications that deal with sequences, there is a need for learning algorithms that can effectively exploit a hierarchical grouping of words. We introduce the Word Taxonomy guided Naive Bayes Learner for the Multinomial Event Model (WTNBL-MN), which exploits a word taxonomy to generate compact classifiers, and the Word Taxonomy Learner (WTL), which automatically constructs a word taxonomy from sequence data. WTNBL-MN generalizes the Naive Bayes learner for the Multinomial Event Model to learn classifiers from data using a word taxonomy. WTL uses hierarchical agglomerative clustering to group words based on the distribution of class labels that co-occur with them. Our experimental results on protein localization sequences and Reuters text show that the proposed algorithms can generate Naive Bayes classifiers that are more compact, and often more accurate, than those produced by the standard Naive Bayes learner for the Multinomial Event Model.
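The two components described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the Jensen-Shannon divergence as the distance between the class distributions of two words or word groups (the abstract only says clustering is "based on the distribution of class labels"), merges groups by frequency-weighted averaging, and fixes the level of abstraction by hand (a given number of merges) instead of reproducing how WTNBL-MN selects it. All function names (`build_taxonomy`, `abstraction_map`, `train_mn_nb`, `predict`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def kl(p, q):
    # Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    # Assumption: Jensen-Shannon divergence as the word-to-word distance
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def build_taxonomy(docs, labels):
    """WTL-style sketch: agglomerative clustering of words by P(class | word).
    Returns the merges (left, right, new_node) in the order they were made."""
    classes = sorted(set(labels))
    cooc = defaultdict(Counter)  # word -> class -> co-occurrence count
    for doc, label in zip(docs, labels):
        for w in doc:
            cooc[w][label] += 1
    # cluster name -> (total word count, class distribution P(class | cluster))
    clusters = {w: (sum(c.values()), [c[k] / sum(c.values()) for k in classes])
                for w, c in cooc.items()}
    merges = []
    while len(clusters) > 1:
        # merge the pair whose class distributions are closest
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: js(clusters[pair[0]][1], clusters[pair[1]][1]))
        (na, pa), (nb, pb) = clusters.pop(a), clusters.pop(b)
        # assumption: frequency-weighted average of the two class distributions
        merged = [(na * u + nb * v) / (na + nb) for u, v in zip(pa, pb)]
        name = f"({a}+{b})"
        clusters[name] = (na + nb, merged)
        merges.append((a, b, name))
    return merges


def abstraction_map(vocab, merges, num_merges):
    # Map each word to its abstract node after applying the first num_merges merges
    mapping = {w: w for w in vocab}
    for a, b, name in merges[:num_merges]:
        for w in mapping:
            if mapping[w] in (a, b):
                mapping[w] = name
    return mapping


def train_mn_nb(docs, labels, mapping, alpha=1.0):
    # Multinomial Naive Bayes over abstract nodes: counts of merged words are summed
    nodes = sorted(set(mapping.values()))
    prior = Counter(labels)
    counts = {c: Counter() for c in prior}
    for doc, label in zip(docs, labels):
        for w in doc:
            if w in mapping:
                counts[label][mapping[w]] += 1
    model = {}
    for c in prior:
        total = sum(counts[c].values())
        log_p = {v: math.log((counts[c][v] + alpha) / (total + alpha * len(nodes)))
                 for v in nodes}  # Laplace-smoothed P(node | class)
        model[c] = (math.log(prior[c] / len(labels)), log_p)
    return model


def predict(model, doc, mapping):
    # Each occurrence adds log P(node | class); unknown words are skipped
    def score(c):
        log_prior, log_p = model[c]
        return log_prior + sum(log_p[mapping[w]] for w in doc if w in mapping)
    return max(model, key=score)


docs = [["cat", "dog", "pet"], ["dog", "pet"],
        ["stock", "bond", "market"], ["bond", "market"]]
labels = ["animals", "animals", "finance", "finance"]
vocab = sorted({w for d in docs for w in d})
merges = build_taxonomy(docs, labels)
mapping = abstraction_map(vocab, merges, len(vocab) - 2)  # cut with 2 abstract nodes
model = train_mn_nb(docs, labels, mapping)
print(predict(model, ["dog", "pet", "pet"], mapping))  # animals
```

With `num_merges=0` every word is its own node and the sketch reduces to an ordinary multinomial Naive Bayes learner; each additional merge shrinks the parameter table by one node per class, which is the sense in which an abstraction yields a more compact classifier.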

Original language: English (US)
Title of host publication: Abstraction, Reformulation and Approximation - 6th International Symposium, SARA 2005, Proceedings
Publisher: Springer Verlag
Pages: 134-148
Number of pages: 15
ISBN (Print): 3540278729, 9783540278726
State: Published - 2005
Event: 6th International Symposium on Abstraction, Reformulation and Approximation, SARA 2005 - Airth Castle, Scotland, United Kingdom
Duration: Jul 26 2005 - Jul 29 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3607 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Symposium on Abstraction, Reformulation and Approximation, SARA 2005
Country/Territory: United Kingdom
City: Airth Castle, Scotland
Period: 7/26/05 - 7/29/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
