A mixture model and EM algorithm for robust classification, outlier rejection, and class discovery

David J. Miller, John Browning

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

Several authors have addressed learning a classifier given a mixed labeled/unlabeled training set. These works assume each unlabeled sample originates from one of the (known) classes. Here, we consider the scenario in which unlabeled points may belong either to known/predefined or to heretofore undiscovered classes. There are several practical situations where such data may arise. We propose a novel statistical mixture model which views as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each point. Two types of mixture components are posited to explain label presence/absence. "Predefined" components generate both labeled and unlabeled points and assume labels are missing at random. "Nonpredefined" components only generate unlabeled points; thus, in localized regions, they capture data subsets that are exclusively unlabeled. Such subsets may represent an outlier distribution, or new classes. The components' predefined/nonpredefined natures are data-driven, learned along with the other parameters via an algorithm based on expectation-maximization (EM). There are three natural applications: 1) robust classifier design, given a mixed training set with outliers; 2) classification with rejections; 3) identification of the unlabeled points (and their representative components) that originate from unknown classes, i.e., new class discovery. We evaluate our method and alternative approaches on both synthetic and real-world data sets.
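
The idea of treating label presence/absence as observed data can be illustrated with a small EM sketch. This is not the authors' formulation, which the abstract does not spell out. It is a simplified variant that assumes spherical Gaussian components, each with its own label-presence probability `rho` and class distribution `beta`. The function name `fit_em` and all parameter names are hypothetical. In this variant a component whose `rho` is driven toward zero by the data behaves like a "nonpredefined" component, since it explains only unlabeled points.

```python
import numpy as np

def fit_em(X, y, n_components, n_classes, n_iter=50, init_mu=None, seed=0):
    """Simplified EM sketch (an assumption, not the paper's exact model).

    X: (n, d) feature array. y: length-n integer array holding a class
    index in [0, n_classes), or -1 where the label is missing.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labeled = y >= 0
    eps = 1e-9

    # Initial parameters: means, spherical variances, mixing weights,
    # per-component label-presence probability, per-component class probs.
    if init_mu is None:
        mu = X[rng.choice(n, n_components, replace=False)].copy()
    else:
        mu = np.array(init_mu, dtype=float)
    var = np.full(n_components, X.var())
    pi = np.full(n_components, 1.0 / n_components)
    rho = np.full(n_components, 0.5)
    beta = np.full((n_components, n_classes), 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: log joint of each point with each component.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_p = (np.log(pi + eps)
                 - 0.5 * d * np.log(2 * np.pi * var)
                 - 0.5 * sq / var)
        # Labeled points also pay for "label present" and for their class.
        log_p[labeled] += np.log(rho + eps) + np.log(beta[:, y[labeled]].T + eps)
        # Unlabeled points pay for "label absent".
        log_p[~labeled] += np.log(1.0 - rho + eps)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weighted maximum-likelihood updates.
        nk = r.sum(axis=0) + eps
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (r * sq).sum(axis=0) / (d * nk) + 1e-6
        nk_lab = r[labeled].sum(axis=0)
        rho = nk_lab / nk
        onehot = np.eye(n_classes)[y[labeled]]
        beta = (r[labeled].T @ onehot + eps) / (nk_lab[:, None] + n_classes * eps)

    return {"pi": pi, "mu": mu, "var": var, "rho": rho, "beta": beta}
```

Under these assumptions, components with `rho` near zero could be flagged as candidate outlier or new-class components, and a test point assigned mostly to such a component could be rejected rather than classified. The threshold for "near zero" is a design choice that this sketch leaves open.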

Original language: English (US)
Pages (from-to): 809-812
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2
State: Published - 2003
Event: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hong Kong, Hong Kong
Duration: Apr 6 2003 → Apr 10 2003

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
