A source coding approach to classification by vector quantization and the principle of minimum description length

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations


An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_i, Y_i)}, i = 1, ..., n, which are independent samples from a joint distribution P_XY. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution P_XY ought to enable efficient coding of X and Y. Conversely, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P_XY. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimate of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, called discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.
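The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several simplifications are assumptions made here: a single-stage quantizer stands in for the two-stage one, the ideal code length -log2 P(y | cell) stands in for the Huffman code length, the weight `lam` between distortion and code length is a hypothetical parameter, and the initialization and Laplace smoothing are arbitrary choices. The function names `fit_dvq_sketch` and `predict` are illustrative only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_dvq_sketch(X, Y, n_cells, n_classes, lam=1.0, n_iter=20):
    """Lloyd-style design with cost = distortion + lam * code length of Y.

    Hypothetical helper: `lam`, the smoothing, the initialization, and the
    single-stage quantizer are assumptions of this sketch, not the paper's.
    """
    dim = len(X[0])
    # Assumed initialization: evenly spaced training points as centroids.
    centroids = [list(X[i * len(X) // n_cells]) for i in range(n_cells)]
    # Start from uniform conditional class distributions P(y | cell).
    probs = [[1.0 / n_classes] * n_classes for _ in range(n_cells)]

    for _ in range(n_iter):
        # Assignment step: each sample goes to the cell with the lowest
        # joint cost (quantization error plus weighted code length of y).
        assign = [
            min(
                range(n_cells),
                key=lambda k: sq_dist(x, centroids[k])
                - lam * math.log2(probs[k][y]),
            )
            for x, y in zip(X, Y)
        ]
        # Update step: recompute centroids and smoothed class frequencies.
        for k in range(n_cells):
            members = [i for i, a in enumerate(assign) if a == k]
            if not members:
                continue  # leave an empty cell unchanged
            centroids[k] = [
                sum(X[i][d] for i in members) / len(members) for d in range(dim)
            ]
            counts = [1.0] * n_classes  # Laplace smoothing avoids log2(0)
            for i in members:
                counts[Y[i]] += 1.0
            total = sum(counts)
            probs[k] = [c / total for c in counts]
    return centroids, probs


def predict(x, centroids, probs):
    """Quantize x to its nearest centroid, then pick the most probable class."""
    k = min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
    return max(range(len(probs[k])), key=lambda y: probs[k][y])
```

In this sketch, the per-cell class frequencies play the role of the estimated conditional distribution of Y given X, and `predict` applies the plug-in version of the Bayes rule by choosing the most probable class in the cell that x falls into.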

Original language: English (US)
Title of host publication: Proceedings - DCC 2002
Subtitle of host publication: Data Compression Conference
Editors: James A. Storer, Martin Cohn
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 0769514774
State: Published - Jan 1 2002
Event: Data Compression Conference, DCC 2002 - Snowbird, United States
Duration: Apr 2 2002 to Apr 4 2002

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314


Other: Data Compression Conference, DCC 2002
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

