{"title":"一种基于矢量量化和最小描述长度原则的源编码分类方法","authors":"Jia Li","doi":"10.1109/DCC.2002.999978","DOIUrl":null,"url":null,"abstract":"An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X/sub i/, Y/sub i/)}/sub i=1//sup n/, which are independent samples from a joint distribution P/sub XY/. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution P/sub XY/ ought to enable efficient coding of X and Y. On the other hand, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P/sub XY/. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, namely discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART/sup R/ on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.","PeriodicalId":420897,"journal":{"name":"Proceedings DCC 2002. Data Compression Conference","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A source coding approach to classification by vector quantization and the principle of minimum description length\",\"authors\":\"Jia Li\",\"doi\":\"10.1109/DCC.2002.999978\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X/sub i/, Y/sub i/)}/sub i=1//sup n/, which are independent samples from a joint distribution P/sub XY/. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution P/sub XY/ ought to enable efficient coding of X and Y. On the other hand, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P/sub XY/. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, namely discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART/sup R/ on a number of data sets. DVQ outperforms the other two on several data sets. 
The relation between DVQ, density estimation, and regression is also discussed.\",\"PeriodicalId\":420897,\"journal\":{\"name\":\"Proceedings DCC 2002. Data Compression Conference\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC 2002. Data Compression Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.2002.999978\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC 2002. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.2002.999978","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A source coding approach to classification by vector quantization and the principle of minimum description length
An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data $\{(X_i, Y_i)\}_{i=1}^{n}$, which are independent samples from a joint distribution $P_{XY}$. By the principle of minimum description length (MDL), a statistical model that approximates the distribution $P_{XY}$ well ought to enable efficient coding of $X$ and $Y$; conversely, a system that encodes $(X, Y)$ efficiently is expected to provide ample information about $P_{XY}$. This information can then be used to classify $X$, i.e., to predict the corresponding $Y$ from $X$. To encode both $X$ and $Y$, a two-stage vector quantizer is applied to $X$, and a Huffman code is formed for $Y$ conditioned on each quantized value of $X$. Optimizing the encoder is equivalent to designing a vector quantizer whose objective function reflects the joint penalty of quantization error and misclassification rate. This quantizer provides an estimate of the conditional distribution of $Y$ given $X$, which in turn yields an approximation to the Bayes classification rule. The algorithm, called discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets; DVQ outperforms the other two on several of them. The relation between DVQ, density estimation, and regression is also discussed.
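
The abstract describes the encoder design only at a high level. As a rough illustration of the idea, the sketch below implements a single-stage Lloyd-style variant: each training pair is assigned to the cell that minimizes a joint cost $\|x - c_j\|^2 + \lambda \cdot (-\log_2 \hat{p}_j(y))$, where $\hat{p}_j$ is the empirical class distribution of cell $j$, so the second term stands in for the conditional code length of $Y$ that the paper obtains from a Huffman code. Everything here is an assumption made for illustration, not the paper's method: the function names (`dvq_fit`, `dvq_predict`), the single-stage quantizer in place of the paper's two-stage design, the trade-off weight `lam`, and the Laplace smoothing of the class histograms.

```python
import numpy as np

def dvq_fit(X, Y, n_cells=8, lam=1.0, n_iter=20, seed=0):
    """Lloyd-style quantizer design with a joint objective:
    cost(x, y, j) = ||x - c_j||^2 + lam * (-log2 p_j(y)),
    i.e. quantization error plus an estimated conditional code
    length for the label. Returns centroids, per-cell class
    distributions, and the sorted class labels."""
    rng = np.random.default_rng(seed)
    classes = np.unique(Y)
    y_idx = np.searchsorted(classes, Y)
    # Initialize the codebook with randomly chosen training points
    # and the per-cell class distributions with the uniform law.
    centroids = X[rng.choice(len(X), n_cells, replace=False)].astype(float)
    probs = np.full((n_cells, len(classes)), 1.0 / len(classes))
    for _ in range(n_iter):
        # Assignment step: squared distortion plus label code length.
        sq = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        code_len = -np.log2(probs[:, y_idx].T + 1e-12)  # shape (n, n_cells)
        assign = np.argmin(sq + lam * code_len, axis=1)
        # Update step: recompute centroids and (smoothed) class histograms.
        for j in range(n_cells):
            members = assign == j
            if members.any():
                centroids[j] = X[members].mean(axis=0)
                counts = np.bincount(y_idx[members], minlength=len(classes))
                probs[j] = (counts + 1) / (counts.sum() + len(classes))
    return centroids, probs, classes

def dvq_predict(X, centroids, probs, classes):
    """Quantize by nearest centroid only (Y is unknown at test time),
    then predict the most probable class in that cell -- the plug-in
    approximation to the Bayes rule via the estimated conditional
    distribution of Y given the quantized X."""
    sq = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    cells = np.argmin(sq, axis=1)
    return classes[np.argmax(probs[cells], axis=1)]

# Toy usage on two overlapping Gaussian blobs (synthetic data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
Y = np.array([0] * 200 + [1] * 200)
cb, pr, cl = dvq_fit(X, Y, n_cells=4, lam=2.0)
print((dvq_predict(X, cb, pr, cl) == Y).mean())  # training accuracy
```

Note the asymmetry between the two phases, which mirrors the abstract: the label enters the objective during codebook design (the misclassification penalty), but at test time the quantizer can only use the observed $X$, and the stored class distribution of the chosen cell supplies the prediction.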