{"title":"压缩fisher线性判别分析:随机投影数据的分类","authors":"R. Durrant, A. Kabán","doi":"10.1145/1835804.1835945","DOIUrl":null,"url":null,"abstract":"We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. We contrast our findings to previous related work, and discuss our insights.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"50 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":"{\"title\":\"Compressed fisher linear discriminant analysis: classification of randomly projected data\",\"authors\":\"R. Durrant, A. Kabán\",\"doi\":\"10.1145/1835804.1835945\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. 
We contrast our findings to previous related work, and discuss our insights.\",\"PeriodicalId\":20529,\"journal\":{\"name\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"50 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"56\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1835804.1835945\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1835804.1835945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Compressed fisher linear discriminant analysis: classification of randomly projected data
We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances be approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; instead, we take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error, on average over the random choice of projection, which, in contrast to earlier distance-preserving approaches, tightens naturally as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows only logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is never worse in the low-dimensional space than in the original high-dimensional space. We contrast our findings with previous related work and discuss our insights.
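To make the pipeline concrete, here is a minimal sketch of the setting the abstract describes: project the data through a Gaussian random matrix, then fit Fisher's Linear Discriminant in the compressed space. The synthetic data, the projection dimension k = 50, and the use of scikit-learn's LinearDiscriminantAnalysis are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic high-dimensional data with class structure (illustrative only).
X, y = make_classification(n_samples=500, n_features=1000, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

d = X.shape[1]   # original dimension (1000)
k = 50           # projection dimension; per the abstract, k need only grow
                 # logarithmically with the number of classes

# Gaussian random projection R with i.i.d. N(0, 1/k) entries.
rng = np.random.default_rng(0)
R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

# Fit FLD in the randomly projected (compressed) space and evaluate.
clf = LinearDiscriminantAnalysis()
clf.fit(X_train @ R.T, y_train)
print("accuracy in projected space:", clf.score(X_test @ R.T, y_test))
```

Note the design choice this illustrates: no attempt is made to preserve all pairwise distances under R; the classifier relies only on the class structure surviving the projection, which is the regime the paper's bound addresses.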