{"title":"压缩fisher线性判别分析:随机投影数据的分类","authors":"R. Durrant, A. Kabán","doi":"10.1145/1835804.1835945","DOIUrl":null,"url":null,"abstract":"We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. We contrast our findings to previous related work, and discuss our insights.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"50 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":"{\"title\":\"Compressed fisher linear discriminant analysis: classification of randomly projected data\",\"authors\":\"R. Durrant, A. Kabán\",\"doi\":\"10.1145/1835804.1835945\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. 
We contrast our findings to previous related work, and discuss our insights.\",\"PeriodicalId\":20529,\"journal\":{\"name\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"50 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"56\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1835804.1835945\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1835804.1835945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Compressed fisher linear discriminant analysis: classification of randomly projected data
We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances be approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; instead, we take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error, on average over the random choice of projection, which, in contrast to earlier distance-preserving approaches, tightens naturally as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows only logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is never worse in the low-dimensional space than in the original high-dimensional space. We contrast our findings with previous related work and discuss our insights.
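To make the pipeline concrete, here is a minimal sketch of the setting the abstract describes: project the data through a Gaussian random matrix, then fit Fisher's Linear Discriminant in the compressed space. The synthetic data, the projection dimension k = 50, and the use of scikit-learn's LinearDiscriminantAnalysis are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic high-dimensional data with class structure (illustrative only).
X, y = make_classification(n_samples=500, n_features=1000, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

d = X.shape[1]   # original dimension (1000)
k = 50           # projection dimension; per the abstract, k need only grow
                 # logarithmically with the number of classes

# Gaussian random projection R with i.i.d. N(0, 1/k) entries.
rng = np.random.default_rng(0)
R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

# Fit FLD in the randomly projected (compressed) space and evaluate.
clf = LinearDiscriminantAnalysis()
clf.fit(X_train @ R.T, y_train)
print("accuracy in projected space:", clf.score(X_test @ R.T, y_test))
```

Note the design choice this illustrates: no attempt is made to preserve all pairwise distances under R; the classifier relies only on the class structure surviving the projection, which is the regime the paper's bound addresses.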