An empirical comparison of sampling techniques for matrix column subset selection
Yining Wang, Aarti Singh
2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), September 2015
DOI: 10.1109/ALLERTON.2015.7447127
Column subset selection (CSS) is the problem of selecting a small subset of columns from a large data matrix, as one form of interpretable data summarization. Leverage score sampling, which enjoys both sound theoretical guarantees and superior empirical performance, is widely recognized as the state-of-the-art algorithm for column subset selection. In this paper, we revisit iterative norm sampling, another sampling-based CSS algorithm that was proposed even before leverage score sampling, and demonstrate its competitive performance under a wide range of experimental settings. We also compare iterative norm sampling with several other competitors and show that it outperforms them in terms of both approximation accuracy and computational efficiency. We conclude that iterative norm sampling deserves further theoretical investigation and practical consideration for column subset selection.
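To make the two samplers concrete, here is a minimal sketch assuming their standard textbook formulations. Iterative norm sampling is taken to be adaptive sampling: each round draws a column with probability proportional to its squared residual norm after projecting out the columns already chosen. Leverage score sampling is taken to use rank-k leverage scores, the squared row norms of the top-k right singular vectors. The function names, the without-replacement leverage draw, and the synthetic low-rank test matrix are hypothetical illustration choices and do not reproduce the paper's experimental setup. Approximation quality is measured as ||A - C C^+ A||_F relative to the optimal rank-k error.

```python
import numpy as np


def residual_error(A, cols):
    """Frobenius error ||A - C C^+ A||_F of projecting A onto span(A[:, cols])."""
    C = A[:, cols]
    return np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro")


def iterative_norm_sampling(A, k, rng):
    """Assumed formulation: each round samples column j with probability
    proportional to the squared norm of its current residual."""
    n = A.shape[1]
    chosen = []
    R = A.copy()
    for _ in range(k):
        sq = np.sum(R ** 2, axis=0)
        sq[chosen] = 0.0  # guard against re-picking due to round-off
        total = sq.sum()
        if total <= 1e-12:  # A is already (numerically) spanned
            break
        j = int(rng.choice(n, p=sq / total))
        chosen.append(j)
        C = A[:, chosen]
        R = A - C @ np.linalg.pinv(C) @ A  # residual after projecting out C
    return chosen


def leverage_score_sampling(A, k, c, rng):
    """Assumed formulation: sample c distinct columns with probability
    proportional to ||V_k[j, :]||^2 (without replacement, a simplification)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)  # rank-k leverage scores, sum to k
    picks = rng.choice(A.shape[1], size=c, replace=False, p=scores / scores.sum())
    return [int(j) for j in picks]


rng = np.random.default_rng(0)
# Hypothetical test matrix: rank-5 signal plus small Gaussian noise.
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 200))
A += 0.01 * rng.standard_normal((100, 200))
k = 5

ins = iterative_norm_sampling(A, k, rng)
lev = leverage_score_sampling(A, k, k, rng)

# Optimal rank-k error (Eckart-Young); no k-column subset can beat it.
s = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(s[k:] ** 2))
print("iterative norm ratio:", residual_error(A, ins) / best)
print("leverage score ratio:", residual_error(A, lev) / best)
```

Both ratios are at least 1, because C C^+ A has rank at most k. Iterative norm sampling needs k projections but no SVD, while leverage score sampling needs one SVD of A. This cost difference is the kind of computational trade-off the comparison above weighs.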