Diversity sampling is an implicit regularization for kernel methods

Impact factor 1.9 | JCR Q1 | Mathematics, Applied
M. Fanuel, J. Schreurs, J. Suykens
SIAM Journal on Mathematics of Data Science, Vol. 9, No. 1, pp. 280-297. Publication date: 2020-02-20. DOI: 10.1137/20M1320031 (https://doi.org/10.1137/20M1320031)
Citations: 12

Abstract

Kernel methods have achieved very good performance on large-scale regression and classification problems by using the Nyström method and preconditioning techniques. The Nyström approximation, based on a subset of landmarks, gives a low-rank approximation of the kernel matrix and is known to provide a form of implicit regularization. We further elaborate on the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised as well as unsupervised kernel methods. By using Determinantal Point Processes (DPPs) for sampling, we obtain additional theoretical results concerning the interplay between diversity and regularization. Empirically, we demonstrate the advantages of training kernel methods on subsets made of diverse points. In particular, if the dataset has a dense bulk and a sparser tail, we show that Nyström kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset compared with uniform landmark sampling. A greedy heuristic is also proposed to select diverse samples of significant size within large datasets when exact DPP sampling is not practically feasible.
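The abstract does not spell out a sampling procedure. The following is a minimal sketch, assuming a Gaussian kernel and an L-ensemble DPP with L = K/alpha. The helper names `gaussian_kernel`, `effective_dimension`, `sample_dpp` and `nystrom`, the bandwidth, alpha and the synthetic data are all illustrative assumptions, not taken from the paper. It draws diverse landmarks exactly with the standard spectral DPP sampler and plugs them into the Nyström approximation K ≈ K[:, S] K[S, S]^+ K[S, :]. For this L-ensemble the expected number of landmarks is sum_i lambda_i / (lambda_i + alpha) = Tr(K (K + alpha I)^{-1}), the effective dimension, so alpha acts like a regularization parameter when setting the sample size.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Gaussian (RBF) kernel between the rows of X and the rows of Y
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def effective_dimension(K, alpha):
    # d_eff(alpha) = Tr(K (K + alpha I)^{-1}) = sum_i lambda_i / (lambda_i + alpha)
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(eigvals / (eigvals + alpha)))

def sample_dpp(L, rng):
    """One exact sample of the L-ensemble DPP, P(S) proportional to det(L_SS).

    Spectral algorithm: keep eigenvector i with probability
    lambda_i / (lambda_i + 1), then draw one item per kept eigenvector.
    """
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)
    keep = rng.random(eigvals.shape[0]) < eigvals / (eigvals + 1.0)
    V = eigvecs[:, keep]
    sample = []
    while V.shape[1] > 0:
        # Item i is drawn with probability ||V[i, :]||^2 / (number of columns)
        probs = np.sum(V**2, axis=1)
        i = int(rng.choice(len(probs), p=probs / probs.sum()))
        sample.append(i)
        # Restrict the basis to the subspace orthogonal to e_i (drops one column)
        c = int(np.argmax(np.abs(V[i, :])))
        pivot = V[:, c].copy()
        V = np.delete(V - np.outer(pivot, V[i, :] / pivot[i]), c, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)  # re-orthonormalize the remaining columns
    return sorted(sample)

def nystrom(K, S):
    # Nystrom approximation K ~ K[:, S] K[S, S]^+ K[S, :]
    C = K[:, S]
    return C @ np.linalg.pinv(K[np.ix_(S, S)]) @ C.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: a dense Gaussian bulk plus a sparse, wider tail
    X = np.vstack([rng.normal(0.0, 0.3, (270, 2)), rng.normal(0.0, 2.0, (30, 2))])
    K = gaussian_kernel(X, X, bandwidth=0.5)
    alpha = 1e-2
    S = sample_dpp(K / alpha, rng)
    print("expected size:", effective_dimension(K, alpha), "drawn:", len(S))
    if S:
        err = np.linalg.norm(K - nystrom(K, S), 2) / np.linalg.norm(K, 2)
        print("relative spectral error of the Nystrom approximation:", err)
```

This exact sampler needs a full eigendecomposition of the n x n kernel matrix, which costs O(n^3) time; that cost is what makes exact DPP sampling impractical on large datasets.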
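The abstract mentions a greedy heuristic for large datasets but does not specify it. The sketch below substitutes a standard technique, greedy log-determinant maximization via pivoted Cholesky, which is not necessarily the authors' heuristic. At each step it adds the point with the largest Schur-complement residual k(x, x) - k_S(x)^T K_SS^{-1} k_S(x). Because det(K_{S+j}) = det(K_SS) times that residual, the greedy choice maximizes det(K_SS) one point at a time. The selected landmarks are then used in Nyström kernel ridge regression, which solves (C^T C + n lam W) beta = C^T y with C = K[:, S] and W = K[S, S], and are compared with uniform landmarks on hypothetical 1-D data with a dense bulk and a sparse tail. The function names (`greedy_diverse_landmarks`, `nystrom_krr_fit`, `nystrom_krr_predict`), the data, the bandwidth and lam are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Gaussian (RBF) kernel between the rows of X and the rows of Y
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def greedy_diverse_landmarks(X, m, bandwidth):
    """Greedy log-det maximization via pivoted Cholesky (hypothetical helper).

    Each step picks the point with the largest residual diagonal, i.e. the
    point that most increases det(K_SS). Cost: O(n m) kernel evaluations.
    """
    n = X.shape[0]
    residual = np.ones(n)           # the Gaussian kernel has k(x, x) = 1
    factors = np.zeros((n, m))      # partial Cholesky factor, one column per step
    selected = []
    for t in range(m):
        j = int(np.argmax(residual))
        if residual[j] <= 1e-12:    # remaining points are numerically spanned
            break
        selected.append(j)
        k_j = gaussian_kernel(X, X[j:j + 1], bandwidth)[:, 0]
        col = (k_j - factors[:, :t] @ factors[j, :t]) / np.sqrt(residual[j])
        factors[:, t] = col
        residual = np.maximum(residual - col**2, 0.0)
    return selected

def nystrom_krr_fit(X, y, landmarks, bandwidth, lam):
    # Minimize (1/n)||C beta - y||^2 + lam * beta^T W beta over beta
    n = X.shape[0]
    Z = X[landmarks]
    C = gaussian_kernel(X, Z, bandwidth)
    W = gaussian_kernel(Z, Z, bandwidth)
    beta = np.linalg.lstsq(C.T @ C + n * lam * W, C.T @ y, rcond=None)[0]
    return Z, beta

def nystrom_krr_predict(Xnew, Z, beta, bandwidth):
    # f(x) = sum_s beta_s k(x, z_s)
    return gaussian_kernel(Xnew, Z, bandwidth) @ beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 1-D data: a dense bulk near 0 and a sparse tail on [2, 6)
    x = np.concatenate([rng.normal(0.0, 0.5, 950), rng.uniform(2.0, 6.0, 50)])
    X = x[:, None]
    y = np.sin(2.0 * x) + 0.1 * rng.normal(size=x.shape)
    m, bw, lam = 20, 0.5, 1e-6
    uniform = list(rng.choice(len(x), size=m, replace=False))
    diverse = greedy_diverse_landmarks(X, m, bw)
    tail = x > 2.0
    for name, S in [("uniform", uniform), ("greedy", diverse)]:
        Z, beta = nystrom_krr_fit(X, y, S, bw, lam)
        pred = nystrom_krr_predict(X[tail], Z, beta, bw)
        print(f"{name}: tail MSE = {np.mean((pred - np.sin(2.0 * x[tail]))**2):.4f}")
```

Results depend on the seed, the bandwidth and m. The abstract's claim about sparse regions concerns diverse (DPP) landmarks, and this deterministic greedy rule is offered here only as a stand-in that avoids the O(n^3) eigendecomposition.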