利用投影直方图进行高效学习

IF 2.8 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhanliang Huang, Ata Kabán, Henry Reeve
{"title":"利用投影直方图进行高效学习","authors":"Zhanliang Huang, Ata Kabán, Henry Reeve","doi":"10.1007/s10618-024-01063-6","DOIUrl":null,"url":null,"abstract":"<p>High dimensional learning is a perennial problem due to challenges posed by the “curse of dimensionality”; learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further exacerbated by noise that needs adding to each dimension to achieve the required privacy. In this paper, we present a surprisingly simple approach to address all of these concerns at once, based on histograms constructed on a low-dimensional random projection (RP) of the data. Our approach exploits RP to take advantage of hidden low-dimensional structures in the data, yielding both computational efficiency, and improved error convergence with respect to the sample size—whereby less training data suffice for learning. We also propose a variant for efficient differentially private (DP) classification that further exploits the data-oblivious nature of both the histogram construction and the RP based dimensionality reduction, resulting in an efficient management of the privacy budget. We present a detailed and rigorous theoretical analysis of generalisation of our algorithms in several settings, showing that our approach is able to exploit low-dimensional structure of the data, ameliorates the ill-effects of noise required for privacy, and has good generalisation under minimal conditions. We also corroborate our findings experimentally, and demonstrate that our algorithms achieve competitive classification accuracy in both non-private and private settings.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"24 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient learning with projected histograms\",\"authors\":\"Zhanliang Huang, Ata Kabán, Henry Reeve\",\"doi\":\"10.1007/s10618-024-01063-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>High dimensional learning is a perennial problem due to challenges posed by the “curse of dimensionality”; learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further exacerbated by noise that needs adding to each dimension to achieve the required privacy. In this paper, we present a surprisingly simple approach to address all of these concerns at once, based on histograms constructed on a low-dimensional random projection (RP) of the data. Our approach exploits RP to take advantage of hidden low-dimensional structures in the data, yielding both computational efficiency, and improved error convergence with respect to the sample size—whereby less training data suffice for learning. We also propose a variant for efficient differentially private (DP) classification that further exploits the data-oblivious nature of both the histogram construction and the RP based dimensionality reduction, resulting in an efficient management of the privacy budget. We present a detailed and rigorous theoretical analysis of generalisation of our algorithms in several settings, showing that our approach is able to exploit low-dimensional structure of the data, ameliorates the ill-effects of noise required for privacy, and has good generalisation under minimal conditions. We also corroborate our findings experimentally, and demonstrate that our algorithms achieve competitive classification accuracy in both non-private and private settings.</p>\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"24 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01063-6\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01063-6","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

由于 "维度诅咒 "带来的挑战,高维学习是一个长期存在的问题;学习通常需要更多的计算资源和更多的训练数据。在差异化隐私(DP)设置中,为了达到所需的隐私性,需要在每个维度上添加噪声,这进一步加剧了问题的严重性。在本文中,我们提出了一种基于数据低维随机投影 (RP) 构建的直方图的简单方法,可以一次性解决所有这些问题。我们的方法利用 RP 来利用数据中隐藏的低维结构,既提高了计算效率,又改善了与样本大小相关的误差收敛性--在这种情况下,只需较少的训练数据即可进行学习。我们还提出了一种高效差异化隐私(DP)分类的变体,它进一步利用了直方图构建和基于 RP 的降维的数据无关性,从而有效地管理了隐私预算。我们对算法在几种环境下的泛化进行了详细而严谨的理论分析,表明我们的方法能够利用数据的低维结构,改善隐私所需的噪声不良影响,并在最低条件下具有良好的泛化能力。我们还通过实验证实了我们的发现,并证明我们的算法在非隐私和隐私环境下都能达到具有竞争力的分类准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Efficient learning with projected histograms

Efficient learning with projected histograms

High dimensional learning is a perennial problem due to challenges posed by the “curse of dimensionality”; learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further exacerbated by noise that needs adding to each dimension to achieve the required privacy. In this paper, we present a surprisingly simple approach to address all of these concerns at once, based on histograms constructed on a low-dimensional random projection (RP) of the data. Our approach exploits RP to take advantage of hidden low-dimensional structures in the data, yielding both computational efficiency, and improved error convergence with respect to the sample size—whereby less training data suffice for learning. We also propose a variant for efficient differentially private (DP) classification that further exploits the data-oblivious nature of both the histogram construction and the RP based dimensionality reduction, resulting in an efficient management of the privacy budget. We present a detailed and rigorous theoretical analysis of generalisation of our algorithms in several settings, showing that our approach is able to exploit low-dimensional structure of the data, ameliorates the ill-effects of noise required for privacy, and has good generalisation under minimal conditions. We also corroborate our findings experimentally, and demonstrate that our algorithms achieve competitive classification accuracy in both non-private and private settings.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Data Mining and Knowledge Discovery
Data Mining and Knowledge Discovery 工程技术-计算机:人工智能
CiteScore
10.40
自引率
4.20%
发文量
68
审稿时长
10 months
期刊介绍: Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信