(ϵ,δ)-differentially private partial least squares regression

IF 3.7 | CAS Zone 2 (Chemistry) | JCR Q2 (AUTOMATION & CONTROL SYSTEMS)
Ramin Nikzad-Langerodi, Mohit Kumar, Du Nguyen Duy, Mahtab Alghasi
Chemometrics and Intelligent Laboratory Systems, Volume 264, Article 105465. Published 2025-06-25. DOI: 10.1016/j.chemolab.2025.105465
https://www.sciencedirect.com/science/article/pii/S0169743925001509
Citations: 0

Abstract

As data-privacy requirements become increasingly stringent and statistical models built on sensitive data are deployed more routinely, protecting data privacy becomes pivotal. Partial Least Squares (PLS) regression is the premier tool for building such models in analytical chemistry, yet it provides no inherent privacy guarantees, leaving sensitive (training) data vulnerable to privacy attacks. To address this gap, we propose an (ϵ,δ)-differentially private PLS (edPLS) algorithm, which integrates well-studied, theoretically motivated Gaussian noise-adding mechanisms into the PLS algorithm to ensure the privacy of the data underlying the model. Our approach adds carefully calibrated Gaussian noise to the outputs of four key functions in the PLS algorithm: the weights, scores, X-loadings, and Y-loadings. The variance of each noise term is determined by the sensitivity of the corresponding function, so that the privacy loss is controlled within the (ϵ,δ)-differential privacy framework. Specifically, we derive sensitivity bounds for each function and use these bounds to calibrate the noise added to the model components. Experimental results demonstrate that edPLS renders ineffective privacy attacks aimed at recovering unique sources of variability in the training data. Application of edPLS to the NIR corn benchmark dataset shows that the root mean squared error of prediction (RMSEP) remains competitive even at strong privacy levels (i.e., ϵ=1), given proper pre-processing of the spectra. These findings highlight the practical utility of edPLS for creating privacy-preserving multivariate calibrations and for analyzing their privacy-utility trade-offs.
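The abstract describes perturbing four PLS outputs with Gaussian noise scaled to each function's sensitivity. The sketch below illustrates that general idea under stated assumptions; it is not the authors' edPLS implementation. It uses the classical Gaussian mechanism, σ = Δ₂·√(2 ln(1.25/δ))/ε, which is stated for 0 < ε < 1 (reaching ε = 1 or above would need a different calibration, such as the analytic Gaussian mechanism). The function names `gaussian_sigma`, `gaussian_mechanism` and `noisy_pls1`, and the per-function sensitivities passed in `sens`, are hypothetical placeholders rather than the bounds derived in the paper. X and y are assumed to be mean-centred, and every release spends the same (ε, δ), so composition across the four functions and across components is not accounted for.

```python
import math

import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale:
    sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon, stated for 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng):
    """Return value + N(0, sigma^2 I), with sigma calibrated to the L2 sensitivity."""
    value = np.asarray(value, dtype=float)
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return value + rng.normal(0.0, sigma, size=value.shape)


def noisy_pls1(X, y, n_components, sens, epsilon, delta, seed=0):
    """NIPALS-style PLS1 with Gaussian noise on weights, scores, X-loadings
    and y-loadings. X and y are assumed mean-centred. `sens` holds
    HYPOTHETICAL L2 sensitivities, not the bounds derived for edPLS."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # working copies, deflated below
    y = np.array(y, dtype=float)
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        # Weights: unit vector proportional to X^T y, then noised.
        w = X.T @ y
        w = gaussian_mechanism(w / np.linalg.norm(w), sens["w"], epsilon, delta, rng)
        w = w / np.linalg.norm(w)  # post-processing: no extra privacy cost
        # Scores.
        t = gaussian_mechanism(X @ w, sens["t"], epsilon, delta, rng)
        tt = t @ t
        # X-loadings and the (scalar) y-loading.
        p = gaussian_mechanism(X.T @ t / tt, sens["p"], epsilon, delta, rng)
        q_a = float(gaussian_mechanism([y @ t / tt], sens["q"], epsilon, delta, rng)[0])
        # Deflate X and y with the released (noisy) components.
        X = X - np.outer(t, p)
        y = y - q_a * t
        W[:, a], P[:, a], q[a] = w, p, q_a
    # Regression vector b = W (P^T W)^{-1} q.
    return W @ np.linalg.solve(P.T @ W, q)
```

With negligible sensitivities and as many components as X has columns, the sketch reduces to ordinary PLS1 and recovers the least-squares coefficients; larger sensitivities or a smaller ε increase the injected noise and degrade prediction error, which illustrates the kind of privacy-utility trade-off the abstract refers to.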
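Source journal: Chemometrics and Intelligent Laboratory Systems
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months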
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.).
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for manuscripts.