Data‐driven sparse partial least squares

Hadrien Lorenzo, O. Cloarec, R. Thiébaut, J. Saracco
Statistical Analysis and Data Mining: The ASA Data Science Journal. Published 2021-10-18. DOI: 10.1002/sam.11558 (https://doi.org/10.1002/sam.11558)
Citations: 2

Abstract

In supervised high-dimensional settings with a large number of variables and a small number of individuals, variable selection allows simpler interpretation and more reliable predictions. That subspace selection is often managed with supervised tools when the real question is motivated by variable prediction. We propose a partial least squares (PLS) based method, called data-driven sparse PLS (ddsPLS), that allows variable selection in both the covariate and the response parts using a single hyperparameter per component. The subspace estimation is also performed by tuning a number of underlying parameters. The ddsPLS method is compared with existing methods, namely classical PLS and two well-established sparse PLS methods, through numerical simulations. The observed results are promising in terms of both variable selection and prediction performance. The methodology is based on new prediction quality descriptors associated with the classical R² and Q², and uses bootstrap sampling to tune parameters and select an optimal regression model.
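
To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' ddsPLS algorithm. It assumes one plausible reading: a single threshold `lam` soft-thresholds the empirical cross-covariance between covariates and responses, which zeroes out variables on both sides; one component is then extracted by SVD, and an in-bag R² and an out-of-bag Q² are averaged over bootstrap samples. All function names (`soft_threshold`, `fit_one_component`, `bootstrap_r2_q2`) and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(m, lam):
    """Shrink every entry of m toward zero by lam; entries with |m| <= lam become 0."""
    return np.sign(m) * np.maximum(np.abs(m) - lam, 0.0)

def fit_one_component(x, y, lam):
    """Fit one sparse PLS-like component (illustrative assumption, not ddsPLS itself).

    Returns a (p, q) coefficient matrix and the training means of x and y.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    cov = soft_threshold(xc.T @ yc / (len(x) - 1), lam)
    b = np.zeros((x.shape[1], y.shape[1]))
    if not cov.any():                        # everything was thresholded away
        return b, mx, my
    u, _, _ = np.linalg.svd(cov, full_matrices=False)
    w = u[:, 0] * cov.any(axis=1)            # x weights, zero on unselected covariates
    t = xc @ w                               # component scores
    if t @ t == 0.0:
        return b, mx, my
    c = (t @ yc) / (t @ t) * cov.any(axis=0) # regress selected responses on the scores
    return np.outer(w, c), mx, my

def _score(x, y, b, mx, my):
    """1 - residual sum of squares / total sum of squares around the training mean."""
    pred = (x - mx) @ b + my
    return 1.0 - ((y - pred) ** 2).sum() / ((y - my) ** 2).sum()

def bootstrap_r2_q2(x, y, lam, n_boot=50, seed=0):
    """Average in-bag R2 and out-of-bag Q2 over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(x)
    r2, q2 = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # in-bag rows, drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # rows never drawn
        if oob.size == 0:
            continue
        b, mx, my = fit_one_component(x[idx], y[idx], lam)
        r2.append(_score(x[idx], y[idx], b, mx, my))
        q2.append(_score(x[oob], y[oob], b, mx, my))
    return float(np.mean(r2)), float(np.mean(q2))

# Simulated data: only the first 3 of 30 covariates drive the first of 3 responses.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 30))
y = rng.standard_normal((200, 3))
y[:, 0] = x[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(200)

# Pick the threshold with the best out-of-bag Q2 from a small grid.
grid = [0.1, 0.3, 0.5, 0.8]
best_lam = max(grid, key=lambda lam: bootstrap_r2_q2(x, y, lam)[1])
```

In this sketch the threshold is chosen by maximizing Q² alone; the paper's actual selection criteria, built on its new R²- and Q²-related descriptors, are not reproduced here.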