Data‐driven sparse partial least squares

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2021-10-18. DOI: 10.1002/sam.11558

Hadrien Lorenzo, O. Cloarec, R. Thiébaut, J. Saracco

Citations: 2

Abstract

In supervised high-dimensional settings, with a large number of variables and a small number of individuals, variable selection allows simpler interpretation and more reliable predictions. This subspace selection is often managed with supervised tools when the real question is motivated by variable prediction. We propose a partial least squares (PLS) based method, called data‐driven sparse PLS (ddsPLS), which allows variable selection in both the covariate and the response parts using a single hyperparameter per component. The subspace estimation is also performed by tuning a number of underlying parameters. The ddsPLS method is compared, through numerical simulations, with existing methods: classical PLS and two well-established sparse PLS methods. The observed results are promising in terms of both variable selection and prediction performance. The methodology is based on new prediction quality descriptors associated with the classical R² and Q², and uses bootstrap sampling to tune parameters and select an optimal regression model.
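As a rough illustration of the idea in the abstract (one sparsity hyperparameter per component, judged by bootstrap estimates of R² and Q²), here is a minimal Python sketch. It is not the authors' implementation. It assumes, which the abstract does not state, that sparsity comes from soft-thresholding the empirical cross-covariance matrix between responses and covariates, and that Q² is measured on the individuals left out of each bootstrap sample. The function names and the parameter lam are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries with |a| <= lam become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def fit_one_component(X, Y, lam):
    """Fit one sparse component; return (mu_x, mu_y, B) such that
    Y_hat = mu_y + (X - mu_x) @ B. B is all zeros if nothing survives lam."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    # Assumed sparsity mechanism: soft-threshold the q x p cross-covariance.
    C = soft_threshold(Yc.T @ Xc / (len(X) - 1), lam)
    B = np.zeros((X.shape[1], Y.shape[1]))
    if np.any(C):
        # Leading right singular vector: weights on X, zero for dropped covariates.
        u = np.linalg.svd(C, full_matrices=False)[2][0]
        t = Xc @ u                               # component scores
        keep_y = np.any(C != 0, axis=1)          # responses that were selected
        slopes = np.where(keep_y, (t @ Yc) / (t @ t), 0.0)
        B = np.outer(u, slopes)
    return mu_x, mu_y, B


def explained(X, Y, mu_x, mu_y, B):
    """1 - residual sum of squares / total sum of squares around the training mean."""
    resid = Y - (mu_y + (X - mu_x) @ B)
    return 1.0 - (resid ** 2).sum() / ((Y - mu_y) ** 2).sum()


def bootstrap_r2_q2(X, Y, lam, n_boot=50, seed=0):
    """Average in-sample R2 and out-of-bag Q2 over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    r2, q2 = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)         # sample individuals with replacement
        oob = np.setdiff1d(np.arange(n), idx)    # individuals not drawn
        if oob.size == 0:
            continue
        mu_x, mu_y, B = fit_one_component(X[idx], Y[idx], lam)
        r2.append(explained(X[idx], Y[idx], mu_x, mu_y, B))
        q2.append(explained(X[oob], Y[oob], mu_x, mu_y, B))
    return float(np.mean(r2)), float(np.mean(q2))
```

Under these assumptions, one would call bootstrap_r2_q2 over a grid of lam values and prefer a value whose Q² is high and close to its R². The selection rule actually used by ddsPLS is defined in the paper and may differ.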

