A PRESS statistic for two-block partial least squares regression

2010 UK Workshop on Computational Intelligence (UKCI); Pub Date: 2010-11-09; DOI: 10.1109/UKCI.2010.5625583

B. McWilliams, G. Montana


Citations: 8

Abstract

Predictive modelling of multivariate data where both the covariates and responses are high-dimensional is becoming an increasingly popular task in many data mining applications. Partial Least Squares (PLS) regression often turns out to be a useful model in these situations since it performs dimensionality reduction by assuming the existence of a small number of latent factors that may explain the linear dependence between input and output. In practice, the number of latent factors to be retained, which controls the complexity of the model and its predictive ability, has to be carefully selected. Typically this is done by cross-validating a performance measure, such as the predictive error. Although cross-validation works well in many practical settings, it can be computationally expensive. Various extensions to PLS have also been proposed for regularising the PLS solution and performing simultaneous dimensionality reduction and variable selection, but these come at the expense of additional complexity parameters that also need to be tuned by cross-validation. In this paper we derive a computationally efficient alternative to leave-one-out cross-validation (LOOCV): a predicted sum of squares (PRESS) statistic for two-block PLS. We show that the PRESS is nearly identical to LOOCV but has the computational expense of only a single PLS model fit. Examples of using the PRESS to select the number of latent factors and regularisation parameters are provided.
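As a point of reference for the idea of replacing LOOCV with a single model fit, the sketch below shows the classical PRESS statistic for ordinary least squares (OLS), not the two-block PLS statistic derived in the paper. For OLS, the leave-one-out residual equals the ordinary residual divided by one minus the leverage, so PRESS can be computed from one fit. The function names `press_ols` and `loocv_ols` and the synthetic data are assumptions made for illustration. The abstract reports that the PLS version is nearly identical to LOOCV, whereas the OLS identity shown here is exact.

```python
import numpy as np


def press_ols(X, y):
    """PRESS for OLS from a single fit (illustrative; not the paper's PLS statistic)."""
    # Reduced QR: for full-column-rank X the hat matrix is H = Q Q^T.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)      # diagonal of the hat matrix
    residuals = y - Q @ (Q.T @ y)          # ordinary in-sample residuals
    # The leave-one-out residual is e_i / (1 - h_ii).
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))


def loocv_ols(X, y):
    """Explicit leave-one-out sum of squared errors, refitting n times."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total


# Synthetic data, assumed purely for the demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)

press_value = press_ols(X, y)
loocv_value = loocv_ols(X, y)
print(np.isclose(press_value, loocv_value))
```

The single-fit version avoids the n refits of the explicit loop, which is the kind of saving the abstract describes for selecting the number of latent factors and regularisation parameters.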
