Variable Selection and Redundancy in Multivariate Regression Models

F. Westad, Federico Marini
{"title":"Variable Selection and Redundancy in Multivariate Regression Models","authors":"F. Westad, Federico Marini","doi":"10.3389/frans.2022.897605","DOIUrl":null,"url":null,"abstract":"Variable selection is a topic of interest in many scientific communities. Within chemometrics, where the number of variables for multi-channel instruments like NIR spectroscopy and metabolomics in many situations is larger than the number of samples, the strategy has been to use latent variable regression methods to overcome the challenges with multiple linear regression. Thereby, there is no need to remove variables as such, as the low-rank models handle collinearity and redundancy. In most studies on variable selection, the main objective was to compare the prediction performance (RMSE or accuracy in classification) between various methods. Nevertheless, different methods with the same objective will, in most cases, give results that are not significantly different. In this study, we present three other main objectives: i) to eliminate variables that are not relevant; ii) to return a small subset of variables that has the same or better prediction performance as a model with all original variables; and iii) to investigate the consistency of these small subsets.","PeriodicalId":73063,"journal":{"name":"Frontiers in analytical science","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in analytical science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frans.2022.897605","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Variable selection is a topic of interest in many scientific communities. Within chemometrics, where the number of variables for multi-channel instruments like NIR spectroscopy and metabolomics in many situations is larger than the number of samples, the strategy has been to use latent variable regression methods to overcome the challenges with multiple linear regression. Thereby, there is no need to remove variables as such, as the low-rank models handle collinearity and redundancy. In most studies on variable selection, the main objective was to compare the prediction performance (RMSE or accuracy in classification) between various methods. Nevertheless, different methods with the same objective will, in most cases, give results that are not significantly different. In this study, we present three other main objectives: i) to eliminate variables that are not relevant; ii) to return a small subset of variables that has the same or better prediction performance as a model with all original variables; and iii) to investigate the consistency of these small subsets.
多元回归模型中的变量选择与冗余
变量选择是许多科学界感兴趣的话题。在化学计学中,在许多情况下,NIR光谱和代谢组学等多通道仪器的变量数量大于样本数量,策略是使用潜在变量回归方法来克服多元线性回归的挑战。因此,不需要这样去除变量,因为低秩模型处理共线和冗余。在大多数关于变量选择的研究中,主要目的是比较各种方法之间的预测性能(RMSE或分类准确性)。然而,在大多数情况下,具有相同目标的不同方法会产生没有显著差异的结果。在这项研究中,我们提出了其他三个主要目标:i)消除不相关的变量;ii)返回具有与具有所有原始变量的模型相同或更好的预测性能的变量的小子集;以及iii)研究这些小子集的一致性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信