Performance of PLS regression coefficients in selecting variables for each response of a multivariate PLS for omics-type data.

Q2 Biochemistry, Genetics and Molecular Biology
Giuseppe Palermo, Paolo Piraino, Hans-Dieter Zucht
Advances and Applications in Bioinformatics and Chemistry, vol. 2, pp. 57-70 (2009). Epub 2009-05-13. DOI: 10.2147/aabc.s3619 (https://doi.org/10.2147/aabc.s3619)
Citations: 106

Abstract

Multivariate partial least squares (PLS) regression models complex biological events by considering several factors at the same time. Because it is not hampered by collinearity in the data, it is a valuable method for modeling high-dimensional biological data, such as those derived from genomics, proteomics, and peptidomics. When there are multiple responses, it is of particular interest to "dissect" the model appropriately, so as to reveal the importance of single attributes for individual responses (for example, for variable selection). In this paper, the performance of multivariate PLS regression coefficients in selecting relevant predictors for different responses in omics-type data was investigated by means of receiver operating characteristic (ROC) analysis. For this purpose, simulated data mimicking the covariance structures of microarray and liquid chromatography-mass spectrometry data were used to generate matrices of predictors and responses, with the relevant predictors set a priori. The influences of noise, of the data source (with its different covariance structure), and of the size of the set of relevant predictors were investigated. The results demonstrate the applicability of PLS regression coefficients for selecting variables for each response of a multivariate PLS in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection (VIP) scores, principal component regression, and least absolute shrinkage and selection operator (LASSO) regression, are also provided.
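
The evaluation idea in the abstract (rank predictors by their regression coefficient for each response, then score that ranking against the a priori relevant set with ROC analysis) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the coefficient matrix `B` and the `truth` labels are hypothetical toy values, not results from the paper, and ranking by absolute coefficient value is one plausible choice rather than the authors' confirmed procedure. The AUC is computed with the standard rank-sum identity using only the Python standard library.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen relevant predictor scores
    higher than a randomly chosen irrelevant one (ties count as 0.5).
    """
    pos = [s for s, r in zip(scores, relevant) if r]
    neg = [s for s, r in zip(scores, relevant) if not r]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant predictor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical coefficient matrix (rows = predictors, columns = responses),
# standing in for the coefficients of a fitted multivariate PLS model.
B = [
    [0.90, 0.05],
    [-0.70, 0.10],
    [0.02, 0.80],
    [0.10, -0.60],
    [0.03, 0.02],
    [-0.04, 0.70],
]
# Hypothetical ground truth: predictors set as relevant a priori, per response.
truth = [
    [True, False],
    [True, False],
    [False, True],
    [False, True],
    [False, False],
    [False, False],
]

auc_per_response = []
for j in range(2):
    scores = [abs(row[j]) for row in B]   # assumption: rank by |coefficient|
    labels = [row[j] for row in truth]
    auc_per_response.append(roc_auc(scores, labels))

print(auc_per_response)  # → [1.0, 0.875]
```

In this toy case the first response is ranked perfectly, while for the second response one irrelevant predictor (coefficient 0.70) outranks a relevant one (0.60), which lowers the AUC to 7/8. Computing one AUC per response column is what lets the selection performance be assessed separately for each response of the multivariate model.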


Source journal: Advances and Applications in Bioinformatics and Chemistry
Subject area: Biochemistry, Genetics and Molecular Biology (miscellaneous)
CiteScore: 6.50
Self-citation rate: 0.00%
Articles published: 7
Review time: 16 weeks