A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data

Pub Date: 2018-09-08, DOI: 10.1515/sagmb-2017-0077
Marie Perrot-Dockès, Céline Lévy-Leduc, Julien Chiquet, Laure Sansonnet, Margaux Brégère, Marie-Pierre Étienne, Stéphane Robin, Grégory Genta-Jouve
Citation count: 6

Abstract

Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. Applying statistical procedures that do not adjust the variable selection step to the dependence pattern may result in a loss of power and the selection of spurious variables. The goal of this paper is to propose a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence, which consists in assuming that the responses of a given individual can be modelled as a time series. We propose a novel Lasso-based approach within the framework of the multivariate linear model that takes the dependence structure into account by using covariance structures of different types of stationary processes for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves the variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).
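The abstract describes the idea only at a high level: the responses follow a multivariate linear model Y = XB + E, each row of E is treated as a stationary time series, and an estimate of the error covariance is built into the Lasso criterion. One natural way to do that is to estimate the covariance from residuals, whiten the responses with its inverse square root, and run the Lasso on the vectorised problem. The Python sketch below illustrates that idea under loudly stated assumptions: an AR(1) error structure with a pooled lag-1 estimate of its parameter, a plain coordinate-descent Lasso with a fixed, hand-picked penalty `lam`, and synthetic data. All helper names (`ar1_correlation`, `inverse_sqrt`, `lasso_cd`, `whitened_lasso`) are hypothetical; this is not the MultiVarSel implementation or its API, and the paper's own choice of covariance models, penalty tuning and any further selection steps may differ.

```python
import numpy as np


def ar1_correlation(q, phi):
    """Correlation matrix of a stationary AR(1) process: R[s, t] = phi**|s - t|."""
    lags = np.abs(np.subtract.outer(np.arange(q), np.arange(q)))
    return phi ** lags


def inverse_sqrt(R):
    """Symmetric inverse square root R^{-1/2} computed from an eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.T


def lasso_cd(D, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1 / (2N)) * ||y - D b||^2 + lam * ||b||_1."""
    N, m = D.shape
    beta = np.zeros(m)
    col_sq = (D ** 2).sum(axis=0) / N
    resid = y.astype(float).copy()  # resid = y - D @ beta, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = D[:, j] @ resid / N + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            if new != old:
                resid -= D[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def whitened_lasso(Y, X, lam):
    """Hypothetical sketch: AR(1)-whitened Lasso for Y = X B + E."""
    n, q = Y.shape
    p = X.shape[1]
    # 1) Ordinary least squares fit of every response column, then residuals.
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    E_hat = Y - X @ B_ols
    # 2) Pooled lag-1 estimate of the AR(1) parameter along each row (assumption).
    phi_hat = np.sum(E_hat[:, 1:] * E_hat[:, :-1]) / np.sum(E_hat[:, :-1] ** 2)
    phi_hat = float(np.clip(phi_hat, -0.99, 0.99))
    # 3) Whiten: Y A = X B A + E A with A = R^{-1/2}.
    A = inverse_sqrt(ar1_correlation(q, phi_hat))
    # 4) Vectorise by column stacking: vec(Y A) = (A^T kron X) vec(B) + vec(E A).
    D = np.kron(A.T, X)
    y = (Y @ A).ravel(order="F")
    beta = lasso_cd(D, y, lam)
    return beta.reshape((p, q), order="F"), phi_hat


# Synthetic illustration: 3 groups of 10 individuals, 20 time-ordered responses.
rng = np.random.default_rng(0)
n_per_group, p, q, phi = 10, 3, 20, 0.7
X = np.kron(np.eye(p), np.ones((n_per_group, 1)))  # one-hot group design, 30 x 3
B_true = np.zeros((p, q))
B_true[0, 5], B_true[1, 10], B_true[2, 15] = 3.0, -3.0, 3.0
L = np.linalg.cholesky(ar1_correlation(q, phi))
E = rng.standard_normal((X.shape[0], q)) @ L.T  # each row ~ N(0, R)
Y = X @ B_true + E

B_hat, phi_hat = whitened_lasso(Y, X, lam=0.05)
print(round(phi_hat, 2), np.argwhere(B_hat != 0).tolist())
```

Because A is symmetric, the Gram matrix of the vectorised design is the Kronecker product of R^{-1} and X^T X, so the penalty acts on a problem whose errors are approximately uncorrelated across the time-ordered responses. Swapping AR(1) for another stationary model would only change how R is built and estimated in steps 2 and 3.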