Fast computation of genome-metagenome interaction effects.

IF 1.5, Zone 4 (Biology), Q4 Biochemical Research Methods
Algorithms for Molecular Biology. Pub Date: 2020-07-01. eCollection Date: 2020-01-01. DOI: 10.1186/s13015-020-00173-2
Florent Guinot, Marie Szafranski, Julien Chiquet, Anouk Zancarini, Christine Le Signor, Christophe Mougel, Christophe Ambroise
Citations: 2

Abstract

Motivation: Association studies have been widely used to search for associations between common genetic variants and a given phenotype. However, it is now generally accepted that genes and environment must be examined jointly when estimating phenotypic variance. In this work we consider two types of biological markers: genotypic markers, which characterize an observation in terms of inherited genetic information, and metagenomic markers, which are related to the environment. Both types of markers are available in the millions and can be used to characterize any observation uniquely.

Objective: Our focus is on detecting interactions between groups of genetic and metagenomic markers in order to gain a better understanding of the complex relationship between environment and genome in the expression of a given phenotype.

Contributions: We propose a novel approach for efficiently detecting interactions between complementary datasets in a high-dimensional setting at a reduced computational cost. The method, named SICOMORE, reduces the dimension of the search space by selecting a subset of supervariables in the two complementary datasets. These supervariables are given by a weighted group structure defined on sets of variables at different scales. A Lasso selection is then applied to each type of supervariable to obtain a subset of potential interactions, which are then explored via linear model testing.
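
The three steps described above (compress markers into supervariables, select supervariables with a Lasso, test the remaining pairs in a linear model) can be illustrated with a small sketch. This is not the authors' R package and makes several simplifying assumptions: the groups are fixed, contiguous blocks of columns rather than a weighted multi-scale group structure, each supervariable is an unweighted group mean, the Lasso is a hand-written coordinate descent, candidate pairs are ranked by the t-statistic of the product term without any multiple-testing correction, and the data are a Gaussian toy simulation. All function names (`supervariables`, `lasso_cd`, `interaction_t`, `screen_interactions`) are hypothetical.

```python
import numpy as np

def supervariables(X, groups):
    """Average the columns of X within each group (one supervariable per group)."""
    return np.column_stack([X[:, idx].mean(axis=1) for idx in groups])

def lasso_cd(Z, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on standardized columns.
    Minimizes (1/2n) * ||y - Z b||^2 + lam * ||b||_1."""
    n, p = Z.shape
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    r = y - y.mean()                      # residual for b = 0
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = r + Z[:, j] * b[j]        # take coordinate j out of the fit
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            r = r - Z[:, j] * b[j]        # put the updated coordinate back
    return b

def interaction_t(g, m, y):
    """t-statistic of the product term in the OLS fit y ~ 1 + g + m + g*m."""
    D = np.column_stack([np.ones_like(g), g, m, g * m])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (len(y) - D.shape[1])
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return beta[3] / np.sqrt(cov[3, 3])

def screen_interactions(X, M, y, groups_x, groups_m, lam):
    """Compress, select with a Lasso on each data type, then test selected pairs."""
    G = supervariables(X, groups_x)
    S = supervariables(M, groups_m)
    keep_g = np.flatnonzero(lasso_cd(G, y, lam))
    keep_s = np.flatnonzero(lasso_cd(S, y, lam))
    results = [(int(i), int(j), float(interaction_t(G[:, i], S[:, j], y)))
               for i in keep_g for j in keep_s]
    return sorted(results, key=lambda t: -abs(t[2]))

# Toy data (assumption): 4 genomic groups of 3 markers, 4 metagenomic groups of 2.
rng = np.random.default_rng(0)
n = 400
groups_x = [np.arange(3 * k, 3 * k + 3) for k in range(4)]
groups_m = [np.arange(2 * k, 2 * k + 2) for k in range(4)]
Lx = rng.normal(size=(n, 4))              # latent signal per genomic group
Lm = rng.normal(size=(n, 4))              # latent signal per metagenomic group
X = np.repeat(Lx, 3, axis=1) + 0.3 * rng.normal(size=(n, 12))
M = np.repeat(Lm, 2, axis=1) + 0.3 * rng.normal(size=(n, 8))
# Phenotype with main effects and one true interaction: genomic group 0 x metagenomic group 1.
y = 2 * Lx[:, 0] + 2 * Lm[:, 1] + 2 * Lx[:, 0] * Lm[:, 1] + 0.5 * rng.normal(size=n)

ranked = screen_interactions(X, M, y, groups_x, groups_m, lam=0.5)
print(ranked[0])  # (genomic group, metagenomic group, t-statistic) of the top pair
```

The point of the compression and selection steps is cost: the sketch fits one small linear model per pair of selected supervariables instead of one per pair of raw markers. Because the Lasso here only sees main effects, a group that acts purely through an interaction would not be selected; the toy phenotype therefore includes main effects.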

Results: We compare SICOMORE with other approaches in simulations, with varying sample sizes, noise, and numbers of true interactions. SICOMORE exhibits convincing results in terms of recall, as well as competitive performance with respect to running time. The method is also used to detect interactions between genomic markers in Medicago truncatula and metagenomic markers in its rhizosphere bacterial community.

Software availability: An R package is available [4], along with its documentation and associated scripts, allowing the reader to reproduce the results presented in the paper.


Source journal
Algorithms for Molecular Biology (Biology: Biochemical Research Methods)
CiteScore: 2.40
Self-citation rate: 10.00%
Articles published: 16
Review time: >12 weeks
Journal description: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.