Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.

IF 3.7 | Zone 2, Biology | JCR Q2, Microbiology
mSphere Pub Date : 2025-05-27 Epub Date: 2025-04-29 DOI:10.1128/msphere.00857-24
Stephanie N Majernik, Larry Beaver, Patrick H Bradley
mSphere, article e0085724. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108083/pdf/
Citations: 0

Abstract

Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.

Individual genes from microbiomes can drive host-level phenotypes. To help identify such candidate genes, several recent tools estimate microbial gene copy numbers directly from metagenomes. These tools rely on alignments to pangenomes, which, in turn, are derived from the set of all individual genomes from one species. While large-scale metagenomic assembly efforts have made pangenome estimates more complete, mixed communities can also introduce contamination into assemblies, and it is unknown how robust pangenome-based metagenomic analyses are to these errors. To gain insight into this problem, we re-analyzed a case-control study of the gut microbiome in cirrhosis, focusing on commensal Clostridia previously implicated in this disease. We tested for differentially prevalent genes in the Lachnospiraceae and then investigated which were likely to be contaminants using sequence similarity searches. Out of 86 differentially prevalent genes, we found that 33 (38%) were probably contaminants originating in taxa such as Veillonella and Haemophilus, unrelated genera that were independently correlated with disease status. Our results demonstrate that even small amounts of contamination in metagenome assemblies, below typical quality thresholds, can threaten to overwhelm gene-level metagenomic analyses. However, we also show that such contaminants can be accurately identified using a method based on gene-to-species correlation. After removing these contaminants, we observe that several flagellar motility gene clusters in the Lachnospira eligens pangenome are associated with cirrhosis status. We have integrated our analyses into an analysis and visualization pipeline, PanSweep, that can automatically identify cases where pangenome contamination may bias the results of gene-resolved analyses.

IMPORTANCE

Metagenome-assembled genomes, or MAGs, can be constructed without pure cultures of microbes. Large-scale efforts to build MAGs have yielded more complete pangenomes (i.e., sets of all genes found in one species). Pangenomes allow us to measure strain variation in gene content, which can strongly affect phenotype. However, because MAGs come from mixed communities, they can contaminate pangenomes with unrelated DNA; how much this impacts downstream analyses has not been studied. Using a metagenomic study of gut microbes in cirrhosis as our test case, we investigate how contamination affects analyses of microbial gene content. Surprisingly, even small, typical amounts of MAG contamination (<5%) result in disproportionately high levels of false positive associations (38%). Fortunately, we show that most contaminants can be automatically flagged and provide a simple method for doing so. Furthermore, applying this method reveals a new association between cirrhosis and gut microbial motility.
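The abstract describes flagging contaminants with a method based on gene-to-species correlation but does not give its details. The following is only a minimal sketch of that general idea, not PanSweep's implementation: it assumes per-sample gene copy-number estimates and per-sample species abundances are already available, and it flags a gene when it tracks some other species more closely than the species whose pangenome it was assigned to. The function names, the `margin` parameter, the use of Pearson correlation, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_contaminants(gene_profiles, gene_to_species, species_abundance, margin=0.1):
    """Return {gene: other_species} for genes that correlate better with
    another species than with their assigned ("home") species.

    `margin` is an illustrative threshold (an assumption, not from the paper):
    the other species must beat the home species by at least this much.
    """
    flagged = {}
    for gene, profile in gene_profiles.items():
        home = gene_to_species[gene]
        r_home = pearson(profile, species_abundance[home])
        # Correlation of this gene with every species other than its home species.
        others = {
            species: pearson(profile, abundance)
            for species, abundance in species_abundance.items()
            if species != home
        }
        if not others:
            continue
        best_species, r_best = max(others.items(), key=lambda kv: kv[1])
        if r_best > r_home + margin:
            flagged[gene] = best_species
    return flagged


# Toy data for six samples (made-up numbers, for illustration only).
species_abundance = {
    "Lachnospira_eligens": [5.0, 4.0, 6.0, 1.0, 0.5, 1.5],
    "Veillonella_sp": [0.2, 0.1, 0.3, 3.0, 4.0, 2.5],
}
gene_profiles = {
    # Rises and falls with its home species.
    "flagellar_gene": [1.0, 0.8, 1.2, 0.2, 0.1, 0.3],
    # Assigned to the L. eligens pangenome but follows Veillonella instead.
    "suspect_gene": [0.0, 0.0, 0.1, 1.0, 1.3, 0.8],
}
gene_to_species = {
    "flagellar_gene": "Lachnospira_eligens",
    "suspect_gene": "Lachnospira_eligens",
}

flagged = flag_contaminants(gene_profiles, gene_to_species, species_abundance)
print(flagged)
```

In this toy case only `suspect_gene` is flagged, because its profile follows the Veillonella abundances while the flagellar gene follows its home species. A real analysis would likely need a rank-based correlation, handling of zeros, and a calibrated threshold; those choices are left open here.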

Source journal: mSphere
Subject category: Immunology and Microbiology-Microbiology
CiteScore: 8.50
Self-citation rate: 2.10%
Publication volume: 192
Review time: 11 weeks
Journal description: mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition for rigorous peer review.