Pitfalls of genotyping microbial communities with rapidly growing genome collections.

IF 9.0; CAS zone 1 (Biology); JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY
Cell Systems 14(2):160-176.e3. Pub date: 2023-02-15; Epub date: 2023-01-18. DOI: 10.1016/j.cels.2022.12.007
Chunyu Zhao, Zhou Jason Shi, Katherine S Pollard
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9957970/pdf/
Citations: 0

Abstract


Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments and limiting incorrect alignments, many of which map reads to the wrong species. We then evaluate several actionable mitigation strategies and review emerging methods that show promise for further improving metagenotyping as genome collections grow rapidly. Our results have implications beyond metagenotyping, extending to the many tools in microbial genomics that depend on accurate read mapping.
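The trade-off between retaining correct alignments and limiting incorrect ones can be illustrated with a toy filter on the SAM mapping-quality (MAPQ) field, which aligners typically set low for reads that have several equally good placements (multi-mapping reads). This is a minimal sketch under stated assumptions, not the authors' evaluation or mitigation strategy: the `Alignment` record, the `filter_tradeoff` helper, the five reads, and the thresholds are all hypothetical, and `true_species` is only knowable in simulated data.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Best alignment of one read (hypothetical record, not a real tool's output)."""
    read_id: str
    mapped_species: str  # species of the reference the read aligned to
    true_species: str    # known only when reads are simulated
    mapq: int            # SAM MAPQ; low values usually flag multi-mapping reads


def filter_tradeoff(alignments: Sequence[Alignment], min_mapq: int) -> tuple[int, int, int, int]:
    """Keep alignments with mapq >= min_mapq and count what the filter keeps.

    Returns (correct_kept, incorrect_kept, correct_total, incorrect_total).
    """
    correct_total = sum(a.mapped_species == a.true_species for a in alignments)
    incorrect_total = len(alignments) - correct_total
    kept = [a for a in alignments if a.mapq >= min_mapq]
    correct_kept = sum(a.mapped_species == a.true_species for a in kept)
    incorrect_kept = len(kept) - correct_kept
    return correct_kept, incorrect_kept, correct_total, incorrect_total


# Hypothetical toy reads from two closely related species, A and B.
TOY_READS = [
    Alignment("r1", "A", "A", 60),
    Alignment("r2", "A", "A", 0),   # correct, but also matches species B equally well
    Alignment("r3", "B", "A", 1),   # wrong species, ambiguous placement
    Alignment("r4", "B", "B", 42),
    Alignment("r5", "A", "B", 12),  # wrong species, moderately confident
]

for threshold in (0, 10, 30):
    ck, ik, ct, it = filter_tradeoff(TOY_READS, threshold)
    print(f"MAPQ>={threshold}: kept {ck}/{ct} correct, {ik}/{it} incorrect")
```

On these toy reads, raising the threshold eventually removes both wrong-species alignments, but it also discards the correct alignment of `r2`, whose sequence is shared with a close relative. That is the tension the abstract describes; real thresholds and error rates depend on the aligner and database.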

Source journal
Cell Systems
Subject area: Medicine - Pathology and Forensic Medicine
CiteScore: 16.50
Self-citation rate: 1.10%
Articles published: 84
Review time: 42 days
Journal description: Cell Systems was founded in 2015 as a platform within Cell Press to showcase innovative research in systems biology. Our primary goal is to investigate complex biological phenomena that cannot be explained by simple mathematical principles. The physical sciences have long tackled such challenges successfully, and we have found that our most impactful publications often employ quantitative, inference-based methodologies borrowed from physics, engineering, mathematics, and computer science. We are committed to providing a home for elegant research that addresses fundamental questions in systems biology.