Authors: Chunyu Zhao, Zhou Jason Shi, Katherine S Pollard
Journal: Cell Systems, volume 14, issue 2, pages 160-176.e3
Published: 2023-02-15 (Epub 2023-01-18)
Publication type: Journal Article
DOI: https://doi.org/10.1016/j.cels.2022.12.007
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9957970/pdf/
Impact factor: 9.0
JCR: Q1 (Biochemistry & Molecular Biology)
Pitfalls of genotyping microbial communities with rapidly growing genome collections.
Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments and limiting incorrect alignments, many of which map reads to the wrong species. We then evaluate several actionable mitigation strategies and review emerging methods that show promise for further improving metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping for the many tools in microbial genomics that depend upon accurate read mapping.
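The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the `Alignment` record, the `tradeoff` helper, and the toy reads are all hypothetical, and a mapping-quality (MAPQ) cutoff is used here only as one common kind of alignment filter. The example assumes simulated reads, where the true species of origin is known, and counts how many correct alignments survive a cutoff against how many wrong-species alignments remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one aligned read (toy data, not from the paper)."""
    read_id: str
    true_species: str    # known only because the reads are simulated
    mapped_species: str  # species of the reference the aligner chose
    mapq: int            # mapping quality; multi-mapping reads tend to score low


def tradeoff(alignments, min_mapq):
    """Return (fraction of correct alignments retained, incorrect alignments kept)."""
    kept = [a for a in alignments if a.mapq >= min_mapq]
    correct_total = sum(a.true_species == a.mapped_species for a in alignments)
    correct_kept = sum(a.true_species == a.mapped_species for a in kept)
    incorrect_kept = len(kept) - correct_kept
    retained = correct_kept / correct_total if correct_total else 0.0
    return retained, incorrect_kept


# Toy alignments: two closely related species, "A" and "B" (assumed values).
TOY = [
    Alignment("r1", "A", "A", 42),
    Alignment("r2", "A", "A", 3),   # correct, but low MAPQ because it multi-maps
    Alignment("r3", "A", "B", 1),   # mapped to the wrong species
    Alignment("r4", "B", "B", 30),
    Alignment("r5", "B", "A", 12),  # mapped to the wrong species
    Alignment("r6", "B", "B", 0),   # correct, but ambiguous between references
]

for cutoff in (0, 2, 20):
    retained, incorrect = tradeoff(TOY, cutoff)
    print(f"MAPQ >= {cutoff:2d}: correct retained = {retained:.2f}, "
          f"incorrect kept = {incorrect}")
```

In this toy set, raising the cutoff removes the wrong-species alignments but also discards correct alignments that happened to receive low mapping quality, which is the tension the abstract points to.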
Journal: Cell Systems
Subject area: Medicine - Pathology and Forensic Medicine
CiteScore: 16.50
Self-citation rate: 1.10%
Articles published: 84
Review time: 42 days
About the journal:
In 2015, Cell Systems was founded as a platform within Cell Press to showcase innovative research in systems biology. Our primary goal is to investigate complex biological phenomena that cannot be simply explained by basic mathematical principles. While the physical sciences have long successfully tackled such challenges, we have discovered that our most impactful publications often employ quantitative, inference-based methodologies borrowed from the fields of physics, engineering, mathematics, and computer science. We are committed to providing a home for elegant research that addresses fundamental questions in systems biology.