Putative genome contamination has minimal impact on the GTDB taxonomy.

Impact factor 4.0 · Zone 2 (Biology) · JCR Q1 (Genetics & Heredity)
Aaron J Mussig, Pierre-Alain Chaumeil, Maria Chuvochina, Christian Rinke, Donovan H Parks, Philip Hugenholtz
Microbial Genomics, vol. 10, issue 5. Published 2024-05-01. Journal Article.
DOI: https://doi.org/10.1099/mgen.0.001256
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11261887/pdf/

Abstract

The Genome Taxonomy Database (GTDB) provides a species to domain classification of publicly available genomes based on average nucleotide identity (ANI) (for species) and a concatenated gene phylogeny normalized by evolutionary rates (for genus to phylum), which has been widely adopted by the scientific community. Here, we use the Genome UNClutterer (GUNC) software to identify putatively contaminated genomes in GTDB release 07-RS207. We found that GUNC reported 35,723 genomes as putatively contaminated, comprising 11.25 % of the 317,542 genomes in GTDB release 07-RS207. To assess the impact of this high level of inferred contamination on the delineation of taxa, we created 'clean' versions of the 34,846 putatively contaminated bacterial genomes by removing the most contaminated half. For each clean half, we re-calculated the ANI and concatenated gene phylogeny and found that only 77 (0.22 %) of the genomes were not consistent with their original classification. We conclude that the delineation of taxa in GTDB is robust to the putative contamination detected by GUNC.
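The percentages quoted in the abstract can be re-derived from the counts it reports. The short Python check below is only an arithmetic sketch: the variable names and the `percent` helper are illustrative, and the difference between the two flagged counts is read here as the flagged genomes that fall outside the bacterial set, which is an inference from the numbers rather than a statement in the abstract.

```python
# Counts taken directly from the abstract (GTDB release 07-RS207).
TOTAL_GENOMES = 317_542        # all genomes in the release
FLAGGED_ALL = 35_723           # genomes GUNC reported as putatively contaminated
FLAGGED_BACTERIAL = 34_846     # flagged bacterial genomes that were "cleaned"
INCONSISTENT = 77              # cleaned genomes whose classification changed


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


flagged_pct = percent(FLAGGED_ALL, TOTAL_GENOMES)
inconsistent_pct = percent(INCONSISTENT, FLAGGED_BACTERIAL)

# Assumption: flagged genomes not in the bacterial set (inferred by subtraction).
flagged_non_bacterial = FLAGGED_ALL - FLAGGED_BACTERIAL

print(f"Flagged share of release: {flagged_pct} %")
print(f"Inconsistent after cleaning: {inconsistent_pct} %")
print(f"Flagged genomes outside the bacterial set: {flagged_non_bacterial}")
```

Both rounded values agree with the figures stated in the abstract (11.25 % and 0.22 %).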

Source journal: Microbial Genomics (Medicine-Epidemiology)
CiteScore: 6.60
Self-citation rate: 2.60%
Articles published: 153
Review time: 12 weeks
About the journal: Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.