MIMt: a curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification.

IF 6.2 | Zone 2 (Environmental Science and Ecology) | JCR Q1 (GENETICS & HEREDITY)
M Pilar Cabezas, Nuno A Fonseca, Antonio Muñoz-Mérida
Journal: Environmental Microbiome, 19(1): 88
Published: 2024-11-09
Publication type: Journal Article
DOI: https://doi.org/10.1186/s40793-024-00634-w
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550520/pdf/
Citations: 0

Abstract

Motivation: Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major issues in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which either contain a lot of redundancy or a high percentage of sequences with missing taxonomic information. This may lead to erroneous identifications and, thus, to inaccurate conclusions regarding the ecological role and importance of those microorganisms in the ecosystem.

Results: This study presents MIMt, a new 16S rRNA database for the identification of archaea and bacteria, comprising 47 001 sequences, all precisely identified at the species level. In addition, a MIMt2.0 version containing 32 086 sequences was created using only curated sequences from RefSeq Targeted Loci. MIMt is intended to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy. This enables more precise assignments at lower taxonomic ranks and thus significantly improves species-level identification.
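
The two database limitations named in the Motivation, redundancy and missing taxonomic information, can be made concrete with a small illustrative sketch. This is not the authors' evaluation pipeline. It assumes a hypothetical FASTA header layout (`>id rank1;rank2;...;rank7`, with species as the seventh rank), defines redundancy simply as exact-duplicate sequences, and uses invented function names and example records.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarize(text, n_ranks=7):
    """Report exact-duplicate redundancy and species-level annotation rate.

    Assumption (hypothetical layout): the header is '<id> <lineage>', where
    the lineage is ';'-separated and the n_ranks-th field is the species.
    """
    records = list(parse_fasta(text))
    total = len(records)
    if total == 0:
        return {"total": 0, "redundant_fraction": 0.0, "species_fraction": 0.0}

    # Redundancy here means sequences that are exact copies of another entry.
    unique_sequences = len(Counter(seq for _, seq in records))
    redundant = total - unique_sequences

    with_species = 0
    for header, _ in records:
        parts = header.split(None, 1)
        lineage = parts[1] if len(parts) > 1 else ""
        ranks = [rank.strip() for rank in lineage.split(";")]
        if len(ranks) >= n_ranks and ranks[n_ranks - 1]:
            with_species += 1

    return {
        "total": total,
        "redundant_fraction": redundant / total,
        "species_fraction": with_species / total,
    }


# Invented toy records: two identical sequences, one entry without a species
# name, and one entry with a truncated lineage.
EXAMPLE = """>s1 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
ACGTACGT
>s2 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
ACGTACGT
>s3 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;
ACGTTTTT
>s4 Archaea;Euryarchaeota
GGGGCCCC
"""

if __name__ == "__main__":
    print(summarize(EXAMPLE))
```

Real reference databases use differing header conventions, and redundancy is often assessed with similarity clustering rather than exact matching, so both the parsing and the redundancy definition would need adapting before applying this to any of the databases named above.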

Source journal: Environmental Microbiome
Subject area: Immunology and Microbiology, Microbiology
CiteScore: 7.40
Self-citation rate: 2.50%
Articles published: 55
Review time: 13 weeks
About the journal: Microorganisms, omnipresent across Earth's diverse environments, play a crucial role in adapting to external changes, influencing Earth's systems and cycles, and contributing significantly to agricultural practices. Through applied microbiology, they offer solutions to various everyday needs. Environmental Microbiome recognizes the universal presence and significance of microorganisms, inviting submissions that explore the diverse facets of environmental and applied microbiological research.