MetaKSSD: boosting the scalability of the reference taxonomic marker database and the performance of metagenomic profiling using sketch operations.

Huiguang Yi, Xiaoxin Lu, Qing Chang
Nature Computational Science, published 29 August 2025. DOI: 10.1038/s43588-025-00855-0 (https://doi.org/10.1038/s43588-025-00855-0)

Abstract

The performance of metagenomic profiling is constrained by the diversity of taxa present in the reference taxonomic marker database (MarkerDB) used. However, continually updating MarkerDB to include newly determined taxa using existing approaches faces increasing difficulties and will soon become impractical. Here we introduce MetaKSSD, which redefines MarkerDB construction and metagenomic profiling using sketch operations, enhancing MarkerDB scalability and profiling performance. MetaKSSD encompasses 85,202 species in its MarkerDB using just 0.17 GB of storage and profiles 10 GB of data within seconds. Leveraging its comprehensive MarkerDB, MetaKSSD substantially improves profiling results. In a microbiome-phenotype association study, MetaKSSD identified more effective associations than MetaPhlAn4. We profiled 382,016 metagenomic runs using MetaKSSD, conducted extensive sample clustering analyses and suggested potential yet-to-be-discovered niches. MetaKSSD offers functionality for instantaneous searching of similar profiles. It enables the swift transmission of metagenome sketches over the network and real-time online metagenomic analysis, facilitating use by non-expert users.
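The abstract's central idea, that both MarkerDB construction and profiling can be expressed as operations on k-mer sketches, can be illustrated with a toy Python sketch. This is not MetaKSSD's implementation: the hash-modulo subsampling below is a generic stand-in for KSSD's own sketching scheme, and every name (`sketch`, `build_markers`, `profile`, `K`, `KEEP_ONE_IN`) and parameter value is hypothetical. Here a species' markers are the sketched k-mers that appear in no other reference sketch (a set difference against the union of the others), and a sample is profiled by intersecting its sketch with each marker set.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not MetaKSSD's defaults).
K = 21           # k-mer length
KEEP_ONE_IN = 8  # keep roughly 1 in 8 k-mer hashes


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (forward strand only; no canonicalization)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq: str, k: int = K) -> set[int]:
    """Keep only the k-mer hashes that fall in a fixed subspace (hash % KEEP_ONE_IN == 0)."""
    kept = set()
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        if h % KEEP_ONE_IN == 0:
            kept.add(h)
    return kept


def build_markers(sketches: dict[str, set[int]]) -> dict[str, set[int]]:
    """Species-specific markers: each sketch minus the union of all other sketches."""
    markers = {}
    for name, sk in sketches.items():
        others = set().union(*(s for n, s in sketches.items() if n != name))
        markers[name] = sk - others
    return markers


def profile(sample: set[int], markers: dict[str, set[int]]) -> dict[str, float]:
    """Fraction of each species' markers observed in the sample sketch (intersection)."""
    return {name: len(m & sample) / len(m) if m else 0.0 for name, m in markers.items()}


# Toy usage: three random "genomes" and a mock metagenome built from them.
rng = random.Random(0)


def rand_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


genomes = {name: rand_dna(5000) for name in ("sp_A", "sp_B", "sp_C")}
markers = build_markers({name: sketch(g) for name, g in genomes.items()})

# Mock sample: first half of sp_A, all of sp_B, nothing from sp_C ("N" breaks the junction).
sample_seq = genomes["sp_A"][:2500] + "N" + genomes["sp_B"]
result = profile(sketch(sample_seq), markers)
print(result)  # sp_B is 1.0, sp_C is 0.0, sp_A is roughly 0.5
```

Because the sketches are plain sets, adding a newly described species only requires sketching its genome and recomputing set differences, which hints at why sketch operations help scalability. A real profiler would also need to handle reverse complements, sequencing errors and abundance estimation, all of which this toy ignores.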
