泛基因组数据库改进了临床元基因组数据中的宿主去除和分枝杆菌分类工作

IF 11.8 2区 生物学 Q1 MULTIDISCIPLINARY SCIENCES
Michael B Hall, Lachlan J M Coin
{"title":"泛基因组数据库改进了临床元基因组数据中的宿主去除和分枝杆菌分类工作","authors":"Michael B Hall, Lachlan J M Coin","doi":"10.1093/gigascience/giae010","DOIUrl":null,"url":null,"abstract":"Background Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. Results We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. Conclusions Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"244 1","pages":""},"PeriodicalIF":11.8000,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data\",\"authors\":\"Michael B Hall, Lachlan J M Coin\",\"doi\":\"10.1093/gigascience/giae010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. Results We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. Conclusions Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.\",\"PeriodicalId\":12581,\"journal\":{\"name\":\"GigaScience\",\"volume\":\"244 1\",\"pages\":\"\"},\"PeriodicalIF\":11.8000,\"publicationDate\":\"2024-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GigaScience\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/gigascience/giae010\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae010","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

背景 临床元基因组样本的无培养基实时测序可实现病原体的快速检测和抗菌药耐药性分析。然而,这种方法会带来病人 DNA 泄漏的风险。为了降低这种风险,我们需要在测序时近乎全面地清除人类 DNA 序列,通常需要使用资源有限的设备。现有的基准主要集中在标准化数据库的使用上,在很大程度上忽略了删除管道的计算要求以及人类基因组多样性的影响。结果 我们在模拟和人工真实 Illumina 和 Nanopore 元基因组样本上对宿主去除管道进行了基准测试。我们发现,构建一个包含不同人类基因组的定制 kraken 数据库,能在准确性和计算资源使用之间取得最佳平衡。此外,我们还利用标准数据库和定制数据库,对使用 kraken 和 minimap2 对分枝杆菌读数进行分类的管道进行了基准测试。与结核分枝杆菌分类的标准数据库相比,使用具有代表性的分枝杆菌属数据库,这两种工具都提高了特异性和灵敏度。这些定制数据库的计算效率优于大多数标准方法,可以在笔记本电脑上执行。结论 与标准数据库相比,定制的泛基因组数据库在从元基因组样本中去除人类读数和进行结核分枝杆菌读数分类时,能在准确性和计算效率之间取得最佳平衡。这样的数据库可以在笔记本电脑上执行,而不会牺牲准确性,这在资源匮乏的环境中是一个特别重要的考虑因素。我们免费提供所有定制的数据库和管道。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data
Background Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. Results We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. Conclusions Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
GigaScience
GigaScience MULTIDISCIPLINARY SCIENCES-
CiteScore
15.50
自引率
1.10%
发文量
119
审稿时长
1 weeks
期刊介绍: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信