Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination.

IF 4.6 | Zone 2 (Biology) | JCR Q1 (Microbiology)
mSystems Pub Date : 2025-09-10 DOI:10.1128/msystems.00840-25
Donglai Liu, Jinjun Hu, Dan Zhang, Shanshan Ren, Lanqing Zhao, Hongyan Gao, Songnian Hu, Sihong Xu, Guanxiang Liang
Citations: 0

Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination.

Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data. Our findings show that the combination of bwa-mem and the telomere-to-telomere human genome (CHM13-T2T) is the most effective in removing human reads in simulated data. We also analyzed CHM13-T2T-derived sequences in RefSeq to understand how CHM13-T2T reduces false positive results. Finally, we assessed clinical samples and found that CHM13-T2T effectively reduces host-derived contamination, particularly in low microbial biomass samples. This study provides a thorough overview of the application of CHM13-T2T in metagenomic analysis and highlights its significance in improving microbial classification accuracy.

IMPORTANCE

Human gene sequences account for a large proportion of metagenomic sequences. To gain accurate and precise microbiome information, effective host-derived contamination removal methods are required. Both the alignment algorithm and the reference genome could influence the effectiveness of this process. The telomere-to-telomere human genome (CHM13-T2T) is a state-of-the-art human genome with 216 Mbp of additional new sequences compared with the commonly used GRCh38.p14. Our findings show the optimal dehosting effect of CHM13-T2T combined with the bwa-mem software in metagenomic analysis. We also investigate the reasons for the superiority of CHM13-T2T. Our study provides insights into optimal strategies for host sequence removal from metagenomic data. A standard reference is proposed for future metagenomic analysis, which can improve the accuracy of microbial identification.
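
The filtration step described in the abstract (align reads to a human reference, then discard those that align) can be illustrated with a minimal sketch. It assumes the alignments are available as SAM text, for example from running bwa-mem against a CHM13-T2T index, and it assumes a read name is treated as host-derived when any of its records aligned. The abstract does not state the exact filtering rule used in the study, so that rule, the function names `split_host_reads` and `host_removal_rate`, and the toy records (including the reference name `chrToy`) are all hypothetical.

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: this record's read is unmapped


def split_host_reads(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split read names into (host, kept) from SAM text aligned to a host reference.

    Assumption: a read name counts as host if ANY of its records aligned,
    so a pair is discarded when either mate maps to the host genome.
    """
    seen: Set[str] = set()
    host: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            host.add(name)
    return host, seen - host


def host_removal_rate(host_called: Set[str], true_host: Set[str]) -> float:
    """Fraction of known host reads (e.g. from simulated data) that were flagged."""
    if not true_host:
        return 0.0
    return len(host_called & true_host) / len(true_host)


# Toy SAM records (hypothetical): QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL
EXAMPLE_SAM = [
    "@SQ\tSN:chrToy\tLN:1000",
    "human_1\t99\tchrToy\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "human_1\t147\tchrToy\t200\t60\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII",
    "human_2\t73\tchrToy\t300\t60\t10M\t=\t300\t0\tACGTACGTAC\tIIIIIIIIII",
    "human_2\t133\tchrToy\t300\t0\t*\t=\t300\t0\tACGTACGTAC\tIIIIIIIIII",
    "microbe_1\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "microbe_1\t141\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

if __name__ == "__main__":
    host, kept = split_host_reads(EXAMPLE_SAM)
    print("host:", sorted(host))
    print("kept:", sorted(kept))
```

In this sketch, `human_2` is discarded even though only one mate aligned, which is the conservative choice when the goal is to limit residual human sequence. A stricter or looser rule (for example, requiring a minimum mapping quality) would change which reads are kept and is a design decision the abstract leaves open.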

Source journal
mSystems
mSystems: Biochemistry, Genetics and Molecular Biology (Biochemistry)
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 308
Review time: 13 weeks
Journal description: mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.