Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination.
Authors: Donglai Liu, Jinjun Hu, Dan Zhang, Shanshan Ren, Lanqing Zhao, Hongyan Gao, Songnian Hu, Sihong Xu, Guanxiang Liang
Journal: mSystems, e0084025 (Journal Article; published 2025-09-10)
DOI: 10.1128/msystems.00840-25 (https://doi.org/10.1128/msystems.00840-25)
Impact factor: 4.6; JCR quartile: Q1 (Microbiology)
Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data. Our findings show that the combination of bwa-mem and the telomere-to-telomere human genome (CHM13-T2T) is the most effective in removing human reads in simulated data. We also analyzed CHM13-T2T-derived sequences in RefSeq to understand how CHM13-T2T reduces false positive results. Finally, we assessed clinical samples and found that CHM13-T2T effectively reduces host-derived contamination, particularly in low microbial biomass samples. This study provides a thorough overview of the application of CHM13-T2T in metagenomic analysis and highlights its significance in improving microbial classification accuracy.

IMPORTANCE
Human gene sequences account for a large proportion of metagenomic sequences. To gain accurate and precise microbiome information, effective host-derived contamination removal methods are required. Both the alignment algorithm and the reference genome could influence the effectiveness of this process. The telomere-to-telomere human genome (CHM13-T2T) is a state-of-the-art human genome with 216 Mbp of additional new sequences compared with the commonly used GRCh38.p14. Our findings show the optimal dehosting effect of CHM13-T2T combined with the bwa-mem software in metagenomic analysis. We also investigate the reasons for the superiority of CHM13-T2T. Our study provides insights into optimal strategies for host sequence removal from metagenomic data. A standard reference is proposed for future metagenomic analysis, which can improve the accuracy of microbial identification.
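The abstract describes host removal as alignment to a human reference followed by filtration, but it does not state the exact filter criteria. The following is a minimal sketch of one plausible filtration step, assuming paired-end reads have already been aligned (for example with bwa-mem against CHM13-T2T) and written as SAM text. It keeps only read pairs in which neither mate aligned to the host, using the standard SAM FLAG bit 0x4 (read unmapped). The function name `non_host_read_names` and the toy records in `EXAMPLE_SAM` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the filter rule below is an assumption,
# not the authors' documented method.

READ_UNMAPPED = 0x4  # SAM FLAG bit: this read did not align to the reference


def non_host_read_names(sam_lines):
    """Return names of reads/pairs with no alignment to the host reference.

    A name is discarded as host-derived if ANY of its records is mapped,
    so a pair with one mapped mate is treated as host (a conservative choice).
    """
    names_in_order = []  # preserves first-seen order of read names
    seen = set()
    host_names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if name not in seen:
            seen.add(name)
            names_in_order.append(name)
        if (flag & READ_UNMAPPED) == 0:
            host_names.add(name)  # at least one record aligned to the host
    return [n for n in names_in_order if n not in host_names]


# Hypothetical toy alignments (columns follow the SAM specification).
EXAMPLE_SAM = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    # pairA: both mates unmapped (flags 77 and 141) -> kept as non-host
    "pairA\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "pairA\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    # pairB: both mates mapped (flags 99 and 147) -> removed as host
    "pairB\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "pairB\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
    # pairC: one mate mapped (73), one unmapped (133) -> removed as host
    "pairC\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",
    "pairC\t133\tchr1\t500\t0\t*\t=\t500\t0\tTTGA\tIIII",
])

if __name__ == "__main__":
    print(non_host_read_names(EXAMPLE_SAM.splitlines()))
```

Discarding a pair when either mate aligns favors removing host reads over retaining every microbial read. For low microbial biomass samples, where the abstract reports the largest benefit, a stricter or looser rule may be preferable, and that choice is not specified in the source.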
mSystems
Subject area: Biochemistry, Genetics and Molecular Biology: Biochemistry
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 308
Review time: 13 weeks
About the journal:
mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.