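A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission.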

IF 10.4 | Zone 1 (Biology) | Q1 GENETICS & HEREDITY
Benjamin Sobkowiak, Patrick Cudahy, Melanie H Chitwood, Taane G Clark, Caroline Colijn, Louis Grandjean, Katharine S Walter, Valeriu Crudu, Ted Cohen
Genome Medicine 17(1): 8. Published 2025-01-27. Journal Article.
DOI: 10.1186/s13073-025-01430-y (https://doi.org/10.1186/s13073-025-01430-y)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771024/pdf/
Citations: 0

Abstract

A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission.

Background: Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we present MixInfect2, a new tool to accurately detect mixed samples from Mycobacterium tuberculosis short-read WGS data. We then evaluate three approaches for reconstructing the underlying mixed constituent strain sequences. This allows these samples to be included in downstream analysis to gain insights into the epidemiology and transmission of mixed infections.

Methods: We employed a Gaussian mixture model to cluster allele frequencies at mixed sites (hSNPs) in each sample to identify signals of multiple strains. Building upon our previous tool, MixInfect, we increased the accuracy of classifying in vitro mixed samples through multiple improvements to the bioinformatic pipeline. Major and minor proportion constituent strains were reconstructed using three approaches and assessed by comparing the estimated sequence to the known constituent strain sequence. Lastly, mixed infections in a real-world Mycobacterium tuberculosis population from Moldova were detected with MixInfect2 and clusters of recent transmission that included major and minor constituent strains were built.
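
The clustering step described in the Methods can be illustrated with a minimal sketch. This is not the MixInfect2 implementation: it assumes the sample is already known to contain two strains, fits a two-component one-dimensional Gaussian mixture by expectation-maximisation, and reads the strain proportions off the component means. The function name `fit_gmm_1d`, the starting means, the variance floor, and the allele frequencies are all hypothetical choices made for illustration.

```python
import math


def fit_gmm_1d(values, init_means, n_iter=50, var_floor=1e-4):
    """Fit a 1-D Gaussian mixture to `values` by EM, starting from `init_means`."""
    k = len(init_means)
    means = list(init_means)
    variances = [0.01] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances


# Made-up allele frequencies at five hSNPs of one sample (both alleles per site).
freqs = [0.68, 0.32, 0.71, 0.29, 0.72, 0.28, 0.69, 0.31, 0.70, 0.30]
weights, means, variances = fit_gmm_1d(freqs, init_means=[0.25, 0.75])
minor_prop, major_prop = sorted(means)
print(f"major ~ {major_prop:.2f}, minor ~ {minor_prop:.2f}")
```

A real pipeline would also have to decide whether a sample is mixed at all, for example by comparing models with different numbers of components; that step is left out here.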

Results: All 36/36 in vitro mixed and 12/12 non-mixed samples were correctly classified with MixInfect2, and major strain proportions were estimated with high accuracy (within 3% of the true strain proportion), outperforming previous tools. Reconstructed major strain sequences closely matched the true constituent sequence by taking the allele at the highest frequency at hSNPs, while the best-performing approach to reconstruct the minor proportion strain sequence was identifying the closest non-mixed isolate in the same population, though no approach was effective when the minor strain proportion was at 5%. Finally, fewer mixed infections were identified in Moldova than previous estimates (6.6% vs 17.4%) and we found multiple instances where the constituent strains of mixed samples were present in transmission clusters.
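
The two reconstruction approaches that performed best in the Results can be sketched as follows. This is an illustration under assumptions, not the authors' code: the major strain takes the highest-frequency allele at each site, and the minor strain is approximated by the non-mixed isolate closest to the lowest-frequency alleles. Measuring "closest" by Hamming distance is an assumption, as are the function names, the isolate names, and the read counts.

```python
def consensus(site_counts, pick):
    """Build a sequence by applying `pick` (max or min) to read counts at each site."""
    return "".join(pick(counts, key=counts.get) for counts in site_counts)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_isolate(query, isolates):
    """Name of the isolate whose sequence has the smallest Hamming distance to `query`."""
    return min(isolates, key=lambda name: hamming(query, isolates[name]))


# Hypothetical read counts per allele at four variable sites of one mixed sample.
# Sites with a single allele are not hSNPs; both strains carry that allele.
sites = [{"A": 70, "G": 30}, {"C": 100}, {"T": 28, "C": 72}, {"G": 100}]

major_seq = consensus(sites, max)  # highest-frequency allele at each site
minor_seq = consensus(sites, min)  # lowest-frequency allele at each site

# Hypothetical non-mixed isolates from the same population, at the same four sites.
isolates = {"iso1": "ACCG", "iso2": "GCTG", "iso3": "GCCG"}
minor_match = closest_isolate(minor_seq, isolates)
print(major_seq, minor_seq, minor_match)
```

Substituting a real isolate for the noisy minor-allele calls only works when a close relative was sampled in the same population.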

Conclusions: MixInfect2 accurately detects samples with evidence of mixed infection from short-read WGS data and provides an excellent estimate of the mixture proportions. While there are limitations in reconstructing the constituent strain sequences of mixed samples, we present recommendations for the best approach to include these isolates in further analyses.

Source journal: Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
About the journal: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.