Classification of unsequenced Mycobacterium tuberculosis strains in a high-burden setting using a pairwise logistic regression approach.

Access Microbiology. Pub Date: 2025-05-12. eCollection Date: 2025-01-01. DOI: 10.1099/acmi.0.000964.v3
Isabel Rancu, Benjamin Sobkowiak, Joshua L Warren, Nelly Ciobanu, Alexandru Codreanu, Valeriu Crudu, Caroline Colijn, Ted Cohen, Melanie H Chitwood
Volume 7, issue 5. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12163731/pdf/

Abstract

Over the past three decades, molecular epidemiological studies have provided new opportunities to investigate the transmission dynamics of Mycobacterium tuberculosis. In most studies, a sizable fraction of individuals with notified tuberculosis cannot be included, either because they do not have culture-positive disease (and thus have no specimens available for molecular typing) or because resources for sequencing are limited. A recent study introduced a regression-based approach for inferring the membership of unsequenced tuberculosis cases in transmission clusters from host demographic and epidemiological data. That method identified the most likely cluster to which an unsequenced strain belonged with an accuracy of 35%, although in a low-burden setting where a large fraction of cases occurred among foreign-born migrants. Here, we apply a similar model to M. tuberculosis whole-genome sequencing data from the Republic of Moldova, a setting of relatively high local transmission. Using a maximum cluster span of ~40 single nucleotide polymorphisms (SNPs) and a cluster size cutoff of n≥10, we could best predict the specific cluster of which each clustered case was most likely a member, with an accuracy of 17.2%. In sensitivity analyses, neither a more restrictive (~20 SNPs) nor a more permissive (~80 SNPs) threshold improved performance, whereas increasing the minimum cluster size improved prediction accuracy. These findings highlight the challenges of transmission inference in high-burden settings like Moldova.
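The abstract does not spell out how the pairwise model is built, so the following is only a minimal sketch of one way a pairwise logistic regression classifier could be set up, not the authors' implementation. It assumes that each pair of sequenced cases is labelled by whether the two cases share a cluster, that pair-level covariates (here a hypothetical same-locality flag, age gap, and same-sex flag) feed a logistic regression, and that an unsequenced case is assigned to the cluster with the highest mean pairwise probability. The toy cases, the covariates, the aggregation rule, and every function name are illustrative assumptions; only the cluster size cutoff of n≥10 is taken from the text.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def make_toy_cases():
    """Invented cases: three large clusters plus one small one (toy data only)."""
    spec = [("A", "north", 11), ("B", "centre", 11), ("C", "south", 11), ("D", "east", 3)]
    cases = []
    for cluster, locality, size in spec:
        for i in range(size):
            cases.append({"cluster": cluster, "locality": locality,
                          "age": 20 + 5 * i, "sex": "F" if i % 2 else "M",
                          "held_out": i == 10})  # 11th case plays the unsequenced role
    return cases


def keep_large_clusters(cases, min_size):
    """Drop cases whose cluster has fewer than min_size members."""
    counts = Counter(c["cluster"] for c in cases)
    return [c for c in cases if counts[c["cluster"]] >= min_size]


def pair_features(a, b):
    """Hypothetical pair covariates: intercept, same locality, age gap (decades), same sex."""
    return [1.0,
            1.0 if a["locality"] == b["locality"] else 0.0,
            abs(a["age"] - b["age"]) / 10.0,
            1.0 if a["sex"] == b["sex"] else 0.0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_probability(w, a, b):
    """Modelled probability that cases a and b belong to the same cluster."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(a, b))))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_cluster(case, sequenced, w):
    """Assumed aggregation rule: pick the cluster with the highest mean pair probability."""
    scores = defaultdict(list)
    for other in sequenced:
        scores[other["cluster"]].append(pair_probability(w, case, other))
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))


# Apply the n >= 10 cluster size cutoff, then split into sequenced and held-out cases.
clustered = keep_large_clusters(make_toy_cases(), min_size=10)
sequenced = [c for c in clustered if not c["held_out"]]
unsequenced = [c for c in clustered if c["held_out"]]

# Train on every pair of sequenced cases; the label is 1 when the pair shares a cluster.
pairs = list(combinations(sequenced, 2))
X = [pair_features(a, b) for a, b in pairs]
y = [1.0 if a["cluster"] == b["cluster"] else 0.0 for a, b in pairs]
weights = fit_logistic(X, y)

# Top-1 accuracy: share of held-out cases assigned to their true cluster.
hits = sum(predict_cluster(c, sequenced, weights) == c["cluster"] for c in unsequenced)
accuracy = hits / len(unsequenced)
print(f"top-1 accuracy on toy data: {accuracy:.2f}")
```

The toy clusters are separable by locality by construction, so the accuracy printed here says nothing about real performance. In data like those described above, where host covariates overlap heavily across clusters, the same procedure would be expected to perform far worse, consistent with the 17.2% reported.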
