Strain tracking with uncertainty quantification.

Younhun Kim, Colin J Worby, Sawal Acharya, Lucas R van Dijk, Daniel Alfonsetti, Zackary Gromko, Philippe Azimzadeh, Karen Dodson, Georg Gerber, Scott Hultgren, Ashlee M Earl, Bonnie Berger, Travis E Gibson
Published in bioRxiv: the preprint server for biology, 2024-07-23.
DOI: 10.1101/2023.01.25.525531 (https://doi.org/10.1101/2023.01.25.525531)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900846/pdf/
Citations: 0

Abstract



The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, tracking infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g., Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain-level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state of the art on both real and synthetic data. ChronoStrain leverages sequences' quality scores and the samples' temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate ChronoStrain's improved performance in capturing post-antibiotic Escherichia coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain-tracking models on the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (due to either poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also analyze samples from the Early Life Microbiota Colonisation (ELMC) Study, demonstrating the algorithm's ability to correctly identify Enterococcus faecalis strains using paired sample isolates as validation.
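The abstract names two ingredients: per-base quality scores and sampling times, combined into a probability distribution over abundance trajectories. The sketch below illustrates both in generic form. It is not ChronoStrain's model, code, or API, and every function name in it is hypothetical. Assumptions: reads are ungapped and already aligned to equal-length marker sequences; a miscalled base is equally likely to be any of the other three bases; and the time-aware prior is a Gaussian random walk on log-abundances whose step variance grows with the gap between samples, mapped to relative abundances by softmax.

```python
import math
import random


def phred_to_error_prob(q: int) -> float:
    """Phred quality Q encodes a base-call error probability of 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def read_log_likelihood(read: str, quals: list[int], marker: str) -> float:
    """Log P(read | marker) for an ungapped read already aligned to the marker.

    Hypothetical error model: a base matches the marker with probability 1 - e
    and is each of the 3 other bases with probability e / 3.
    """
    if not (len(read) == len(quals) == len(marker)):
        raise ValueError("read, quals and marker must have the same length")
    ll = 0.0
    for base, q, ref in zip(read, quals, marker):
        e = min(phred_to_error_prob(q), 0.75)  # Q=0 gives e=1; cap at a random-base call
        ll += math.log(1.0 - e) if base == ref else math.log(e / 3.0)
    return ll


def strain_posterior(read, quals, markers, abundances):
    """P(strain | read) is proportional to abundance(strain) * P(read | strain marker)."""
    logs = [math.log(a) + read_log_likelihood(read, quals, m)
            for a, m in zip(abundances, markers)]
    return softmax(logs)


def sample_trajectory(times, n_strains, sigma, rng):
    """One draw of relative-abundance trajectories from a hypothetical time-aware prior.

    Log-abundances follow a Gaussian random walk with step variance
    sigma^2 * (t_next - t_prev); softmax maps each time point to the simplex.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(n_strains)]
    traj = [softmax(x)]
    for t_prev, t_next in zip(times, times[1:]):
        sd = sigma * math.sqrt(t_next - t_prev)
        x = [xi + rng.gauss(0.0, sd) for xi in x]
        traj.append(softmax(x))
    return traj


def credible_interval(values, level=0.95):
    """Approximate equal-tailed interval from Monte Carlo draws."""
    s = sorted(values)
    n = len(s) - 1
    return s[int((1 - level) / 2 * n)], s[int((1 + level) / 2 * n)]


if __name__ == "__main__":
    markers = ["ACGT", "ACGA"]  # two toy strains differing only at the last base
    read = "ACGT"
    for last_q in (30, 5):
        post = strain_posterior(read, [30, 30, 30, last_q], markers, [0.5, 0.5])
        print(f"Q={last_q} at the distinguishing base: P(strain 0 | read) = {post[0]:.4f}")

    rng = random.Random(0)
    times = [0.0, 1.0, 3.0]  # unevenly spaced sampling days
    draws = [sample_trajectory(times, 2, 0.5, rng) for _ in range(1000)]
    for k, t in enumerate(times):
        lo, hi = credible_interval([d[k][0] for d in draws])
        print(f"t={t}: 95% interval for strain 0 abundance = [{lo:.2f}, {hi:.2f}]")
```

With a Q30 call at the single distinguishing base, the read is assigned to strain 0 with posterior near 1 (about 0.9997); with a Q5 call, the same read gets only about 0.87. This is the kind of low- versus high-confidence distinction the abstract describes. The trajectory part draws from the prior only. A real model would condition these draws on the reads, for example by variational inference or MCMC.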
