无论推理模型正确与否，病毒系统地理学的深度学习和似然法都能得出相同的答案。

IF 6.1 1区生物学 Q1 EVOLUTIONARY BIOLOGY

Systematic Biology Pub Date : 2024-05-27 DOI:10.1093/sysbio/syad074

Ammon Thompson, Benjamin J Liebeskind, Erik J Scully, Michael J Landis

{"title":"无论推理模型正确与否，病毒系统地理学的深度学习和似然法都能得出相同的答案。","authors":"Ammon Thompson, Benjamin J Liebeskind, Erik J Scully, Michael J Landis","doi":"10.1093/sysbio/syad074","DOIUrl":null,"url":null,"abstract":"Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real-time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression that we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) and greatly overlap with HPDs, but have lower precision (more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-Cov-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"183-206"},"PeriodicalIF":6.1000,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11249978/pdf/","citationCount":"0","resultStr":"{\"title\":\"Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong.\",\"authors\":\"Ammon Thompson, Benjamin J Liebeskind, Erik J Scully, Michael J Landis\",\"doi\":\"10.1093/sysbio/syad074\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real-time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression that we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) and greatly overlap with HPDs, but have lower precision (more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-Cov-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.\",\"PeriodicalId\":22120,\"journal\":{\"name\":\"Systematic Biology\",\"volume\":\" \",\"pages\":\"183-206\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2024-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11249978/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Systematic Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/sysbio/syad074\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EVOLUTIONARY BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syad074","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

系统发生树分析已成为流行病学的重要工具。基于似然法的方法将模型拟合到系统发育树上，从而推断病毒传播的系统动力学和历史。然而，这些方法的计算成本往往很高，这限制了系统动力学模型的复杂性和现实性，使其不适合在疫情迅速发展时实时为政策决策提供信息。使用深度学习的无似然方法正在突破这些限制，推动推断的发展。在本文中，我们扩展、比较和对比了最近开发的一种深度学习方法，该方法用于从树木中进行无似然推断。我们利用在五个地点传播的模拟疫情的系统发生训练了多个深度神经网络，发现它们在真实模拟模型下达到了与贝叶斯推断接近的准确度。我们比较了训练有素的神经网络与贝叶斯方法对模型错误规范的鲁棒性。我们发现，两种模型的性能相当，收敛于相似的偏差。我们还采用了一种称为保角量化回归的不确定性量化方法，结果表明，该方法对模型错误规范的敏感性与贝叶斯最高后验密度（HPD）的模式相似，并与 HPD 有很大重叠，但精度较低（更保守）。最后，我们用最近一项关于欧洲 SARS-Cov-2 流行病的研究中的系统地理学数据对神经网络进行了训练和测试，得到了类似的特定地区流行病学参数估计值和欧洲共同祖先的位置。除了与基于似然法的方法一样准确和稳健外，我们训练的神经网络在训练后的平均速度比基于似然法的方法快 3 个数量级。我们的研究结果支持这样一种观点，即神经网络可以通过模拟数据进行训练，以准确模仿生成式系统发育模型似然函数的好坏统计特性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong.

Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real-time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression that we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) and greatly overlap with HPDs, but have lower precision (more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-Cov-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Systematic Biology 生物-进化生物学

CiteScore

13.00

自引率

7.70%

发文量

审稿时长

6-12 weeks

期刊介绍： Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.