Jacob D. Gardner, Joanna Baker, Chris Venditti, Chris L. Organ
Nature Communications. Published 2025-07-03. DOI: 10.1038/s41467-025-61036-1 (https://doi.org/10.1038/s41467-025-61036-1)
Phylogenetically informed predictions outperform predictive equations in real and simulated data
Inferring unknown trait values is ubiquitous across biological sciences, whether for reconstructing the past, imputing missing values for further analysis, or understanding evolution. Models explicitly incorporating shared ancestry amongst species with both known and unknown values (phylogenetically informed prediction) provide accurate reconstructions. However, 25 years after the introduction of such models, it remains common practice to simply use predictive equations derived from phylogenetic generalised least squares or ordinary least squares regression models to calculate unknown values. Here, we use a comprehensive set of simulations to demonstrate two- to three-fold improvement in the performance of phylogenetically informed predictions compared to both ordinary least squares and phylogenetic generalised least squares predictive equations. We found that phylogenetically informed prediction using the relationship between two weakly correlated (r = 0.25) traits was roughly equivalent to (or even better than) predictive equations for strongly correlated traits (r = 0.75). A critique and comparison of four published predictive analyses showcase real-world examples of phylogenetically informed prediction. We also highlight the importance of prediction intervals, which increase with increasing phylogenetic branch length. Finally, we offer guidelines for making phylogenetically informed predictions across diverse fields such as ecology, epidemiology, evolution, oncology, and palaeontology.
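The contrast the abstract draws can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Brownian-motion covariance on a hypothetical four-species tree, uses the standard generalised least squares predictor that adds a phylogenetically weighted share of the known species' residuals to the regression line, and ignores uncertainty in the estimated coefficients when computing the variance factor. All function names, the tree, and the trait values are made up for illustration.

```python
import numpy as np

def gls_coefficients(X, y, V):
    """Generalised least squares coefficients for covariance V."""
    Vi_X = np.linalg.solve(V, X)
    Vi_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vi_X, X.T @ Vi_y)

def predictive_equation(x_new, beta):
    """Plug the predictor into the fitted line; phylogenetic position is ignored."""
    return float(x_new @ beta)

def phylo_informed_prediction(x_new, X, y, V, v_new, beta):
    """Regression line plus residual information borrowed from relatives.

    v_new holds the covariances between the unknown species and each
    known species (shared branch length under Brownian motion).
    """
    residuals = y - X @ beta
    return float(x_new @ beta + v_new @ np.linalg.solve(V, residuals))

def conditional_variance_factor(V, v_new, v_nn):
    """Variance of the unknown tip given the known tips, per unit rate.

    Simplification: uncertainty in beta is not included.
    """
    return float(v_nn - v_new @ np.linalg.solve(V, v_new))

# Hypothetical tree ((A:1,B:1):1,(C:1,H:1):1); H is the species to predict.
# Entries are shared path lengths from the root (tree height 2).
V = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
v_long = np.array([0.0, 0.0, 1.0])   # H splits from C one unit before the tips
v_short = np.array([0.0, 0.0, 1.9])  # alternative: H splits from C 0.1 units before the tips
v_nn = 2.0                           # root-to-tip distance of H

# Made-up trait data for A, B, C: intercept column plus one predictor.
X = np.column_stack([np.ones(3), [1.0, 2.0, 3.0]])
y = np.array([2.1, 3.9, 7.0])
x_new = np.array([1.0, 3.5])

beta = gls_coefficients(X, y, V)
eq_pred = predictive_equation(x_new, beta)
pip_pred = phylo_informed_prediction(x_new, X, y, V, v_long, beta)
var_long = conditional_variance_factor(V, v_long, v_nn)
var_short = conditional_variance_factor(V, v_short, v_nn)

print("predictive equation:", eq_pred)
print("phylogenetically informed:", pip_pred)
print("variance factor (long vs short terminal branch):", var_long, var_short)
```

In this toy setting the two predictions differ only by the residual term, which vanishes when the unknown species shares no history with the sampled species. The variance factor grows as the branch leading to the unknown species lengthens, which matches the abstract's point that prediction intervals widen with increasing phylogenetic branch length.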
About the journal:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.