An empirical test of the relationship between the bootstrap and likelihood ratio support in maximum likelihood phylogenetic analysis
Denis Jacob Machado, Fernando Portella de Luna Marques, Larry Jiménez-Ferbans, Taran Grant
Cladistics 38(3): 392-401. Published 2021-12-21. DOI: 10.1111/cla.12496. https://onlinelibrary.wiley.com/doi/10.1111/cla.12496
Abstract
In maximum likelihood (ML), the support for a clade can be calculated directly as the likelihood ratio (LR) or log-likelihood difference (S, LLD) of the best trees with and without the clade of interest. However, bootstrap (BS) clade frequencies are more pervasive in ML phylogenetics and are almost universally interpreted as measuring support. In addition to theoretical arguments against that interpretation, BS has several undesirable attributes for a support measure. For example, it does not vary in proportion to optimality, it does not identify clades that are rejected by the evidence, and it can be overestimated due to missing data. Nevertheless, if BS is a reliable predictor of S, then it might be an efficient indirect method of measuring support, which is an attractive possibility given the speed of many BS implementations. To assess the relationship between S and BS, we analyzed 106 empirical datasets retrieved from TreeBASE. Also, to evaluate the degree to which S is affected by the number of replicates during suboptimal tree searches and BS is affected by the number of pseudoreplicates during BS estimation, we randomly selected 5 of the 106 datasets and analyzed them using variable numbers of replicates and pseudoreplicates, respectively. The correlation between S and BS was extremely weak in the datasets we analyzed. Increasing the number of replicates during tree search decreased the estimated values of S for most clades, but the magnitude of change was small. In contrast, although increasing pseudoreplicates affected BS values for only approximately 40% of clades, values both increased and decreased, and they did so at much greater magnitudes. Increasing replicates/pseudoreplicates affected the rank order of clades in each tree for both S and BS. Our findings show decisively that BS is not an efficient indirect method of measuring support and suggest that even quite superficial searches to calculate S provide better estimates of support.
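The definition of S given in the abstract can be illustrated with a minimal Python sketch. It computes S as the log-likelihood of the best tree containing a clade minus that of the best tree lacking it, converts S to a likelihood ratio, and compares the rank order of S and BS across clades. Everything specific here is an assumption made for illustration: the function names are hypothetical, the log-likelihoods and BS percentages are invented toy values rather than results from the study, and Spearman's rank correlation is one possible choice of statistic that may differ from the one the authors used.

```python
import math


def log_likelihood_difference(lnl_best_with, lnl_best_without):
    """S: lnL of the best tree with the clade minus lnL of the best tree without it."""
    return lnl_best_with - lnl_best_without


def likelihood_ratio(s):
    """Likelihood ratio corresponding to a log-likelihood difference S."""
    return math.exp(s)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy values, NOT data from the paper:
    # (lnL of best tree with clade, lnL of best tree without clade, BS %)
    clades = [
        (-1000.0, -1004.5, 100.0),
        (-1000.0, -1000.8, 95.0),
        (-1000.0, -1012.0, 62.0),
        (-1000.0, -1002.1, 88.0),
    ]
    s_values = [log_likelihood_difference(w, wo) for w, wo, _ in clades]
    bs_values = [bs for _, _, bs in clades]
    for s, bs in zip(s_values, bs_values):
        print(f"S = {s:.2f}  LR = {likelihood_ratio(s):.3g}  BS = {bs:.0f}")
    print("Spearman correlation between S and BS:", spearman(s_values, bs_values))
```

Working on the log scale keeps the arithmetic stable, since the likelihood ratio grows exponentially with S. A rank-based statistic also speaks to the abstract's point about the rank order of clades, but it is only a sketch of how such a comparison could be set up.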
About the journal:
Cladistics publishes high quality research papers on systematics, encouraging debate on all aspects of the field, from philosophy, theory and methodology to empirical studies and applications in biogeography, coevolution, conservation biology, ontogeny, genomics and paleontology.
Cladistics is read by scientists working in the research fields of evolution, systematics and integrative biology and enjoys a consistently high position in the ISI® rankings for evolutionary biology.