Author: Martín D. Ezcurra
Journal: Cladistics, volume 40, issue 3, pages 242-281 (impact factor 3.9; JCR Q1, Evolutionary Biology)
Published: 2024-05-10
DOI: 10.1111/cla.12581
URL: https://onlinelibrary.wiley.com/doi/10.1111/cla.12581
PDF (not open access): https://onlinelibrary.wiley.com/doi/epdf/10.1111/cla.12581
Citations: 0
Exploring the effects of weighting against homoplasy in genealogies of palaeontological phylogenetic matrices
Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson–Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. 
This relationship could inform the range of k-values to be used in phylogenetic analyses based on matrix size, with the caveat that it still rests on a small sample of genealogies.
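The abstract does not spell out how the concavity constant k enters the analysis. As a minimal sketch, assuming the standard Goloboff formulation of implied weighting in which a character with h extra steps (homoplasy) contributes a cost of h / (h + k) to the tree score, the snippet below shows why lower k-values penalize homoplastic characters more strongly. The function names and the example step counts are hypothetical, and the missing-data adjustment used by extended implied weighting is not modelled here.

```python
from typing import Iterable


def implied_weight_cost(extra_steps: int, k: float) -> float:
    """Cost of one character under implied weighting: h / (h + k).

    Assumes the usual Goloboff-style formulation; h is the number of
    extra steps (homoplasy) and k is the concavity constant.
    """
    if extra_steps < 0:
        raise ValueError("extra_steps must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return extra_steps / (extra_steps + k)


def total_distortion(extra_steps_per_character: Iterable[int], k: float) -> float:
    """Sum of per-character costs; the tree minimizing this sum is preferred."""
    return sum(implied_weight_cost(h, k) for h in extra_steps_per_character)


if __name__ == "__main__":
    # Hypothetical homoplasy counts for four characters on one tree.
    homoplasy = [0, 1, 2, 5]
    # k spans the range explored in the study (3 to 30).
    for k in (3, 10, 30):
        costs = [round(implied_weight_cost(h, k), 3) for h in homoplasy]
        print(f"k={k}: per-character costs {costs}, "
              f"total {total_distortion(homoplasy, k):.3f}")
```

Because the cost curve is concave, each additional step in an already homoplastic character adds less to the score than the first one did, and this down-weighting becomes gentler as k grows, approaching equal weighting for very large k.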
About the journal:
Cladistics publishes high quality research papers on systematics, encouraging debate on all aspects of the field, from philosophy, theory and methodology to empirical studies and applications in biogeography, coevolution, conservation biology, ontogeny, genomics and paleontology.
Cladistics is read by scientists working in the research fields of evolution, systematics and integrative biology and enjoys a consistently high position in the ISI® rankings for evolutionary biology.