Kalle Leppälä, Flavio Augusto da Silva Coelho, Michaela Richter, Victor A Albert, Charlotte Lindqvist
Molecular Biology and Evolution. Published 2024-11-01. DOI: 10.1093/molbev/msae198 (https://doi.org/10.1093/molbev/msae198)
Five-leaf Generalizations of the D-statistic Reveal the Directionality of Admixture.
Abstract
Over the past 15 years, the D-statistic, a four-taxon test for organismal admixture (hybridization, or introgression) which incorporates single nucleotide polymorphism data with allelic patterns ABBA and BABA, has seen considerable use. This statistic seeks to discern significant deviation from either a given species tree assumption, or from the balanced incomplete lineage sorting that could otherwise defy this species tree. However, while the D-statistic can successfully discriminate admixture from incomplete lineage sorting, it is not a simple matter to determine the directionality of admixture using only four-leaf tree models. As such, methods have been developed that use five leaves to evaluate admixture. Among these, the DFOIL method ("FOIL", a mnemonic for "First-Outer-Inner-Last"), which tests allelic patterns on the "symmetric" tree S=(((1,2),(3,4)),5), succeeds in finding admixture direction for many five-taxon examples. However, DFOIL does not make full use of all symmetry, nor can DFOIL function properly when ancient samples are included because of the reliance on singleton patterns (such as BAAAA and ABAAA). Here, we take inspiration from DFOIL to develop a new and completely general family of five-leaf admixture tests, dubbed Δ-statistics, that can either incorporate or exclude the singleton allelic patterns depending on individual taxon and age sampling choices. We describe two new shapes that are also fully testable, namely the "asymmetric" tree A=((((1,2),3),4),5) and the "quasisymmetric" tree Q=(((1,2),3),(4,5)), which can considerably supplement the "symmetric" S=(((1,2),(3,4)),5) model used by DFOIL. We demonstrate the consistency of Δ-statistics under various simulated scenarios, and provide empirical examples using data from black, brown and polar bears, the latter also including two ancient polar bear samples from previous studies. Recently, DFOIL and one of these ancient samples were used to argue for a dominant polar bear → brown bear introgression direction. However, we find, using both this ancient polar bear and our own, that by far the strongest signal using both DFOIL and Δ-statistics on tree S is actually bidirectional gene flow of indistinguishable direction. Further experiments on trees A and Q instead highlight what were likely two phases of admixture: one with stronger brown bear → polar bear introgression in ancient times, and a more recent phase with predominant polar bear → brown bear directionality.
Code and documentation are available at https://github.com/KalleLeppala/Delta-statistics.
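For readers unfamiliar with the allelic patterns mentioned in the abstract, the following is a minimal sketch of the classical four-taxon D-statistic, D = (nABBA - nBABA) / (nABBA + nBABA), and not of the Δ-statistics themselves, whose definitions are not given here. It assumes the last sequence is the outgroup and defines the ancestral allele "A". The function names `site_patterns`, `d_statistic`, and `drop_singletons` and the toy alignment are hypothetical, chosen only for illustration. The singleton filter assumes that "singleton" means a pattern with exactly one derived allele, as in BAAAA and ABAAA.

```python
from collections import Counter


def site_patterns(alignment, outgroup_index=-1):
    """Count biallelic site patterns across aligned sequences.

    The outgroup allele is coded 'A' (ancestral) and the other allele 'B'
    (derived). Invariant and multiallelic sites are skipped.
    """
    counts = Counter()
    for column in zip(*alignment):
        if len(set(column)) != 2:
            continue
        ancestral = column[outgroup_index]
        pattern = "".join("A" if base == ancestral else "B" for base in column)
        counts[pattern] += 1
    return counts


def d_statistic(counts):
    """Classical four-taxon D from ABBA and BABA counts (order: P1, P2, P3, O)."""
    abba, baba = counts["ABBA"], counts["BABA"]
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


def drop_singletons(counts):
    """Remove patterns with exactly one derived allele (e.g. BAAAA, ABAAA)."""
    return {p: n for p, n in counts.items() if p.count("B") > 1}


# Toy alignment (hypothetical data), rows ordered P1, P2, P3, outgroup.
toy = ["ACGCGC", "GTATGC", "GTACAC", "ACGTAC"]
toy_counts = site_patterns(toy)
print(dict(toy_counts), d_statistic(toy_counts))
```

The same `site_patterns` counter applies unchanged to five aligned sequences, where it yields five-letter patterns. Significance testing (for example by block jackknife) is deliberately left out of this sketch.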
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.