PloiDB: the plant ploidy database
Keren Halabi, Anat Shafir, Itay Mayrose
New Phytologist 240(3): 918-927. Published 2023-06-20. DOI: 10.1111/nph.19057
Full text: https://onlinelibrary.wiley.com/doi/10.1111/nph.19057
Citations: 2
Abstract
Polyploidy, namely the acquisition of additional complete sets of chromosomes, is widely recognized as a key feature of extant organismal diversity, particularly in plants. It is generally accepted that all angiosperm species have experienced at least one polyploidization event in their evolutionary past (Jiao et al., 2011). Therefore, most (if not all) plant species should be considered paleo-polyploids that have since diploidized to some extent, and the distinction between diploids and polyploids must be made with respect to a reference timepoint. In recent decades, polyploidy research has undergone a resurgence among plant evolutionary biologists, largely because genomic analyses have revealed a rich history of genome duplications across multiple plant lineages. Indeed, numerous studies have investigated the impact of polyploidy on morphological and life-history traits, ecology, diversification patterns, and genome evolution (reviewed in Soltis & Soltis, 2000; Otto & Whitton, 2003; Ramsey & Schemske, 2003; Otto, 2007; Ramsey & Ramsey, 2014; Wendel, 2015; Soltis et al., 2016; Van de Peer et al., 2017; Fox et al., 2020). Notably, many of these studies focus on the effect of polyploidy within a specific taxonomic group, which limits our ability to draw conclusions about the universal consequences of polyploidy and to distinguish broad convergent trends from species-specific idiosyncrasies. There is thus a growing need to extend ploidy estimation across the seed-plant clade, to obtain robust and broad information about the effect of polyploidy.
In the last few years, multiple methods for inferring ploidy from sequenced genomic data have been developed (e.g. Jiao et al., 2011; Rabier et al., 2014; Vanneste et al., 2014; Tiley et al., 2018; Zwaenepoel & Van de Peer, 2020). However, because of their computational complexity and the substantial amount of genomic data they require, such methods are still applied only to a limited extent, usually at phylogenetic scales above the species level, for example by sampling representatives from several clades of interest. Accordingly, the most comprehensive sequence-based analysis to date, conducted by the 1KP initiative and encompassing the transcriptomes of roughly 1100 plant species, identified 244 whole-genome duplication (WGD) events within Viridiplantae (One Thousand Plant Transcriptomes Initiative, 2019; Li & Barker, 2020).
Ploidy estimation at the species level is still largely based on chromosome-number information. A simple use of such information employs threshold techniques that classify species as polyploid relative to the lowest chromosome number (or some other measure) found in the genus (Stebbins Jr, 1938; Grant, 1963; Goldblatt, 1980; Wood et al., 2009). Other studies have incorporated phylogenetic information and inferred ploidy values under the maximum parsimony principle (Guggisberg et al., 2006; Ohi-Toma et al., 2006; Timme et al., 2007). A more advanced approach uses likelihood models of chromosome-number evolution that account for the branch lengths of the phylogeny and allow for different types of transitions in chromosome number, whose rates are estimated from the data. Such models have been implemented in the chromEvol probabilistic framework and its extensions (Mayrose et al., 2010; Glick & Mayrose, 2014; Freyman & Höhna, 2018; Zenil-Ferguson et al., 2018; Blackmon et al., 2019). ChromEvol treats the evolution of chromosome numbers along a phylogeny as a continuous-time Markov process, with parameters representing the transition rates of different types of events. Three types of transitions correspond to polyploidization events: (1) WGD, an exact doubling of the chromosome number; (2) demi-polyploidization, a 1.5-fold multiplication of the chromosome number, representing, for example, a triplication event; and (3) base-number transition, the addition of any multiple of an inferred base number, which represents the monoploid chromosome number of the focal group (Glick & Mayrose, 2014). In addition to ploidy transitions, chromEvol also considers dysploidization events, which result in the gain or loss of a single chromosome.
Such events are the result of rearrangements within chromosomal DNA, and are triggered by double-strand breaks and subsequent misrepair at the breakpoints. A decrease in chromosome number (descending dysploidy) is caused by chromosome fusion, resulting from recombination between at least two non-homologous chromosomes. By contrast, chromosome fission leads to an increase in the chromosome number (ascending dysploidy; Mayrose & Lysak, 2021). Given a phylogeny and the respective chromosome numbers at the tips, a standard application of chromEvol allows the selection of the most appropriate model and provides estimates of the expected number of polyploidy and dysploidy transitions along each branch of the phylogeny, thereby allowing categorization of tip taxa as either diploid or polyploid relative to other taxa in the clade (Glick & Mayrose, 2014).
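To make these transition types concrete, the following is a minimal Python sketch of a rate matrix (the generator Q of a continuous-time Markov chain) built from the five event types described above. It is not the chromEvol implementation: the function and parameter names are hypothetical, rates are assumed constant across chromosome numbers, the state space is assumed to be 1..max_n with out-of-range transitions simply dropped, and the demi-polyploidization rate of an odd count is assumed to be split equally between the two nearest integers. Each diagonal entry is set so that its row sums to zero, as any CTMC generator requires.

```python
def build_rate_matrix(max_n, gain, loss, dupl, demi, base_rate, base_number):
    """Return a CTMC generator Q over chromosome numbers 1..max_n.

    Q[i - 1][j - 1] is the instantaneous rate of moving from i to j chromosomes.
    All parameter names are illustrative, not chromEvol's own.
    """
    if base_number < 1:
        raise ValueError("base_number must be a positive integer")
    q = [[0.0] * max_n for _ in range(max_n)]

    def add(i, j, rate):
        # Keep only transitions that change state and stay inside 1..max_n
        # (a simplifying assumption; out-of-range moves are dropped).
        if 1 <= j <= max_n and j != i and rate > 0:
            q[i - 1][j - 1] += rate

    for i in range(1, max_n + 1):
        add(i, i + 1, gain)   # ascending dysploidy (e.g. chromosome fission)
        add(i, i - 1, loss)   # descending dysploidy (e.g. chromosome fusion)
        add(i, 2 * i, dupl)   # whole-genome duplication
        if i % 2 == 0:
            add(i, 3 * i // 2, demi)          # exact 1.5-fold increase
        else:
            add(i, 3 * i // 2, demi / 2)      # odd counts: split the rate
            add(i, 3 * i // 2 + 1, demi / 2)  # between the two nearest integers
        k = 1
        while i + k * base_number <= max_n:   # base-number transitions:
            add(i, i + k * base_number, base_rate)  # add any multiple of the base
            k += 1
        # Diagonal = minus the total outgoing rate, so each row sums to zero.
        q[i - 1][i - 1] = -sum(q[i - 1])
    return q


Q = build_rate_matrix(max_n=12, gain=0.5, loss=0.7, dupl=0.2,
                      demi=0.1, base_rate=0.05, base_number=4)
print(Q[5][11])  # rate 6 -> 12 (duplication): 0.2
```

Transition probabilities along a branch of length t would then follow from P(t) = exp(Qt), which could be computed, for example, with scipy.linalg.expm after converting Q to a NumPy array; likelihood-based inference then chooses the rates that best explain the tip counts.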
Under the chromEvol phylogenetic framework, inferences are made with respect to a reference timepoint, represented as the root of the phylogeny examined. Consequently, a lineage that experienced a polyploidization event following divergence from the most recent common ancestor (MRCA), but has since diploidized, is still classified as a polyploid. Thus, the choice of a reference timepoint has a large impact on the inferred ploidy values. To date, the chromEvol framework has been applied to phylogenies of different taxonomic scales, ranging from genus (e.g. Mayrose et al., 2011; Rice et al., 2019), to family (e.g. Chacón & Renner, 2014; Mota et al., 2016; Román-Palacios et al., 2020; Romero-da-Cruz et al., 2022), or even higher (e.g. Clark et al., 2016; Escudero et al., 2018; Carta et al., 2020), but the consequences of the chosen time scale remain unclear.
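The dependence on the reference timepoint can be illustrated with a toy example. The Python sketch below assumes a hypothetical tree in which polyploidization events have already been mapped onto branches as integer counts (chromEvol itself yields expected numbers of transitions per branch, so this is a simplification). A tip is labelled polyploid relative to a reference node if at least one event lies on the path from that node down to the tip; such events count even if the lineage later diploidized, while events above the reference do not.

```python
# Hypothetical toy tree: node -> (parent, polyploidization events on the branch above it).
TREE = {
    "root": (None, 0),  # e.g. the MRCA of a genus
    "X": ("root", 1),   # one WGD on the branch root -> X
    "A": ("X", 0),
    "B": ("X", 0),
    "C": ("root", 0),
}


def events_since(tree, tip, reference):
    """Count events on the path from `reference` down to `tip`."""
    count, node = 0, tip
    while node != reference:
        parent, events = tree[node]
        if parent is None:
            raise ValueError(f"{reference!r} is not an ancestor of {tip!r}")
        count += events
        node = parent
    return count


def ploidy_label(tree, tip, reference):
    return "polyploid" if events_since(tree, tip, reference) > 0 else "diploid"


for ref in ("root", "X"):
    print(ref, {tip: ploidy_label(TREE, tip, ref) for tip in ("A", "B")})
# root {'A': 'polyploid', 'B': 'polyploid'}
# X {'A': 'diploid', 'B': 'diploid'}
```

The same tips switch from polyploid to diploid simply because the reference moves from the deeper node to the shallower one, which is why the chosen time scale matters.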
Information on chromosome numbers across a wide array of plant clades is continuously growing (Rice et al., 2015). In parallel, phylogenetic relationships across the seed plants have in the past few years been investigated more accurately and extensively (Liu et al., 2022). We present the plant ploidy database (PloiDB), an online resource available at http://ploidb.tau.ac.il/. PloiDB contains inferences across multiple phylogenetic scales, defined by either divergence time or taxonomic resolution, which allows us to provide a more continuous scale of ploidy inferences rather than a strictly dichotomous outcome.
PloiDB is a community resource of ploidy inferences for tens of thousands of plant taxa, mostly species but occasionally lower taxonomic ranks (e.g. subspecies or varieties). All inferences were generated with the chromEvol framework, using inputs derived from extensive chromosome-count information (Rice et al., 2015) and a broad seed-plant phylogeny (ALLMB; Smith & Brown, 2018). The ALLMB phylogeny consists of 356 305 tips, corresponding to plant taxa at the rank of species or below. In our baseline inference scheme, we partitioned the ALLMB phylogeny into focal clades according to generic circumscription. This yielded 2063 genus-level phylogenies, each containing at least five taxa with chromosome-number information, spanning 210 angiosperm and 8 gymnosperm families and together encompassing 57 493 taxa. Relative to the number of accepted plant taxa in World Flora Online (Borsch et al., 2020), the PloiDB inferences cover 14.4% of angiosperm taxa (56 126 of 389 376) and 26.9% of gymnosperm taxa (442 of 1643).
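The genus-level partitioning and the coverage figures can be sketched in a few lines of Python. The sketch assumes, hypothetically, that tip labels follow a "Genus_species" convention and that the set of taxa with chromosome counts is given; the five-taxon minimum comes from the text, whereas the function names and toy labels are illustrative. The coverage helper simply reproduces the reported percentages from the stated counts.

```python
from collections import defaultdict


def genus_of(tip):
    # Hypothetical label convention: "Genus_species" or "Genus_species_var_x".
    return tip.split("_", 1)[0]


def partition_by_genus(tips, counted, min_counted=5):
    """Group tips by genus and keep genera with >= min_counted counted taxa.

    Uncounted tips are kept in their genus here; whether a real pipeline
    retains or prunes them is an open assumption of this sketch.
    """
    groups = defaultdict(list)
    for tip in tips:
        groups[genus_of(tip)].append(tip)
    return {
        genus: members
        for genus, members in groups.items()
        if sum(tip in counted for tip in members) >= min_counted
    }


def coverage_percent(covered, total):
    return round(100 * covered / total, 1)


print(coverage_percent(56126, 389376), coverage_percent(442, 1643))  # 14.4 26.9
```

For example, a toy genus with six tips of which five have counts is kept, while one with only two counted tips is dropped.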
Our results reveal that 31.9% of angiosperm and 5.7% of gymnosperm taxa have experienced one or more polyploidization events since their divergence from the MRCA of their respective genus. Polyploid frequency is similar in monocots and eudicots (32.9% and 31.7%, respectively) but varies considerably within each group. Polyploidy is common in higher monocots (commelinids, represented by Arecales, Commelinales, Poales, and Zingiberales; 49.3%) but relatively rare in basal monocots (non-commelinid monocots; 21.5%). In eudicots, by contrast, polyploids are similarly frequent in rosids and asterids (30.5% and 30.8%, respectively) but more common in basal dicots (37.4%). The distribution of polyploid frequency across genera is skewed toward low values (Fig. 1a): 708 genera contain only diploids, whereas polyploid-rich genera (over 50% polyploids) are relatively rare, occurring in only 359 angiosperm genera and in no gymnosperm genera. Our inferences further indicate that polyploid frequency is < 20% in roughly half of seed-plant families, while eight families are extremely polyploid rich (i.e. > 80% of their taxa are polyploids). Among the 20 largest angiosperm families, Poaceae, Rosaceae, and Malvaceae are the most polyploid rich (55.1%, 51.3%, and 44%, respectively), while Myrtaceae and Apocynaceae are polyploid poor, with < 10% polyploids (see Supporting Information Table S1 for polyploid frequency across all families, sorted from largest to smallest).
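The summary statistics in this paragraph are proportions over per-taxon ploidy labels. As an illustration only, the Python sketch below assumes each genus is represented by a list of booleans (True for a taxon inferred as polyploid relative to the genus MRCA); the category names and toy data are hypothetical, while the 0% and > 50% cut-offs follow the text. Note that a pooled frequency over all taxa, as in the 31.9% figure, generally differs from the mean of per-genus frequencies.

```python
from collections import Counter


def polyploid_frequency(flags):
    flags = list(flags)
    if not flags:
        raise ValueError("empty group")
    return sum(flags) / len(flags)


def genus_category(freq):
    # Cut-offs from the text: genera with only diploids vs. genera with > 50% polyploids.
    if freq == 0:
        return "diploid-only"
    if freq > 0.5:
        return "polyploid-rich"
    return "intermediate"


def summarize(groups):
    """groups: mapping genus -> list of booleans (True = inferred polyploid)."""
    categories = Counter(genus_category(polyploid_frequency(f)) for f in groups.values())
    all_flags = [flag for f in groups.values() for flag in f]
    return categories, polyploid_frequency(all_flags)


toy = {
    "GenusA": [False] * 5,
    "GenusB": [True, True, True, False, False],
    "GenusC": [True] + [False] * 9,
}
categories, pooled = summarize(toy)
print(dict(categories), pooled)
# {'diploid-only': 1, 'polyploid-rich': 1, 'intermediate': 1} 0.2
```

In this toy case the pooled frequency is 0.2 (4 of 20 taxa), whereas the mean of the three per-genus frequencies (0, 0.6, and 0.1) is about 0.233, because larger genera weigh more in the pooled value.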
Competing interests: None declared.
Author contributions: IM and KH conceived the study. KH developed the ploidy inference scheme, assembled PloiDB, and conducted the data analyses. AS provided the chromEvol implementation and advised on the development of the ploidy inference scheme. IM supervised the study.