When numbers matter: Rethinking the role of gene duplication on short evolutionary timescales

IF 2.7 | Zone 2 (Biology) | JCR Q2, PLANT SCIENCES

American Journal of Botany, 112(7). Published 2025-07-09. DOI: 10.1002/ajb2.70072

Freja Lindstedt, Qiujie Zhou, Pascal Milesi

Citations: 0

Abstract

The potential roles of genomic structural variations (SVs) in the control of phenotypic traits and in evolution were suggested as early as the 20th century. However, they were subsequently overshadowed by the emphasis placed on single nucleotide polymorphisms (SNPs). Recently, SVs have received renewed attention in evolutionary research thanks to advances in sequencing technologies and analytical methods.

At the macroevolutionary scale, plant genomes tend to evolve faster than those of other eukaryotes because whole-genome duplication events are prevalent (Wendel et al., 2016). Unlike other types of structural variants, such as inversions, copy number variations (CNVs) result from unbalanced mutations that affect the dosage, or amount, of a DNA sequence. When genes are involved, the number of copies of a gene varies from one individual to another. In plants, gene copy number variations (gCNVs) are likely to be abundant because of events such as mating system shifts (the efficacy of purifying selection is reduced in selfing species), hybridization and subsequent genome rearrangements, and whole-genome duplications followed by biased retention (Panchy et al., 2016; Wendel et al., 2016; Van de Peer et al., 2017). For example, in Arabidopsis thaliana (L.) Heynh. (Brassicaceae), 10 to 18% of all genes display CNVs (Zmienko et al., 2020; Jaegle et al., 2023). In the genus Picea Mill. (Pinaceae), at least 10% of protein-coding genes display CNVs (P. abies (L.) H. Karst. and P. obovata Ledeb.: Q. Zhou et al., 2025; P. glauca (Moench) Voss and P. mariana (Mill.) Britton, Sterns & Poggenburg: Prunier et al., 2017).

Gene duplications have primarily been studied for their roles in long-term evolution. However, a change in gene dosage usually results in a change in the amount of gene products, such as RNA or proteins (e.g., Shao et al., 2019). As a result, gCNVs have a distinctive multiallelic and quantitative nature. Fully understanding their role in short-term evolutionary processes requires studying them as quantitative genotypes rather than in a presence/absence (or biallelic) manner, as is most often done (Figure 1, top panel). Unlike for SNPs, the accuracy and resolution of gCNV genotyping usually depend on the sequencing platform used. From short-read sequencing data, CNVs can be identified using biased allelic ratios (Figure 2A) and changes in depth of coverage (DoC), which arise when reads from duplicated regions are mis-mapped to the same locus in the reference genome (Figure 2). However, short reads often fail to capture the underlying genetic structure of gCNVs, and changes in DoC can only be interpreted as relative copy numbers across homologous chromosomes (Figure 1, middle panel). Long-read sequencing is a promising alternative that allows the various alleles to be phased to obtain absolute copy numbers (Figure 1, bottom panel). However, long-read assemblies of repetitive regions can still be biased (Carvalho et al., 2025 [preprint]), and long-read data are more computationally demanding and likely too costly for extensive population-level genomic studies. Nevertheless, continuous advances in sequencing technologies and CNV analysis methods open the door to more extensive studies of gCNVs, even in non-model species (Karunarathne et al., 2023). In the following sections, we present gCNVs as a largely untapped source of genetic variation and explore their potential for studying short-term evolution in plants.
We also discuss the main challenges of incorporating this type of polymorphism into population and quantitative genomic frameworks, and how plants, as a study system, offer an opportunity to address them.
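
The depth-of-coverage logic described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the cited studies: it assumes a diploid individual, per-gene mean read depths that have already been computed, and that the median depth across genes reflects single-copy genes. The gene names, depths, and function names are hypothetical. The estimate is a total summed over both homologs, which is why DoC alone only yields relative copy numbers.

```python
import statistics


def relative_copy_number(gene_depth: float, baseline_depth: float, ploidy: int = 2) -> float:
    """Total copy number summed over homologs, estimated from depth of coverage.

    Assumes `baseline_depth` is the typical depth of single-copy genes,
    which carry `ploidy` copies in total. The result says nothing about
    how the copies are distributed among homologous chromosomes.
    """
    if baseline_depth <= 0:
        raise ValueError("baseline depth must be positive")
    return ploidy * gene_depth / baseline_depth


def expected_alt_read_fraction(copies_with_alt: int, total_copies: int) -> float:
    """Expected fraction of reads carrying the alternative allele when reads
    from all paralogous copies map to the same reference locus."""
    return copies_with_alt / total_copies


# Hypothetical mean read depths for five genes in one diploid individual.
depths = {"geneA": 30.0, "geneB": 29.0, "geneC": 61.0, "geneD": 15.5, "geneE": 31.0}
baseline = statistics.median(depths.values())  # most genes assumed single-copy

for gene, depth in depths.items():
    cn = relative_copy_number(depth, baseline)
    print(f"{gene}: depth={depth:5.1f}  copies~{cn:.2f}  rounded={round(cn)}")

# Three copies, only one carrying the alternative allele: about 1/3 of reads
# show it, instead of the 1/2 expected at a single-copy heterozygous site.
print(f"{expected_alt_read_fraction(1, 3):.3f}")
```

With these made-up depths, geneC (about four copies) would look duplicated and geneD (about one copy) partially deleted. Real DoC-based callers typically also correct for GC content, mappability, and local depth noise before calling copy number.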

Despite all the evidence that gCNVs are a non-negligible source of polymorphism, it is still unclear to what extent they can be used to address broad questions in evolutionary biology, such as inferring past demographic events more accurately or refining predictions of population responses to global change for conservation strategies. This is partly due to difficulties in estimating key evolutionary parameters, which largely prevent their use in population and quantitative genetic models (Mérot et al., 2020). Gene copy number variation is likely to arise mainly through mechanisms mediated by low-copy repeats (e.g., non-allelic homologous recombination). As a result, the duplications that give rise to gCNVs may span multiple genes (as illustrated in Figure 2), which affects the apparent gene duplication rate; this rate has been estimated to be higher than nucleotide substitution rates (Katju and Bergthorsson, 2013). gCNVs can originate from different molecular mechanisms, and their presence induces mismatches during synapsis (Hastings et al., 2009). Mutation rates, in terms of copy number change, are therefore expected to vary across loci and mechanisms. Additionally, rates of copy number gain and loss may be asymmetric, and mutation rates may be state-dependent, that is, the rate of change in copy number may vary with the current number of copies. Because gCNVs can also affect patterns of recombination and segregation, models developed to study the evolution of microsatellites or gene families, such as the stepwise mutation model (SMM) and the birth–death model, respectively, do not accurately predict gCNV evolution. Nevertheless, these models could still yield relevant diversity summary statistics when applied to a large enough number of gCNVs to average out gene-specific biases.
For example, two pairs of statistics could be used to measure within-population diversity and between-population divergence of gCNVs, respectively: allele size variance (Valdes et al., 1993) and Goldstein's (δμ)² (Goldstein et al., 1995), which are based on relative copy number; and CNV entropy (analogous to Shannon's diversity index) and R_ST (Slatkin, 1995), which are based on absolute copy number. Comparing these estimates with those obtained from neutral SNPs (e.g., nucleotide diversity and F_ST, respectively) would also inform us about the overall distribution of fitness effects of gCNVs.
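
To illustrate why a single symmetric, state-independent model may describe gCNVs poorly, the sketch below contrasts the stepwise mutation model with a hypothetical alternative in which gains and losses occur at different rates and both scale with the current copy number. The per-copy rates, starting copy number, and function names are illustrative assumptions rather than estimates from the literature, and the sketch ignores the effects of gCNVs on recombination and segregation.

```python
import random
import statistics
from typing import Callable, Tuple

# A rate function maps the current copy number to per-generation
# probabilities of gaining one copy and of losing one copy.
RateFn = Callable[[int], Tuple[float, float]]


def smm_rates(mu: float) -> RateFn:
    """Stepwise mutation model: +1 or -1 with equal probability at a constant
    total rate mu. A floor at zero copies is added here for gene copies."""
    return lambda copies: (mu / 2, mu / 2 if copies > 0 else 0.0)


def state_dependent_rates(gain_per_copy: float, loss_per_copy: float) -> RateFn:
    """Hypothetical gCNV model: asymmetric gain and loss rates that both scale
    with copy number. Zero copies is absorbing (nothing left to duplicate)."""
    return lambda copies: (gain_per_copy * copies, loss_per_copy * copies)


def simulate(rates: RateFn, start: int, generations: int, rng: random.Random) -> int:
    """Follow one lineage, allowing at most one +/-1 change per generation
    (a reasonable approximation when rates are small)."""
    copies = start
    for _ in range(generations):
        p_gain, p_loss = rates(copies)
        u = rng.random()
        if u < p_gain:
            copies += 1
        elif u < p_gain + p_loss:
            copies -= 1
    return copies


rng = random.Random(42)  # fixed seed for reproducibility
smm = [simulate(smm_rates(0.01), 2, 500, rng) for _ in range(1000)]
sdm = [simulate(state_dependent_rates(0.002, 0.003), 2, 500, rng) for _ in range(1000)]

for name, sample in (("SMM", smm), ("state-dependent", sdm)):
    lost = sum(c == 0 for c in sample) / len(sample)
    print(f"{name:>15}: mean copies={statistics.fmean(sample):.2f}  "
          f"variance={statistics.pvariance(sample):.2f}  lost={lost:.1%}")
```

In the SMM, the probability and direction of a change do not depend on the current state; in the hypothetical alternative, excess losses pull lineages toward zero copies while lineages with many copies change faster. The exact numbers depend entirely on the arbitrary rates chosen.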

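The diversity and divergence statistics listed above are straightforward to compute once copy numbers are genotyped. The sketch below assumes integer copy numbers (summed over homologs) for one gene in two hypothetical populations. The R_ST shown is a simplified variance-based analogue of Slatkin's (1995) estimator that assumes equal sample sizes, and CNV entropy is computed as Shannon's index over copy-number classes; function names and data are illustrative only.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List


def allele_size_variance(copies: List[int]) -> float:
    """Within-population variance in copy number (cf. Valdes et al., 1993)."""
    return statistics.pvariance(copies)


def goldstein_dmu2(pop_a: List[int], pop_b: List[int]) -> float:
    """Goldstein's (delta mu)^2: squared difference in mean copy number."""
    return (statistics.fmean(pop_a) - statistics.fmean(pop_b)) ** 2


def cnv_entropy(copies: List[int]) -> float:
    """Shannon entropy (natural log) over copy-number classes."""
    n = len(copies)
    return -sum((k / n) * math.log(k / n) for k in Counter(copies).values())


def r_st(pops: Dict[str, List[int]]) -> float:
    """Variance-based R_ST analogue: share of the total copy-number variance
    lying between populations. Assumes equal sample sizes."""
    pooled = [c for pop in pops.values() for c in pop]
    v_total = statistics.pvariance(pooled)
    if v_total == 0:
        return 0.0
    v_within = statistics.fmean([statistics.pvariance(p) for p in pops.values()])
    return (v_total - v_within) / v_total


# Hypothetical copy numbers for one gene in eight individuals per population.
pops = {
    "north": [2, 2, 3, 3, 4, 4, 4, 5],
    "south": [2, 2, 2, 2, 3, 3, 2, 2],
}

for name, sample in pops.items():
    print(f"{name}: variance={allele_size_variance(sample):.3f}  "
          f"entropy={cnv_entropy(sample):.3f}")
print(f"(delta mu)^2 = {goldstein_dmu2(pops['north'], pops['south']):.3f}")
print(f"R_ST = {r_st(pops):.3f}")
```

Computed over many genes, such values could then be set against nucleotide diversity and F_ST from putatively neutral SNPs, which is the comparison proposed above.
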
The phenotypic and fitness effects of structural variants are often considered to scale with their size. For gCNVs, this means that both could increase with copy number, and the observed copy number may therefore result from a trade-off between a high mutation rate and large fitness effects. The highly dynamic evolution of plant genomes, together with more than 1,400 available reference genomes (Bernal-Gallardo and de Folter, 2024), makes plants excellent models for filling the knowledge gaps associated with gCNVs. For example, the Brassicaceae now has more sequenced genomes and transcriptomes than any other plant lineage, allowing extensive comparative studies to address some of the challenges mentioned above and to separate species-specific effects from general properties of gCNVs. Several genera include closely related species with different mating systems (selfing and outcrossing), which could be used to investigate the main evolutionary forces shaping gCNVs while controlling for phylogenetic relationships, as has previously been done for RNA expression (e.g., Zhang et al., 2022). Changes in ploidy can also be used to study mutation and recombination rates by comparing gCNV patterns between the subgenomes of autopolyploids, because any differences between subgenomes must have been acquired after the polyploidization event. In addition, interspecific hybrids and resynthesized polyploids can be generated experimentally, and many plants are model species for functional genomics (Bernal-Gallardo and de Folter, 2024). Combining these natural and synthetic resources with state-of-the-art sequencing and gene-editing technologies would allow direct measurement of, for example, the phenotypic and fitness effects of gCNVs.

Despite the limitations outlined above, the multiallelic and quantitative nature of gCNVs makes them excellent markers for quantitative genetics: in many cases, trait values and the copy number of causal gCNVs show a quantitative relationship, enabling straightforward genotype-to-phenotype mapping. The prevalence of gCNVs in plant genomes makes them natural candidates for the control of quantitative traits and for adaptation along environmental gradients.

Forest trees are particularly relevant models for studying such questions, as they often have extensive ranges, with populations connected by long-distance gene flow, and show strong patterns of local adaptation (Savolainen et al., 2013). In a recent study, we showed that gCNVs are widespread and involved in local adaptation in both Picea abies (Norway spruce) and Picea obovata (Siberian spruce), two keystone species of the Eurasian boreal forest (Zhou et al., 2025). Importantly, we found no overlap between candidate genes detected from SNP variation alone and genes whose copy number correlates with environmental and/or phenotypic variation. This means that, in contrast with genomic inversions, the explanatory power of gCNVs is not captured by SNPs, and gCNVs must be studied specifically to gain a comprehensive view of the genetic architecture of local adaptation and of phenotypic traits.

Recent studies have shown that considering structural variants in addition to SNPs better explains the heritability of complex traits. However, these studies tend to treat all structural variants (including gCNVs) together, in a biallelic manner (Figure 1, top panel). In doing so, any quantitative relationship between the number of copies of a gene and the phenotypic trait of interest is lost whenever there are more than two copy-number states. Gene copy number variations can thus be considered a largely untapped source of genetic variation that may even explain some of the so-called 'missing' heritability. Such information would be particularly relevant for plant breeding, where genomic selection and phenotypic prediction are increasingly used to shorten breeding cycles and reduce phenotyping costs.
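
The information lost by biallelic coding can be illustrated with a toy regression. The sketch below uses hypothetical copy numbers and a trait that increases roughly linearly with them, then compares the variance explained (R²) when the same genotypes are coded as copy number versus as presence/absence of a duplication. The threshold (more than two copies counted as "duplicated") and all data are assumptions chosen for illustration, not a description of any published pipeline.

```python
import statistics
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (intercept, slope, R^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    return my - slope * mx, slope, r2


# Hypothetical data: total copy number (diploid baseline = 2) and a trait value.
copies = [1, 2, 2, 2, 3, 3, 4, 4, 5, 6]
trait = [10.2, 11.9, 12.1, 12.0, 14.1, 13.8, 16.0, 16.2, 17.9, 20.1]

# Biallelic coding: only "duplicated or not" is kept.
duplicated = [1 if c > 2 else 0 for c in copies]

_, slope_q, r2_q = fit_line(copies, trait)
_, slope_b, r2_b = fit_line(duplicated, trait)

print(f"copy-number coding: slope={slope_q:.2f} per copy, R^2={r2_q:.3f}")
print(f"presence/absence coding: effect={slope_b:.2f}, R^2={r2_b:.3f}")
```

Here the dichotomized genotype still explains part of the variance, but it cannot distinguish a gene with three copies from one with six, and the per-copy effect (about two trait units in this toy example) cannot be recovered from it.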

The recent burst in high-throughput sequencing has revealed a high prevalence of gCNVs in eukaryotes, yet the evolutionary significance of this polymorphism largely remains to be determined. We argue that fully understanding the role of gCNVs in plant evolution requires resolving their structure at the genomic level but, even more importantly, studying them as quantitative genotypes. The rapid development of long-read sequencing technologies holds great promise for resolving their haplotypic structure. However, we also want to emphasize that much can already be done using short-read sequencing data and dedicated analytical methods, as illustrated above. Extensive population-level genomic data have been generated over the past decade (for example, for conservation and breeding purposes), and we hope that this paper will serve as an incentive to reanalyze these data with a focus on gCNVs. Finally, plants display valuable features, such as mating system transitions and polyploidization, for designing comparative frameworks to measure the population genetic parameters needed to fully understand the role of gCNVs in evolutionary responses over short and intermediate timescales.

F.L., P.M., and Q.Z.: Conceptualization, visualization, writing – original draft; writing – review and editing. P.M.: Funding acquisition, supervision.
