A reference genome assembly of the alpine forage grass Elymus nutans

IF 10.5 | Region 1: Biology | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Plant Biotechnology Journal Pub Date : 2025-06-18 DOI:10.1111/pbi.70117

Dan Chang, Shangang Jia, Ming Sun, Tao Huang, Huanhuan Lu, Jiajun Yan, Changbing Zhang, Minghong You, Jianbo Zhang, Lijun Yan, Wenlong Gou, Xiong Lei, Xiaofei Ji, Yingzhu Li, Decai Mao, Qi Wu, Ping Li, Hongkun Zheng, Xiao Ma, Xuebin Yan, Quanlan Liu, Xiaofan He, Wengang Xie, Daxu Li, Shiqie Bai

Plant Biotechnology Journal, volume 23 (9), pages 3900-3902. Full text: https://onlinelibrary.wiley.com/doi/10.1111/pbi.70117

Citations: 0

Abstract


A reference genome assembly of the alpine forage grass Elymus nutans


Elymus nutans Griseb. (Poaceae: Triticeae, 2n = 6x = 42) is a dominant perennial plant species (Figure 1a) in the Qinghai-Tibetan Plateau in China (Liu et al., 2022), where it serves as an important forage grass with high yields, high nutritional value and good palatability for herbivorous ruminant animals.

The genome size of E. nutans was estimated by both flow cytometry and k-mer analysis (Figure S1). Combining long-read, short-read and Hi-C sequencing, we generated an allohexaploid reference genome for E. nutans that represents its three sets of chromosomes (subgenomes St, Y and H). Initial contigs were assembled from Oxford Nanopore Technology long reads (ONT, 133.86×, N50 > 29 kb; Table S1) and polished with Illumina short reads (Table S2). We then anchored the contigs to 21 pseudo-chromosomes using Hi-C data (119.2×, Table S2). After data cleaning and error correction, the final assembly comprises 21 chromosomes and totals 9.46 Gb, with a contig N50 of 3.01 Mb. The scaffolds total 3.27 Gb, 3.27 Gb and 2.83 Gb for the H, St and Y subgenomes, respectively (Table S3). The chromosomes were assigned to the three subgenomes (StStYYHH) based on similarity to the genomes of barley (Hordeum vulgare; HH) and Elymus sibiricus (StStHH) (Figure 1b).
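The assembly statistics quoted above can be illustrated with a minimal Python sketch. The `n50` helper and the toy contig lengths are hypothetical and are not taken from the paper; the anchoring calculation assumes that the three subgenome scaffold totals represent the chromosome-anchored part of the 9.46 Gb assembly, which the text does not state explicitly.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (hypothetical values, for illustration only).
toy_contigs = [5.0, 4.0, 3.0, 2.0, 1.0]
toy_n50 = n50(toy_contigs)

# Subgenome scaffold totals (Gb) and assembly size (Gb) as reported in the text.
subgenome_gb = {"H": 3.27, "St": 3.27, "Y": 2.83}
assembly_gb = 9.46

# Assumption: the three totals are the chromosome-anchored portion.
anchored_gb = sum(subgenome_gb.values())
anchored_fraction = anchored_gb / assembly_gb

print(f"toy N50: {toy_n50} Mb")
print(f"anchored: {anchored_gb:.2f} Gb ({anchored_fraction:.1%} of assembly)")
```

Under that assumption, roughly 9.37 Gb of the 9.46 Gb assembly falls within the three subgenomes.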

The benchmarking universal single-copy orthologs (BUSCO) score of the E. nutans assembly is 96.6%, and the long terminal repeat (LTR) assembly index (LAI) is 16.54, 14.87 and 17.20 for the St, Y and H subgenomes, respectively, indicating a high-quality assembly. We mapped 99.64% of ONT reads and 97.1% of Illumina reads to the assembly; the uniform coverage of mapped reads, together with the Hi-C heatmap, supports its reliability. Synteny analysis revealed conservation among the three subgenomes, with one large reciprocal translocation detected between chromosomes H04 (175.1 Mb) and Y03 (153.8 Mb) (Figure 1b; Figure S2). This translocation was confirmed by fluorescence in situ hybridization (FISH) imaging with probes unique to subgenome H and is localized at one end of chromosome Y03 (Figure 1c). Collinearity between H04 and H03/St03, and between Y03 and Y04/St04, is consistent with this reciprocal translocation (Figure 1b). Syntenic blocks shared between the three E. nutans subgenomes (St, Y and H) and the Xa, H, V, Y, St, R, E, B, A, D and J subgenomes of other Triticeae species further support the reliability of the assembly and point to potential structural variations (Figure S3).

Repetitive sequences account for 83.89% of the E. nutans genome (Table S4); up to 61.67% are classified as LTR retrotransposons, dominated by Copia and Gypsy elements (Table S4). Gene annotation combining de novo, homology-based and transcript-based predictions yielded 114 214 gene models, including 39 341, 40 837 and 33 541 for subgenomes H, St and Y, with average gene lengths of 3392.60 bp, 3462.82 bp and 3409.88 bp, respectively (Table S5).
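As a quick consistency check on the annotation figures, the per-subgenome counts can be summed and converted to gene densities. This is a sketch only: the subgenome lengths are the scaffold totals reported for Table S3, and the interpretation of the remainder as gene models not assigned to any subgenome is an assumption, not a statement from the paper.

```python
# Gene-model counts and subgenome scaffold totals as reported in the text.
gene_models = {"H": 39_341, "St": 40_837, "Y": 33_541}
total_reported = 114_214
length_gb = {"H": 3.27, "St": 3.27, "Y": 2.83}

# Gene models accounted for by the three subgenomes.
assigned = sum(gene_models.values())

# Remainder; assumed here to be models outside the three subgenome sets.
unassigned = total_reported - assigned

# Gene density in gene models per Mb of subgenome sequence.
density_per_mb = {
    sg: gene_models[sg] / (length_gb[sg] * 1000) for sg in gene_models
}

print(f"assigned to subgenomes: {assigned}, remainder: {unassigned}")
for sg, dens in density_per_mb.items():
    print(f"{sg}: {dens:.2f} gene models per Mb")
```

The three subgenomes come out at a similar density of roughly 12 gene models per Mb, so the lower gene count of subgenome Y tracks its smaller size.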

We located putative centromeric regions in the assembly based on enrichment of known centromeric sequences from wheat and maize (Figure 1b). The LTR retrotransposons Cereba/Quinta (GenBank accession no. FN564437.1) and the whole centromeric sequences were retrieved from the centromeres of wheat (NCBI accession no. GCA_022117705.1) and maize (Chen et al., 2023), respectively; their alignments to the assembly pointed to the same locations, with substantial overlap, on all 21 chromosomes of the three subgenomes (Figure 1b; Table S6). These putative centromeric regions coincide with regions enriched in transposable elements (TEs) and with the gene-poor centromeric and pericentromeric regions (Figure 1b). Tandem repeats reached their highest proportion within the putative centromeric regions of the H, St and Y subgenomes, accounting for 26.55%, 19.41% and 21.11%, respectively (Table S7). However, these repeat units and their contents remain to be confirmed.
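The idea of locating a region by enrichment of alignment hits can be sketched as a simple windowed count. This is not the authors' pipeline: the `densest_window` function, the window size and the hit coordinates are all hypothetical, and a real analysis would work from alignment output and likely use sliding windows and thresholds.

```python
from collections import Counter


def densest_window(hit_positions, window_size):
    """Return (window_start, hit_count) for the fixed-size window that
    contains the most alignment hits, or None if there are no hits.
    Ties are resolved in favour of the leftmost window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    counts = Counter(pos // window_size for pos in hit_positions)
    if not counts:
        return None
    best_bin, best_count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_bin * window_size, best_count


# Hypothetical start coordinates (bp) of centromeric-repeat alignments
# on one chromosome; invented for illustration.
toy_hits = [
    1_200_000,
    51_000_000, 51_400_000, 52_100_000, 52_900_000, 53_300_000,
    120_000_000,
]

candidate = densest_window(toy_hits, window_size=5_000_000)
print(candidate)
```

With these toy coordinates the densest 5 Mb window starts at 50 Mb and holds five of the seven hits, which is the kind of local enrichment the text describes.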

We explored the divergence of the three E. nutans subgenomes through sequence similarity to phylogenetically closely related species. Sequence identity with these species reached approximately 97.5% for subgenomes H and St (Figure S4). As in other Gramineae species, the distribution of Ks values peaked at 0.7–0.82 (Figure 1d), indicating that an ancient whole-genome duplication (WGD) event affecting all three subgenomes occurred approximately 62.61–73.34 million years ago (MYA). From a phylogenetic tree reconstructed using 18 subgenomes of 12 species (Figure 1e), we estimated that the three subgenomes diverged approximately 10.04 MYA, with Y and St splitting ~7.59 MYA. Using these divergence times and evolutionary relationships, we reconstructed a model of the evolutionary history of E. nutans, which indicates that hexaploid E. nutans (StStYYHH) arose <3.16 MYA, after the St subgenomes of E. nutans and E. sibiricus split (Figure 1e,f) (Chen et al., 2024). Hybridization between an ancient diploid species (HH, e.g., Hordeum) and a tetraploid species (StStYY, e.g., Roegneria) (Figure 1f), rather than between StStHH and YY, is strongly supported by two facts: no diploid YY species is known today, and hexaploid species carrying StStYY (StStYYWW, StStYYPP and StStYYHH) have arisen repeatedly (Chen et al., 2024; Fan et al., 2013). The history of the Y subgenome can be traced to 6.25 MYA, when genome V in Thinopyrum intermedium and Dasypyrum villosum diverged from the ancestor of the Y and V genomes (Figure 1e).
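The step from Ks peaks to a WGD date typically uses the relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The text does not state the rate used, so the sketch below back-calculates an implied rate from the reported lower bound (Ks = 0.7 at 62.61 MYA) and checks that it reproduces the upper bound. Treat both the formula choice and the resulting rate as assumptions rather than the authors' stated method.

```python
def divergence_time_mya(ks, rate_per_site_per_year):
    """Convert a Ks value to a divergence time in million years (T = Ks / 2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate_per_site_per_year) / 1e6


# Reported Ks peak range and WGD age range from the text.
ks_low, ks_high = 0.7, 0.82
age_low_mya, age_high_mya = 62.61, 73.34

# Implied substitution rate, inferred from the lower bound (an assumption;
# the rate itself is not given in the text).
implied_rate = ks_low / (2 * age_low_mya * 1e6)

# Predict the upper age bound from the upper Ks bound using that rate.
predicted_high_mya = divergence_time_mya(ks_high, implied_rate)

print(f"implied rate: {implied_rate:.3e} substitutions per site per year")
print(f"predicted upper bound: {predicted_high_mya:.2f} MYA")
```

The implied rate is about 5.59e-9 substitutions per site per year, and applying it to Ks = 0.82 gives about 73.34 MYA, which matches the reported range.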

Gene family analysis of E. nutans and nine other Triticeae subgenomes identified 102, 105 and 82 gene families unique to E. nutans subgenomes H, St and Y, respectively, and 6147 gene families shared among subgenomes (Figure 1g). Expanded gene families were identified in all three subgenomes (Figure 1e) and were enriched in pathways related to environmental adaptation (Figure S5), for example to the strong UV-B radiation and drought stress of the Tibetan Plateau. We collected and grew five wild lines from different altitudes and locations (Table S8), profiled their transcriptomes under UV-B and drought treatments, and validated the data by qRT-PCR (Tables S9–S11). Weighted gene co-expression network analysis (WGCNA) revealed that the differentially expressed genes (DEGs) under both UV-B (black module) and drought stress (purple module) are highly enriched in glutathione transferase activity (Figure S6). The allohexaploid E. nutans genome harbours 342 glutathione transferase (GST) genes in nine subfamilies, more than other species. The tau and phi subfamilies dominate, and the St and H subgenomes of E. nutans have exceptionally high tau member counts compared with the wheat subgenomes (Figure S7a; Table S12). Furthermore, five phi and tau subfamily members (EVM0015335, EVM0002076, EVM0134842, EVM0087283 and EVM0141011) responded transcriptionally to both drought and UV-B treatments, and their expression differed significantly between lines (NM037 vs QH009, SC020 vs NM035) (Figure S7b,c). The WRKY transcription factor EVM0129376_WRKY was a hub gene in the networks of both WGCNA modules (Figure S7d,e). These findings suggest that GST members might interact with WRKY (such as EVM0129376) and other transcription factors and participate in responses to drought and UV-B stress (Dixon et al., 2002; Jiang et al., 2017).

In summary, our high-quality assembly of the three subgenomes of the Triticeae forage grass E. nutans provides critical insights into the evolutionary history of this species, and will serve as a valuable resource for future studies on its adaptation to the extreme environmental conditions of the Qinghai-Tibetan Plateau.

This work was supported by the Science & Technology Department of Sichuan Province (Grant No. 2021YFYZ0013-2, 2019YFN0170 and 2023YFSY0012), the Sichuan Provincial Department of Agriculture and Rural Affairs (Grant No. SCCXTD-2025-16), the National Center of Pratacultural Technology Innovation (under preparation) (Grant No. CCPTZX2023W01) and the Sichuan Provincial Forestry and Grassland Administration (Grant No. CXTD2025005).

S.B. conceived the project. W.X. and D.L. provided the financial support and participated in the supervision of the project. D.C., H.L., J.Y., C.Z., M.Y., J.Z., L.Y., W.G., X.L., X.J., Y.L., D.M., Q.W., X.C., J.T., H.Z. and P.L. contributed to plant sample collection, DNA/RNA preparation, library construction and sequencing. X.M., X.Y. and Q.L. assisted with data analysis. S.J. and T.H. performed genome assembly and annotation and comparative genomic analyses. X.H. performed the screening of centromeric repeats. T.H. and M.S. performed transcriptome analysis and analysis of the GST gene family. S.J., D.C. and M.S. wrote and revised the manuscript.

The genome assembly (accession no. GWHFAJN00000000.1) and raw sequencing data generated in this study, comprising ONT data, Illumina data, Iso-seq data, and ChIP-seq data, can be found in the Genome Sequence Archive at the National Genomics Data Center (https://ngdc.cncb.ac.cn/) under BioProject accession number PRJCA028418.


Source journal

Plant Biotechnology Journal (Biology: Biotechnology & Applied Microbiology)

CiteScore: 20.50

Self-citation rate: 2.90%

Articles published: 201

Review time: 1 month

Journal description: Plant Biotechnology Journal aspires to publish original research and insightful reviews of high impact, authored by prominent researchers in applied plant science. The journal places a special emphasis on molecular plant sciences and their practical applications through plant biotechnology. Our goal is to establish a platform for showcasing significant advances in the field, encompassing curiosity-driven studies with potential applications, strategic research in plant biotechnology, scientific analysis of crucial issues for the beneficial utilization of plant sciences, and assessments of the performance of plant biotechnology products in practical applications.