Telomere-to-telomere genome of the desiccation-tolerant desert moss Syntrichia caninervis illuminates Copia-dominant centromeric architecture

IF 10.1 | CAS Tier 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Bei Gao, Jichen Zhao, Xiaoshuang Li, Jianhua Zhang, Melvin J. Oliver, Daoyuan Zhang
Plant Biotechnology Journal · Journal Article · Published 2025-01-22 · DOI: 10.1111/pbi.14549

Abstract

The extremophile desert moss Syntrichia caninervis, from the Gurbantunggut Desert in China, can survive simulated Mars conditions (Li et al., 2024). Syntrichia caninervis has become a research model for plant desiccation tolerance (Oliver et al., 2020). A chromosome-level genome of S. caninervis, derived from gametophytes originating in the Mojave Desert, was sequenced and assembled (Silva et al., 2021), facilitating research on gene function (Li et al., 2023) and on comparative and evolutionary genomics (Zhang et al., 2024). This genome is regarded as an initial version (ScMoj v1). Because ScMoj v1 was assembled from short reads, it suffers from limited continuity, gaps and assembly errors in repetitive regions. Here we generated a high-quality genome for S. caninervis collected from the Gurbantunggut Desert (designated ScGur).

Cultured gametophytes propagated from a single female gametophyte (Figure S1) were used for DNA isolation. The genome was assembled from PacBio High Fidelity (HiFi) and Oxford Nanopore Technologies (ONT) ultra-long reads (Table S1) using hifiasm and NextDenovo. Complete circular genomes of the S. caninervis chloroplast (123 124 bp, Figure S2) and mitochondrion (108 309 bp, Figure S3) were obtained with GetOrganelle. A single circular bacterial genome of 6 933 718 bp (Figure 1a) was also recovered during assembly. It showed high synteny (Figure 1b) with the three genomic contigs of Paenibacillus cellulosilyticus (NCBI accession: GCF_013347265.1), indicating an internal symbiotic bacterium, Paenibacillus sp., within S. caninervis gametophytes.
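Assemblers and GetOrganelle report circular molecules themselves. As a rough illustration of the idea only (not the algorithm those tools use), a contig assembled from a circular molecule often repeats its first bases at its end. The hypothetical helpers below detect such an exact prefix/suffix overlap and trim it; real data usually need an alignment-based overlap that tolerates sequencing errors.

```python
def end_overlap(seq: str, min_len: int = 20, max_len: int = 10_000) -> int:
    """Length of the longest suffix of `seq` that exactly equals its prefix.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    seq = seq.upper()
    longest_possible = min(max_len, len(seq) - 1)
    # Scan from the longest candidate downwards so the first hit is the maximum.
    for k in range(longest_possible, min_len - 1, -1):
        if seq[-k:] == seq[:k]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 20) -> tuple[str, bool]:
    """Remove a duplicated end overlap; report whether the contig looked circular."""
    k = end_overlap(seq, min_len=min_len)
    if k > 0:
        return seq[:-k], True
    return seq, False
```

Exact matching keeps the sketch short; the `min_len` floor avoids calling a contig circular on a chance match of a few bases.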

Figure 1
Complete genome assembly of the desert moss Syntrichia caninervis and its symbiotic bacterium. (a) Complete circular genome of the symbiotic bacterium Paenibacillus sp.; tracks show GC skew (purple and green), GC content (grey) and protein-coding sequences in all six reading frames (blue). (b) Collinearity of the symbiotic bacterial genome with the three contigs of the Paenibacillus cellulosilyticus (strain KACC 14175) genome. (c) Overview and comparison of the ScMoj v1 and ScGur T2T genomes. (d) Overview of genomic synteny among the ScGur T2T, ScMoj v1 and P. patens T2T genomes. (e) Chromosomal sequence synteny and structural variations between the ScMoj v1 and ScGur T2T assemblies. (f) Photographs of desiccated and rehydrated S. caninervis gametophytes, and a Circos plot of genomic features of the T2T genome. Circos tracks show (I) the 13 gapless chromosomes, (II) GC content, (III) gene density, (IV) coverage of repetitive sequences, (V) coverage of Helitron elements, (VI) coverage of Gypsy LTR elements, (VII) coverage of Copia LTR elements, (VIII) centromeric regions identified from CENH3-binding peaks by CUT&Tag and (IX) intra-genomic synteny. Densities show the proportion of each 500-kb window (400-kb sliding step) occupied by each genomic feature. (g, h) Sequence signatures of two exemplar centromeres, on chromosome 1 (g) and chromosome 13 (h), illustrating the Copia-dominant centromeres of S. caninervis. The upper panel shows StainedGlass sequence-identity heatmaps, followed by coverage peaks from two independent CUT&Tag sequencing experiments and the various types of repetitive elements. Copia and Gypsy elements are shown as pink and green rectangles, respectively; intact Copia and Gypsy elements are shaded deep red and dark green, respectively.
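The track densities in Figure 1f are described as the proportion of each 500-kb window (400-kb step) occupied by a feature. A minimal sketch of that calculation, assuming features are half-open (start, end) intervals on one chromosome and that a final, shorter window is kept at the chromosome end (both are assumptions; the plotting pipeline actually used is not described):

```python
def window_density(intervals, chrom_len, window=500_000, step=400_000):
    """Return (win_start, win_end, fraction_covered) for each sliding window."""
    # Merge overlapping features first so no base is counted twice.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    densities = []
    win_start = 0
    while win_start < chrom_len:
        win_end = min(win_start + window, chrom_len)  # last window may be shorter
        covered = 0
        for s, e in merged:
            overlap = min(e, win_end) - max(s, win_start)
            if overlap > 0:
                covered += overlap
        densities.append((win_start, win_end, covered / (win_end - win_start)))
        if win_end == chrom_len:
            break
        win_start += step
    return densities
```

Because the step (400 kb) is shorter than the window (500 kb), consecutive windows overlap by 100 kb, which smooths the plotted tracks.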

Ten sets of initial contigs were assembled with hifiasm and NextDenovo from original and cleaned reads (with organellar and bacterial reads removed) and then polished (Table S1; Figure S4). The 10 sets were evaluated for continuity (N50 length), completeness (BUSCO) and overall base accuracy (Qv). Assembly #10, generated with hifiasm and further polished, was of the highest quality (Figure S4) and was selected as the backbone for HiC scaffolding (Figure S5), which yielded 13 scaffolds. All eight gaps in the scaffolded chromosomes were filled using ONT ultra-long reads or NextDenovo contigs, and read coverage was checked to confirm that each gap was filled correctly. The gapless chromosomes were then polished again with NextPolish using Illumina reads to improve single-nucleotide accuracy.
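N50 is the length L such that contigs of length ≥ L together contain at least half of the assembled bases. The sketch below computes it and then ranks candidate assemblies with a hypothetical rank-sum over N50, BUSCO completeness and Qv. How the three metrics were weighed against each other is not stated, so `pick_best` is an illustrative assumption, not the selection procedure used here.

```python
def n50(lengths):
    """Length L such that contigs >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def pick_best(candidates):
    """Hypothetical selection rule for candidate assemblies.

    candidates maps an assembly name to (n50, busco_percent, qv); higher is
    better for all three. Each metric contributes the candidate's rank
    (0 = best) and the lowest rank sum wins; ties keep input order.
    """
    names = list(candidates)
    rank_sum = {name: 0 for name in names}
    for metric in range(3):
        ordered = sorted(names, key=lambda n: candidates[n][metric], reverse=True)
        for rank, name in enumerate(ordered):
            rank_sum[name] += rank
    return min(names, key=lambda n: rank_sum[n])
```

A rank sum avoids mixing units (bases, percentages and Phred scores), at the cost of ignoring how large the differences between candidates are.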

The final ScGur genome contains 13 gapless chromosomes with a total length of 323.44 Mbp, 31.25 Mbp longer than the ScMoj v1 assembly (292.19 Mbp). Contig N50 increased from 28.46 kbp to 24.41 Mbp. All 13 chromosomes carry the 7-base telomeric signature repeat at both ends (Table S2), indicating a telomere-to-telomere (T2T) assembly. The finalized genome has a Qv of 51.142 (accuracy > 99.999%) (Table S3), substantially higher than that of the ONT-based T2T genome of Physcomitrium patens (Qv = 32.94) (Bi et al., 2024). The LTR assembly index (LAI) of the T2T genome is 18.16 (Figure 1c; Figure S6), and its BUSCO completeness is 98.1%.
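Two of these checks can be made concrete. Qv is a Phred-scaled error rate, so Qv = 51.142 corresponds to roughly 7.7 errors per million bases (accuracy ≈ 99.9992%), consistent with the stated > 99.999%, whereas Qv = 32.94 corresponds to about 5 errors per 10 000 bases (≈ 99.95%). For the T2T check, the sketch assumes the plant-type 7-base repeat TTTAGGG (the text specifies only a "7-base" motif); the copy threshold and end-window size are arbitrary illustrative choices.

```python
import math


# Qv is Phred-scaled: Qv = -10 * log10(per-base error rate).
def qv_to_accuracy(qv: float) -> float:
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    return -10 * math.log10(1.0 - accuracy)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_telomeres(seq: str, motif: str = "TTTAGGG",
                  min_copies: int = 10, end_window: int = 1_000) -> bool:
    """True if both chromosome ends carry at least `min_copies` motif copies.

    Assumed orientation: the left end reads (CCCTAAA)n, the reverse
    complement of the G-rich motif, and the right end reads (TTTAGGG)n.
    """
    seq = seq.upper()
    left = seq[:end_window].count(reverse_complement(motif))
    right = seq[-end_window:].count(motif.upper())
    return left >= min_copies and right >= min_copies
```

Requiring the motif at both ends, in the expected orientation, is what distinguishes a T2T chromosome from one that is complete at only one end.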

Comparison of the two S. caninervis genomes showed that every chromosome is longer in the T2T genome (Table S3). The two assemblies are broadly collinear but differ by inversions and translocations (Figure 1d,e; Table S4). Synteny comparison with the P. patens T2T genome recapitulated the seven ancestral elements (Figure 1d; Figure S7). Chromosome 13 differed substantially between the two assemblies (Figure 1e), yet the HiC interaction heatmap of this chromosome in the T2T genome showed continuous interaction signals with no detectable assembly errors (Figure S8). The observed structural variations could result from the difficulty of assembling short reads through highly repetitive regions in ScMoj v1, or could reflect genuine differences between two ecotypes that evolved in geographic isolation; the latter is supported by a distinct synonymous distance (KS) peak at around 0.005 (Figure S7). The number of protein-coding genes increased from 16 545 to 18 093 in the T2T genome, and gene density alternates with the density of interspersed repetitive elements along the chromosomes (Table S5; Figure 1f, Tracks III and IV). A total of 677 transcription factor (TF) genes were annotated, compared with 542 in ScMoj v1 (Figure 1c; Table S6). Notably, RAV, TCP, BBR-BPC and VOZ TFs, which are absent from ScMoj v1 but present in other mosses, were all annotated in the T2T genome (Table S6; Figure S9). The number of tRNA genes increased from 291 to 314, and the number of identified rRNAs roughly tripled from 59 to 180 in the T2T assembly (Figure 1c). A tandemly repeated region containing 65 rRNAs, which was not fully assembled in ScMoj v1, was observed at the 3′ end of chromosome 6 in the T2T genome (Figure S10).
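The KS peak near 0.005 is read from the distribution of synonymous distances between syntenic gene pairs of the two genomes. Assuming per-pair KS values have already been computed upstream, a crude way to locate the peak is to take the most populated histogram bin. The bin width and upper cut-off below are assumptions, and published analyses often fit a density or mixture model instead.

```python
def ks_peak(ks_values, bin_width=0.001, max_ks=0.05):
    """Centre of the most populated KS histogram bin, or None if no values fall in range."""
    counts = {}
    for ks in ks_values:
        if 0 <= ks < max_ks:  # keep only values in the range of interest
            b = int(ks // bin_width)
            counts[b] = counts.get(b, 0) + 1
    if not counts:
        return None
    # Most populated bin; ties go to the smaller KS value.
    best = max(counts, key=lambda b: (counts[b], -b))
    return (best + 0.5) * bin_width
```

A narrow bin width matters here: with a peak as low as about 0.005, a coarse histogram (for example 0.01-wide bins) would merge it with KS values near zero.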

Antibodies against the centromere-specific histone H3 variant (CENH3) were generated (Figure S11) and used for CUT&Tag sequencing to locate the centromeres (Figure 1f; Figure S12). One centromere was detected on each chromosome, ranging in length from 81.5 kbp to 203.5 kbp (Figure 1f,g; Table S7); the centromeres on chromosomes 5, 6 and 8 are acrocentric or near-telocentric (Figure S12). A 65-bp tandem repeat monomer (M65, Figure S13) was identified with TRASH, and sequence-identity heatmaps were plotted with StainedGlass. Tandem arrays of M65 monomers were scattered throughout the genome (Figure S14), and all 13 centromeres were composed mostly of Copia elements (Figure 1g,h; Figures S15, S16). Similar Copia-dominant centromeres were observed in the genomes of P. patens and Ceratodon purpureus, suggesting that this centromere type may be prevalent in mosses. These retrotransposon-dominant moss centromeres (ca. 80–200 kbp) are smaller than the satellite-rich centromeres of angiosperms, e.g., soybean (0.9–4.1 Mbp) (Zhang et al., 2023) and Forsythia suspensa (0.4–1.4 Mbp) (Cui et al., 2024). Whether these two distinct centromere types (i.e. retrotransposon-dominant and satellite-rich) are associated with differences in mitotic efficiency requires further investigation.
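A sketch of how CUT&Tag peaks can be turned into a centromere interval and a Copia share. It assumes CENH3 peaks have already been called (the peak caller is not specified here), that nearby peaks are merged within a hypothetical 20-kb gap, that the widest merged cluster is taken as the centromere, and that Copia annotations are (start, end) intervals on the same chromosome.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (start, end) intervals whose gap is <= max_gap (overlaps always merge)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def call_centromere(cenh3_peaks, max_gap=20_000):
    """Hypothetical rule: the widest cluster of CENH3 peaks marks the centromere."""
    clusters = merge_intervals(cenh3_peaks, max_gap=max_gap)
    if not clusters:
        return None
    return max(clusters, key=lambda c: c[1] - c[0])


def fraction_covered(region, elements):
    """Fraction of `region` overlapped by repeat annotations, merged to avoid double counting."""
    region_start, region_end = region
    covered = 0
    for start, end in merge_intervals(elements):
        covered += max(0, min(end, region_end) - max(start, region_start))
    return covered / (region_end - region_start)
```

Running `fraction_covered` once with Copia and once with Gypsy annotations on the same interval gives the kind of comparison summarized as "Copia-dominant".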

RNA-Seq data from hydrated, dehydrated and rehydrated samples mapped to the T2T genome with improved alignment rates: on average, about 7.21% more reads mapped to the T2T genome (Figure S17). A total of 55 chlorophyll a/b-binding protein genes (CABs, PF00504) were identified in the T2T genome, compared with 42 in ScMoj v1. Thirty-five (63.6%) CAB genes lie in tandem duplication clusters, and transcripts of most CAB genes increased in abundance after rehydration (Figure S18). Transcripts of late embryogenesis abundant (LEA) protein genes accumulated to higher levels in desiccated tissues (Figure S19). These desiccation tolerance (DT)-related genes provide a valuable resource for elucidating the mechanisms of DT and candidate target genes for molecular breeding.
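Tandem duplication clusters such as those reported for the CAB genes are commonly defined as family members lying next to each other on a chromosome. A minimal sketch, assuming genes are supplied as (chromosome, order index, gene ID, family) tuples and that members separated by at most `max_spacing` positions belong to one cluster (the exact criterion used here is not stated). The percentage helper reproduces the arithmetic behind the reported 35/55 ≈ 63.6%.

```python
from collections import defaultdict


def tandem_clusters(genes, family, max_spacing=1):
    """Group members of `family` lying within `max_spacing` gene positions of each other."""
    by_chrom = defaultdict(list)
    for chrom, index, gene_id, fam in genes:
        if fam == family:
            by_chrom[chrom].append((index, gene_id))

    clusters = []
    for members in by_chrom.values():
        members.sort()  # order by position along the chromosome
        current = [members[0]]
        for prev, cur in zip(members, members[1:]):
            if cur[0] - prev[0] <= max_spacing:
                current.append(cur)
            else:
                if len(current) > 1:  # a single gene is not a tandem cluster
                    clusters.append([g for _, g in current])
                current = [cur]
        if len(current) > 1:
            clusters.append([g for _, g in current])
    return clusters


def percent_clustered(clusters, n_family):
    """Percentage of family members that fall inside any tandem cluster."""
    return 100 * sum(len(c) for c in clusters) / n_family
```

With `max_spacing=1` only directly adjacent genes are joined; raising it tolerates intervening unrelated genes, which generally increases the clustered fraction.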

Our results provide a valuable high-quality T2T genome resource as well as important insights into the genomic architecture of the desiccation-tolerant model moss Syntrichia caninervis.
