Telomere-to-telomere genome of the desiccation-tolerant desert moss Syntrichia caninervis illuminates Copia-dominant centromeric architecture
Bei Gao, Jichen Zhao, Xiaoshuang Li, Jianhua Zhang, Melvin J. Oliver, Daoyuan Zhang
Plant Biotechnology Journal, published 2025-01-22. DOI: 10.1111/pbi.14549
Abstract
The extremophile desert moss Syntrichia caninervis, from the Gurbantunggut Desert in China, was shown to survive simulated Mars conditions (Li et al., 2024). Syntrichia caninervis has become a research model for plant desiccation tolerance (Oliver et al., 2020). The chromosome-level genome of S. caninervis, from gametophytes originating in the Mojave Desert, was sequenced and assembled (Silva et al., 2021), facilitating research on gene function (Li et al., 2023) and comparative and evolutionary genomics (Zhang et al., 2024). This genome is regarded as an initial version (ScMoj v1). Because ScMoj v1 was assembled from short reads, it has limited continuity, contains gaps and has assembly errors in repetitive sequences. Here we generated a high-quality genome for S. caninervis from the Gurbantunggut Desert (designated ScGur).
Cultured gametophytes propagated from a single female gametophyte (Figure S1) were used for DNA isolation. The genome was assembled from PacBio High Fidelity (HiFi) and Oxford Nanopore Technologies (ONT) ultra-long reads (Table S1) using hifiasm and NextDenovo. The complete circular chloroplast (123 124 bp; Figure S2) and mitochondrial (108 309 bp; Figure S3) genomes of S. caninervis were obtained using GetOrganelle. A single circular bacterial genome of 6 933 718 bp (Figure 1a) was also recovered during assembly; it showed high synteny (Figure 1b) with three genomic contigs of Paenibacillus cellulosilyticus (NCBI accession: GCF_013347265.1), indicating the presence of an internal symbiotic bacterium, Paenibacillus sp., within S. caninervis gametophytes.
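The bacterial genome in Figure 1a is drawn with GC content and GC-skew tracks. The source does not say how these tracks were computed; a common convention is GC content = (G + C) / window length and GC skew = (G - C) / (G + C) per window, with the running (cumulative) skew often used to approximate the replication origin and terminus of circular bacterial chromosomes. The Python sketch below assumes a plain DNA string and non-wrapping windows; the window and step sizes are arbitrary parameters, not values from the study.

```python
def gc_windows(seq: str, window: int, step: int) -> list:
    """Return (start, gc_fraction, gc_skew) for each window along a linear sequence."""
    seq = seq.upper()
    rows = []
    # Only full-length windows are used (unless the sequence is shorter than one
    # window). A circular genome would need its end joined to its start to wrap.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_fraction = (g + c) / len(chunk) if chunk else 0.0
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0  # 0 when no G or C
        rows.append((start, gc_fraction, gc_skew))
    return rows


def cumulative_skew(rows: list) -> list:
    """Running sum of the per-window GC skew values."""
    total, out = 0.0, []
    for _, _, skew in rows:
        total += skew
        out.append(total)
    return out
```

For example, `gc_windows(bacterial_seq, 10_000, 10_000)` would give non-overlapping 10-kb windows, where `bacterial_seq` is a hypothetical string holding the assembled sequence.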
Figure 1
Complete genome assembly of the desert moss Syntrichia caninervis and its symbiotic bacterium. (a) Complete circular genome of the symbiotic bacterium Paenibacillus sp.; tracks show GC skew (purple and green), GC content (grey) and protein-coding sequences in all six reading frames (blue). (b) Collinearity of the symbiotic bacterial genome sequence with the three contigs of the Paenibacillus cellulosilyticus (strain KACC 14175) genome. (c) Overview and comparison of the ScMoj v1 and ScGur T2T genomes. (d) Overview of genomic synteny among the ScGur T2T, ScMoj v1 and P. patens T2T genomes. (e) Chromosomal sequence synteny and structural variation between the ScMoj v1 and ScGur T2T genome assemblies. (f) Photographs of desiccated and rehydrated S. caninervis gametophytes and a Circos plot of genomic features of the T2T genome. Circos tracks show (I) the 13 gapless chromosomes, (II) GC content, (III) gene density, (IV) coverage of repetitive sequences, (V) coverage of Helitron elements, (VI) coverage of Gypsy LTR elements, (VII) coverage of Copia LTR elements, (VIII) centromeric regions identified from CENH3-binding peaks using CUT&Tag and (IX) intra-genomic synteny. Densities show the proportion of each 500-kb window (400-kb sliding step) occupied by each genomic feature. (g, h) Sequence signatures of two exemplar centromeres, on chromosome 1 (g) and chromosome 13 (h), illustrating the Copia-dominant centromeres of S. caninervis. The upper panel shows the StainedGlass sequence-identity heatmap, followed by coverage peaks from two independent CUT&Tag experiments and the various types of repetitive elements. Copia and Gypsy elements are shown as pink and green rectangles, respectively; intact Copia and Gypsy elements are shaded deep red and dark green, respectively.
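The density tracks in panel (f) report, for each 500-kb window advanced in 400-kb steps, the fraction of the window covered by a feature. A minimal Python sketch of that calculation follows, assuming features are half-open (start, end) intervals on a single chromosome and that overlapping features should not be double-counted. How the authors handled the shorter final window is not stated; here it is simply truncated at the chromosome end and normalized by its actual length.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def window_coverage(chrom_len: int, features: List[Interval],
                    window: int = 500_000, step: int = 400_000):
    """Return (win_start, win_end, fraction_covered) for sliding windows."""
    merged = merge_intervals(features)
    densities = []
    start = 0
    while start < chrom_len:
        end = min(start + window, chrom_len)  # last window may be shorter
        covered = 0
        for f_start, f_end in merged:
            overlap = min(end, f_end) - max(start, f_start)
            if overlap > 0:
                covered += overlap
        densities.append((start, end, covered / (end - start)))
        if end == chrom_len:
            break
        start += step
    return densities
```

Because consecutive windows overlap by 100 kb, a single feature can contribute to two adjacent windows, which smooths the plotted track.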
Ten sets of initial contigs were assembled with hifiasm and NextDenovo from original and cleaned reads (with organellar and bacterial reads removed) and then polished (Table S1; Figure S4). The 10 sets were evaluated for continuity (N50 length), completeness (BUSCO) and overall base accuracy (Qv). Assembly #10, generated with hifiasm followed by additional polishing, was of the highest quality (Figure S4) and was selected as the backbone for HiC scaffolding (Figure S5), which yielded 13 scaffolds. All eight gaps in the scaffolded chromosomes were filled using ONT ultra-long reads or NextDenovo contigs, and read coverage was checked to confirm that the gaps had been filled correctly. The gapless chromosomes were then polished again with NextPolish using Illumina reads to improve single-nucleotide accuracy.
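The ten candidate assemblies were compared on N50, BUSCO completeness and Qv. The sketch below computes N50 in the standard way (the contig length at which contigs sorted from longest to shortest first account for half of the total assembly size) and then keeps only candidates that no other candidate beats on all three metrics at once. This Pareto-style filter and the `AssemblyStats` container are hypothetical illustrations; the source does not say how the three metrics were combined to select assembly #10.

```python
from dataclasses import dataclass
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


@dataclass
class AssemblyStats:  # hypothetical container for one candidate assembly
    name: str
    contig_lengths: List[int]
    busco_complete: float  # percentage of complete BUSCOs
    qv: float

    def metrics(self):
        return (n50(self.contig_lengths), self.busco_complete, self.qv)


def dominates(a: AssemblyStats, b: AssemblyStats) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    ma, mb = a.metrics(), b.metrics()
    return all(x >= y for x, y in zip(ma, mb)) and any(x > y for x, y in zip(ma, mb))


def pareto_best(candidates: List[AssemblyStats]) -> List[AssemblyStats]:
    """Candidates that no other candidate dominates."""
    return [a for a in candidates if not any(dominates(b, a) for b in candidates)]
```

A Pareto filter avoids choosing arbitrary weights for the three metrics; if several candidates survive, a further tie-breaking rule would still be needed.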
The final ScGur genome contained 13 gapless chromosomes with a total length of 323.44 Mbp, 31.25 Mbp longer than the ScMoj v1 assembly (292.19 Mbp). The contig N50 increased from 28.46 kbp to 24.41 Mbp. All 13 gapless chromosomes carried the 7-base telomeric signature repeat at both ends (Table S2), indicating a telomere-to-telomere (T2T) assembly. The finalized genome had a Qv of 51.142 (accuracy > 99.999%) (Table S3), substantially higher than that of the ONT-based T2T genome of Physcomitrium patens (Qv = 32.94) (Bi et al., 2024). The LTR assembly index (LAI) of the T2T genome is 18.16 (Figure 1c; Figure S6), and its BUSCO completeness is 98.1%.
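Qv is a Phred-scaled error rate, Qv = -10 log10(e), so e = 10^(-Qv/10). The helpers below convert between the two, showing why Qv 51.142 corresponds to > 99.999% accuracy (e ≈ 7.7 × 10^-6) whereas Qv 32.94 corresponds to roughly 99.95%. The telomere check is a hypothetical sketch: the source only says the repeat is 7 bases long, so the Arabidopsis-type motif TTTAGGG, the 10-kb terminal window and the 20-copy threshold are all assumptions.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy implied by a Phred-scaled quality value."""
    return 1.0 - 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate (must be > 0)."""
    return -10 * math.log10(error_rate)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def has_telomeres(seq: str, motif: str = "TTTAGGG",
                  end_len: int = 10_000, min_copies: int = 20) -> bool:
    """Hypothetical check: enough motif copies (either strand) near both chromosome ends."""
    seq = seq.upper()
    motifs = (motif.upper(), reverse_complement(motif.upper()))

    def copies(region: str) -> int:
        # str.count counts non-overlapping hits, which suits tandem repeats.
        return max(region.count(m) for m in motifs)

    return copies(seq[:end_len]) >= min_copies and copies(seq[-end_len:]) >= min_copies
```

Checking both strands at each end covers the usual layout in which the 5′ end of a chromosome sequence reads CCCTAAA repeats and the 3′ end reads TTTAGGG repeats.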
Comparing the two S. caninervis genomes revealed that all chromosomes were longer in the T2T genome (Table S3) and that, despite overall chromosomal collinearity, the two assemblies differed by inversions and translocations (Figure 1d,e; Table S4). Genomic synteny comparison with the P. patens T2T genome recapitulated the seven ancestral elements (Figure 1d; Figure S7). A comparison of chromosome 13 between the two genomes revealed substantial variation (Figure 1e). The HiC interaction heatmap of this chromosome in the T2T genome showed continuous interaction signals without detectable conformational errors (Figure S8). The observed structural variations could result from difficulties in assembling short reads through highly repetitive regions in ScMoj v1, or could represent differences between two ecotypes that evolved in geographic isolation, as suggested by a noticeable peak in synonymous distance (Ks) at around 0.005 (Figure S7). The number of protein-coding genes increased from 16 545 to 18 093 in the T2T genome, and gene density alternated with the density of interspersed repetitive elements along the chromosomes (Table S5; Tracks III and IV of Figure 1f). A total of 677 transcription factor (TF) genes were annotated, compared with 542 in ScMoj v1 (Figure 1c; Table S6). Notably, genes for the RAV, TCP, BBR-BPC and VOZ transcription factor families, which are absent from the ScMoj v1 genome but present in other mosses, were all annotated in the T2T genome (Table S6; Figure S9). The number of tRNA genes increased from 291 to 314, and the number of identified rRNAs roughly tripled from 59 to 180 in the T2T assembly (Figure 1c). A tandem repetitive region containing 65 rRNAs, which was not fully assembled in ScMoj v1, was observed at the 3′ end of chromosome 6 in the T2T genome (Figure S10).
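The Ks peak near 0.005 summarizes the distribution of synonymous distances between gene pairs from the two assemblies. Assuming per-pair Ks values have already been estimated upstream (for example with the Nei-Gojobori method), one simple way to locate such a peak is to bin the values and report the centre of the most populated bin. The bin width and upper cutoff below are illustrative choices, not parameters from the study.

```python
from typing import Iterable, Optional


def ks_peak(ks_values: Iterable[float], bin_width: float = 0.001,
            max_ks: float = 0.05) -> Optional[float]:
    """Centre of the most populated Ks bin in [0, max_ks), or None if no values fall there."""
    counts = {}
    for ks in ks_values:
        if 0 <= ks < max_ks:
            b = int(ks // bin_width)  # index of the bin containing this Ks value
            counts[b] = counts.get(b, 0) + 1
    if not counts:
        return None
    # Highest count wins; ties go to the lower-Ks bin.
    best = max(counts, key=lambda b: (counts[b], -b))
    return (best + 0.5) * bin_width
```

For very recent divergence such as this, the result is sensitive to bin width, so in practice one would check that the peak is stable across a few bin sizes.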
Antibodies against the centromere-specific histone H3 variant (CENH3) were generated (Figure S11) and used for CUT&Tag sequencing to locate the centromeres (Figure 1f; Figure S12). One centromere was detected on each chromosome, with lengths ranging from 81.5 kbp to 203.5 kbp (Figure 1f,g; Table S7), including three acrocentric or nearly telocentric centromeres on chromosomes 5, 6 and 8 (Figure S12). Monomers of a 65-bp tandem repeat (M65; Figure S13) were identified using TRASH, and sequence-identity heatmaps were plotted using StainedGlass. Tandem arrays of M65 monomers were scattered throughout the genome (Figure S14), and all 13 centromeres were composed mostly of Copia elements (Figure 1g,h; Figures S15, S16). Similar Copia-dominant centromeres were observed in the genomes of P. patens and Ceratodon purpureus, suggesting that they may be prevalent in mosses. These retrotransposon-dominant moss centromeres (ca. 80–200 kbp) are smaller than the satellite-rich centromeres of angiosperms, e.g., soybean (0.9–4.1 Mbp) (Zhang et al., 2023) and Forsythia suspensa (0.4–1.4 Mbp) (Cui et al., 2024). Whether these two distinct types of centromere (retrotransposon-dominant and satellite-rich) are associated with differences in mitotic efficiency requires further investigation.
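The Copia-dominance claim rests on how much of each CENH3-defined centromere is covered by each repeat class. In the hedged Python sketch below, CENH3 CUT&Tag peaks (hypothetical half-open intervals) lying within a chosen gap of each other are merged, the longest cluster is taken as the centromere, and the fraction of that interval covered by each repeat class is reported, with overlapping annotations of the same class merged so bases are counted once. The 20-kb merge gap and the longest-cluster rule are assumptions; the source does not describe its peak-to-centromere procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge half-open intervals that overlap or lie within max_gap of each other."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def call_centromere(peaks: List[Interval], max_gap: int = 20_000) -> Optional[Interval]:
    """Hypothetical rule: cluster nearby CENH3 peaks and keep the longest cluster."""
    clusters = merge_intervals(peaks, max_gap)
    return max(clusters, key=lambda iv: iv[1] - iv[0]) if clusters else None


def repeat_composition(region: Interval,
                       repeats: List[Tuple[int, int, str]]) -> Dict[str, float]:
    """Fraction of region covered by each repeat class, counting each base once per class."""
    r_start, r_end = region
    by_class = defaultdict(list)
    for start, end, rclass in repeats:
        s, e = max(start, r_start), min(end, r_end)  # clip the element to the region
        if s < e:
            by_class[rclass].append((s, e))
    length = r_end - r_start
    return {rclass: sum(e - s for s, e in merge_intervals(ivs)) / length
            for rclass, ivs in by_class.items()}
```

A centromere would be called Copia-dominant under this sketch when the "Copia" fraction exceeds that of every other class.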
Transcriptomic data from hydrated, dehydrated and rehydrated samples mapped to the T2T genome with improved alignment rates: on average, about 7.21% more RNA-Seq reads mapped to the T2T genome (Figure S17). A total of 55 chlorophyll a/b-binding protein genes (CABs; Pfam PF00504) were identified in the T2T genome, compared with 42 in the ScMoj v1 genome. Thirty-five (63.6%) CAB genes were in tandem duplication clusters, and transcripts of most CAB genes increased in abundance following rehydration (Figure S18). Transcripts of late embryogenesis abundant (LEA) protein genes accumulated to higher levels in desiccated tissues (Figure S19). Analyses of these desiccation tolerance (DT)-related genes will provide a valuable resource for elucidating DT and for identifying target genes for molecular breeding.
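Whether a CAB gene counts as tandemly duplicated depends on a clustering rule that the text does not specify. A common definition groups members of the same gene family that lie on the same chromosome with at most a few unrelated genes between them; the Python sketch below implements that rule, with `max_intervening=1` as a hypothetical threshold and each gene's position given as its rank along its chromosome.

```python
from collections import defaultdict
from typing import List, Tuple


def tandem_clusters(members: List[Tuple[str, str, int]],
                    max_intervening: int = 1) -> List[List[str]]:
    """members: (gene_id, chrom, rank). Returns clusters of >= 2 family genes."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, rank in members:
        by_chrom[chrom].append((rank, gene_id))
    clusters = []
    for chrom in sorted(by_chrom):
        genes = sorted(by_chrom[chrom])
        current = [genes[0]]
        for prev, cur in zip(genes, genes[1:]):
            # Number of non-family genes lying between two consecutive members.
            if cur[0] - prev[0] - 1 <= max_intervening:
                current.append(cur)
            else:
                if len(current) > 1:
                    clusters.append([g for _, g in current])
                current = [cur]
        if len(current) > 1:
            clusters.append([g for _, g in current])
    return clusters


def fraction_in_tandem(members: List[Tuple[str, str, int]],
                       max_intervening: int = 1) -> float:
    """Share of family members that fall in any tandem cluster."""
    clustered = sum(len(c) for c in tandem_clusters(members, max_intervening))
    return clustered / len(members) if members else 0.0
```

With the counts given in the text, 35 clustered genes out of 55 gives 35 / 55 ≈ 0.636, the reported 63.6%.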
Our results provide a valuable high-quality T2T genome resource as well as important insights into the genomic architecture of the desiccation-tolerant model moss Syntrichia caninervis.
About the journal:
Plant Biotechnology Journal aspires to publish original research and insightful reviews of high impact, authored by prominent researchers in applied plant science. The journal places a special emphasis on molecular plant sciences and their practical applications through plant biotechnology. Our goal is to establish a platform for showcasing significant advances in the field, encompassing curiosity-driven studies with potential applications, strategic research in plant biotechnology, scientific analysis of crucial issues for the beneficial utilization of plant sciences, and assessments of the performance of plant biotechnology products in practical applications.