DinoSource: A comprehensive database of dinoflagellate genomic resources
Authors: Fuming Lai, Chongping Li, Yidong Zhang, Ying Li, Yuci Wang, Qiangwei Zhou, Yaping Fang, Hao Chen, Guoliang Li
Journal: Plant Biotechnology Journal (impact factor 10.1; JCR Q1, Biotechnology & Applied Microbiology)
Published: 2025-04-11
DOI: 10.1111/pbi.70054 (https://doi.org/10.1111/pbi.70054)
Citation count: 0
Abstract
Dinoflagellates are a taxonomically diverse and ecologically significant group of phytoplankton. They are also infamous for their involvement in harmful algal blooms, which have significant ecological and economic impacts. In recent years, substantial advances have been made in the analysis of dinoflagellate genomes, including sequencing, assembly and gene annotation, alongside the accumulation of extensive multi-omics data (González-Pech et al., 2021). Despite these developments, the large size and complexity of dinoflagellate genomes present ongoing challenges. Current resources, such as SAGER, primarily focus on genomic and transcriptomic data sets for Symbiodiniaceae (Yu et al., 2020).
In this study, we have developed the first high-precision and comprehensive genome resource database for dinoflagellates, DinoSource (http://glab.hzau.edu.cn/dinosource), which provides 21 genome assemblies for all 20 currently sequenced dinoflagellate species (including two strains of Polarella glacialis) (Table S1). Our database integrates 703 omics samples, which have been generated from our experiments as well as collected from public repositories such as GEO (Gene Expression Omnibus) and SRA (Sequence Read Archive) up to the present date (Figure 1a). The sources and species distribution of the data sets are detailed in the ‘Data’ page of DinoSource (Figures 1b and S1a).
Figure 1
Architecture and screenshots of the DinoSource database. (a) Data collection and sources. (b) Distribution of omics data across species. (c) DinoSource's web implementation includes three core modules: Search Modules, Genome Browser and Analysis Modules. (d) The boxplot displays expression profiles of a subset of genes associated with ko:K02634 across different treatments in Breviolum minutum. (e) Gene differential expression analysis and functional enrichment analysis tools. (f) The stacked bar plot illustrates the proportion of the three 5mC contexts at varying methylation levels in B. minutum. (g) HiGlass visualizes the Hi-C interaction matrices for Symbiodinium microadriaticum (GSM5023543) in the region chr19:800 K–10 MB. The blue triangular box highlights the identified TAD. (h) An example of using comparative genomics tools in DinoSource. The left panel shows a syntenic block located between Fugacium kawagutii and S. microadriaticum. The middle panel presents a phylogenetic tree illustrating the relationship between the Fkaw0003 gene in F. kawagutii and the Smic26481 gene in S. microadriaticum, both located within the syntenic block. The right panel displays BLAST results, indicating a high level of sequence similarity between the proteins encoded by Fkaw0003 and Smic26481.
To ensure data comprehensiveness and accuracy, we subjected all collected data to rigorous processing and standardization. We obtained the raw data for all data sets, including DNA 5hmU immunoprecipitation sequencing (5hmU DIP-seq), N1-methyladenosine RNA immunoprecipitation sequencing (m1A RIP-seq), bisulfite sequencing (BS-seq), high-throughput chromosome conformation capture sequencing (Hi-C), assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), RNA sequencing (RNA-seq) and ribosome profiling (Ribo-seq). We then processed these data sets using standardized pipelines tailored to each data type (Figure S1b) and visualized them using the WashU Epigenome Browser and the HiGlass browser. Additionally, the ‘Quality Control’ page reports quality control metrics for each data type so that users can assess data integrity (Figure S2).
The core modules of DinoSource are organized into three main sections: ‘Search Modules’, ‘Genome Browser’ and ‘Analysis Modules’ (Figure 1c). For convenience, the homepage offers a quick search engine that helps users swiftly retrieve omics results for genes of interest (Figure S3).
DinoSource offers comprehensive gene prediction and annotation functionalities that let users explore gene functions and genomic characteristics on the ‘GeneCard’ page (Table S2). Amphidinium carterae genes can also be retrieved using commonly known gene symbols based on our annotation. For example, entering the gene symbol LHCP into DinoSource returns all associated genes in A. carterae, including their basic gene information, annotations, DNA, mRNA and protein sequences, and any repetitive elements within the gene region (Figure S4a).
DinoSource collects and processes high-throughput transcriptomic data across various treatment conditions, standardizing expression levels using transcripts per million (TPM) for comparative purposes. It features a user-friendly ‘Transcriptome’ page that allows users to retrieve data by gene ID or specific GO terms and KEGG categories to explore gene expression profiles associated with particular pathways. DinoSource displays comparative expression levels across different samples and treatment groups (Figures S4b and 1d). Furthermore, we provide bioinformatics tools in analysis modules for differential gene expression analysis and enrichment analysis (Figure 1e) and use WGCNA to construct co-expression networks (Figure S4c).
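The TPM standardization mentioned above can be illustrated with a minimal sketch. It assumes that raw read counts and transcript lengths are already available per gene. The function name `tpm`, the gene names and the numbers are hypothetical, and the pipeline actually used by DinoSource (Figure S1b) may differ in its details.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- dict mapping gene ID to raw read count
    lengths_bp -- dict mapping gene ID to transcript length in base pairs
    """
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths_bp must cover the same genes")
    # Step 1: normalize each count by transcript length in kilobases (RPK).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so that the values in one sample sum to one million.
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Hypothetical toy sample: geneB has three times the reads of geneA but is
# three times longer, so both end up with the same TPM.
example_counts = {"geneA": 100, "geneB": 300, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}
example_tpm = tpm(example_counts, example_lengths)
```

Because TPM values sum to the same total in every sample, the relative abundance of a gene can be compared across samples and treatment groups, which is the purpose stated above.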
To investigate whether nitrogen availability affects translation efficiency in A. carterae, we generated Ribo-seq data under both nitrogen starvation and normal conditions. These results are available on the ‘Translation’ page of DinoSource, where we found that the translation efficiency of photosynthesis-related genes significantly decreased under nitrogen starvation (Figure S3d).
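The text does not state how translation efficiency was computed. A common definition, assumed here, is the ratio of ribosome footprint abundance (Ribo-seq) to mRNA abundance (RNA-seq) for the same gene. The sketch below uses that assumed definition; the function names, the pseudocount and the example values are all hypothetical and are not taken from the DinoSource data.

```python
import math


def translation_efficiency(ribo_tpm, rna_tpm, pseudocount=1.0):
    """Assumed definition: Ribo-seq abundance divided by RNA-seq abundance.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return (ribo_tpm + pseudocount) / (rna_tpm + pseudocount)


def log2_te_change(ribo_treat, rna_treat, ribo_ctrl, rna_ctrl, pseudocount=1.0):
    """log2 ratio of TE in the treatment relative to the control condition."""
    te_treat = translation_efficiency(ribo_treat, rna_treat, pseudocount)
    te_ctrl = translation_efficiency(ribo_ctrl, rna_ctrl, pseudocount)
    return math.log2(te_treat / te_ctrl)


# Made-up values for one gene: mRNA abundance is unchanged under nitrogen
# starvation, but ribosome footprints drop, so TE decreases.
example_change = log2_te_change(
    ribo_treat=9.0, rna_treat=39.0,  # nitrogen starvation (hypothetical)
    ribo_ctrl=39.0, rna_ctrl=39.0,   # normal conditions (hypothetical)
)
```

A negative value indicates lower translation efficiency under the treatment. In this toy case the result is -2, a fourfold decrease.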
The ‘Chromatin Accessibility’ page in DinoSource maps genome-wide open chromatin regions in dinoflagellates, emphasizing their role in gene regulation and interactions with trans-acting factors. Users can investigate the distribution of chromatin accessibility peaks by entering specific genes or genomic regions. Consistent with recent findings (Marinov et al., 2024), we observe that open chromatin signals preferentially appear in non-repetitive regions (Figure S4f).
Unlike higher eukaryotic plants, dinoflagellates possess highly distinctive patterns of nucleotide modifications. DinoSource showcases the genomewide distribution of nucleotide modifications in dinoflagellates. In the ‘Nucleotide Modification’ page, users can browse 5hmU and m1A distributions under various conditions and observe that 5hmU co-localizes with repetitive sequences (Figure S4d), consistent with previous reports (Marinov et al., 2024). The ‘DNA Methylation’ page offers single-base resolution methylation levels for all samples, focusing on gene body regions and transcription start sites (Figure S4e). The data cover all three methylation contexts (CG, CHG and CHH), with most genomewide methylation occurring at CG dinucleotides when levels exceed 0.3 (Figure 1f), consistent with previous reports (de Mendoza et al., 2018). The WashU Browser provides an intuitive platform for users to visualize differentially methylated regions across multiple data sets (Figure S5a).
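The three methylation contexts follow the standard convention in which H stands for A, C or T. As a minimal sketch, the code below classifies forward-strand cytosines by context and computes the share of each context among sites above a methylation-level threshold, in the spirit of Figure 1f. The function names, the toy sequence and the methylation levels are hypothetical, and a real analysis would also handle the reverse strand.

```python
from collections import Counter


def cytosine_context(seq, pos):
    """Classify the forward-strand cytosine at 0-based index pos.

    Returns "CG", "CHG" or "CHH", or None when the context cannot be
    determined (sequence end or ambiguous bases).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position %d is not a cytosine" % pos)
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or any(base not in "ACGT" for base in downstream):
        return None
    # The first downstream base is A, C or T here (H); check the second.
    return "CHG" if downstream[1] == "G" else "CHH"


def context_proportions(seq, levels, min_level=0.3):
    """Share of each context among cytosines with methylation above min_level.

    levels -- dict mapping 0-based cytosine position to methylation level (0-1)
    """
    counts = Counter()
    for pos, level in levels.items():
        if level > min_level:
            context = cytosine_context(seq, pos)
            if context is not None:
                counts[context] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {context: counts[context] / total for context in ("CG", "CHG", "CHH")}


# Hypothetical toy data: cytosines at indices 1 (CG), 4 (CHG), 7 (CHH), 10 (CG).
example_seq = "ACGTCAGCTTCG"
example_levels = {1: 0.9, 4: 0.1, 7: 0.5, 10: 0.8}
example_props = context_proportions(example_seq, example_levels)
```

In this toy example, two of the three sites above the 0.3 threshold are in the CG context.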
Dinoflagellates exhibit distinct three-dimensional (3D) genomic features due to their unique chromosomal organization, which sets them apart from typical eukaryotes (Nand et al., 2021). To facilitate the characterization of the dinoflagellate 3D genome, DinoSource has curated and processed Hi-C data sets. Interaction matrices are used to visualize heatmaps in HiGlass and to reconstruct 3D structures (Figure S5b). Consistent with previous findings (Nand et al., 2021), no evidence of chromatin compartmentalization or locus-specific point-to-point loop interactions was detected in DinoSource. However, TADs were observed despite the rigid chromosomal structure in dinoflagellates (Figure 1g), with strong 5hmU signals at TAD boundaries (Figure S5c), aligning with previous reports (Marinov et al., 2024).
Despite belonging to the same phylum, the dinoflagellates collected in DinoSource exhibit remarkable diversity in genome size. To facilitate comparative genomic studies and reveal evolutionary patterns, DinoSource allows users to explore collinear genes between any regions of selected genomes on the ‘Genome Synteny’ page (Figure 1h, Genome Synteny part). Furthermore, on the ‘Homologue’ page of the analysis modules, users can select a gene from any dinoflagellate species to retrieve homologues in other dinoflagellates (Figure 1h, Homologue part). This page also presents the phylogenetic tree of the collected dinoflagellate species (Figure S5d). DinoSource also offers a BLAST tool, enabling users to infer the function, structure and evolutionary history of sequences (Figure 1h, Blast Tool part).
DinoSource provides a comprehensive genomic, multi-omics and functional resource for dinoflagellate research. In the future, we plan to expand DinoSource by incorporating more dinoflagellate species, integrating diverse omics data types and developing innovative analytical tools to further support advancements in dinoflagellate biology research.
About the journal:
Plant Biotechnology Journal aspires to publish original research and insightful reviews of high impact, authored by prominent researchers in applied plant science. The journal places a special emphasis on molecular plant sciences and their practical applications through plant biotechnology. Our goal is to establish a platform for showcasing significant advances in the field, encompassing curiosity-driven studies with potential applications, strategic research in plant biotechnology, scientific analysis of crucial issues for the beneficial utilization of plant sciences, and assessments of the performance of plant biotechnology products in practical applications.