DinoSource: A comprehensive database of dinoflagellate genomic resources

IF 10.1 · Zone 1 (Biology) · Q1 Biotechnology & Applied Microbiology
Fuming Lai, Chongping Li, Yidong Zhang, Ying Li, Yuci Wang, Qiangwei Zhou, Yaping Fang, Hao Chen, Guoliang Li
Plant Biotechnology Journal. Published 2025-04-11. DOI: 10.1111/pbi.70054 (https://doi.org/10.1111/pbi.70054)

Abstract

Dinoflagellates are a taxonomically diverse and ecologically significant group of phytoplankton. They are also infamous for their involvement in harmful algal blooms, which have significant ecological and economic impacts. In recent years, substantial advances have been made in the analysis of dinoflagellate genomes, including sequencing, assembly and gene annotation, alongside the accumulation of extensive multi-omics data (González-Pech et al., 2021). Despite these developments, the large size and complexity of dinoflagellate genomes present ongoing challenges. Current resources, such as SAGER, primarily focus on genomic and transcriptomic data sets for Symbiodiniaceae (Yu et al., 2020).

In this study, we developed DinoSource (http://glab.hzau.edu.cn/dinosource), the first high-precision and comprehensive genome resource database for dinoflagellates. It provides 21 genome assemblies covering all 20 currently sequenced dinoflagellate species (including two strains of Polarella glacialis) (Table S1). The database integrates 703 omics samples, either generated in our own experiments or collected from public repositories such as GEO (Gene Expression Omnibus) and SRA (Sequence Read Archive) (Figure 1a). The sources and species distribution of the data sets are detailed on the ‘Data’ page of DinoSource (Figures 1b and S1a).

Figure 1. Architecture and screenshots of the DinoSource database. (a) Data collection and sources. (b) Distribution of omics data across species. (c) DinoSource's web implementation includes three core modules. (d) The boxplot displays expression profiles of a subset of genes associated with ko: K02634 across different treatments in Breviolum minutum. (e) Gene differential expression analysis and functional enrichment analysis tools. (f) The stacked bar plot illustrates the proportion of the three 5mC contexts at varying methylation levels in B. minutum. (g) HiGlass visualizes the Hi-C interaction matrices for Symbiodinium microadriaticum (GSM5023543) in the region chr19:800 kb–10 Mb. The blue triangular box highlights the identified TAD. (h) An example of using comparative genomics tools in DinoSource. The left panel shows a syntenic block located between Fugacium kawagutii and S. microadriaticum. The middle panel presents a phylogenetic tree illustrating the relationship between the Fkaw0003 gene in F. kawagutii and the Smic26481 gene in S. microadriaticum, both located within the syntenic block. The right panel displays BLAST results, indicating a high level of sequence similarity between the proteins encoded by Fkaw0003 and Smic26481.

To ensure data comprehensiveness and accuracy, we subjected all collected data to rigorous processing and standardization. We obtained the raw data for all data sets, including DNA 5hmU immunoprecipitation sequencing (5hmU DIP-seq), N1-methyladenosine RNA immunoprecipitation sequencing (m1A RIP-seq), bisulfite sequencing (BS-seq), high-throughput chromosome conformation capture sequencing (Hi-C), assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), RNA sequencing (RNA-seq) and ribosome profiling (Ribo-seq). We then processed these data sets using standardized pipelines tailored to each data type (Figure S1b) and visualized them using the WashU Epigenome Browser and the HiGlass browser. In addition, the ‘Quality Control’ page reports quality control metrics for each data type so that users can assess data integrity (Figure S2).

The core modules of DinoSource fall into three main sections: ‘Search Modules’, ‘Genome Browser’ and ‘Analysis Modules’ (Figure 1c). The homepage also offers a quick search engine that helps users swiftly retrieve omics results for genes of interest (Figure S3).

DinoSource offers comprehensive gene prediction and annotation functionalities on the ‘GeneCard’ page, where users can explore gene functions and genomic characteristics (Table S2). Amphidinium carterae genes can also be retrieved using commonly known gene symbols based on our annotation. For example, entering the gene symbol LHCP into DinoSource returns all associated genes in A. carterae, including their basic gene information, annotations, DNA, mRNA and protein sequences, and any repetitive elements in the gene region (Figure S4a).

DinoSource collects and processes high-throughput transcriptomic data across various treatment conditions, standardizing expression levels as transcripts per million (TPM) so that samples can be compared. Its ‘Transcriptome’ page allows users to retrieve data by gene ID or by specific GO terms and KEGG categories to explore gene expression profiles associated with particular pathways. DinoSource displays comparative expression levels across different samples and treatment groups (Figures S4b and 1d). The analysis modules also provide bioinformatics tools for differential gene expression analysis and enrichment analysis (Figure 1e) and use WGCNA to construct co-expression networks (Figure S4c).
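
As an illustration of the TPM standardization mentioned above, the following is a minimal sketch of the standard TPM calculation. It is not DinoSource's actual pipeline; the function name `tpm`, the gene names and the counts are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene ID (hypothetical inputs).
    """
    # Step 1: reads per kilobase, i.e. count divided by gene length in kb.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: per-sample scaling factor so that all values sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Toy example with made-up counts and lengths.
example = tpm(
    {"geneA": 100, "geneB": 300, "geneC": 600},
    {"geneA": 1000, "geneB": 3000, "geneC": 2000},
)
for gene, value in example.items():
    print(gene, round(value, 1))
```

Because TPM values sum to the same total in every sample, a gene's share of the transcript pool can be compared across samples; geneA and geneB above have different raw counts but the same TPM once length is accounted for.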

To investigate whether nitrogen availability affects translation efficiency in A. carterae, we generated Ribo-seq data under both nitrogen starvation and normal conditions. These results are available on the ‘Translation’ page of DinoSource, where we found that the translation efficiency of photosynthesis-related genes significantly decreased under nitrogen starvation (Figure S3d).
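
The text does not state how translation efficiency was computed. A common definition is the ratio of ribosome-footprint abundance to mRNA abundance for the same gene, and the sketch below assumes that definition. The function names, the pseudocount and the TPM values are all hypothetical.

```python
import math


def translation_efficiency(ribo_tpm, rna_tpm, pseudocount=1.0):
    """Ratio of Ribo-seq to RNA-seq abundance (assumed definition of TE).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return (ribo_tpm + pseudocount) / (rna_tpm + pseudocount)


def log2_te_change(ribo_ctrl, rna_ctrl, ribo_treat, rna_treat, pseudocount=1.0):
    """log2 fold change in TE between a treatment and a control condition."""
    te_ctrl = translation_efficiency(ribo_ctrl, rna_ctrl, pseudocount)
    te_treat = translation_efficiency(ribo_treat, rna_treat, pseudocount)
    return math.log2(te_treat / te_ctrl)


# Made-up values: mRNA level unchanged, ribosome occupancy drops under starvation.
change = log2_te_change(ribo_ctrl=79, rna_ctrl=39, ribo_treat=19, rna_treat=39)
print(change)  # negative values mean lower TE in the treatment
```

Dividing by the mRNA level separates a change in translation from a change in transcription: in the toy values, the transcript abundance is identical in both conditions, so the whole difference is attributed to translation.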

The ‘Chromatin Accessibility’ page in DinoSource maps genomewide open chromatin regions in dinoflagellates, emphasizing their role in gene regulation and interactions with trans-acting factors. Users can investigate the distribution of chromatin accessibility peaks by entering specific genes or genomic regions. Consistent with recent findings (Marinov et al., 2024), we observe that open chromatin signals occur preferentially in non-repetitive regions (Figure S4f).
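
One simple way to quantify the relationship between accessibility peaks and repeats is to count the fraction of peaks that overlap any annotated repetitive element. The sketch below assumes half-open `(start, end)` intervals on a single chromosome; the function names and coordinates are hypothetical, and it is not necessarily how the figure in the paper was produced.

```python
def overlaps_any(peak, repeats):
    """True if the half-open interval peak=(start, end) overlaps any repeat."""
    start, end = peak
    return any(start < r_end and r_start < end for r_start, r_end in repeats)


def fraction_in_repeats(peaks, repeats):
    """Fraction of peaks overlapping at least one repetitive element."""
    if not peaks:
        return 0.0
    hits = sum(overlaps_any(peak, repeats) for peak in peaks)
    return hits / len(peaks)


# Made-up peaks and repeats on one chromosome.
peaks = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
repeats = [(150, 300), (1000, 1200)]
print(fraction_in_repeats(peaks, repeats))
```

The linear scan is adequate for a toy example; for genome-scale peak sets, a sorted-interval sweep or a dedicated interval tool would be the more practical choice.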

Unlike higher eukaryotic plants, dinoflagellates possess highly distinctive patterns of nucleotide modifications, and DinoSource showcases their genomewide distribution. On the ‘Nucleotide Modification’ page, users can browse 5hmU and m1A distributions under various conditions and observe that 5hmU co-localizes with repetitive sequences (Figure S4d), consistent with previous reports (Marinov et al., 2024). The ‘DNA Methylation’ page offers single-base resolution methylation levels for all samples, focusing on gene body regions and transcription start sites (Figure S4e). The data cover all three methylation contexts (CG, CHG and CHH), with most genomewide methylation occurring at CG dinucleotides when levels exceed 0.3 (Figure 1f), consistent with previous reports (de Mendoza et al., 2018). The WashU Browser provides an intuitive platform for users to visualize differentially methylated regions across multiple data sets (Figure S5a).
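
The three methylation contexts are defined by the two bases that follow a cytosine, where H stands for A, C or T. A minimal sketch of that classification for forward-strand cytosines follows; the function name `cytosine_context` and the example sequence are hypothetical.

```python
def cytosine_context(seq, pos):
    """Classify the forward-strand cytosine at 0-based pos as CG, CHG or CHH.

    Returns None if the base is not C, or if the context cannot be
    determined (sequence end or ambiguous base N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    following = seq[pos + 1:pos + 3]
    # CG only needs the next base.
    if following[:1] == "G":
        return "CG"
    # CHG and CHH need two unambiguous bases after the cytosine.
    if len(following) < 2 or "N" in following:
        return None
    return "CHG" if following[1] == "G" else "CHH"


sequence = "ACGTCAGCTT"
for position, base in enumerate(sequence):
    context = cytosine_context(sequence, position)
    if context is not None:
        print(position, base, context)
```

Cytosines on the reverse strand appear as G on the forward strand, so a full classification would apply the same function to the reverse complement as well.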

Dinoflagellates exhibit distinct three-dimensional (3D) genomic features due to their unique chromosomal organization, which sets them apart from typical eukaryotes (Nand et al., 2021). To facilitate the characterization of the dinoflagellate 3D genome, DinoSource has curated and processed Hi-C data sets. Interaction matrices are used to visualize heatmaps in HiGlass and to reconstruct 3D structures (Figure S5b). Consistent with previous findings (Nand et al., 2021), no evidence of chromatin compartmentalization or locus-specific point-to-point loop interactions was detected in DinoSource. However, TADs were observed despite the rigid chromosomal structure in dinoflagellates (Figure 1g), with strong 5hmU signals at TAD boundaries (Figure S5c), aligning with previous reports (Marinov et al., 2024).
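
The text does not say how TADs were identified. One widely used approach is the insulation score, which measures how many contacts cross each bin boundary; local minima mark candidate TAD boundaries. The sketch below illustrates that technique on a toy matrix and is an assumption, not the method used in DinoSource. The function names and the matrix are hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency crossing the boundary just before each bin.

    For bin i, averages contacts between the `window` bins upstream
    (i - window .. i - 1) and the `window` bins downstream (i .. i + window - 1).
    Bins too close to the matrix edge get None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
        scores[i] = total / (window * window)
    return scores


def candidate_boundaries(scores):
    """Indices whose score is a strict local minimum."""
    found = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            found.append(i)
    return found


# Toy 8-bin matrix: two domains (bins 0-3 and 4-7) with few contacts between them.
toy = [[10 if (a < 4) == (b < 4) else 1 for b in range(8)] for a in range(8)]
scores = insulation_scores(toy, window=2)
print(scores)
print(candidate_boundaries(scores))
```

In the toy matrix, the score dips where the two domains meet, because few contacts span that position. Real Hi-C matrices are noisy and normalized first, so practical tools smooth the score and apply a minimum-depth threshold before calling boundaries.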

Despite belonging to the same phylum, the dinoflagellates collected in DinoSource exhibit remarkable diversity in genome size. To facilitate comparative genomic studies and reveal evolutionary patterns, DinoSource allows users to explore collinear genes between any regions of selected genomes on the ‘Genome Synteny’ page (Figure 1h, Genome Synteny part). On the ‘Homologue’ page of the analysis modules, users can select a gene from any dinoflagellate species to retrieve homologues in other dinoflagellates (Figure 1h, Homologue part). This page also presents the phylogenetic tree of the collected dinoflagellate species (Figure S5d). DinoSource also offers a BLAST tool, enabling users to infer the function, structure and evolutionary history of sequences (Figure 1h, Blast Tool part).

DinoSource provides a comprehensive genomic, multi-omics and functional resource for dinoflagellate research. In the future, we plan to expand DinoSource by incorporating more dinoflagellate species, integrating diverse omics data types and developing innovative analytical tools to further support advancements in dinoflagellate biology research.

Source journal: Plant Biotechnology Journal (Biology: Bioengineering and Applied Microbiology)
CiteScore: 20.50
Self-citation rate: 2.90%
Articles published: 201
Review time: 1 month
Journal description: Plant Biotechnology Journal aspires to publish original research and insightful reviews of high impact, authored by prominent researchers in applied plant science. The journal places a special emphasis on molecular plant sciences and their practical applications through plant biotechnology. Its goal is to establish a platform for showcasing significant advances in the field, encompassing curiosity-driven studies with potential applications, strategic research in plant biotechnology, scientific analysis of crucial issues for the beneficial utilization of plant sciences, and assessments of the performance of plant biotechnology products in practical applications.