ScDB: A comprehensive database dedicated to Saccharum, facilitating functional genomics and molecular biology studies in sugarcane

IF 10.1 · CAS Zone 1 (Biology) · JCR Q1 (Biotechnology & Applied Microbiology)
Siyuan Chen, Xiaoxi Feng, Zhe Zhang, Xiuting Hua, Qing Zhang, Chengjie Chen, Jiawei Li, Xiaojing Liu, Chenyu Weng, Baoshan Chen, Muqing Zhang, Wei Yao, Haibao Tang, Ray Ming, Jisen Zhang
Plant Biotechnology Journal · Published 2024-08-30 · DOI: 10.1111/pbi.14457 (https://doi.org/10.1111/pbi.14457)
Citation count: 0

Abstract

Sugarcane is one of the world's most important sugar crops, serving as the primary feedstock for the production of sugar and biofuels. Modern sugarcane cultivars result from deliberate interspecific hybridization between Saccharum officinarum and Saccharum spontaneum. The utilization of wild resources is essential for the development of high-quality sugarcane varieties, and genomic and other omics analyses of these materials provide valuable insights into their molecular mechanisms. However, the complexity of the sugarcane genome has historically presented challenges for researchers. In our previous studies, we led the effort to assemble the genome of a haploid S. spontaneum, AP85-441 (Zhang et al., 2018), and pioneered an approach to tackling a complex autopolyploid at allele-level resolution. We then traced the origins of Saccharum and mapped chromosomal evolution in S. spontaneum Np-X (Zhang et al., 2022). Additionally, we assembled a complete, gap-free genome of the diploid Erianthus rufipilus YN2009-3, shedding light on the genomic footprints of evolution in the highly polyploid Saccharum (Wang et al., 2023). Meanwhile, we are proud to present the genome of the Saccharum hybrid XTT22, considered the most significant achievement in sugarcane research; this work has been accepted and will soon be online (Zhang et al., Nature Genetics). Other teams have also contributed to sugarcane genome research. This year, the genomes of the modern sugarcane cultivars R570 and ZZ1 were published by A. D'Hont's team and Muqing Zhang's team, respectively (Bao et al., 2024; Healey et al., 2024).

Building upon this foundation, we are pleased to introduce ScDB (Saccharum genomic database, https://sugarcane.gxu.edu.cn/scdb), the first user-friendly multi-omics database for six Saccharum species (AP85-441, Np-X, LA-Purple, XTT22, R570, ZZ1) and one Erianthus rufipilus accession (YN2009-3). ScDB currently comprises a total of 38.91 Gb of genomic assembly sequences, encompassing 1 366 608 genes. Additionally, ScDB includes 24 transcriptome projects involving over 300 sugarcane samples and approximately 2.5 TB of data. Furthermore, 12 frequently used online functions have been developed to facilitate the use of ScDB: ‘Gene Search’, ‘Orthologous Gene Search’, ‘Synteny Block’, ‘Genome Browser’, ‘Gene Expression’, ‘Co-expression Network’, ‘Blast’, ‘Primer’, ‘Sequence Fetch’, ‘Transcription Factors’, ‘Protein Interaction Network’ and ‘Profile Inference’ (Figure 1a).

Figure 1. Overview of ScDB and its functions in multi-omics analysis. (a) The phylogenetic relationships and data sources of all existing species in ScDB, the construction process and the modules and tools included. (b) Advanced search of the home page. (c) Part of the gene details page: gene function annotations and expression data for different studies. (d) The gene synteny blocks are obtained using the synteny blocks search function. The results are presented as a synteny diagram and table. (e) Gene expression heatmaps allow users to select different studies and Expression Units (TPM or FPKM) and customize colour schemes. (f) The Profile Inference tool enables users to match known motifs by gene ID, gene name and amino acid sequence, and meme files are provided for download.

ScDB consists of a frontend web interface, a backend application server, a main database and a suite of tools for analysis and visualization. The database is organized into six main modules: ‘Home’, ‘Genomics’, ‘Transcriptomics’, ‘Tools’, ‘Download’ and ‘Publication’. The homepage features an introduction to ScDB, an advanced search engine, descriptions of the Saccharum species and Erianthus rufipilus, and links to various tools. The advanced search function enables users to search by gene ID, gene name, GO number and KEGG number (Figure 1b).

The ‘Genomics’ module includes the functions ‘Genome’, ‘Gene Search’, ‘Synteny Blocks’ and ‘Genome Browser’. The ‘Genome’ page lists the Saccharum species and Erianthus rufipilus that have been sequenced, together with their geographic distribution and evolutionary relationships. Users can view detailed genomic information and images for each variety, as well as structural annotations for each chromosome. In the ‘Gene Search’ feature, users can look up several genes at once using either gene IDs or specific chromosome regions. The ‘Search By Range’ option includes a chromosome selection tool, making navigation easier for users who are less familiar with the genome. The gene details page provides the location of each gene, its functional annotations, expression data from various studies, orthogroup genes, as well as CDS, protein, upstream and downstream sequences (Figure 1c). The ‘Orthologous Gene Search’ module searches for homologous genes and accepts genes from the species included in ScDB as well as from Arabidopsis, rice and sorghum. ‘Synteny Block’ can be used for a quick examination of the evolution and diversity of large homologous gene segments and chromosomes (Figure 1d). The ‘Genome Browser’ tool provides a fast, interactive genome browser for navigating large-scale high-throughput sequencing data within a genomic framework.
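The ‘Search By Range’ lookup described above amounts to an interval-overlap query. The following Python sketch illustrates that idea only; it is not ScDB's implementation. The `Gene` record, the `search_by_range` function, the gene IDs and the 1-based inclusive coordinate convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str  # hypothetical identifier, not a real ScDB gene ID
    chrom: str
    start: int    # assumed 1-based, inclusive
    end: int      # assumed 1-based, inclusive


def search_by_range(genes, chrom, start, end):
    """Return genes on `chrom` that overlap the closed interval [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Two closed intervals overlap when each one starts before the other ends.
    hits = [
        g for g in genes
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]
    return sorted(hits, key=lambda g: (g.start, g.end))


# Toy annotation table standing in for a genome annotation.
GENES = [
    Gene("geneA", "Chr1A", 1_000, 2_500),
    Gene("geneB", "Chr1A", 3_000, 4_200),
    Gene("geneC", "Chr1B", 1_500, 2_000),
]

if __name__ == "__main__":
    for gene in search_by_range(GENES, "Chr1A", 2_400, 3_100):
        print(gene.gene_id, gene.start, gene.end)
```

A linear scan is adequate for a toy table; a real annotation with over a million genes would more plausibly sit behind an indexed database query or an interval tree.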

The ‘Transcriptomics’ module offers search and visualization functions for gene expression (Figure 1e) and co-expression gene networks. In ‘Gene Expression’, users can access expression data for a set of genes. They can select their preferred studies, choose the expression unit (either Transcripts Per Million, TPM, or Fragments Per Kilobase of transcript per Million mapped reads, FPKM), and customize the colour scheme of the heatmap.
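The two expression units differ only in the order of normalization, which a short worked example may make clearer. The sketch below uses the standard textbook definitions of FPKM and TPM; the function names and the toy counts are assumptions for illustration and say nothing about how ScDB's own pipeline computed its values.

```python
def fpkm(counts, lengths_bp):
    """FPKM: fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length in kb) / (total fragments in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: length-normalize first, then rescale so the sample sums to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    if denom == 0:
        raise ValueError("no mapped fragments")
    return [r * 1e6 / denom for r in rates]


if __name__ == "__main__":
    counts = [100, 200, 700]        # toy fragment counts for three genes
    lengths = [1_000, 2_000, 7_000]  # toy transcript lengths in bp
    print("FPKM:", fpkm(counts, lengths))
    print("TPM: ", tpm(counts, lengths))
```

Because TPM values always sum to one million within a sample, they are often easier to compare across samples than FPKM values, which is one reason a heatmap tool might offer both units.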

The ‘Tools’ module includes the functions ‘Blast’, ‘Primer’, ‘Sequence Fetch’, ‘Transcription Factors’, ‘Protein Interaction Network’ and ‘Profile Inference’. The ‘Blast’ tool performs homology searches against different data sets. ‘Primer’ is the primer design tool. ‘Sequence Fetch’ can be used to extract chromosome sequences from a specified region. For ‘Transcription Factors’, we used the iTAK software (Zheng et al., 2016) to identify the transcription factor families and kinase families of the Saccharum species and Erianthus rufipilus. Users can click on the name of any transcription factor family or kinase family to view a list of all genes in that family, and can also search for the family to which a given gene belongs. In ‘Protein Interaction Network’, users can search protein interaction networks for specific genes by gene ID. The results are presented in a table that can be saved as a CSV file and are also visualized as an interactive network diagram that can be saved as an SVG image. In ‘Profile Inference’, users can search for motifs in the JASPAR database by gene ID, gene name or protein sequence, and download MEME-format files that can be used for binding prediction with the upstream sequences obtained from the gene details page (Figure 1f). The ‘Download’ module provides chromosome data and annotations for download.
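Extracting a region from a chromosome sequence, as ‘Sequence Fetch’ does, can be sketched in a few lines of Python. This is a minimal illustration rather than ScDB's code: the function names, the demo sequence, the 1-based inclusive coordinates and the optional reverse-complement strand handling are all assumptions.

```python
import io

# Translation table for complementing DNA, including N for unknown bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Parse FASTA text into {sequence name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fetch_region(seqs, chrom, start, end, strand="+"):
    """Return bases start..end (assumed 1-based, inclusive) of `chrom`."""
    seq = seqs[chrom]
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}-{end} is outside {chrom} (length {len(seq)})")
    sub = seq[start - 1:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]  # reverse complement
    return sub


# Toy 20 bp "chromosome"; a real genome would be read from a FASTA file on disk.
DEMO_FASTA = ">Chr1 demo\nACGTACGTAC\nGGGTTTAAAC\n"
genome = read_fasta(io.StringIO(DEMO_FASTA))

if __name__ == "__main__":
    print(fetch_region(genome, "Chr1", 9, 13))
    print(fetch_region(genome, "Chr1", 9, 13, strand="-"))
```

Loading whole chromosomes into memory is fine for a demonstration; for multi-gigabase assemblies an indexed FASTA reader would be the more practical choice.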

In summary, we present ScDB, which encompasses genome assemblies, annotations and transcriptome data of six Saccharum species and Erianthus rufipilus. To enhance the usability and efficiency of data acquisition and analysis, ScDB also provides a suite of convenient modules for search, analysis and visualization. In the future, ScDB will continue to be updated, adding more sugarcane genome data and other levels of omics data (proteomics, epigenetics, ncRNA, etc.), as well as further data analysis tools to ensure that it is a powerful and sustainable sugarcane data collection and analysis platform.

Source journal: Plant Biotechnology Journal
Category: Biology, Biotechnology & Applied Microbiology
CiteScore: 20.50
Self-citation rate: 2.90%
Articles published: 201
Review time: 1 month
About the journal: Plant Biotechnology Journal aspires to publish original research and insightful reviews of high impact, authored by prominent researchers in applied plant science. The journal places a special emphasis on molecular plant sciences and their practical applications through plant biotechnology. Our goal is to establish a platform for showcasing significant advances in the field, encompassing curiosity-driven studies with potential applications, strategic research in plant biotechnology, scientific analysis of crucial issues for the beneficial utilization of plant sciences, and assessments of the performance of plant biotechnology products in practical applications.