GpemDB: A Scalable Database Architecture with the Multi-omics Entity-relationship Model to Integrate Heterogeneous Big-data for Precise Crop Breeding.

IF 3.1 · Zone 4, Biology (CAS journal ranking) · Q2 Immunology and Microbiology

Frontiers in Bioscience-Landmark · Pub Date: 2022-05-17 · DOI: 10.31083/j.fbl2705159

Liang Gong, Qiaojun Lou, Chenrui Yu, Yunyun Chen, Jun Hong, Wei Wu, Shengzhe Fan, Liangyu Chen, Chengliang Liu


Citations: 3

Abstract

BACKGROUND: With the development of high-throughput genome sequencing and phenotype screening techniques, multi-omics data can potentially be leveraged to speed up the breeding process. However, the heterogeneity of big data hampers progress, and the lack of a comprehensive database supporting end-to-end association analysis impedes the efficient use of these data.

METHODS: To address this problem, this paper proposes a scalable entity-relationship model and a database architecture to manage cross-platform data sets, explore the relationships among multi-omics data, and ultimately improve breeding efficiency. First, the targeted omics data of crops should be normalized before being stored in the database; typical breeding data content and structure are demonstrated with a case study of rice (Oryza sativa L.). Second, the structure, patterns and hierarchy of multi-omics data are described with the entity-relationship modeling technique. Third, statistical tools frequently used in agricultural analysis are embedded in the database to support breeding.

RESULTS: The result is GpemDB, a general-purpose, scalable database that integrates genomics, phenomics, enviromics and management data. It is the first database designed to manage all four of these omics data types together. GpemDB comprises a Gpem metadata-level layer and an informative-level layer, provides a visualized scheme for displaying the content of the database, and helps users manage, analyze and share breeding data.

CONCLUSIONS: GpemDB has been successfully applied to a rice population, which demonstrates that this database architecture and model are promising as a powerful tool for using big data in highly precise and efficient crop research and breeding.
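The abstract does not give the actual GpemDB schema, so the following is only a minimal sketch of how an entity-relationship model linking genomics, phenomics, enviromics and management data might look. Every table name, column name, and value here is a hypothetical placeholder chosen for illustration, and SQLite stands in for whatever database engine the authors use. The helper `trait_mean_by_allele` is likewise an assumed example of a simple statistical tool sitting next to the data.

```python
import sqlite3
from statistics import mean

# Hypothetical schema: NOT the published GpemDB design, only an illustration
# of four linked "omics" entities sharing accession and environment keys.
SCHEMA = """
CREATE TABLE accession   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE environment (id INTEGER PRIMARY KEY, site TEXT NOT NULL,
                          year INTEGER NOT NULL, mean_temp_c REAL);
CREATE TABLE genotype    (accession_id INTEGER NOT NULL REFERENCES accession(id),
                          marker TEXT NOT NULL, allele TEXT NOT NULL,
                          PRIMARY KEY (accession_id, marker));
CREATE TABLE management  (id INTEGER PRIMARY KEY,
                          environment_id INTEGER NOT NULL REFERENCES environment(id),
                          practice TEXT NOT NULL, amount REAL, unit TEXT);
CREATE TABLE phenotype   (accession_id INTEGER NOT NULL REFERENCES accession(id),
                          environment_id INTEGER NOT NULL REFERENCES environment(id),
                          trait TEXT NOT NULL, value REAL NOT NULL, unit TEXT,
                          PRIMARY KEY (accession_id, environment_id, trait));
"""


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO accession VALUES (?, ?)",
                     [(1, "line_A"), (2, "line_B"), (3, "line_C")])
    conn.execute("INSERT INTO environment VALUES (1, 'site_X', 2020, 26.5)")
    conn.execute("INSERT INTO management VALUES (1, 1, 'nitrogen', 120.0, 'kg/ha')")
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                     [(1, "m1", "A"), (2, "m1", "A"), (3, "m1", "G")])
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, "plant_height", 100.0, "cm"),
                      (2, 1, "plant_height", 110.0, "cm"),
                      (3, 1, "plant_height", 90.0, "cm")])
    conn.commit()
    return conn


def trait_mean_by_allele(conn, marker, trait):
    """Join genotype and phenotype records, then average the trait per allele."""
    rows = conn.execute(
        """SELECT g.allele, p.value
           FROM genotype g
           JOIN phenotype p ON p.accession_id = g.accession_id
           WHERE g.marker = ? AND p.trait = ?""",
        (marker, trait),
    ).fetchall()
    groups = {}
    for allele, value in rows:
        groups.setdefault(allele, []).append(value)
    return {allele: mean(values) for allele, values in groups.items()}


if __name__ == "__main__":
    print(trait_mean_by_allele(build_demo_db(), "m1", "plant_height"))
```

Keying phenotype records on both accession and environment is one way to keep genotype-by-environment comparisons a plain join. Whether GpemDB does this is not stated in the abstract.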


Source journal

Frontiers in Bioscience-Landmark (Biology: Biochemistry and Molecular Biology)

CiteScore: 3.40
Self-citation rate: 3.20%
Articles published: 301
Review time: 3 months

About the journal: FBL is an international peer-reviewed open access journal of biological and medical science. FBL publishes state-of-the-art advances in any discipline in the area of biology and medicine, including biochemistry and molecular biology, parasitology, virology, immunology, epidemiology, microbiology, entomology, botany, agronomy, as well as basic medicine, preventive medicine, bioinformatics and other related topics.