GpemDB: A Scalable Database Architecture with the Multi-omics Entity-relationship Model to Integrate Heterogeneous Big-data for Precise Crop Breeding
Liang Gong, Qiaojun Lou, Chenrui Yu, Yunyun Chen, Jun Hong, Wei Wu, Shengzhe Fan, Liangyu Chen, Chengliang Liu
Frontiers in Bioscience-Landmark, published 2022-05-17. DOI: 10.31083/j.fbl2705159 (https://doi.org/10.31083/j.fbl2705159)
Citations: 3
Abstract
BACKGROUND
High-throughput genome sequencing and phenotype screening techniques make it possible to use multi-omics data to speed up the breeding process. However, the heterogeneity of these large data sets hampers progress, and the lack of a comprehensive database supporting end-to-end association analysis impedes their efficient use.
METHODS
To address this problem, this paper proposes a scalable entity-relationship model and a database architecture to manage cross-platform data sets, explore the relationships among multi-omics data, and ultimately improve breeding efficiency. First, the targeted crop omics data are normalized before being stored in the database; typical breeding data content and structure are demonstrated with a case study of rice (Oryza sativa L.). Second, the structure, patterns and hierarchy of multi-omics data are described using the entity-relationship modeling technique. Third, statistical tools frequently used in agricultural analysis are embedded in the database to support breeding.
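As a rough illustration of what an entity-relationship model over these data types could look like, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the GpemDB schema: every table name, column name and value here (accession, genotype, phenotype, environment, management, the marker "m1", the trait "plant_height") is a hypothetical placeholder, and the numbers are toy data chosen only to make the query checkable.

```python
import sqlite3

# Hypothetical schema: one table per entity, foreign keys as relationships.
SCHEMA = """
CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE environment (
    environment_id INTEGER PRIMARY KEY,
    site TEXT NOT NULL,
    season TEXT NOT NULL
);
CREATE TABLE management (
    management_id INTEGER PRIMARY KEY,
    practice TEXT NOT NULL
);
CREATE TABLE genotype (
    accession_id INTEGER NOT NULL REFERENCES accession(accession_id),
    marker TEXT NOT NULL,
    allele TEXT NOT NULL,
    PRIMARY KEY (accession_id, marker)
);
CREATE TABLE phenotype (
    accession_id INTEGER NOT NULL REFERENCES accession(accession_id),
    environment_id INTEGER NOT NULL REFERENCES environment(environment_id),
    management_id INTEGER NOT NULL REFERENCES management(management_id),
    trait TEXT NOT NULL,
    value REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database and load a few toy records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO accession VALUES (?, ?)",
                     [(1, "line_A"), (2, "line_B")])
    conn.executemany("INSERT INTO environment VALUES (?, ?, ?)",
                     [(1, "site_1", "early"), (2, "site_2", "late")])
    conn.execute("INSERT INTO management VALUES (1, 'standard')")
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                     [(1, "m1", "A"), (2, "m1", "G")])
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, "plant_height", 100.0),
                      (1, 2, 1, "plant_height", 110.0),
                      (2, 1, 1, "plant_height", 90.0)])
    conn.commit()
    return conn


def trait_by_allele(conn, marker, trait):
    """Mean trait value per allele at one marker (genotype-phenotype join)."""
    rows = conn.execute(
        """
        SELECT g.allele, AVG(p.value)
        FROM genotype AS g
        JOIN phenotype AS p ON p.accession_id = g.accession_id
        WHERE g.marker = ? AND p.trait = ?
        GROUP BY g.allele
        ORDER BY g.allele
        """,
        (marker, trait),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(trait_by_allele(build_demo_db(), "m1", "plant_height"))
```

In this sketch each phenotype record points at the accession, environment and management entities it was measured under, so a query can link genotype to phenotype with a single join. A real design would likely need many more entities and normalization rules than are assumed here.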
RESULTS
As a result, a general-purpose scalable database called GpemDB, which integrates genomics, phenomics, enviromics and management data, is developed. It is the first database designed to manage all four of these data types together. GpemDB, which comprises a Gpem metadata-level layer and an informative-level layer, provides a visual scheme for displaying the content of the database and helps users manage, analyze and share breeding data.
CONCLUSIONS
GpemDB has been successfully applied to a rice population, which demonstrates that this database architecture and model are a promising tool for using big data in highly precise and efficient crop research and breeding.
About the journal:
FBL is an international peer-reviewed open access journal of biological and medical science. FBL publishes state-of-the-art advances in any discipline in the area of biology and medicine, including biochemistry and molecular biology, parasitology, virology, immunology, epidemiology, microbiology, entomology, botany, agronomy, as well as basic medicine, preventive medicine, bioinformatics and other related topics.