RDBMS and NOSQL Based Hybrid Technology for Transcriptome Data Structuring and Processing


Mathematical Biology and Bioinformatics Pub Date : 2020-12-28 DOI:10.17537/2020.15.455

A. M. Mukhin, M. Genaev, D. Rasskazov, S. Lashin, D. Afonnikov


Citations: 1

Abstract

Transcriptome sequencing (RNA-seq) has become an almost routine procedure for studying both model organisms and crops. Bioinformatics processing of the output of such experiments yields large volumes of heterogeneous data: nucleotide sequences of transcripts, amino acid sequences, and their structural and functional annotation. It is important to make these data available to a wide range of researchers in the form of databases. This article proposes a hybrid approach to creating molecular genetic databases that contain information about transcript sequences and their structural and functional annotation. The essence of the approach is to store both structured and weakly structured data together in the same database. The technology was used to implement a database of transcriptomes of agricultural plants. The paper discusses the features of implementing this approach and gives examples of constructing both simple and complex SQL queries to such a database. The OORT database is freely available at https://oort.cytogen.ru/.
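As an illustration of the general idea, the following minimal sketch keeps structured fields in ordinary relational columns and weakly structured annotation in a JSON text column of the same table, then filters on both in a single SQL query. It is not the OORT implementation: the abstract does not specify the DBMS or schema, so the use of SQLite, the table and column names, the `json_get` helper, and the toy rows are all assumptions made for this example.

```python
import json
import sqlite3


def json_get(document, key):
    """Return a top-level value from a JSON document stored as text."""
    value = json.loads(document).get(key)
    # SQLite functions can only return scalars, so re-serialize nested values.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def build_demo_db():
    """Create an in-memory database with hypothetical transcript records."""
    conn = sqlite3.connect(":memory:")
    # A user-defined SQL function keeps the sketch independent of whether
    # the local SQLite build ships its own JSON functions.
    conn.create_function("json_get", 2, json_get)
    conn.execute(
        """
        CREATE TABLE transcript (
            id         INTEGER PRIMARY KEY,
            species    TEXT    NOT NULL,  -- structured field
            length     INTEGER NOT NULL,  -- structured field
            annotation TEXT    NOT NULL   -- weakly structured JSON document
        )
        """
    )
    # Toy rows invented for illustration; keys may differ between records.
    rows = [
        (1, "Triticum aestivum", 1520,
         json.dumps({"go_term": "GO:0006412", "domain": "Ribosomal_L2"})),
        (2, "Triticum aestivum", 890,
         json.dumps({"go_term": "GO:0008152"})),
        (3, "Hordeum vulgare", 2310,
         json.dumps({"go_term": "GO:0006412", "note": "partial ORF"})),
    ]
    conn.executemany("INSERT INTO transcript VALUES (?, ?, ?, ?)", rows)
    return conn


def find_transcripts(conn, go_term, min_length):
    """Filter on a relational column and a JSON key in the same query."""
    cursor = conn.execute(
        """
        SELECT id, species
        FROM transcript
        WHERE length >= ?
          AND json_get(annotation, 'go_term') = ?
        ORDER BY id
        """,
        (min_length, go_term),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_transcripts(conn, "GO:0006412", 1000))
```

Records whose annotation lacks the requested key yield NULL and drop out of the result, so annotation documents with differing sets of keys can coexist in one table without schema changes.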

