Using codes in place of DNA Sample in Databases to reduce Storage

Shan e Zahra, Sabir Abbas, Tayyab Altaf
Journal: Lahore Garrison University Research Journal of Computer Science and Information Technology
Publication date: 2020-04-21
DOI: 10.54692/lgurjcsit.2019.030386 (https://doi.org/10.54692/lgurjcsit.2019.030386)

Abstract

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These biomolecules are present in all human cells. Because DNA is self-replicating, it is a key constituent of the genetic material found in all living organisms. DNA carries the genetic information required for the functioning and development of every living organism. Storing the DNA data of a single person requires 10 CD-ROMs. This paper presents a lossless three-phase compression algorithm for DNA sequences. In the first phase, the dataset is segmented into tetra groups, and the resulting genetic sequences are represented as unique numbers (e.g., array indices). In the second phase, binary code is generated on the basis of the array index numbers. In the last phase, a modified version of Run Length Encoding (RLE) is applied to the dataset. The proposed technique has been implemented and its performance measured on samples. The authors report that it achieved the best average compression ratio after storing different DNA samples.
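The three phases described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a "tetra group" means four consecutive bases from the alphabet A, C, G, T, so there are 4^4 = 256 possible groups and each array index fits in one byte (2 bits per base instead of 8 for plain text). The abstract does not say how the RLE step is modified, so plain run-length encoding is used in its place. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Lookup table of all 4**4 = 256 tetra groups; a group's position in this
# list plays the role of the "array index" (assumed layout, not from the paper).
TETRAS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX_OF = {t: i for i, t in enumerate(TETRAS)}


def to_indices(seq):
    """Phase 1: split the sequence into groups of four bases and map each to its index."""
    if len(seq) % 4:
        raise ValueError("this sketch assumes a length divisible by 4")
    return [INDEX_OF[seq[i:i + 4]] for i in range(0, len(seq), 4)]


def to_bytes(indices):
    """Phase 2: every index is below 256, so each becomes one 8-bit binary code."""
    return bytes(indices)


def rle_encode(data):
    """Phase 3 stand-in: plain RLE as (count, value) byte pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(data):
    """Inverse of rle_encode: expand each (count, value) pair."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)


def compress(seq):
    return rle_encode(to_bytes(to_indices(seq)))


def decompress(blob):
    return "".join(TETRAS[i] for i in rle_decode(blob))


if __name__ == "__main__":
    sample = "ACGT" * 5 + "AAAA" * 3       # 32 bases
    blob = compress(sample)
    print(len(sample), "bases ->", len(blob), "bytes")
    assert decompress(blob) == sample      # lossless round trip
```

With plain RLE, a sequence in which no two adjacent groups repeat takes two bytes per group, which is larger than the one byte per group produced by phase 2 alone. The sketch therefore only pays off on repetitive input; how the paper's modified RLE behaves in that case cannot be inferred from the abstract.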