Using codes in place of DNA Sample in Databases to reduce Storage
Shan e Zahra, Sabir Abbas, Tayyab Altaf
Lahore Garrison University Research Journal of Computer Science and Information Technology, published 2020-04-21
DOI: 10.54692/lgurjcsit.2019.030386 (https://doi.org/10.54692/lgurjcsit.2019.030386)
Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules present in all cells of human beings. Because of its self-replicating property, DNA is a key constituent of the genetic material that exists in all living organisms. This biomolecule carries the genetic information required for the functioning and development of all living things. Storing the DNA data of a single person requires 10 CD-ROMs. In this paper, a lossless three-phase compression algorithm is presented for DNA sequences. In the first phase, the dataset is segmented into tetra groups and the resulting genetic sequences are compressed into unique numbers (e.g., array indices). In the second phase, binary code is generated on the basis of the array index numbers. In the last phase, a modified version of Run Length Encoding (RLE) is applied to the dataset. The proposed technique has been implemented and its performance measured on different stored DNA samples, where it is reported to achieve the best average compression ratio.
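The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the details. The sketch assumes that a "tetra group" is a run of four bases, that the input contains only uppercase A, C, G and T, that each array index is stored as one 8-bit code, and that plain (count, value) RLE stands in for the paper's modified RLE. All function and variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Lookup table: each of the 256 possible 4-base groups gets an array index 0..255.
TETRAS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX = {tetra: i for i, tetra in enumerate(TETRAS)}


def to_indices(seq):
    """Phase 1 (assumed form): split into 4-base groups, map each to its index."""
    body_len = len(seq) - len(seq) % 4
    indices = [INDEX[seq[i:i + 4]] for i in range(0, body_len, 4)]
    return indices, seq[body_len:]  # 0-3 leftover bases are kept verbatim


def to_binary(indices):
    """Phase 2 (assumed form): one 8-bit binary code per array index."""
    return bytes(indices)


def rle_encode(data):
    """Phase 3 stand-in: plain RLE as (count, value) byte pairs, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(data):
    """Inverse of rle_encode: expand each (count, value) pair."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)


def compress(seq):
    indices, tail = to_indices(seq)
    return rle_encode(to_binary(indices)), tail


def decompress(packed, tail):
    return "".join(TETRAS[b] for b in rle_decode(packed)) + tail


if __name__ == "__main__":
    sample = "ACGT" * 5 + "AAAA" * 3 + "GG"
    packed, tail = compress(sample)
    # The round trip must reproduce the input exactly (lossless).
    assert decompress(packed, tail) == sample
    print(len(sample), "bases ->", len(packed), "bytes plus tail", repr(tail))
```

In this sketch the 8-bit codes alone store four bases per byte. The plain RLE step only helps when neighbouring groups repeat. On sequences without runs it doubles the size, which is presumably one reason the paper applies a modified RLE.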