An Optimal Seed Based Compression Algorithm for DNA Sequences
Pamela Vinitha Eric, Gopakumar Gopalakrishnan, Muralikrishnan Karunakaran
Advances in Bioinformatics, 2016, article 3528406 (journal article; published online 2016-07-31)
DOI: 10.1155/2016/3528406
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983397/pdf/
Citations: 0
Abstract
This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The method exploits the repeat structures inherent in DNA sequences by building an offline dictionary that contains all such repeats together with the details of their mismatches. Because only promising mismatches are allowed, the method achieves a compression ratio on par with or better than existing lossless DNA sequence compression algorithms.
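The abstract does not specify the encoding, so the following is only a minimal Python sketch of the general idea (seed lookup, repeat extension with mismatches, Lempel-Ziv-style substitution), not the authors' algorithm. Several assumptions are made for illustration: seeds are fixed-length k-mers indexed over the already-encoded prefix (an LZ77-style online dictionary, standing in for the paper's offline dictionary); a mismatch counts as "promising" only if the next `resume` bases match again and at most `max_mm` mismatches are recorded per repeat; and the names `encode`, `decode`, `extend`, `k`, `max_mm`, `resume`, `min_len` and `max_candidates` are hypothetical. Tokens are plain Python tuples; a real compressor would still need to entropy-code them into bits.

```python
from collections import defaultdict


def extend(seq, p, i, max_mm, resume):
    """Extend a repeat of seq[i:] against the earlier copy at seq[p:].

    Hypothetical 'promising mismatch' rule: a mismatch is accepted only if
    fewer than max_mm mismatches were recorded so far AND the next `resume`
    bases after it match again. Returns (length, [(offset, base), ...]).
    """
    n, j, mismatches = len(seq), 0, []
    while i + j < n:
        if seq[p + j] == seq[i + j]:
            j += 1
            continue
        follow = range(j + 1, j + 1 + resume)
        if (len(mismatches) < max_mm and i + j + resume < n
                and all(seq[p + t] == seq[i + t] for t in follow)):
            mismatches.append((j, seq[i + j]))  # store the substituted base
            j += 1
        else:
            break
    return j, mismatches


def encode(seq, k=8, max_mm=2, resume=4, min_len=12, max_candidates=64):
    """Encode seq as ('lit', base) and ('copy', src, length, mismatches) tokens."""
    index = defaultdict(list)  # seed (k-mer) -> earlier start positions
    tokens, i, nxt, n = [], 0, 0, len(seq)
    while i < n:
        # Index every seed that starts strictly before i (decoder-safe sources).
        while nxt < i and nxt + k <= n:
            index[seq[nxt:nxt + k]].append(nxt)
            nxt += 1
        best_len, best_p, best_mm = 0, -1, []
        if i + k <= n:
            # Only the most recent candidates are tried, to bound the work.
            for p in index.get(seq[i:i + k], [])[-max_candidates:]:
                length, mm = extend(seq, p, i, max_mm, resume)
                if length > best_len:
                    best_len, best_p, best_mm = length, p, mm
        if best_len >= min_len:
            tokens.append(("copy", best_p, best_len, best_mm))
            i += best_len
        else:
            tokens.append(("lit", seq[i]))
            i += 1
    return tokens


def decode(tokens):
    """Rebuild the sequence; copies may overlap their source, as in LZ77."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, p, length, mismatches = tok
            subs = dict(mismatches)
            for j in range(length):
                out.append(subs.get(j, out[p + j]))
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACCGATGCATGCCA"
    seq = ref + "TTT" + ref[:10] + "T" + ref[11:]  # repeat with one substitution
    tokens = encode(seq)
    print(tokens[-1])  # ('copy', 0, 28, [(10, 'T')])
    assert decode(tokens) == seq
```

Here the second copy of `ref` is stored as a single reference to position 0 plus one recorded mismatch instead of 28 literal bases. Requiring the bases after a mismatch to match again is one simple way to keep a repeat from absorbing noise; the paper's actual criterion for a promising mismatch may differ.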