An Optimal Seed Based Compression Algorithm for DNA Sequences

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics, Pub Date: 2016-01-01, Epub Date: 2016-07-31, DOI: 10.1155/2016/3528406

Pamela Vinitha Eric, Gopakumar Gopalakrishnan, Muralikrishnan Karunakaran


Citations: 0

Abstract



This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The method exploits the repeat structures inherent in DNA sequences by building an offline dictionary that contains all such repeats along with the details of their mismatches. By allowing only promising mismatches, it achieves a compression ratio on par with or better than existing lossless DNA sequence compression algorithms.
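The abstract gives no implementation details, so the following Python sketch is only an illustration of the general idea: short exact seeds locate earlier copies of a repeat, each copy is extended while tolerating a few mismatches, and the repeat is replaced by a reference plus its mismatch list. The constants `SEED_LEN`, `MIN_REPEAT`, and `MAX_MISMATCH`, the token layout, and the rule of trimming trailing mismatches are all assumptions made here, not the authors' design. For brevity the seed index is built on the fly, as in LZ77, rather than as the offline dictionary the paper describes.

```python
from collections import defaultdict

SEED_LEN = 4       # hypothetical seed length
MIN_REPEAT = 8     # hypothetical minimum repeat length worth encoding
MAX_MISMATCH = 2   # hypothetical cap on mismatches per repeat


def extend(seq, src, dst):
    """Extend a repeat whose earlier copy starts at src and current copy at dst.

    Returns (length, mismatches); mismatches is a list of (offset, base)."""
    length, mism = 0, []
    while dst + length < len(seq):
        base = seq[dst + length]
        if seq[src + length] != base:
            if len(mism) == MAX_MISMATCH:
                break
            mism.append((length, base))
        length += 1
    # A mismatch at the very end of a repeat buys nothing, so trim it.
    # This is a stand-in for the paper's "promising mismatch" criterion.
    while mism and mism[-1][0] == length - 1:
        mism.pop()
        length -= 1
    return length, mism


def compress(seq):
    """Encode seq as literal tokens ("L", base) and repeat tokens
    ("R", source, length, mismatches)."""
    seeds = defaultdict(list)  # seed string -> earlier start positions
    tokens, i = [], 0
    while i < len(seq):
        best_len, best_src, best_mism = 0, 0, []
        for src in seeds.get(seq[i:i + SEED_LEN], []):
            length, mism = extend(seq, src, i)
            if length > best_len:
                best_len, best_src, best_mism = length, src, mism
        if best_len >= MIN_REPEAT:
            tokens.append(("R", best_src, best_len, best_mism))
            step = best_len
        else:
            tokens.append(("L", seq[i]))
            step = 1
        # Index every seed that starts inside the region just consumed.
        for p in range(i, i + step):
            if p + SEED_LEN <= len(seq):
                seeds[seq[p:p + SEED_LEN]].append(p)
        i += step
    return tokens


def decompress(tokens):
    """Rebuild the sequence; copying base by base keeps overlapping repeats valid."""
    out = []
    for tok in tokens:
        if tok[0] == "L":
            out.append(tok[1])
        else:
            _, src, length, mism = tok
            fixes = dict(mism)
            for k in range(length):
                out.append(fixes.get(k, out[src + k]))
    return "".join(out)


dna = "ACGTTGCAAGCTTAGG" + "ACGTTGCATGCTTAGG"  # second half differs in one base
tokens = compress(dna)
print(tokens[-1])
print(decompress(tokens) == dna)
```

In this toy input the second half repeats the first with a single substitution, so the encoder emits sixteen literals followed by one repeat token that carries one mismatch. A real compressor would additionally need a compact binary encoding of the tokens, which this sketch leaves out.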


Source journal

Advances in Bioinformatics (subject area: Biochemistry, Genetics and Molecular Biology; subcategory: Biochemistry, Genetics and Molecular Biology (miscellaneous))

Self-citation rate

0.00%

Articles published