Efficient Compression of Non-Repetitive DNA Sequences Using Dynamic Programming

2006 International Conference on Advanced Computing and Communications; Pub Date: 2006-12-01; DOI: 10.1109/ADCOM.2006.4289956

K. Srinivasa, M. Jagadish, K. Venugopal, L. Patnaik


Citations: 17


Abstract

DNA compression has been a subject of great interest since genomic databases became available. Although two bits are sufficient to encode each of the four DNA bases (A, G, T, and C), the massive size of DNA sequences compels the need for efficient compression. General text compression methods do not make use of characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeat sequences. DNA sequences with high repetition rates are best compressed by dictionary-based compression algorithms. However, segments of DNA that do not reappear in the sequence are compressed using a different text compression scheme. In this paper, we propose an encoding scheme, based on a dynamic programming approach, to compress the non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack. The performance of this algorithm is compared with that of various DNA compression algorithms. The results show that our method achieves better results in many cases.
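The two-bits-per-base baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's encoding scheme, which the abstract does not describe in detail; the function names `pack_bases` and `unpack_bases` and the particular base-to-code mapping are assumptions chosen for the example.

```python
# Hypothetical mapping: any fixed assignment of 2-bit codes to bases works.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so every byte decodes the same way.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # Drop the padding bases introduced in the final byte.
    return "".join(bases[:length])
```

The original sequence length has to be stored alongside the packed bytes, because the final byte is padded. A DNA-specific compressor aims to do better than this fixed 2 bits per base.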

