Efficient Compression of Non-Repetitive DNA Sequences Using Dynamic Programming

K. Srinivasa, M. Jagadish, K. Venugopal, L. Patnaik
Published in: 2006 International Conference on Advanced Computing and Communications, 2006-12-01
DOI: 10.1109/ADCOM.2006.4289956 (https://doi.org/10.1109/ADCOM.2006.4289956)
Citations: 17

Abstract

DNA compression has been a subject of great interest since genomic databases became available. Although two bits suffice to encode each of the four DNA bases (A, G, T, and C), the massive size of DNA sequences makes more efficient compression necessary. General text compression methods do not exploit characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeated sequences, and sequences with high repetition rates are best compressed by dictionary-based algorithms. However, segments of DNA that do not reappear in the sequence are compressed using a different text compression scheme. In this paper, we propose an encoding scheme, based on a dynamic programming approach, to compress the non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack, and compare the performance of the resulting algorithm with various DNA compression algorithms. The results show that our method achieves better results in many cases.
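The two-bits-per-base baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the encoding scheme proposed in the paper, only the naive fixed-width packing that any DNA compressor is measured against. The function names `pack_2bit` and `unpack_2bit` and the A/C/G/T code assignment are assumptions made for illustration.

```python
# Hypothetical 2-bit code assignment; the ordering is an arbitrary choice,
# not one taken from the paper.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    A final chunk shorter than four bases is left-aligned and zero-padded.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover the first `length` bases from bytes produced by pack_2bit."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack_2bit("GATTACA")
    print(len(packed), unpack_2bit(packed, 7))
```

The original length has to be stored alongside the packed bytes, because the padding in the last byte is indistinguishable from trailing `A` bases. Under this baseline every base costs exactly two bits, so a DNA-specific scheme is only worthwhile when it averages fewer than two bits per base.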