K. Srinivasa, M. Jagadish, K. Venugopal, L. Patnaik
2006 International Conference on Advanced Computing and Communications. Published 2006-12-01. DOI: 10.1109/ADCOM.2006.4289956 (https://doi.org/10.1109/ADCOM.2006.4289956)
Efficient Compression of non-repetitive DNA sequences using Dynamic Programming
DNA compression has been a subject of great interest since genomic databases became available. Although two bits are sufficient to encode the four bases of DNA (namely A, G, T and C), the massive size of DNA sequences compels the need for efficient compression. General text compression methods do not make use of characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeat sequences. DNA sequences with high repetition rates are best compressed by dictionary-based compression algorithms. However, segments of DNA that do not reappear in the sequence are compressed using a different text compression scheme. In this paper, we propose an encoding scheme, based on a dynamic programming approach, to compress non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack. The performance of the resulting algorithm is compared with various DNA compression algorithms. The results show that our method achieves better results in many cases.
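The two-bit baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the encoding scheme proposed in the paper, only the fixed-width packing that any DNA compressor is measured against. The mapping A=00, C=01, G=10, T=11 and the function names `pack_2bit` and `unpack_2bit` are assumptions chosen for illustration; the abstract does not specify them.

```python
# Assumed mapping (not specified in the paper): A=00, C=01, G=10, T=11.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    A final chunk shorter than four bases is left-aligned and zero-padded.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> str:
    """Recover the first n bases from bytes produced by pack_2bit."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n])


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    packed = pack_2bit(seq)
    # 10 bases occupy 3 bytes instead of 10 ASCII characters.
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack_2bit(packed, len(seq)) == seq)
```

Because the padding is ambiguous with a run of A bases, the sequence length has to be stored alongside the packed bytes. A scheme that improves on this baseline must spend fewer than two bits per base on average.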