D. Satyanvesh, Kaliuday Balleda, P. K. Baruah, S. Sai
2013 15th International Conference on Advanced Computing Technologies (ICACT), published 2013-09-01. DOI: 10.1109/ICACT.2013.6710490 (https://doi.org/10.1109/ICACT.2013.6710490)
Genalign — A high performance implementation for aligning the compressed DNA sequences
In molecular biology, sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity between them. This is a challenging problem because DNA sequences are huge and the databases are growing at an exponential rate, so alignment requires a tremendous amount of memory and computational power. For example, the human genome in raw format ranges from 2 to 30 terabytes. DNA inherently contains many repeats, which makes it highly compressible. This paper presents a new approach that aligns sequences after compressing them. The alignment consists of both ungapped and gapped alignment. Multi-core processors and GPUs can be used to align these huge sequences quickly in their compressed form. The main focus is on aligning the huge sequences accurately. The ungapped alignment achieves a speedup of up to 56 on K20 Kepler GPUs, and the gapped alignment achieves a speedup of up to 15 on multi-core processors.
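
The abstract does not say which compression scheme Genalign uses, so the following is only a minimal sketch of the general idea of ungapped alignment on compressed data, not the authors' method. It assumes a simple 2-bit-per-base packing (A, C, G, T only, no ambiguity codes) and a match/mismatch scoring scheme; the names `pack`, `count_matches`, and `best_ungapped` are hypothetical and chosen for illustration.

```python
# Assumption: a fixed 2-bit code per base. The paper's actual compression
# scheme is not described in the abstract and may differ.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base in the lowest bits)."""
    packed = 0
    for i, base in enumerate(seq):
        packed |= CODE[base] << (2 * i)
    return packed


def count_matches(a: int, b: int, length: int) -> int:
    """Count positions where two packed sequences of equal length agree."""
    diff = a ^ b  # a non-zero 2-bit group marks a mismatching base
    low_mask = ((1 << (2 * length)) - 1) // 3  # 0b0101...01, one bit per base
    mismatch_bits = (diff | (diff >> 1)) & low_mask
    return length - bin(mismatch_bits).count("1")


def best_ungapped(query: str, subject: str, match: int = 1, mismatch: int = -1):
    """Slide the query along the subject without gaps; return (score, offset) of the best hit.

    Returns None when the query is longer than the subject.
    """
    n = len(query)
    q = pack(query)
    s = pack(subject)
    window_mask = (1 << (2 * n)) - 1
    best = None
    for offset in range(len(subject) - n + 1):
        # Extract n bases of the subject starting at this offset, still in packed form.
        window = (s >> (2 * offset)) & window_mask
        m = count_matches(q, window, n)
        score = m * match + (n - m) * mismatch
        if best is None or score > best[0]:
            best = (score, offset)
    return best


print(best_ungapped("ACGT", "TTACGTTT"))  # (4, 2)
```

Each offset is scored independently of the others, which is the kind of structure that maps naturally onto many GPU threads or CPU cores; the comparison itself touches a quarter of the bytes that an 8-bit-per-base representation would.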