Libgapmis: An ultrafast library for short-read single-gap alignment

2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops. Pub Date: 2012-10-04. DOI: 10.1109/BIBMW.2012.6470221

Nikolaos S. Alachiotis, S. Berger, T. Flouri, S. Pissis, A. Stamatakis

Pages: 688-695

Citations: 5

Abstract

A broad variety of short-read alignment programmes has been released recently to address the task of mapping tens of millions of short reads to a reference genome, each placing emphasis on different aspects of the problem. Although all programmes allow for a small number of alignment mismatches, some of them either perform poorly when allowing gap insertions or do not allow for gap insertions at all. The seed-and-extend strategy is applied in most of these programmes: after a fast alignment between a fragment of the reference sequence and a high-quality fragment of a short read (the seed), an important problem is to extend the alignment between a relatively short succeeding fragment of the reference sequence and the remaining low-quality fragment of the read, allowing a number of mismatches and the insertion of gaps in the alignment. However, the length of the short reads, in combination with the gap occurrence frequency observed in various applications, suggests that the single-gap alignment of (parts of) those reads is desirable. In this article, we present libgapmis, an ultrafast library for pairwise short-read single-gap alignment, including accelerated SSE-based and GPU-based versions. It implements an algorithm that computes a modified version of the traditional dynamic programming matrix for sequence alignment to solve the above alignment problem. We show that the library functions of the CPU-based version are up to 20x faster than competing programmes, while the SSE-based and GPU-based versions are up to 6x and 11x faster than our CPU-based implementation, respectively. The functions made available via our library can be seamlessly integrated into any short-read alignment pipeline.
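
To make the idea of a single-gap alignment concrete, the following is a minimal illustrative sketch in Python. It is not the libgapmis API and not the modified dynamic programming algorithm the abstract refers to. It assumes a simpler global setting in which the two sequences are aligned end to end, so the single gap must have length equal to their length difference and only its position has to be chosen. The function name `single_gap_align` and all scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) are hypothetical choices made for illustration.

```python
def single_gap_align(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment of a and b using at most one contiguous gap.

    Returns (score, gap_position, gap_length). The gap is placed in the
    shorter sequence, before index gap_position. Scoring values are
    illustrative assumptions, not those used by libgapmis.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    d = len(long_) - n  # the single gap must absorb the length difference

    def score(x, y):
        return match if x == y else mismatch

    # prefix[i]: gapless score of short[:i] against long_[:i]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + score(short[i], long_[i])

    # suffix[i]: gapless score of short[i:] against long_[i + d:]
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + score(short[i], long_[i + d])

    if d == 0:
        # Equal lengths: the only end-to-end alignment is the gapless one.
        return prefix[n], None, 0

    # Affine gap cost: one opening penalty plus an extension per extra column.
    gap_cost = gap_open + (d - 1) * gap_extend
    # Try every gap position; ties resolve to the leftmost position.
    best = max(range(n + 1), key=lambda i: prefix[i] + suffix[i])
    return prefix[best] + suffix[best] + gap_cost, best, d
```

With these default scores, aligning `ACGTACGT` against `ACGTNNACGT` places a gap of length 2 before index 4 of the shorter sequence, with a total score of 4. Precomputing the prefix and suffix scores keeps the search linear in the sequence length, which is only possible here because the gap length is fixed by the global setting; the extension problem described in the abstract, where the read is aligned against a longer reference fragment, needs the matrix-based approach instead.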

