Prefiltering Model for Homology Detection Algorithms on GPU

Evolutionary Bioinformatics Online Pub Date : 2016-01-01 DOI:10.4137/EBO.S40877

Germán Retamosa, L. de Pedro, Iván González, J. Tamames


Citations: 2

Abstract

Homology detection has evolved over time from heavy algorithms based on dynamic programming to lightweight alternatives based on various heuristic models. However, the main problem with these algorithms is that they rely on complex statistical models, which makes it difficult to achieve a meaningful speedup while still reproducing the original results exactly. Accelerating them is therefore essential. The aim of this article was to prefilter a sequence database. To that end, we have implemented a groundbreaking heuristic model that runs on NVIDIA graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, it quickly reduces the sequence database by between 50% and 95% while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as part of a next-generation sequencing system. Extensive performance and accuracy tests were carried out at the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate existing homology detection applications, such as the National Centre for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.

KEY POINTS

• Owing to the increasing size of current sequence datasets, filtering approaches combined with high-performance computing (HPC) techniques are the best way to process all this information in acceptable times.
• Graphics processing unit cards and their corresponding programming models are good options for carrying out this processing.
• Combining filtration models with HPC techniques can offer new levels of performance and accuracy in homology detection algorithms such as the National Centre for Biotechnology Information Basic Local Alignment Search Tool.
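The abstract does not describe the heuristic itself, so the following is only a minimal CPU sketch of what database prefiltering means in general, not the authors' GPU model. It assumes a simple shared k-mer criterion; the function names `kmers` and `prefilter` and the parameters `k` and `min_shared` are hypothetical and chosen purely for illustration. Sequences that share too few k-mers with the query are dropped before a slower aligner such as BLASTP would be run on the survivors.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query: str, database: Dict[str, str],
              k: int = 3, min_shared: int = 2) -> List[str]:
    """Return IDs of database sequences sharing at least min_shared k-mers with query.

    This shared k-mer rule is an illustrative assumption, not the heuristic
    from the article. A higher min_shared filters more aggressively.
    """
    query_kmers = kmers(query, k)
    kept = []
    for seq_id, seq in database.items():
        if len(query_kmers & kmers(seq, k)) >= min_shared:
            kept.append(seq_id)
    return kept


if __name__ == "__main__":
    # Toy protein database (made-up sequences for demonstration only).
    db = {
        "seq1": "MKTAYIAKQR",
        "seq2": "GGGGGGGGGG",
        "seq3": "AYIAKQRMKT",
    }
    survivors = prefilter("MKTAYIAKQR", db, k=3, min_shared=2)
    reduction = 1 - len(survivors) / len(db)
    print(survivors, f"database reduced by {reduction:.0%}")
```

In this sketch, `min_shared` plays the role of a sensitivity setting: raising it shrinks the database further at the risk of discarding distant homologs, which is the trade-off the abstract refers to.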

