Hardware Acceleration of k-Mer Clustering using Locality-Sensitive Hashing

2019 22nd Euromicro Conference on Digital System Design (DSD). Pub Date: 2019-08-01. DOI: 10.1109/DSD.2019.00105

Javier E. Soto, Thomas Krohmer, Cecilia Hernández, M. Figueroa

Citations: 3

Abstract

Clustering is an essential operation in many data analysis applications. In particular, bioinformatics and genome analysis use clustering to group similar components in sequence data in order to find important patterns such as DNA motifs. In this paper, we present an algorithm that clusters DNA data using locality-sensitive hashing with MinHash to group similar subsequences in large ChIP-seq datasets. Tested on a standard mESC dataset, the algorithm builds clusters that contain subsequences with high-scoring matches to known DNA motifs. We also describe the architecture and implementation of a hardware accelerator on a Xilinx Kintex-7 XC7K325T FPGA that exploits the parallelism of the algorithm to cluster data with a throughput of one k-mer per clock cycle at 350 MHz. The accelerator achieves a speedup of 91× compared to a parallel software implementation of the algorithm on a 24-core server.
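
To make the idea of grouping k-mers with MinHash-based locality-sensitive hashing concrete, here is a minimal software sketch. It is not the authors' algorithm or their hardware design: the abstract does not give those details. The shingle length `s`, the number of hash functions, the banding scheme, the BLAKE2b-based hash, and all function names are assumptions chosen for illustration only.

```python
import hashlib
from collections import defaultdict


def shingles(kmer, s=3):
    """Return the set of overlapping length-s substrings of a k-mer (assumed feature set)."""
    return {kmer[i:i + s] for i in range(len(kmer) - s + 1)}


def seeded_hash(token, seed):
    """Deterministic 64-bit hash of a token under a seed (illustrative choice of hash)."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmer, num_hashes=8, s=3):
    """MinHash signature: for each seeded hash, keep the minimum over the shingle set."""
    sh = shingles(kmer, s)
    return tuple(min(seeded_hash(t, seed) for t in sh) for seed in range(num_hashes))


def cluster_kmers(sequences, k=8, num_hashes=8, bands=4, s=3):
    """Bucket every k-mer by each band of its signature (standard LSH banding)."""
    rows = num_hashes // bands  # signature entries per band
    buckets = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            sig = minhash_signature(kmer, num_hashes, s)
            for b in range(bands):
                # k-mers that agree on all rows of a band share that bucket
                buckets[(b, sig[b * rows:(b + 1) * rows])].add(kmer)
    return buckets


if __name__ == "__main__":
    # Toy input, not ChIP-seq data
    for key, members in cluster_kmers(["ACGTACGTA", "TTTTGGGGCC"]).items():
        if len(members) > 1:
            print(key[0], sorted(members))
```

In this sketch, k-mers whose shingle sets overlap heavily are likely to agree on at least one band and so land in a common bucket. The hash evaluations in `minhash_signature` are independent of one another, which is one kind of parallelism a hardware design could exploit; the abstract does not say which form of parallelism the accelerator actually uses.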

