Hardware Acceleration of k-Mer Clustering using Locality-Sensitive Hashing

Javier E. Soto, Thomas Krohmer, Cecilia Hernández, M. Figueroa
2019 22nd Euromicro Conference on Digital System Design (DSD), August 2019. DOI: 10.1109/DSD.2019.00105
Citations: 3

Abstract

Clustering is an essential operation in many data analysis applications. In particular, bioinformatics and genome analysis use clustering to group similar components in sequence data in order to find important patterns such as DNA motifs. In this paper, we present an algorithm that clusters DNA data using locality-sensitive hashing (LSH) with MinHash to group similar subsequences in large ChIP-seq datasets. Tested on a standard mESC dataset, the algorithm builds clusters that contain subsequences with high-score matches to known DNA motifs. We also describe the architecture and implementation of a hardware accelerator on a Xilinx Kintex-7 XC7K325T FPGA that exploits the parallelism of the algorithm to cluster data with a throughput of one k-mer per clock cycle at 350 MHz. The accelerator achieves a speedup of 91× over a parallel software implementation of the algorithm on a 24-core server.
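The abstract does not spell out how MinHash and LSH combine to group k-mers, so the following is a minimal Python sketch of one common formulation, not the authors' algorithm. It assumes that each k-mer is represented by its set of overlapping q-grams, that signatures use hypothetical universal hash functions of the form (a·x + b) mod p, and that k-mers sharing any LSH band fall into the same candidate cluster. The names and values `Q`, `NUM_HASHES`, `BAND_SIZE`, `make_hash_params` and `lsh_clusters` are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical parameters, chosen only for illustration (not taken from the paper).
Q = 3                  # length of the overlapping q-grams that form a k-mer's set
NUM_HASHES = 8         # MinHash signature length
BAND_SIZE = 2          # rows per LSH band -> NUM_HASHES // BAND_SIZE bands
PRIME = (1 << 31) - 1  # Mersenne prime modulus for the universal hash family

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumes only A/C/G/T bases


def qgrams(kmer, q=Q):
    """Set of overlapping q-grams of a k-mer (assumed feature set)."""
    return {kmer[i:i + q] for i in range(len(kmer) - q + 1)}


def encode(gram):
    """Pack a q-gram into an integer using 2 bits per base."""
    value = 0
    for base in gram:
        value = (value << 2) | BASE_CODE[base]
    return value


def make_hash_params(n, seed=0):
    """Draw n (a, b) pairs for h(x) = (a*x + b) mod PRIME, with a != 0.

    With a != 0 and PRIME prime, each h is a bijection on [0, PRIME),
    so distinct encoded q-grams never share a hash value.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(kmer, params):
    """MinHash signature: the minimum hash over the k-mer's q-grams, per function."""
    codes = [encode(g) for g in qgrams(kmer)]
    return tuple(min((a * x + b) % PRIME for x in codes) for a, b in params)


def lsh_clusters(kmers, params, band_size=BAND_SIZE):
    """Group k-mers whose signatures agree on at least one band.

    Returns a set of frozensets; a k-mer may appear in several overlapping groups.
    """
    buckets = defaultdict(set)
    for kmer in kmers:
        sig = minhash(kmer, params)
        for start in range(0, len(sig), band_size):
            # The band position is part of the key, so only equal bands at the
            # same position collide.
            buckets[(start, sig[start:start + band_size])].add(kmer)
    return {frozenset(group) for group in buckets.values() if len(group) > 1}


if __name__ == "__main__":
    params = make_hash_params(NUM_HASHES, seed=42)
    reads = ["ACGTACGTAC", "CGTACGTACG", "AAAAAAAAAA", "CCCCCCCCCC"]
    for cluster in lsh_clusters(reads, params):
        print(sorted(cluster))
```

Splitting the signature into bands of r rows means that two k-mers whose q-gram sets have Jaccard similarity J share a given band with probability of roughly J^r, so larger bands make clusters stricter. Each k-mer's signature is computed independently of the others, which is the kind of parallelism a hardware pipeline can exploit. At the reported one k-mer per clock cycle and 350 MHz, the accelerator's peak rate is 350 million k-mers per second.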