Javier E. Soto, Thomas Krohmer, Cecilia Hernández, M. Figueroa
2019 22nd Euromicro Conference on Digital System Design (DSD), published 2019-08-01. DOI: 10.1109/DSD.2019.00105 (https://doi.org/10.1109/DSD.2019.00105)
Hardware Acceleration of k-Mer Clustering using Locality-Sensitive Hashing
Clustering is an essential operation in many data analysis applications. In particular, bioinformatics and genome analysis use clustering to group similar components in sequence data in order to find important patterns such as DNA motifs. In this paper, we present an algorithm that clusters DNA data using locality-sensitive hashing with MinHash to group similar subsequences in large ChIP-seq datasets. Tested on a standard mESC dataset, the algorithm builds clusters that contain subsequences with high-scoring matches to known DNA motifs. We also describe the architecture and implementation of a hardware accelerator on a Xilinx Kintex-7 XC7K325T FPGA that exploits the parallelism of the algorithm to cluster data with a throughput of one k-mer per clock cycle at 350 MHz. The accelerator achieves a speedup of 91× compared to a parallel software implementation of the algorithm on a 24-core server.
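As a rough illustration of the general technique named in the abstract, the following Python sketch clusters k-mers with MinHash signatures and banded locality-sensitive hashing. It is not the authors' algorithm or hardware design: the abstract does not say how k-mers are turned into sets, how many hash functions are used, or how buckets become clusters. The q-gram shingling, the seeded BLAKE2b hash, the `num_hashes`, `bands`, and `q` parameters, and the union-find merge are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def shingles(kmer, q=3):
    """Set of overlapping q-grams of a k-mer (assumes len(kmer) >= q)."""
    return {kmer[i:i + q] for i in range(len(kmer) - q + 1)}


def hash_with_seed(item, seed):
    """Deterministic 64-bit hash of a string, salted with an integer seed."""
    digest = hashlib.blake2b(item.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(kmer, num_hashes=8, q=3):
    """For each seed, keep the minimum hash value over all shingles."""
    items = shingles(kmer, q)
    return tuple(min(hash_with_seed(s, seed) for s in items)
                 for seed in range(num_hashes))


def cluster_kmers(kmers, num_hashes=8, bands=4, q=3):
    """Merge k-mers whose signatures agree on at least one LSH band."""
    rows = num_hashes // bands          # signature entries per band
    parent = list(range(len(kmers)))    # union-find forest over indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    buckets = {}                        # (band, band values) -> first index seen
    for idx, kmer in enumerate(kmers):
        sig = minhash_signature(kmer, num_hashes, q)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            if key in buckets:
                parent[find(idx)] = find(buckets[key])
            else:
                buckets[key] = idx

    clusters = defaultdict(list)
    for idx, kmer in enumerate(kmers):
        clusters[find(idx)].append(kmer)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy input, not data from the paper.
    for group in cluster_kmers(["ACGTACGT", "CGTACGTA", "TTTTTTTT"]):
        print(group)
```

In this toy input, `ACGTACGT` and `CGTACGTA` have the same set of 3-grams, so their signatures are identical and they land in the same cluster, while `TTTTTTTT` shares no 3-gram with them. Each k-mer's signature is computed independently of the others, which is the kind of per-item parallelism a pipelined design could exploit.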