{"title":"A power-efficient FPGA accelerator: Systolic array with cache-coherent interface for pair-HMM algorithm","authors":"Megumi Ito, Moriyoshi Ohara","doi":"10.1109/CoolChips.2016.7503681","DOIUrl":null,"url":null,"abstract":"A systolic array is known as a parallel hardware architecture applicable to a wide range of applications. Naive implementations, however, can lead to inefficient resource usage and low power performance. In this paper, we discuss two techniques for improving the hardware resource usage: flexible multi-threading and dummy data padding. The design was implemented to accelerate a pair-HMM algorithm on an FPGA with the IBM POWER8 CAPI (Coherent Accelerator Processor Interface) feature. The CAPI feature simplifies the software design for driving the FPGA accelerator. Our experimental result indicates that the implemented FPGA accelerator executing the pair-HMM algorithm achieves 33x higher power performance than a POWER8 processor chip executing the same algorithm.","PeriodicalId":273992,"journal":{"name":"2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoolChips.2016.7503681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29
Abstract
A systolic array is known as a parallel hardware architecture applicable to a wide range of applications. Naive implementations, however, can lead to inefficient resource usage and low power performance. In this paper, we discuss two techniques for improving the hardware resource usage: flexible multi-threading and dummy data padding. The design was implemented to accelerate a pair-HMM algorithm on an FPGA with the IBM POWER8 CAPI (Coherent Accelerator Processor Interface) feature. The CAPI feature simplifies the software design for driving the FPGA accelerator. Our experimental result indicates that the implemented FPGA accelerator executing the pair-HMM algorithm achieves 33x higher power performance than a POWER8 processor chip executing the same algorithm.