{"title":"基于数据通信分析的FPGA加速器自动混合互连设计","authors":"C. Pham-Quoc, Z. Al-Ars, K. Bertels","doi":"10.1109/IPDPSW.2014.21","DOIUrl":null,"url":null,"abstract":"In this paper, we introduce an automated interconnect design strategy to create an efficient custom interconnect for kernels in an FPGA-based accelerator system to accelerate their communication behavior. Our custom interconnect includes an NoC, shared local memory solution or both. Depending on the quantitative communication profiling of the application, the interconnect is built using our proposed custom interconnect design algorithm and adaptive mapping function. Experimental results show that our system achieves an overall application speed-up of 3.72× compared to software and of 2.87× compared to the baseline system - a conventional FPGA bus-based accelerator system. Moreover, our proposed system achieves 66.5% energy reduction due to the reduced execution time.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling\",\"authors\":\"C. Pham-Quoc, Z. Al-Ars, K. Bertels\",\"doi\":\"10.1109/IPDPSW.2014.21\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we introduce an automated interconnect design strategy to create an efficient custom interconnect for kernels in an FPGA-based accelerator system to accelerate their communication behavior. Our custom interconnect includes an NoC, shared local memory solution or both. Depending on the quantitative communication profiling of the application, the interconnect is built using our proposed custom interconnect design algorithm and adaptive mapping function. Experimental results show that our system achieves an overall application speed-up of 3.72× compared to software and of 2.87× compared to the baseline system - a conventional FPGA bus-based accelerator system. Moreover, our proposed system achieves 66.5% energy reduction due to the reduced execution time.\",\"PeriodicalId\":153864,\"journal\":{\"name\":\"2014 IEEE International Parallel & Distributed Processing Symposium Workshops\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE International Parallel & Distributed Processing Symposium Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2014.21\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2014.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling
In this paper, we introduce an automated interconnect design strategy to create an efficient custom interconnect for kernels in an FPGA-based accelerator system to accelerate their communication behavior. Our custom interconnect includes an NoC, shared local memory solution or both. Depending on the quantitative communication profiling of the application, the interconnect is built using our proposed custom interconnect design algorithm and adaptive mapping function. Experimental results show that our system achieves an overall application speed-up of 3.72× compared to software and of 2.87× compared to the baseline system - a conventional FPGA bus-based accelerator system. Moreover, our proposed system achieves 66.5% energy reduction due to the reduced execution time.