具有解耦内存访问的计算管道的体系结构综合

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI:10.1109/FPT.2014.7082758

Shaoyi Cheng, J. Wawrzynek

{"title":"具有解耦内存访问的计算管道的体系结构综合","authors":"Shaoyi Cheng, J. Wawrzynek","doi":"10.1109/FPT.2014.7082758","DOIUrl":null,"url":null,"abstract":"As high level synthesis (HLS) moves towards mainstream adoption among FPGA designers, it has proven to be an effective method for rapid hardware generation. However, in the context of offloading compute intensive software kernels to FPGA accelerators, current HLS tools do not always take full advantage of the hardware platforms. In this paper, we present an automatic flow to refactor and restructure processor-centric software implementations, making them better suited for FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines have much better throughput due to their efficient use of the memory bandwidth and improved tolerance to data access latency. The methodology complements existing work in high-level synthesis, easing the creation of heterogeneous systems with high performance accelerators and general purpose processors. With this approach, for a set of non-regular algorithm kernels written in C, a performance improvement of 3.3 to 9.1x is observed over direct C-to-Hardware mapping using a state-of-the-art HLS tool.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"18 1","pages":"83-90"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Architectural synthesis of computational pipelines with decoupled memory access\",\"authors\":\"Shaoyi Cheng, J. Wawrzynek\",\"doi\":\"10.1109/FPT.2014.7082758\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As high level synthesis (HLS) moves towards mainstream adoption among FPGA designers, it has proven to be an effective method for rapid hardware generation. However, in the context of offloading compute intensive software kernels to FPGA accelerators, current HLS tools do not always take full advantage of the hardware platforms. In this paper, we present an automatic flow to refactor and restructure processor-centric software implementations, making them better suited for FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines have much better throughput due to their efficient use of the memory bandwidth and improved tolerance to data access latency. The methodology complements existing work in high-level synthesis, easing the creation of heterogeneous systems with high performance accelerators and general purpose processors. With this approach, for a set of non-regular algorithm kernels written in C, a performance improvement of 3.3 to 9.1x is observed over direct C-to-Hardware mapping using a state-of-the-art HLS tool.\",\"PeriodicalId\":6877,\"journal\":{\"name\":\"2014 International Conference on Field-Programmable Technology (FPT)\",\"volume\":\"18 1\",\"pages\":\"83-90\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Field-Programmable Technology (FPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2014.7082758\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2014.7082758","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

随着高层次综合(HLS)在FPGA设计人员中逐渐成为主流，它已被证明是快速硬件生成的有效方法。然而，在将计算密集型软件内核卸载到FPGA加速器的背景下，当前的HLS工具并不总是充分利用硬件平台。在本文中，我们提出了一个自动流程来重构和重构以处理器为中心的软件实现，使它们更适合FPGA平台。该方法生成了将内存操作和数据访问与计算解耦的管道。由于有效地利用了内存带宽并提高了对数据访问延迟的容忍度，因此生成的管道具有更好的吞吐量。该方法补充了现有的高级综合工作，简化了具有高性能加速器和通用处理器的异构系统的创建。使用这种方法，对于用C编写的一组非规则算法内核，使用最先进的HLS工具进行直接C到硬件映射，可以观察到3.3到9.1倍的性能改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Architectural synthesis of computational pipelines with decoupled memory access

As high level synthesis (HLS) moves towards mainstream adoption among FPGA designers, it has proven to be an effective method for rapid hardware generation. However, in the context of offloading compute intensive software kernels to FPGA accelerators, current HLS tools do not always take full advantage of the hardware platforms. In this paper, we present an automatic flow to refactor and restructure processor-centric software implementations, making them better suited for FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines have much better throughput due to their efficient use of the memory bandwidth and improved tolerance to data access latency. The methodology complements existing work in high-level synthesis, easing the creation of heterogeneous systems with high performance accelerators and general purpose processors. With this approach, for a set of non-regular algorithm kernels written in C, a performance improvement of 3.3 to 9.1x is observed over direct C-to-Hardware mapping using a state-of-the-art HLS tool.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 International Conference on Field-Programmable Technology (FPT)

自引率

0.00%

发文量