可重构加速器的解耦访问-执行体系结构

Proceedings of the 15th ACM International Conference on Computing Frontiers Pub Date : 2018-05-08 DOI:10.1145/3203217.3203267

George Charitopoulos, Charalampos Vatsolakis, Grigorios Chrysos, D. Pnevmatikatos

{"title":"可重构加速器的解耦访问-执行体系结构","authors":"George Charitopoulos, Charalampos Vatsolakis, Grigorios Chrysos, D. Pnevmatikatos","doi":"10.1145/3203217.3203267","DOIUrl":null,"url":null,"abstract":"Mapping computational intensive applications on reconfigurable technology for acceleration requires two main implementation parts: (a) the data plane, i.e., efficient interconnected units that accelerate processing, and (b) the access-plane, i.e., efficient ways to access data and transfer them to/from the accelerator. Data plane construction is well understood and mature tools -such as High Level Synthesis (HLS)- that produce efficient reconfigurable architectures exist. The access plane, however, is more challenging: data fetching for big-data and high-performance computing applications is even more complex and time consuming than processing. Towards this end, we present DAER, a Decoupled Access-Execute architecture and framework for Reconfigurable accelerators. Our approach maps the code to be accelerated in two separate parts: (a) the fetch unit, responsible for fetching data to the accelerator and storing results back in memory, and (b) the processing unit, which processes the fetched data in a streaming way. This approach offers the user a structured and well-defined way of mapping applications on an FPGA. Additionally, it bodes well with other hardware-based optimization techniques, e.g. pipelining, custom processing and data prefetching, which hide the memory data access latency. We use the DAER framework and HLS mapping tools on five applications and show the proposed DAER framework achieves an order of magnitude performance speed-up compared to unmodified applications, and as much as 2x performance improvement compared to their optimized HLS versions. We, also, map the DAER-based architectures on HPC platforms showing the performance advantages of our approach on real world platforms.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"06 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"A decoupled access-execute architecture for reconfigurable accelerators\",\"authors\":\"George Charitopoulos, Charalampos Vatsolakis, Grigorios Chrysos, D. Pnevmatikatos\",\"doi\":\"10.1145/3203217.3203267\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Mapping computational intensive applications on reconfigurable technology for acceleration requires two main implementation parts: (a) the data plane, i.e., efficient interconnected units that accelerate processing, and (b) the access-plane, i.e., efficient ways to access data and transfer them to/from the accelerator. Data plane construction is well understood and mature tools -such as High Level Synthesis (HLS)- that produce efficient reconfigurable architectures exist. The access plane, however, is more challenging: data fetching for big-data and high-performance computing applications is even more complex and time consuming than processing. Towards this end, we present DAER, a Decoupled Access-Execute architecture and framework for Reconfigurable accelerators. Our approach maps the code to be accelerated in two separate parts: (a) the fetch unit, responsible for fetching data to the accelerator and storing results back in memory, and (b) the processing unit, which processes the fetched data in a streaming way. This approach offers the user a structured and well-defined way of mapping applications on an FPGA. Additionally, it bodes well with other hardware-based optimization techniques, e.g. pipelining, custom processing and data prefetching, which hide the memory data access latency. We use the DAER framework and HLS mapping tools on five applications and show the proposed DAER framework achieves an order of magnitude performance speed-up compared to unmodified applications, and as much as 2x performance improvement compared to their optimized HLS versions. We, also, map the DAER-based architectures on HPC platforms showing the performance advantages of our approach on real world platforms.\",\"PeriodicalId\":127096,\"journal\":{\"name\":\"Proceedings of the 15th ACM International Conference on Computing Frontiers\",\"volume\":\"06 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th ACM International Conference on Computing Frontiers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3203217.3203267\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3203217.3203267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

将计算密集型应用映射到可重构加速技术上需要两个主要的实现部分:(a)数据平面，即加速处理的高效互连单元;(b)访问平面，即访问数据并将其传输到加速器的有效方法。数据平面构造很好理解，并且存在成熟的工具——例如高级综合(High Level Synthesis, HLS)——可以产生高效的可重构架构。然而，访问平面更具挑战性:大数据和高性能计算应用的数据获取比处理更复杂，更耗时。为此，我们提出了DAER，一种用于可重构加速器的解耦访问-执行体系结构和框架。我们的方法将要加速的代码映射为两个独立的部分:(a)获取单元，负责将数据获取到加速器并将结果存储回内存，以及(b)处理单元，以流方式处理获取的数据。这种方法为用户提供了一种在FPGA上映射应用程序的结构化和定义良好的方法。此外，它预示着其他基于硬件的优化技术，如流水线，自定义处理和数据预取，隐藏内存数据访问延迟。我们在五个应用程序上使用DAER框架和HLS映射工具，并表明与未修改的应用程序相比，提议的DAER框架实现了数量级的性能加速，与优化后的HLS版本相比，性能提高了2倍。我们还将基于daer的架构映射到HPC平台上，展示了我们的方法在现实世界平台上的性能优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A decoupled access-execute architecture for reconfigurable accelerators

Mapping computational intensive applications on reconfigurable technology for acceleration requires two main implementation parts: (a) the data plane, i.e., efficient interconnected units that accelerate processing, and (b) the access-plane, i.e., efficient ways to access data and transfer them to/from the accelerator. Data plane construction is well understood and mature tools -such as High Level Synthesis (HLS)- that produce efficient reconfigurable architectures exist. The access plane, however, is more challenging: data fetching for big-data and high-performance computing applications is even more complex and time consuming than processing. Towards this end, we present DAER, a Decoupled Access-Execute architecture and framework for Reconfigurable accelerators. Our approach maps the code to be accelerated in two separate parts: (a) the fetch unit, responsible for fetching data to the accelerator and storing results back in memory, and (b) the processing unit, which processes the fetched data in a streaming way. This approach offers the user a structured and well-defined way of mapping applications on an FPGA. Additionally, it bodes well with other hardware-based optimization techniques, e.g. pipelining, custom processing and data prefetching, which hide the memory data access latency. We use the DAER framework and HLS mapping tools on five applications and show the proposed DAER framework achieves an order of magnitude performance speed-up compared to unmodified applications, and as much as 2x performance improvement compared to their optimized HLS versions. We, also, map the DAER-based architectures on HPC platforms showing the performance advantages of our approach on real world platforms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 15th ACM International Conference on Computing Frontiers

自引率

0.00%

发文量