fpga上异构处理元件的统一管理方法

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI:10.1109/FPL57034.2022.00048

S. Denholm, W. Luk

{"title":"fpga上异构处理元件的统一管理方法","authors":"S. Denholm, W. Luk","doi":"10.1109/FPL57034.2022.00048","DOIUrl":null,"url":null,"abstract":"FPGA designs do not typically include all available processing elements, e.g., LUTs, DSPs and embedded cores. Additional work is required to manage their different implementations and behaviour, which can unbalance parallel pipelines and complicate development. In this paper we introduce a novel management architecture to unify heterogeneous processing elements into compute pools. A pool formed of E processing elements, each implementing the same function, serves D parallel function calls. A call-and-response approach to computation allows for different processing element implementations, connections, latencies and non-deterministic behaviour. Our rotating scheduler automatically arbitrates access to processing elements, uses greatly simplified routing, and scales linearly with D parallel accesses to the compute pool. Processing elements can easily be added to improve performance, or removed to reduce resource use and routing, facilitating higher operating frequencies. Migrating to larger or smaller FPGAs thus comes at a known performance cost. We assess our framework with a range of neural network activation functions (ReLU, LReLU, ELU, GELU, sigmoid, swish, softplus and tanh) on the Xilinx Alveo U280.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A Unified Approach for Managing Heterogeneous Processing Elements on FPGAs\",\"authors\":\"S. Denholm, W. Luk\",\"doi\":\"10.1109/FPL57034.2022.00048\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"FPGA designs do not typically include all available processing elements, e.g., LUTs, DSPs and embedded cores. Additional work is required to manage their different implementations and behaviour, which can unbalance parallel pipelines and complicate development. In this paper we introduce a novel management architecture to unify heterogeneous processing elements into compute pools. A pool formed of E processing elements, each implementing the same function, serves D parallel function calls. A call-and-response approach to computation allows for different processing element implementations, connections, latencies and non-deterministic behaviour. Our rotating scheduler automatically arbitrates access to processing elements, uses greatly simplified routing, and scales linearly with D parallel accesses to the compute pool. Processing elements can easily be added to improve performance, or removed to reduce resource use and routing, facilitating higher operating frequencies. Migrating to larger or smaller FPGAs thus comes at a known performance cost. We assess our framework with a range of neural network activation functions (ReLU, LReLU, ELU, GELU, sigmoid, swish, softplus and tanh) on the Xilinx Alveo U280.\",\"PeriodicalId\":380116,\"journal\":{\"name\":\"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)\",\"volume\":\"135 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPL57034.2022.00048\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPL57034.2022.00048","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

FPGA设计通常不包括所有可用的处理元件，例如lut、dsp和嵌入式内核。需要额外的工作来管理它们不同的实现和行为，这可能会使并行管道不平衡并使开发复杂化。本文引入了一种新的管理体系结构，将异构处理元素统一到计算池中。由E个处理元素组成的池，每个处理元素实现相同的函数，提供D个并行函数调用。调用-响应计算方法允许不同的处理元素实现、连接、延迟和不确定性行为。我们的旋转调度器自动仲裁对处理元素的访问，使用大大简化的路由，并随着对计算池的D并行访问而线性扩展。可以很容易地添加处理元素以提高性能，或者删除处理元素以减少资源使用和路由，从而促进更高的操作频率。因此，迁移到更大或更小的fpga需要付出已知的性能代价。我们在Xilinx Alveo U280上使用一系列神经网络激活函数(ReLU, LReLU, ELU, GELU, sigmoid, swish, softplus和tanh)来评估我们的框架。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Unified Approach for Managing Heterogeneous Processing Elements on FPGAs

FPGA designs do not typically include all available processing elements, e.g., LUTs, DSPs and embedded cores. Additional work is required to manage their different implementations and behaviour, which can unbalance parallel pipelines and complicate development. In this paper we introduce a novel management architecture to unify heterogeneous processing elements into compute pools. A pool formed of E processing elements, each implementing the same function, serves D parallel function calls. A call-and-response approach to computation allows for different processing element implementations, connections, latencies and non-deterministic behaviour. Our rotating scheduler automatically arbitrates access to processing elements, uses greatly simplified routing, and scales linearly with D parallel accesses to the compute pool. Processing elements can easily be added to improve performance, or removed to reduce resource use and routing, facilitating higher operating frequencies. Migrating to larger or smaller FPGAs thus comes at a known performance cost. We assess our framework with a range of neural network activation functions (ReLU, LReLU, ELU, GELU, sigmoid, swish, softplus and tanh) on the Xilinx Alveo U280.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)

自引率

0.00%

发文量