Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles (Page 4)

FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689110

Siddhartha, Nachiket Kapre

Abstract: FPGA acceleration of large irregular dataflow graphs is often limited by the long-tail distribution of parallelism on fine-grained overlay dataflow architectures. In this paper, we show how to overcome these limitations by exploiting criticality information along compute paths, both statically during graph pre-processing and dynamically at runtime. We statically reassociate the high-fanin dataflow chains by providing faster routes for late-arriving inputs. We also perform a fanout decomposition and selective node replication in order to distribute serialization costs across multiple PEs. Additionally, we modify the dataflow firing rule in hardware to prefer critical nodes when multiple nodes are ready for dataflow evaluation. Effectively, these transformations reduce the length of the tail in the parallelism profile for these large-scale graphs. Across a range of dataflow benchmarks extracted from Sparse LU factorization, we demonstrate up to 2.5× (mean 1.21×) improvement when using the static pre-processing alone, a 2.4× (mean 1.17×) improvement when using only dynamic optimizations and an overall 2.9× (mean 1.39×) improvement when both static and dynamic optimizations are enabled. These improvements are on top of 3-10× speedups over CPU implementations without our transformation enabled.
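The effect of a criticality-aware firing rule can be illustrated with a small software model. The sketch below is not the authors' hardware: it assumes unit-latency nodes, a single shared pool of ready nodes, and a criticality measure defined as the longest path to a sink. The function names `criticality` and `schedule` and the example graph are hypothetical.

```python
def criticality(succ):
    """Longest path (counted in nodes) from each node to a sink of the DAG."""
    memo = {}

    def depth(n):
        if n not in memo:
            memo[n] = 1 + max((depth(s) for s in succ[n]), default=0)
        return memo[n]

    return {n: depth(n) for n in succ}


def schedule(succ, num_pes, prefer_critical):
    """Unit-latency list scheduling on num_pes PEs; returns the cycle count."""
    crit = criticality(succ)
    pending = {n: 0 for n in succ}          # number of inputs not yet produced
    for n in succ:
        for s in succ[n]:
            pending[s] += 1
    ready = [n for n in succ if pending[n] == 0]
    cycles = 0
    while ready:
        if prefer_critical:
            # Modified firing rule (assumed form): most critical ready nodes first.
            ready.sort(key=lambda n: -crit[n])
        fired, ready = ready[:num_pes], ready[num_pes:]
        for n in fired:
            for s in succ[n]:
                pending[s] -= 1
                if pending[s] == 0:
                    ready.append(s)         # becomes ready for the next cycle
        cycles += 1
    return cycles


# Four independent nodes plus one serial chain a -> b -> c -> d.
graph = {"x1": [], "x2": [], "x3": [], "x4": [],
         "a": ["b"], "b": ["c"], "c": ["d"], "d": []}
fifo_cycles = schedule(graph, num_pes=2, prefer_critical=False)
critical_cycles = schedule(graph, num_pes=2, prefer_critical=True)
```

On this toy graph, first-come-first-served firing delays the start of the chain and leaves a serial tail, while preferring critical nodes overlaps the chain with the independent work.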

Citations: 0

Efficient Generation of Energy and Performance Pareto Front for FPGA Designs (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689133

S. Kuppannagari, V. Prasanna

Abstract: Analysis of trade-offs between energy efficiency and latency is essential to generate designs complying with a given set of constraints. Improvements in FPGA technologies offer myriad choices for power and performance optimizations. Various algorithm-intrinsic parameters also affect these objectives. The design space is compounded by the available choices. This requires efficient techniques to quickly explore the design space. Current techniques perform gate/RTL-level or functional-level power modeling, which is slow and hence not scalable. In this work we perform efficient design space exploration using a high-level performance model. We develop a semi-automatic design framework to generate energy efficiency and latency trade-offs. The framework develops a performance model given a high-level specification of a design with minimal user assistance. It then explores the entire design space to generate the dominating designs with respect to energy efficiency and latency metrics. We illustrate the framework using a convolutional neural network, which has gained significance due to its application in deep learning. We simulate a few designs from the dominating set and show that the performance estimates for the dominating designs are close to the simulated results. We also show that our framework explores 6000 design points per minute on a commodity platform such as a Dell workstation, as opposed to state-of-the-art techniques, which explore 50 to 60 design points per minute.
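Extracting the "dominating designs" from a set of evaluated design points amounts to computing a Pareto front. A minimal sketch follows, assuming each design is summarized by an energy-efficiency value (higher is better) and a latency value (lower is better). The function name `pareto_front`, the tuple layout, and the sample numbers are illustrative assumptions, not part of the framework described above.

```python
def pareto_front(designs):
    """Return the non-dominated designs.

    designs: list of (name, energy_efficiency, latency) tuples, where higher
    efficiency and lower latency are better.
    """
    # Sweep in order of increasing latency; among equal latencies, the most
    # efficient design comes first.
    ordered = sorted(designs, key=lambda d: (d[2], -d[1]))
    front, best_eff = [], float("-inf")
    for name, eff, lat in ordered:
        # Every design seen so far has latency <= lat, so this design is
        # non-dominated only if it is strictly more efficient than all of them.
        if eff > best_eff:
            front.append((name, eff, lat))
            best_eff = eff
    return front


# Made-up design points for illustration only.
designs = [("A", 10.0, 5.0), ("B", 8.0, 3.0), ("C", 7.0, 4.0),
           ("D", 12.0, 9.0), ("E", 12.0, 7.0)]
front_names = [name for name, _, _ in pareto_front(designs)]
```

The sweep costs one sort plus a linear pass, which matters when thousands of design points are evaluated per minute.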

Citations: 0

Session details: Technical Session 5: Processors and Accelerators

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251655

Zhiru Zhang

Citations: 0

Enhancing Hardware Design Flows with MyHDL

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689092

Keerthan Jaic, M. C. Smith

Abstract: MyHDL is a Python-based HDL that harnesses the power and versatility of Python for hardware development. MyHDL has excellent simulation capabilities and also allows for conversion to Verilog and VHDL, so developers can enter a conventional design flow as desired. Verilog and VHDL are used extensively, particularly because most synthesis tools only support these two languages. However, they are simply outdated; poor parameterization limits high-level design, and modern abstraction features such as classes are missing. On the other hand, MyHDL has great support for parameterization. However, MyHDL did not have support for converting code that used attributes, so abstraction was limited. We extended MyHDL support to include attribute conversion. We explored methods for abstracting interfaces between components and hardware-software interfaces. The result is increased code reuse, simplified module declaration, and reduced boilerplate. These extensions streamline the path between design, simulation, and final synthesizable hardware, thus reducing limitations on high-level development and making MyHDL an even more powerful design environment for rapid hardware prototyping.

Citations: 10

Towards More Efficient Logic Blocks By Exploiting Biconditional Expansion (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689100

P. Gaillardon, Gain Kim, Xifan Tang, L. Amarù, G. Micheli

Abstract: Nowadays, Field Programmable Gate Arrays (FPGAs) exploit Look-Up Tables (LUTs) to generate logic functions. A K-input LUT can implement any Boolean function of K inputs. Thanks to this flexibility, LUTs have remained conceptually unchanged in FPGAs; only the number of inputs has increased over time. Unfortunately, the flexibility does not come for free, and LUTs have non-negligible costs in both circuit-level performance (large number of memories, area or delay penalties) and logic-level capabilities (limited fan-out). Here, we propose an FPGA fabric based on two novel logic blocks. First, we introduce a new LUT design showing reduced power consumption with no sacrifice in logic flexibility. Then, we present a block suited to arithmetic functions but preserving enough versatility to implement general logic functions. The two blocks are supported by a recently introduced logic representation called Biconditional Binary Decision Diagrams (BBDDs). Using architectural-level benchmarking, we show that an FPGA architecture exploiting the novel blocks performs significantly better than current state-of-the-art FPGA architectures at the 40 nm technology node over a large set of test circuits. While reducing the power consumption of the MCNC big20 benchmarks by 29%, the proposed architecture is able to efficiently implement arithmetic circuits as compared to its traditional LUT-based FPGA counterpart. For instance, a 256-bit adder can be realized with a 43% gain in area×delay product. Considering large general and arithmetic logic benchmarks, we observe, on average, 4%, 3% and 10% improvements in area, delay and power respectively.
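The abstract does not spell out the biconditional expansion itself. As an assumption drawn from the general BBDD literature rather than from this text, it is commonly written as f(v, w, ...) = (v XOR w)·f(NOT w, w, ...) + (v XNOR w)·f(w, w, ...): the function branches on whether two variables are equal instead of on the value of one variable. The sketch below checks that identity exhaustively for small functions. The helper names are hypothetical.

```python
from itertools import product


def biconditional_expand(f, v, w, *rest):
    """Evaluate f through the two cofactors selected by (v != w)."""
    differ = v != w
    on_differ = f(not w, w, *rest)      # branch taken when v XOR w is true
    on_equal = f(w, w, *rest)           # branch taken when v XNOR w is true
    return bool((differ and on_differ) or ((not differ) and on_equal))


def identity_holds(f, arity):
    """Compare f with its biconditional expansion on every input pattern."""
    return all(bool(f(*bits)) == biconditional_expand(f, *bits)
               for bits in product([False, True], repeat=arity))


def majority(a, b, c):
    return (a and b) or (a and c) or (b and c)


def xor3(a, b, c):
    return (a != b) != c


majority_ok = identity_holds(majority, 3)
xor_ok = identity_holds(xor3, 3)
```

Comparison-driven branching of this kind maps naturally onto XOR-rich arithmetic, which is consistent with the abstract's emphasis on adders.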

Citations: 0

An Automatic Design Flow for Hybrid Parallel Computing on MPSoCs (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689141

Hongyuan Ding, Miaoqing Huang

Citations: 2

Automatic Time-Redundancy Transformation for Fault-Tolerant Circuits

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689058

D. Burlyaev, Pascal Fradet, A. Girault

Abstract: We present a novel logic-level circuit transformation technique for automatic insertion of fault-tolerance properties. Our transformation uses double-time redundancy coupled with micro-checkpointing, rollback and a speedup mode. To the best of our knowledge, our solution is the only technologically independent scheme capable of correcting the multiple bit-flips caused by a Single-Event Transient (SET) with double-time redundancy. The approach allows soft-error masking (within the considered fault model) and keeps the same input/output behavior regardless of error occurrences. Our technique trades off circuit throughput for a small hardware overhead. Experimental results on the ITC'99 benchmark suite indicate that the benefits of our method grow with the combinational size of the circuit. The hardware overhead is 2.7 to 6.1 times smaller than full Triple Modular Redundancy (TMR) with double loss in throughput. We do not consider configuration memory corruption, and our approach is readily applicable to Flash-based FPGAs. Our method does not require any specific hardware support and is an interesting alternative to TMR for logic-intensive designs.
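The combination of double execution, checkpointing and rollback can be pictured with a software analogy. The sketch below is only a toy model and not the authors' circuit transformation: it assumes each step is computed twice from a saved state, that a mismatch triggers a rollback and a recomputation, and that a fault flips one output bit of a single execution. The names `run_with_time_redundancy`, `accumulate` and the `fault` parameter are hypothetical.

```python
def run_with_time_redundancy(step, state, inputs, fault=None):
    """Execute step(state, x) twice per input and roll back on a mismatch.

    fault: optional (input_index, attempt) pair; the output of that single
    execution has its lowest bit flipped to model a transient error.
    Returns (outputs, total_executions).
    """
    outputs, executions = [], 0
    for i, x in enumerate(inputs):
        checkpoint = state                  # state saved before the step
        attempt = 0
        while True:
            results = []
            for _ in range(2):              # double-time redundancy
                new_state, out = step(checkpoint, x)
                if fault == (i, attempt):
                    out ^= 1                # injected transient bit flip
                results.append((new_state, out))
                attempt += 1
                executions += 1
            if results[0] == results[1]:    # both executions agree: commit
                state, out = results[0]
                outputs.append(out)
                break
            # Mismatch: discard both results and redo from the checkpoint.
    return outputs, executions


def accumulate(state, x):
    total = state + x
    return total, total


clean = run_with_time_redundancy(accumulate, 0, [1, 2, 3])
faulty = run_with_time_redundancy(accumulate, 0, [1, 2, 3], fault=(1, 0))
```

In this model the fault costs two extra executions but leaves the output sequence unchanged, which mirrors the abstract's trade of throughput for masked errors.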

Citations: 6

Take the Highway: Design for Embedded NoCs on FPGAs

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689074

M. Abdelfattah, Andrew Bitar, Vaughn Betz

Citations: 29

Energy-Efficient Discrete Signal Processing with Field Programmable Analog Arrays (FPAAs)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689078

Yu Bai, Mingjie Lin

Abstract: Large-scale field programmable analog array (FPAA) devices have made analog and analog-digital signal processing techniques accessible to a much wider community. However, largely due to its severe resource constraints, high noise sensitivity, and enormous design space, reconfigurable analog computing remains a niche in the DSP application space. In this paper, we develop a probabilistic-based methodology for designing and implementing analog computing engines that specifically target energy-efficient signal processing systems. We will first demonstrate how to decompose a given DSP application into various functional modules within the framework of probabilistic-based processing. Furthermore, we will show how these individual functional modules can be easily mapped to the limited selection of analog blocks found in a commercially available FPAA device: the PSoC chip platform from Cypress. To keep our study concrete, our implementation example focuses on the 1-D convolution module, a fundamental algorithmic building block in many applications of computer vision and artificial intelligence. In the end, we construct a complete image processing system based on the PSoC chip platform, and use the application of image key point extraction to demonstrate that our proposed approach to reconfigurable analog computing has considerable advantages in hardware usage, energy efficiency, and computing robustness over traditional DSP approaches.
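For orientation, the 1-D convolution that the abstract names as its building block is y[n] = Σ_k x[k]·h[n − k]. The sketch below is a plain digital reference of that operation, assuming finite sequences and "full" output length. It does not model the probabilistic analog implementation, and the function name `convolve_1d` is hypothetical.

```python
def convolve_1d(signal, kernel):
    """Full discrete convolution of two non-empty finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k             # x[i] * h[j] contributes to y[i + j]
    return out


# A two-tap summing kernel applied to a short ramp.
smoothed = convolve_1d([1, 2, 3], [1, 1])
```

A reference like this is the kind of baseline against which an analog mapping's outputs could be compared.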

Citations: 4

Session details: Technical Session 6: High-level and System-level Synthesis

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251656

B. Hutchings

Citations: 0