Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays最新文献

筛选
英文 中文
FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations (Abstract Only) 基于临界感知数据流优化的FPGA非规则迭代计算加速(仅摘要)
Siddhartha, Nachiket Kapre
{"title":"FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations (Abstract Only)","authors":"Siddhartha, Nachiket Kapre","doi":"10.1145/2684746.2689110","DOIUrl":"https://doi.org/10.1145/2684746.2689110","url":null,"abstract":"FPGA acceleration of large irregular dataflow graphs is often limited by the long tail distribution of parallelism on fine-grained overlay dataflow architectures. In this paper, we show how to overcome these limitations by exploiting criticality information along compute paths; both statically during graph pre-processing and dynamically at runtime. We statically reassociate the high-fanin dataflow chains by providing faster routes for late arriving inputs. We also perform a fanout decomposition and selective node replication in order to distribute serialization costs across multiple PEs. Additionally, we modify the dataflow firing rule in hardware to prefer critical nodes when multiple nodes are ready for dataflow evaluation. Effectively these transformations reduce the length of the tail in the parallelism profile for these large-scale graphs. Across a range of dataflow benchmarks extracted from Sparse LU factorization, we demonstrate up to 2.5× (mean 1.21×) improvement when using the static pre-processing alone, a 2.4× (mean 1.17×) improvement when using only dynamic optimizations and an overall 2.9× (mean 1.39×) improvement when both static and dynamic optimizations are enabled. These improvements are on top of 3--10× speedups over CPU implementations without our transformation enabled.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121606465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient Generation of Energy and Performance Pareto Front for FPGA Designs (Abstract Only) 用于FPGA设计的高效能量生成和性能Pareto Front(仅摘要)
S. Kuppannagari, V. Prasanna
{"title":"Efficient Generation of Energy and Performance Pareto Front for FPGA Designs (Abstract Only)","authors":"S. Kuppannagari, V. Prasanna","doi":"10.1145/2684746.2689133","DOIUrl":"https://doi.org/10.1145/2684746.2689133","url":null,"abstract":"Analysis of trade-offs between energy efficiency and latency is essential to generate designs complying with a given set of constraints. Improvements in FPGA technologies offer a myriad choices for power and performance optimizations. Various algorithm intrinsic parameters also affect these objectives. The design space is compounded by the available choices. This requires efficient techniques to quickly explore the design space. Current techniques perform Gate/RTL level or functional level power modeling which are slow and hence not scalable. In this work we perform efficient design space exploration using a high level performance model. We develop a semi-automatic design framework to generate energy efficiency and latency trade-offs. The framework develops a performance model given a high level specification of a design with minimal user assistance. It then explores the entire design space to generate the dominating designs with respect to energy efficiency and latency metrics. We illustrate the framework using convolutional neural network which gained significance due to its application in deep learning. We simulate a few designs from the dominating set and show that the performance estimation for the dominating designs are close to the simulated results. We also show that our framework explores 6000 design points per minute on a commodity platform such as Dell workstation as opposed to state-of-the-art techniques which explore at 50 to 60 design points per minute.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129186391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Session details: Technical Session 5: Processors and Accelerators 技术会议5:处理器和加速器
Zhiru Zhang
{"title":"Session details: Technical Session 5: Processors and Accelerators","authors":"Zhiru Zhang","doi":"10.1145/3251655","DOIUrl":"https://doi.org/10.1145/3251655","url":null,"abstract":"","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123858967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing Hardware Design Flows with MyHDL 用MyHDL增强硬件设计流程
Keerthan Jaic, M. C. Smith
{"title":"Enhancing Hardware Design Flows with MyHDL","authors":"Keerthan Jaic, M. C. Smith","doi":"10.1145/2684746.2689092","DOIUrl":"https://doi.org/10.1145/2684746.2689092","url":null,"abstract":"MyHDL is a Python based HDL that harnesses the power and versatility of Python for hardware development. MyHDL has excellent simulation capabilities and also allows for conversion to Verilog and VHDL, so developers can enter a conventional design flow as desired. Verilog and VHDL are used extensively, particularly because most synthesis tools only support these two languages. However, they are simply outdated; poor parameterization limits high level design and modern abstraction features such as classes are missing. On the other hand, MyHDL has great support for parameterization. However, MyHDL did not have support for converting code that used attributes, so abstraction was limited. We extended MyHDL support to include attribute conversion. We explored methods for abstracting interfaces between components and hardware-software interfaces. The result is increased code reuse, simplified module declaration, and reduced boilerplate. These extensions result in streamlining between design, simulation, and a final synthesizable hardware, thus reducing limitations on high level development and making MyHDL an even more powerful design environment for rapid hardware prototyping.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123956473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Towards More Efficient Logic Blocks By Exploiting Biconditional Expansion (Abstract Only) 利用双条件展开实现更高效的逻辑块(仅抽象)
P. Gaillardon, Gain Kim, Xifan Tang, L. Amarù, G. Micheli
{"title":"Towards More Efficient Logic Blocks By Exploiting Biconditional Expansion (Abstract Only)","authors":"P. Gaillardon, Gain Kim, Xifan Tang, L. Amarù, G. Micheli","doi":"10.1145/2684746.2689100","DOIUrl":"https://doi.org/10.1145/2684746.2689100","url":null,"abstract":"Nowadays, Field Programmable Gate Arrays (FPGA) exploit Look-Up Tables (LUTs) to generate logic functions. A K-input LUT can implement any Boolean functions with K inputs. Thanks to this flexibility, LUTs remained conceptually unchanged in FPGAs, only the number of inputs increased in time. Unfortunately, the flexibility does not come for free and LUTs have non-negligible costs in both circuit-level performances (large number of memories, area or delay penalties) and logic-level capabilities (limited fan-out). Here, we propose an FPGA fabric based on two novel logic blocks. First, we introduce a new LUT design showing reduced power consumption with no sacrifice in the logic flexibility. Then, we present a block suited to arithmetic functions but preserving enough versatility to implement general logic functions. The two blocks are supported by a recently introduced logic representation called Biconditional Binary Decision Diagrams (BBDDs). Using architectural-level benchmarking, we showed that an FPGA architecture exploiting the novel blocks performs significantly better than current state-of-the-art FPGA architectures at 40nm technological node over a large set of test circuits. While reducing the power consumption of MCNC big20 benchmarks by 29%, the proposed architecture is able to efficiently implement arithmetic circuits as compared to its traditional LUT-based FPGA counterpart. For instance, a 256-bit adder can be realized with a 43% gain in area×delay product. While considering large general and arithmetic logic benchmarks, we observe, on average, 4%, 3% and 10% improvements in area, delay and power respectively.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127561516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An Automatic Design Flow for Hybrid Parallel Computing on MPSoCs (Abstract Only) mpsoc混合并行计算的自动设计流程(仅摘要)
Hongyuan Ding, Miaoqing Huang
{"title":"An Automatic Design Flow for Hybrid Parallel Computing on MPSoCs (Abstract Only)","authors":"Hongyuan Ding, Miaoqing Huang","doi":"10.1145/2684746.2689141","DOIUrl":"https://doi.org/10.1145/2684746.2689141","url":null,"abstract":"State-of-the-art high-level synthesis (HLS) tools are able to lower the threshold for designers to exploit performance benefits of hardware accelerators. However, it is still a challenge to achieve parallelism on a hybrid multiprocessor system-on-chip (MPSoC). In this work, we present an automatic hybrid design flow. The hybrid hardware platform as well as both the hardware and software kernels can be generated through this flow. In addition, a hybrid OpenCL-like programming model is proposed to combine software and hardware kernels running on the unified hardware platform. Our results show that our automatic design flow can not only significantly minimize the development time, but also gain about 11 times speedup compared with pure software parallel implementation for a matrix multiplication benchmark.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125907516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Automatic Time-Redundancy Transformation for Fault-Tolerant Circuits 容错电路的自动时间冗余变换
D. Burlyaev, Pascal Fradet, A. Girault
{"title":"Automatic Time-Redundancy Transformation for Fault-Tolerant Circuits","authors":"D. Burlyaev, Pascal Fradet, A. Girault","doi":"10.1145/2684746.2689058","DOIUrl":"https://doi.org/10.1145/2684746.2689058","url":null,"abstract":"We present a novel logic-level circuit transformation technique for automatic insertion of fault-tolerance properties. Our transformation uses double-time redundancy coupled with micro-checkpointing, rollback and a speedup mode. To the best of our knowledge, our solution is the only technologically independent scheme capable to correct the multiple bit-flips caused by a Single-Event Transient (SET) with double-time redundancy. The approach allows soft-error masking (within the considered fault-model) and keeps the same input/output behavior regardless error occurrences. Our technique trades-off the circuit throughput for a small hardware overhead. Experimental results on the ITC'99 benchmark suite indicate that the benefits of our methods grow with the combinational size of the circuit. The hardware overhead is 2.7 to 6.1 times smaller than full Triple Modular Redundancy (TMR) with double loss in throughput. We do not consider configuration memory corruption and our approach is readily applicable to Flash-based FPGAs. Our method does not require any specific hardware support and is an interesting alternative to TMR for logic-intensive designs.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129863895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Take the Highway: Design for Embedded NoCs on FPGAs 以高速公路为例:fpga上嵌入式noc的设计
M. Abdelfattah, Andrew Bitar, Vaughn Betz
{"title":"Take the Highway: Design for Embedded NoCs on FPGAs","authors":"M. Abdelfattah, Andrew Bitar, Vaughn Betz","doi":"10.1145/2684746.2689074","DOIUrl":"https://doi.org/10.1145/2684746.2689074","url":null,"abstract":"We explore the addition of a fast embedded network-on-chip (NoC) to augment the FPGA's existing wires and switches, and help interconnect large applications. A flexible interface between the FPGA fabric and the embedded NoC allows modules of varying widths and frequencies to transport data over the NoC. We study both latency-insensitive and latency-sensitive design styles and present the constraints for implementing each type of communication on the embedded NoC. Our application case study with image compression shows that an embedded NoC improves frequency by 10-80%, reduces utilization of scarce long wires by 40% and makes design easier and more predictable. Additionally, we leverage the embedded NoC in creating a programmable Ethernet switch that can support up to 819 Gb/s on FPGAs.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
Energy-Efficient Discrete Signal Processing with Field Programmable Analog Arrays (FPAAs) 基于现场可编程模拟阵列(FPAAs)的节能离散信号处理
Yu Bai, Mingjie Lin
{"title":"Energy-Efficient Discrete Signal Processing with Field Programmable Analog Arrays (FPAAs)","authors":"Yu Bai, Mingjie Lin","doi":"10.1145/2684746.2689078","DOIUrl":"https://doi.org/10.1145/2684746.2689078","url":null,"abstract":"Large-scale field programmable analog array (FPAA) devices have made analog and analog-digital signal processing techniques accessible to a much wider community. However, largely due to its severe resource constraints, high noise sensitivity, and enormous design space, reconfigurable analog computing remains a niche in the DSP application space. In this paper, we develop a probabilistic-based methodology for designing and implementing the analog computing engines that specifically target at energy-efficient signal processing systems. We will first demonstrate how to decompose a given DSP application into various functional modules within the framework of probabilistic-based processing. Furthermore, we will show how these individual functional modules can be easily mapped to the limited selection of analog blocks found in an commercially available FPAA device: the PSoC chip platform from Cypress. To keep our study concrete, our implementation example focuses on the 1-D convolution module, a fundamental algorithmic building block in many applications of computer vision and artificial intelligence. In the end, we construct a complete image processing system based on the PSoC chip platform, and use the application of image key point extraction to demonstrate that our proposed approach to reconfigurable analog computing has considerable advantages in hardware usage, energy efficiency, and computing robustness over the traditional DSP approaches.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114884649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Session details: Technical Session 6: High-level and System-level Synthesis 会议详情:技术会议6:高级别和系统级综合
B. Hutchings
{"title":"Session details: Technical Session 6: High-level and System-level Synthesis","authors":"B. Hutchings","doi":"10.1145/3251656","DOIUrl":"https://doi.org/10.1145/3251656","url":null,"abstract":"","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133796873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信