2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Publications

Reducing Overheads for Fault-Tolerant Datapaths with Dynamic Partial Reconfiguration
James J. Davis, P. Cheung
{"title":"Reducing Overheads for Fault-Tolerant Datapaths with Dynamic Partial Reconfiguration","authors":"James J. Davis, P. Cheung","doi":"10.1109/FCCM.2014.36","DOIUrl":"https://doi.org/10.1109/FCCM.2014.36","url":null,"abstract":"As process scaling and transistor count inflation continue, silicon chips are becoming increasingly susceptible to faults. Although FPGAs are particularly vulnerable to these effects, their runtime reconfigurability offers unique opportunities for fault tolerance. This work presents an application combining algorithmic-level error detection with dynamic partial reconfiguration (DPR) to allow faults manifested within its datapath at runtime to be circumvented at low cost.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121168666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
System-Level Retiming and Pipelining
Girish Venkataramani, Y. Gu
{"title":"System-Level Retiming and Pipelining","authors":"Girish Venkataramani, Y. Gu","doi":"10.1109/FCCM.2014.30","DOIUrl":"https://doi.org/10.1109/FCCM.2014.30","url":null,"abstract":"In this paper, we examine retiming and pipelining in the context of system-level optimization techniques. Our main contributions are: (a) functionally equivalent retiming and delay balancing as necessary techniques for pipelining and retiming system-level graphs while maintaining numerical fidelity, and (b) clock-rate pipelining, as a new technique that leverages the knowledge of multi-rate design spec to pipeline multi-cycle paths. All these techniques have been implemented within HDL Coder™, a tool that generates synthesizable HDL code from Simulink® and MATLAB®.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115894093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Rapid Post-Map Insertion of Embedded Logic Analyzers for Xilinx FPGAs
B. Hutchings, J. Keeley
{"title":"Rapid Post-Map Insertion of Embedded Logic Analyzers for Xilinx FPGAs","authors":"B. Hutchings, J. Keeley","doi":"10.1109/FCCM.2014.29","DOIUrl":"https://doi.org/10.1109/FCCM.2014.29","url":null,"abstract":"A rapid post-map insertion of an embedded logic analyzer is discussed. The proposed technique makes use of otherwise unused resources in an already-mapped circuit and does not disturb the original placement and routing of the circuit. Using this technique, designers can add debugging circuitry to existing circuits and quickly modify the set of observed signals in just a few minutes instead of waiting for a recompile of their circuit. All tests were performed on a Xilinx Virtex-5 FPGA.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123699603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
FPGA Architecture Enhancements to Support Heterogeneous Partially Reconfigurable Regions
Christophe Huriaux, O. Sentieys, R. Tessier
{"title":"FPGA Architecture Enhancements to Support Heterogeneous Partially Reconfigurable Regions","authors":"Christophe Huriaux, O. Sentieys, R. Tessier","doi":"10.1109/FCCM.2014.17","DOIUrl":"https://doi.org/10.1109/FCCM.2014.17","url":null,"abstract":"In this work the authors develop an FPGA architecture which allows for the placement of a partial FPGA design on the logic fabric even if the relative placement of heterogeneous blocks within the target region is not identical to the placement used to generate the bitstream for the partial design. This work has been conducted in the context of the European FP7 FlexTiles project in which a dynamically reconfigurable logic fabric is embedded in a 3-D stacked chip along with a manycore architecture. The reconfigurable logic fabric is used to load hardware-accelerated functions whose use is scheduled at run time. All communication between the fabric and manycore is made via dedicated I/O interface blocks in the fabric. This communication configuration increases the need for a flexible architecture which can handle the placement of a single application bitstream in multiple locations on the logic fabric.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132233754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Performance Comparison between Multi-FPGA Prototyping Platforms: Hardwired Off-the-Shelf, Cabling, and Custom
Qingshan Tang, M. Tuna, H. Mehrez
{"title":"Performance Comparison between Multi-FPGA Prototyping Platforms: Hardwired Off-the-Shelf, Cabling, and Custom","authors":"Qingshan Tang, M. Tuna, H. Mehrez","doi":"10.1109/FCCM.2014.44","DOIUrl":"https://doi.org/10.1109/FCCM.2014.44","url":null,"abstract":"We can classify multi-FPGA prototyping platforms in three categories: hardwired off-the-shelf, cabling and custom. Three points are developed in this paper. Firstly, an automatic design flow is proposed to generate a cabling platform and a custom platform for a given design. Then, the optimal width of cables for a cabling multi-FPGA platform is explored. Finally, the performances of these three multi-FPGA platforms are compared. The results show that the cabling platform achieves up to 82% gain in performance, and the custom platform achieves up to 100%, compared to the hardwired off-the-shelf platform. The custom platform achieves up to 20% gain in performance over the cabling platform. Therefore the results show that, apart from some stringent constraints (such as deployment cost or specific frequency needed), the relatively new cabling paradigm with the proposed automatic, inter-FPGA tracks distribution tool, offers an attractive alternative compared to the two other platforms.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130598055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization
Siddhartha, Nachiket Kapre
{"title":"Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization","authors":"Siddhartha, Nachiket Kapre","doi":"10.1109/FCCM.2014.26","DOIUrl":"https://doi.org/10.1109/FCCM.2014.26","url":null,"abstract":"Substitution, and reassociation of irregular sparse LU factorization can deliver up to 31% additional speedup over an existing state-of-the-art parallel FPGA implementation where further parallelization was deemed virtually impossible. The state-of-the-art implementation is already capable of delivering 3× acceleration over CPU-based sparse LU solvers. Sparse LU factorization is a well-known computational bottleneck in many existing scientific and engineering applications and is notoriously hard to parallelize due to inherent sequential dependencies in the computation graph. In this paper, we show how to break these alleged inherent dependencies using depth-limited substitution, and reassociation of the resulting computation. This is a work-parallelism tradeoff that is well-suited for implementation on FPGA-based token dataflow architectures. Such compute organizations are capable of fast parallel processing of large irregular graphs extracted from the sparse LU computation. We manage and control the growth in additional work due to substitution through careful selection of substitution depth. We exploit associativity in the generated graphs to restructure long compute chains into reduction trees.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122602143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories
Hsin-Jung Yang, Kermin Fleming, Michael Adler, J. Emer
{"title":"LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories","authors":"Hsin-Jung Yang, Kermin Fleming, Michael Adler, J. Emer","doi":"10.1109/FCCM.2014.43","DOIUrl":"https://doi.org/10.1109/FCCM.2014.43","url":null,"abstract":"Parallel programming has been widely used in many scientific and technical areas to solve large problems. While general-purpose processors have rich infrastructure to support parallel programming on shared memory, such as coherent caches and synchronization libraries, parallel programming infrastructure for FPGAs is limited. Thus, development of FPGA-based parallel algorithms remains difficult. In this work, we seek to simplify parallel programming on FPGAs. We provide a set of easy-to-use declarative primitives to maintain coherency and consistency of accesses to shared memory resources. We propose a shared-memory service that automatically manages coherent caches on multiple FPGAs. Experimental results of a 2-dimensional heat transfer equation show that the shared memory service with our distributed coherent caches outperforms a centralized cache by 2.6x. To handle synchronization, we provide new lock and barrier primitives that leverage native FPGA communication capabilities and outperform traditional through-memory primitives by 1.8x.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125130850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Accurate and Efficient Three Level Design Space Exploration Based on Constraints Satisfaction Optimization Problem Solver
Shuo Li, A. Hemani
{"title":"Accurate and Efficient Three Level Design Space Exploration Based on Constraints Satisfaction Optimization Problem Solver","authors":"Shuo Li, A. Hemani","doi":"10.1109/FCCM.2014.56","DOIUrl":"https://doi.org/10.1109/FCCM.2014.56","url":null,"abstract":"In this paper, we propose an efficient and effective three-level Design Space Exploration (DSE) method for mapping a system consisting of a number of DSP functions onto an RTL or lower level model using constraint programming methodology. The design space has three dimensions: a) function execution schedule (when the functions should execute), b) function implementation assignment (how the execution of functions is assigned to physical kernels) and c) implementation architecture (how many arithmetic units are deployed in each kernel). The DSE has been formulated as a Constraints Satisfaction Optimization Problem (CSOP) and solved by the constraint programming solver in Google's OR-Tools.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129835008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework
Jonathon Pendlum, M. Leeser, K. Chowdhury
{"title":"Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework","authors":"Jonathon Pendlum, M. Leeser, K. Chowdhury","doi":"10.1109/FCCM.2014.13","DOIUrl":"https://doi.org/10.1109/FCCM.2014.13","url":null,"abstract":"Both Xilinx and Altera have released SoCs that tightly couple programmable logic with a dual core Cortex A9 ARM processor. These SoCs show promise in accelerating applications that exploit both the FPGA's parallel processing architecture and the CPU's sequential processing. For example, before accessing a wireless channel, a cognitive radio does spectrum sensing to detect channel occupancy and then makes a decision based on spectrum policies. Spectrum sensing maps well to FPGA fabric, while spectrum decision can be implemented with a CPU. Both algorithms are highly sensitive to latency as a faster decision improves spectrum utilization. This paper introduces CRASH: Cognitive Radio Accelerated with Software and Hardware - a new software and programmable logic framework for Xilinx's Zynq SoC targeting cognitive radio. We implement spectrum sensing and the spectrum decision in three configurations: both algorithms in the FPGA, both in software only, and spectrum sensing on the FPGA and spectrum decision on the CPU. We measure the end-to-end latency to detect and acquire unoccupied spectrum for these configurations. Results show that CRASH can successfully partition algorithms between FPGA and CPU and reduce processing latency.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129328107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications
T. Chau, Maciej Kurek, James Stanley Targett, J. Humphrey, Georgios Skouroupathis, A. Eele, J. Maciejowski, Benjamin Cope, Kathryn Cobden, P. Leong, P. Cheung, W. Luk
{"title":"SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications","authors":"T. Chau, Maciej Kurek, James Stanley Targett, J. Humphrey, Georgios Skouroupathis, A. Eele, J. Maciejowski, Benjamin Cope, Kathryn Cobden, P. Leong, P. Cheung, W. Luk","doi":"10.1109/FCCM.2014.46","DOIUrl":"https://doi.org/10.1109/FCCM.2014.46","url":null,"abstract":"The Sequential Monte Carlo (SMC) method is a simulation-based approach to compute posterior distributions. SMC methods often work well on applications considered intractable by other methods due to high dimensionality, but they are computationally demanding. While SMC has been implemented efficiently on FPGAs, design productivity remains a challenge. This paper introduces a design flow for generating efficient implementation of reconfigurable SMC designs. Through templating the SMC structure, the design flow enables efficient mapping of SMC applications to multiple FPGAs. The proposed design flow consists of a parametrisable SMC computation engine, and an open-source software template which enables efficient mapping of a variety of SMC designs to reconfigurable hardware. Design parameters that are critical to the performance and to the solution quality are tuned using a machine learning algorithm based on surrogate modelling. Experimental results for three case studies show that design performance is substantially improved after parameter optimisation. The proposed design flow demonstrates its capability of producing reconfigurable implementations for a range of SMC applications that have significant improvement in speed and in energy efficiency over optimised CPU and GPU implementations.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129857030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7