2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Papers (Page 3)

Reducing Overheads for Fault-Tolerant Datapaths with Dynamic Partial Reconfiguration

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.36

James J. Davis, P. Cheung

Citations: 1

System-Level Retiming and Pipelining

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.30

Girish Venkataramani, Y. Gu

Citations: 11

Rapid Post-Map Insertion of Embedded Logic Analyzers for Xilinx FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.29

B. Hutchings, J. Keeley

Citations: 29

FPGA Architecture Enhancements to Support Heterogeneous Partially Reconfigurable Regions

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.17

Christophe Huriaux, O. Sentieys, R. Tessier

Citations: 2

Performance Comparison between Multi-FPGA Prototyping Platforms: Hardwired Off-the-Shelf, Cabling, and Custom

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.44

Qingshan Tang, M. Tuna, H. Mehrez

Citations: 8

Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.26

Siddhartha, Nachiket Kapre

Abstract: Substitution, and reassociation of irregular sparse LU factorization can deliver up to 31% additional speedup over an existing state-of-the-art parallel FPGA implementation where further parallelization was deemed virtually impossible. The state-of-the-art implementation is already capable of delivering 3× acceleration over CPU-based sparse LU solvers. Sparse LU factorization is a well-known computational bottleneck in many existing scientific and engineering applications and is notoriously hard to parallelize due to inherent sequential dependencies in the computation graph. In this paper, we show how to break these alleged inherent dependencies using depth-limited substitution, and reassociation of the resulting computation. This is a work-parallelism tradeoff that is well-suited for implementation on FPGA-based token dataflow architectures. Such compute organizations are capable of fast parallel processing of large irregular graphs extracted from the sparse LU computation. We manage and control the growth in additional work due to substitution through careful selection of substitution depth. We exploit associativity in the generated graphs to restructure long compute chains into reduction trees.
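
The reassociation idea in this abstract, turning a long chain of associative operations into a reduction tree, can be illustrated with a minimal Python sketch. This is not the authors' FPGA implementation; the function names `chain_sum` and `tree_sum` are hypothetical, and addition stands in for whatever associative operator the generated graphs contain. Each function returns the result together with the length of its longest dependency path.

```python
from typing import List, Tuple


def chain_sum(values: List[float]) -> Tuple[float, int]:
    """Accumulate left to right; every add depends on the previous one."""
    acc = values[0]
    depth = 0
    for v in values[1:]:
        acc = acc + v
        depth += 1  # one more operation on the critical path
    return acc, depth


def tree_sum(values: List[float]) -> Tuple[float, int]:
    """Reassociate the same sum into a balanced reduction tree."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        # All pairwise adds within one level are mutually independent.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth


vals = [float(i) for i in range(1, 9)]
print(chain_sum(vals))  # (36.0, 7)
print(tree_sum(vals))   # (36.0, 3)
```

The total number of additions is the same in both versions; only the critical path shrinks, from n - 1 to about log2(n) levels. With floating-point data, reassociation can change rounding, so results may differ slightly from the sequential order.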

Citations: 7

LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.43

Hsin-Jung Yang, Kermin Fleming, Michael Adler, J. Emer

Citations: 24

Accurate and Efficient Three Level Design Space Exploration Based on Constraints Satisfaction Optimization Problem Solver

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.56

Shuo Li, A. Hemani

Citations: 0

Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.13

Jonathon Pendlum, M. Leeser, K. Chowdhury

Abstract: Both Xilinx and Altera have released SoCs that tightly couple programmable logic with a dual core Cortex A9 ARM processor. These SoCs show promise in accelerating applications that exploit both the FPGA's parallel processing architecture and the CPU's sequential processing. For example, before accessing a wireless channel, a cognitive radio does spectrum sensing to detect channel occupancy and then makes a decision based on spectrum policies. Spectrum sensing maps well to FPGA fabric, while spectrum decision can be implemented with a CPU. Both algorithms are highly sensitive to latency as a faster decision improves spectrum utilization. This paper introduces CRASH: Cognitive Radio Accelerated with Software and Hardware - a new software and programmable logic framework for Xilinx's Zynq SoC targeting cognitive radio. We implement spectrum sensing and the spectrum decision in three configurations: both algorithms in the FPGA, both in software only, and spectrum sensing on the FPGA and spectrum decision on the CPU. We measure the end-to-end latency to detect and acquire unoccupied spectrum for these configurations. Results show that CRASH can successfully partition algorithms between FPGA and CPU and reduce processing latency.
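
The two stages named in this abstract, spectrum sensing followed by spectrum decision, can be sketched in plain Python to show where the partition boundary falls. This sketch assumes a simple energy detector for sensing and a fixed-threshold, first-free-channel policy for the decision; the abstract does not say which algorithms CRASH uses, and the names `sense_energy`, `decide_channel`, and the threshold value are hypothetical.

```python
from typing import Dict, Optional, Sequence


def sense_energy(samples: Sequence[complex]) -> float:
    """Sensing stage: mean power of a block of complex baseband samples.

    This is the regular, data-parallel part that would map to FPGA fabric.
    """
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def decide_channel(energies: Dict[int, float], threshold: float) -> Optional[int]:
    """Decision stage: pick the lowest-numbered channel judged unoccupied.

    This is the control-heavy part that would run on the CPU.
    """
    for channel in sorted(energies):
        if energies[channel] < threshold:
            return channel
    return None  # every channel appears occupied


# Hypothetical sample blocks for two channels.
blocks = {
    1: [1 + 1j] * 4,                # strong signal present
    6: [0.1, -0.1, 0.1j, -0.1j],    # near the noise floor
}
energies = {ch: sense_energy(blk) for ch, blk in blocks.items()}
print(decide_channel(energies, threshold=0.5))  # 6
```

In a partitioned design, only the per-channel energies cross the FPGA-CPU boundary, which keeps the transferred data small relative to the raw sample stream.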

Citations: 8

SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.46

T. Chau, Maciej Kurek, James Stanley Targett, J. Humphrey, Georgios Skouroupathis, A. Eele, J. Maciejowski, Benjamin Cope, Kathryn Cobden, P. Leong, P. Cheung, W. Luk

Abstract: The Sequential Monte Carlo (SMC) method is a simulation-based approach to compute posterior distributions. SMC methods often work well on applications considered intractable by other methods due to high dimensionality, but they are computationally demanding. While SMC has been implemented efficiently on FPGAs, design productivity remains a challenge. This paper introduces a design flow for generating efficient implementation of reconfigurable SMC designs. Through templating the SMC structure, the design flow enables efficient mapping of SMC applications to multiple FPGAs. The proposed design flow consists of a parametrisable SMC computation engine, and an open-source software template which enables efficient mapping of a variety of SMC designs to reconfigurable hardware. Design parameters that are critical to the performance and to the solution quality are tuned using a machine learning algorithm based on surrogate modelling. Experimental results for three case studies show that design performance is substantially improved after parameter optimisation. The proposed design flow demonstrates its capability of producing reconfigurable implementations for a range of SMC applications that have significant improvement in speed and in energy efficiency over optimised CPU and GPU implementations.
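
For readers unfamiliar with SMC, the following minimal Python sketch shows one step of a bootstrap particle filter, a common SMC variant. It is not the SMCGen engine or template: the 1-D random-walk state model, the Gaussian observation model, and the name `particle_filter_step` with its parameters are all assumptions chosen for illustration.

```python
import math
import random
from typing import List


def particle_filter_step(
    particles: List[float],
    observation: float,
    rng: random.Random,
    process_std: float = 1.0,
    obs_std: float = 1.0,
) -> List[float]:
    """One sample / weight / resample step of a bootstrap particle filter."""
    # 1. Sample: propagate each particle through the (assumed) random-walk model.
    proposed = [p + rng.gauss(0.0, process_std) for p in particles]
    # 2. Weight: unnormalised Gaussian likelihood of the observation.
    weights = [
        math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in proposed
    ]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(proposed)
    # 3. Resample: draw a new population in proportion to the weights.
    return rng.choices(proposed, weights=weights, k=len(proposed))


rng = random.Random(0)
particles = [0.0] * 500
for _ in range(20):
    particles = particle_filter_step(particles, observation=5.0, rng=rng)
estimate = sum(particles) / len(particles)
print(round(estimate, 2))  # posterior mean estimate, expected to lie near 5
```

The sampling and weighting loops are independent per particle, which is what makes the method attractive for parallel hardware; resampling needs the full weight set and is typically the step that limits parallelism. The particle count is one example of a parameter that trades speed against solution quality.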

Citations: 7