2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Literature (Page 6)

Automating Optimization of Reconfigurable Designs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.65

Maciej Kurek, Tobias Becker, T. Chau, W. Luk

Citations: 17

Customizable Compression Architecture for Efficient Configuration in CGRAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.18

Syed M. A. H. Jafri, Muhammad Adeel Tajammul, M. Daneshtalab, A. Hemani, K. Paul, P. Ellervee, J. Plosila, H. Tenhunen

Citations: 2

Automated Partial Reconfiguration Design for Adaptive Systems with CoPR for Zynq

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.63

Kizheppatt Vipin, Suhaib A. Fahmy

Citations: 11

Fast and Power Efficient Heapsort IP for Image Compression Application

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.72

Yuhui Bai, S. Z. Ahmed, B. Granado

Abstract: We present a hardware architecture for the heapsort algorithm. The sorting is employed in the subband coding block of a wavelet-based image coder termed the Öktem image coder [1]. Although this coder provides good image quality, the sorting is time-consuming and application-specific: it is used repeatedly on different volumes of data in the subband coding, so a simple hardware implementation with a fixed sorting capacity is difficult to scale at runtime. To tackle this problem, time and power efficiency and sorting-size flexibility have to be taken into account. We propose an improved FPGA heapsort architecture, based on Zabołotny's work [2], as an IP accelerator for the image coder. The architecture is made configurable through adaptive layer-enable elements, so the sorting capacity can be adjusted at runtime to efficiently sort different amounts of data. With adaptive memory shutdown, our improved architecture provides up to a 20.9% power reduction on the memories compared to the baseline implementation. Moreover, our architecture provides a 13x speedup compared to an ARM Cortex-A9.

Citations: 1
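
To make the runtime-adjustable sorting capacity in the heapsort abstract above more concrete, here is a minimal software sketch in Python. It is not the authors' hardware design: the names `layers_needed`, `sift_down`, and `heapsort` are hypothetical, and the mapping of one heap level to one memory layer is an assumption used only to illustrate why a smaller input could leave the deeper layers disabled.

```python
def layers_needed(n: int) -> int:
    """Number of binary-heap levels required to hold n elements.

    Assumption (not from the abstract): one heap level corresponds to
    one memory layer that could be switched off when it is not needed.
    Level k holds 2**k elements, so k levels hold 2**k - 1 in total.
    """
    return n.bit_length()


def sift_down(a: list, root: int, end: int) -> None:
    """Restore the max-heap property for the subtree rooted at `root`,
    looking only at indices below `end`."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Pick the larger of the two children.
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort(values: list) -> list:
    """Return a new list with the values in ascending order."""
    a = list(values)
    n = len(a)
    # Build a max-heap bottom-up.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Repeatedly move the current maximum to the end of the array.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

Under this assumption, sorting 7 values would need 3 levels and sorting 8 values would need 4, which is the kind of per-workload capacity change the abstract describes adjusting at runtime.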

A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.49

Joo-Young Kim, S. Hauck, D. Burger

Citations: 12

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.14

S. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, K. Rupnow, Deming Chen

Citations: 6

A Self-Adaptive SEU Mitigation System for FPGAs with an Internal Block RAM Radiation Particle Sensor

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.79

R. Glein, Bernhard Schmidt, F. Rittner, J. Teich, Daniel Ziener

Abstract: In this paper, we propose a self-adaptive, FPGA-based, partially reconfigurable system for space missions that mitigates Single Event Upsets in the FPGA configuration and fabric. Dynamic reconfiguration is used for on-demand replication of modules, depending on current and changing radiation levels. More precisely, the idea is to trigger a redundancy scheme such as Dual Modular Redundancy or Triple Modular Redundancy in response to a continuously monitored Single Event Upset rate measured inside the on-chip memories themselves, e.g., any subset of the internal Block RAMs (even ones in use). Depending on the current radiation level, the minimal number of replicas is determined at runtime, under the constraint that the required Safety Integrity Level for a module is ensured, and configured accordingly. For signal processing applications, it is shown that this autonomous adaptation to different solar conditions realizes a resource-efficient mitigation. In our case study, we show that it is possible to triple the data throughput at the Solar Maximum condition (no flares) compared to a Triple Modular Redundancy implementation of a single module. We also show a decrease in the Probability of Failures Per Hour by 2 × 10^4 at flare-enhanced conditions compared with a non-redundant system.

Citations: 22
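
The SEU mitigation abstract above describes choosing the minimal number of replicas at runtime so that a failure-probability requirement is still met. The Python sketch below illustrates that selection step under a deliberately simple model that is an assumption, not the paper's: replicas fail independently with the same per-hour probability `p`, the voter is perfect, and only odd replica counts with majority voting are considered (so the detect-only behaviour of Dual Modular Redundancy is not modelled). The function names and the candidate set `(1, 3, 5)` are hypothetical.

```python
from math import comb


def majority_failure_probability(n: int, p: float) -> float:
    """Probability that a majority vote over n replicas is wrong when each
    replica fails independently with probability p (n is assumed odd)."""
    need = n // 2 + 1  # failed replicas needed to outvote the correct ones
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1)
    )


def minimal_replicas(p: float, target: float, options=(1, 3, 5)):
    """Smallest replica count in `options` whose voted failure probability
    does not exceed `target`, or None if no option is sufficient."""
    for n in options:
        if majority_failure_probability(n, p) <= target:
            return n
    return None
```

In this toy model, a low measured upset probability lets a single module meet the target, and a higher one pushes the choice to three or five replicas; resources freed in quiet conditions are what the abstract reports using for extra throughput.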

Speeding Up FPGA Placement: Parallel Algorithms and Methods

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.60

Ma An, J. Gregory Steffan, Vaughn Betz

Abstract: Placement of a large FPGA design now commonly requires several hours, significantly hindering designer productivity. Furthermore, FPGA capacity is growing faster than CPU speed, which will further increase placement time unless new approaches are found. Multi-core processors are now ubiquitous, however, and some recent processors also have hardware support for transactional memory (TM), making parallelism an increasingly attractive approach for speeding up placement. We investigate methods to parallelize the simulated annealing placement algorithm in VPR, which is widely used in FPGA research. We explore both algorithmic changes and the use of different parallel programming paradigms and hardware, including TM, thread-level speculation (TLS), and lock-free techniques. We find that hardware TM enables large speedups (8.1x on average) but compromises “move fairness” and leads to an unacceptable quality loss. TLS scales poorly, with a maximum 2.2x speedup, but preserves quality. A new dependency-checking parallel strategy achieves the best balance: the deterministic version achieves a 5.9x speedup with no quality loss, while the non-deterministic, lock-free version can scale to a 34x speedup.

Citations: 22
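
As background for the placement abstract above, the following Python sketch shows a serial simulated annealing placer of the general kind that the paper sets out to parallelize. It is a toy under stated assumptions, not VPR's implementation: the cost is plain Manhattan wirelength over two-pin nets, the cooling schedule is a fixed geometric one, and all names (`wirelength`, `accept_move`, `anneal`) and default parameters are hypothetical. None of the parallel strategies from the abstract (TM, TLS, dependency checking) is shown.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over two-pin nets (a simplified cost)."""
    return sum(
        abs(placement[a][0] - placement[b][0])
        + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def accept_move(delta_cost, temperature, u):
    """Metropolis criterion: always accept improvements; accept a worse
    move when the uniform sample u falls below exp(-delta / T)."""
    if delta_cost <= 0:
        return True
    return u < math.exp(-delta_cost / temperature)


def anneal(placement, nets, temperature=5.0, cooling=0.9,
           moves_per_temp=50, seed=0):
    """Swap-based annealing; returns the final placement and its cost."""
    rng = random.Random(seed)
    placement = dict(placement)
    blocks = sorted(placement)
    cost = wirelength(placement, nets)
    while temperature > 0.01:
        for _ in range(moves_per_temp):
            a, b = rng.sample(blocks, 2)
            # Propose swapping the locations of two blocks.
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            if accept_move(new_cost - cost, temperature, rng.random()):
                cost = new_cost
            else:
                # Rejected: undo the swap.
                placement[a], placement[b] = placement[b], placement[a]
        temperature *= cooling
    return placement, cost
```

Each proposed move reads and writes shared placement state, which is why running moves concurrently raises the conflict and fairness questions the abstract discusses.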

FPGAs in the Cloud: Booting Virtualized Hardware Accelerators with OpenStack

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.42

Stuart Byma, J. Steffan, H. Bannazadeh, Alberto Leon Garcia, P. Chow

Abstract: We present a new approach for integrating virtualized FPGA-based hardware accelerators into commercial-scale cloud computing systems, with minimal virtualization overhead. Partially reconfigurable regions across multiple FPGAs are offered as generic cloud resources through OpenStack (open-source cloud software), thereby allowing users to “boot” custom-designed or predefined network-connected hardware accelerators with the same commands they would use to boot a regular Virtual Machine. We propose a hardware and software framework to enable this virtualization. This is a first attempt at closely fitting FPGAs into existing cloud computing models, where resources are virtualized, flexible, and have the illusion of infinite scalability. Our system can set up and tear down virtual accelerators in approximately 2.6 seconds on average, much faster than regular virtual machines. The static virtualization hardware on the physical FPGAs causes only a three-cycle latency increase and a one-cycle pipeline stall per packet in accelerators when compared to a non-virtualized system. We present a case study analyzing the design and performance of an application-level load balancer using a fully implemented prototype of our system. Our study shows that FPGA cloud compute resources can easily outperform virtual machines, while the system's virtualization and abstraction significantly reduce design iteration time and design complexity.

Citations: 229

Look-up Table Design for Deep Sub-threshold through Full-Supply Operation

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines DOI: 10.1109/FCCM.2014.80

M. Abusultan, S. Khatri

Abstract: Field programmable gate arrays (FPGAs) are the implementation platform of choice when it comes to design flexibility. However, the high power consumption of FPGAs (which arises from their flexible structure) makes them less appealing for extreme low-power applications. In this paper, we present a design of an FPGA look-up table (LUT), with the goal of seamless operation over a wide band of supply voltages. The same LUT design can operate at sub-threshold voltage when low power is required, and at higher voltages whenever faster performance is required. The results show that operating the LUT in sub-threshold mode yields ~80× lower power and ~4× lower energy than full-supply-voltage operation, for a 6-input LUT implemented in a 22nm predictive technology. The key drawback of sub-threshold operation is its susceptibility to process, supply voltage, and temperature (PVT) variations. This paper also presents the design and experimental results for a closed-loop adaptive body biasing mechanism that dynamically cancels global (spatial) as well as local (random) PVT variations. For the same 22nm technology, we demonstrate that the closed-loop adaptive body biasing circuits allow the FPGA LUT to operate over a frequency range that spans more than an order of magnitude (40 MHz to 1300 MHz). We also show that the closed-loop adaptive body biasing circuits can cancel delay variations due to supply voltage changes, and reduce the effect of process variations on setup and hold times by 1.8× and 2.9×, respectively. The dynamic body biasing circuits incur a 3.49% area overhead when each is designed to drive a cluster of 25 LUTs.

Citations: 6
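
The ~80× power and ~4× energy figures in the LUT abstract above can be related through energy = power × time. The short Python sketch below works through that arithmetic. It rests on an assumption the abstract does not state, namely that both figures refer to the same operation, so that energy per operation equals average power times the operation's delay; the function name is hypothetical.

```python
def implied_delay_ratio(power_reduction: float, energy_reduction: float) -> float:
    """Ratio t_sub / t_full implied by E = P * t.

    With P_sub = P_full / power_reduction and E_sub = E_full / energy_reduction,
    t_sub / t_full = (E_sub / P_sub) / (E_full / P_full)
                   = power_reduction / energy_reduction.
    """
    return power_reduction / energy_reduction


# Rounded figures quoted in the abstract: ~80x lower power, ~4x lower energy.
ratio = implied_delay_ratio(80.0, 4.0)
```

Under that assumption, the quoted figures would correspond to sub-threshold operation being roughly 20 times slower per operation, which is consistent with the abstract's framing of sub-threshold mode as the low-power, lower-performance end of the operating range.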