FPGA. ACM International Symposium on Field-Programmable Gate Arrays最新文献_第9页

Accelerator compiler for the VENICE vector processor 加速编译器的威尼斯矢量处理器

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145732

Zhiduo Liu, Aaron Severance, Satnam Singh, G. Lemieux

引用次数: 12

Compiling high throughput network processors 编译高吞吐量网络处理器

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145709

M. Lavasani, Larry R. Dennison, Derek Chiou

{"title":"Compiling high throughput network processors","authors":"M. Lavasani, Larry R. Dennison, Derek Chiou","doi":"10.1145/2145694.2145709","DOIUrl":"https://doi.org/10.1145/2145694.2145709","url":null,"abstract":"Gorilla is a methodology for generating FPGA-based solutions especially well suited for data parallel applications with fine grain irregularity. Irregularity simultaneously destroys performance and increases power consumption on many data parallel processors such as General Purpose Graphical Processor Units (GPGPUs). Gorilla achieves high performance and low power through the use of FPGA-tailored parallelization techniques and application-specific hardwired accelerators, processing engines, and communication mechanisms. Automatic compilation from a stylized C language and templates that define the hardware structure coupled with the intrinsic flexibility of FPGAs provide high performance, low power, and programmability.\u0000 Gorilla's capabilities are demonstrated through the generation of a family of core-router network processors processing up to 100Gbps (200MPPS for 64B packets) supporting any mix of IPv4, IPv6, and Multi-Protocol Label Switching (MPLS) packets on a single FPGA with off-chip IP lookup tables. A 40Gbps version of that network processor was run with an embedded test rig on a Xilinx Virtex-6 FPGA, verifying for performance and correctness. Its measured power consumption is comparable to full custom, commercial network processors. In addition, it is demonstrated how Gorilla can be used to generate merged virtual routers, saving FPGA resources.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"13 1","pages":"87-96"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81407894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Reconfigurable architecture and automated design flow for rapid FPGA-based LDPC code emulation 基于fpga的LDPC代码快速仿真的可重构架构和自动化设计流程

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145722

Haoran Li, Youn Sung Park, Zhengya Zhang

引用次数: 9

FPGA-RR: an enhanced FPGA architecture with RRAM-based reconfigurable interconnects (abstract only) FPGA- rr:基于rram的可重构互连的增强FPGA架构(仅摘要)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145751

J. Cong, Bingjun Xiao

{"title":"FPGA-RR: an enhanced FPGA architecture with RRAM-based reconfigurable interconnects (abstract only)","authors":"J. Cong, Bingjun Xiao","doi":"10.1145/2145694.2145751","DOIUrl":"https://doi.org/10.1145/2145694.2145751","url":null,"abstract":"In this study, we explore the use of Resistive RAMs (RRAMs) as candidates for programmable interconnects in FPGAs. An RRAM cell can be programmed between high resistance state and low resistance state, with an on/off ratio close to MOSFET. It provides an opportunity to use an RRAM as a routing switch at a much smaller area cost than its CMOS counterpart. RRAMs can be fabricated over CMOS circuits using CMOS-compatible processes to have a more compact gate array. Our recent work (presented in NanoArch'2011) demonstrated significant potential of area, delay, and power reduction from using RRAMs in FPGAs. But some design problems remain open. The programming of RRAM switches integrated in interconnects is one important problem. We show that the high-level architecture of programming circuits for RRAM switches should be modified to avoid potential logic hazard. Also the programming cells used in previous works have an area overhead even larger than RRAM itself. We manage to reduce this overhead significantly with utilization of the non-arbitrary pattern of RRAM integration in FPGA interconnects. In addition we suggest a novel buffering solution for FPGA interconnects in light of the low area cost of RRAM-based routing switch. We propose on-demand buffer insertion, where buffers can be connected to interconnects via RRAMs to dynamically reflect the demand of the netlist to map onto FPGA. Compared to conventional buffering solution which are pre-determined during fabrication and can only be optimized for general case, our solution shows further area savings and performance improvement. The resulting FPGA architecture using RRAM for programmable interconnects is named FPGA-RR. We provide a complete CAD flow for FPGA-RR.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"57 1","pages":"268"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76523478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation 一个周期精确，周期可重复的多fpga系统，用于加速多核处理器仿真

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145720

S. Asaad, Ralph Bellofatto, B. Brezzo, C. Haymes, M. Kapur, B. Parker, Thomas Roewer, P. Saha, T. Takken, J. Tierno

{"title":"A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation","authors":"S. Asaad, Ralph Bellofatto, B. Brezzo, C. Haymes, M. Kapur, B. Parker, Thomas Roewer, P. Saha, T. Takken, J. Tierno","doi":"10.1145/2145694.2145720","DOIUrl":"https://doi.org/10.1145/2145694.2145720","url":null,"abstract":"Software based tools for simulation are not keeping up with the demands for increased chip and system design complexity. In this paper, we describe a cycle-accurate and cycle-reproducible large-scale FPGA platform that is designed from the ground up to accelerate logic verification of the Bluegene/Q compute node ASIC, a multi-processor SOC implemented in IBM's 45 nm SOI CMOS technology. This paper discusses the challenges for constructing such large-scale FPGA platforms, including design partitioning, clocking & synchronization, and debugging support, as well as our approach for addressing these challenges without sacrificing cycle accuracy and cycle reproducibility. The resulting fullchip simulation of the Bluegene/Q compute node ASIC runs at a simulated processor clock speed of 4 MHz, over 100,000 times faster than the logic level software simulation of the same design. The vast increase in simulation speed provides a new capability in the design cycle that proved to be instrumental in logic verification as well as early software development and performance validation for Bluegene/Q.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"153-162"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90388928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 68

Limit study of energy & delay benefits of component-specific routing 限制对特定组件路由的能量和延迟效益的研究

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145710

Nikil Mehta, Raphael Rubin, A. DeHon

引用次数: 27

Thermal-aware logic block placement for 3D FPGAs considering lateral heat dissipation (abstract only) 考虑横向散热的3D fpga热感知逻辑块放置(仅摘要)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145749

Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang

{"title":"Thermal-aware logic block placement for 3D FPGAs considering lateral heat dissipation (abstract only)","authors":"Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang","doi":"10.1145/2145694.2145749","DOIUrl":"https://doi.org/10.1145/2145694.2145749","url":null,"abstract":"Three-dimensional (3D) integration is an attractive and promising technology to keep Moore's Law alive, whereas the thermal issue also presents a critical challenge for 3D integrated circuits. Meanwhile, accurate thermal analysis is very time-consuming and thus can hardly be incorporated into most of placement algorithms generally performing numerous iterative refinement steps. As a consequence, in this paper, we first present a fine-grained grid-based thermal model for the 3D regular FPGA architecture and also highlight that lateral heat dissipation paths can no longer be assumed negligible. Then we propose two fast thermal-aware placement algorithms for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), in which rapid thermal evaluation instead of slow detailed analysis is utilized. Moreover, both take the lateral heat dissipation into consideration and focus on distributing heat sources more evenly within a layer in a 3D FPGA to avoid creating hotspots. Experimental results show that SD and MS achieve 12.1%/7.6% reduction in maximum temperature and 82%/56% improvement in temperature deviation compared with a classical thermal-unaware placement method only at the cost of minor increase in wirelength and delay. Moreover, MS merely consumes 4% more runtime for producing thermal-aware placement solutions.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"40 1","pages":"268"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73325653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

A real-time stereo vision system using a tree-structured dynamic programming on FPGA 基于FPGA的树形结构动态规划实时立体视觉系统

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145698

Minxi Jin, T. Maruyama

引用次数: 21

OpenCL memory infrastructure for FPGAs (abstract only) fpga的OpenCL存储器基础结构(仅抽象)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145756

S. Chin, P. Chow

{"title":"OpenCL memory infrastructure for FPGAs (abstract only)","authors":"S. Chin, P. Chow","doi":"10.1145/2145694.2145756","DOIUrl":"https://doi.org/10.1145/2145694.2145756","url":null,"abstract":"Programming models assist developers in creating high performance computing systems by forming a higher level abstraction of the target platform. OpenCL has emerged as a standard programming model for heterogeneous systems and there has been recent activity combining OpenCL and FPGAs. This work introduces memory infrastructure for FPGAs and is designed for OpenCL style computation, complementing previous work. An Aggregating Memory Controller is implemented in hardware and aims to maximize bandwidth to external, large, high-latency, high-bandwidth memories by finding the minimal number of external memory burst requests from a vector of requests. A template processing array with soft-processor and hand-coded hardware elements was also designed to drive the memory controller. The Aggregating Memory Controller is described in terms of operation and future scalability and the created processing array is described as a flexible structure that can support many types of processing solutions. A hardware prototype of the memory controller and processing array was implemented on a Virtex-5 LX110T FPGA. Two micro-benchmarks were run on both the soft-processor elements and the hand-coded hardware cores to exercise the memory controller. Results for effective memory bandwidth within the system show that the high-latency can be hidden using the Aggregating Memory Controller by increasing the number of threads within the processing array.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"14 1","pages":"269-270"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77722925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Incremental clustering applied to radar deinterleaving: a parameterized FPGA implementation 增量聚类应用于雷达去交错:一个参数化的FPGA实现

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145699

Scott Bailie, M. Leeser

引用次数: 4