FPGA. ACM International Symposium on Field-Programmable Gate Arrays最新文献

筛选
英文 中文
Accelerator compiler for the VENICE vector processor 加速编译器的威尼斯矢量处理器
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145732
Zhiduo Liu, Aaron Severance, Satnam Singh, G. Lemieux
{"title":"Accelerator compiler for the VENICE vector processor","authors":"Zhiduo Liu, Aaron Severance, Satnam Singh, G. Lemieux","doi":"10.1145/2145694.2145732","DOIUrl":"https://doi.org/10.1145/2145694.2145732","url":null,"abstract":"This paper describes the compiler design for VENICE, a new soft vector processor (SVP). The compiler is a new back-end target for Microsoft Accelerator, a high-level data parallel library for C++ and C#. This allows us to automatically compile high-level programs into VENICE assembly code, thus avoiding the process of writing assembly code used by previous SVPs. Experimental results show the compiler can generate scalable parallel code with execution times that are comparable to hand-written VENICE assembly code. On data-parallel applications, VENICE at 100MHz on an Altera DE3 platform runs at speeds comparable to one core of a 3.5GHz Intel Xeon W3690 processor, beating it in performance on four of six benchmarks by up to 3.2x.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"79 1","pages":"229-232"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84071093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
Compiling high throughput network processors 编译高吞吐量网络处理器
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145709
M. Lavasani, Larry R. Dennison, Derek Chiou
{"title":"Compiling high throughput network processors","authors":"M. Lavasani, Larry R. Dennison, Derek Chiou","doi":"10.1145/2145694.2145709","DOIUrl":"https://doi.org/10.1145/2145694.2145709","url":null,"abstract":"Gorilla is a methodology for generating FPGA-based solutions especially well suited for data parallel applications with fine grain irregularity. Irregularity simultaneously destroys performance and increases power consumption on many data parallel processors such as General Purpose Graphical Processor Units (GPGPUs). Gorilla achieves high performance and low power through the use of FPGA-tailored parallelization techniques and application-specific hardwired accelerators, processing engines, and communication mechanisms. Automatic compilation from a stylized C language and templates that define the hardware structure coupled with the intrinsic flexibility of FPGAs provide high performance, low power, and programmability.\u0000 Gorilla's capabilities are demonstrated through the generation of a family of core-router network processors processing up to 100Gbps (200MPPS for 64B packets) supporting any mix of IPv4, IPv6, and Multi-Protocol Label Switching (MPLS) packets on a single FPGA with off-chip IP lookup tables. A 40Gbps version of that network processor was run with an embedded test rig on a Xilinx Virtex-6 FPGA, verifying for performance and correctness. Its measured power consumption is comparable to full custom, commercial network processors. In addition, it is demonstrated how Gorilla can be used to generate merged virtual routers, saving FPGA resources.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"13 1","pages":"87-96"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81407894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Reconfigurable architecture and automated design flow for rapid FPGA-based LDPC code emulation 基于fpga的LDPC代码快速仿真的可重构架构和自动化设计流程
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145722
Haoran Li, Youn Sung Park, Zhengya Zhang
{"title":"Reconfigurable architecture and automated design flow for rapid FPGA-based LDPC code emulation","authors":"Haoran Li, Youn Sung Park, Zhengya Zhang","doi":"10.1145/2145694.2145722","DOIUrl":"https://doi.org/10.1145/2145694.2145722","url":null,"abstract":"Multitude of design freedoms of LDPC codes and practical decoders require fast simulations. FPGA emulation is attractive but inaccessible due to its design complexity. We propose a library and script based approach to automate the construction of FPGA emulations. Code parameters and design parameters are programmed either during run time or by script in design time. We demonstrate the architecture and design flow using the LDPC codes for the latest wireless communication standards: each emulation model was auto-constructed within one minute and the peak emulation throughput reached 3.8 Gb/s on a BEE3 platform.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"8 1","pages":"167-170"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81020244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
FPGA-RR: an enhanced FPGA architecture with RRAM-based reconfigurable interconnects (abstract only) FPGA- rr:基于rram的可重构互连的增强FPGA架构(仅摘要)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145751
J. Cong, Bingjun Xiao
{"title":"FPGA-RR: an enhanced FPGA architecture with RRAM-based reconfigurable interconnects (abstract only)","authors":"J. Cong, Bingjun Xiao","doi":"10.1145/2145694.2145751","DOIUrl":"https://doi.org/10.1145/2145694.2145751","url":null,"abstract":"In this study, we explore the use of Resistive RAMs (RRAMs) as candidates for programmable interconnects in FPGAs. An RRAM cell can be programmed between high resistance state and low resistance state, with an on/off ratio close to MOSFET. It provides an opportunity to use an RRAM as a routing switch at a much smaller area cost than its CMOS counterpart. RRAMs can be fabricated over CMOS circuits using CMOS-compatible processes to have a more compact gate array. Our recent work (presented in NanoArch'2011) demonstrated significant potential of area, delay, and power reduction from using RRAMs in FPGAs. But some design problems remain open. The programming of RRAM switches integrated in interconnects is one important problem. We show that the high-level architecture of programming circuits for RRAM switches should be modified to avoid potential logic hazard. Also the programming cells used in previous works have an area overhead even larger than RRAM itself. We manage to reduce this overhead significantly with utilization of the non-arbitrary pattern of RRAM integration in FPGA interconnects. In addition we suggest a novel buffering solution for FPGA interconnects in light of the low area cost of RRAM-based routing switch. We propose on-demand buffer insertion, where buffers can be connected to interconnects via RRAMs to dynamically reflect the demand of the netlist to map onto FPGA. Compared to conventional buffering solution which are pre-determined during fabrication and can only be optimized for general case, our solution shows further area savings and performance improvement. The resulting FPGA architecture using RRAM for programmable interconnects is named FPGA-RR. We provide a complete CAD flow for FPGA-RR.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"57 1","pages":"268"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76523478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation 一个周期精确,周期可重复的多fpga系统,用于加速多核处理器仿真
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145720
S. Asaad, Ralph Bellofatto, B. Brezzo, C. Haymes, M. Kapur, B. Parker, Thomas Roewer, P. Saha, T. Takken, J. Tierno
{"title":"A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation","authors":"S. Asaad, Ralph Bellofatto, B. Brezzo, C. Haymes, M. Kapur, B. Parker, Thomas Roewer, P. Saha, T. Takken, J. Tierno","doi":"10.1145/2145694.2145720","DOIUrl":"https://doi.org/10.1145/2145694.2145720","url":null,"abstract":"Software based tools for simulation are not keeping up with the demands for increased chip and system design complexity. In this paper, we describe a cycle-accurate and cycle-reproducible large-scale FPGA platform that is designed from the ground up to accelerate logic verification of the Bluegene/Q compute node ASIC, a multi-processor SOC implemented in IBM's 45 nm SOI CMOS technology. This paper discusses the challenges for constructing such large-scale FPGA platforms, including design partitioning, clocking & synchronization, and debugging support, as well as our approach for addressing these challenges without sacrificing cycle accuracy and cycle reproducibility. The resulting fullchip simulation of the Bluegene/Q compute node ASIC runs at a simulated processor clock speed of 4 MHz, over 100,000 times faster than the logic level software simulation of the same design. The vast increase in simulation speed provides a new capability in the design cycle that proved to be instrumental in logic verification as well as early software development and performance validation for Bluegene/Q.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"153-162"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90388928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 68
Limit study of energy & delay benefits of component-specific routing 限制对特定组件路由的能量和延迟效益的研究
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145710
Nikil Mehta, Raphael Rubin, A. DeHon
{"title":"Limit study of energy & delay benefits of component-specific routing","authors":"Nikil Mehta, Raphael Rubin, A. DeHon","doi":"10.1145/2145694.2145710","DOIUrl":"https://doi.org/10.1145/2145694.2145710","url":null,"abstract":"As feature sizes scale toward atomic limits, parameter variation continues to increase, leading to increased margins in both delay and energy. The possibility of very slow devices on critical paths forces designers to increase transistor sizes, reduce clock speed and operate at higher voltages than desired in order to meet timing. With post-fabrication configurability, FPGAs have the opportunity to use slow devices on non-critical paths while selecting fast devices for critical paths. To understand the potential benefit we might gain from component-specific mapping, we quantify the margins associated with parameter variation in FPGAs over a wide range of predictive technologies (45nm-12nm) and gate sizes and show how these margins can be significantly reduced by delay-aware, component-specific routing. For the Toronto 20 benchmark set, we show that component-specific routing can eliminate delay margins induced by variation and reduce energy for energy minimal designs by 1.42-1.98×. We further show that these benefits increase as technology scales.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"34 1","pages":"97-106"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73327494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Thermal-aware logic block placement for 3D FPGAs considering lateral heat dissipation (abstract only) 考虑横向散热的3D fpga热感知逻辑块放置(仅摘要)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145749
Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang
{"title":"Thermal-aware logic block placement for 3D FPGAs considering lateral heat dissipation (abstract only)","authors":"Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang","doi":"10.1145/2145694.2145749","DOIUrl":"https://doi.org/10.1145/2145694.2145749","url":null,"abstract":"Three-dimensional (3D) integration is an attractive and promising technology to keep Moore's Law alive, whereas the thermal issue also presents a critical challenge for 3D integrated circuits. Meanwhile, accurate thermal analysis is very time-consuming and thus can hardly be incorporated into most of placement algorithms generally performing numerous iterative refinement steps. As a consequence, in this paper, we first present a fine-grained grid-based thermal model for the 3D regular FPGA architecture and also highlight that lateral heat dissipation paths can no longer be assumed negligible. Then we propose two fast thermal-aware placement algorithms for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), in which rapid thermal evaluation instead of slow detailed analysis is utilized. Moreover, both take the lateral heat dissipation into consideration and focus on distributing heat sources more evenly within a layer in a 3D FPGA to avoid creating hotspots. Experimental results show that SD and MS achieve 12.1%/7.6% reduction in maximum temperature and 82%/56% improvement in temperature deviation compared with a classical thermal-unaware placement method only at the cost of minor increase in wirelength and delay. Moreover, MS merely consumes 4% more runtime for producing thermal-aware placement solutions.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"40 1","pages":"268"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73325653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
A real-time stereo vision system using a tree-structured dynamic programming on FPGA 基于FPGA的树形结构动态规划实时立体视觉系统
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145698
Minxi Jin, T. Maruyama
{"title":"A real-time stereo vision system using a tree-structured dynamic programming on FPGA","authors":"Minxi Jin, T. Maruyama","doi":"10.1145/2145694.2145698","DOIUrl":"https://doi.org/10.1145/2145694.2145698","url":null,"abstract":"Many hardware systems for stereo vision have been proposed. Their processing speed is very fast, but the algorithms used in them are limited in order to achieve the high processing speed by simplifying the sequences of the memory accesses and operations. The error rates by them can not compete with those by software programs. In this paper, we describe an FPGA implementation of a tree-structured dynamic programming algorithm. The computational complexity of this algorithm is higher than those by previous hardware systems, but the processing speed of our system is still fast enough for real-time applications, and its error rate is competitive with software algorithms.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"4 1","pages":"21-24"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73468018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
OpenCL memory infrastructure for FPGAs (abstract only) fpga的OpenCL存储器基础结构(仅抽象)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145756
S. Chin, P. Chow
{"title":"OpenCL memory infrastructure for FPGAs (abstract only)","authors":"S. Chin, P. Chow","doi":"10.1145/2145694.2145756","DOIUrl":"https://doi.org/10.1145/2145694.2145756","url":null,"abstract":"Programming models assist developers in creating high performance computing systems by forming a higher level abstraction of the target platform. OpenCL has emerged as a standard programming model for heterogeneous systems and there has been recent activity combining OpenCL and FPGAs. This work introduces memory infrastructure for FPGAs and is designed for OpenCL style computation, complementing previous work. An Aggregating Memory Controller is implemented in hardware and aims to maximize bandwidth to external, large, high-latency, high-bandwidth memories by finding the minimal number of external memory burst requests from a vector of requests. A template processing array with soft-processor and hand-coded hardware elements was also designed to drive the memory controller. The Aggregating Memory Controller is described in terms of operation and future scalability and the created processing array is described as a flexible structure that can support many types of processing solutions. A hardware prototype of the memory controller and processing array was implemented on a Virtex-5 LX110T FPGA. Two micro-benchmarks were run on both the soft-processor elements and the hand-coded hardware cores to exercise the memory controller. Results for effective memory bandwidth within the system show that the high-latency can be hidden using the Aggregating Memory Controller by increasing the number of threads within the processing array.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"14 1","pages":"269-270"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77722925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Incremental clustering applied to radar deinterleaving: a parameterized FPGA implementation 增量聚类应用于雷达去交错:一个参数化的FPGA实现
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145699
Scott Bailie, M. Leeser
{"title":"Incremental clustering applied to radar deinterleaving: a parameterized FPGA implementation","authors":"Scott Bailie, M. Leeser","doi":"10.1145/2145694.2145699","DOIUrl":"https://doi.org/10.1145/2145694.2145699","url":null,"abstract":"ICED (Incremental Clustering of Evolving Data) is a novel incremental clustering algorithm designed for data whose characteristics change over time. ICED is an unsupervised clustering technique that assumes no prior knowledge of the incoming data, and supports removing clusters that contain stale data. The user controls the FPGA implementation through a combination of compile time parameters (number of clusters) and run time parameters (distance threshold, fade cycle length). ICED has been applied to a radar application: pulse deinterleaving. ICED is the first implementation of incremental clustering on an FPGA of which we are aware. The implementation runs 39 times faster than an equivalent C implementation on a 3GHz Intel Xeon processor, and is capable of processing radar data in real time.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"346 1","pages":"25-28"},"PeriodicalIF":0.0,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79667431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信