FPGA. ACM International Symposium on Field-Programmable Gate Arrays: Latest Publications

FPGA-accelerated 3D reconstruction using compressive sensing
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145721
Jianwen Chen, J. Cong, Ming Yan, Yi Zou
Abstract: The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, e.g., compressive sensing, provide ways to reduce radiation exposure without sacrificing image quality. However, the computational requirement of such algorithms is much higher than that of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of one important iterative kernel called EM, which is the major computation kernel of a recent EM+TV reconstruction algorithm. We show that a hybrid approach (CPU+GPU+FPGA) can deliver better performance and energy efficiency than GPU-only solutions, providing a 13X throughput boost over a dual-core CPU implementation.
Pages: 163-166
Citations: 24
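The abstract names EM as the core iterative kernel but does not give its formula. Assuming it follows the textbook MLEM (maximum-likelihood expectation maximization) update, one iteration is a forward projection, a measured/expected ratio, a back projection, and a normalization by voxel sensitivity. The sketch below uses a tiny dense system matrix for clarity; the function and variable names are hypothetical, and the paper's data layout and FPGA mapping are not shown.

```python
def mlem_step(A, y, x):
    """One MLEM update (assumed form of the EM kernel).

    A: system matrix as a list of rows; A[i][j] is voxel j's contribution to ray i
    y: measured projection value for each ray
    x: current non-negative image estimate, one value per voxel
    """
    n_rays, n_vox = len(A), len(x)
    # Forward projection: expected measurement along each ray.
    proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_rays)]
    # Ratio of measured to expected data (rays with zero expectation contribute nothing).
    ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_rays)]
    new_x = []
    for j in range(n_vox):
        sens = sum(A[i][j] for i in range(n_rays))              # sensitivity of voxel j
        back = sum(A[i][j] * ratio[i] for i in range(n_rays))   # back projection of ratios
        new_x.append(x[j] * back / sens if sens > 0 else 0.0)   # multiplicative update
    return new_x
```

With consistent data, for example `A = [[1, 0], [0, 1], [1, 1]]` and `y = [2, 3, 5]`, repeated calls starting from `[1.0, 1.0]` converge toward `[2, 3]`. The multiplicative form keeps estimates non-negative, which is one reason EM-style kernels are paired with a separate regularization step such as TV.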
A configurable architecture to limit wakeup current in dynamically-controlled power-gated FPGAs
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145737
A. Bsoul, S. Wilton
Abstract: A dynamically-controlled power-gated (DCPG) FPGA architecture has recently been proposed to reduce static energy dissipation during idle periods. During a power-mode transition from the off state to the on state, the wakeup current drawn from the power supplies causes a voltage droop on the power distribution network of the device. If not handled appropriately, this current and the associated voltage droop could cause the design and/or the device to malfunction. In DCPG FPGAs, the amount of wakeup current is not known beforehand because the structures of power-gated modules are application dependent; thus, a configurable solution is required to handle wakeup current. In this paper, we propose a programmable wakeup architecture for DCPG FPGAs. The proposed solution has two levels: a fixed intra-region level and a configurable inter-region level. The architecture ensures that a power-gated module can be turned on without violating the wakeup current constraints. We study the area and power overheads of the proposed solution. Our results show that the area overhead of the proposed inrush-current-limiting architecture is less than 2% for a power-gating region of 3x3 or 4x4 tiles, and the leakage power saved is more than 85% in a region of 4x4 tiles.
Pages: 245-254
Citations: 7
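The abstract does not describe the internals of the wakeup architecture. As a rough illustration of the constraint it enforces, the sketch below staggers region wakeups into time slots so that the summed inrush current in any slot stays under a budget, using first-fit decreasing packing. The function name, the per-region current values, and the slot model are all hypothetical; this is not the paper's two-level (intra-region/inter-region) mechanism.

```python
def schedule_wakeup(region_currents, budget):
    """Group power-gated regions into wakeup slots (hypothetical model).

    region_currents: region name -> peak inrush current when that region turns on
    budget: maximum total inrush current allowed within one slot
    Returns a list of slots, each a list of region names woken together.
    """
    if any(c > budget for c in region_currents.values()):
        raise ValueError("a single region exceeds the wakeup current budget")
    slots = []  # each entry: [current used in this slot, [region names]]
    # First-fit decreasing: place the largest regions first to pack slots tightly.
    for name, current in sorted(region_currents.items(), key=lambda kv: -kv[1]):
        for slot in slots:
            if slot[0] + current <= budget:
                slot[0] += current
                slot[1].append(name)
                break
        else:
            slots.append([current, [name]])
    return [names for _, names in slots]
```

For example, four regions drawing 4, 3, 3, and 2 units under a budget of 6 fit into two slots instead of four, trading wakeup latency for a bounded voltage droop.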
CONNECT: re-examining conventional wisdom for designing NoCs in the context of FPGAs
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145703
Michael Papamichael, J. Hoe
Abstract: An FPGA is a peculiar hardware realization substrate in terms of the relative speed and cost of logic vs. wires vs. memory. In this paper, we present a Network-on-Chip (NoC) design study from the mindset of NoC as a synthesizable infrastructural element to support emerging System-on-Chip (SoC) applications on FPGAs. To support our study, we developed CONNECT, an NoC generator that can produce synthesizable RTL designs of FPGA-tuned multi-node NoCs of arbitrary topology. The CONNECT NoC architecture embodies a set of FPGA-motivated design principles that uniquely influence key NoC design decisions, such as topology, link width, router pipeline depth, network buffer sizing, and flow control. We evaluate CONNECT against a high-quality publicly available synthesizable RTL-level NoC design intended for ASICs. Our evaluation shows a significant gain in specializing NoC design decisions to FPGAs' unique mapping and operating characteristics. For example, in the case of a 4x4 mesh configuration evaluated using a set of synthetic traffic patterns, we obtain comparable or better performance than the state-of-the-art NoC while reducing logic resource cost by 58%, or alternatively, achieve 3-4x better performance for approximately the same logic resource usage. Finally, to demonstrate CONNECT's flexibility and extensive design space coverage, we also report synthesis and network performance results for several router configurations and for entire CONNECT networks.
Pages: 37-46
Citations: 195
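The abstract evaluates a 4x4 mesh. To make that configuration concrete, the sketch below computes a dimension-order (XY) route between two mesh routers, a common deadlock-free routing choice for meshes. This is an illustrative assumption only: CONNECT supports arbitrary topologies, and the abstract does not say how it routes packets.

```python
def xy_route(src, dst):
    """Dimension-order (XY) route in a 2D mesh.

    Routers are identified by (x, y) coordinates. Returns every router
    visited, including src and dst: first along X, then along Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

In a 4x4 mesh, `xy_route((0, 0), (3, 2))` takes 5 hops, equal to the Manhattan distance. Because every packet turns from X to Y at most once, no cyclic channel dependency can form, which is why XY routing needs no extra virtual channels to avoid deadlock.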
Securing netlist-level FPGA design through exploiting process variation and degradation
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145716
J. Zheng, M. Potkonjak
Abstract: The continuously widening gap between the Non-Recurring Engineering (NRE) and Recurring Engineering (RE) costs of producing Integrated Circuit (IC) products over the past few decades gives strong incentives for unauthorized cloning and reverse engineering of ICs. Existing IC Digital Rights Management (DRM) schemes often demand high overhead in area, power, and performance, or require non-volatile storage. Our goal is to develop a novel Intellectual Property (IP) protection technique that offers universal protection to both Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) against unauthorized manufacturing and reverse engineering. In this paper, we show a proof-of-concept implementation of the basic elements of the technique, as well as a case study of applying the anti-cloning technique to a nontrivial FPGA design.
Pages: 129-138
Citations: 18
Early timing estimation for system-level design using FPGAs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145761
H. Andrade, Arkadeb Ghosal, Rhishikesh Limaye, S. Malik, N. Petersen, K. Ravindran, Trung N. Tran, Guoqiang Wang, Guang Yang
Abstract: FPGA devices provide flexible, fast, and low-cost prototyping and production solutions for system design. However, as design complexity continues to rise, design and synthesis iterations become a labor-intensive and time-consuming ordeal. Consequently, it becomes imperative to raise the level of abstraction for FPGA designs while providing insight into performance metrics early in the design process. In particular, an important design-time problem is to determine the maximum clock frequency that a circuit can achieve on a specific FPGA target before full synthesis and implementation. This early quantification can greatly help evaluate key design characteristics without resorting to tedious runs of the full implementation flow. In this work, we focus on the predictability of the timing delay of circuits composed of high-level blocks on an FPGA. We are well aware of the difficulties in tackling uncertainties in early timing estimation, e.g., the inherent gap between a high-level representation and gates/wires, and the extreme difficulty of delay estimation due to randomness in physical design tools. We show that the estimation uncertainties can be mitigated through a carefully characterized timing database of primitive building blocks and refined timing analysis models. We primarily focus on applications composed of data-intensive word-level arithmetic computations from the DSP domain and specified using static dataflow models. Our experiments indicate that for these applications, timing estimates can be obtained reliably within a good error margin, both on average and in the worst case. As future work, we plan to fine-tune the timing database by modeling resource utilization effects and inter-primitive/actor routing delay via variants of Rent's rule and related efforts. We are also interested in exploring dynamic sub-cycle timing characterization.
Pages: 271
Citations: 0
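The core idea above, estimating maximum clock frequency from a characterized database of primitive block delays, can be sketched as a longest-path computation over a combinational block graph. The delay values, primitive names, and function below are hypothetical, and the sketch ignores routing delay entirely, which is exactly the effect the authors list as future work.

```python
from graphlib import TopologicalSorter

# Hypothetical per-primitive delays in ns; a real database would be characterized
# per device, bit width, and block configuration.
DELAY_DB = {"add16": 1.2, "mul16": 2.9, "mux2": 0.4}

def estimate_fmax_mhz(blocks, edges):
    """Estimate fmax from the longest combinational path through a block graph.

    blocks: block name -> primitive type (a key of DELAY_DB)
    edges: list of (src, dst) combinational connections between blocks
    Returns (fmax in MHz, critical-path delay in ns).
    """
    preds = {b: set() for b in blocks}
    for src, dst in edges:
        preds[dst].add(src)
    arrival = {}
    # Visit blocks so that every predecessor's arrival time is known first.
    for b in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds[b]), default=0.0)
        arrival[b] = start + DELAY_DB[blocks[b]]
    critical = max(arrival.values())
    return 1000.0 / critical, critical
```

For an adder feeding a multiplier feeding a multiplexer, the estimate is 1.2 + 2.9 + 0.4 = 4.5 ns, or roughly 222 MHz, obtained without running synthesis or place-and-route.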
Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145757
C. Y. Lin, N. Wong, Hayden Kwok-Hay So
Abstract: Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs that exceed the available on-chip computational resources. One way to address this is to fold the execution of the given dataflow graphs onto an array of directly connected simple configurable processing elements (CPEs). In this scenario, the performance and energy efficiency of the resulting system depend not only on the mapping schedule of the compute operations on the CPEs, but also on the topology of the interconnect array that connects the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect network topology are co-optimized on a per-application basis for energy-efficient FPGA computation. For the same application, energy efficiency differed by more than 2.5x depending on which common regular array topology was used to connect the CPEs. Moreover, by using irregular application-specific interconnect topologies derived from a genetic algorithm, up to 50% improvement in energy-delay product was achievable compared with even the best regular topology. Such a framework is anticipated to serve as part of a rapid high-level FPGA application compiler, since minimal hardware place-and-route is needed to generate the optimal schedule and topology.
Pages: 270
Citations: 1
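"Folding" a dataflow graph onto a fixed number of processing elements can be illustrated with resource-constrained list scheduling. The sketch below assumes unit-latency operations and at most one operation per CPE per cycle, and it deliberately ignores interconnect topology, communication latency, and energy, which are precisely what the paper co-optimizes. The function and its parameters are hypothetical.

```python
def list_schedule(ops, deps, num_pes):
    """Fold a dataflow graph (DAG) onto num_pes processing elements.

    ops: operation names in priority order
    deps: op -> set of predecessor ops (each op takes one cycle)
    Returns op -> (cycle, pe index).
    """
    done_cycle = {}
    schedule = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        # An op is ready once all predecessors finished in an earlier cycle.
        ready = [o for o in remaining
                 if all(p in done_cycle and done_cycle[p] < cycle for p in deps.get(o, ()))]
        if not ready:
            raise ValueError("cyclic graph or unknown predecessor")
        for pe, op in enumerate(ready[:num_pes]):  # at most one op per PE per cycle
            schedule[op] = (cycle, pe)
            done_cycle[op] = cycle
        remaining = [o for o in remaining if o not in schedule]
        cycle += 1
    return schedule
```

With two independent inputs feeding one operation that feeds another, one PE needs four cycles while two PEs need three; adding PEs shortens the schedule only as far as the graph's dependency chain allows.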
Timing yield improvement of FPGAs utilizing enhanced architectures and multiple configurations under process variation (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145742
Fatemeh Sadat Pourhashemi, M. S. Zamani
Abstract: Designing with field-programmable gate arrays (FPGAs) can be difficult due to process variations. Some techniques use the reconfigurability of FPGAs to reduce the effects of process variations in these chips. Furthermore, FPGA architecture enhancement is an effective way to reduce the impact of variation. In this paper, various FPGA architectures are examined to identify which architecture achieves a larger parametric yield improvement when using multiple configurations as opposed to a single configuration. Experimental results show that increasing the cluster size from 4 to 10 increases the yield improvement from 2.82X to 4.48X. However, increasing the look-up table (LUT) size from 4 to 7 reduces the yield improvement from 2.82X to 1.45X, using 10 configurations compared to a single configuration over 20 MCNC benchmark circuits. These results indicate that the multi-configuration technique produces a larger timing yield improvement in FPGAs with larger cluster sizes and smaller LUT sizes.
Pages: 265
Citations: 0
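Why multiple configurations raise timing yield can be shown with a toy Monte Carlo model: if each chip may use whichever of its first k configurations is fastest, yield can only grow with k. The model below is a hypothetical sketch under loud assumptions: each configuration's critical-path delay is the nominal delay scaled by an independent Gaussian factor. Real configurations on the same die share correlated variation, so this independence assumption overstates the benefit, and the numbers it produces are not the paper's.

```python
import random

def yield_vs_configs(num_chips, max_configs, t_nominal, sigma, t_target, seed=0):
    """Return yields[k-1]: fraction of simulated chips meeting t_target when the
    best of the first k configurations may be chosen (hypothetical model).

    Each configuration's delay = t_nominal * (1 + N(0, sigma)), independent
    across configurations and chips.
    """
    rng = random.Random(seed)
    passing = [0] * max_configs
    for _ in range(num_chips):
        best = float("inf")
        for k in range(max_configs):
            # Keep the fastest configuration seen so far for this chip.
            best = min(best, t_nominal * (1 + rng.gauss(0, sigma)))
            if best <= t_target:
                passing[k] += 1
    return [p / num_chips for p in passing]
```

Reusing the same samples for every k makes the yield curve non-decreasing by construction; the ratio `yields[-1] / yields[0]` plays the role of the "yield improvement" figure, here under an idealized model.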
Algorithm and architecture optimization for large size two dimensional discrete Fourier transform (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145760
Berkin Akin, Peter Milder, F. Franchetti, J. Hoe
Abstract: We present a poster showcasing our FPGA implementations of two-dimensional discrete Fourier transform (2D-DFT) on large datasets that must reside off-chip in DRAM. These memory-bound large 2D-DFT computations are at the heart of important scientific computing and image processing applications. The central challenge in creating high-performance implementations is in the carefully orchestrated use of the available off-chip memory bandwidth and on-chip temporary storage. Our implementations derive their efficiency from a combined attention to both the algorithm design to enable efficient DRAM access patterns and datapath design to extract the maximum compute throughput at a given level of memory bandwidth. The poster reports results including a 1024x1024 double-precision 2D-DFT implementation on an Altera DE4 platform (based on a Stratix IV EP4SGX530 with 12 GB/s DRAM bandwidth) that reached over 16 Gflop/s, achieving a much higher ratio of performance-to-memory-bandwidth than both state-of-the-art CPU and GPU implementations.
Pages: 271
Citations: 0
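A 2D-DFT is separable: it can be computed as 1D DFTs over every row followed by 1D DFTs over every column (the row-column method). The sketch below checks this decomposition against the direct 2D definition using naive O(n^2) DFTs; it is not the authors' datapath. In a row-major DRAM layout, the column pass reads data with a stride of one full row, which is the kind of access pattern that careful algorithm design for off-chip memory has to address.

```python
import cmath

def dft(v):
    """Naive 1D DFT (a stand-in for an FFT core)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft2_row_column(img):
    """2D DFT via the row-column method."""
    rows = [dft(r) for r in img]                  # pass 1: transform each row
    cols = [dft(list(c)) for c in zip(*rows)]     # pass 2: transform each column
    return [list(r) for r in zip(*cols)]          # transpose back to row-major

def dft2_direct(img):
    """Reference: direct evaluation of the 2D DFT definition."""
    m, n = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / m + v * x / n))
                 for y in range(m) for x in range(n))
             for v in range(n)] for u in range(m)]
```

Both functions return an m x n list of complex coefficients; for `[[1, 2, 3], [4, 5, 6]]` the DC term is the sum of all samples, 21.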
Functionally verifying state saving and restoration in dynamically reconfigurable systems
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145735
Lingkan Gong, O. Diessel
Abstract: Dynamically reconfigurable systems increase design density and flexibility by allowing hardware modules to be swapped at run time. Systems that employ checkpointing, periodic or phased execution, preemptive multitasking, and resource defragmentation may also need to save and restore the state of a module that is being reconfigured. Existing tools verify the functionality of a system that is undergoing reconfiguration. These tools can also be employed if state is accessed using application logic. However, when state is accessed via the configuration port, functional verification is hindered because the FPGA fabric, which mediates the transfer of state between the application logic and the configuration port, is not simulated. We describe how to efficiently simulate those aspects of the fabric that are used in accessing module state. To the best of our knowledge, this work is the first to allow cycle-accurate simulation of a system partially reconfiguring both its logic and its state. A case study shows that our method is effective in detecting device-independent design errors.
Pages: 241-244
Citations: 10
A fast discrete placement algorithm for FPGAs
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145713
Qinghong Wu, K. McElvain
Abstract: Good FPGA placement is crucial to obtaining the best Quality of Results (QoR) from FPGA hardware. Although many published global placement techniques place objects in a continuous ASIC-like environment, FPGAs are discrete in nature, and a continuous algorithm cannot always achieve superior QoR by itself. Therefore, discrete FPGA-specific detail placement algorithms are used to improve the global placement results. Unfortunately, most of these detail placement algorithms do not have a global view. This paper presents a discrete "middle" placer that fills the gap between the two placement steps. It works like simulated annealing but leverages various acceleration techniques, so it does not pay the runtime penalty typical of simulated annealing solutions. Experiments show that with this placer, the final QoR is significantly better than with the global-detail placer approach.
Pages: 115-118
Citations: 7
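For reference, the plain simulated-annealing baseline that the "middle" placer is compared against can be sketched as discrete placement on a grid of legal sites: each move relocates a block to a random site (swapping if occupied) and is accepted by the Metropolis rule on half-perimeter wirelength (HPWL). The abstract does not disclose the paper's acceleration techniques; this sketch recomputes the full cost on every move, which is exactly the slow behavior the paper avoids. Function names and the cooling parameters are hypothetical.

```python
import math
import random

def hpwl(nets, pos):
    """Half-perimeter wirelength summed over nets; pos maps block -> (x, y)."""
    total = 0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal_place(blocks, nets, width, height, steps=20000, t0=5.0, alpha=0.9995, seed=0):
    """Annealing-style discrete placement on a width x height grid of legal sites."""
    if len(blocks) > width * height:
        raise ValueError("not enough sites for all blocks")
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    pos = {b: sites[i] for i, b in enumerate(blocks)}   # random legal start
    occupant = {s: b for b, s in pos.items()}           # site -> block
    cost, temp = hpwl(nets, pos), t0
    for _ in range(steps):
        b, target = rng.choice(blocks), rng.choice(sites)
        src, other = pos[b], occupant.get(target)
        pos[b] = target                   # tentatively move b (swap if occupied)
        if other is not None:
            pos[other] = src
        delta = hpwl(nets, pos) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta                 # accept: update the site map
            occupant[target] = b
            if other is not None:
                occupant[src] = other
            else:
                occupant.pop(src, None)
        else:                             # reject: undo the tentative move
            pos[b] = src
            if other is not None:
                pos[other] = target
        temp *= alpha                     # geometric cooling schedule
    return pos, cost
```

Early on, the high temperature lets uphill moves escape poor configurations; as it cools, the search becomes greedy. Every move here pays a full HPWL recomputation, so practical placers update only the nets touching moved blocks.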