FPGA. ACM International Symposium on Field-Programmable Gate Arrays: Latest Articles

Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145757
C. Y. Lin, N. Wong, Hayden Kwok-Hay So
Abstract: Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs beyond the size of the available on-chip computational resources. One way to address this is by folding the execution of the given dataflow graphs onto an array of directly connected simple configurable processing elements (CPEs). Under this scenario, the performance and energy efficiency of the resulting system depend not only on the mapping schedule of the compute operations on the CPEs, but also on the topology of the interconnect array that connects the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect network topology are co-optimized on a per-application basis for energy-efficient FPGA computation. Given the same application, more than 2.5x difference in energy efficiency was achievable by the use of different common regular array topologies to connect the CPEs. Moreover, by using irregular application-specific interconnect topologies derived from a genetic algorithm, up to 50% improvement in energy-delay product was achievable when compared to the use of even the best regular topology. The use of such a framework is anticipated to serve as part of a rapid high-level FPGA application compiler, since minimal hardware place-and-route is needed to generate the optimal schedule and topology.
Pages: 270
Citations: 1
Timing yield improvement of FPGAs utilizing enhanced architectures and multiple configurations under process variation (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145742
Fatemeh Sadat Pourhashemi, M. S. Zamani
Abstract: Designing with field-programmable gate arrays (FPGAs) can face difficulties due to process variations. Some techniques use the reconfigurability of FPGAs to reduce the effects of process variations in these chips. Furthermore, FPGA architecture enhancement is an effective way to reduce the impact of variation. In this paper, various FPGA architectures are examined to identify which architecture can achieve larger parametric yield improvement utilizing multiple configurations as opposed to a single configuration. Experimental results show that by increasing cluster size from 4 to 10, yield improvement increases from 2.82X to 4.48X. However, changing look-up table (LUT) size from 4 to 7 degrades yield improvement from 2.82X to 1.45X, using 10 configurations compared to a single configuration over 20 MCNC benchmark circuits. These results indicate that the multi-configuration technique yields larger timing yield improvement in FPGAs with larger cluster size and smaller LUT size.
Pages: 265
Citations: 0
Early timing estimation for system-level design using FPGAs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145761
H. Andrade, Arkadeb Ghosal, Rhishikesh Limaye, S. Malik, N. Petersen, K. Ravindran, Trung N. Tran, Guoqiang Wang, Guang Yang
Abstract: FPGA devices provide flexible, fast, and low-cost prototyping and production solutions for system design. However, as design complexity continues to rise, the design and synthesis iterations become a labor-intensive and time-consuming ordeal. Consequently, it becomes imperative to raise the level of abstraction for FPGA designs, while providing insight into performance metrics early in the design process. In particular, an important design-time problem is to determine the maximum clock frequency that a circuit can achieve on a specific FPGA target before full synthesis and implementation. This early quantification can greatly help evaluate key design characteristics without reverting to tedious runs of the full implementation flow. In this work, we focus on the predictability of timing delay of circuits composed of high-level blocks on an FPGA. We are well aware of the difficulties in tackling uncertainties in early timing estimation, e.g., an inherent gap between a high-level representation and gates/wires, and extremely difficult delay estimation due to the randomness in physical design tools. We show that the estimation uncertainties can be mitigated through a carefully characterized timing database of primitive building blocks and refined timing analysis models. We primarily focus on applications composed of data-intensive word-level arithmetic computations from the DSP domain and specified using static dataflow models. Our experiments indicate that for these applications, timing estimates can be obtained reliably within a good error margin on average and in the worst case. As future work, we plan to fine-tune the timing database by modeling resource utilization effects and inter-primitive/actor routing delay via variants of Rent's rule and related efforts. We are also interested in exploring dynamic sub-cycle timing characterization.
Pages: 271
Citations: 0
Incremental clustering applied to radar deinterleaving: a parameterized FPGA implementation
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145699
Scott Bailie, M. Leeser
Abstract: ICED (Incremental Clustering of Evolving Data) is a novel incremental clustering algorithm designed for data whose characteristics change over time. ICED is an unsupervised clustering technique that assumes no prior knowledge of the incoming data, and supports removing clusters that contain stale data. The user controls the FPGA implementation through a combination of compile time parameters (number of clusters) and run time parameters (distance threshold, fade cycle length). ICED has been applied to a radar application: pulse deinterleaving. ICED is the first implementation of incremental clustering on an FPGA of which we are aware. The implementation runs 39 times faster than an equivalent C implementation on a 3 GHz Intel Xeon processor, and is capable of processing radar data in real time.
Pages: 25-28
Citations: 4
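The abstract above names three controls (number of clusters, distance threshold, fade cycle length) without describing the algorithm itself. As a rough illustration of how such parameters could interact, here is a minimal threshold-based incremental clustering sketch. It is not the authors' ICED algorithm: the 1-D scalar points, the running-mean centroid update, the "drop the point when all slots are full" rule, and every name in the code are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    centroid: float   # running mean of the points absorbed so far
    count: int        # number of points absorbed so far
    last_seen: int    # index of the most recent point assigned here


def incremental_cluster(points: List[float], max_clusters: int,
                        distance_threshold: float,
                        fade_cycle_length: int) -> List[Cluster]:
    """Hypothetical sketch: join the nearest cluster if it is within the
    threshold, otherwise open a new one; discard clusters that go stale."""
    clusters: List[Cluster] = []
    for t, x in enumerate(points):
        # Fade step (assumed policy): drop clusters idle for too long.
        clusters = [c for c in clusters
                    if t - c.last_seen <= fade_cycle_length]

        nearest = min(clusters, key=lambda c: abs(c.centroid - x),
                      default=None)
        if nearest is not None and abs(nearest.centroid - x) <= distance_threshold:
            # Incremental running-mean update; no past points are stored.
            nearest.count += 1
            nearest.centroid += (x - nearest.centroid) / nearest.count
            nearest.last_seen = t
        elif len(clusters) < max_clusters:
            clusters.append(Cluster(centroid=x, count=1, last_seen=t))
        # else: every cluster slot is in use; this sketch drops the point.
    return clusters
```

In this sketch `max_clusters` plays the role of the compile time parameter, while `distance_threshold` and `fade_cycle_length` can be changed on every call, mirroring the compile time / run time split the abstract describes.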
FPGA-accelerated 3D reconstruction using compressive sensing
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145721
Jianwen Chen, J. Cong, Ming Yan, Yi Zou
Abstract: The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, such as compressive sensing, provide ways to reduce the radiation exposure without sacrificing image quality. However, the computational requirement of such algorithms is much higher than that of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of one important iterative kernel called EM, which is the major computation kernel of a recent EM+TV reconstruction algorithm. We show that a hybrid approach (CPU+GPU+FPGA) can deliver better performance and energy efficiency than GPU-only solutions, providing a 13X throughput boost over a dual-core CPU implementation.
Pages: 163-166
Citations: 24
Thermal-aware logic block placement for 3D FPGAs considering lateral heat dissipation (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145749
Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang
Abstract: Three-dimensional (3D) integration is an attractive and promising technology to keep Moore's Law alive, but the thermal issue presents a critical challenge for 3D integrated circuits. Meanwhile, accurate thermal analysis is very time-consuming and thus can hardly be incorporated into most placement algorithms, which generally perform numerous iterative refinement steps. As a consequence, in this paper, we first present a fine-grained grid-based thermal model for the 3D regular FPGA architecture and also highlight that lateral heat dissipation paths can no longer be assumed negligible. Then we propose two fast thermal-aware placement algorithms for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), in which rapid thermal evaluation instead of slow detailed analysis is utilized. Moreover, both take the lateral heat dissipation into consideration and focus on distributing heat sources more evenly within a layer in a 3D FPGA to avoid creating hotspots. Experimental results show that SD and MS achieve 12.1%/7.6% reduction in maximum temperature and 82%/56% improvement in temperature deviation compared with a classical thermal-unaware placement method, at the cost of only a minor increase in wirelength and delay. Moreover, MS consumes merely 4% more runtime for producing thermal-aware placement solutions.
Pages: 268
Citations: 1
Limit study of energy & delay benefits of component-specific routing
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145710
Nikil Mehta, Raphael Rubin, A. DeHon
Abstract: As feature sizes scale toward atomic limits, parameter variation continues to increase, leading to increased margins in both delay and energy. The possibility of very slow devices on critical paths forces designers to increase transistor sizes, reduce clock speed and operate at higher voltages than desired in order to meet timing. With post-fabrication configurability, FPGAs have the opportunity to use slow devices on non-critical paths while selecting fast devices for critical paths. To understand the potential benefit we might gain from component-specific mapping, we quantify the margins associated with parameter variation in FPGAs over a wide range of predictive technologies (45nm-12nm) and gate sizes and show how these margins can be significantly reduced by delay-aware, component-specific routing. For the Toronto 20 benchmark set, we show that component-specific routing can eliminate delay margins induced by variation and reduce energy for energy minimal designs by 1.42-1.98×. We further show that these benefits increase as technology scales.
Pages: 97-106
Citations: 27
Algorithm and architecture optimization for large size two dimensional discrete Fourier transform (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145760
Berkin Akin, Peter Milder, F. Franchetti, J. Hoe
Abstract: We present a poster showcasing our FPGA implementations of two-dimensional discrete Fourier transform (2D-DFT) on large datasets that must reside off-chip in DRAM. These memory-bound large 2D-DFT computations are at the heart of important scientific computing and image processing applications. The central challenge in creating high-performance implementations is in the carefully orchestrated use of the available off-chip memory bandwidth and on-chip temporary storage. Our implementations derive their efficiency from a combined attention to both the algorithm design to enable efficient DRAM access patterns and datapath design to extract the maximum compute throughput at a given level of memory bandwidth. The poster reports results including a 1024x1024 double-precision 2D-DFT implementation on an Altera DE4 platform (based on a Stratix IV EP4SGX530 with 12 GB/s DRAM bandwidth) that reached over 16 Gflop/s, achieving a much higher ratio of performance-to-memory-bandwidth than both state-of-the-art CPU and GPU implementations.
Pages: 271
Citations: 0
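For readers unfamiliar with why a large 2D-DFT stresses memory access patterns, the textbook row-column decomposition may help: a 2D-DFT can be computed as 1D DFTs over every row followed by 1D DFTs over every column, and in row-major storage the column pass touches memory with a large stride. The sketch below shows only that standard decomposition. The abstract does not say which decomposition or memory layout the authors use, so this is an assumption for illustration, and the function names are hypothetical. A naive 1D DFT is used for brevity where a real design would use an FFT.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def dft_1d(x: List[complex]) -> List[complex]:
    """Naive O(n^2) 1D DFT, kept simple for clarity."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def dft_2d_row_column(m: Matrix) -> Matrix:
    """2D DFT as a row pass followed by a column pass."""
    rows_done = [dft_1d(row) for row in m]
    # Column pass: transpose so each column becomes a contiguous row,
    # transform it, then transpose back to the original orientation.
    cols_done = [dft_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)
```

The explicit transposes stand in for the strided accesses of the column pass. As a side calculation from the figures quoted in the abstract, 16 Gflop/s at 12 GB/s corresponds to roughly 1.3 flop per byte of DRAM bandwidth.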
Functionally verifying state saving and restoration in dynamically reconfigurable systems
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145735
Lingkan Gong, O. Diessel
Abstract: Dynamically reconfigurable systems increase design density and flexibility by allowing hardware modules to be swapped at run time. Systems that employ checkpointing, periodic or phased execution, preemptive multitasking and resource defragmentation may also need to be able to save and restore the state of a module that is being reconfigured. Existing tools verify the functionality of a system that is undergoing reconfiguration. These tools can also be employed if state is accessed using application logic. However, when state is accessed via the configuration port, functional verification is hindered because the FPGA fabric, which mediates the transfer of state between the application logic and the configuration port, is not being simulated. We describe how to efficiently simulate those aspects of the fabric that are used in accessing module state. To the best of our knowledge, this work is the first to allow cycle-accurate simulation of a system partially reconfiguring both its logic and state, and a case study shows that our method is effective in detecting device-independent design errors.
Pages: 241-244
Citations: 10
A fast discrete placement algorithm for FPGAs
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145713
Qinghong Wu, K. McElvain
Abstract: Good FPGA placement is crucial to obtain the best Quality of Results (QoR) from FPGA hardware. Although many published global placement techniques place objects in a continuous ASIC-like environment, FPGAs are discrete in nature, and a continuous algorithm cannot always achieve superior QoR by itself. Therefore, discrete FPGA-specific detail placement algorithms are used to improve the global placement results. Unfortunately, most of these detail placement algorithms do not have a global view. This paper presents a discrete "middle" placer that fills the gap between the two placement steps. It works like simulated annealing, but leverages various acceleration techniques. It does not pay the runtime penalty typical of simulated annealing solutions. Experiments show that with this placer, final QoR is significantly better than with the global-detail placer approach.
Pages: 115-118
Citations: 7
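The abstract above compares its placer to simulated annealing without giving details. For orientation, here is a minimal sketch of plain simulated-annealing placement on a discrete grid, the baseline the abstract alludes to. It does not include the paper's acceleration techniques, which the abstract does not describe. The half-perimeter wirelength cost, the geometric cooling schedule, the swap-only move set, and all names and default values are assumptions chosen for this example.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def wirelength(placement: Dict[str, Pos], nets: List[List[str]]) -> int:
    """Half-perimeter wirelength (HPWL) summed over all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(placement: Dict[str, Pos], nets: List[List[str]],
           steps: int = 2000, t_start: float = 5.0, t_end: float = 0.01,
           seed: int = 0) -> Tuple[Dict[str, Pos], int]:
    """Swap-based annealing over the sites occupied by the initial placement."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = wirelength(current, nets)
    best, best_cost = dict(current), cost
    blocks = sorted(current)
    for i in range(steps):
        # Geometric cooling from t_start down toward t_end.
        temp = t_start * (t_end / t_start) ** (i / steps)
        a, b = rng.sample(blocks, 2)
        current[a], current[b] = current[b], current[a]  # trial swap
        new_cost = wirelength(current, nets)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
    return best, best_cost
```

Recomputing the full wirelength after every swap is the kind of cost that makes naive annealing slow; incremental cost updates are one commonly used speedup, though whether the paper relies on them is not stated in the abstract.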