2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines最新文献

筛选
英文 中文
Accurate Thermal-Profile Estimation and Validation for FPGA-Mapped Circuits fpga映射电路的精确热分布估计和验证
A. Amouri, H. Amrouch, T. Ebi, J. Henkel, M. Tahoori
{"title":"Accurate Thermal-Profile Estimation and Validation for FPGA-Mapped Circuits","authors":"A. Amouri, H. Amrouch, T. Ebi, J. Henkel, M. Tahoori","doi":"10.1109/FCCM.2013.48","DOIUrl":"https://doi.org/10.1109/FCCM.2013.48","url":null,"abstract":"Accurate thermal profile estimation for FPGA, at design time, is necessary to avoid unexpected thermal hot-spots in the circuit before deploying the FPGA to the in-field operation. Both accurate dynamic and leakage power values are needed for the thermal profile estimation and they can be estimated using the FPGA vendor's tools. However these report leakage power as a single value for the whole chip, and no details are given in literature or the FPGA toolset about its distribution across the FPGA chip for the thermal simulation. To cope with this problem, we present a method for properly distributing the leakage power across the FPGA chip. The method uses a temperature-leakage loop estimation model for distributing and adapting the leakage power for more accurate thermal simulation. Furthermore, to accurately calibrate the presented method and its model and also to validate the resulting thermal profiles, we utilize an infrared thermal camera, which measures the emissions from the backside of a Virtex-5 FPGA chip. The results of testing several designs, with different sizes and frequencies, show that our approach can achieve accurate thermal-profile estimation when compared to the camera measurements, with average absolute estimation error of around 1°C across the chip.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114787891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
High Speed Video Processing Using Fine-Grained Processing on FPGA Platform 基于FPGA平台的细粒度高速视频处理
Z. Ang, Akash Kumar, Yajun Ha
{"title":"High Speed Video Processing Using Fine-Grained Processing on FPGA Platform","authors":"Z. Ang, Akash Kumar, Yajun Ha","doi":"10.1109/FCCM.2013.32","DOIUrl":"https://doi.org/10.1109/FCCM.2013.32","url":null,"abstract":"This summary paper1 proposes an FPGA-based array processor which performs Laplacian filtering on a 40 by 40 pixel grayscale video. The architecture comprises of bit-serial pixel processors interconnected to give a two-dimensional mesh array. This architecture features the novel use of partial reconfiguration which transfers data to and fro the array. Each processor occupies a configurable logic block and achieves a target frame rate of 10000 frames per second, at an operating frequency of 0.31 MHz on the Virtex-6 ML605 Evaluation Kit. The detailed correspondence between the contents of slice lookup tables and the Virtex-6 bitstream format is also documented.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115460457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
High-Level Description and Synthesis of Floating-Point Accumulators on FPGA FPGA上浮点累加器的高级描述与综合
Marc-André Daigneault, J. David
{"title":"High-Level Description and Synthesis of Floating-Point Accumulators on FPGA","authors":"Marc-André Daigneault, J. David","doi":"10.1109/FCCM.2013.37","DOIUrl":"https://doi.org/10.1109/FCCM.2013.37","url":null,"abstract":"Decades of research in the field of high level hardware description now result in tools that are able to automatically transform C/C++ constructs into highly optimized parallel and pipelined architectures. Such approaches work fine when the control flow is a priory known since the computation results in a large dataflow graph that can be mapped into the available operators. Nevertheless, some applications have a control flow that is highly dependant on the data. This paper focuses on the hardware implementation of such applications and presents a high level synthesis methodology applied to a Hardware Description Language (HDL) in which assignments correspond to self-synchronized connections between predefined data streaming sources and sinks. A data transfer occurs over an established connection when both source and sink are ready, according to their synchronization interfaces. Founded on a high-level communicating FSM programming model, the language allows the user to describe and dynamically modify streaming architectures exploiting spatial and temporal parallelism. Our compiler attempts to maximize the number of transfers at each clock cycle and automatically fixes the potential combinatorial loops induced by the dynamic connection of dependant sources and sinks. The methodology is applied to the synthesis of a pipelined floating point accumulator using the Delayed-Buffering (DB) reduction method. The results we obtain are similar to state-of-the-art dedicated architectures but require much less design time and expertise.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116418531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
On Optimizing the Arithmetic Precision of MCMC Algorithms MCMC算法的算法精度优化
Grigorios Mingas, Farhan Rahman, C. Bouganis
{"title":"On Optimizing the Arithmetic Precision of MCMC Algorithms","authors":"Grigorios Mingas, Farhan Rahman, C. Bouganis","doi":"10.1109/FCCM.2013.31","DOIUrl":"https://doi.org/10.1109/FCCM.2013.31","url":null,"abstract":"Markov Chain Monte Carlo (MCMC) is an ubiquitous stochastic method, used to draw random samples from arbitrary probability distributions, such as the ones encountered in Bayesian inference. MCMC often requires forbiddingly long runtimes to give a representative sample in problems with high dimensions and large-scale data. Field-Programmable Gate Arrays (FPGAs) have proven to be a suitable platform for MCMC acceleration due to their ability to support massive parallelism. This paper introduces an automated method, which minimizes the floating point precision of the most computationally intensive part of an FPGA-mapped MCMC sampler, while keeping the precision-related bias in the output within a user-specified tolerance. The method is based on an efficient bias estimator, proposed here, which is able to estimate the bias in the output with only few random samples. The optimization process involves FPGA pre-runs, which estimate the bias and choose the optimized precision. This precision is then used to reconfigure the FPGA for the final, long MCMC run, allowing for higher sampling throughputs. The process requires no user intervention. The method is tested on two Bayesian inference case studies: Mixture models and neural network regression. The achieved speedups over double-precision FPGA designs were 3.5x-5x (including the optimization overhead). Comparisons with a sequential CPU and a GPGPU showed speedups of 223x-446x and 16x-18x respectively.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115292366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Parallel Computation of Skyline Queries Skyline查询的并行计算
L. Woods, G. Alonso, J. Teubner
{"title":"Parallel Computation of Skyline Queries","authors":"L. Woods, G. Alonso, J. Teubner","doi":"10.1109/FCCM.2013.18","DOIUrl":"https://doi.org/10.1109/FCCM.2013.18","url":null,"abstract":"Due to stagnant clock speeds and high power consumption of commodity microprocessors, database vendors have started to explore massively parallel co-processors such as FPGAs to further increase performance. A typical approach is to push simple but compute-intensive operations (e.g., prefiltering, (de)compression) to FPGAs for acceleration. In this paper, we show how a significantly more complex operation- the computation of the skyline-can be holistically implemented on an FPGA. A skyline query computes the pareto optimal set of multi-dimensional data points. These queries have been studied in software extensively over the last decade but this paper is the first to examine skyline computation in hardware. We propose a methodology that interleaves data storage and computation, allowing multiple operations to be executed on the same working set in parallel, while accounting for all data dependencies. Our experiments show that we achieve very promising results compared to CPU-based solutions.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124398941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 45
Reconfigurable Acceleration of Short Read Mapping 短读映射的可重构加速
James Arram, K. H. Tsoi, W. Luk, P. Jiang
{"title":"Reconfigurable Acceleration of Short Read Mapping","authors":"James Arram, K. H. Tsoi, W. Luk, P. Jiang","doi":"10.1109/FCCM.2013.57","DOIUrl":"https://doi.org/10.1109/FCCM.2013.57","url":null,"abstract":"Recent improvements in the throughput of nextgeneration DNA sequencing machines poses a great computational challenge in analysing the massive quantities of data produced. This paper proposes a novel approach, based on reconfigurable computing technology, for accelerating short read mapping, where the positions of millions of short reads are located relative to a known reference sequence. Our approach consists of two key components: an exact string matcher for the bulk of the alignment process, and an approximate string matcher for the remaining cases. We characterise interesting regions of the design space, including homogeneous, heterogeneous and run-time reconfigurable designs and provide back of envelope estimations of the corresponding performance. We show that a particular implementation of this architecture targeting a single FPGA can be up to 293 times faster than BWA on an Intel X5650 CPU, and 134 times faster than SOAP3 on an NVIDIA GTX 580 GPU.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 40
An FPGA Based PCI-E Root Complex Architecture for Standalone SOPCs 基于FPGA的独立单片机PCI-E根复合体结构
Yingjie Cao, Yongxin Zhu, Xu Wang, Jiang Jiang, Meikang Qiu
{"title":"An FPGA Based PCI-E Root Complex Architecture for Standalone SOPCs","authors":"Yingjie Cao, Yongxin Zhu, Xu Wang, Jiang Jiang, Meikang Qiu","doi":"10.1109/FCCM.2013.29","DOIUrl":"https://doi.org/10.1109/FCCM.2013.29","url":null,"abstract":"We present an FPGA (field programmable gate array) based PCI-E (PCI-Express) root complex architecture for SOPCs (System-on-a-Programmable-Chip) in this paper. In our work, the system on the FPGA serves as a PCIE master device rather than a PCIE endpoint, which is usually a common practice as a co-processing device driven by a desktop computer or a server. We use this system to control a PCIE endpoint, which is also an FPGA based endpoint implemented on another FPGA board. This architecture requires only IP cores free of charge. We also provide basic software driver so that specific device driver can be developed on it to control popular PCIE device in the future, i.e. ethernet card or graphic card. The whole architecture has been implemented on Xilinx Virtex-6 FPGAs to indicate that this architecture is a feasible approach to standalone SOPCs, which has better efficiencies than those with additional generic controlling processors.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131262923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Automating Elimination of Idle Functions by Run-Time Reconfiguration 通过运行时重新配置自动消除空闲函数
Xinyu Niu, T. Chau, Qiwei Jin, W. Luk, Qiang Liu, O. Pell
{"title":"Automating Elimination of Idle Functions by Run-Time Reconfiguration","authors":"Xinyu Niu, T. Chau, Qiwei Jin, W. Luk, Qiang Liu, O. Pell","doi":"10.1145/2700415","DOIUrl":"https://doi.org/10.1145/2700415","url":null,"abstract":"A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse time migration are used in evaluating the proposed approach. The run-time solutions approximate the theoretical performance by eliminating idle functions, and are 1.31 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"2 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131437428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Image Segmentation Using Hardware Forest Classifiers
Richard Neil Pittman, A. Forin, A. Criminisi, J. Shotton, A. Mahram
{"title":"Image Segmentation Using Hardware Forest Classifiers","authors":"Richard Neil Pittman, A. Forin, A. Criminisi, J. Shotton, A. Mahram","doi":"10.1109/FCCM.2013.20","DOIUrl":"https://doi.org/10.1109/FCCM.2013.20","url":null,"abstract":"Image segmentation is the process of partitioning an image into segments or subsets of pixels for purposes of further analysis, such as separating the interesting objects in the foreground from the un-interesting objects in the background. In many image processing applications, the process requires a sequence of computational steps on a per pixel basis, thereby binding the performance to the size and resolution of the image. As applications require greater resolution and larger images the computational resources of this step can quickly exceed those of available CPUs, especially in the power and thermal constrained areas of consumer electronics and mobile. In this work, we use a hardware tree-based classifier to solve the image segmentation problem. The application is background removal (BGR) from depth-maps obtained from the Microsoft Kinect sensor. After the image is segmented, subsequent steps then classify the objects in the scene. The approach is flexible: to address different application domains we only need to change the trees used by the classifiers. We describe two distinct approaches and evaluate their performance using the commercial-grade testing environment used for the Microsoft Xbox gaming console.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123974294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Open-Source Bitstream Generation 开源比特流生成
Ritesh Soni, Neil Steiner, M. French
{"title":"Open-Source Bitstream Generation","authors":"Ritesh Soni, Neil Steiner, M. French","doi":"10.1109/FCCM.2013.45","DOIUrl":"https://doi.org/10.1109/FCCM.2013.45","url":null,"abstract":"This work presents an open-source bitstream generation tool for Torc. Bitstream generation has traditionally been the single part of the FPGA design flow that could not be openly reproduced, but our novel approach enables this without reverse-engineering or violating End-User License Agreement terms. We begin by creating a library of “micro-bitstreams” which constitute a collection of primitives at a granularity of our choosing. These primitives can then be combined to create larger designs, or portions thereof, with simple merging operations. Our effort is motivated by a desire to resume earlier work on embedded bitstream generation and autonomous hardware. This is not feasible with Xilinx bitgen because there is no reasonable way to run an x86 binary with complex library and data dependencies on most embedded systems. Initial support is limited to the Virtex5, but we intend to extend this to other Xilinx architectures. We are able to support nearly all routing resources in the device, as well as the most common logic resources.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129189278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信