2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines: Latest Papers (Page 2)

A Mixed Precision Methodology for Mathematical Optimisation

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.16

G. C. Chow, W. Luk, P. Leong

Citations: 1

Hardware Acceleration of Short Read Mapping

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.36

C. Olson, Maria Kim, Cooper Clauson, B. Kogon, C. Ebeling, S. Hauck, W. L. Ruzzo

Citations: 116

A Heterogeneous Architecture for Evaluating Real-Time One-Dimensional Computational Fluid Dynamics on FPGAs

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.31

Isaac Liu, Edward A. Lee, M. Viele, Guoqiang Wang, H. Andrade

Abstract: Many fuel systems for diesel engines are developed with the help of commercial one-dimensional computational fluid dynamics (1D CFD) solvers that model and simulate, off-line, the behavior of fluid flow through interconnected pipes. This paper presents a novel framework to evaluate 1D CFD models in real time on an FPGA. This improves fuel pressure estimation and closes the loop on fuel delivery, allowing for a cleaner and more efficient engine. The real-time requirements of the models are defined by the physics and geometry of the problem being solved. In this framework, the interconnected pipes are partitioned into individual sub-volumes that compute their pressure and flow rate every time step based upon neighboring values. We use timing-based synchronization and multiple Precision Timed (PRET) processor cores to ensure the real-time constraints are met. Leveraging the programmability of FPGAs, we use a configurable heterogeneous architecture to save hardware resources. Several examples are presented along with the implementation results after place and route for a Xilinx Virtex-6 FPGA. The results demonstrate the resource savings and scalability of our framework, confirming the feasibility of our approach: solving 1D CFD models in real time on FPGAs.

Citations: 9
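The sub-volume scheme described in the abstract above (each sub-volume updates its pressure and flow rate every time step from neighboring values only) can be illustrated with a minimal sketch. This is a generic explicit finite-difference update on a staggered grid, not the authors' solver. The function names `step` and `simulate`, the dimensionless coefficients `c1`, `c2` and `friction`, and the closed-at-both-ends boundary are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def step(p: List[float], q: List[float], c1: float = 0.5, c2: float = 0.5,
         friction: float = 0.05) -> Tuple[List[float], List[float]]:
    """Advance a closed pipe of len(p) sub-volumes by one explicit time step.

    p[i] is the pressure of sub-volume i; q[i] is the flow rate through the
    face between sub-volumes i and i + 1 (so len(q) == len(p) - 1).
    c1, c2 and friction are hypothetical dimensionless coefficients.
    """
    n = len(p)
    # Pressure update: sub-volume i reads only the flows on its two faces.
    new_p = []
    for i in range(n):
        inflow = q[i - 1] if i > 0 else 0.0   # closed left end
        outflow = q[i] if i < n - 1 else 0.0  # closed right end
        new_p.append(p[i] - c1 * (outflow - inflow))
    # Flow update: face i reads only the (new) pressures on either side,
    # then loses a fixed fraction of its flow to friction.
    new_q = [(1.0 - friction) * (q[i] - c2 * (new_p[i + 1] - new_p[i]))
             for i in range(n - 1)]
    return new_p, new_q


def simulate(p: List[float], steps: int) -> List[float]:
    """Run `steps` time steps starting from rest (all flows zero)."""
    q = [0.0] * (len(p) - 1)
    for _ in range(steps):
        p, q = step(p, q)
    return p


if __name__ == "__main__":
    # A pressure pulse in the first of 8 sub-volumes spreads out and settles.
    final = simulate([1.0] + [0.0] * 7, 2000)
    print([round(x, 4) for x in final])  # every entry approaches 1/8
```

Because every sub-volume and every face touches only its immediate neighbors, each one could in principle be mapped to its own processing element and updated in lockstep. That locality is what makes a per-time-step real-time deadline plausible. The coefficients `c1 = c2 = 0.5` are kept small so that this explicit scheme stays stable.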

VENICE: A Compact Vector Processor for FPGA Applications

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.55

Aaron Severance, G. Lemieux

Citations: 81

Exploiting Memory-Level Parallelism in Reconfigurable Accelerators

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.35

Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek

Citations: 13

Fixed Point Lanczos: Sustaining TFLOP-equivalent Performance in FPGAs for Scientific Computing

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.19

J. Jerez, G. Constantinides, E. Kerrigan

Abstract: We consider the problem of enabling fixed-point implementations of linear algebra kernels to match the strengths of the field-programmable gate array (FPGA). Algorithms for solving linear equations, finding eigenvalues or finding singular values are typically nonlinear and recursive, making the problem of establishing analytical bounds on variable dynamic range non-trivial. Current approaches fail to provide tight bounds for this class of algorithms. We use as a case study one of the most important kernels in scientific computing, the Lanczos iteration, which lies at the heart of well-known methods such as conjugate gradient and minimum residual. We show how the algorithm can be modified so that standard linear algebra analysis proves tight analytical bounds on all variables of the process, regardless of the properties of the original matrix. It is shown that the numerical behaviour of fixed-point implementations of the modified problem can be chosen to be at least as good as that of a double-precision floating-point implementation. Using this approach, an FPGA solving a single problem can sustain performance very close to the peak performance of a general-purpose graphics processing unit (GPGPU) of comparable size. If there are several independent problems to solve simultaneously, it is possible to exceed the peak floating-point performance of a GPGPU, obtaining approximately 1, 2 or 4 TFLOPs for error tolerances of 10^-7, 10^-5 and 10^-3, respectively, in a large Virtex-7 FPGA.

Citations: 10
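For reference, the sketch below gives the textbook Lanczos iteration discussed in the abstract above, written in pure Python. It does not reproduce the paper's modified algorithm or its fixed-point arithmetic. It only shows why bounds are plausible in the first place: every Lanczos vector has unit 2-norm, so each of its components lies in [-1, 1]. In exact arithmetic, |alpha_j| and beta_j are also at most ||A||_2, which for a symmetric matrix is bounded by the maximum absolute row sum. The helper names `matvec`, `dot`, `lanczos` and `inf_norm`, and the breakdown tolerance `tol`, are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def lanczos(A: Matrix, v0: Sequence[float], k: int, tol: float = 1e-12
            ) -> Tuple[List[float], List[float], List[List[float]]]:
    """Textbook Lanczos iteration (no reorthogonalisation) on symmetric A.

    Returns the diagonal (alphas) and off-diagonal (betas) of the tridiagonal
    matrix T, plus the Lanczos vectors Q. Stops early if beta drops below tol.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * len(q)
    beta_prev = 0.0
    alphas: List[float] = []
    betas: List[float] = []
    Q = [q]
    for j in range(k):
        w = matvec(A, q)
        alpha = dot(q, w)
        alphas.append(alpha)
        # Three-term recurrence: remove the components along q and q_prev.
        w = [wi - alpha * qi - beta_prev * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < tol:
            break
        betas.append(beta)
        q_prev, q, beta_prev = q, [x / beta for x in w], beta
        Q.append(q)  # unit 2-norm, so every component lies in [-1, 1]
    return alphas, betas, Q


def inf_norm(A: Matrix) -> float:
    """Max absolute row sum; an upper bound on ||A||_2 for symmetric A."""
    return max(sum(abs(a) for a in row) for row in A)
```

When the iteration starts from e1, a symmetric tridiagonal matrix with positive off-diagonals is simply reproduced as `T`. That makes a convenient sanity check: `lanczos([[4,1,0,0],[1,3,1,0],[0,1,2,1],[0,0,1,1]], [1,0,0,0], 4)` gives alphas `[4, 3, 2, 1]` and betas `[1, 1, 1]`.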

Emulating Mammalian Vision on Reconfigurable Hardware

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.33

S. Kestur, Mi Sun Park, J. Sabarad, D. Dantara, N. Vijaykrishnan, Yang Chen, D. Khosla

Citations: 32

FPGA-based Acceleration for Tracking Audio Effects in Movies

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.24

M. Psarakis, A. Pikrakis, Giannis Dendrinos

Abstract: In this paper we propose an FPGA-based hardware platform to accelerate an audio tracking method. Our tracking approach is inspired by the problem of molecular sequence alignment and adopts a well-known dynamic programming algorithm from bioinformatics, the Smith-Waterman algorithm. However, the high computational complexity of such algorithms is a significant barrier to their adoption in audio tracking systems. To reduce computation time and achieve realistic response times, we accelerate the computationally intensive parts of our tracking method on an FPGA-based platform. Our FPGA accelerator builds on the systolization of the Smith-Waterman algorithm proposed in previous work on accelerating bio-sequence scanning. However, the special requirements of the audio tracking method pose significant design challenges for the accelerator architecture. The accelerator has been implemented on a Xilinx Virtex-5 device, and the experimental results show that it achieves a significant speedup over the software implementation of the tracking method. The proposed approach has been tested on detecting animal sounds in audio streams from movies. A basic requirement there is to reduce the noisiness of the detection results by exploiting the statistical nature of the scores generated by the dynamic programming algorithm.

Citations: 3
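The Smith-Waterman recurrence, and the property that systolic implementations exploit, can be shown in a short sketch. Every cell on an anti-diagonal (i + j constant) depends only on the two previous anti-diagonals, so a linear array of processing elements can compute a whole anti-diagonal per clock cycle. The abstract does not give the audio method's scoring function. This sketch therefore scores character strings with a linear gap penalty, and the names `sw_row_major` and `sw_wavefront` and the parameter values `match=2`, `mismatch=-1` and `gap=-1` are hypothetical.

```python
from typing import List


def _cell(H: List[List[int]], a: str, b: str, i: int, j: int,
          match: int, mismatch: int, gap: int) -> int:
    """Smith-Waterman recurrence for one cell (linear gap penalty)."""
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)


def sw_row_major(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap: int = -1) -> int:
    """Reference software version: fill the score matrix row by row."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best


def sw_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap: int = -1) -> int:
    """Same best local-alignment score, filled one anti-diagonal at a time.

    The cells with i + j == d are mutually independent, which is what lets a
    systolic array evaluate all of them in the same step.
    """
    rows, cols = len(a), len(b)
    H = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for d in range(2, rows + cols + 1):
        for i in range(max(1, d - cols), min(rows, d - 1) + 1):
            j = d - i
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best
```

Both functions return the same score. Only the evaluation order differs, and that order is what maps onto hardware. For example, `sw_wavefront("TTACGTT", "ACG")` finds the exact local match `ACG` and returns 6.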

Bluehive - A field-programable custom computing machine for extreme-scale real-time neural network simulation

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.32

S. Moore, P. Fox, Steven J. T. Marsh, A. T. Markettos, A. Mujumdar

Citations: 102

Accelerating a Random Forest Classifier: Multi-Core, GP-GPU, or FPGA?

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.47

B. Van Essen, C. Macaraeg, M. Gokhale, R. Prenger

Abstract: Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The classification of an input sample is determined by the majority vote of the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory-bound. It is not typically suitable for acceleration using FPGAs or GP-GPUs, because large, possibly irregular decision trees must be traversed. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares and contrasts the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Some training algorithms can produce compact random forests composed of many small trees rather than fewer deep trees. Taking advantage of this, we are able to regularize the forest so that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction, multiple-thread (SIMT) fashion. We show that FPGAs provide the highest-performance solution, but require a multi-chip/multi-board system to execute even modest-sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared-memory system was the simplest solution and provided near-linear performance scaling with core count, but was still significantly slower than the GP-GPU and FPGA.

Citations: 148
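The abstract above says the forest is regularized so that classifying any sample takes a deterministic amount of time. One way to picture this is a forest of small, complete binary trees of fixed depth, each stored as flat arrays. Every traversal then runs exactly `depth` iterations with no data-dependent loop exit, which suits a pipeline or SIMT lanes. The abstract does not describe the CRF's actual memory layout. The breadth-first array layout, the `CompactTree` class and the `forest_predict` helper below are therefore a hypothetical sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CompactTree:
    depth: int
    feature: List[int]       # 2**depth - 1 internal nodes, breadth-first order
    threshold: List[float]   # one threshold per internal node
    leaf_label: List[int]    # 2**depth leaves, left to right

    def __post_init__(self) -> None:
        n_internal = 2 ** self.depth - 1
        if len(self.feature) != n_internal or len(self.threshold) != n_internal:
            raise ValueError("need 2**depth - 1 internal nodes")
        if len(self.leaf_label) != 2 ** self.depth:
            raise ValueError("need 2**depth leaves")

    def predict(self, x: Sequence[float]) -> int:
        node = 0
        for _ in range(self.depth):  # always exactly `depth` steps
            go_right = x[self.feature[node]] > self.threshold[node]
            node = 2 * node + 1 + int(go_right)  # child index in the flat array
        return self.leaf_label[node - (2 ** self.depth - 1)]


def forest_predict(trees: Sequence[CompactTree], x: Sequence[float]) -> int:
    """Majority vote over the ensemble (ties go to the first label seen)."""
    votes = Counter(t.predict(x) for t in trees)
    return votes.most_common(1)[0][0]
```

Because the child index is computed arithmetically rather than by following pointers, each tree level can become one pipeline stage. A sample then enters the pipeline every cycle, and the latency is the same for every sample.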