2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines: Latest Publications

A Mixed Precision Methodology for Mathematical Optimisation
G. C. Chow, W. Luk, P. Leong
DOI: 10.1109/FCCM.2012.16 (https://doi.org/10.1109/FCCM.2012.16); published 2012-04-29
Abstract: This paper introduces a novel mixed precision methodology for mathematical optimisation. It uses reduced precision FPGA optimisers to search for regions that may contain the global optimum, and double precision optimisers on a general purpose processor (GPP) to verify the results. An empirical method is proposed to determine the parameters of the mixed precision methodology running on a reconfigurable accelerator consisting of an FPGA and a GPP. The effectiveness of our approach is evaluated using a set of optimisation benchmarks. Using our mixed precision methodology and a modern reconfigurable accelerator, we can locate the global optima 1.7 to 6 times faster than a quad-core optimiser. The mixed precision optimisations search up to 40.3 times more starting vectors per unit time than quad-core optimisers, and only 0.7% to 2.7% of these searches are refined using GPP double precision optimisers. The proposed methodology also allows us to accelerate problems with more complicated functions or to solve problems involving higher dimensions.
Citations: 1
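The two-stage structure described in this abstract (many cheap reduced-precision searches, then double-precision refinement of a small fraction of them) can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' method: the Rastrigin function stands in for a benchmark, a simple compass (pattern) search stands in for the optimisers, reduced precision is only emulated by rounding values to IEEE single precision with `struct`, and all helper names and the `keep_fraction` parameter are hypothetical.

```python
import math
import random
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rastrigin(x: list[float]) -> float:
    """Hypothetical benchmark: many local minima, global minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def local_search(f, x, step, iters, rnd):
    """Compass (pattern) search; `rnd` rounds each value to the working precision.

    Note: rounding inputs/outputs only emulates reduced precision; it is not
    bit-exact single-precision arithmetic inside `f`.
    """
    x = [rnd(v) for v in x]
    fx = rnd(f(x))
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] = rnd(y[i] + d)
                fy = rnd(f(y))
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2  # no move helped: shrink the pattern
    return x, fx


def mixed_precision_optimise(f, dim, n_starts, keep_fraction, seed=0):
    """Hypothetical two-stage optimiser: coarse float32 search, then float64 refinement."""
    rng = random.Random(seed)
    # Stage 1 (stands in for the FPGA optimisers): many reduced-precision searches.
    coarse = []
    for _ in range(n_starts):
        x0 = [rng.uniform(-5.12, 5.12) for _ in range(dim)]
        coarse.append(local_search(f, x0, step=0.5, iters=40, rnd=to_f32))
    coarse.sort(key=lambda r: r[1])
    # Stage 2 (stands in for the GPP): refine only the most promising candidates.
    n_refine = max(1, int(n_starts * keep_fraction))
    refined = [local_search(f, x, step=1e-3, iters=60, rnd=float)
               for x, _ in coarse[:n_refine]]
    return min(refined, key=lambda r: r[1]), n_refine


if __name__ == "__main__":
    best, n_refined = mixed_precision_optimise(rastrigin, dim=2, n_starts=200,
                                               keep_fraction=0.02)
    print(n_refined, best)
```

Here `keep_fraction=0.02` refines 4 of 200 starts, which falls inside the 0.7% to 2.7% range the abstract reports; in the paper's setting the first stage would run on the FPGA and the second on the GPP.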
Hardware Acceleration of Short Read Mapping
C. Olson, Maria Kim, Cooper Clauson, B. Kogon, C. Ebeling, S. Hauck, W. L. Ruzzo
DOI: 10.1109/FCCM.2012.36 (https://doi.org/10.1109/FCCM.2012.36); published 2012-04-29
Abstract: Bioinformatics is an emerging field with seemingly limitless possibilities for advances in numerous areas of research and applications. We propose a scalable FPGA-based solution to the short read mapping problem in DNA sequencing, which greatly accelerates the task of aligning short reads to a known reference genome. We compare the runtime, power consumption, and sensitivity of the hardware system with those of the BFAST and Bowtie software tools. The hardware system demonstrates a 250x speedup versus BFAST and a 31x speedup versus Bowtie on eight CPU cores. The hardware system is also more sensitive than Bowtie: it aligns 91% of the short reads, compared with approximately 80% for Bowtie.
Citations: 116
A Heterogeneous Architecture for Evaluating Real-Time One-Dimensional Computational Fluid Dynamics on FPGAs
Isaac Liu, Edward A. Lee, M. Viele, Guoqiang Wang, H. Andrade
DOI: 10.1109/FCCM.2012.31 (https://doi.org/10.1109/FCCM.2012.31); published 2012-04-29
Abstract: Many fuel systems for diesel engines are developed with the help of commercial one-dimensional computational fluid dynamics (1D CFD) solvers that model and simulate, off-line, the behavior of fluid flow through interconnected pipes. This paper presents a novel framework to evaluate 1D CFD models in real time on an FPGA. This improves fuel pressure estimation and closes the loop on fuel delivery, allowing for a cleaner and more efficient engine. The real-time requirements of the models are defined by the physics and geometry of the problem being solved. In this framework, the interconnected pipes are partitioned into individual sub-volumes that compute their pressure and flow rate every time step based on neighboring values. We use timing-based synchronization and multiple Precision Timed (PRET) processor cores to ensure that the real-time constraints are met. Leveraging the programmability of FPGAs, we use a configurable heterogeneous architecture to save hardware resources. Several examples are presented along with implementation results after place and route for a Xilinx Virtex 6 FPGA. The results demonstrate the resource savings and scalability of our framework, confirming the feasibility of our approach: solving 1D CFD models in real time on FPGAs.
Citations: 9
VENICE: A Compact Vector Processor for FPGA Applications
Aaron Severance, G. Lemieux
DOI: 10.1109/FCCM.2012.55 (https://doi.org/10.1109/FCCM.2012.55); published 2012-04-29
Abstract: VENICE is a new soft vector processor (SVP) for FPGA applications that is designed for maximum throughput with a small number (1 to 4) of ALUs. By increasing clock speed and eliminating bottlenecks in ALU utilization, VENICE achieves over 2x better performance per logic block than VEGAS, the previous best SVP. VENICE is also simpler to program, as its instructions use standard C pointers into a scratchpad memory rather than vector registers.
Citations: 81
Exploiting Memory-Level Parallelism in Reconfigurable Accelerators
Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek
DOI: 10.1109/FCCM.2012.35 (https://doi.org/10.1109/FCCM.2012.35); published 2012-04-29
Abstract: As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework in which parallelism between memory accesses is revealed from runtime profiles of applications and provided to a high level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture to support parallel memory accesses, taking advantage of the high aggregate memory bandwidth found in modern FPGA devices. Our experimental results show that for 10 accelerators generated from 9 benchmark applications, circuits using our proposed memory structure achieve on average 52% better performance than accelerators using a traditional memory interface. We believe that our study represents a solid advance towards memory-parallel embedded computing on hybrid CPU+FPGA platforms.
Citations: 13
Fixed Point Lanczos: Sustaining TFLOP-equivalent Performance in FPGAs for Scientific Computing
J. Jerez, G. Constantinides, E. Kerrigan
DOI: 10.1109/FCCM.2012.19 (https://doi.org/10.1109/FCCM.2012.19); published 2012-04-29
Abstract: We consider the problem of enabling fixed-point implementations of linear algebra kernels to match the strengths of the field-programmable gate array (FPGA). Algorithms for solving linear equations, finding eigenvalues or finding singular values are typically nonlinear and recursive, which makes establishing analytical bounds on variable dynamic range non-trivial. Current approaches fail to provide tight bounds for this type of algorithm. As a case study we use one of the most important kernels in scientific computing, the Lanczos iteration, which lies at the heart of well-known methods such as conjugate gradient and minimum residual, and we show how the algorithm can be modified so that standard linear algebra analysis proves tight analytical bounds on all variables of the process, regardless of the properties of the original matrix. It is shown that the numerical behaviour of fixed-point implementations of the modified problem can be chosen to be at least as good as that of a double precision floating-point implementation. Using this approach, sustained FPGA performance very close to the peak general-purpose graphics processing unit (GPGPU) performance is possible in FPGAs of comparable size when solving a single problem. If several independent problems are solved simultaneously, it is possible to exceed the peak floating-point performance of a GPGPU, obtaining approximately 1, 2 or 4 TFLOPs for error tolerances of 10^-7, 10^-5 and 10^-3, respectively, in a large Virtex 7 FPGA.
Citations: 10
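For reference, the standard Lanczos iteration that this paper builds on can be sketched in Python as below. It runs in double precision and is not the authors' fixed-point design. The scaling by the matrix infinity norm is our own assumption, included only to illustrate the kind of bound that makes fixed-point word lengths predictable: once the spectral radius is at most 1 and every Lanczos vector has unit norm, each α lies in [-1, 1] and each β in [0, 1] in exact arithmetic. The helper names are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos(A, b, k, tol=1e-12):
    """k steps of the standard Lanczos iteration on a symmetric matrix A.

    Returns (alphas, betas, scale): the diagonal and off-diagonal of the
    tridiagonal matrix T for the scaled matrix A / scale.
    """
    # Assumed scaling step (not necessarily the paper's scheme): the infinity
    # norm (max absolute row sum) bounds the spectral radius.
    scale = max(sum(abs(v) for v in row) for row in A)
    A = [[v / scale for v in row] for row in A]
    norm_b = math.sqrt(dot(b, b))
    q = [v / norm_b for v in b]
    q_prev = [0.0] * len(b)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(k):
        w = matvec(A, q)
        alpha = dot(q, w)
        alphas.append(alpha)
        # Three-term recurrence: orthogonalise against the last two Lanczos vectors.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta = math.sqrt(dot(w, w))
        if beta < tol:  # Krylov subspace exhausted; T is exact
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    # T has one fewer off-diagonal entry than diagonal entries.
    return alphas, betas[: len(alphas) - 1], scale
```

The eigenvalues of the returned tridiagonal matrix, multiplied by `scale`, approximate the extreme eigenvalues of the original matrix; for `[[2, 1], [1, 2]]` two steps recover 3 and 1.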
Emulating Mammalian Vision on Reconfigurable Hardware
S. Kestur, Mi Sun Park, J. Sabarad, D. Dantara, N. Vijaykrishnan, Yang Chen, D. Khosla
DOI: 10.1109/FCCM.2012.33 (https://doi.org/10.1109/FCCM.2012.33); published 2012-04-29
Abstract: A significant challenge in creating machines with artificial vision is designing systems that can process visual information as efficiently as the brain. To address this challenge, we identify key algorithms that model the processes of attention and recognition in the visual cortex of mammals. This paper presents Cover, an FPGA framework for generating systems that can potentially emulate the visual cortex. We have designed accelerators for models of attention and recognition in the cortex and integrated them to realize an end-to-end attention-recognition system. Evaluation of our system on a Dinigroup multi-FPGA platform shows high performance and accuracy for the attention and recognition systems, and speedups over existing CPU, GPU and FPGA implementations. Results show that our end-to-end system, which emulates the cortex, can achieve near real-time speeds for high resolution images. This system can be applied to many artificial vision applications, such as augmented virtual reality and autonomous vehicle navigation.
Citations: 32
FPGA-based Acceleration for Tracking Audio Effects in Movies
M. Psarakis, A. Pikrakis, Giannis Dendrinos
DOI: 10.1109/FCCM.2012.24 (https://doi.org/10.1109/FCCM.2012.24); published 2012-04-29
Abstract: In this paper we propose an FPGA-based hardware platform to accelerate an audio tracking method. Our tracking approach is inspired by the problem of molecular sequence alignment and adopts a well-known dynamic programming algorithm from bioinformatics, the Smith-Waterman algorithm. However, the high computational complexity of such algorithms is a significant barrier to their adoption by audio tracking systems. To reduce this cost and achieve realistic response times, we accelerate the computationally intensive parts of our tracking method on an FPGA-based platform. Our FPGA accelerator builds on the systolization of the Smith-Waterman algorithm proposed in previous work on accelerating bio-sequence scanning, but the special requirements of the audio tracking method impose significant design challenges on the accelerator architecture. The accelerator has been implemented in a Xilinx Virtex-5 device, and the experimental results show that it achieves significant speedup over the software implementation of the tracking method. The proposed approach has been tested on detecting animal sounds in audio streams from movies, where a basic requirement is to reduce the noisiness of the detection results by exploiting the statistical nature of the scores generated by the dynamic programming algorithm.
Citations: 3
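The Smith-Waterman local-alignment recurrence that this accelerator systolizes can be sketched as follows. The scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions; the paper's audio-specific scoring and the way the audio stream is encoded into symbols are not shown. The function works on any sequences whose items can be compared with `==`, for example strings or lists of quantised feature symbols.

```python
from typing import Sequence


def smith_waterman(a: Sequence, b: Sequence, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # row i-1 of the score matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Hypothetical quantised audio symbols: the query [4, 4, 2] occurs inside the stream.
    print(smith_waterman([1, 4, 4, 2, 7], [4, 4, 2]))
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal (constant i + j) can be computed at the same time; that wavefront is what a systolic array exploits, typically with one processing element per query symbol.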
Bluehive - A field-programable custom computing machine for extreme-scale real-time neural network simulation
S. Moore, P. Fox, Steven J. T. Marsh, A. T. Markettos, A. Mujumdar
DOI: 10.1109/FCCM.2012.32 (https://doi.org/10.1109/FCCM.2012.32); published 2012-04-29
Abstract: Bluehive is a custom 64-FPGA machine targeted at scientific simulations with demanding communication requirements. Bluehive is designed to be extensible, with a reconfigurable communication topology suited to algorithms that need high-bandwidth, low-latency communication, which is unattainable with commodity GPGPUs and CPUs. We demonstrate that a spiking neuron algorithm can be efficiently mapped to Bluehive using Bluespec SystemVerilog by taking a communication-centric approach. This contrasts with many FPGA-based neural systems, which focus heavily on parallel computation and consequently use FPGA resources inefficiently. Our design supports 64k neurons with 64M synapses per FPGA and is scalable to a large number of FPGAs.
Citations: 102
Accelerating a Random Forest Classifier: Multi-Core, GP-GPU, or FPGA?
B. V. Essen, C. Macaraeg, M. Gokhale, R. Prenger
DOI: 10.1109/FCCM.2012.47 (https://doi.org/10.1109/FCCM.2012.47); published 2012-04-29
Abstract: Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The class of an input sample is determined by the majority vote of the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory bound and not typically suitable for acceleration using FPGAs or GP-GPUs, due to the need to traverse large, possibly irregular decision trees. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Taking advantage of training algorithms that can produce compact random forests composed of many small trees rather than fewer deep trees, we are able to regularize the forest so that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction multiple-thread (SIMT) fashion. We show that FPGAs provide the highest performance solution, but require a multi-chip / multi-board system to execute even modest-sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared memory system was the simplest solution and provided near-linear performance scaling with core count, but was still significantly slower than the GP-GPU and FPGA.
Citations: 148
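The key idea in this abstract, that a forest of small fixed-depth trees classifies every sample in a deterministic amount of time, can be sketched in Python. The heap-ordered array layout and the `CompactTree` class below are hypothetical and are not the LLNL Compact Random Forest format; training is not shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CompactTree:
    """Hypothetical complete binary tree of fixed depth stored in arrays.

    Heap layout: internal node n has children 2n+1 and 2n+2. There are
    2**depth - 1 internal nodes and 2**depth leaves.
    """
    depth: int
    feature: list[int]      # feature index tested at each internal node
    threshold: list[float]  # go left if x[feature] <= threshold
    leaf_class: list[int]   # class label at each leaf

    def classify(self, x: list[float]) -> int:
        node = 0
        for _ in range(self.depth):  # always exactly `depth` comparisons
            go_right = x[self.feature[node]] > self.threshold[node]
            node = 2 * node + 1 + int(go_right)
        # Leaves occupy heap indices 2**depth - 1 .. 2**(depth + 1) - 2.
        return self.leaf_class[node - (2 ** self.depth - 1)]


def forest_classify(trees: list[CompactTree], x: list[float]) -> int:
    """Majority vote over the ensemble."""
    votes = Counter(t.classify(x) for t in trees)
    return votes.most_common(1)[0][0]
```

Because every lookup performs exactly `depth` comparisons with no data-dependent early exit, all trees can be evaluated in lockstep, which is what makes a pipelined (FPGA) or SIMT (GPU) implementation straightforward. With `Counter.most_common`, tied votes go to the label counted first.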