2014 International Conference on Field-Programmable Technology (FPT): Latest Papers, Page 2

A dataflow system for anomaly detection and analysis

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082793

A. Bara, Xinyu Niu, W. Luk

Citations: 4

FPGA-accelerated Monte-Carlo integration using stratified sampling and Brownian bridges

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082755

M. D. Jong, V. Sima, K. Bertels, David B. Thomas

Abstract: Monte-Carlo Integration (MCI) is a numerical technique for evaluating integrals which have no closed form solution. Naive MCI randomly samples the integrand at uniformly distributed points. This naive approach converges very slowly. Stratified sampling can be used to concentrate the samples on segments of the integration domain where the integrand has the highest variance. Even with stratified sampling, MCI converges very slowly for multidimensional integrals. In this work, we implement an FPGA-accelerated design for MISER, a widely used adaptive MCI algorithm applying stratified sampling. We show how to eliminate the recursion from MISER and partition the algorithm between CPUs and FPGAs. The CPUs manage the control-heavy stratification strategy, while the FPGA is responsible for sampling the integrand. The integrand is compiled into a deep pipeline on the FPGA, producing one function evaluation per clock cycle. We demonstrate the FPGA-accelerated design by pricing a path dependent financial derivative called an Asian option. To make optimal use of the stratification, we implement a Brownian bridge on the FPGA that produces one entire bridge per clock cycle. The FPGA-accelerated design is up to 880 times faster compared to a software reference using the GSL implementation of MISER. Compared to naive MCI in software, our design even requires up to 3572 times less execution time to achieve the same accuracy.

Pages: 68-75

Citations: 2
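
The abstract of the entry above contrasts naive Monte-Carlo integration with stratified sampling. As a rough illustration only, the sketch below compares the two on a one-dimensional integrand in plain Python. It assumes equal-width strata with an equal number of samples each, which is much simpler than the adaptive, recursive allocation used by MISER, and it is not the authors' FPGA design. The function names `naive_mci` and `stratified_mci` and the test integrand are hypothetical.

```python
import random


def naive_mci(f, n, rng):
    """Estimate the integral of f over [0, 1] from n uniform samples."""
    return sum(f(rng.random()) for _ in range(n)) / n


def stratified_mci(f, n, strata, rng):
    """Split [0, 1] into equal-width strata and sample each one separately.

    Assumption: every stratum gets the same share of the n samples.
    MISER instead allocates samples adaptively according to variance.
    """
    per_stratum = n // strata
    width = 1.0 / strata
    total = 0.0
    for k in range(strata):
        lo = k * width
        # Mean of f over this stratum, weighted by the stratum width.
        mean = sum(f(lo + width * rng.random()) for _ in range(per_stratum)) / per_stratum
        total += width * mean
    return total


def f(x):
    """Test integrand; its exact integral over [0, 1] is 1/3."""
    return x * x


rng = random.Random(0)  # fixed seed so runs are repeatable
naive = naive_mci(f, 10_000, rng)
strat = stratified_mci(f, 10_000, 100, rng)
print(f"naive error:      {abs(naive - 1 / 3):.2e}")
print(f"stratified error: {abs(strat - 1 / 3):.2e}")
```

With the same sample budget, the stratified estimate usually lands much closer to 1/3 because each sample only has to cover the variation inside its own narrow stratum.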

Using C to implement high-efficient computation of dense optical flow on FPGA-accelerated heterogeneous platforms

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082789

Zhilei Chai, Haojie Zhou, Zhibin Wang, Dong Wu

Abstract: High-quality algorithms for dense optical flow computation are computationally intensive. To compute them with high speed and low power is vital to make optical flow computation applicable in real-world applications. In contrast to only the Horn-Schunck model being studied on FPGA-based systems today, one of the best linear variational methods for dense optical flow computation, Combine-Brightness-Gradient, is implemented on FPGA-accelerated heterogeneous platforms in this paper. C instead of HDLs is employed and optimizing techniques based on the algorithmic parallelism and hardware architecture are introduced. Experimental results show that 30-110x improvement of the computing efficiency over CPUs was achieved. The FPGA-accelerated version is able to process 640 × 480 image at 12 fps with 0.38 J per frame, while it is 0.8 fps and around 40 J on CPUs. Through demonstrating high performance and low power of dense optical flow algorithm on FPGA-based heterogeneous platforms implemented in C, this paper shows that the off-the-shelf commodity FPGAs coupled with High-Level-Synthesis (HLS) tools could provide an available option when computational efficiency together with development speed are required.

Pages: 260-263

Citations: 6

Network recorder and player: FPGA-based network traffic capture and replay

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082815

Siyi Qiao, Chen Xu, Lei Xie, Ji Yang, Chengchen Hu, X. Guan, Jianhua Zou

Citations: 7

An efficient FPGA implementation of QR decomposition using a novel systolic array architecture based on enhanced vectoring CORDIC

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082764

Jianfeng Zhang, P. Chow, Hengzhu Liu

Abstract: Multiple input multiple output (MIMO) - Orthogonal frequency division multiplexing (OFDM) systems typically use Orthogonal-triangular (QR) decomposition. In this paper, we present a novel systolic array architecture to realize QR decomposition based on the Givens rotation method for a 4 × 4 real matrix. The coordinate rotation digital computer (CORDIC) algorithm is adopted and modified to speed up and simplify the Givens rotation. To verify the function and evaluate the performance, the proposed architectures are validated on a Virtex 5 FPGA development platform. Compared to a commercial implementation of vectoring CORDIC, an enhanced vectoring CORDIC is presented that uses 37.7% less hardware resources, dissipates 76.8% less power and provides a 1.8 times speed-up while maintaining the same computation accuracy. The novel QR systolic array architecture based on the enhanced vectoring CORDIC saves 5% in hardware and the throughput is improved by a factor of two with no accuracy penalty when compared with the best previous version of the QR systolic array.

Pages: 123-130

Citations: 3
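
The abstract of the entry above relies on vectoring-mode CORDIC to carry out Givens rotations. As a hedged illustration, the sketch below emulates textbook vectoring CORDIC in floating-point Python: it rotates a vector (x, y) onto the x-axis, which is the step a Givens rotation uses to zero one matrix element. This is the standard algorithm, not the enhanced variant proposed in the paper, and real hardware would use fixed-point shifts and a precomputed gain constant. The function name `cordic_vectoring` and the iteration count are assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=32):
    """Rotate (x, y) onto the positive x-axis using CORDIC micro-rotations.

    Returns (magnitude, angle) of the input vector. Assumes x > 0, so the
    vector starts in the right half-plane where plain vectoring converges.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0  # pick the direction that drives y toward 0
        # Simultaneous update; in hardware the 2**-i factors are bit shifts.
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)  # accumulate the angle rotated so far
        gain *= math.sqrt(1.0 + 4.0**-i)  # each micro-rotation stretches the vector
    return x / gain, angle


magnitude, angle = cordic_vectoring(3.0, 4.0)
print(magnitude, angle)
```

In a Givens-based QR step, the magnitude would replace the pivot element and the recovered angle would be applied to the remaining elements of the two rows being combined.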

Zero latency encryption with FPGAs for secure time-triggered automotive networks

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082788

Shanker Shreejith, Suhaib A. Fahmy

Citations: 10

Collaborative processing of Least-Square Monte Carlo for American options

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082753

Jinzhe Yang, Ce Guo, W. Luk, Terence Nahar

Citations: 0

Size aware placement for island style FPGAs

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082749

Junying Huang, C. Y. Lin, Yang Liu, Zhihua Li, Haigang Yang

Citations: 0

Is high level synthesis ready for business? A computational finance case study

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082747

G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk

Abstract: High Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress, and are now sufficiently mature that a novice developer could create functionally correct implementation with limited understanding of the target hardware. In this case study, a novice developer considers a benchmark of financial problems for implementation upon FPGA via HLS. This novice starts by extending an existing implementation for a CPU or GPU using tools such as Xilinx's Vivado HLS, the Altera OpenCL SDK or Maxeler's MaxCompiler. When their direct source code translation inevitably didn't meet performance expectations, this developer then applies optimisations such as exploiting task or pipeline parallelism as well as C-slowing. When a combination of these optimisations are considered for a range of devices and process technologies, an acceleration of up to 220 times is achieved using these tools, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised Multicore CPU implementation, the 60 times improvement by a GPU and 207 times by a Xeon Phi, these results suggest that HLS is indeed ready for industrial adoption.

Pages: 12-19

Citations: 27

Hardware/software co-design architecture for Blokus Duo solver

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082820

N. Sugimoto, H. Amano

Citations: 6