2014 International Conference on Field-Programmable Technology (FPT)最新文献

筛选
英文 中文
A dataflow system for anomaly detection and analysis 一个用于异常检测和分析的数据流系统
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082793
A. Bara, Xinyu Niu, W. Luk
{"title":"A dataflow system for anomaly detection and analysis","authors":"A. Bara, Xinyu Niu, W. Luk","doi":"10.1109/FPT.2014.7082793","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082793","url":null,"abstract":"This paper proposes DeADA, a dataflow architecture incorporating an automated, unsupervised and online learning algorithm. Compared with 24 core software implementations, DeADA achieves up to 6.17 times lower data drop rate and 10.7 times higher power efficiency. More importantly, experimental results for the Heartbleed case study suggest that DeADA is capable of detecting unknown attacks under network speeds of at least 18Mbps, a feature which is essential for modern network intrusion detection.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"63 1","pages":"276-279"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85170374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
FPGA-accelerated Monte-Carlo integration using stratified sampling and Brownian bridges 使用分层采样和布朗桥的fpga加速蒙特卡罗积分
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082755
M. D. Jong, V. Sima, K. Bertels, David B. Thomas
{"title":"FPGA-accelerated Monte-Carlo integration using stratified sampling and Brownian bridges","authors":"M. D. Jong, V. Sima, K. Bertels, David B. Thomas","doi":"10.1109/FPT.2014.7082755","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082755","url":null,"abstract":"Monte-Carlo Integration (MCI) is a numerical technique for evaluating integrals which have no closed form solution. Naive MCI randomly samples the integrand at uniformly distributed points. This naive approach converges very slowly. Stratified sampling can be used to concentrate the samples on segments of the integration domain where the integrand has the highest variance. Even with stratified sampling, MCI converges very slowly for multidimensional integrals. In this work, we implement an FPGA-accelerated design for MISER, a widely used adaptive MCI algorithm applying stratified sampling. We show how to eliminate the recursion from MISER and partition the algorithm between CPUs and FPGAs. The CPUs manage the control-heavy stratification strategy, while the FPGA is responsible for sampling the integrand. The integrand is compiled into a deep pipeline on the FPGA, producing one function evaluation per clock cycle. We demonstrate the FPGA-accelerated design by pricing a path dependent financial derivative called an Asian option. To make optimal use of the stratification, we implement a Brownian bridge on the FPGA that produces one entire bridge per clock cycle. The FPGA-accelerated design is up to 880 times faster compared to a software reference using the GSL implementation of MISER. Compared to naive MCI in software, our design even requires up to 3572 times less execution time to achieve the same accuracy.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"40 19 1","pages":"68-75"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89238331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Using C to implement high-efficient computation of dense optical flow on FPGA-accelerated heterogeneous platforms 利用C语言在fpga加速异构平台上实现密集光流的高效计算
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082789
Zhilei Chai, Haojie Zhou, Zhibin Wang, Dong Wu
{"title":"Using C to implement high-efficient computation of dense optical flow on FPGA-accelerated heterogeneous platforms","authors":"Zhilei Chai, Haojie Zhou, Zhibin Wang, Dong Wu","doi":"10.1109/FPT.2014.7082789","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082789","url":null,"abstract":"High-quality algorithms for dense optical flow computation are computationally intensive. To compute them with high speed and low power is vital to make optical flow computation applicable in real-world applications. In contrast to only the Horn-Schunck model being studied on FPGA-based systems today, one of the best linear variational methods for dense optical flow computation, Combine-Brightness-Gradient, is implemented on FPGA-accelerated heterogeneous platforms in this paper. C instead of HDLs is employed and optimizing techniques based on the algorithmic parallelism and hardware architecture are introduced. Experimental results show that 30-110x improvement of the computing efficiency over CPUs was achieved. The FPGA-accelerated version is able to process 640 × 480 image at 12 fps with 0.38 J per frame, while it is 0.8 fps and around 40 J on CPUs. Through demonstrating high performance and low power of dense optical flow algorithm on FPGA-based heterogeneous platforms implemented in C, this paper shows that the off-the-shelf commodity FPGAs coupled with High-Level-Synthesis (HLS) tools could provide an available option when computational efficiency together with development speed are required.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"63 1","pages":"260-263"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75331649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Network recorder and player: FPGA-based network traffic capture and replay 网络记录器和播放器:基于fpga的网络流量捕获和重放
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082815
Siyi Qiao, Chen Xu, Lei Xie, Ji Yang, Chengchen Hu, X. Guan, Jianhua Zou
{"title":"Network recorder and player: FPGA-based network traffic capture and replay","authors":"Siyi Qiao, Chen Xu, Lei Xie, Ji Yang, Chengchen Hu, X. Guan, Jianhua Zou","doi":"10.1109/FPT.2014.7082815","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082815","url":null,"abstract":"An appropriate tool to generate real network traffic plays an important role in testing network system. Traditionally, such a tool relies on software solutions that copies data back and forth between different part of memory to capture or replay network traffic. In this paper, we propose an FPGA-centric approach using parallel logic, which can ensure high accuracy of time and high throughput. We first design an FPGA add-on board dealing with the multifarious work like adding content or calculate statistical value. The system is implemented on an own designed off-the-shelf FPGA network add-on card to demonstrate the viability of our assumption. Experiments demonstrate reasonable performance improvement (higher throughput and replay time precision) when compared with software based solutions.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"45 2 1","pages":"342-345"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78137756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
An efficient FPGA implementation of QR decomposition using a novel systolic array architecture based on enhanced vectoring CORDIC 一种基于增强矢量CORDIC的新型收缩阵列结构的QR分解的高效FPGA实现
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082764
Jianfeng Zhang, P. Chow, Hengzhu Liu
{"title":"An efficient FPGA implementation of QR decomposition using a novel systolic array architecture based on enhanced vectoring CORDIC","authors":"Jianfeng Zhang, P. Chow, Hengzhu Liu","doi":"10.1109/FPT.2014.7082764","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082764","url":null,"abstract":"Multiple input multiple output (MIMO) - Orthogonal frequency division multiplexing (OFDM) systems typically use Orthogonal-triangular (QR) decomposition. In this paper, we present a novel systolic array architecture to realize QR decomposition based on the Givens rotation method for a 4 × 4 real matrix. The coordinate rotation digital computer (CORDIC) algorithm is adopted and modified to speed up and simplify the Givens rotation. To verify the function and evaluate the performance, the proposed architectures are validated on a Virtex 5 FPGA development platform. Compared to a commercial implementation of vectoring CORDIC, an enhanced vectoring CORDIC is presented that uses 37.7% less hardware resources, dissipates 76.8% less power and provides a 1.8 times speed-up while maintaining the same computation accuracy. The novel QR systolic array architecture based on the enhanced vectoring CORDIC saves 5% in hardware and the throughput is improved by a factor of two with no accuracy penalty when compared with the best previous version of the QR systolic array.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"35 1","pages":"123-130"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85145372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Zero latency encryption with FPGAs for secure time-triggered automotive networks 零延迟加密与fpga的安全时间触发汽车网络
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082788
Shanker Shreejith, Suhaib A. Fahmy
{"title":"Zero latency encryption with FPGAs for secure time-triggered automotive networks","authors":"Shanker Shreejith, Suhaib A. Fahmy","doi":"10.1109/FPT.2014.7082788","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082788","url":null,"abstract":"Security has emerged as a key concern in increasingly complex embedded automotive networks. The distributed architecture and broadcast transmission characteristics mean they are vulnerable and provide little resistance to intrusive and non-intrusive attack mechanisms. Incorporating data security using traditional approaches introduces significant latency which can be problematic in the presence of real-time deadlines. We demonstrate how a security layer can be added within the network communication controller in modern time-triggered systems, without introducing additional latency or processing overheads. This allows critical communications to be secured in a manner that is transparent to the processors in the electronic control units (ECUs), while also safeguarding network communication properties.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"33 1","pages":"256-259"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80431527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Collaborative processing of Least-Square Monte Carlo for American options 美国期权的最小二乘蒙特卡罗协同处理
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082753
Jinzhe Yang, Ce Guo, W. Luk, Terence Nahar
{"title":"Collaborative processing of Least-Square Monte Carlo for American options","authors":"Jinzhe Yang, Ce Guo, W. Luk, Terence Nahar","doi":"10.1109/FPT.2014.7082753","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082753","url":null,"abstract":"American options are popularly traded in the financial market, so pricing those options becomes crucial in practice. In reality, many popular pricing models do not have analytical solutions. Hence techniques such as Monte Carlo are often used in practice. This paper presents a CPU-FPGA collaborative accelerator using state-of-the-art Least-Square Monte Carlo method, for pricing American options. We provide a new sequence of generating the Monte Carlo paths, and a precalculation strategy for the regression process. Our design is customisable for different pricing models, discretisation schemes, and regression functions. The Heston model is used as a case study for evaluating our strategy. Experimental results show that an FPGA-based solution could provide 22 to 64.5 times faster than a single-core CPU implementation.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"08 1","pages":"52-59"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86368483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Size aware placement for island style FPGAs 岛式fpga的尺寸感知放置
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082749
Junying Huang, C. Y. Lin, Yang Liu, Zhihua Li, Haigang Yang
{"title":"Size aware placement for island style FPGAs","authors":"Junying Huang, C. Y. Lin, Yang Liu, Zhihua Li, Haigang Yang","doi":"10.1109/FPT.2014.7082749","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082749","url":null,"abstract":"In this paper we first examine the impact of FPGA size on overall performance and run-time of placement and routing in the context of cluster-based island-style FPGAs. Based on the observations, an FPGA placement algorithm, Min-Size, is introduced to alleviate the deterioration of performance and run-time of placement and routing when using a large FPGA to implement a circuit. We achieve this by allowing Min-Size to generate a more compact placement of logic, I/O and hard blocks. Our experimental results have shown a 3X and 4X speedup in placement and routing run-time, a 38% and 41% reduction in wire length, and a 8% and 5% improvement in critical path delay when FPGA size increases 10 times.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"18 5 1","pages":"28-35"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83495011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Is high level synthesis ready for business? A computational finance case study 高级合成是否已准备就绪?计算金融案例研究
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082747
G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk
{"title":"Is high level synthesis ready for business? A computational finance case study","authors":"G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk","doi":"10.1109/FPT.2014.7082747","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082747","url":null,"abstract":"High Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress, and are now sufficiently mature that a novice developer could create functionally correct implementation with limited understanding of the target hardware. In this case study, a novice developer considers a benchmark of financial problems for implementation upon FPGA via HLS. This novice starts by extending an existing implementation for a CPU or GPU using tools such as Xilinx's Vivado HLS, the Altera OpenCL SDK or Maxeler's MaxCompiler. When their direct source code translation inevitably didn't meet performance expectations, this developer then applies optimisations such as exploiting task or pipeline parallelism as well as C-slowing. When a combination of these optimisations are considered for a range of devices and process technologies, an acceleration of up to 220 times is achieved using these tools, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised Multicore CPU implementation, the 60 times improvement by a GPU and 207 times by a Xeon Phi, these results suggest that HLS is indeed ready for industrial adoption.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"66 1","pages":"12-19"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86139429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Hardware/software co-design architecture for Blokus Duo solver Blokus Duo解算器的硬件/软件协同设计架构
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082820
N. Sugimoto, H. Amano
{"title":"Hardware/software co-design architecture for Blokus Duo solver","authors":"N. Sugimoto, H. Amano","doi":"10.1109/FPT.2014.7082820","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082820","url":null,"abstract":"This paper presents a software and hardware design of an FPGA-based Blokus Duo solver. We used Embedded system called ZYNQ-7000 All Programmable SoC to implement the solver. By combining hardware with software, efficient acceleration is performed. Our system searches a game tree by using the miniMax algorithm with alpha-beta pruning. The implemented solver works at 75MHz with Xilinx Zynq-7000 AP SoC XC7Z020-CLG484 on the Digilent ZedBoard. It can search states after three moves in most cases.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"40 1","pages":"358-361"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91201197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信