2018 International Conference on Field-Programmable Technology (FPT): Latest Papers

Scheduling Algorithms for High Performance Network Switching on FPGAs: A Survey
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00033
Nadeen Gebara, Jiuxi Meng, W. Luk, Paolo Costa
Abstract: The scheduling algorithm used in a network switch significantly impacts the switch's performance and thereby the performance of the entire network. To keep up with the ongoing demands for higher network performance, a myriad of scheduling algorithms have been investigated. We propose that FPGAs can be outstanding candidates for benchmarking scheduling algorithms, and that it can be beneficial to have customized scheduling algorithms which are enabled by FPGA-based switches due to their reconfigurable architectures. This paper presents the first FPGA-targeted survey on high performance scheduling algorithms used in the most popular switch architecture, input-buffered crossbars, with the aim of guiding future research on high performance network switching.
Citations: 5
Enabling Overclocking Through Algorithm-Level Error Detection
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00034
T. Marty, Tomofumi Yuki, Steven Derrien
Abstract: In this paper, we propose a technique for improving the efficiency of hardware accelerators based on timing speculation (overclocking) and fault tolerance. We augment the accelerator with a lightweight error detection mechanism to protect against timing errors, enabling aggressive timing speculation. We demonstrate the validity of our approach for the convolution layers in convolutional neural networks. We present an implementation of a fault-tolerant convolution layer accelerator combined with the lightweight error detection. The error detection mechanism we have developed works at the algorithm level, utilizing algebraic properties of the computation, allowing the full implementation to be realized using High-Level Synthesis tools. Our prototype on ZC706 demonstrated 68% to 77% higher throughput with negligible overhead.
Citations: 7
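The abstract above mentions error detection that works at the algorithm level by exploiting algebraic properties of the computation. As a minimal sketch of that general idea, and not necessarily the authors' scheme, the Python below uses the linearity of convolution: the sum of all output channels must equal one extra convolution with the element-wise sum of the kernels. The function names (`conv1d`, `conv_layer`, `checksum_kernel`, `outputs_consistent`) and the 1-D integer setting are assumptions chosen for illustration.

```python
from typing import List


def conv1d(x: List[int], w: List[int]) -> List[int]:
    """Valid (no padding) 1-D sliding dot product, as used in CNN layers."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def conv_layer(x: List[int], kernels: List[List[int]]) -> List[List[int]]:
    """One output channel per kernel (hypothetical single-input-channel layer)."""
    return [conv1d(x, w) for w in kernels]


def checksum_kernel(kernels: List[List[int]]) -> List[int]:
    """Element-wise sum of all kernels; all kernels must have the same length."""
    return [sum(column) for column in zip(*kernels)]


def outputs_consistent(x: List[int], kernels: List[List[int]],
                       outputs: List[List[int]]) -> bool:
    """Check sum_c conv(x, w_c) == conv(x, sum_c w_c), which holds by linearity."""
    expected = conv1d(x, checksum_kernel(kernels))
    observed = [sum(column) for column in zip(*outputs)]
    return expected == observed


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    kernels = [[1, 0, -1], [2, 1, 0]]
    outputs = conv_layer(x, kernels)
    print(outputs_consistent(x, kernels, outputs))  # fault-free run
    outputs[1][2] += 1                              # simulate a timing error
    print(outputs_consistent(x, kernels, outputs))  # corrupted run
```

In this sketch the check costs one extra convolution per layer rather than a full duplicate of the layer, and integer arithmetic keeps the comparison exact; a floating-point design would need a tolerance instead of strict equality.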
Mapping Estimator for OpenCL Heterogeneous Accelerators
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00057
A. B. Perina, Vanderlei Bonato
Abstract: To increase computing performance while keeping energy consumption to an acceptable budget, heterogeneous systems are currently investigated. By using dedicated compute units as accelerators to speed up specific parts of an application, hardware resources are better utilised, resulting in a more energy efficient computing system. However, the task of performing such application mapping to accelerators is still a challenge, requiring knowledge beyond the software domain in order to understand which part of the code fits better to the capability of the hardware available. Currently, there are tools supporting unified frontends and languages to simplify the programming of such heterogeneous systems; however, there is still a high dependency on the user to manually perform the final mapping process. This work exposes a machine learning framework used to automatically infer the most suitable accelerator (between FPGA and GPU) for a given code by statically estimating energy efficiency. This framework can be used to assist developers in deciding the best mapping for their application with an average hit-rate of 85 percent.
Citations: 1
Lens Distortion Self-Calibration Using the Hough Transform
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00080
D. Bailey, Yuan Chang, S. L. Moan
Abstract: The Hough transform is a well-known technique for detecting straight lines within images, especially in the presence of noise, or where there is incomplete data (gaps or occlusions). When subjected to lens distortion, straight lines become curved, and indeed this can be used to identify and correct lens distortion. However, curved lines distort and blur the peaks within the Hough transform, making the lines more difficult to detect. By analysing the distortion within the Hough transform, though, it is possible to directly estimate the lens distortion parameters, enabling the distortion to be corrected in real time. The proposed technique uses a Terasic DE1-SoC FPGA board (Cyclone V FPGA) to fit a parabola to the distorted peak using Hough's original transform, and from the parabola coefficients directly estimates the lens distortion parameter. This enables the following frame to be corrected in parallel with curve detection.
Citations: 1
Checking for Electrical Level Security Threats in Bitstreams for Multi-tenant FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00055
Dennis R. E. Gnad, Sascha Rapp, Jonas Krautter, M. Tahoori
Abstract: Multi-tenant FPGAs, in which third parties have partial access to the FPGA fabric, are a rising usage trend in cloud and reconfigurable SoCs. This gives rise to new types of attacks in FPGAs, as shown in recent studies. These attacks can operate on the electrical level through the common power delivery network, making them very hard to isolate. Thus, software-controlled FPGA configuration can be exploited to insert hardware trojans, impacting the security of the entire system. The attacks can be separated into fault and side-channel attacks to either actively manipulate a system or quietly extract secret information. In this paper, we show the first attempt at countermeasures against these voltage-fluctuation-based attacks, by analyzing FPGA bitstreams for malicious logic, basically implementing an FPGA antivirus. We provide a way to check bitstreams for potentially malicious structures, by extending a combination of commercial and open-source tools.
Citations: 21
ReFiRe: Efficient Deployment of Remote Fine-Grained Reconfigurable Accelerators
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00064
Emmanouil Pissadakis, Nikolaos S. Alachiotis, P. Skrimponis, D. Theodoropoulos, T. Korakis, D. Pnevmatikatos
Abstract: The need for specialized hardware acceleration in today's computing platforms is well established, due to power and efficiency reasons. Broadening an accelerator's scope of application is highly desirable, but requires a finer-grained architecture with basic primitives, which inevitably exhibits increased communication and synchronization requirements. In disaggregated-computing environments, where data transfers between remote nodes are realized via datacenter-wide packet exchanges, reducing communication and synchronization is a prerequisite for the effective employment of remote acceleration. To this end, we present ReFiRe (Remote Fine-grained Reconfigurable acceleration), a generic deployment framework with native support for partial reconfiguration that considerably reduces communication needs between a processor and remote accelerators. This is achieved by shifting control flow, partial reconfiguration, and execution decisions to the remote side through arbitrarily long instructions that encapsulate complex sequences of operations and their respective synchronization requirements. ReFiRe outperforms an SDSoC-generated accelerator system that employs the same accelerator cores to boost performance of a genomics application that detects positive selection.
Citations: 1
Enhancing Memory Bandwidth in a Single Stream Computation with Multiple FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00078
Antoniette Mondigo, K. Sano, H. Takizawa
Abstract: Stream computing is an area where FPGAs can be suitably utilized to meet high performance and high scalability demands. To achieve these, a deep computing pipeline is implemented on multiple FPGAs where stream computing is performed. This paper presents an approach to utilize two masters in a 1D ring network of multiple FPGAs for a single stream computation. Each master FPGA reads from and writes to its own DDR3 memory alternately, while streaming through the slave FPGAs. This is done in order to synchronize the computational results on physically separate memory units. Due to this, the aggregate memory bandwidth is doubled, which suggests enhanced performance. The introduction of this streaming concept lays the groundwork towards full utilization of memories in all the FPGAs, as an identified future work.
Citations: 1
Speed and Resource Optimization of BFGS Quasi-Newton Implementation on FPGA Using Inexact Line Search Method for Neural Network Training
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00074
Jia Liu, Qiang Liu
Abstract: The Quasi-Newton (QN) method is one of the most effective Neural Network (NN) training methods. However, QN training often takes a long time, especially when the NN architecture is large. The BFGS-QN has been implemented on FPGA for accelerating the training process. The experimental results show that the line search module of BFGS-QN is the most time-consuming module because of its frequent objective function evaluation. In order to solve the issue, an inexact line search method, the Armijo-Goldstein (AG) method, is implemented to replace the original exact line search method, the Golden Section (GS) method. For the highest training speed, an end-to-end FPGA version of BFGS using the AG method is implemented. Moreover, the efficiency of the AG method makes hardware-software co-design possible. The objective function evaluation unit in the line search module, which consumes the most computational resources, is moved to the CPU for a speed and resource tradeoff. The experimental results show that the end-to-end FPGA BFGS-AG implementation achieves up to 239 times speedup compared with the software implementation. The FPGA+CPU BFGS-AG implementation is up to 153.1 times faster than the end-to-end software implementation and achieves up to 45% LUT, 29% FF and 64% DSP reduction.
Citations: 1
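The abstract above attributes most of the training time to objective function evaluations inside the line search, and replaces an exact search with an inexact one. The Python below is a minimal sketch of one common inexact strategy, backtracking with the Armijo sufficient-decrease condition. It omits the Goldstein lower bound, so it is not the full Armijo-Goldstein rule named in the abstract, and it is not the authors' hardware design. The function name, parameters, and default constants are assumptions for illustration. It also returns the number of objective evaluations, so the cost the abstract refers to is visible.

```python
from typing import Callable, List, Tuple


def armijo_backtracking(
    f: Callable[[List[float]], float],
    x: List[float],
    grad: List[float],
    direction: List[float],
    alpha0: float = 1.0,   # initial trial step (assumed default)
    c: float = 1e-4,       # sufficient-decrease constant (assumed default)
    shrink: float = 0.5,   # step reduction factor (assumed default)
    max_iter: int = 50,
) -> Tuple[float, int]:
    """Return (step size, number of objective evaluations).

    Accepts the first step alpha with
        f(x + alpha * d) <= f(x) + c * alpha * grad^T d.
    """
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0:
        raise ValueError("direction must be a descent direction")

    fx = f(x)
    evals = 1
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, direction)]
        evals += 1
        if f(trial) <= fx + c * alpha * slope:
            return alpha, evals
        alpha *= shrink  # step too long: shrink and try again
    raise RuntimeError("no acceptable step found within max_iter trials")


if __name__ == "__main__":
    # Hypothetical objective: f(x) = x0^2 + x1^2, steepest-descent direction.
    def quadratic(v: List[float]) -> float:
        return v[0] ** 2 + v[1] ** 2

    step, n_evals = armijo_backtracking(
        quadratic, [1.0, 1.0], [2.0, 2.0], [-2.0, -2.0]
    )
    print(step, n_evals)
```

An exact method such as golden section keeps narrowing an interval until the minimizer is located to a tolerance, whereas this loop stops at the first acceptable step, which is why an inexact search typically needs fewer objective evaluations.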
High Throughput CNN Accelerator Design Based on FPGA
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00052
Liang Xie, Xitian Fan, Wei Cao, Lingli Wang
Abstract: Because FPGA on-chip memory capacity has increased significantly, the feature maps and weights of convolutional layers can be stored on chip, which can reduce the data movement between on-chip memory and off-chip memory. Hence, the bottleneck can shift from the bandwidth to the computing resources in convolutional layers, which will improve the performance dramatically. Under this circumstance, this paper quantitatively analyzes how to design the hardware architecture based on the roofline model to optimize the performance under the constraints of available on-chip computing resources, and proposes an efficient architecture. Our accelerator is implemented on a Xilinx UltraScale+ FPGA with a performance of 9.39 TOPS and 6.86 TOPS for 8-bit data width with 100 MHz main frequency and 400 MHz DSP frequency on ResNet-50 and AlexNet, respectively, which outperforms the existing FPGA-based CNN accelerator.
Citations: 7
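The abstract above relies on the roofline model, in which attainable throughput is the smaller of peak compute throughput and memory bandwidth multiplied by operational intensity (operations per byte moved). A minimal Python sketch follows; the function names and every number in the usage lines are hypothetical and are not taken from the paper.

```python
def attainable_ops_per_s(
    peak_ops_per_s: float,
    bandwidth_bytes_per_s: float,
    ops_per_byte: float,
) -> float:
    """Roofline bound: min(compute roof, bandwidth roof)."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


def is_compute_bound(
    peak_ops_per_s: float,
    bandwidth_bytes_per_s: float,
    ops_per_byte: float,
) -> bool:
    """True when the bandwidth roof is at or above the compute roof."""
    return bandwidth_bytes_per_s * ops_per_byte >= peak_ops_per_s


if __name__ == "__main__":
    PEAK = 10e12       # hypothetical peak: 10 TOPS
    BANDWIDTH = 20e9   # hypothetical off-chip bandwidth: 20 GB/s

    # Hypothetical operational intensities, in operations per off-chip byte.
    for intensity in (100.0, 1000.0):
        bound = attainable_ops_per_s(PEAK, BANDWIDTH, intensity)
        regime = "compute" if is_compute_bound(PEAK, BANDWIDTH, intensity) else "memory"
        print(f"{intensity:7.0f} ops/byte -> {bound:.3e} ops/s ({regime}-bound)")
```

Keeping feature maps and weights on chip reduces off-chip traffic, which raises operational intensity; in this sketch that moves a design from the bandwidth roof toward the compute roof, which is the shift the abstract describes.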
LeFlow: Automatic Compilation of TensorFlow Machine Learning Applications to FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00082
D. H. Noronha, Kahlan Gibson, B. Salehpour, S. Wilton
Abstract: Acceleration of Machine Learning applications on Field-Programmable Gate Arrays (FPGAs) has been shown to have advantages over other computing platforms in recent work. However, since machine learning code is often specified in a high-level software language such as Python, the manual translation of the algorithm to either C code for high-level synthesis or to Register Transfer Level (RTL) code for synthesis is time consuming and requires the designer to have expertise in designing hardware. In order to show how we can make FPGAs more accessible to software developers, we present a demonstration of LeFlow: an open-source tool which maps numerical computation models written in TensorFlow to synthesizable RTL. This demonstration includes two examples which begin with a model written in TensorFlow and show how a designer would use the LeFlow tool to generate Verilog, simulate the result, and synthesize the design to target FPGAs.
Citations: 9