2019 International Conference on Field-Programmable Technology (ICFPT): Latest Publications

Pipelined Parallel Finite Automata Evaluation
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00021
Vipula Sateesh, Connor Mckeon, Jared Winograd, A. DeHon
Abstract: Finite automata are key compute models in modern computational theory and important building blocks for digital logic used for regular expression and protocol parsing, filtering, and control. Finite automata evaluation would seem to be a sequential operation, since we need to complete the evaluation of one state to know the next state in which to evaluate the logic. Nonetheless, parallel theory provides strategies for parallel finite automata evaluation. We show how to exploit this parallel evaluation strategy in practice on today's high-capacity FPGAs, including a novel formulation for spatially pipelined evaluation. For non-deterministic finite automata (NFA) with S states, we can evaluate N inputs in a single cycle with O(N·S²) BRAMs and O(N·S³) LUTs. This allows us, for example, to consume 64 inputs on a 16-state NFA in a single cycle on the Xilinx XCZU9EG-ffvb1156-2-i SoC FPGA, achieving 47 GB/s (377 Gb/s) single-stream throughput for 8-bit inputs. For a 40 Gb/s network link, we can support 28-state NFAs.
Citations: 5
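The abstract does not spell out the parallel strategy. One standard way to parallelize NFA evaluation is to treat each input symbol as an S×S Boolean transition matrix and compose the matrices with an associative (OR-of-ANDs) product. With that approach, N symbols can be combined in roughly log2(N) levels instead of N sequential steps. The Python sketch below illustrates this idea under that assumption. It is not the authors' pipelined hardware formulation, and the function names (`transition_matrix`, `tree_compose`, `run_parallel`, `run_sequential`) are hypothetical. Its costs are loosely consistent with the bounds above: each input needs one S²-bit table lookup, and each matrix product needs on the order of S³ AND/OR terms.

```python
from typing import Dict, List, Sequence, Set, Tuple

Matrix = Tuple[Tuple[bool, ...], ...]
Delta = Dict[Tuple[int, str], Set[int]]  # (state, symbol) -> set of next states


def transition_matrix(num_states: int, delta: Delta, symbol: str) -> Matrix:
    """T[i][j] is True iff state i can move to state j on `symbol`."""
    return tuple(
        tuple(j in delta.get((i, symbol), set()) for j in range(num_states))
        for i in range(num_states)
    )


def bool_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Boolean (OR-of-ANDs) product: composes two transition relations."""
    n = len(a)
    return tuple(
        tuple(any(a[i][k] and b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def tree_compose(mats: Sequence[Matrix]) -> Matrix:
    """Compose at least one matrix in about log2(N) levels. Only neighbouring
    matrices are multiplied, so input order is preserved (the product is
    associative but not commutative)."""
    level: List[Matrix] = list(mats)
    while len(level) > 1:
        nxt = [bool_matmul(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def run_parallel(num_states: int, delta: Delta, start: Set[int], block: str) -> Set[int]:
    """Active states after consuming `block`, via independent per-symbol
    lookups followed by a tree of matrix compositions."""
    mats = [transition_matrix(num_states, delta, c) for c in block]
    m = tree_compose(mats)
    return {j for j in range(num_states) if any(m[i][j] for i in start)}


def run_sequential(delta: Delta, start: Set[int], block: str) -> Set[int]:
    """Reference: the usual one-symbol-at-a-time NFA simulation."""
    active = set(start)
    for c in block:
        active = set().union(*(delta.get((s, c), set()) for s in active))
    return active
```

As a usage example, take the 3-state NFA that detects the substring "ab": `delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "a"): {2}, (2, "b"): {2}}`. Both `run_parallel(3, delta, {0}, "baab")` and `run_sequential(delta, {0}, "baab")` return `{0, 2}`, and reaching state 2 means the pattern was seen.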
A High Performance FPGA-Based Accelerator Design for End-to-End Speaker Recognition System
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00033
Ming-jun Jiao, Yue Li, Pengbo Dang, Wei Cao, Lingli Wang
Abstract: Speaker recognition is significant for identification applications. X-vectors, a robust text-independent speaker recognition system, spends a large amount of time on two tasks. It extracts voiceprint features, which requires massive neural network computation, and it scores them against everyone registered in the database to find the best-matching person. In this paper, an FPGA-based high-performance accelerator for this end-to-end speaker recognition system is proposed. It consists of three parts: Mel-Frequency Cepstral Coefficients (MFCC), a time-delay neural network (TDNN), and a probabilistic linear discriminant analysis (PLDA) classifier. A quantitative analysis is presented to balance bit width against recognition accuracy. In addition, an optimization strategy that trades off system parallelism against FPGA resource utilization is introduced. Running at 200 MHz on the Xilinx XCVU9P FPGA of an UltraScale+ VCU118 board, the proposed accelerator achieves a peak performance of 1.067 TOP/s and 1.30×10^5 voice frames per second (vFPS). This is a 1296× speedup over the X-vectors software implementation running on a 2.5 GHz Intel Xeon E5-2620 processor, and 6.42× higher energy efficiency than an NVIDIA TITAN Xp GPU solution.
Citations: 1
ZyNet: Automating Deep Neural Network Implementation on Low-Cost Reconfigurable Edge Computing Platforms
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00058
Kizheppatt Vipin
Abstract: The prevalence of Internet of Things (IoT) applications provides a new opportunity for low-cost FPGA devices to act as edge-computing neural network nodes. Although FPGA vendors provide neural network development environments, these often target high-end devices and are not as user-friendly as their software counterparts. In this work we introduce ZyNet, a Python package that enables faster implementation of deep neural networks (DNNs) on low-cost hybrid FPGA platforms such as the Xilinx Zynq. Based on a hardware-software co-design approach, the platform supports pre-trained or on-board-trained networks, with a development environment very similar to the popular TensorFlow. Implementation results show that the DNNs generated by the platform achieve accuracy very close to software implementations. They also deliver an order of magnitude higher throughput than other edge computing devices at a lower energy footprint. The platform is integrated with Xilinx development tools and is distributed as open source.
Citations: 4
High-Level Synthesis Techniques to Generate Deeply Pipelined Circuits for FPGAs with Registered Routing
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00071
Yu Ting Chen, Jin Hee Kim, Ke-Xin Li, Graham Hoyes, J. Anderson
Abstract: Recent Intel FPGAs have bypassable registers in the routing switches, permitting deeper pipelining and more flexible retiming. We investigate high-level synthesis (HLS) approaches to leverage these interconnect registers. We alter LegUp, an academic HLS tool, to insert extra registers within the datapaths and at the inputs/outputs of memory blocks. Initially, one or more registers are inserted after every computational and memory instruction to assess the maximum reachable frequency (Fmax). Subsequently, we apply a more judicious approach, profiling applications in software to gather statistics on the execution frequency of code segments. Guided by profiling, we insert additional pipeline registers in subcircuits corresponding to infrequently executed code segments. This permits Fmax improvements to be realized with modest impact on cycle latency.
Citations: 6
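Why target infrequently executed code? A toy cost model makes the trade-off concrete. Assume, as a simplification, that an extra register stage in a block adds about one cycle each time that block executes, and ignore overlap and the Fmax gain itself. Under that assumption, the added cycle latency is roughly the sum of the profiled execution counts of the blocks that receive registers. The hypothetical helper `pick_blocks_for_pipelining` below greedily selects the least frequently executed blocks within a latency budget. It is a sketch of the idea, not LegUp's actual pass.

```python
from typing import Dict, List


def pick_blocks_for_pipelining(exec_counts: Dict[str, int],
                               total_cycles: int,
                               max_overhead: float = 0.02) -> List[str]:
    """Greedily choose the coldest blocks so that the added cycles stay
    within `max_overhead` of the profiled cycle count.

    Toy assumption: one extra register stage in a block costs one extra
    cycle per execution of that block.
    """
    budget = max_overhead * total_cycles
    chosen: List[str] = []
    spent = 0
    # Coldest blocks first; sorted() is stable, so ties keep dict order.
    for block, count in sorted(exec_counts.items(), key=lambda kv: kv[1]):
        if spent + count > budget:
            break  # every remaining block is at least as hot, so none fit
        chosen.append(block)
        spent += count
    return chosen
```

As an example, take counts `{"init": 1, "loop_body": 100000, "epilogue": 1, "error_path": 3}`, 400,000 profiled cycles and a 1% budget. The helper selects `init`, `epilogue` and `error_path` and leaves the hot loop body unregistered.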
Evolved Binary Neural Networks Through Harnessing FPGA Capabilities
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00076
Raul Valencia, Chiu-Wing Sham, O. Sinnen
Abstract: The exponential progress of semiconductor technologies has enabled the proliferation of deep learning as a prominent area of research, where neural networks have demonstrated their effectiveness in solving very hard multi-dimensional problems. This paper focuses on one kind in particular, Binary Neural Networks (BNNs), which use fixed-length bits in their connections and logic functions to perform excitation operations. Exploiting these characteristics, hardware accelerators that integrate field-programmable gate arrays (FPGAs) have been adopted to speed up inference of deep learning networks, given the ability of FPGAs to maximize parallelism and energy efficiency. This work shows how the Binary Spectrum-diverse Unified Neuroevolution Architecture (BiSUNA) algorithm can perform training and inference on an FPGA without the need for gradient descent. Source code can be found at github.com/rval735/bisunaocl
Citations: 3
An Iterative Technique for Runtime Efficient Hardware-Software Partitioning
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00078
Deshya Wijesundera, Kisaru Liyanage, Alok Prakash, T. Srikanthan, Thilina Perera
Abstract: The increasing popularity of FPGA-based devices for applications of different size and complexity calls for runtime-efficient hardware-software partitioning techniques with high levels of accuracy. However, the prohibitively large design space during partitioning makes this task challenging, and existing techniques restrict the design space at the cost of accuracy. In this work, we propose an iterative technique for runtime-efficient hardware-software partitioning based on a divide-and-conquer algorithm. The proposed technique has been evaluated using applications from the CHStone benchmark suite. It achieved accuracies of 94% compared to implementation and 99% compared to an exhaustive technique, with significantly low runtimes.
Citations: 1
A Dataflow Pipelining Architecture for Tile Segmentation with a Sparse MobileNet on an FPGA
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00044
Youki Sada, Masayuki Shimoda, Akira Jinguji, Hiroki Nakahara
Abstract: Fast semantic segmentation must be implemented in embedded systems due to the increasing interest in automatic driving, and energy efficiency is a fundamental metric in such a scenario. Because high-resolution images are required to achieve high segmentation accuracy, an accelerator must provide large buffers for the corresponding feature maps. This does not suit the limited on-chip memory of an FPGA. To address this, we propose a tile segmentation algorithm and develop an FPGA-based accelerator for a sparse MobileNet-based PSPNet. To reduce the buffer size, the tile segmentation algorithm splits incoming images on the FPGA into a number of tiles determined by the stride. The PSPNet then processes the many split tiles. Moreover, we propose a pipelined sparse convolutional circuit to compute multiple tiles at high speed. We compared the proposed FPGA-based system with the NVIDIA RTX 2080 Ti using the Cityscapes benchmark. The FPGA achieved 139 FPS with 24 W power consumption for a 1024×512 image, with an accuracy (mIoU) of 64.2%. Compared with the GPU, it was 1.5 times faster, its power consumption was 9.3 times lower, and its performance per watt was 13.8 times better.
Citations: 1
Optimisation of System Throughput Exploiting Tasks Heterogeneity on Space Shared FPGAs
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00067
U. Minhas, R. Woods, G. Karakonstantis
Abstract: Optimising system throughput in FPGA-based cloud computing is challenging. As the number of tasks grows and the tasks become more heterogeneous, mapping constraints lead to suboptimal space sharing of resources. This work proposes a methodology for exploring and optimising the tasks' resource utilisation. High-level synthesis parameters are identified for each task. Machine learning models and intelligent clustering are then employed to define clusters of tasks that will share the FPGA space. Assuming heterogeneity characterisation of tasks and thus static partitioning of the FPGA, each task in a cluster is ensured to accommodate the other tasks' resource requirements, resulting in higher compute density. Using 11 high-performance computing tasks, we achieve on average 3.3× higher system throughput at 2.8× better energy efficiency compared with existing approaches.
Citations: 1
An OpenCL-Based FPGA Accelerator for Compressed YOLOv2
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00036
Anrong Yang, Yuanhui Li, Hongqiao Shu, Jianlin Deng, Chuanzhao Ma, Zheng Li, Qigang Wang
Abstract: Convolutional neural networks (CNNs) are widely used in computer vision applications, and GPUs have been the mainstream accelerator for them. Compared with GPUs, FPGAs offer high flexibility, low power consumption and abundant DSP resources, which make it possible to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, an OpenCL-based CNN accelerator is designed for FPGA, and a variety of model compression techniques are applied to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently. It solves the unaligned global memory access issue caused by the Winograd algorithm with an alignment stream buffer. This design makes full use of the available memory access bandwidth and utilizes all the available DSP resources. Parallelism is exploited in various dimensions for optimal performance. Our FPGA design reaches a latency of 10 ms per image, compared with 15 ms per image on an NVIDIA P100 GPU. We plan to make our design open source so that the community can benefit from it and contribute to it.
Citations: 4
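The Winograd algorithm mentioned in this abstract saves multiplications by trading them for additions. In its 1-D minimal filtering form F(2,3), it computes two outputs of a 3-tap filter with 4 multiplications instead of 6, and CNN accelerators typically nest it into the 2-D F(2×2, 3×3) form. The abstract does not say which tile size the authors use, so the Python sketch below only illustrates the F(2,3) arithmetic. It uses the correlation convention common in CNNs and is checked against direct computation. The function names are hypothetical.

```python
from typing import List, Sequence


def winograd_f23_filter(g: Sequence[float]) -> List[float]:
    """Transform a 3-tap filter once; in a CNN this can be precomputed."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_f23_tile(d: Sequence[float], u: Sequence[float]) -> List[float]:
    """Two outputs from a 4-sample input tile using 4 multiplications."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_winograd(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Valid-mode 1-D correlation; assumes len(x) - 2 is even."""
    u = winograd_f23_filter(g)
    out: List[float] = []
    for i in range(0, len(x) - 3, 2):  # consecutive tiles overlap by 2 samples
        out.extend(winograd_f23_tile(x[i:i + 4], u))
    return out


def conv1d_direct(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference: 3 multiplications per output."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]
```

For the 2-D case used in CNN layers, the same transform is applied along both rows and columns of each tile. Each 2×2 output tile then needs 16 multiplications instead of 36.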
Storage Mirroring for Bare-Metal Malware Analysis on FPGA Devices
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00061
D. C. Turicu, O. Creţ, L. Văcariu
Abstract: Malware continues to be a major security threat to computer systems. Due to its fast-growing volume and increasing complexity, security analysts prefer automated analysis methods over manual ones. Automated dynamic malware analysis executes samples in controlled environments and monitors their potentially malicious behavior. Modern malware can detect these emulated or virtualized environments and suspend its malicious activities to foil the analysis. Consequently, the ultimate technique for analyzing malware behavior is to execute the samples in bare-metal analysis environments. Detection aside, restoring the analysis system to a clean state after each analysis is challenging. To resolve this, we propose an FPGA-implemented storage mirroring technique for instantaneous restoration of the storage device and retrieval of the files modified during sample execution.
Citations: 1
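The abstract does not describe the mirroring mechanism. One common way to get both a fast restore and a record of modified data is a block-level copy-on-write overlay. The clean image is never written, all writes are redirected to an overlay, and restoring means discarding the overlay. The `MirroredDisk` class below is a hypothetical software sketch of that idea, not the paper's FPGA design. It reports modified blocks only; mapping those blocks back to files would additionally require parsing the file system, which is omitted here.

```python
from typing import Dict, List


class MirroredDisk:
    """Hypothetical block-level copy-on-write overlay over a clean image."""

    def __init__(self, clean_image: List[bytes], block_size: int = 512) -> None:
        self.block_size = block_size
        self._base = clean_image                # pristine image, never modified
        self._overlay: Dict[int, bytes] = {}    # block index -> data written by the sample

    def read(self, lba: int) -> bytes:
        # A block written during the run shadows the clean copy.
        return self._overlay.get(lba, self._base[lba])

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must cover exactly one block")
        if not 0 <= lba < len(self._base):
            raise IndexError("block address out of range")
        self._overlay[lba] = data               # base image stays untouched

    def modified_blocks(self) -> Dict[int, bytes]:
        """Blocks changed since the last restore, ordered by address."""
        return dict(sorted(self._overlay.items()))

    def restore(self) -> None:
        """Return to the clean state by discarding every redirected write."""
        self._overlay.clear()
```

In hardware, the overlay lookup would correspond to a table of valid bits consulted on each access, and the restore would correspond to clearing those bits. This is again an assumption about one possible design, not a description of the authors' implementation.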