2018 International Conference on Field-Programmable Technology (FPT): Latest Papers

A Tri-State Weight Convolutional Neural Network for an FPGA: Applied to YOLOv2 Object Detector
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00058
Hiroki Nakahara, Masayuki Shimoda, Shimpei Sato
Abstract: Frame object detection, such as YOLO (You Only Look Once), is used in embedded vision systems such as robots, automobiles, security cameras, and drones. However, it requires high detection performance per watt from an inexpensive device. In this paper, we propose a tri-state weight CNN, which generalizes both low-precision and sparse (pruned) CNN weights. In the former part of the network, the weights take values in {-1, 0, +1} as in a ternary CNN, while in the latter part they take values in {-w, 0, +w} as in a sparse-weight CNN. The proposed tri-state CNN is a kind of mixed-precision network, suitable for an object detector that combines bounding-box prediction (regression) and class estimation (classification). We apply an indirect memory access architecture to skip the zero weights and propose a weight-parallel 2D convolutional circuit, which can be applied efficiently to AlexNet-based CNNs with kernels of different sizes. We design an AlexNet-based YOLOv2 with fewer layers to achieve low-latency computation. In the experiments, the proposed tri-state scheme reduces the weight memory size by 92%. We implement the proposed tri-state weight YOLOv2 on the Avnet Inc. UltraZed-EG starter kit, which carries a Xilinx Inc. Zynq UltraScale+ MPSoC ZU3EG. It achieved 61.70 frames per second (FPS), exceeding the standard video frame rate (29.97 FPS). Compared with an ARM Cortex-A57, it was 268.2 times faster and its performance per watt was 313.51 times better. Compared with an NVIDIA Pascal embedded GPU, it was 4.0 times faster and its performance per watt was 11.35 times better.
Citations: 3
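The tri-state weight idea above can be illustrated with a minimal Python sketch. It assumes a simple magnitude threshold decides which weights become zero (the abstract does not say how the paper chooses them), keeps only the non-zero positions in an index list as a stand-in for the indirect memory access that skips zeros, and applies an optional shared magnitude `scale` for the {-w, 0, +w} layers. The names `ternarize`, `to_index_list`, and `zero_skipping_dot` are hypothetical.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0 or +1: zero if |w| <= threshold, else its sign.

    The fixed threshold is an assumption made for this sketch.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def to_index_list(tri):
    """Keep only non-zero positions as (index, sign) pairs; zeros are never stored."""
    return [(i, t) for i, t in enumerate(tri) if t != 0]


def zero_skipping_dot(activations, index_list, scale=1.0):
    """Dot product that visits only non-zero weights through indirect indexing.

    scale=1.0 gives the ternary {-1, 0, +1} case; scale=w gives the sparse
    {-w, 0, +w} case, where the shared magnitude multiplies the sum once.
    """
    acc = 0.0
    for i, sign in index_list:
        # a tri-state weight only adds or subtracts, so no multiplier is needed here
        acc += activations[i] if sign > 0 else -activations[i]
    return acc * scale


if __name__ == "__main__":
    w = [0.9, -0.05, -0.7, 0.02, 0.4]
    idx = to_index_list(ternarize(w, threshold=0.1))  # [(0, 1), (2, -1), (4, 1)]
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(zero_skipping_dot(x, idx))             # ternary layer: prints 3.0
    print(zero_skipping_dot(x, idx, scale=0.5))  # sparse {-w, 0, +w} layer: prints 1.5
```

Because every stored weight is only a sign, the inner loop uses additions and subtractions; the single multiplication by `scale` happens once per output, and a ternary weight needs about 2 bits of storage instead of 32.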
An Automated FPGA-Based Fault Injection Platform for Granularly-Pipelined Fault Tolerant CORDIC
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00076
Yu Xie, He Chen, Yizhuang Xie, Chuang-An Mao, Bingyi Li
Abstract: Increasing integration density and complexity make VLSI circuits more sensitive to errors, and soft errors caused by single event upsets (SEUs) have become a significant threat to modern electronic systems. The demand for high reliability in modern electronic systems therefore keeps increasing. Aiming at the reliability evaluation of fault-tolerant VLSI circuits implemented on SRAM-based FPGAs, this paper presents an automated platform for rapid fault injection via the Internal Configuration Access Port (ICAP). We adopt a granularly-pipelined fault-tolerant CORDIC processor as the design under test (DUT), and a C++ script provides the external fault-injection control environment and automates the fault-injection procedure. The proposed method can run large numbers of repeated fault-injection tests and is suitable for any fault-tolerant design implemented on an SRAM-based FPGA.
Citations: 2
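The fault-injection procedure above (flip a configuration bit, run the DUT, compare with the golden result, restore) can be sketched as a small software simulation. This is not the paper's platform: a Python list of 4-input LUT truth tables stands in for the SRAM configuration memory that the real platform accesses through ICAP, and the DUT is a toy parity function. Protecting it with triple modular redundancy (TMR) and a majority voter is an assumption for illustration only, since the abstract does not say which fault-tolerance technique the CORDIC uses. All names are hypothetical.

```python
from itertools import product

LUT_INPUTS = 4  # hypothetical 4-input LUT: 16 configuration bits per LUT


def lut_eval(table, inputs):
    """Evaluate a LUT whose truth table is stored as the bits of the integer `table`."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> index) & 1


# Truth table of 4-input parity: bit k holds the parity of input pattern k.
PARITY_TABLE = sum((bin(k).count("1") & 1) << k for k in range(1 << LUT_INPUTS))


def run_dut(config, inputs):
    """Toy DUT: evaluate every LUT copy in `config` and take a majority vote."""
    votes = sum(lut_eval(table, inputs) for table in config)
    return 1 if 2 * votes > len(config) else 0


def injection_campaign(golden_config):
    """Flip each configuration bit once and compare against the golden outputs.

    Returns (number of injected faults, number that produced a wrong output).
    """
    vectors = list(product((0, 1), repeat=LUT_INPUTS))
    golden = [run_dut(golden_config, v) for v in vectors]
    injected = failures = 0
    for lut in range(len(golden_config)):
        for bit in range(1 << LUT_INPUTS):
            config = list(golden_config)  # copy of the golden configuration
            config[lut] ^= 1 << bit       # simulated single event upset in one bit
            injected += 1
            if any(run_dut(config, v) != g for v, g in zip(vectors, golden)):
                failures += 1
            # golden_config itself is never modified, so restoring is implicit
    return injected, failures


if __name__ == "__main__":
    print(injection_campaign([PARITY_TABLE]))      # unprotected: (16, 16)
    print(injection_campaign([PARITY_TABLE] * 3))  # triplicated + voter: (48, 0)
```

Every single-bit upset corrupts the unprotected LUT for exactly one input pattern, while the voter masks any upset confined to one of the three copies; an automated campaign is what turns that kind of claim into measured numbers.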
FLiMS: Fast Lightweight Merge Sorter
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00022
Philippos Papaphilippou, Chris Brooks, W. Luk
Abstract: We have developed a highly efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources required by the current state-of-the-art architecture, while achieving better performance and half the latency for the same degree of parallelism. The challenges for merge operations on FPGAs have been the low clock frequency, because the merger's feedback datapath is the critical path for timing, and the high resource utilisation of recent attempts to eliminate the feedback datapath. Our solution uses a modified version of the bitonic merge block found in a bitonic sorter, repurposed to perform parallel merging of streaming data. As with the state of the art, it can be considered feedback-less, since it nests only one parallel comparison for any desired level of parallelism. This leads to designs with high operating frequencies, 1.3 times higher than the previous best on our test platform. Since the new design uses half the hardware resources, it allows more parallelism and leaves room for additional logic.
Citations: 17
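FLiMS builds on a modified bitonic merge block whose details are not given in the abstract. For orientation, here is a minimal Python sketch of the standard, unmodified bitonic merge network it starts from. It is not the FLiMS design and it merges one batch rather than a stream; the function name is hypothetical.

```python
def bitonic_merge(a, b):
    """Merge two ascending lists of equal power-of-two length with a bitonic merge network.

    Concatenating `a` with `b` reversed yields a bitonic sequence; log2(2n) stages
    of compare-exchange operations then sort it. The compare-exchanges within one
    stage are independent of each other, which is what hardware evaluates in parallel.
    """
    n = len(a)
    if len(b) != n or n & (n - 1):
        raise ValueError("inputs must have the same power-of-two length")
    seq = list(a) + list(reversed(b))  # ascending followed by descending: bitonic
    stride = len(seq) // 2
    while stride >= 1:
        for i in range(len(seq)):
            if (i & stride) == 0:  # i is the lower element of a compare-exchange pair
                j = i + stride
                if seq[i] > seq[j]:
                    seq[i], seq[j] = seq[j], seq[i]
        stride //= 2
    return seq


if __name__ == "__main__":
    print(bitonic_merge([1, 4, 6, 9], [2, 3, 7, 8]))  # prints [1, 2, 3, 4, 6, 7, 8, 9]
```

Merging two lists of n elements this way takes log2(2n) stages of n compare-exchanges each, with no data flowing back from a later stage to an earlier one.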
Optimisation of Convolution of Multiple Different Sized Filters in SKA Pulsar Search Engine
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00073
Haomiao Wang, B. Stappers, P. Thiagaraj, O. Sinnen
Abstract: Pulsar search is one of the main tasks of the Square Kilometre Array (SKA) central signal processor (CSP) sub-element. Because most details of the pulsars are unknown, many pulsar search approaches are employed. The main compute-intensive application in the pulsar search modules is the matched filter group, which convolves the input signals with a group of filters. High-performance FPGA designs that process multiple large filters efficiently have been proposed. However, in many applications, including the pulsar search targeted here, the filters have different sizes, which leaves high potential for optimisation. This paper investigates the optimisation of a general matched-filtering design for the SKA pulsar search engine. The influence of changing the number of filters and the differences in their sizes is analysed. The general time-domain (TD) and frequency-domain (FD) implementations are optimised by employing the longest processing time (LPT) first rule to distribute filter templates across filter processing pipelines. The proposed design is used to implement the matched filter groups in two SKA pulsar modules. The results show that the optimisation provides up to 2.1x speedup in TD and 1.2x speedup in FD.
Citations: 0
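The longest processing time (LPT) first rule mentioned above is a classic greedy scheduling heuristic, and a minimal Python sketch of it follows. It assumes that a filter template's cost is a single number, for example its length in taps for a time-domain pipeline; the paper's actual cost model, the filter sizes used here, and the function names are hypothetical.

```python
import heapq


def lpt_assign(filter_costs, num_pipelines):
    """Longest Processing Time (LPT) first: take filters from largest to smallest
    and give each one to the pipeline with the least accumulated work so far."""
    heap = [(0, p) for p in range(num_pipelines)]  # (current load, pipeline id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pipelines)]
    for cost in sorted(filter_costs, reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(cost)
        heapq.heappush(heap, (load + cost, p))
    # the most heavily loaded pipeline determines when the whole group finishes
    makespan = max(sum(costs) for costs in assignment)
    return assignment, makespan


def round_robin_makespan(filter_costs, num_pipelines):
    """Naive baseline for comparison: filter k goes to pipeline k mod P."""
    loads = [0] * num_pipelines
    for k, cost in enumerate(filter_costs):
        loads[k % num_pipelines] += cost
    return max(loads)


if __name__ == "__main__":
    costs = [64, 8, 32, 8, 32, 8, 16, 16]  # hypothetical filter lengths
    print(lpt_assign(costs, 2))            # ([[64, 16, 8, 8], [32, 32, 16, 8]], 96)
    print(round_robin_makespan(costs, 2))  # prints 144
```

With different-sized filters, a size-blind assignment can leave one pipeline doing most of the work; LPT keeps the loads close to the ideal total/P (92 in this toy case).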
Real-Time Object Detection and Semantic Segmentation Hardware System with Deep Learning Networks
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00081
Shaoxia Fang, Lu Tian, Junbin Wang, Shuang Liang, Dongliang Xie, Zhongmin Chen, Lingzhi Sui, Qian Yu, Xiaoming Sun, Yi Shan, Yu Wang
Abstract: Advanced Driver Assistance Systems (ADAS) assist the driver by detecting objects, performing basic classification, implementing safety guards, and so on. Convolutional neural networks (CNNs) have proved essential to supporting ADAS. We designed an architecture named Aristotle to execute neural networks for both object detection and semantic segmentation on an FPGA. DNNDK (Deep Learning Development Toolkit), a full-stack software tool with dozens of compilation optimization techniques, is proposed to improve energy efficiency and ease development. The Aristotle architecture is implemented on a Xilinx ZU9 FPGA, and two networks are deployed on it to perform object detection and semantic segmentation, respectively.
Citations: 12
A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00014
Hongxiang Fan, Shuanglong Liu, Martin Ferianc, Ho-Cheung Ng, Zhiqiang Que, Shen Liu, Xinyu Niu, W. Luk
Abstract: Convolutional neural network (CNN)-based object detection has been widely employed in applications such as autonomous driving and intelligent video surveillance. However, the computational complexity of conventional convolution hinders its use in embedded systems. Recently, a mobile-friendly CNN model, SSDLite-MobileNetV2 (SSDLiteM2), has been proposed for object detection. This model contains a novel layer called the bottleneck residual block (BRB). Although SSDLiteM2 has far fewer parameters and computations than conventional CNN models, its performance on embedded devices still cannot meet the requirements of real-time processing. This paper proposes a novel FPGA-based architecture for SSDLiteM2 combined with hardware optimizations including fused BRB, processing element (PE) sharing, and load-balanced channel pruning. Moreover, a novel quantization scheme called partial quantization has been developed, which partially quantizes SSDLiteM2 to 8 bits with only 1.8% accuracy loss. Experiments show that the proposed design on a Xilinx ZC706 device can achieve up to 65 frames per second with 20.3 mean average precision (mAP) on the COCO dataset.
Citations: 42
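The abstract does not define partial quantization beyond quantizing part of SSDLiteM2 to 8 bits. The minimal Python sketch below assumes it means choosing a subset of layers, quantizing their weights with symmetric per-tensor 8-bit quantization, and leaving the remaining layers in floating point. The layer names, the selection set, and the function names are hypothetical, and this is not the paper's scheme.

```python
def quantize_int8(values):
    """Symmetric per-tensor 8-bit quantization: integer codes plus one float scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def partial_quantize(layers, layers_to_quantize):
    """Fake-quantize only the selected layers; every other layer stays in float.

    `layers` maps a layer name to a flat list of weights. How layers are selected
    is a hypothetical input here, not the paper's criterion.
    """
    result = {}
    for name, weights in layers.items():
        if name in layers_to_quantize:
            codes, scale = quantize_int8(weights)
            result[name] = [c * scale for c in codes]  # dequantized, for simulation
        else:
            result[name] = list(weights)               # kept in full precision
    return result


if __name__ == "__main__":
    layers = {"backbone_conv": [1.0, -0.5, 0.25, 0.0], "detection_head": [0.3, -0.7]}
    print(partial_quantize(layers, {"backbone_conv"}))
```

Keeping precision-sensitive layers in floating point while quantizing the rest is one common way to trade a small accuracy loss for cheaper arithmetic; each quantized weight is off by at most half a quantization step.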
Improving Confidentiality in Virtualized FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00048
S. Yazdanshenas, Vaughn Betz
Abstract: FPGAs are being deployed in modern datacenters to provide users with specialized accelerators that offer superior compute capability, higher energy efficiency, lower latency, and more programming flexibility than CPUs. However, FPGAs in datacenters are not utilized as efficiently: unlike CPUs, they are currently not shared between users because of potential security risks. The higher flexibility of FPGAs also gives malicious users more capabilities. Several recent studies have demonstrated FPGA user applications capable of remotely sniffing data from other applications running on the same FPGA. In this work, we examine various ways to mitigate these threats by encrypting and decrypting the user application's data under different trust levels for current virtualized FPGAs. We also discuss the role of the interconnect and the potential for more efficient security features that can be implemented together with the interconnect if the FPGA uses a hard network on chip.
Citations: 4
Dither NN: An Accurate Neural Network with Dithering for Low Bit-Precision Hardware
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00013
Kota Ando, Kodai Ueyoshi, Yuka Oba, Kazutoshi Hirose, Ryota Uematsu, Takumi Kudo, M. Ikebe, T. Asai, Shinya Takamaeda-Yamazaki, M. Motomura
Abstract: Energy-constrained neural network processing is in high demand for various mobile applications. Binary neural networks aggressively improve computational efficiency but suffer from accuracy degradation due to their extreme approximation. We propose a novel accurate neural network model based on binarization and "dithering", which distributes the quantization error to neighboring pixels. Because the binarization errors are distributed across the plane, a pixel of the multi-level source is represented more accurately by multiple pixels in the resulting binarized plane. We designed a low-overhead binary-based hardware architecture for the proposed model. The evaluation results show that this method can be realized with only a few additional lightweight hardware components.
Citations: 9
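Distributing quantization error to neighboring pixels is the idea behind classic error-diffusion dithering. The minimal Python sketch below uses the Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16) as an assumed example; the abstract does not state which diffusion pattern Dither NN uses, and the function names are hypothetical.

```python
def binarize_with_dithering(image, threshold=0.5):
    """Binarize a 2-D grid of values in [0, 1] with Floyd-Steinberg error diffusion.

    Each pixel's quantization error (value minus the chosen 0/1 output) is pushed
    to the right and lower neighbours that have not been visited yet; error that
    would leave the image is dropped.
    """
    if not image:
        return []
    h, w = len(image), len(image[0])
    buf = [[float(v) for v in row] for row in image]  # working copy that absorbs error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


def binarize_plain(image, threshold=0.5):
    """Plain thresholding, for comparison: the quantization error is discarded."""
    return [[1 if v >= threshold else 0 for v in row] for row in image]


if __name__ == "__main__":
    gray = [[0.4] * 16 for _ in range(16)]
    dithered = binarize_with_dithering(gray)
    print(sum(map(sum, binarize_plain(gray))))  # prints 0: the 0.4 level vanishes
    print(sum(map(sum, dithered)) / 256)        # fraction of ones, close to 0.4
```

On a uniform 0.4 patch, plain thresholding outputs all zeros, while error diffusion turns on a fraction of pixels close to 0.4 (apart from error lost at the image borders), so local averages of the binary plane track the original level.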
An FPGA Implementation of Robust Matting
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00047
Takuya Yamazaki, T. Maruyama
Abstract: Matting is the process of extracting the foreground from the background of an image. It is a key technique in many image-editing tasks, and many algorithms have been proposed. Robust Matting is one of the most powerful matting algorithms. In Robust Matting, a set of foreground and background pixels is first sampled; then, for each pixel in the image, its category is determined by comparing it with a number of foreground/background sample pairs. Its computational complexity is very high because of the large number of multiply and square operations. In this paper, we propose an FPGA implementation for real-time processing of HD images. In our approach, the image is divided into small blocks, and the same foreground/background pixel pairs are used for all pixels in the same block to reduce the number of operations. This makes it possible to execute multiplications by simply looking up tables that are dynamically updated block by block.
Citations: 0
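The block-wise table lookup described above can be illustrated with a minimal Python sketch. It assumes the per-pixel work is the common matting estimate alpha = ((C - B) . (F - B)) / ||F - B||^2 for a pixel color C, foreground sample F, and background sample B, clamped to [0, 1]. The full Robust Matting algorithm also scores and selects sample pairs, which is omitted here. Because one (F, B) pair is shared by a whole block, each product (C_c - B_c)(F_c - B_c) for an 8-bit channel value becomes a lookup in a 256-entry table rebuilt per block. The function names are hypothetical, and this is not the paper's circuit.

```python
def build_block_tables(fg, bg):
    """Per-channel lookup tables for one block's (F, B) sample pair.

    Entry v of table c holds (v - B_c) * (F_c - B_c), so the multiplication for an
    8-bit channel value v becomes a lookup. Tables are rebuilt when the block changes.
    """
    return [[(v - b) * (f - b) for v in range(256)] for f, b in zip(fg, bg)]


def alpha_for_block(pixels, fg, bg):
    """Estimate alpha = ((C - B) . (F - B)) / ||F - B||^2 for each pixel C in a block,
    clamped to [0, 1], using one shared (F, B) pair for the whole block."""
    tables = build_block_tables(fg, bg)                # built once per block
    denom = sum((f - b) ** 2 for f, b in zip(fg, bg))  # also constant per block
    alphas = []
    for pixel in pixels:
        if denom == 0:
            alphas.append(0.0)  # F == B: alpha is undefined; 0.0 is an arbitrary choice
            continue
        num = sum(table[c] for table, c in zip(tables, pixel))  # lookups and adds only
        alphas.append(min(1.0, max(0.0, num / denom)))
    return alphas


if __name__ == "__main__":
    block = [(255, 255, 255), (0, 0, 0), (51, 51, 51)]
    print(alpha_for_block(block, fg=(255, 255, 255), bg=(0, 0, 0)))
```

The per-pixel division remains in this sketch; since `denom` is constant per block, hardware could instead precompute its reciprocal once per block.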
FCLNN: A Flexible Framework for Fast CNN Prototyping on FPGA with OpenCL and Caffe
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00043
Xianchao Xu, Brian Liu
Abstract: CNN algorithms are still evolving rapidly, while traditional RTL-level programming of FPGAs is relatively slow and requires great effort and expertise. In this paper, we propose a flexible HW/SW co-design framework for fast, high-throughput CNN prototyping using the commercial high-level OpenCL language and the standard open-source deep learning framework Caffe. We build a parameterizable, stream-architected convolution engine and extend it to support any input size and filter depth. For iterative development, we provide both layer-based and subgraph-based execution schedules. For competitive performance, both on-chip and off-chip communication are optimized. Using our framework on an Intel Arria 10 GX1150 FPGA, we achieve 69.2 fps and 18.6 fps on the official YOLOv2-tiny-voc and YOLOv2-voc models, respectively. To the best of our knowledge, this is the first work to accelerate the state-of-the-art YOLOv2 on an FPGA with both real-time performance and < 1% accuracy drop.
Citations: 17
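A stream-architected convolution engine that accepts any input size is commonly built around line buffers, which keep only the most recent rows of a raster-order pixel stream. The minimal Python sketch below shows that generic pattern for a single channel with the image width as a parameter; it is an assumption for illustration, not FCLNN's OpenCL kernel, it does not cover filter depth, and the function name is hypothetical.

```python
from collections import deque


def stream_conv2d(pixels, width, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs) over a raster-order stream.

    Only the last k rows are buffered (a line buffer), so memory grows with the
    image width and kernel size, not with the image height.
    """
    k = len(kernel)
    rows = deque(maxlen=k)  # line buffer holding the k most recent complete rows
    row = []
    for p in pixels:
        row.append(p)
        if len(row) == width:
            rows.append(row)  # the oldest row is discarded automatically
            row = []
            if len(rows) == k:
                # one output row: slide the k x k window across the buffered rows
                for x in range(width - k + 1):
                    yield sum(kernel[i][j] * rows[i][x + j]
                              for i in range(k) for j in range(k))


if __name__ == "__main__":
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(list(stream_conv2d(range(16), width=4, kernel=box)))  # prints [45, 54, 81, 90]
```

Because outputs are produced as soon as enough rows have arrived, the same loop handles any image width without storing the whole frame.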