2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig): latest papers

Optimal processor interface for CGRA-based accelerators implemented on FPGAs
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-30 DOI: 10.1109/ReConFig.2016.7857178
L. Jung, C. Hochberger
Abstract: Coarse Grained Reconfigurable Arrays (CGRA) can be used to substantially boost the processing power of embedded applications. They can be included in typical system-on-chip architectures to execute computationally demanding parts of the application. Delegating execution to the CGRA requires the exchange of live in/out variables between the processor core and the CGRA. In this paper we search for the optimal interface between the surrounding system and the CGRA with respect to impact on the operating frequency, the used resources and the runtime overhead.
Citations: 2
A Zynq-based testbed for the experimental benchmarking of algorithms competing in cryptographic contests
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857148
Farnoud Farahmand, Ekawat Homsirikamol, K. Gaj
Abstract: Hardware performance evaluation of candidates competing in cryptographic contests, such as SHA-3 and CAESAR, is very important for ranking their suitability for standardization. One of the most essential performance metrics is the throughput, which highly depends on the algorithm, hardware implementation architecture, coding style, and options of tools. The maximum throughput is calculated based on the maximum clock frequency supported by each algorithm. A common way of determining the maximum clock frequency is static timing analysis provided by CAD toolsets such as Xilinx ISE, Xilinx Vivado, and Altera Quartus Prime. In this project, we have developed a universal testbed, which is capable of measuring the maximum clock frequency experimentally, using a prototyping board. We are targeting cryptographic hardware cores, such as implementations of SHA-3 candidates. Our testbed is designed using a Zynq platform and takes advantage of software/hardware co-design. It supports two separate clock domains, one for a hardware module under test, and the other for the communication between an ARM core and hardware accelerator. We measured the maximum clock frequency and the execution time of 12 Round 2 SHA-3 candidates experimentally on ZedBoard and compared the results with the frequencies reported by Xilinx Vivado. Our results indicate that, depending on the characteristics of each algorithm, the experimental frequency may be either much higher than or the same as the frequency reported by the tools using static timing analysis. This behavior is then further analyzed, and the relevant conclusions are drawn.
Citations: 7
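The abstract above notes that maximum throughput is calculated from the maximum clock frequency. A minimal sketch of that relationship follows, using the usual formula for block-oriented cores (block size times clock frequency divided by cycles per block). The function name and the example figures (1088-bit block, 24 cycles per block, 200 MHz) are assumptions chosen for illustration and are not values taken from the paper.

```python
def max_throughput_bps(block_bits, cycles_per_block, f_max_hz):
    """Upper bound on throughput (bits/s) of a core that absorbs one
    block of `block_bits` bits every `cycles_per_block` clock cycles."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * f_max_hz / cycles_per_block


# Hypothetical core: 1088-bit blocks, 24 cycles per block, 200 MHz clock.
tput = max_throughput_bps(1088, 24, 200e6)
print(f"{tput / 1e9:.2f} Gbit/s")
```

Because throughput scales linearly with the clock frequency, any gap between the frequency reported by static timing analysis and the frequency measured on the board carries over directly to the throughput figure.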
A configurable architecture for the Generalized Hough Transform applied to the analysis of huge aerial images and to traffic sign detection
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857143
G. Kiefer, Matthias Vahl, Julian Sarcher, M. Schaeferling
Abstract: Object recognition in huge image data sets or in live camera images at interactive frame rates is a very demanding task, especially within embedded systems. The recognition task includes the localization of a reference object and its rotation and scaling in a search image. The Generalized Hough Transform (GHT) is known as a powerful and robust technique to support this task by transforming the search image into a 4D parameter space. However, the GHT itself is very complex and demanding in terms of computational power and memory consumption. This paper presents a novel hardware architecture to perform a complete 4D GHT at interactive frame rates in an FPGA. The architecture is configurable in order to allow a trade-off between performance, accuracy and hardware usage. The proposed architecture has been implemented in a low-cost Zynq-7000 FPGA and successfully evaluated in two practical applications, namely groyne detection in aerial images and traffic sign detection.
Citations: 4
Hardware-accelerated pose estimation for embedded systems using Vivado HLS
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857173
J. Joseph, Tobias Winker, Kristian Ehlers, Christopher Blochwitz, Thilo Pionteck
Abstract: The focus of this work is to facilitate pose estimation and, thus, gesture recognition for embedded systems, although these are tasks with high computational performance requirements. Therefore, an existing pose estimation algorithm is optimized for Xilinx High Level Synthesis (HLS). The resulting hardware acceleration cores are compared for different optimizations and, finally, we propose a hardware/software system design for a Xilinx Zynq Zedboard. Using this method, we achieve a speedup of 1.6 in comparison to a software solution on the ARM processor and, thus, facilitate hand tracking for embedded systems with low power consumption.
Citations: 2
ReOrder: Runtime datapath generation for high-throughput multi-stream processing
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857185
Andreas Becher, S. Wildermann, Moritz Mühlenthaler, J. Teich
Abstract: Modern programmable FPGA-based SoCs that tightly couple CPU and programmable logic enable the acceleration of stream processing in hardware on demand by making use of the available high input and output throughputs and the reconfigurability both in software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allow flexible data access and connect different stream processing accelerators independent of the host data layout. In order to achieve a high accelerator throughput, it is necessary to determine an optimized datapath according to the accelerator's internal schedule of input and output data. We are concerned with an online setting, in which either the data layout (e.g., in the case of modern database systems) or the accelerator operational mode changes dynamically. Therefore, an algorithm is required which can be used at runtime in order to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and corresponding FPGA design that is able to translate arbitrary (multi-source) data layouts of the connected host system to generate any specified data stream of the accelerator at runtime within milliseconds.
Citations: 5
Coarse grain reconfiguration: Power estimation and management flow for hybrid gated systems
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857160
Tiziana Fanni, L. Raffo
Abstract: This work presents an automatic power estimation and implementation flow for coarse-grained reconfigurable systems, capable of guiding designers towards the optimal implementation of power-efficient systems. The entire flow is assessed over the reconfigurable computing core of a dedicated image processing accelerator, targeting an ASIC 45 nm technology.
Citations: 0
High-throughput cellular imaging with high-speed asymmetric-detection time-stretch optical microscopy under FPGA platform
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/RECONFIG.2016.7857175
Ho-Cheung Ng, Maolin Wang, Bob M. F. Chung, B. S. C. Varma, M. Jaiswal, S. M. H. Ho, K. Tsia, H. Shum, Hayden Kwok-Hay So
Abstract: Asymmetric-Detection Time-Stretch Optical Microscopy (ATOM) is a recently emerged technology that provides ultra-fast cell imaging with a frame rate up to MHz, orders of magnitude higher than any classical imaging system. However, existing measuring instruments are unable to fully exploit the capability of ATOM. For example, the volume of the imaging data set of ATOM quickly increases beyond the capacity of the available onboard buffer of a modern high-speed oscilloscope. This paper presents an open source, FPGA-based solution which serves a dual role of collecting low-level signals from the ATOM frontend as well as processing and transferring data to backing store. Optical signals are sampled by a high-speed analog-to-digital converter and the resulting values are collected by an FPGA. The quantized values received are then further processed and divided into four segments for subsequent data transfer with 10 Gbit Ethernet. Four computing units are attached to these channels with direct connection in order to reliably receive the data for post-processing. Experiments show that, with decent quality images for single-cell analysis, the proposed system can store 10x more data than an existing high-end oscilloscope. With an 8x decrease in equipment cost, the proposed FPGA-based system will be beneficial for many bio-imaging applications with ATOM technology such as rare cancer cell imaging and identification.
Citations: 2
Adaptive single-event effect mitigation for dependable processing systems
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857149
R. Glein, F. Rittner, A. Heuberger
Abstract: For application in radiation-harsh environments, designers apply mitigation techniques according to the worst-case (solar) condition to achieve a dependable design. This results in a resource overhead, which is most of the time unnecessary. To overcome this problem, adaptive mitigation techniques are used. This technique is a trade-off between two parameters, such as performance and reliability, according to different operating modes by toggling between these modes. In this context, we propose an Adaptive Single-Event Effect Mitigation (ASEEM) method. It is based on adaptive reconfiguration of an FPGA between two modes, specifically a performance mode and a high reliability mode. The performance mode offers high processing power and thus higher signal processing throughput. We evaluate ASEEM by calculating results with particle data from 2010 until 2016 for one space-grade and two commercial-grade FPGAs. Based on radiation data, we calculate upset rates, availability, performance and performability. We discuss one realization of ASEEM in detail with fixed upset rates. The examples presented in this paper show a reduction of the upset rate from a sixth to a ninth (compared with the performance mode) and the availability of the high processing power over 90% in the considered time interval. We conclude that the investigated ASEEM realization is optimal for moderate and long mean times to repair. In a processing case study, with a fixed mean time to repair of one hour, we obtain a performability improvement of 14% and an availability improvement of 21% over the performance mode for an FPGA using the latest semiconductor technology.
Citations: 3
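The abstract above relates upset rates, mean time to repair and availability. The sketch below shows how these quantities are connected under one common textbook definition, steady-state availability = MTTF / (MTTF + MTTR) with MTTF taken as the reciprocal of the upset rate. This definition, the function name and the upset rates are assumptions for illustration and may differ from the model used in the paper. Only the one-hour mean time to repair mirrors the case study mentioned in the abstract.

```python
def steady_state_availability(upset_rate_per_hour, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR), with MTTF = 1 / upset rate."""
    if upset_rate_per_hour <= 0:
        return 1.0  # no upsets: the system is always available
    mttf_hours = 1.0 / upset_rate_per_hour
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline upset rate, then reduced to a sixth and a ninth.
baseline = 0.01  # upsets per hour (illustrative value only)
for rate in (baseline, baseline / 6, baseline / 9):
    avail = steady_state_availability(rate, mttr_hours=1.0)
    print(f"rate={rate:.5f}/h  availability={avail:.5f}")
```

Under this simple model, lowering the upset rate raises availability, and the benefit grows as the mean time to repair gets longer.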
Dual fixed-point CORDIC processor: Architecture and FPGA implementation
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857166
Andres Jacoby, D. Llamocca
Abstract: We introduce Dual Fixed Point CORDIC, which provides a compromise between Fixed Point and Floating Point CORDIC hardware implementations. A fully parameterized hardware is presented that allows for extensive exploration of the resources-accuracy design space, from which we generate optimal (in the multi-objective sense) realizations. We compare Fixed Point, Dual Fixed Point, and Floating Point CORDIC units in terms of resources and accuracy. Results show the effectiveness of Dual Fixed Point for CORDIC implementation where the increase in resources is largely offset by the high accuracy improvements.
Citations: 3
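For readers unfamiliar with CORDIC, the sketch below shows the plain fixed-point, rotation-mode variant that the abstract above uses as one of its comparison points. It is not the Dual Fixed Point design from the paper. The function name, the 16 iterations and the 16 fractional bits are assumptions chosen for illustration.

```python
import math


def cordic_sin_cos(angle_rad, iterations=16, frac_bits=16):
    """Rotation-mode CORDIC using integer shift-and-add arithmetic.

    Returns (sin, cos) as floats. Intended for |angle_rad| <= pi/2,
    which lies inside the algorithm's convergence range.
    """
    scale = 1 << frac_bits
    # Table of arctan(2^-i), quantized to the fixed-point format.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # Pre-compensate the CORDIC gain so no final multiplication is needed.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(gain * scale)   # starts as the scaled cosine component
    y = 0                     # starts as the sine component
    z = round(angle_rad * scale)  # residual angle still to rotate by
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale


s, c = cordic_sin_cos(math.pi / 6)
print(f"sin={s:.4f} cos={c:.4f}")
```

Raising `frac_bits` and `iterations` improves accuracy at the cost of wider adders and more stages, which is the resources-accuracy trade-off that the paper explores.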
A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857144
P. Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, L. Raffo, L. Benini
Abstract: Convolutional Neural Networks (CNNs) are a nature-inspired model, extensively employed in a broad range of applications in computer vision, machine learning and pattern recognition. The CNN algorithm requires execution of multiple layers, commonly called convolution layers, that involve application of 2D convolution filters of different sizes over a set of input image features. Such a computation kernel is intrinsically parallel, and thus benefits significantly from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable to be implemented on mid- to high-range FPGA devices, that can be re-configured at runtime to adapt to different filter sizes in different convolution layers. We present an accelerator configuration, mapped on a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16 bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, consuming less than 10 W of power, reaching more than 97% DSP resource utilization at 150 MHz operating frequency and requiring only 16 B/cycle I/O bandwidth.
Citations: 20
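The throughput figures in the abstract above can be converted into multiply-accumulate operations per clock cycle, which gives a feel for how many MAC units must be busy in parallel. The short sketch below does that arithmetic using only the numbers quoted in the abstract (120 and 129 GMAC/s, 150 MHz, 16 B/cycle). The function name is hypothetical and the derived values are a back-of-the-envelope reading, not results reported by the authors.

```python
def macs_per_cycle(gmac_per_s, clock_mhz):
    """Average number of MAC operations completed per clock cycle."""
    return (gmac_per_s * 1e9) / (clock_mhz * 1e6)


CLOCK_MHZ = 150          # operating frequency quoted in the abstract
IO_BYTES_PER_CYCLE = 16  # I/O bandwidth quoted in the abstract

for label, gmacs in (("5x5 filters", 120), ("3x3 filters", 129)):
    per_cycle = macs_per_cycle(gmacs, CLOCK_MHZ)
    bytes_per_mac = IO_BYTES_PER_CYCLE / per_cycle
    print(f"{label}: {per_cycle:.0f} MAC/cycle, {bytes_per_mac:.3f} B of I/O per MAC")
```

The small number of I/O bytes per MAC indicates heavy on-chip data reuse, which is what lets the design sustain this rate on a modest external bandwidth.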