2009 International Conference on Reconfigurable Computing and FPGAs: Latest Papers

Matrix Multiplication Based on Scalable Macro-Pipelined FPGA Accelerator Architecture
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.30
Jiang Jiang, Vincent Mirian, Kam Pui Tang, P. Chow, Zuocheng Xing
Abstract: In this paper, we introduce a scalable macro-pipelined architecture (SMPA) for floating-point matrix multiplication, which aims to exploit temporal parallelism and architectural scalability. We demonstrate the functionality of the hardware design with 16 processing elements (PEs) on a Xilinx ML507 development board containing a Virtex-5 XC5VFX70T. A 32-PE design for matrix sizes ranging from 32×32 to 1024×1024 is also simulated. Our experiments show that we achieve 12.18 GFLOPS with 32 PEs, or about 1.90 GFLOPS per PE per GHz, which corresponds to over 95% PE usage. Moreover, the proposed SMPA can scale up to tens or hundreds of GFLOPS using multiple FPGA devices and a high-speed interconnect.
Citations: 20
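The throughput figures quoted in the abstract above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes each PE peaks at 2 floating-point operations per cycle (one multiply and one add), which the abstract does not state, and the clock frequency it derives is implied by the quoted numbers rather than reported by the authors.

```python
# Figures quoted in the abstract.
TOTAL_GFLOPS = 12.18            # sustained throughput with 32 PEs
NUM_PES = 32
GFLOPS_PER_PE_PER_GHZ = 1.90    # quoted as "about 1.90"

# Assumption (not in the abstract): one multiply plus one add per cycle per PE.
PEAK_FLOPS_PER_CYCLE = 2.0


def implied_clock_ghz(total_gflops, num_pes, gflops_per_pe_per_ghz):
    """Clock frequency that makes the total and per-PE figures consistent."""
    return total_gflops / (num_pes * gflops_per_pe_per_ghz)


def pe_utilization(gflops_per_pe_per_ghz, peak_flops_per_cycle):
    """Fraction of the assumed per-PE peak that is actually sustained."""
    return gflops_per_pe_per_ghz / peak_flops_per_cycle


clock_ghz = implied_clock_ghz(TOTAL_GFLOPS, NUM_PES, GFLOPS_PER_PE_PER_GHZ)
utilization = pe_utilization(GFLOPS_PER_PE_PER_GHZ, PEAK_FLOPS_PER_CYCLE)

print(f"implied clock: {clock_ghz * 1000:.0f} MHz")
print(f"PE utilization: {utilization:.1%}")
```

Under that assumption the quoted numbers are mutually consistent with a clock of roughly 200 MHz and a utilization of about 95%.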
A Scalable Architecture for Multivariate Polynomial Evaluation on FPGA
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.22
Mathieu Allard, P. Grogan, J. David
Abstract: Polynomial evaluation is currently used in multiple domains such as image processing, control systems and applied mathematics. Its high computation time and the need for embedded solutions make it a good target application for a hardware-oriented solution. This paper presents a new scalable architecture and its FPGA implementation, designed to exploit the high level of parallelism present in such applications. Illustrated by an example in the field of 3-D graphics computation, results show acceleration factors varying from 178 to 880 for orders ranging from 4 to 19, while the associated hardware cost scales linearly with polynomial order. Moreover, using parallel implementations of the architecture to evaluate multiple polynomials, acceleration factors as high as 30858 can be obtained compared to execution on a single processor.
Citations: 2
Runtime Memory Allocation in a Heterogeneous Reconfigurable Platform
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.38
V. Sima, K. Bertels
Abstract: In this paper, we present a runtime memory allocation algorithm that aims to substantially reduce the overhead caused by shared-memory accesses by allocating memory directly in the local scratch pad memories. We target a heterogeneous platform with a complex memory hierarchy. Using special instrumentation, we determine which memory areas are used in functions that could run on different processing elements, such as a reconfigurable logic array. Based on profile information, the programmer annotates some functions as candidates for accelerated execution. Then, an algorithm decides the best allocation, taking into account the various processing elements and special scratch pad memories of the heterogeneous platform. Tests are performed on our prototype platform, a Virtex ML410 running Linux, containing a PowerPC processor and a Xilinx FPGA and implementing the MOLEN programming paradigm. We test the algorithm using a state-of-the-art H.264 video encoder as well as synthetic applications. The performance improvement for the H.264 application is 14% compared to the software-only version, while the overhead is less than 1% of the application execution time. This improvement is the optimal improvement that can be obtained by optimizing the memory allocation. For the synthetic applications the results are within 5% of the optimum.
Citations: 6
A Framework for 2.5D NoC Exploration Using Homogeneous Networks over Heterogeneous Floorplans
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.14
V. D. Paulo, Cristinel Ababei
Abstract: In this paper, we propose a new 2.5D NoC architecture that uses a homogeneous network on one layer on top of a heterogeneous floorplanning layer. The purpose of this approach is to exploit the benefits of compact heterogeneous floorplans and regular mesh networks through an automated design space exploration procedure. A design methodology consisting of floorplanning and router assignment is implemented in a purpose-built tool that integrates a cycle-accurate NoC simulator, and is used to investigate the new architecture. We use this tool to compute the flit latency and compare it to a conventional 2D implementation. The separation of cores and network on two different layers offers additional area that we use to improve the network performance by searching for the optimal buffer size, number of virtual channels or mesh size. Experimental results are application specific, with potentially significant performance improvements for some test cases.
Citations: 8
Protecting the NOEKEON Cipher against SCARE Attacks in FPGAs by Using Dynamic Implementations
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.19
J. Bringer, H. Chabanne, J. Danger
Abstract: Protecting an implementation against Side Channel Analysis for Reverse Engineering (SCARE) attacks is a great challenge, and we address this challenge by presenting a first proof of concept. White-box cryptography has been developed to protect programs against an adversary who has full access to their software implementation. It has also been suggested as a countermeasure against side channel attacks, and we examine these techniques here in the wider perspective of SCARE. We consider that the adversary has access to the cryptographic device only through its side channels and that his goal is to recover the specifications of the algorithm. In this work, we focus on FPGA (Field-Programmable Gate Array) technologies and examine how to thwart SCARE attacks by implementing a block cipher following white-box techniques. The proposed principle is based on dynamically changing the implementations. It is illustrated by an example on the Noekeon cipher, and feasibility in different FPGAs is studied.
Citations: 3
A Traversal Cache Framework for FPGA Acceleration of Pointer Data Structures: A Case Study on Barnes-Hut N-body Simulation
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.68
J. Coole, J. Wernsing, G. Stitt
Abstract: Numerous studies have shown that field-programmable gate arrays (FPGAs) often achieve large speedups compared to microprocessors. However, one significant limitation of FPGAs that has prevented their use on important applications is the requirement for regular memory access patterns. Traversal caches were previously introduced to improve the performance of FPGA implementations of algorithms with irregular memory access patterns, especially those traversing pointer-based data structures. However, a significant limitation of previous traversal caches is that speedup was limited to traversals repeated frequently over time, thus preventing speedup for algorithms without repetition, even if the similarity between traversals was large. This paper presents a new framework that extends traversal caches to enable performance improvements in such cases and provides additional improvements through reduced memory accesses and parallel processing of multiple traversals. Most importantly, we show that, for algorithms with highly similar traversals, the traversal cache framework achieves approximately linear kernel speedup with additional area, thus eliminating the memory bandwidth bottleneck commonly associated with FPGAs. We evaluate the framework using a Barnes-Hut n-body simulation case study, showing application speedups ranging from 12x to 13.5x on a Virtex4 LX100, with projected speedups as high as 40x on today’s largest FPGAs.
Citations: 14
Efficient Technique for the FPGA Implementation of the AES MixColumns Transformation
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.52
S. Ghaznavi, C. Gebotys, R. Elbaz
Abstract: The Advanced Encryption Standard (AES) is commonly used to provide security services such as data confidentiality or authentication in embedded systems. However, designing efficient hardware architectures with small hardware resource usage and short critical path delay is a challenge. In this paper, a new technique for the FPGA implementation of the MixColumns transformation, an important part of AES, is introduced. The proposed MixColumns architecture, targeting 4-input LUTs on an FPGA, uses up to 23% fewer hardware resources than previous research. Overall, incorporating the proposed technique along with block memories for the SubBytes transformation in AES encryption reduces usage of hardware resources by up to 10% and 18% in terms of slices and LUTs, respectively. The improvement is obtained by more efficient resource sharing through expansion and rearrangement of the MixColumns equation with respect to the structure of FPGAs. This can be highly advantageous in FPGA implementations of block cipher modes using AES in many secure embedded systems.
Citations: 15
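For readers unfamiliar with the transformation this paper optimizes, the following is a minimal software sketch of MixColumns on a single 4-byte column. It uses the common textbook rearrangement that shares the XOR of all four bytes between the outputs; it is not the paper's LUT-level rearrangement, which the abstract does not spell out. The function names are illustrative.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:          # reduce when the product overflows 8 bits
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """Apply AES MixColumns to one column of four bytes.

    Each output is 2*a[i] ^ 3*a[i+1] ^ a[i+2] ^ a[i+3], rewritten as
    a[i] ^ t ^ xtime(a[i] ^ a[i+1]) so that t is computed once and shared.
    """
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3  # term shared by all four outputs
    return [
        a0 ^ t ^ xtime(a0 ^ a1),
        a1 ^ t ^ xtime(a1 ^ a2),
        a2 ^ t ^ xtime(a2 ^ a3),
        a3 ^ t ^ xtime(a3 ^ a0),
    ]


# A widely used MixColumns test column.
print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# → ['0x8e', '0x4d', '0xa1', '0xbc']
```

Sharing `t` is one simple form of the resource sharing the abstract refers to: the same XOR tree feeds every output byte instead of being recomputed four times.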
A Reconfigurable Architecture for Stereo-Assisted Detection of Point-Features for Robot Mapping
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/RECONFIG.2009.41
J. Kalomiros, J. Lygouras
Abstract: A hardware-friendly procedure is presented for the extraction of point-features from stereo image pairs for the purpose of real-time robot motion estimation and 3-D environmental mapping. The procedure is implemented in reconfigurable hardware and is developed as a set of custom HDL library components ready for integration in a system-on-a-programmable-chip. The main hardware stages are a stereo accelerator, a left and right image corner detector, and a stage performing a left-right consistency check. For the stereo-processor stage we have implemented and tested a SAD-based component for local area-matching and a global-matching component based on a Maximum-Likelihood dynamic programming technique. The system includes a Nios II processor for data control and a USB 2.0 interface for host communication. Resource usage and 3D-mapping results are reported for different versions of the reconfigurable system.
Citations: 4
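The SAD-based local area-matching mentioned in the abstract can be illustrated in software. The sketch below is a plain reference model of sum-of-absolute-differences block matching along one scanline, not the authors' HDL component; the window size, function names and synthetic image pair are all assumptions made for the example.

```python
def sad_cost(left, right, row, col, disp, half):
    """Sum of absolute differences between the window centred at (row, col)
    in the left image and the window shifted `disp` pixels leftwards in the
    right image. The window is (2*half + 1) pixels on each side."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col + dx]
                         - right[row + dy][col + dx - disp])
    return total


def best_disparity(left, right, row, col, max_disp, half=1):
    """Return (disparity, cost) with the lowest SAD cost for one pixel.
    The caller must keep the left-image window inside the image."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if col - half - d < 0:      # window would leave the right image
            break
        cost = sad_cost(left, right, row, col, d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Synthetic pair: the right image is the left image shifted by TRUE_DISP.
WIDTH, HEIGHT, TRUE_DISP = 12, 5, 2
left = [[c * c + r for c in range(WIDTH)] for r in range(HEIGHT)]
right = [[left[r][c + TRUE_DISP] if c + TRUE_DISP < WIDTH else 0
          for c in range(WIDTH)] for r in range(HEIGHT)]

print(best_disparity(left, right, row=2, col=6, max_disp=4))  # → (2, 0)
```

In hardware the candidate disparities would typically be evaluated in parallel rather than in a loop, which is what makes this kind of local matching attractive for FPGAs.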
FPGA Implementations of BCD Multipliers
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.28
G. Sutter, E. Todorovich, G. Bioul, M. Vazquez, J. Deschamps
Abstract: This paper presents a number of approaches to implementing decimal multiplication algorithms on Xilinx FPGAs. A variety of algorithms for basic one-by-one digit multiplication are proposed and FPGA implementations are presented. N-by-one digit and N-by-M digit multiplications are then studied. Time and area results for sequential and combinational implementations show better figures than previously published work. Comparisons against fully optimized binary multipliers emphasize the interest of the proposed design techniques.
Citations: 33
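The three levels of decimal multiplication named in the abstract (one-by-one digit, N-by-one digit, N-by-M digit) can be modelled in a few lines. This is a behavioural sketch only, assuming digits are stored least-significant first; it says nothing about the specific circuits the paper proposes, and all function names are illustrative.

```python
def bcd_digit_mul(a, b):
    """Multiply two decimal digits; return (tens, units) as two digits."""
    if not (0 <= a <= 9 and 0 <= b <= 9):
        raise ValueError("operands must be single decimal digits")
    return divmod(a * b, 10)        # product is at most 81


def bcd_n_by_1(digits, d):
    """Multiply an N-digit number (least-significant digit first) by one
    digit, returning N + 1 digits."""
    result, carry = [], 0
    for x in digits:
        tens, units = bcd_digit_mul(x, d)
        s = units + carry           # at most 9 + 8 = 17
        result.append(s % 10)
        carry = tens + s // 10      # stays within a single digit (<= 8)
    result.append(carry)
    return result


def bcd_n_by_m(a_digits, b_digits):
    """Multiply two digit lists by accumulating shifted N-by-1 partial
    products with decimal additions. Returns N + M digits."""
    acc = [0] * (len(a_digits) + len(b_digits))
    for shift, d in enumerate(b_digits):
        carry = 0
        for i, p in enumerate(bcd_n_by_1(a_digits, d)):
            s = acc[shift + i] + p + carry
            acc[shift + i] = s % 10
            carry = s // 10
        # No carry remains here: the running sum fits in the digits touched.
    return acc


def to_digits(n):
    """Non-negative integer to digit list, least-significant first."""
    return [int(ch) for ch in reversed(str(n))]


def from_digits(digits):
    """Digit list (least-significant first) back to an integer."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


print(from_digits(bcd_n_by_m(to_digits(1234), to_digits(567))))  # → 699678
```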
Reconfigurable Hardware Implementation of Arithmetic Modulo Minimal Redundancy Cyclotomic Primes for ECC
2009 International Conference on Reconfigurable Computing and FPGAs. Pub Date: 2009-12-09. DOI: 10.1109/ReConFig.2009.67
Brian Baldwin, W. Marnane, R. Granger
Abstract: The dominant cost in Elliptic Curve Cryptography (ECC) over prime fields is modular multiplication. Minimal Redundancy Cyclotomic Primes (MRCPs) were recently introduced by Granger et al. for use as base field moduli in ECC, since they permit a novel and very efficient modular multiplication algorithm. Here we consider a reconfigurable hardware implementation of arithmetic modulo a 258-bit example, for use at the 128-bit AES security level. We examine this implementation for speed and area using parallelisation methods and inbuilt FPGA resources. The results are compared against a method in current use, the Montgomery multiplier.
Citations: 5
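The abstract uses the Montgomery multiplier as its baseline. For context, here is a minimal software sketch of Montgomery multiplication with a power-of-two radix. It shows the baseline technique only, not the MRCP algorithm; the small modulus and operands are arbitrary illustrative values rather than the paper's 258-bit prime. It relies on `pow(n, -1, r)`, which requires Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """For odd modulus n and R = 2**bits > n, return n' with n*n' = -1 mod R."""
    r = 1 << bits
    return (-pow(n, -1, r)) % r


def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < R*n."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> bits             # exact division by R
    return u - n if u >= n else u


def mont_mul(a_bar, b_bar, n, bits, n_prime):
    """Multiply two values already in Montgomery form (x_bar = x*R mod n)."""
    return redc(a_bar * b_bar, n, bits, n_prime)


# Illustrative parameters (hypothetical, chosen only for the example).
N, BITS = 97, 8
R = 1 << BITS
N_PRIME = montgomery_setup(N, BITS)

a, b = 42, 77
a_bar, b_bar = (a * R) % N, (b * R) % N           # convert to Montgomery form
prod_bar = mont_mul(a_bar, b_bar, N, BITS, N_PRIME)
product = redc(prod_bar, N, BITS, N_PRIME)        # convert back

print(product == (a * b) % N)  # → True
```

The reduction replaces division by the modulus with masks and shifts, which is why it maps well to hardware and serves as a natural point of comparison.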