2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Publications

Binding Hardware IPs to Specific FPGA Device via Inter-twining the PUF Response with the FSM of Sequential Circuits
Jiliang Zhang, Yaping Lin, Yongqiang Lyu, R. Cheung, Wenjie Che, Qiang Zhou, Jinian Bian
DOI: 10.1109/FCCM.2013.12 (https://doi.org/10.1109/FCCM.2013.12)
Published: 2013-04-28
Abstract: The continuous growth in both capability and capacity for FPGA now requires significant resources invested in the hardware design, which results in two classes of main security issues: 1) the unauthorized use and piracy attacks including cloning, reverse engineering, tampering etc. 2) the licensing issue. Binding hardware IPs (HW-IPs) to specific FPGA devices can efficiently resolve these problems. However, previous binding techniques are all based on encryption and hence have three main drawbacks: 1) commercial encryption-based proposals are limited to protecting a single large FPGA configuration, 2) many encryption-based proposals depend on a trusted third party to take part in the licensing protocol, and 3) the encryption-based binding methods use costly mechanisms such as secure ROM or flash memory to store FPGA-specific cryptographic keys, which is not only expensive but also vulnerable to side-channel attacks, and the management and transport of secret keys becomes a practical issue. In this work, we propose a PUF-FSM binding technique completely different from the traditional encryption-based methods to address these shortcomings.
Citations: 6
Parallel Generation of Gaussian Random Numbers Using the Table-Hadamard Transform
David B. Thomas
DOI: 10.1109/FCCM.2013.53 (https://doi.org/10.1109/FCCM.2013.53)
Published: 2013-04-28
Abstract: Gaussian Random Number Generators (GRNGs) are an important component in parallel Monte-Carlo simulations using FPGAs, where tens or hundreds of high-quality Gaussian samples must be generated per cycle using very few logic resources. This paper describes the Table-Hadamard generator, which is a GRNG designed to generate multiple streams of random numbers in parallel. It uses discrete table distributions to generate pseudo-Gaussian base samples, then a parallel Hadamard transform to efficiently apply the central limit theorem. When generating 64 output samples the Table-Hadamard generator requires just 100 slices per generated sample, a quarter the resources of the next best technique, while providing higher statistical quality.
Citations: 8
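The two-stage idea in this abstract (table-based base samples, then a Hadamard transform that sums them with different sign patterns) can be illustrated in software. The following is a minimal sketch under stated assumptions, not the paper's design: `BASE_TABLE`, `fwht`, and `table_hadamard_batch` are hypothetical names, the ten-entry table is an arbitrary zero-mean example, and nothing here models the FPGA implementation or its resource usage.

```python
import math
import random

# Hypothetical base table: a small symmetric, zero-mean set of values standing
# in for the "discrete table distribution" mentioned in the abstract.
BASE_TABLE = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
TABLE_VARIANCE = sum(v * v for v in BASE_TABLE) / len(BASE_TABLE)


def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def table_hadamard_batch(rng, n=64):
    """Return n approximately Gaussian samples with unit variance.

    Each output is a signed sum of all n table samples, so the central limit
    theorem pushes its distribution towards a Gaussian.
    """
    base = [rng.choice(BASE_TABLE) for _ in range(n)]
    scale = 1.0 / math.sqrt(n * TABLE_VARIANCE)
    return [x * scale for x in fwht(base)]


if __name__ == "__main__":
    batch = table_hadamard_batch(random.Random(0))
    print(len(batch), min(batch), max(batch))
```

Because every output row of the Hadamard matrix uses all n inputs, one batch of n table lookups yields n outputs, which is the sense in which the transform applies the central limit theorem cheaply. The outputs within one batch are uncorrelated but not independent in this toy version.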
A Case for Heterogeneous Technology-Mapping: Soft Versus Hard Multiplexers
M. Purnaprajna, P. Ienne
DOI: 10.1109/FCCM.2013.19 (https://doi.org/10.1109/FCCM.2013.19)
Published: 2013-04-28
Abstract: Lookup table-based FPGAs offer flexibility but compromise on performance, as compared to custom CMOS implementations. This paper explores the idea of minimising this performance gap by using fixed, fine-grained, nonprogrammable logic structures in place of lookup tables (LUTs). Functions previously mapped onto LUTs can now be diverted to these structures, resulting in reduced LUT usage and higher operating speed. This paper presents a generic heterogeneous technology-mapping scheme for segregating LUTs and hard logic blocks. For the proof-of-concept, we choose to isolate multiplexers present in most general-purpose circuits. These multiplexers are mapped onto hard blocks of multiplexers that are present in existing commercial FPGA fabrics, but often unused. Since the hard multiplexers are already present, there is no additional performance or area penalty. Using this approach, an average reduction in LUT usage of 16% and an average speedup of 8% have been observed for the VTR benchmarks as compared to the LUTs-only implementation.
Citations: 6
ShrinkWrap: Compiler-Enabled Optimization and Customization of Soft Memory Interconnects
Eric S. Chung, Michael Papamichael
DOI: 10.1109/FCCM.2013.56 (https://doi.org/10.1109/FCCM.2013.56)
Published: 2013-04-28
Abstract: Today's FPGAs lack dedicated on-chip memory interconnects, requiring users to (1) rely on inefficient, general-purpose solutions, or (2) tediously create an application-specific memory interconnect for each target platform. The CoRAM architecture, which offers a general-purpose abstraction for FPGA memory management, encodes high-level application information that can be exploited to generate customized soft memory interconnects. This paper describes the ShrinkWrap Compiler, which analyzes a CoRAM application for its connectivity and bandwidth requirements, enabling synthesis of highly-tuned area-efficient soft memory interconnects.
Citations: 5
Elementary Function Implementation with Optimized Sub Range Polynomial Evaluation
M. Langhammer, B. Pasca
DOI: 10.1109/FCCM.2013.30 (https://doi.org/10.1109/FCCM.2013.30)
Published: 2013-04-28
Abstract: Efficient elementary function implementations require primitives optimized for modern FPGAs. Fixed-point function generators are one such type of primitive. When built around piecewise polynomial approximations they make use of memory blocks and embedded multipliers, mapping well to contemporary FPGAs. Another type of primitive which can exploit the power series expansions of some elementary functions is floating-point polynomial evaluation. The high costs traditionally associated with floating-point arithmetic made this primitive unattractive for elementary function implementation on FPGAs. In this work we present a novel and efficient way of implementing floating-point polynomial evaluators on a restricted input range. We show on the atan(x) function in double precision that this very different technique reduces memory block count by up to 50% while only slightly increasing DSP count compared to the best implementation built around polynomial approximation fixed-point primitives.
Citations: 1
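The piecewise polynomial approach that this abstract uses as its baseline (a coefficient table indexed by the leading bits of the input, followed by a short polynomial evaluation) can be sketched in software as below. This is an illustration under stated assumptions rather than the authors' implementation: `SEGMENTS`, `TABLE`, and `atan_piecewise` are hypothetical names, the coefficients are plain degree-2 Taylor coefficients around each segment midpoint, and ordinary Python floats stand in for the fixed-point or floating-point datapaths discussed in the paper.

```python
import math

SEGMENTS = 64  # hypothetical table depth; a real design would tune this


def _taylor_coeffs(m):
    """Degree-2 Taylor coefficients of atan around the point m."""
    d = 1.0 + m * m
    # atan(m), atan'(m) = 1/(1+m^2), atan''(m)/2 = -m/(1+m^2)^2
    return (math.atan(m), 1.0 / d, -m / (d * d))


# One coefficient triple per segment of [0, 1): this plays the role of the
# memory block, while the multiplications below play the role of the DSPs.
TABLE = [_taylor_coeffs((i + 0.5) / SEGMENTS) for i in range(SEGMENTS)]


def atan_piecewise(x):
    """Approximate atan(x) for 0 <= x < 1 by table lookup plus Horner's rule."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    index = int(x * SEGMENTS)               # which segment x falls in
    delta = x - (index + 0.5) / SEGMENTS    # offset from the segment midpoint
    c0, c1, c2 = TABLE[index]
    return c0 + delta * (c1 + delta * c2)   # Horner evaluation


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.9):
        print(x, atan_piecewise(x), math.atan(x))
```

The trade-off the abstract describes is visible here: more segments mean a larger table but a lower polynomial degree for the same accuracy, while evaluating a longer polynomial on a restricted range shifts the cost from memory to multipliers.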
A Soft Coarse-Grained Reconfigurable Array Based High-level Synthesis Methodology: Promoting Design Productivity and Exploring Extreme FPGA Frequency
Cheng Liu, C. Y. Lin, Hayden Kwok-Hay So
DOI: 10.1109/FCCM.2013.21 (https://doi.org/10.1109/FCCM.2013.21)
Published: 2013-04-28
Abstract: Compared to the use of a typical software development flow, the productivity of developing FPGA-based compute applications remains much lower. Although the use of high-level synthesis (HLS) tools may partly alleviate this shortcoming, the lengthy low-level FPGA implementation process remains a major obstacle to high productivity computing, limiting the number of compile-debug-edit cycles per day. Furthermore, high-level application developers often lack the intimate hardware engineering experience that is needed to achieve high performance on FPGAs, therefore undermining their usefulness as accelerators. To address the productivity and performance problems, an HLS methodology that utilizes soft coarse-grained reconfigurable arrays (SCGRAs) as an intermediate compilation step is presented. Instead of compiling high-level applications directly to circuits, the compilation process is reduced to an operation scheduling task targeting the SCGRA.
Citations: 14
Boosting Memory Performance of Many-Core FPGA Device through Dynamic Precedence Graph
Yunru Bai, Abigail Fuentes-Rivera, Mike Riera, Mohammed Alawad, Mingjie Lin
DOI: 10.1109/FCCM.2013.39 (https://doi.org/10.1109/FCCM.2013.39)
Published: 2013-04-28
Abstract: Emerging FPGA devices, integrated with abundant RAM blocks and high-performance processor cores, offer an unprecedented opportunity to effectively implement single-chip distributed logic-memory (DLM) architectures [1]. Being “memory-centric”, the DLM architecture can significantly improve the overall performance and energy efficiency of many memory-intensive embedded applications, especially those that exhibit irregular array data access patterns at algorithmic level. However, implementing DLM architecture poses unique challenges to an FPGA designer in terms of 1) organizing and partitioning diverse on-chip memory resources, and 2) orchestrating effective data transmission between on-chip and off-chip memory. In this paper, we offer our solutions to both of these challenges. Specifically, 1) we propose a stochastic memory partitioning scheme based on the well-known simulated annealing algorithm. It obtains memory partitioning solutions that promote parallelized memory accesses by exploring a large solution space; 2) we augment the proposed DLM architecture with a reconfigurable hardware graph that can dynamically compute precedence relationships between memory partitions, thus effectively exploiting algorithmic level memory parallelism on a per-application basis. We evaluate the effectiveness of our approach (A3) against two other DLM architecture synthesizing methods: an algorithmic-centric reconfigurable computing architecture with a single monolithic memory (A1) and the heterogeneous distributed architectures synthesized according to [1] (A2). To make our comparison fair, in all three architectures, the data path remains the same while local memory architecture differs. For each of ten benchmark applications from SPEC2006 and MiBench [2], we break down the performance benefit of using A3 into two parts: the portion due to stochastic local memory partitioning and the portion due to the dynamic graph-based memory arbitration. All experiments have been conducted with a Virtex-5 (XCV5LX155T-2) FPGA. On average, our experimental results show that our proposed A3 architecture outperforms A2 and A1 by 34% and 250%, respectively. Within the performance improvement of A3 over A2, more than 70% of the improvement comes from the hardware graph-based memory scheduling.
Citations: 2
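The abstract names simulated annealing as the basis of its stochastic memory partitioning but does not give the cost function or move set. The sketch below shows the general technique on a toy version of the problem, with every specific being an assumption: arrays are assigned to memory banks, `conflicts` is a hypothetical map from array pairs to how often they are accessed together, and the cost is the total weight of pairs that end up sharing a bank.

```python
import math
import random


def conflict_cost(assignment, conflicts):
    """Total weight of array pairs that are placed in the same bank."""
    return sum(w for (a, b), w in conflicts.items()
               if assignment[a] == assignment[b])


def anneal_partition(arrays, banks, conflicts, rng,
                     steps=2000, t0=2.0, cooling=0.995):
    """Simulated annealing over array-to-bank assignments (toy model)."""
    current = {a: rng.randrange(banks) for a in arrays}
    cur_cost = conflict_cost(current, conflicts)
    best, best_cost = dict(current), cur_cost
    temperature = t0
    for _ in range(steps):
        # Propose moving one randomly chosen array to a random bank.
        array = rng.choice(arrays)
        old_bank = current[array]
        current[array] = rng.randrange(banks)
        new_cost = conflict_cost(current, conflicts)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = dict(current), cur_cost
        else:
            current[array] = old_bank  # reject: undo the move
        temperature = max(temperature * cooling, 1e-9)
    return best, best_cost


if __name__ == "__main__":
    arrays = ["A", "B", "C", "D"]
    conflicts = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1}
    print(anneal_partition(arrays, 2, conflicts, random.Random(1)))
```

Accepting occasional cost-increasing moves early on is what lets the search escape poor local assignments, which is the usual motivation for annealing on a large partitioning space. A realistic cost model would also need bank capacities and port counts, which this toy omits.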
Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines
F. Pratas, D. Oriato, O. Pell, R. Mata, L. Sousa
DOI: 10.1109/FCCM.2013.34 (https://doi.org/10.1109/FCCM.2013.34)
Published: 2013-04-28
Abstract: In Molecular Mechanics simulations, the treatment of electrostatics is the most computationally intensive task. Modern force fields, such as AMOEBA, which include explicit polarization effects, are particularly computationally demanding. We propose a static dataflow architecture for accelerating polarizable force fields. Results, obtained with Maxeler's MaxCompiler, show a speed-up factor of about 14x on a Maxeler 1U MaxNode, when compared to a 12-core CPU node while using half of the dataflow engine capacity. Projections for a full chip implementation indicate that speed-up results of up to 29x per node can be reached. Moreover, our implementation on the Maxeler system shows improvements between 2.5x and 4x compared to NVIDIA Fermi-based GPUs. The current work shows the potential of dataflow engines in accelerating this field of applications.
Citations: 12
Atacama: An Open FPGA-Based Platform for Mixed-Criticality Communication in Multi-segmented Ethernet Networks
G. Carvajal, M. Figueroa, R. Trausmuth, S. Fischmeister
DOI: 10.1109/FCCM.2013.54 (https://doi.org/10.1109/FCCM.2013.54)
Published: 2013-04-28
Abstract: Ethernet is widely recognized as an attractive networking technology for modern distributed real-time systems. However, standard Ethernet components require specific modifications and hardware support to provide strict latency guarantees necessary for safety-critical applications. Although this is a well-stated fact, the design of hardware components for real-time communication remains mostly unexplored. This becomes evident from the few solutions reporting prototypes and experimental validation, which hinders the consolidation of Ethernet in real-world distributed applications. This paper presents Atacama, the first open-source framework based on reconfigurable hardware for mixed-criticality communication in multi-segmented Ethernet networks. Atacama uses specialized modules for time-triggered communication of real-time data, which seamlessly integrate with a standard infrastructure using regular best-effort traffic. Atacama enables low and highly predictable communication latency on multi-segmented 1 Gbps networks, easy optimization of devices for specific application scenarios, and rapid prototyping of new protocol characteristics. Researchers can use the open-source design to verify our results and build upon the framework, which aims to accelerate the development, validation, and adoption of Ethernet-based solutions in real-time applications.
Citations: 16
PRML: A Modeling Language for Rapid Design Exploration of Partially Reconfigurable FPGAs
Rohit Kumar, A. Gordon-Ross
DOI: 10.1109/FCCM.2013.24 (https://doi.org/10.1109/FCCM.2013.24)
Published: 2013-04-28
Abstract: Leveraging partial reconfiguration (PR) can improve system flexibility, cost, and performance/power/area tradeoffs over non-PR functionally-equivalent systems; however, realizing these benefits is challenging and time-consuming, and PR must be considered early during application design to reduce design exploration time and improve system quality. To facilitate realizing these benefits, we present an application design framework and an abstract modeling language for PR (PRML). By applying extensive PRML modeling guidelines to a complex arithmetic core, we show PRML's potential for efficient PR capability analysis, enabling designers to determine Pareto optimal systems during application formulation based on designer-specified area and performance metrics.
Citations: 4