2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines: Latest Papers (Page 4)

Bus-based MPSoC Security through Communication Protection: A Latency-efficient Alternative

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/fccm.2012.42

Pascal Cotret, J. Crenne, G. Gogniat, J. Diguet

Abstract: Security in MPSoCs has been gaining increasing attention for several years. Digital convergence is one of the many reasons for this focus on embedded systems, as much sensitive and secret data is now stored, manipulated, and exchanged in these systems. Most solutions are currently built at the software level; we believe hardware enhancements also play a major role in system protection. One strategic point is the communication layer, as all data goes through it. Monitoring and controlling communications makes it possible to fend off attacks before the system is corrupted. In this work, we propose an efficient solution with several hardware enhancements to secure data exchanges in a bus-based MPSoC. Our approach relies on low-complexity distributed firewalls connected to all critical IPs of the system. Designers can deploy different security policies (access rights, data format, authentication, confidentiality) to protect the system in a flexible way. To illustrate the benefit of such a solution, implementations are discussed for different MPSoCs implemented on Xilinx Virtex-6 FPGAs. Results demonstrate a reduction of up to 33% in latency overhead compared to existing efforts.

Citations: 19

EBRAM - Extending the BlockRAMs in FPGAs to Support Caches and Hash Tables in an Efficient Manner

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.52

A. Ehliar

Citations: 0

Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.39

Wen Tang, Wendi Wang, Bo Duan, Chunming Zhang, Guangming Tan, Peiheng Zhang, Ninghui Sun

Citations: 49

FLEXDET: Flexible, Efficient Multi-Mode MIMO Detection Using Reconfigurable ASIP

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.22

Xiaolin Chen, Andreas Minwegen, Yahia Hassan, D. Kammler, Shuai Li, T. Kempf, A. Chattopadhyay, G. Ascheid

Abstract: This paper describes the implementation of a multi-mode MIMO detector based on the concept of a partially reconfigurable ASIP (rASIP). The multi-mode detector supports three different detection algorithms: Maximum Ratio Combining, linear Minimum Mean Square Error (MMSE) detection, and MMSE Successive Interference Cancellation. The detection algorithms also support different antenna configurations and modulation schemes. The rASIP is based on a Coarse-Grained Reconfigurable Architecture (CGRA), which is designed for efficient architectural support of matrix operations. A matrix inversion algorithm, which is used for the preprocessing of the different detection algorithms, is mapped onto the CGRA. By integrating a processor with the CGRA, the variations in the control path of different algorithm configurations can be handled efficiently. To the best of our knowledge, we show for the first time that CGRA-based multi-mode MIMO detection is extremely efficient and matches the performance of a dedicated ASIC implementation.

Citations: 27

An Extensible and Portable Tool Suite for Managing Multi-Node FPGA Systems

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.29

Y. Rajasekhar, Rahul R. Sharma, R. Sass

Citations: 1

On-the-fly Composition of FPGA-Based SQL Query Accelerators Using a Partially Reconfigurable Module Library

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.18

C. Dennl, Daniel Ziener, J. Teich

Abstract: In this paper, we introduce a novel FPGA-based methodology for accelerating SQL queries using dynamic partial reconfiguration. Query acceleration is of utmost importance in large database systems to achieve very high throughput. Although common FPGA-based accelerators are suitable for achieving such high throughput, their design is hard to extend to new operations. Using dynamic partial reconfiguration, we are able to build more flexible architectures which can be extended to new operations or SQL constructs with very low area overhead on the FPGA. Furthermore, the reconfiguration of a few FPGA frames can be used to switch very quickly from one query to the next. In our approach, an SQL query is transformed into a hardware pipeline consisting of partially reconfigurable modules. The (FPGA) data path is assembled at run-time using a static system that provides the stream-based communication interfaces to the partial modules and the database management system. More specifically, each incoming SQL query is analyzed and divided into single operations, which are subsequently mapped onto library modules, and the composed data path is loaded onto the FPGA. We show that our approach is able to achieve substantially higher throughput compared to a software-only solution.
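The idea of dividing a query into single operations and chaining library modules into one data path can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' FPGA implementation: the module library `LIBRARY`, the operation names `where_gt` and `select`, and the sample table are all hypothetical, and Python generators stand in for the stream-based interfaces between modules.

```python
# Hypothetical module library: each entry builds a streaming operator.
def make_filter(column, threshold):
    """Return a module that passes rows whose column exceeds threshold."""
    def module(rows):
        for row in rows:
            if row[column] > threshold:
                yield row
    return module

def make_project(columns):
    """Return a module that keeps only the listed columns of each row."""
    def module(rows):
        for row in rows:
            yield {c: row[c] for c in columns}
    return module

LIBRARY = {"where_gt": make_filter, "select": make_project}

def compose(operations):
    """Chain library modules into one data path, in the given order."""
    modules = [LIBRARY[name](*args) for name, args in operations]
    def pipeline(rows):
        stream = iter(rows)
        for module in modules:
            stream = module(stream)  # output of one module feeds the next
        return stream
    return pipeline

# SELECT id FROM t WHERE price > 10, already split into single operations.
query = [("where_gt", ("price", 10)), ("select", (["id"],))]
table = [{"id": 1, "price": 5}, {"id": 2, "price": 20}, {"id": 3, "price": 11}]
result = list(compose(query)(table))
```

Here `result` holds the rows with `id` 2 and 3. Supporting a new operation only requires adding one entry to the library, which mirrors the extensibility argument made in the abstract.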

Citations: 81

Implementing Murf: Accelerating Large State Space Exploration on FPGAs

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.53

Ma Tie, M. Leeser

Abstract: PHAST, a Pipelined Hardware Accelerated STate Checker, achieves a 30x end-to-end speedup of a large state space exploration application in the form of an explicit state model checker. PHAST is a re-implementation, to accommodate FPGA hardware, of the Murphi verifier developed at Stanford University. Explicit state model checking explores a large state space and checks properties defined by the user. The FPGA infrastructure for PHAST can be reused for many different models and properties. Our model of the DASH protocol is similar in size and complexity to models Intel uses to validate proposed features of future processors: state sizes between 1200 and 1800 bits and a transition relation with more than 100 rules. Analysis of the DASH model as verified by PHAST indicates that the speedup will stay constant independent of the model being explored. The current implementation of PHAST, implemented on an Alpha-Data board with a Xilinx Virtex 5 and 1 GB of SDRAM, has the ability to explore up to 300,000 states in the DASH model. This model, with close to one hundred thousand states and 220 rules, takes up less than forty percent of the Virtex chip and less than thirty percent of the block RAMs. PHAST takes advantage of the flexible memory architecture and inherent concurrency provided by an FPGA to explore large state spaces. With access to more memory, PHAST could explore a much larger state space. This paper focuses on the generic structure developed for a hardware implementation of model checking as an example of accelerating large state space exploration. The main contributions of this paper lie in the hardware implementation specifics, including pipelining state generation to generate a new state every cycle and check that invariants, or safety properties, hold for all states, as well as efficiently implementing hash compaction and hash table lookups with a CAM for duplicate detection and collision handling. The current implementation of PHAST uses a CAM to improve the generated number of states by over 20,000. Large state space exploration is an application area particularly well suited to FPGA acceleration. State space exploration applications developed on GPUs have good results or small states, but none have managed to exhibit both characteristics.
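Hash compaction, as mentioned in the abstract, can be illustrated in software. The following is a minimal Python sketch, not the PHAST hardware design: it records a short signature per visited state instead of the full state, so duplicate detection needs little memory at the cost of a small chance that two distinct states share a signature. The `signature` helper, the 64-bit width, and the toy two-counter model are assumptions made for illustration.

```python
import hashlib
from collections import deque

def signature(state, bits=64):
    """Compress a state to a fixed-width integer signature (hypothetical helper)."""
    digest = hashlib.sha256(repr(state).encode()).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)

def explore(initial, successors, invariant, bits=64):
    """Breadth-first exploration that stores only signatures of visited states."""
    seen = {signature(initial, bits)}
    queue = deque([initial])
    explored = 0
    while queue:
        state = queue.popleft()
        explored += 1
        if not invariant(state):
            return explored, state  # invariant violated: report the state
        for nxt in successors(state):
            sig = signature(nxt, bits)
            if sig not in seen:  # duplicate detection on signatures only
                seen.add(sig)
                queue.append(nxt)
    return explored, None

# Toy model (assumption): two counters modulo 4; each rule increments one.
def toy_successors(state):
    a, b = state
    return [((a + 1) % 4, b), (a, (b + 1) % 4)]

count, bad = explore((0, 0), toy_successors, lambda s: s[0] < 4)
_, violation = explore((0, 0), toy_successors, lambda s: s != (2, 2))
```

The first call visits every reachable state of the toy model without finding a violation; the second stops at the state that breaks the invariant. A real checker would also need a policy for signature collisions, which the abstract says PHAST handles with a CAM.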

Citations: 0

Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.13

Jongsok Choi, Kevin Nam, Andrew Canis, J. Anderson, S. Brown, Tomasz S. Czajkowski

Abstract: We describe new multi-ported cache designs suitable for use in FPGA-based processor/parallel-accelerator systems, and evaluate their impact on application performance and area. The baseline system comprises a MIPS soft processor and custom hardware accelerators with a shared memory architecture: an on-FPGA L1 cache backed by off-chip DDR2 SDRAM. Within this general system model, we evaluate traditional cache design parameters (cache size, line size, associativity). In the parallel accelerator context, we examine the impact of the cache design and its interface. Specifically, we look at how the number of cache ports affects performance when multiple hardware accelerators operate (and access memory) in parallel, and evaluate two different hardware implementations of multi-ported caches using: 1) multi-pumping, and 2) a recently published approach based on the concept of a live-value table. Results show that application performance depends strongly on the cache interface and architecture: for a system with 6 accelerators, depending on the cache design, speedup ranges from 0.73× to 6.14×, on average, relative to a baseline sequential system (with a single accelerator and a direct-mapped, 2KB cache with 32B lines). Considering both performance and area, the best architecture is found to be a 4-port multi-pump direct-mapped cache with a 16KB cache size and a 128B line size.
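The two direct-mapped configurations named in the abstract (2KB with 32B lines, and 16KB with 128B lines) imply different address splits. The following Python sketch works through that arithmetic; the function name `direct_mapped_fields` and the sample address are hypothetical, and the code assumes power-of-two sizes and byte addressing, which the abstract does not state explicitly.

```python
def direct_mapped_fields(address, cache_bytes, line_bytes):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache.

    Assumes cache_bytes and line_bytes are powers of two.
    """
    num_lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = num_lines.bit_length() - 1    # log2(num_lines)
    offset = address & (line_bytes - 1)
    index = (address >> offset_bits) & (num_lines - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset

address = 0x12345  # arbitrary sample address (assumption)
best = direct_mapped_fields(address, 16 * 1024, 128)  # 128 lines, 7 offset bits
baseline = direct_mapped_fields(address, 2 * 1024, 32)  # 64 lines, 5 offset bits
```

For this address, the 16KB/128B cache yields tag 4, index 70, offset 69, while the 2KB/32B baseline yields tag 36, index 26, offset 5. The larger line size moves more address bits into the offset, so each miss fetches a longer block from off-chip memory.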

Citations: 60

Online Measurement of Timing in Circuits: For Health Monitoring and Dynamic Voltage & Frequency Scaling

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.27

Joshua M. Levine, Edward A. Stott, G. Constantinides, P. Cheung

Abstract: Reliability, power consumption and timing performance are key considerations for the utilisation of field-programmable gate arrays. Online measurement techniques can determine the timing characteristics of an FPGA application while it is operating, and facilitate a range of benefits. Degradation can be monitored by tracking changes in timing performance, while power consumption can be reduced through dynamic voltage scaling (DVS) of the power supply to exploit any spare timing headroom. If higher performance is the objective, dynamic frequency scaling (DFS) can be used to maximise operating frequency. In both cases, online timing measurement of the application circuit is used to exploit favourable operating conditions. This work demonstrates a method of online measurement, achieved by sweeping the phase of a secondary clock signal that drives additional shadow registers strategically added to the application design. The measurement technique and initial voltage and frequency scaling experiments are demonstrated on an Altera Cyclone III FPGA. Timing performance can be measured with a best-case resolution of 96ps. The additional circuitry results in minimal overhead in terms of area and performance. In an example circuit, DVS achieves power savings of 23% dynamic and 13% static compared with operating at nominal core voltage, and DFS achieves performance improvements of 21% compared with the timing model FMax.

Citations: 37

Multi-Resolution Real-Time Dense Stereo Vision Processing in FPGA

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.15

Eduardo Gudis, G. V. D. Wal, S. Kuthirummal, S. Chai

Citations: 10