FPGA. ACM International Symposium on Field-Programmable Gate Arrays: Latest Publications (Page 8)

Efficient system-level mapping from streaming applications to FPGAs (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435342

J. Cong, Muhuan Huang, Peng Zhang

Abstract: Streaming processing is an important computation model that represents many applications in various domains such as video processing, signal processing and wireless communication. FPGA is a natural platform for streaming applications because the task-level pipelined parallelism can be efficiently implemented on FPGA by its customizable communication and memory architecture. In this paper we propose an efficient design space exploration algorithm to map kernels of streaming applications onto FPGAs. We aim at finding the most area-efficient selections of hardware modules from the implementation library while satisfying the system performance requirement. In particular, we consider both module selection and replication techniques. Design metrics are formulated in our high-level model based on these two techniques. In addition, we extend the analytic formulations in previous work by supporting complex stream graph structures like feedback loops. The proposed iterative exploration algorithm is based on the system of difference constraints (SDC) and thus can be solved in polynomial time. Compared to previous mainstream ILP-based solutions, our proposed algorithm is scalable and practical in large systems. Both the ILP formulation and our proposed iterative exploration mechanism are applied to a set of streaming applications from StreamIt benchmarks and also to one real example, an MPEG-4 decoder. Experiments demonstrate that our design space exploration algorithm can efficiently find a feasible solution with an average 5.7% area overhead.

Pages: 277
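
The abstract notes that a system of difference constraints (SDC) can be solved in polynomial time. The sketch below illustrates that general fact only; it is not the authors' exploration algorithm. It assumes each constraint has the form x[j] - x[i] <= c and checks feasibility with Bellman-Ford relaxation. The function name and the tuple layout are hypothetical choices made for this illustration.

```python
from typing import List, Optional, Tuple


def solve_difference_constraints(
    num_vars: int, constraints: List[Tuple[int, int, int]]
) -> Optional[List[int]]:
    """Solve constraints of the form x[j] - x[i] <= c.

    Each constraint is a tuple (j, i, c). Returns one satisfying
    assignment, or None if the system is infeasible.
    """
    # Starting every value at 0 is equivalent to adding a virtual source
    # with a zero-weight edge to each variable.
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for j, i, c in constraints:
            # Relax the edge i -> j with weight c.
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            return dist  # converged: all constraints hold
    # Still changing after num_vars passes: a negative cycle exists,
    # so the constraints contradict each other.
    return None


# x0 - x1 <= 3 and x1 - x0 <= -2 can both hold.
print(solve_difference_constraints(2, [(0, 1, 3), (1, 0, -2)]))
# x0 - x1 <= 1 and x1 - x0 <= -2 cannot (they sum to 0 <= -1).
print(solve_difference_constraints(2, [(0, 1, 1), (1, 0, -2)]))
```

The running time is O(V * E) for V variables and E constraints, which is the polynomial bound that makes SDC formulations attractive next to general ILP.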

Citations: 0

Scalable high-throughput architecture for large balanced tree structures on FPGA (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435345

Yun Qu, V. Prasanna

Abstract: Architectures for tree structures on FPGAs as well as ASICs have been proposed over the years. The exponential growth in the memory size with respect to the tree levels restricts the scalability of these architectures due to limited on-chip memory. For large trees, off-chip memory has to be used. We propose a pipeline architecture on FPGA for large balanced tree structures which achieves both scalability and high throughput. In the proposed architecture, each tree level is mapped onto a single or multiple Processing Elements (PEs) using dual-port distributed RAM, dual-port block RAM and off-chip RAM. We parameterize the pipeline architecture and optimize the performance with respect to scalability and throughput. The resulting architecture for the search tree is dual-threaded and deeply pipelined. It can accept two search requests per clock cycle and operates at a high clock rate of 280 MHz. Post place-and-route results show that, by using only 17% of the logic resources and 9% of the BRAM available on a state-of-the-art FPGA, our dual-thread pipelined search tree can perform 560 million search operations per second in a tree containing 512K 64-bit keys.

Pages: 278
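
The throughput figure in the abstract follows directly from the clock rate and the number of requests accepted per cycle. The short check below reproduces that arithmetic; it assumes a fully pipelined engine with no stalls, and the function name is a hypothetical one chosen for this illustration.

```python
def searches_per_second(clock_hz: int, requests_per_cycle: int) -> int:
    """Peak throughput of a pipeline that accepts `requests_per_cycle`
    new searches on every clock cycle and never stalls."""
    return clock_hz * requests_per_cycle


# 280 MHz clock, two search requests accepted per cycle (dual-threaded).
print(searches_per_second(280_000_000, 2))  # 560000000
```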

Citations: 1

Side-channel attacks on the bitstream encryption mechanism of Altera Stratix II: facilitating black-box analysis using software reverse-engineering

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435282

A. Moradi, David F. Oswald, C. Paar, Pawel Swierczynski

Abstract: In order to protect FPGA designs against IP theft and related issues such as product cloning, all major FPGA manufacturers offer a mechanism to encrypt the bitstream used to configure the FPGA. From a mathematical point of view, the employed encryption algorithms, e.g., AES or 3DES, are highly secure. However, recently it has been shown that the bitstream encryption feature of several FPGA product lines is susceptible to side-channel attacks that monitor the power consumption of the cryptographic module. In this paper, we present the first successful attack on the bitstream encryption of the Altera Stratix II FPGA. To this end, we reverse-engineered the details of the proprietary and unpublished Stratix II bitstream encryption scheme from the Quartus II software. Using this knowledge, we demonstrate that the full 128-bit AES key of a Stratix II can be recovered by means of side-channel analysis with 30,000 measurements, which can be acquired in less than three hours. The complete bitstream of a Stratix II that is (seemingly) protected by the bitstream encryption feature can hence fall into the hands of a competitor or criminal - possibly implying system-wide damage if confidential information such as proprietary encryption schemes or keys programmed into the FPGA are extracted. In addition to lost IP, reprogramming the attacked FPGA with modified code, for instance, to secretly plant a hardware trojan, is a particularly dangerous scenario for many security-critical applications.

Pages: 91-100

Citations: 91

The masala machine: accelerating thread-intensive and explicit memory management programs with dynamically reconfigurable FPGAs (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145741

M. Wen, N. Wu, Qianming Yang, Chunyuan Zhang, Liang Zhao

Abstract: A uniform FPGA-based architecture, an efficient programming model and a simple mapping method are paramount for FPGA technology to be more widely accepted. This paper presents MASALA, a dynamically reconfigurable FPGA-based accelerator specifically for parallel programs written in thread-intensive and explicit memory management (TEMM) programming models. The system uses the TEMM programming model to parallelize the demanding application, including decomposing the application into separate thread blocks, decoupling compute and data load/store, etc. Hardware engines are included in MASALA by using partial dynamic reconfiguration modules, each of which encapsulates a Thread Process Engine implementing the thread functionality in hardware. A data dispatching scheme is also included in MASALA to enable explicit communication among multiple memory hierarchies, such as among hardware engines and between the host processor and hardware engines. Finally, the paper presents a multi-FPGA prototype system of the architecture: MASALA-SX. A large synthetic aperture radar (SAR) image formatting experiment shows that the MASALA architecture facilitates the construction of a TEMM program accelerator by providing it with greater performance and less power consumption than current CPU platforms, but without sacrificing programmability, flexibility and scalability.

Pages: 265

Citations: 0

FCache: a system for cache coherent processing on FPGAs

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145733

Vincent Mirian, P. Chow

Citations: 12

Saturating the transceiver bandwidth: switch fabric design on FPGAs

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145706

Zefu Dai, Jianwen Zhu

Citations: 22

Post-silicon debugging targeting electrical errors with patchable controllers (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145759

M. Fujita, Hiroaki Yoshida

Abstract: Due to the continuous increase of design complexity in SoC development, the time required for post-silicon verification and debugging keeps increasing, especially for electrical errors and subtle corner case bugs, and it is now understood that some sort of programmability in silicon is essential to reduce the time for post-silicon verification and debugging. Although the easiest way to achieve this is to use an FPGA for entire circuits, performance, especially in terms of power efficiency, may be significantly inferior compared with pure hardwired logic. Here, we discuss partial use of such in-field programmability in control parts of circuits for post-silicon debugging processes for electrical errors and corner case logical bugs. Our method deals with RTL designs in FSMD (Finite State Machine with Datapath) by adding partial in-field programmability, called "patch logic", in their control parts. With our patch logic we can dynamically change the behaviors of circuits in such a way as to trace state transition sequences as well as internal values periodically. Our patch logic can also periodically check whether there is any electrical error. Assuming that electrical errors occur very infrequently, an error can be detected by comparing the results of duplicated computations for equivalence. Through experiments we discuss the area, timing, and power overhead due to the patch logic and also show results on electrical error detection with duplicated computations.

Pages: 271
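
The abstract describes detecting infrequent electrical errors by comparing the results of duplicated computations. The software sketch below shows only that comparison idea and does not model the paper's patch logic. The `FlakyAdder` class and the `run_duplicated` helper are hypothetical names invented for this illustration, and the fault is injected deterministically so the behavior is reproducible.

```python
from typing import Callable, Tuple


def run_duplicated(compute: Callable[[int], int], value: int) -> Tuple[int, bool]:
    """Run the same computation twice and flag a mismatch.

    Returns (first_result, error_detected). A transient error that
    corrupts only one of the two runs shows up as a mismatch.
    """
    first = compute(value)
    second = compute(value)
    return first, first != second


class FlakyAdder:
    """Hypothetical unit that adds 1 but corrupts one chosen call."""

    def __init__(self, faulty_call: int) -> None:
        self.calls = 0
        self.faulty_call = faulty_call  # 0 means "never faulty"

    def __call__(self, value: int) -> int:
        self.calls += 1
        result = value + 1
        if self.calls == self.faulty_call:
            result ^= 1  # flip the lowest bit to model a transient error
        return result


print(run_duplicated(FlakyAdder(faulty_call=0), 41))  # (42, False)
print(run_duplicated(FlakyAdder(faulty_call=2), 41))  # (42, True)
```

Duplication catches an error only when the two runs disagree, which is why the assumption that errors are rare matters: two identical faults in both runs would go unnoticed.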

Citations: 0

Efficient in-system RTL verification and debugging using FPGAs (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145753

P. Saha, C. Haymes, Ralph Bellofatto, B. Brezzo, M. Kapur, S. Asaad

Abstract: FPGAs have become indispensable in processor design, bring-up and debug. Traditionally FPGAs have been used in prototyping, allowing end-users to emulate functionality of a specific component of a processor. However, as the complexity of processors grows, another aspect of processor design, RTL verification, has become a prime target for acceleration using FPGAs. Software-only RTL simulation and verification tools are no longer sufficient for many verification tasks as they often incur long execution time penalties. Software simulation time for a basic Linux kernel bring-up on a BlueGene/Q [1] processor, with 16 user PowerPC A2 cores, for example, could easily exceed several years. An important feature of RTL verification acceleration using FPGAs is its fast debugging capabilities. The ability to quickly and accurately pinpoint the location of an anomaly in an RTL source is highly desirable. This paper proposes efficient in-system debugging techniques on FPGAs for RTL verification. We show how a network of over 45 Virtex 5 LX330 FPGAs can be efficiently used to read out state information of the BlueGene/Q processor. We also demonstrate how the new in-system debugging technique is 250x faster than comparable methods.

Pages: 269

Citations: 0

A lean FPGA soft processor built using a DSP block

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145734

Hui Yan Cheah, Suhaib A. Fahmy, D. Maskell, C. Kulkarni

Citations: 19

A scalable approach for automated precision analysis

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2012-02-22 DOI: 10.1145/2145694.2145726

D. Boland, G. Constantinides

Abstract: The freedom over the choice of numerical precision is one of the key factors that can only be exploited throughout the datapath of an FPGA accelerator, providing the ability to trade the accuracy of the final computational result with the silicon area, power, operating frequency, and latency. However, in order to tune the precision used throughout hardware accelerators automatically, a tool is required to verify that the hardware will meet an error or range specification for a given precision. Existing tools to perform this task typically suffer either from a lack of tightness of bounds or require a large execution time when applied to large scale algorithms; in this work, we propose an approach that can both scale to larger examples and obtain tighter bounds, within a smaller execution time, than the existing methods. The approach we describe also provides a user with the ability to trade the quality of bounds with execution time of the procedure, making it suitable within a word-length optimization framework for both small and large-scale algorithms. We demonstrate the use of our approach on instances of iterative algorithms to solve a system of linear equations. We show that because our approach can track how the relative error decreases with increasing precision, unlike the existing methods, we can use it to create smaller hardware with guaranteed numerical properties. This results in a saving of 25% of the area in comparison to optimizing the precision using competing analytical techniques, whilst requiring a smaller execution time than these methods, and saving almost 80% of area in comparison to adopting IEEE double precision arithmetic.

Pages: 185-194
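
The abstract describes trading numerical precision against accuracy. The sketch below is not the paper's bounding method; it only illustrates the underlying trade with the simplest case, a single value rounded to the nearest fixed-point number. The two helper names are hypothetical, and the bound of 2^-(f+1) for f fractional bits is the standard worst case for round-to-nearest.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (fixed point)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def rounding_error_bound(frac_bits: int) -> float:
    """Worst-case absolute error of round-to-nearest: half of one step."""
    return 2.0 ** -(frac_bits + 1)


# Each extra fractional bit halves the guaranteed error bound, while the
# hardware needed to carry the value grows with the word length.
for bits in (4, 8, 12, 16):
    observed = abs(quantize(0.1, bits) - 0.1)
    print(bits, observed, rounding_error_bound(bits))
```

A precision analysis tool has to propagate bounds like this one through every operation of an algorithm, which is where the tightness and execution time concerns raised in the abstract come from.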

Citations: 18