2014 International Conference on Field-Programmable Technology (FPT): Latest Papers, Page 7

Integrating FPGA-based processing elements into a runtime for parallel heterogeneous computing

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082807

David de la Chevallerie, Jens Korinth, A. Koch

Citations: 4

Comparing performance, productivity and scalability of the TILT overlay processor to OpenCL HLS

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082748

Rafat Rashid, J. Steffan, Vaughn Betz

Abstract: High-Level-Synthesis (HLS) tools translate a software description of an application into custom FPGA logic, increasing designer productivity vs. Hardware Description Language (HDL) design flows. Overlays seek to further improve productivity by reducing application compile times and raising abstraction by enabling the designer to target a software-programmable substrate instead of the underlying FPGA. We compare the performance, development effort and scalability of two C-to-FPGA approaches: our TILT overlay processor and Altera's OpenCL HLS. Our application-customized TILT implementations of five data-parallel benchmarks have from 41% to 80% of the throughput per unit of layout area achieved by our best OpenCL HLS designs. The time required for initial hardware compilation of these TILT designs and configuration of the target application onto the overlay is roughly comparable to the compile times of the OpenCL HLS designs: 28 and 103 minutes on average, respectively. However, subsequent reconfigurations due to changes in the application that do not require re-synthesis of the overlay are fast, taking 38 seconds on average. In contrast, OpenCL HLS applications require full recompilation after every code change. TILT also enables smaller, more area-efficient designs than OpenCL HLS when low to moderate throughput is sufficient. For high throughput, the larger spatially pipelined designs of OpenCL HLS are preferable.

Pages: 20-27

Citations: 42

Hardware Trojan detection acceleration based on word-level statistical properties management

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082769

He Li, Qiang Liu

Abstract: Hardware Trojan insertion has raised serious concerns for the semiconductor industry and government agencies. A hardware Trojan is usually activated under rare conditions associated with low-transition bits in a circuit. The damage includes circuit functional failure or leakage of important information. Previous research on hardware Trojan detection is mainly based on side-channel analysis and Trojan activation. Long activation time is a major concern during the detection process. In this paper, we propose a novel approach for efficiently accelerating Trojan activation by increasing the transition activity of rare bits. In particular, the proposed approach increases the bit-level transition activity by controlling word-level statistical properties of the signal, such as its variance and autocorrelation. In addition, by analyzing how signal statistical properties propagate through various digital signal processing (DSP) operators such as adders and multipliers, the proposed approach can control the statistical properties of internal signals and then enhance the internal bit transition activity from the primary input of the circuit. The proposed approach is evaluated on several circuits. The results show that the transition activity of rare bits can be increased by up to 166.7 times and Trojan activation time can be reduced by up to 121 times.

Pages: 153-160

Citations: 11

Efficient FPGA implementation of digit parallel online arithmetic operators

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082763

Kan Shi, D. Boland, G. Constantinides

Abstract: Online arithmetic has been widely studied for ASIC implementation. Online components were originally designed to perform computations digit-serially with the most significant digit (MSD) first, resulting in the ability to chain arithmetic operators together for low latency. More recently, research has shown that digit parallel online operators can fail more gracefully when operating beyond the deterministic clocking region in comparison to operators with conventional arithmetic. Unfortunately, the utilization of online arithmetic operators in the past has required a large area overhead for FPGA implementation. In this paper, we propose novel approaches to implement the key primitives of online arithmetic, adders and multipliers, efficiently on modern Xilinx FPGAs with 6-input LUTs and carry resources. We demonstrate experimentally that in comparison to a direct RTL synthesis, the proposed architectures achieve slice savings of over 67% and 69%, and speed-ups of over 1.2x and 1.5x for adders and multipliers, respectively. As a result, the area overheads of using online adders and multipliers in place of traditional arithmetic primitives are reduced from 8.41x and 8.11x to 1.88x and 1.84x, respectively. Finally, because an online multiplier generates MSDs first, we also demonstrate a method to create an online multiplier with a reduced precision output that is smaller than a traditional multiplier producing the same result. We show that this can lead to silicon area savings of up to 56%.

Pages: 115-122

Citations: 11

Hardware architecture of bi-cubic convolution interpolation for real-time image scaling

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082790

Gopinath Mahale, H. Mahale, Rajesh Babu Parimi, S. Nandy, S. Bhattacharya

Citations: 5

A novel three-dimensional FPGA architecture with high-speed serial communication links

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082805

T. Kajiwara, Qian Zhao, M. Amagasaki, M. Iida, Morituro Kuga, T. Sueyoshi

Abstract: Three-dimensional (3D) integrated circuit technology is expected to offer continual improvement to very-large-scale integration performance as the process of miniaturization approaches physical limits. However, because the through-silicon vias (TSVs) that are used to create interlayer vertical connections are much larger in area than transistors, there is an inherent tradeoff between connectivity and small size. Field-programmable gate arrays (FPGAs) are particularly noted for requiring a high level of routing resources, which means that it is unrealistic to make the same number of connections vertically as horizontally. In previous research, we proposed a method for creating a two-layer compact 3D FPGA with face-down integration (the base FPGA). In this paper, we discuss stacking multiple base FPGAs by the face-up method and propose a method for achieving high-speed interlayer communications with TSV serial connections. The proposed architecture improves FPGA performance by using smaller TSVs. The evaluation results show that the proposed 3D FPGA can achieve a total area that is as low as 67% of that of the equivalent two-dimensional FPGA.

Pages: 306-309

Citations: 3

A high-performance low-power near-Vt RRAM-based FPGA

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082777

Xifan Tang, P. Gaillardon, G. Micheli

Abstract: The routing architecture, heavily using programmable switches, dominates the area, delay and power of Field Programmable Gate Arrays (FPGAs). Resistive Random Access Memories (RRAMs) enable high-performance routing architectures through the replacement of Static Random Access Memory (SRAM)-based programming switches. Exploiting the very low on-resistance state achievable by RRAMs, RRAM-based routing multiplexers can be used to significantly reduce the FPGA routing delays. In addition, RRAM-based routing architectures are less sensitive to supply voltage reductions and show promise in low-power FPGA designs. In this paper, we propose a near-Vt low-power RRAM-based FPGA where both delay and power reductions are achieved. Experimental results demonstrate that a near-Vt RRAM-based FPGA design leads to a 15% area shrink, a 10% delay reduction, and a 65% power improvement, compared to a conventional FPGA design for a given technology node. To achieve low on-resistance values, RRAMs typically require high programming currents. In other words, they need relatively large programming transistors, potentially resulting in area, delay and power inefficiencies. We also present a design methodology to properly size the programming transistors of RRAMs in order to further improve the area-efficiency. Experimental results show that a correct programming transistor sizing strategy contributes a further 18% area and 2% delay reduction, compared to the initial near-Vt RRAM-based FPGA.

Pages: 207-214

Citations: 48

Real-time 3D reconstruction for FPGAs: A case study for evaluating the performance, area, and programmability trade-offs of the Altera OpenCL SDK

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082810

Q. Gautier, A. Shearer, J. Matai, D. Richmond, Pingfan Meng, R. Kastner

Citations: 15

Deep and narrow binary content-addressable memories using FPGA-based BRAMs

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082808

Ameer Abdelhadi, G. Lemieux

Citations: 9

Memory security in reconfigurable computers: Combining formal verification with monitoring

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082771

T. Wiersema, Stephanie Drzevitzky, M. Platzner

Abstract: Ensuring memory access security is a challenge for reconfigurable systems with multiple cores. Previous work introduced access monitors attached to the memory subsystem to ensure that the cores adhere to pre-defined protocols when accessing memory. In this paper, we combine access monitors with a formal runtime verification technique known as proof-carrying hardware to guarantee memory security. We extend previous work on proof-carrying hardware by covering sequential circuits and demonstrate our approach with a prototype leveraging ReconOS/Zynq with an embedded ZUMA virtual FPGA overlay. Experiments show the feasibility of the approach and the capabilities of the prototype, which constitutes the first realization of proof-carrying hardware on real FPGAs. The area overheads for the virtual FPGA are measured as 2x-10x, depending on the resource type. The delay overhead is substantial at almost 100x, but this is an extremely pessimistic estimate that will be lowered once accurate timing analysis for FPGA overlays becomes available. Finally, reconfiguration time for the virtual FPGA is about one order of magnitude lower than for the native Zynq fabric.

Pages: 167-174

Citations: 13