Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles

Measuring the Power-Constrained Performance and Energy Gap between FPGAs and Processors (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021756

A. Ye, K. Ganesan

Abstract: This work measures the performance and power consumption gap between the current generation of low-power FPGAs and low-power microprocessors (microcontrollers) through an implementation of the Canny edge detection algorithm. In particular, the algorithm is implemented on Altera MAX 10 FPGAs, and its performance and power consumption are compared to the same algorithm implemented on STMicroelectronics' implementation of the ARM M-series microcontrollers. We found an extremely high performance advantage of the FPGAs over the microcontrollers, four to five orders of magnitude, which is much greater than any value previously reported in FPGA vs. processor studies. Furthermore, this speedup comes at a cost of only 1.2x to 15x higher power consumption, which gives FPGAs a significant advantage in energy efficiency. We also observe, however, that the current generation of low-power FPGAs has significantly higher static power consumption than the microcontrollers. In particular, the low-power FPGAs consume more static power than the total power consumption of the lowest-power microcontrollers, rendering the FPGAs inoperable under the power budgets of these processors. This high static power consumption exists despite the fact that the FPGAs are implemented on a low-leakage 55nm process with dual supply voltages, while the microcontrollers are implemented on a conventional, single-supply-voltage, 90nm process. Consequently, our results indicate that it is particularly important for future research to address the static power consumption of low-power FPGAs while maintaining logic capacity, so that the performance and energy efficiency advantages of the FPGAs can be fully utilized in the extremely low power application domains that are driven by batteries with very small form factors and by emerging small-scale energy harvesting technologies.
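
The energy-efficiency claim in this abstract follows from energy = power x time. The sketch below works through that arithmetic in Python. The function name `energy_ratio` is illustrative, and pairing the lowest speedup with the highest power ratio (and the reverse) is an assumption made only to bracket the range; the abstract does not say which speedup goes with which power figure.

```python
def energy_ratio(speedup: float, power_ratio: float) -> float:
    """Return E_mcu / E_fpga for one fixed task.

    Energy = power * time. If the FPGA finishes `speedup` times faster while
    drawing `power_ratio` times more power, its energy is power_ratio / speedup
    of the microcontroller's, so the advantage is speedup / power_ratio.
    """
    if speedup <= 0 or power_ratio <= 0:
        raise ValueError("speedup and power_ratio must be positive")
    return speedup / power_ratio


# Endpoints quoted in the abstract; how they pair up is an assumption.
worst_case = energy_ratio(speedup=1e4, power_ratio=15.0)
best_case = energy_ratio(speedup=1e5, power_ratio=1.2)
print(f"energy advantage between about {worst_case:.0f}x and {best_case:.0f}x")
```

Under these assumptions the FPGA would use two to nearly five orders of magnitude less energy per frame, which is consistent with the abstract's statement that the static power floor, not energy per task, is the obstacle.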

Citations: 0

Automatic Generation of Hardware Sandboxes for Trojan Mitigation in Systems on Chip (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021774

C. Bobda, Taylor J. L. Whitaker, C. Kamhoua, K. Kwiat, L. Njilla

Abstract: Component-based design is one of the preferred methods to tackle system complexity and to reduce costs and time-to-market. Major parts of system design and IC production are outsourced to facilities distributed across the globe, thus opening the door to malicious Trojan insertion. Hardware sandboxing was introduced as a means to overcome the shortcomings of traditional static Trojan mitigation methods, which use intense simulation, verification, and physical tests to detect evidence of malicious components before system deployment. The number of test patterns needed to activate potential hidden Trojans with certainty is very large for complex IPs and SoCs with dozens of inputs, outputs, states, and memory blocks, which limits the effectiveness of static testing methods. The rationale of sandboxing is to spend less effort on pre-deployment testing; instead, guards are built around non-trusted components to catch malicious activities and prevent potential damage. While the feasibility of hardware sandboxes has been proven with case studies and real-world applications, these relied on manual design, and no systematic method was devised to automate the design of systems-on-chip that incorporate hardware sandboxes to provide a high level of security in embedded systems. In this work, we propose a method for automatic generation of hardware sandboxes in systems-on-chip. Using the interface formalism of de Alfaro and Henzinger to capture the interactions among components, along with a property specification language to define non-authorized actions, sandboxes are generated and made ready for inclusion in a system-on-chip design. We leverage the concepts of composition, compatibility, and refinement to optimize resources across the boundary of a single component and provide minimal resource consumption. With results on benchmarks implemented on FPGA, we prove that our approach can provide a high level of security with fewer resources and no increase in delay.

Citations: 0

Using Vivado-HLS for Structural Design: a NoC Case Study (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021772

Zhipeng Zhao, J. Hoe

Abstract: There have been ample successful examples of applying Xilinx Vivado's "function-to-module" high-level synthesis (HLS) where the subject is algorithmic in nature. In this work, we carried out a design study to assess the effectiveness of applying Vivado-HLS to structural design. We employed Vivado-HLS to synthesize C functions corresponding to standalone network-on-chip (NoC) routers as well as complete multi-endpoint NoCs. Interestingly, we find that describing a complete NoC comprising router submodules faces fundamental difficulties not present in describing the routers as standalone modules. Ultimately, we succeeded in using Vivado-HLS to produce router and NoC modules that are exact cycle- and bit-accurate replacements of our reference RTL-based router and NoC modules. Furthermore, the routers and NoCs resulting from HLS and RTL are comparable in resource utilization and critical path delay. Our experience subjectively suggests that HLS is able to simplify the design effort even though many of the structural details had to be provided in the HLS description through a combination of coding discipline and explicit pragmas. The C++ source code and a more extensive description of this work can be found at http://www.ece.cmu.edu/calcm/connect_hls.

Citations: 13

Secure Function Evaluation Using an FPGA Overlay Architecture

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021746

Xin Fang, Stratis Ioannidis, M. Leeser

Citations: 22

Cache Timing Attacks from The SoCFPGA Coherency Port (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021802

S. Chaudhuri

Abstract: In this presentation we show that side channels arising from the micro-architecture of SoCFPGAs could be a security risk. We present an FPGA trojan based on OpenCL which performs cache-timing attacks through the accelerator coherency port (ACP) of a SoCFPGA. Its primary goal is to derive physical addresses used by the Linux kernel on the ARM Hard Processor System. With this information the trojan can then surgically change memory locations to gain privileges, as a rootkit does. We present the customisation to the Altera OpenCL platform and the OpenCL code to implement the trojan. We show that it is possible to accurately predict physical addresses and the page table entries corresponding to an arbitrary location in the heap after sufficient (~300) iterations and by using a differential ranking. The attack can be refined using the known page table structure of the Linux kernel to accurately determine the target physical address and its corresponding page table entry. Malicious code can then be injected from the FPGA by redirecting page table entries. Since Linux kernel version 4.0-rc5, physical addresses have been obfuscated from the normal user to prevent Rowhammer attacks. With information from the ACP side channel, this measure can be bypassed.

Citations: 2

Automatic Construction of Program-Optimized FPGA Memory Networks

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021748

Hsin-Jung Yang, Kermin Fleming, F. Winterstein, Annie I. Chen, Michael Adler, J. Emer

Citations: 5

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021747

Chang Xu, Gai Liu, Ritchie Zhao, Stephen Yang, Guojie Luo, Zhiru Zhang

Abstract: Mainstream FPGA CAD tools provide an extensive collection of optimization options that have a significant impact on the quality of the final design. Together, these options create an enormous and complex design space that cannot be explored effectively by human effort alone. Instead, we propose to search this parameter space using autotuning, which is a popular approach in the compiler optimization domain. Specifically, we study the effectiveness of applying the multi-armed bandit (MAB) technique to automatically tune the options for a complete FPGA compilation flow from RTL to bitstream, including RTL/logic synthesis, technology mapping, placement, and routing. To mitigate the high runtime cost incurred by the complex FPGA implementation process, we devise an efficient parallelization scheme that enables multiple MAB-based autotuners to explore the design space simultaneously. In particular, we propose a dynamic solution space partitioning and resource allocation technique that allocates computing resources to promising search regions based on runtime information about search quality from previous iterations. Experiments on academic and commercial FPGA CAD tools demonstrate promising improvements in quality and convergence rate across a variety of real-life designs.
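
To make the multi-armed bandit idea concrete, here is a minimal single-tuner sketch in Python. It uses the standard UCB1 policy, which is an assumption: the abstract does not say which bandit policy the authors use, and the parallel partitioning scheme is not modeled. The configurations and the `toy_compile` scoring function are hypothetical stand-ins for real CAD tool runs.

```python
import math
import random


def ucb1_autotune(arms, evaluate, rounds, seed=0):
    """Choose among candidate tool configurations with the UCB1 policy.

    arms: list of candidate configurations (hypothetical option settings).
    evaluate: callable(config, rng) -> reward in [0, 1], higher is better.
    Returns (index of the arm with the best mean reward, pull counts).
    """
    if rounds < len(arms):
        raise ValueError("need at least one round per arm")
    rng = random.Random(seed)
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        if t <= len(arms):
            i = t - 1  # pull every arm once before using the bound
        else:
            # Mean reward plus an exploration bonus that shrinks with pulls.
            i = max(
                range(len(arms)),
                key=lambda k: totals[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        totals[i] += evaluate(arms[i], rng)
        counts[i] += 1
    best = max(range(len(arms)), key=lambda k: totals[k] / counts[k])
    return best, counts


def toy_compile(config, rng):
    # Stand-in for a synthesis/place/route run: a noisy quality score.
    return min(1.0, max(0.0, config["quality"] + rng.uniform(-0.1, 0.1)))


arms = [
    {"name": "effort_low", "quality": 0.3},
    {"name": "effort_mid", "quality": 0.5},
    {"name": "effort_high", "quality": 0.8},
]
best, counts = ucb1_autotune(arms, toy_compile, rounds=300)
print(arms[best]["name"], counts)
```

In a real flow each `evaluate` call would be a full compilation, which is why the abstract emphasizes running several tuners in parallel and steering compute toward promising regions.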

Citations: 41

Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021752

Daniel Rozhko, Geoffrey Elliott, D. Ly-Ma, P. Chow, H. Jacobsen

Abstract: Packet processing systems increasingly need larger rulesets to satisfy the needs of deep-network intrusion prevention and cluster computing. FPGA-based implementations of packet processing systems have been proposed, but their use of on-chip memory limits the number of rules these existing systems can maintain. Off-chip memories have traditionally been too slow to enable meaningful processing rates, but in this work we present a packet processing system that utilizes the much faster Hybrid Memory Cube (HMC) technology, enabling larger rulesets at usable line rates. The proposed architecture streams rules from the HMC memory to a packet matching engine, using prefetching to hide the HMC access latency. The packet matching engine is replicated to process multiple packets in parallel. The final system, implemented on a Xilinx Kintex UltraScale 060, processes 160 packets in parallel, achieving a 10 Gbps line rate with approximately 1500 rules and a 16 Mbps line rate with 1M rules. To the best of our knowledge, this is the first hardware solution capable of maintaining rulesets of this size. We present this work as an exploration of the application of HMCs to packet processing and as a first step in achieving a processing capability of a million rules at usable line rates.
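
The two operating points in this abstract can be checked against a simple model in which every packet is compared against every streamed rule, so line rate falls in inverse proportion to ruleset size. That model is an interpretation, not something the abstract states, and the function names below are illustrative. The Python sketch multiplies line rate by rule count for both points and compares the results.

```python
def matching_work(line_rate_bps: float, num_rules: int) -> float:
    """Proxy for matching effort per second: line rate times ruleset size."""
    return line_rate_bps * num_rules


def predicted_line_rate(budget: float, num_rules: int) -> float:
    """Line rate (bps) if a fixed matching budget is spread over num_rules."""
    if num_rules <= 0:
        raise ValueError("num_rules must be positive")
    return budget / num_rules


# Figures quoted in the abstract ("approximately 1500 rules", "1M rules").
small = matching_work(10e9, 1_500)
large = matching_work(16e6, 1_000_000)
print(f"work ratio (1M rules / 1500 rules): {large / small:.2f}")

# Extrapolate from the small-ruleset point to one million rules.
print(f"predicted: {predicted_line_rate(small, 1_000_000) / 1e6:.1f} Mbps")
```

The two products differ by only a few percent, so under this assumed model the reported 16 Mbps at one million rules is close to what the 10 Gbps figure at about 1500 rules would predict.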

Citations: 8

NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021750

Zhihong Huang, Xing Wei, Grace Zgheib, Wei Li, Y. Lin, Zhenghong Jiang, Kaihui Tu, P. Ienne, Haigang Yang

Abstract: The And-Inverter Cone (AIC) has been introduced as an alternative logic element to the look-up table (LUT) in FPGAs, since it improves their performance and resource utilization. However, further analysis of the AIC design showed that it suffers from a delay discrepancy problem. Furthermore, the existing AIC cluster design is not properly optimized and has some unnecessary logic that impedes its performance. Thus, we propose in this work a more efficient logic element called NAND-NOR and delay-balanced dual-phased multiplexers for the input crossbar. Our simulations show that the NAND-NOR brings a substantial reduction in delay discrepancy, with a 14% to 46% delay improvement when compared to AICs. Along with the other modifications, it reduces the total cluster area by about 27% when compared to the reference AIC cluster. Testing the new architecture on a large set of benchmarks shows an improvement of the delay-area product by about 44% and 21% for the MCNC and VTR benchmarks, respectively, when compared to a LUT-based cluster. This improvement reaches 31% and 19%, respectively, when compared to the AIC-based architecture.

Citations: 7

A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology Mapping

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021735

Gai Liu, Zhiru Zhang

Abstract: Modern FPGA synthesis tools typically apply a predetermined sequence of logic optimizations to the input logic network before carrying out technology mapping. While the "known recipes" of logic transformations often lead to improved mapping results, there remains a nontrivial gap between the quality metrics driving the pre-mapping logic optimizations and those targeted by the actual technology mapping. Such miscorrelations eventually result in suboptimal quality of results. In this paper we propose PIMap, which couples logic transformations and technology mapping under an iterative improvement framework to minimize the circuit area for LUT-based FPGAs. In each iteration, PIMap randomly proposes a transformation on the given logic network from an ensemble of candidate optimizations; it then invokes technology mapping and uses the mapping result to determine the likelihood of accepting the proposed transformation. To mitigate the runtime overhead, we further introduce parallelization techniques to decompose a large design into multiple smaller sub-netlists that can be optimized simultaneously. Experimental results show that our approach achieves promising area improvement over a set of commonly used benchmarks. Notably, PIMap reduces the LUT usage by up to 14%, and by 7% on average, over the best-known records for the EPFL arithmetic benchmark suite.
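
The propose, map, then accept-with-some-likelihood loop described in this abstract can be sketched as follows. This is not PIMap itself: the Metropolis-style acceptance rule, the `temperature` parameter, and the toy "network" (a tuple of integers whose sum stands in for mapped LUT count) are all assumptions chosen to keep the Python example self-contained.

```python
import math
import random


def iterative_improve(network, transforms, map_area, iters, temperature=1.0, seed=0):
    """Randomly propose transformations and accept them based on mapped area.

    transforms: list of callables(network, rng) -> new network.
    map_area: callable(network) -> area after (stand-in) technology mapping.
    Returns the best network seen and its area.
    """
    rng = random.Random(seed)
    current, current_area = network, map_area(network)
    best, best_area = current, current_area
    for _ in range(iters):
        candidate = rng.choice(transforms)(current, rng)
        cand_area = map_area(candidate)
        delta = cand_area - current_area
        # Always accept improvements; accept regressions with a probability
        # that decays with how much area they add (assumed acceptance rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_area = candidate, cand_area
            if current_area < best_area:
                best, best_area = current, current_area
    return best, best_area


def nudge_down(net, rng):
    # Hypothetical area-reducing rewrite on one randomly chosen node.
    i = rng.randrange(len(net))
    return net[:i] + (max(0, net[i] - 1),) + net[i + 1:]


def nudge_up(net, rng):
    # Hypothetical rewrite that costs area now but may enable later gains.
    i = rng.randrange(len(net))
    return net[:i] + (net[i] + 1,) + net[i + 1:]


start = (5, 7, 3)
best, best_area = iterative_improve(start, [nudge_down, nudge_up], sum, iters=500)
print(best, best_area)
```

Occasionally accepting a worse mapping lets the search escape local minima, which is one plausible reason to make acceptance probabilistic rather than strictly greedy.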

Citations: 27