Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles

Secure Function Evaluation Using an FPGA Overlay Architecture
Xin Fang, Stratis Ioannidis, M. Leeser
{"title":"Secure Function Evaluation Using an FPGA Overlay Architecture","authors":"Xin Fang, Stratis Ioannidis, M. Leeser","doi":"10.1145/3020078.3021746","DOIUrl":"https://doi.org/10.1145/3020078.3021746","url":null,"abstract":"Secure Function Evaluation (SFE) has received considerable attention recently due to the massive collection and mining of personal data over the Internet, but large computational costs still render it impractical. In this paper, we leverage hardware acceleration to tackle the scalability and efficiency challenges inherent in SFE. To that end, we propose a generic, reconfigurable implementation of SFE as a coarse-grained FPGA overlay architecture. Contrary to tailored approaches that are tied to the execution of a specific SFE structure, and require full reprogramming of an FPGA with each new execution, our design allows repurposing an FPGA to evaluate different SFE tasks without the need for reprogramming. Our implementation shows orders of magnitude improvement over a software package for evaluating garbled circuits, and demonstrates that the circuit being evaluated can change with almost no overhead.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115146342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Automatic Construction of Program-Optimized FPGA Memory Networks
Hsin-Jung Yang, Kermin Fleming, F. Winterstein, Annie I. Chen, Michael Adler, J. Emer
{"title":"Automatic Construction of Program-Optimized FPGA Memory Networks","authors":"Hsin-Jung Yang, Kermin Fleming, F. Winterstein, Annie I. Chen, Michael Adler, J. Emer","doi":"10.1145/3020078.3021748","DOIUrl":"https://doi.org/10.1145/3020078.3021748","url":null,"abstract":"Memory systems play a key role in the performance of FPGA applications. As FPGA deployments move towards design entry points that are more serial, memory latency has become a serious design consideration. For these applications, memory network optimization is essential in improving performance. In this paper, we examine the automatic, program-optimized construction of low-latency memory networks. We design a feedback-driven network compiler, which constructs an optimized memory network based on the target program's memory access behavior measured via a newly designed network profiler. In our test applications, the compiler-optimized networks provide a 45% performance gain on average over baseline memory networks by minimizing the impact of network latency on program performance.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123312090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element
Zhihong Huang, Xing Wei, Grace Zgheib, Wei Li, Y. Lin, Zhenghong Jiang, Kaihui Tu, P. Ienne, Haigang Yang
{"title":"NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element","authors":"Zhihong Huang, Xing Wei, Grace Zgheib, Wei Li, Y. Lin, Zhenghong Jiang, Kaihui Tu, P. Ienne, Haigang Yang","doi":"10.1145/3020078.3021750","DOIUrl":"https://doi.org/10.1145/3020078.3021750","url":null,"abstract":"The And-Inverter Cone (AIC) has been introduced as an alternative logic element to the look-up table in FPGAs, since it improves their performance and resource utilization. However, further analysis of the AIC design showed that it suffers from the delay discrepancy problem. Furthermore, the existing AIC cluster design is not properly optimized and has some unnecessary logic that impedes its performance. Thus, we propose in this work a more efficient logic element called NAND-NOR and delay-balanced dual-phased multiplexers for the input crossbar. Our simulations show that the NAND-NOR brings substantial reduction in delay discrepancy with a 14% to 46% delay improvement when compared to AICs. And, along with the other modifications, it reduces the total cluster area by about 27%, when compared to the reference AIC cluster. Testing the new architecture on a large set of benchmarks shows an improvement of the delay-area product by about 44% and 21% for the MCNC and VTR benchmarks, respectively, when compared to a LUT-based cluster. 
This improvement reaches 31% and 19%, respectively, when compared to the AIC-based architecture.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122637463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Measuring the Power-Constrained Performance and Energy Gap between FPGAs and Processors (Abstract Only)
A. Ye, K. Ganesan
{"title":"Measuring the Power-Constrained Performance and Energy Gap between FPGAs and Processors (Abstract Only)","authors":"A. Ye, K. Ganesan","doi":"10.1145/3020078.3021756","DOIUrl":"https://doi.org/10.1145/3020078.3021756","url":null,"abstract":"This work measures the performance and power consumption gap between the current generation of low power FPGAs and low power microprocessors (microcontrollers) through an implementation of the Canny edge detection algorithm. In particular, the algorithm is implemented on Altera MAX 10 FPGAs and its performance and power consumption are then compared to the same algorithm implemented on the STMicroelectronics' implementation of the ARM M-series microcontrollers. We found an extremely high, four- to five-orders of magnitude, performance advantage of the FPGAs over the microcontrollers, which is much greater than any previously reported values in FPGAs vs. processors studies. Furthermore, this speedup only comes at a cost of 1.2x to 15x higher power consumption, which gives FPGAs a significant advantage in energy efficiency. We also observe, however, the current generation of low power FPGAs have significantly higher static power consumption than the microcontrollers. In particular, the low power FPGAs consume more static power than the total power consumption of the lowest power consuming microcontrollers, rendering the FPGAs inoperable under the power budgets of these processors. Furthermore, this high static power consumption exists despite the fact that the FPGAs are implemented on a low leakage 55nm process with dual supply voltages while the microcontrollers are implemented on a conventional, single supply voltage, 90nm process. 
Consequently, our results indicate that it is particularly important for future research to address the static power consumption of low power FPGAs while maintaining logic capacity so the performance and energy efficiency advantages of the FPGAs can be fully utilized in the extremely low power application domains that are driven by batteries with very small form factors and emerging small scale energy harvesting technologies.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"276 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cache Timing Attacks from The SoCFPGA Coherency Port (Abstract Only)
S. Chaudhuri
{"title":"Cache Timing Attacks from The SoCFPGA Coherency Port (Abstract Only)","authors":"S. Chaudhuri","doi":"10.1145/3020078.3021802","DOIUrl":"https://doi.org/10.1145/3020078.3021802","url":null,"abstract":"In this presentation we show that side-channels arising from the micro-architecture of SoCFPGAs could be a security risk. We present an FPGA trojan based on OpenCL which performs cache-timing attacks through the accelerator coherency port (ACP) of a SoCFPGA. Its primary goal is to derive physical addresses used by the Linux kernel on the ARM Hard Processor System. With this information the trojan can then surgically change memory locations to gain privileges as in a rootkit. We present the customisation to the Altera OpenCL platform, and the OpenCL code to implement the trojan. We show that it is possible to accurately predict physical addresses and the page table entries corresponding to an arbitrary location in the heap after sufficient (~300) iterations, and by using a differential ranking. The attack can be refined by the known page table structure of the Linux kernel, to accurately determine the target physical address, and its corresponding page table entry. Malicious code can then be injected from the FPGA, by redirecting page table entries. Since Linux kernel version 4.0-rc5, physical addresses are obfuscated from the normal user to prevent Rowhammer attacks. 
With information from ACP side-channel the above measure can be bypassed.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121228973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules
Daniel Rozhko, Geoffrey Elliott, D. Ly-Ma, P. Chow, H. Jacobsen
{"title":"Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules","authors":"Daniel Rozhko, Geoffrey Elliott, D. Ly-Ma, P. Chow, H. Jacobsen","doi":"10.1145/3020078.3021752","DOIUrl":"https://doi.org/10.1145/3020078.3021752","url":null,"abstract":"Packet processing systems increasingly need larger rulesets to satisfy the needs of deep-network intrusion prevention and cluster computing. FPGA-based implementations of packet processing systems have been proposed but their use of on-chip memory limits the number of rules these existing systems can maintain. Off-chip memories have traditionally been too slow to enable meaningful processing rates, but in this work we present a packet processing system that utilizes the much faster Hybrid Memory Cube (HMC) technology, enabling larger rulesets at usable line-rates. The proposed architecture streams rules from the HMC memory to a packet matching engine, using prefetching to hide the HMC access latency. The packet matching engine is replicated to process multiple packets in parallel. The final system, implemented on a Xilinx Kintex Ultrascale 060, processes 160 packets in parallel, achieving a 10 Gbps line-rate with approximately 1500 rules and a 16 Mbps line-rate with 1M rules. To the best of our knowledge, this is the first hardware solution capable of maintaining rulesets of this size. 
We present this work as an exploration of the application of HMCs to packet processing and as a first step in achieving a processing capability of a million rules at usable line-rates.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125713999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology Mapping
Gai Liu, Zhiru Zhang
{"title":"A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology Mapping","authors":"Gai Liu, Zhiru Zhang","doi":"10.1145/3020078.3021735","DOIUrl":"https://doi.org/10.1145/3020078.3021735","url":null,"abstract":"Modern FPGA synthesis tools typically apply a predetermined sequence of logic optimizations on the input logic network before carrying out technology mapping. While the \"known recipes\" of logic transformations often lead to improved mapping results, there remains a nontrivial gap between the quality metrics driving the pre-mapping logic optimizations and those targeted by the actual technology mapping. Needless to mention, such miscorrelations would eventually result in suboptimal quality of results. In this paper we propose PIMap, which couples logic transformations and technology mapping under an iterative improvement framework to minimize the circuit area for LUT-based FPGAs. In each iteration, PIMap randomly proposes a transformation on the given logic network from an ensemble of candidate optimizations; it then invokes technology mapping and makes use of the mapping result to determine the likelihood of accepting the proposed transformation. To mitigate the runtime overhead, we further introduce parallelization techniques to decompose a large design into multiple smaller sub-netlists that can be optimized simultaneously. Experimental results show that our approach achieves promising area improvement over a set of commonly used benchmarks. 
Notably, PIMap reduces the LUT usage by up to 14% and 7% on average over the best-known records for the EPFL arithmetic benchmark suite.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129186076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Quality-Time Tradeoffs in Component-Specific Mapping: How to Train Your Dynamically Reconfigurable Array of Gates with Outrageous Network-delays
Hans Giesen, Raphael Rubin, Benjamin Gojman, A. DeHon
{"title":"Quality-Time Tradeoffs in Component-Specific Mapping: How to Train Your Dynamically Reconfigurable Array of Gates with Outrageous Network-delays","authors":"Hans Giesen, Raphael Rubin, Benjamin Gojman, A. DeHon","doi":"10.1145/3020078.3026124","DOIUrl":"https://doi.org/10.1145/3020078.3026124","url":null,"abstract":"How should we perform component-specific adaptation for FPGAs? Prior work has demonstrated that the negative effects of variation can be largely mitigated using complete knowledge of device characteristics and full per-FPGA CAD flow. However, the cost of per-FPGA characterization and mapping could be prohibitively expensive. We explore light-weight options for per-FPGA mapping that avoid the need for a priori device characterization and perform less expensive per FPGA customization work. We characterize the tradeoff between Quality-of-Results (energy, delay) and per-device mapping costs for 7 design points ranging from complete mapping based on knowledge to no per-device mapping. We show that it is possible to get 48-77% of the component-specific mapping delay benefit or 57% of the energy benefit with a mapping that takes less than 20 seconds per FPGA. An incremental solution can start execution after a 21 ms bitstream load and converge to 77% delay benefit after 18 seconds of runtime.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132758465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Using Vivado-HLS for Structural Design: a NoC Case Study (Abstract Only)
Zhipeng Zhao, J. Hoe
{"title":"Using Vivado-HLS for Structural Design: a NoC Case Study (Abstract Only)","authors":"Zhipeng Zhao, J. Hoe","doi":"10.1145/3020078.3021772","DOIUrl":"https://doi.org/10.1145/3020078.3021772","url":null,"abstract":"There have been ample successful examples of applying Xilinx Vivado's \"function-to-module\" high-level synthesis (HLS) where the subject is algorithmic in nature. In this work, we carried out a design study to assess the effectiveness of applying Vivado-HLS in structural design. We employed Vivado-HLS to synthesize C functions corresponding to standalone network-on-chip (NoC) routers as well as complete multi-endpoint NoCs. Interestingly, we find that describing a complete NoC comprising router submodules faces fundamental difficulties not present in describing the routers as standalone modules. Ultimately, we succeeded in using Vivado-HLS to produce router and NoC modules that are exact cycle- and bit-accurate replacements of our reference RTL-based router and NoC modules. Furthermore, the routers and NoCs resulting from HLS and RTL are comparable in resource utilization and critical path delay. Our experience subjectively suggests that HLS is able to simplify the design effort even though much of the structural details had to be provided in the HLS description through a combination of coding discipline and explicit pragmas. 
The C++ source code and a more extensive description of this work can be found at http://www.ece.cmu.edu/calcm/connect_hls.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133555116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
A Parallel Bandit-Based Approach for Autotuning FPGA Compilation
Chang Xu, Gai Liu, Ritchie Zhao, Stephen Yang, Guojie Luo, Zhiru Zhang
{"title":"A Parallel Bandit-Based Approach for Autotuning FPGA Compilation","authors":"Chang Xu, Gai Liu, Ritchie Zhao, Stephen Yang, Guojie Luo, Zhiru Zhang","doi":"10.1145/3020078.3021747","DOIUrl":"https://doi.org/10.1145/3020078.3021747","url":null,"abstract":"Mainstream FPGA CAD tools provide an extensive collection of optimization options that have a significant impact on the quality of the final design. These options together create an enormous and complex design space that cannot effectively be explored by human effort alone. Instead, we propose to search this parameter space using autotuning, which is a popular approach in the compiler optimization domain. Specifically, we study the effectiveness of applying the multi-armed bandit (MAB) technique to automatically tune the options for a complete FPGA compilation flow from RTL to bitstream, including RTL/logic synthesis, technology mapping, placement, and routing. To mitigate the high runtime cost incurred by the complex FPGA implementation process, we devise an efficient parallelization scheme that enables multiple MAB-based autotuners to explore the design space simultaneously. In particular, we propose a dynamic solution space partitioning and resource allocation technique that intelligently allocates computing resources to promising search regions based on the runtime information of search quality from previous iterations. 
Experiments on academic and commercial FPGA CAD tools demonstrate promising improvements in quality and convergence rate across a variety of real-life designs.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131040699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41