Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Publications

Automatic Generation of Hardware Sandboxes for Trojan Mitigation in Systems on Chip (Abstract Only)
C. Bobda, Taylor J. L. Whitaker, C. Kamhoua, K. Kwiat, L. Njilla
DOI: 10.1145/3020078.3021774
Abstract: Component-based design is one of the preferred methods for tackling system complexity and reducing costs and time-to-market. Major parts of system design and IC production are outsourced to facilities distributed across the globe, opening the door to malicious Trojan insertion. Hardware sandboxing was introduced to overcome the shortcomings of traditional static Trojan mitigation methods, which use intensive simulation, verification, and physical tests to detect evidence of malicious components before system deployment. The number of test patterns needed to activate potential hidden Trojans with certainty is very large for complex IPs and SoCs with dozens of inputs, outputs, states, and memory blocks, which limits the effectiveness of static testing. The rationale is to spend less effort on pre-deployment testing and instead build guards around non-trusted components to catch malicious activity and prevent potential damage. While the feasibility of hardware sandboxes has been proven with case studies and real-world applications, the designs were manual, and no systematic method existed to automate the design of systems-on-chip that incorporate hardware sandboxes to provide a high level of security in embedded systems. In this work, we propose a method for the automatic generation of hardware sandboxes in systems-on-chip. Using the interface formalism of de Alfaro and Henzinger to capture the interactions among components, along with a property specification language to define non-authorized actions, sandboxes are generated and made ready for inclusion in a system-on-chip design. We leverage the concepts of composition, compatibility, and refinement to optimize resources across component boundaries and minimize resource consumption. With results on benchmarks implemented on an FPGA, we show that our approach provides a high level of security with fewer resources and no increase in delay.
Citations: 0

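A minimal software sketch of the sandbox idea described in this abstract: a guard sits between a non-trusted component and the rest of the system and forwards only interface actions that a property allows. The interface actions and the example property below are illustrative assumptions, not the authors' actual specification language or generated RTL.

```python
# Toy guard in the spirit of a hardware sandbox: forward authorized interface
# actions, block everything a property forbids. Purely illustrative.

ALLOWED_ACTIONS = {"read", "write", "idle"}

def property_ok(action, state):
    """Example property: 'write' is only authorized after a 'grant'."""
    if action == "write" and not state["granted"]:
        return False
    return action in ALLOWED_ACTIONS

def sandbox(trace):
    """Filter a component's action trace, dropping (and logging) violations."""
    state = {"granted": False}
    safe_trace = []
    for action in trace:
        if action == "grant":
            state["granted"] = True
            continue
        if property_ok(action, state):
            safe_trace.append(action)      # forward the authorized action
        else:
            print(f"blocked non-authorized action: {action}")
    return safe_trace

if __name__ == "__main__":
    # the first 'write' (no grant yet) and 'exfiltrate' are blocked
    print(sandbox(["read", "write", "grant", "write", "exfiltrate"]))
```
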
Session details: Graph Processing Applications
Nachiket Kapre
DOI: 10.1145/3257190
Citations: 0

RxRE: Throughput Optimization for High-Level Synthesis using Resource-Aware Regularity Extraction (Abstract Only)
A. Lotfi, Rajesh K. Gupta
DOI: 10.1145/3020078.3021797
Abstract: Despite considerable improvements in the quality of HLS tools, they still require the designer's manual optimizations and tweaks to generate efficient results, which negates the productivity gains of HLS design. The majority of designer interventions lead to optimizations that are global in nature, for instance, finding patterns in functions that better fit a custom-designed solution. We introduce a high-level, resource-aware regularity extraction workflow, called RxRE, that detects a class of patterns in an input program and enhances resource sharing to balance resource usage against increased throughput. RxRE automatically detects structural patterns (repeated sequences of floating-point operations) in sequential loops, selects suitable resources for them, and shares those resources across all instances of the selected patterns. RxRE reduces the hardware area required to synthesize one instance of the program, so more program replicas can fit within the fixed area budget of an FPGA. RxRE is a pre-synthesis workflow that exploits the inherent regularity of applications to achieve higher computational throughput using off-the-shelf HLS tools without any changes to the HLS flow. It uses a string-based pattern detection approach to find linear patterns across loops within the same function, and it deploys a simple but effective model to estimate the resource utilization and latency of each candidate design, avoiding the synthesis of every possible design alternative. We have implemented and evaluated RxRE using a set of C benchmarks. Synthesis results on a Xilinx Virtex FPGA show that the reduced area of the transformed programs improves the number of mapped kernels by a factor of 1.54X on average (up to 2.8X), which yields on average 1.59X (up to 2.4X) higher throughput than the Xilinx Vivado HLS solution. The current implementation has several limitations and only extracts a special case of regularity, which is the subject of ongoing optimization and study.
Citations: 0

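A minimal sketch of string-based regularity extraction in the spirit of the abstract above: loop bodies are flattened into strings of operation codes, repeated substrings ("patterns") are counted, and the most frequently repeated pattern is a candidate for a shared hardware resource. The operation costs and the selection heuristic are made-up placeholders, not RxRE's actual resource model.

```python
# Count repeated operation subsequences and pick the one whose sharing would
# save the most estimated area. Costs and example stream are illustrative.

from collections import Counter

def find_patterns(op_seq, min_len=2, max_len=4):
    """Count all contiguous operation subsequences of bounded length."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(op_seq) - n + 1):
            counts[tuple(op_seq[i:i + n])] += 1
    # keep only patterns that actually repeat
    return {p: c for p, c in counts.items() if c > 1}

def best_pattern(op_seq, cost):
    """Pick the repeated pattern with the largest estimated area saving."""
    patterns = find_patterns(op_seq)
    def saving(item):
        pattern, count = item
        area = sum(cost[op] for op in pattern)
        return area * (count - 1)          # sharing saves (count - 1) copies
    return max(patterns.items(), key=saving) if patterns else None

if __name__ == "__main__":
    # two loops flattened into one stream of floating-point op codes
    ops = ["fmul", "fadd", "fmul", "fadd", "fsub", "fmul", "fadd"]
    cost = {"fadd": 2, "fmul": 3, "fsub": 2}
    print(best_pattern(ops, cost))         # (('fmul', 'fadd'), 3)
```
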
Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only)
Tianyi Lu, S. Yin, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei
DOI: 10.1145/3020078.3021778
Abstract: High-Level Synthesis (HLS) has been widely recognized and accepted as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, massively parallel memory access demands and the extremely high cost of multi-port, single-bank memory have impeded loop pipelining performance. Based on an alternative multi-bank memory architecture, we propose a joint approach that employs memory-aware force-directed scheduling and multi-cycle memory partitioning to achieve a legitimate pipelined kernel and a valid bank mapping with less resource consumption and optimal pipelining performance. Experimental results over a variety of benchmarks show that our approach achieves optimal pipelining performance while reducing the number of independent memory banks by 55.1% on average compared with state-of-the-art approaches.
Citations: 0

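A minimal sketch of the bank-conflict check at the heart of jointly scheduling and partitioning: with an initiation interval II, an access scheduled at step s executes in time slot s mod II, and all accesses sharing a slot must map to distinct single-port banks. The cyclic bank mapping and example accesses are illustrative assumptions, not the paper's algorithm.

```python
# Check that a modulo schedule and a cyclic bank mapping are conflict free.

def bank_of(address, num_banks):
    """Cyclic partitioning: bank = address mod num_banks."""
    return address % num_banks

def conflict_free(accesses, ii, num_banks):
    """accesses: list of (schedule_step, address). True if no two accesses
    in the same modulo time slot map to the same bank."""
    used = {}                                  # (slot, bank) -> address
    for step, addr in accesses:
        slot = step % ii
        bank = bank_of(addr, num_banks)
        if (slot, bank) in used and used[(slot, bank)] != addr:
            return False
        used[(slot, bank)] = addr
    return True

if __name__ == "__main__":
    # a[i], a[i+1], a[i+2] read in one iteration, spread over two steps
    accesses = [(0, 0), (0, 1), (1, 2)]
    print(conflict_free(accesses, ii=2, num_banks=2))   # True: slot 0 uses banks 0 and 1
    print(conflict_free(accesses, ii=1, num_banks=2))   # False: slot 0 gets bank 0 twice
```
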
A New Approach to Automatic Memory Banking using Trace-Based Address Mining
Yuan Zhou, Khalid Al-Hawaj, Zhiru Zhang
DOI: 10.1145/3020078.3021734
Abstract: Recent years have seen increased deployment of FPGAs as programmable accelerators for improving the performance and energy efficiency of compute-intensive applications. A well-known "secret sauce" of achieving highly efficient FPGA acceleration is to create an application-specific memory architecture that fully exploits the vast on-chip memory bandwidth provided by the reconfigurable fabric. In particular, memory banking is widely employed when multiple parallel memory accesses are needed to meet a demanding throughput constraint. In this paper we propose TraceBanking, a novel and flexible trace-driven address mining algorithm that can automatically generate efficient memory banking schemes by analyzing a stream of memory address bits. Unlike mainstream memory partitioning techniques that are based on static compile-time analysis, TraceBanking relies only on simple source-level instrumentation to provide the memory trace of interest, without enforcing any coding restrictions. More importantly, our technique can effectively handle memory traces that exhibit either affine or non-affine access patterns, and it produces efficient banking solutions with reasonable runtime. Furthermore, TraceBanking can process a reduced memory trace with the aid of an SMT prover to verify that the resulting banking scheme is indeed conflict free. Our experiments on Xilinx FPGAs show that TraceBanking achieves competitive performance and resource usage compared to the state-of-the-art across a set of real-life benchmarks with affine memory accesses. We also perform a case study on a face detection algorithm to show that TraceBanking is capable of generating a highly area-efficient memory partitioning from a sequence of addresses without any obvious access pattern.
Citations: 28

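A minimal sketch of trace-driven address mining in the spirit of TraceBanking: search for a small set of address bits whose concatenation forms a bank index that separates every group of simultaneous accesses into distinct banks. The exhaustive bit-subset search and the example trace are simplified assumptions; the paper's actual mining algorithm and SMT-based verification are not shown.

```python
# Mine a conflict-free bit-sliced banking scheme from a memory trace.

from itertools import combinations

def bank_index(addr, bits):
    """Concatenate the selected address bits into a bank index."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | ((addr >> b) & 1)
    return idx

def conflict_free(trace, bits):
    """trace: list of per-cycle address groups that must be served in parallel."""
    return all(len({bank_index(a, bits) for a in group}) == len(group)
               for group in trace)

def mine_banking(trace, addr_width=8, max_bits=3):
    """Return the smallest bit set (fewest banks) that is conflict free."""
    for n in range(1, max_bits + 1):
        for bits in combinations(range(addr_width), n):
            if conflict_free(trace, bits):
                return bits, 2 ** n
    return None

if __name__ == "__main__":
    # two parallel reads per cycle over consecutive addresses
    trace = [(0, 1), (2, 3), (4, 5), (6, 7)]
    print(mine_banking(trace))   # ((0,), 2): address bit 0 separates each pair
```
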
A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)
Shuo Wang, Yun Liang
DOI: 10.1145/3020078.3021761
Abstract: Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration because of the advantages of dedicated hardware design, making them a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between iterations and elements, the programming hurdle of FPGAs, and the large design space. In this paper, we present a comprehensive framework that efficiently synthesizes iterative stencil algorithms on FPGAs. We leverage the OpenCL-to-FPGA tool chain to generate accelerators automatically and perform design space exploration at a high level. We bridge neighboring tiles through pipes and enable data sharing among them to improve computation efficiency. We first propose a homogeneous design with equal tile sizes, and then extend it to a heterogeneous design with different tile sizes to balance computation among tiles. Our designs exhibit a large design space in terms of tile structure, so we also develop analytical performance models to explore this space. Experiments using a wide range of stencil applications demonstrate that, on average, our homogeneous and heterogeneous implementations achieve 1.49X and 1.65X performance speedup respectively, with less hardware resource usage than the state-of-the-art.
Citations: 2

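A minimal sketch of the tiled execution model described above: the domain is split into tiles, neighboring tiles share boundary (halo) elements through a FIFO "pipe" instead of re-reading them from memory, and each tile computes a 3-point stencil locally. The tile size and the specific stencil are illustrative assumptions, not the paper's OpenCL design.

```python
# 1D 3-point average stencil computed tile by tile with halo forwarding.

from collections import deque

def stencil_tiled(data, tile_size):
    """Compute a 3-point average per tile, forwarding the right halo via a pipe."""
    out = list(data)
    pipe = deque()                      # carries the right halo of the previous tile
    n = len(data)
    for start in range(0, n, tile_size):
        end = min(start + tile_size, n)
        left_halo = pipe.popleft() if pipe else None
        for i in range(max(start, 1), min(end, n - 1)):
            left = data[i - 1] if i - 1 >= start else left_halo
            out[i] = (left + data[i] + data[i + 1]) / 3.0
        pipe.append(data[end - 1])      # forward this tile's halo to the next tile
    return out

if __name__ == "__main__":
    data = [float(x) for x in range(10)]
    reference = [data[0]] + [(data[i - 1] + data[i] + data[i + 1]) / 3
                             for i in range(1, 9)] + [data[9]]
    assert stencil_tiled(data, tile_size=4) == reference
    print(stencil_tiled(data, tile_size=4))
```
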
Accelerating Financial Market Server through Hybrid List Design (Abstract Only)
H. Fu, Conghui He, Huabin Ruan, Itay Greenspon, W. Luk, Yongkang Zheng, Junfeng Liao, Qing Zhang, Guangwen Yang
DOI: 10.1145/3020078.3021775
Abstract: The financial market server in an exchange maintains the order books and provides real-time market data feeds to traders. Low-latency processing is in great demand in financial trading. Although software solutions provide the flexibility to express algorithms in high-level programming models and to recompile quickly, they are becoming increasingly uncompetitive due to long and unpredictable response times. Field Programmable Gate Arrays (FPGAs) have proven to be an established technology for achieving low and constant latency when processing streaming packets in hardware. However, maintaining order books on FPGAs involves organizing packets into gigabytes of structured data as well as complicated routines (sort, insertion, deletion, etc.), which is extremely challenging for FPGA designs in both design methodology and memory volume. Existing FPGA designs therefore often leave the post-processing to the CPU, which largely cancels the latency gain of the network packet processing. This paper proposes a CPU-FPGA hybrid list design to accelerate financial market servers and achieve microsecond-level latencies. The paper makes four main contributions. First, we design a two-level CPU-FPGA hybrid list: a small cache list on the FPGA and a large master list on the CPU host. Both lists are sorted with different schemes; bitonic sort is applied to the cache list while a balanced tree maintains the master list. Second, to update the hybrid sorted list effectively, we derive a complete set of low-latency routines, including insertion, deletion, selection, and sorting, each taking only a few cycles. Third, we propose a non-blocking, on-demand synchronization strategy for the cache list and the master list to communicate with each other. Lastly, we integrate the hybrid list with other components, such as packet splitting, parsing, and processing, to form an industry-level financial market server. Our design is deployed in the environment of the China Financial Futures Exchange (CFFEX), demonstrating its functionality and stability by running for 600+ hours with hundreds of millions of packets per day. Compared with the existing CPU-based solution at CFFEX, our system supports identical functionality while reducing the latency from 100+ microseconds to 2 microseconds, a speedup of 50x.
Citations: 1

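A minimal software sketch of the two-level hybrid list idea: a small sorted cache (standing in for the on-FPGA list) absorbs new orders and serves best-price lookups, while overflow is spilled to a larger sorted master list (standing in for the CPU-side balanced tree). The capacity, the eviction rule, and the use of plain Python lists are illustrative assumptions.

```python
# Two-level sorted order list: small cache plus large master, ask-side prices.

import bisect

class HybridOrderList:
    def __init__(self, cache_capacity=4):
        self.cache = []          # small, always sorted ascending (on-FPGA analogue)
        self.master = []         # large, always sorted (CPU-host analogue)
        self.capacity = cache_capacity

    def insert(self, price):
        bisect.insort(self.cache, price)
        if len(self.cache) > self.capacity:
            # evict the worst (largest) cached price to the master list
            bisect.insort(self.master, self.cache.pop())

    def delete(self, price):
        for lst in (self.cache, self.master):
            i = bisect.bisect_left(lst, price)
            if i < len(lst) and lst[i] == price:
                lst.pop(i)
                return True
        return False

    def best(self):
        """Best (lowest) ask is always at the head of the cache when non-empty."""
        if self.cache:
            return self.cache[0]
        return self.master[0] if self.master else None

if __name__ == "__main__":
    book = HybridOrderList(cache_capacity=3)
    for p in [101, 99, 105, 100, 98]:
        book.insert(p)
    print(book.best())      # 98
    book.delete(98)
    print(book.best())      # 99
```
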
Session details: Panel: FPGAs in the Cloud
G. Constantinides
DOI: 10.1145/3257188
Citations: 0

120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board
B. ChethanKumarH., P. Ravi, G. Modi, Nachiket Kapre
DOI: 10.1145/3020078.3021751
Abstract: We design a 120-core, 94 MHz MIPS processor FPGA overlay, interconnected with a lightweight message-passing fabric, that fits on a Stratix V GX FPGA (5SGXEA7N2F45C2). We use silicon-tested RTL source code for the microAptiv MIPS processor made available under the Imagination Technologies Academic Program. We augment the processor with custom instruction extensions for moving data between cores via explicit message passing, and we support these instructions with a communication scratchpad optimized for high-throughput injection of network traffic. We also demonstrate an end-to-end proof-of-concept flow that compiles C code containing message-passing workloads built on MIPS UDIs (user-defined instructions), and we stress-test the overlay with synthetic workloads.
Citations: 12

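A behavioral sketch (not the actual MIPS UDI interface) of explicit message passing through per-core communication scratchpads: each core owns an inbox FIFO, a send operation pushes a word into the destination core's inbox, and a receive operation pops the oldest word. The core count, API names, and message format are illustrative assumptions.

```python
# Toy model of a message-passing fabric with per-core inbox FIFOs.

from collections import deque

class Fabric:
    def __init__(self, num_cores):
        self.inbox = [deque() for _ in range(num_cores)]

    def send(self, dst, word):
        """Stand-in for the message-send custom instruction."""
        self.inbox[dst].append(word)

    def recv(self, core):
        """Stand-in for the message-receive custom instruction (None if empty)."""
        return self.inbox[core].popleft() if self.inbox[core] else None

if __name__ == "__main__":
    fabric = Fabric(num_cores=4)
    # core 0 scatters one word to each neighbour, which echoes it back
    for dst in (1, 2, 3):
        fabric.send(dst, 100 + dst)
    for core in (1, 2, 3):
        fabric.send(0, fabric.recv(core))
    print([fabric.recv(0) for _ in range(3)])   # [101, 102, 103]
```
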
Towards Efficient Design Space Exploration of FPGA-based Accelerators for Streaming HPC Applications (Abstract Only)
Mostafa Koraei, Magnus Jahre, S. O. Fatemi
DOI: 10.1145/3020078.3021767
Abstract: Streaming HPC applications are data intensive and have widespread use in various fields (e.g., computational fluid dynamics and bioinformatics). These applications consist of different processing kernels, each performing a specific computation on its input data; the objective of the optimization process is to maximize performance. FPGAs show great promise for accelerating streaming applications because of their low power consumption combined with high theoretical compute capability. However, mapping an HPC application to a reconfigurable fabric is a challenging task, exacerbated by the need to temporally partition computational kernels when application requirements exceed resource availability. In this poster, we present work towards a novel design methodology for exploring the design space of streaming HPC applications on FPGAs. We assume the designer can represent the target application as a Synchronous Data Flow Graph (SDFG), in which nodes are compute kernels and edges signify data flow between kernels. The designer also specifies the problem size of the application and the volume of raw data at each memory source of the SDFG. The output of our method is a set of FPGA configurations, each containing one or more SDFG nodes. The methodology consists of three main steps. In Step 1, we enumerate the valid partitions and the base configurations. In Step 2, we find the feasible base configurations given the available hardware resources and a library of processing-kernel implementations. Finally, in Step 3, we use a performance model to calculate the execution time of each partition. Our current assumption is that representing the SDFG at a coarse granularity is advantageous, since it enables exhaustive exploration of the design space for practical applications. This approach has yielded promising preliminary results: in one case, the temporal configuration selected by our methodology outperformed the direct mapping by 3X.
Citations: 0

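A minimal sketch of the three-step exploration described above: enumerate contiguous partitions of the SDFG kernels into temporal FPGA configurations, keep only the partitions whose configurations fit the area budget, and score each survivor with a simple performance model. The kernel areas and latencies, the contiguity restriction, and the reconfiguration cost are all illustrative assumptions, not the authors' model.

```python
# Exhaustive exploration of temporal partitions of a kernel pipeline.

from itertools import combinations

def contiguous_partitions(n):
    """All ways to cut a kernel pipeline of length n into contiguous groups."""
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [list(range(bounds[i], bounds[i + 1]))
                   for i in range(len(bounds) - 1)]

def explore(kernels, area_budget, reconfig_cost):
    """kernels: list of (area, exec_time). Returns (best_time, best_partition)."""
    best = None
    for part in contiguous_partitions(len(kernels)):
        if any(sum(kernels[i][0] for i in group) > area_budget for group in part):
            continue                       # this configuration does not fit
        # kernels in one configuration stream concurrently (max of their times);
        # configurations run one after another, paying a reconfiguration penalty
        time = sum(max(kernels[i][1] for i in group) for group in part)
        time += reconfig_cost * (len(part) - 1)
        if best is None or time < best[0]:
            best = (time, part)
    return best

if __name__ == "__main__":
    kernels = [(40, 10.0), (30, 8.0), (50, 12.0)]      # (area, time) per kernel
    print(explore(kernels, area_budget=80, reconfig_cost=1.0))
```
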