Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles, Page 5

A Machine Learning Framework for FPGA Placement (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021765

G. Grewal, S. Areibi, Matthew Westrik, Ziad Abuowaimer, Betty Zhao

Abstract: Many of the key stages in the traditional FPGA CAD flow require substantial amounts of computational effort. Moreover, due to limited overlap among individual stages, poor decisions made in earlier stages will often adversely affect the quality of result in later stages. To help address these issues, we propose a machine-learning framework that uses training data to learn the underlying relationship between circuits and the CAD algorithms used to map them onto a particular FPGA device. The framework does not solve the problem at an arbitrary stage in the flow. Rather, it seeks to assist the designer or the tool to solve the problem. The potential capabilities of the framework are demonstrated by applying it to the placement stage, where it is used to recommend the best placement flow for circuits with different features, and to predict placement and routing results without actually performing placement and routing. Results show that when trained using 372 challenging benchmarks for a Xilinx UltraScale device, the classification models employed in the framework achieve average accuracies in the range 92% to 95%, while the regression models have an average error rate in the range of 0.5% to 3.6%.

Citations: 1

CPU-FPGA Co-Optimization for Big Data Applications: A Case Study of In-Memory Samtool Sorting (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021787

J. Cong, Zhenman Fang, Muhuan Huang, Libo Wang, Di Wu

Abstract: To efficiently process a tremendous amount of data, today's big data applications tend to distribute the datasets into multiple partitions, such that each partition can be fit into memory and be processed by a separate core/server in parallel. Meanwhile, due to the limited scaling of general-purpose CPUs, FPGAs have emerged as an attractive alternative to accelerate big data applications due to their low power, high performance and energy efficiency. In this paper we aim to answer one key question: How should the multicore CPU and FPGA coordinate together to optimize the performance of big data applications? To address the above question, we conduct a step-by-step case study to perform CPU and FPGA co-optimization for in-memory Samtool sorting in genomic data processing, which is one of the most important big data applications for personalized healthcare. First, to accelerate the time-consuming compression algorithm and its associated cyclic redundancy check (CRC) in Samtool sorting, we implement a portable and maintainable FPGA accelerator using high-level synthesis (HLS). Although FPGAs are traditionally well-known to be suitable for compression and CRC, we find that a straightforward integration of this FPGA accelerator into the multi-threaded Samtool sorting only achieves marginal system throughput improvement over the software baseline running on a 12-core CPU. To improve system performance, we propose a dataflow execution model to effectively orchestrate the computation between the multi-threaded CPU and FPGA. Experimental results show that our co-optimized CPU-FPGA system achieves a 2.6x speedup for in-memory Samtool sorting.
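The abstract above does not spell out its dataflow execution model, so the following is only a rough software sketch of the general idea: sorting runs on several CPU worker threads, and sorted blocks are streamed through a queue to a single downstream stage that performs compression and CRC. That stage stands in for the FPGA accelerator and simply calls `zlib` on the CPU. The function name `run_pipeline`, the queue layout, and the worker count are all hypothetical and chosen for illustration.

```python
import queue
import threading
import zlib


def run_pipeline(blocks, n_workers=2):
    """Sort blocks on CPU worker threads, then compress + CRC each block
    on one downstream stage (a software stand-in for an accelerator)."""
    todo = queue.Queue()      # unsorted blocks waiting for a CPU worker
    sorted_q = queue.Queue()  # sorted blocks waiting for the accelerator stage
    results = {}              # block index -> (compressed bytes, CRC32)

    def sorter():
        while True:
            item = todo.get()
            if item is None:  # sentinel: no more work for this worker
                break
            idx, data = item
            sorted_q.put((idx, bytes(sorted(data))))

    def accelerator():
        # Assumption: in a real system this stage would hand the block to
        # the FPGA; here it just runs zlib on the host.
        while True:
            item = sorted_q.get()
            if item is None:
                break
            idx, data = item
            results[idx] = (zlib.compress(data), zlib.crc32(data))

    workers = [threading.Thread(target=sorter) for _ in range(n_workers)]
    acc = threading.Thread(target=accelerator)
    for t in workers:
        t.start()
    acc.start()

    for idx, block in enumerate(blocks):
        todo.put((idx, block))
    for _ in workers:
        todo.put(None)        # one sentinel per sorter thread
    for t in workers:
        t.join()
    sorted_q.put(None)        # all sorters done: stop the accelerator stage
    acc.join()
    return [results[i] for i in range(len(blocks))]
```

Because the stages are decoupled by queues, the CPU threads can keep sorting while the downstream stage is busy, which is the kind of overlap a dataflow model is meant to expose. The actual scheduling policy used in the paper may differ.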

Citations: 3

ASAP: Accelerated Short Read Alignment on Programmable Hardware (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021796

Subho Sankar Banerjee, Mohamed El-Hadedy, Jong Bin Lim, Daniel Chen, Z. Kalbarczyk, Deming Chen, Ravishankar K. Iyer

Abstract: The proliferation of high-throughput sequencing machines allows for the rapid generation of billions of short nucleotide fragments in a short period. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This poster explores the use of hardware acceleration to significantly improve the runtime of short-read alignment (SRA), a crucial step in pre-processing sequenced genomes. It presents the design and implementation of ASAP, an accelerator for computing Levenshtein distance (LD) in the context of the SRA problem. LD computation is a prominent underlying mathematical kernel that is common to a large number of SRA tools (e.g., BLAST, BWA, SNAP) and is responsible for 50-70% of their runtime. The algorithms mentioned above calculate the exact value of LD between nucleotide strings but use it only to build a total ordering (an ordered list) of the most likely points of origin in the genome. ASAP computes an approximation of LD by encoding computation in the propagation delay of circuit elements. This approximation is calculated in an accelerated fashion in hardware and preserves the original total ordering of LDs produced by the traditional algorithms. This computation is performed by constructing circuits that comprise the recursive definition of the LD computation and measuring the propagation delay of a signal entering and leaving the circuit. Additionally, ASAP can explore large portions of the search space (substrings of the strings being compared) within one clock cycle, and ignore parts of the search space that do not contribute to an answer. Our design is implemented on an Altera Stratix V FPGA in an IBM POWER8 system using the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200x faster (median measurement) than the equivalent C implementation of the kernel running on the host processor and 2.2x faster for an end-to-end alignment tool for 120-150bp short-read sequences.
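For readers unfamiliar with the kernel, the recursive definition of Levenshtein distance mentioned in the abstract above can be sketched in software with the textbook dynamic-programming recurrence. This is a plain CPU reference, not ASAP's delay-based circuit; the helper `rank_candidates` is a hypothetical name added to illustrate the point that aligners only need the ordering of candidates, not the exact distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def rank_candidates(read: str, candidates: list[str]) -> list[str]:
    """Order candidate reference substrings from most to least similar."""
    return sorted(candidates, key=lambda c: levenshtein(read, c))


print(levenshtein("kitten", "sitting"))                   # 3
print(rank_candidates("ACGT", ["TTTT", "ACGA", "ACGT"]))  # ['ACGT', 'ACGA', 'TTTT']
```

Any strictly increasing function of the distance would produce the same ranking, which is why an approximation that preserves the ordering can stand in for the exact value.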

Citations: 1

Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021801

S. Yin, Dajiang Liu, Lifeng Sun, Xinhan Lin, Leibo Liu, Shaojun Wei

Abstract: Data flow graph (DFG) mapping is critical for compiling to spatial programmable architectures, where compilation time is a key factor for both time-to-market requirements and mapping success rate. Inspired by the great progress made in tree-search games using deep neural networks, we propose a framework for learning convolutional neural networks for mapping DFGs onto spatial programmable architectures. Considering that mapping is a process from source to target, we present a dual-input neural network capturing features from both the DFGs in applications and the Process Element Array (PEA) in spatial programmable architectures. In order to train the neural network, algorithms are designed to automatically generate a data set from PEA intermediate states of preprocessed DFGs. Finally, we demonstrate that the trained neural network identifies mapping quality with high accuracy and that our proposed mapping approach is competitive with state-of-the-art DFG mapping algorithms in performance while the compilation time is greatly reduced.

Citations: 1

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021753

Nitish Kumar Srivastava, Steve Dai, R. Manohar, Zhiru Zhang

Citations: 14

The Role of FPGAs in Deep Learning

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3030013

A. Ling, J. Anderson

Citations: 11

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021754

Steve Dai, Ritchie Zhao, Gai Liu, S. Srinath, Udit Gupta, C. Batten, Zhiru Zhang

Citations: 30

fpgaConvNet: Automated Mapping of Convolutional Neural Networks on FPGAs (Abstract Only)

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021791

Stylianos I. Venieris, C. Bouganis

Abstract: In recent years, Convolutional Neural Networks (ConvNets) have become the state-of-the-art in several Artificial Intelligence tasks. Across the range of applications, the performance needs vary significantly, from high-throughput image recognition to the very low-latency requirements of autonomous cars. In this context, FPGAs can provide a potential platform that can be optimally configured based on the different performance needs. However, the complexity of ConvNet models keeps increasing, leading to a large design space. This work presents fpgaConvNet, an end-to-end framework for mapping ConvNets on FPGAs. The proposed framework employs an automated design methodology based on the Synchronous Dataflow (SDF) paradigm and defines a set of transformations on the SDF graph in order to efficiently explore the architectural design space. By treating high-throughput and latency-critical systems separately, the presented tool is able to efficiently explore the architectural design space and to generate hardware designs from high-level ConvNet specifications, explicitly optimised for the performance metric of interest. Overall our framework yields designs that improve the performance density and the performance efficiency by up to 6× and 4.49× respectively over existing highly-optimised FPGA, DSP and embedded GPU work.

Citations: 41

Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3020078.3021732

Minghua Shen, Guojie Luo

Abstract: FPGAs are increasingly popular as application-specific accelerators because they lead to a good balance between flexibility and energy efficiency, compared to CPUs and ASICs. However, the long routing time imposes a barrier on FPGA computing, which significantly hinders the design productivity. Existing attempts at parallelizing FPGA routing either do not fully exploit the parallelism or suffer from an excessive quality loss. Massive parallelism using GPUs has the potential to solve this issue but faces non-trivial challenges. To cope with these challenges, this work presents Corolla, a GPU-accelerated FPGA routing method. Corolla enables applying the GPU-friendly shortest path algorithm in FPGA routing, leveraging the idea of problem size reduction by limiting the search in routing subgraphs. We maintain the convergence after problem size reduction using the dynamic expansion of the routing resource subgraphs. In addition, Corolla explores the fine-grained single-net parallelism and proposes a hybrid approach to combine the static and dynamic parallelism on GPU. To explore the coarse-grained multi-net parallelism, Corolla proposes an effective method to parallelize multi-net routing while preserving the equivalent routing results as the original single-net routing. Experimental results show that Corolla achieves an average of 18.72x speedup on GPU with a tolerable loss in the routing quality and sustains a scalable speedup on large-scale routing graphs. To our knowledge, this is the first work to demonstrate the effectiveness of GPU-accelerated FPGA routing.
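The idea of restricting a shortest-path search to a routing subgraph and enlarging that subgraph when needed can be illustrated with a small CPU sketch. Several things here are assumptions rather than details from the abstract above: the subgraph is taken to be a bounding box around the source and sink, the shortest-path step is a Bellman-Ford-style edge relaxation (a pattern that parallelizes naturally, though the abstract does not name the algorithm), and expansion is triggered simply when the sink is unreachable. The names `bellman_ford`, `route_with_expansion`, and `max_margin` are hypothetical.

```python
import math


def bellman_ford(nodes, edges, source):
    """Edge-relaxation shortest paths, ignoring edges that leave `nodes`."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            # Only relax edges whose endpoints both lie inside the subgraph.
            if u in dist and v in dist and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # converged early
    return dist


def route_with_expansion(coords, edges, source, sink, max_margin=8):
    """Route source -> sink inside a bounding box, growing the box on failure.

    Returns (path cost, margin used); cost is math.inf if no route is found.
    """
    (sx, sy), (tx, ty) = coords[source], coords[sink]
    for margin in range(max_margin + 1):
        x0, x1 = min(sx, tx) - margin, max(sx, tx) + margin
        y0, y1 = min(sy, ty) - margin, max(sy, ty) + margin
        sub = {n for n, (x, y) in coords.items()
               if x0 <= x <= x1 and y0 <= y <= y1}
        dist = bellman_ford(sub, edges, source)
        if dist[sink] < math.inf:
            return dist[sink], margin
    return math.inf, max_margin


# Tiny example: the direct wire b -> c is missing, so the tight box fails
# and the route is found only after expanding by one row.
coords = {"a": (0, 0), "b": (1, 0), "c": (2, 0),
          "d": (0, 1), "e": (1, 1), "f": (2, 1)}
pairs = [("a", "b"), ("a", "d"), ("d", "e"), ("e", "f"), ("f", "c")]
edges = [(u, v, 1.0) for u, v in pairs] + [(v, u, 1.0) for u, v in pairs]
print(route_with_expansion(coords, edges, "a", "c"))  # (4.0, 1)
```

A real router would also account for congestion costs and multiple nets; this sketch shows only the shrink-then-expand search pattern.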

Citations: 23

Session details: Machine Learning

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2017-02-22 DOI: 10.1145/3257184

J. Cong

Citations: 0