2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM): Latest Papers, Page 4

Accelerating Large-Scale Graph Analytics with FPGA and HMC

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.58

Soroosh Khoram, Jialiang Zhang, Maxwell Strange, J. Li

Abstract: Graph analytics that explores the relationship among interconnected entities is becoming increasingly important due to its broad applicability from machine learning to social science. However, one major challenge for graph processing systems is the irregular data access pattern of graph computation, which can significantly degrade performance. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are, as a result, generally not effective for massive-scale sparse graphs from the real world due to their complexity and irregularity. To address the performance issues in large-scale graph analytics, we combine the emerging Hybrid Memory Cube (HMC) with a modern FPGA in order to achieve exceptional random access performance without any loss of flexibility or efficiency in computation. In particular, we develop collaborative software/hardware techniques to perform a level-synchronized breadth-first search (BFS) on the FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that fully exploits the platform's capability to improve data locality and memory access efficiency. For each input graph, this algorithm provides an efficient data layout that allows the FPGA to coalesce memory requests into the largest possible HMC payload requests so that the number of memory requests, which is the primary factor in runtime, can be minimized. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by adding a merging unit. The merging unit takes the best advantage of the increased data locality resulting from graph clustering. We evaluated the performance of our BFS implementation using the AC-510 development kit from Micron over a set of benchmarks from a wide range of applications. We observed that the combination of the clustering algorithm and the merging hardware achieved an average 2.8× performance improvement compared to the latest FPGA-HMC based graph processing system.
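To make the terms in this abstract concrete, the following is a minimal software sketch of a level-synchronized BFS, plus a toy count of how many fixed-size memory blocks a set of vertex IDs touches. It is not the paper's FPGA design or its clustering algorithm: the function names, the dictionary-based adjacency list, and the `block_size` parameter are all assumptions chosen for illustration.

```python
def level_synchronized_bfs(adjacency, source):
    """Return {vertex: level}, expanding one whole frontier per iteration."""
    levels = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Every vertex of the current level is processed before the next level starts.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in levels:
                    levels[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return levels


def count_block_requests(vertex_ids, block_size):
    """Toy model (assumption): one memory request per distinct aligned block touched."""
    return len({v // block_size for v in vertex_ids})


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = level_synchronized_bfs(graph, 0)

# The same four neighbours, before and after a hypothetical relabelling
# that places them next to each other in memory.
scattered = count_block_requests([0, 17, 34, 51], block_size=16)
clustered = count_block_requests([0, 1, 2, 3], block_size=16)
print(levels, scattered, clustered)
```

In this toy model the relabelled neighbours fall into a single block instead of four, which is the kind of locality gain the abstract attributes to clustering and request coalescing.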

Citations: 1

Customizing Neural Networks for Efficient FPGA Implementation

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.43

Mohammad Samragh, M. Ghasemzadeh, F. Koushanfar

Citations: 43

Improving the Accuracy of Arctan for Face Detection

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.48

Youngsoo Kim, Hossein Shahdoost, Shrikant S. Jadhav, C. Gloster

Citations: 3

Megrez: Parallelizing FPGA Routing with Strictly-Ordered Partitioning

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.18

Minghua Shen, Guojie Luo

Citations: 0

Relocating Encrypted Partial Bitstreams by Advance Task Address Loading

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.50

Adewale Adetomi, Godwin Enemali, T. Arslan

Citations: 4

Terabyte Sort on FPGA-Accelerated Flash Storage

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.53

S. Jun, Shuotao Xu, Arvind

Abstract: Sorting is one of the most fundamental and useful applications in computer science, and continues to be an important tool in analyzing large datasets. An important and challenging subclass of sorting problems involves sorting terabyte-scale datasets with hundreds of billions of records. The conventional method of sorting such large amounts of data is to distribute the data and computation over a cluster of machines. Such solutions can be fast but are often expensive and power-hungry. In this paper, we propose a solution based on flash storage connected to a collection of FPGA-based sorting accelerators that perform large-scale merge-sort in storage. The accelerators include highly efficient sorting networks and merge trees that use bitonic sorting to emit multiple sorted values every cycle. We show that by appropriate use of accelerators we can remove all the computation bottlenecks so that the end-to-end sorting performance is limited only by the flash storage bandwidth. We demonstrate that our flash-based system matches the performance of existing distributed-cluster solutions of much larger scale. More importantly, our prototype is able to show almost twice the power efficiency compared to the existing Joulesort record holder. An optimized system with less wasteful components is projected to be four times more efficient compared to the current record holder, sorting over 200,000 records per joule of energy.
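The bitonic sorting mentioned in this abstract can be illustrated with a short software sketch. This is a plain recursive bitonic sorter and a bitonic merge of two sorted runs, written sequentially; it is not the authors' hardware merge tree, and the function names and the power-of-two length restriction are assumptions of the sketch. In hardware, the compare-and-swap steps inside each stage are independent and can run in parallel, which is what allows several sorted values to be produced per cycle.

```python
def _bitonic_merge(seq, ascending):
    """Sort a bitonic sequence whose length is a power of two."""
    n = len(seq)
    if n == 1:
        return list(seq)
    seq = list(seq)
    half = n // 2
    # One stage of the network: these compare-and-swap operations are independent.
    for i in range(half):
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return _bitonic_merge(seq[:half], ascending) + _bitonic_merge(seq[half:], ascending)


def bitonic_sort(values, ascending=True):
    """Sort a list whose length is a power of two using a bitonic network."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    first = bitonic_sort(values[:half], True)     # ascending half
    second = bitonic_sort(values[half:], False)   # descending half
    return _bitonic_merge(first + second, ascending)


def merge_sorted_runs(run_a, run_b):
    """Merge two ascending runs by forming a bitonic sequence (a + reversed b)."""
    combined = list(run_a) + list(run_b)[::-1]
    if len(combined) & (len(combined) - 1):
        raise ValueError("combined length must be a power of two")
    return _bitonic_merge(combined, True)


print(bitonic_sort([7, 3, 9, 1, 8, 2, 6, 4]))
print(merge_sorted_runs([1, 4, 6, 9], [2, 3, 7, 8]))
```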

Citations: 30

A Real-Time Embedded FPGA Processor for a Stand-Alone Dual-Mode Assistive Device

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.55

A. Jafari, Maysam Ghovanloo, T. Mohsenin

Citations: 2

Communication-Aware MCMC Method for Big Data Applications on FPGAs

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.9

Shuanglong Liu, C. Bouganis

Abstract: Markov Chain Monte Carlo (MCMC) based methods have been the main tool for Bayesian inference for some years now, and recently they have found increasing application in modern statistics and machine learning. Nevertheless, with the availability of large datasets and the increasing complexity of Bayesian models, MCMC methods are becoming prohibitively expensive for real-world problems. At the heart of these methods lies the computation of likelihood functions, which requires access to all input data points in each iteration of the method. Current approaches, based on data subsampling, aim to accelerate these algorithms by reducing the number of data points used for likelihood evaluations at each MCMC iteration. However, existing work does not consider the properties of modern memory hierarchies, but treats the memory as one monolithic storage space. This paper proposes a communication-aware MCMC framework that takes into account the underlying performance of the memory subsystem. The framework is based on a novel subsampling algorithm that utilises an unbiased likelihood estimator based on Probability Proportional-to-Size (PPS) sampling, allowing information on the performance of the memory system to be taken into account during the sampling stage. The proposed MCMC sampler is mapped to an FPGA device and its performance is evaluated using the Bayesian logistic regression model on the MNIST dataset. The proposed system achieves a 3.37x speedup over a highly optimised traditional FPGA design; as a result, the risk in the estimates based on the generated samples is largely decreased.
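The idea of an unbiased estimator built on non-uniform sampling can be sketched with the textbook Hansen-Hurwitz form: draw indices with replacement using probabilities p_i and average term_i / p_i. This is a generic illustration, not the paper's estimator or sampler. The function name is hypothetical, and the weights here are arbitrary stand-ins for whatever "size" measure (for example, one that favours data in faster memory) a real system would use.

```python
import math
import random


def pps_estimate_sum(terms, weights, num_draws, rng):
    """Estimate sum(terms) from num_draws samples drawn with probability
    proportional to weights (with replacement), Hansen-Hurwitz style."""
    total_weight = sum(weights)
    probabilities = [w / total_weight for w in weights]
    indices = rng.choices(range(len(terms)), weights=probabilities, k=num_draws)
    # Dividing each sampled term by its selection probability removes the bias
    # introduced by sampling some items more often than others.
    return sum(terms[i] / probabilities[i] for i in indices) / num_draws


rng = random.Random(0)

# When the weights are exactly proportional to the terms, every draw
# contributes the same value, so the estimate equals the true sum.
proportional = pps_estimate_sum([1.0, 2.0, 3.0, 4.0], [1, 2, 3, 4], 5, rng)

# With other weights the estimate is noisy but centred on the true sum (10).
skewed = pps_estimate_sum([5.0, 1.0, 3.0, 1.0], [1, 1, 1, 1], 20000, rng)
print(proportional, skewed)
```

Every item with a non-zero term must have a non-zero weight, otherwise its contribution can never be sampled and the estimate becomes biased.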

Citations: 7

K-Mer Counting Using Bloom Filters with an FPGA-Attached HMC

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.23

Nathaniel McVicar, Chih-Ching Lin, S. Hauck

Abstract: As FPGAs are integrated into the cloud, they become useful in a number of areas where they were not traditionally considered, such as processing genomics data. For many genomics applications, such as K-mer counting, the off-chip DRAM (and sometimes SRAM) memory subsystems provided by most FPGA boards for high capacity storage are not efficient. Recently new styles of memory have been developed, though their role in reconfigurable computing systems can be unclear. One of the challenges these memory systems present to FPGA designers is identifying how they can be used in current systems, and what new applications become possible. In this paper we describe how and why K-mer counting is one such use for an FPGA-attached Hybrid Memory Cube (HMC). The HMC's high random-access rate is ideal for large Bloom filters, an efficient structure for checking membership in a set, or even counting occurrences. Our HMC-based counting Bloom filter, useful in a bioinformatics context, achieves a speedup of 13x over traditional FPGA-attached DRAM and 9.31x to 17.6x over multi-core, multi-threaded software on our host system.
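A counting Bloom filter for K-mers, as described in this abstract, can be sketched in a few lines of software. The class below is a generic illustration rather than the authors' HMC design: the class and function names, the filter size, the number of hash functions, and the use of salted BLAKE2b hashes are all assumptions. Each lookup touches a handful of effectively random counters, which is why a memory with a high random-access rate suits this structure.

```python
import hashlib


class CountingBloomFilter:
    """Approximate multiset: counts may be overestimated, never underestimated."""

    def __init__(self, num_counters, num_hashes):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, item):
        # Derive num_hashes slot indices by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % len(self.counters)

    def add(self, item):
        for slot in self._slots(item):
            self.counters[slot] += 1

    def count(self, item):
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counters[slot] for slot in self._slots(item))


def count_kmers(sequence, k, bloom):
    """Insert every length-k substring of sequence into the filter."""
    for start in range(len(sequence) - k + 1):
        bloom.add(sequence[start:start + k])
    return bloom


bloom = count_kmers("ACGTACGT", 4, CountingBloomFilter(num_counters=1 << 16, num_hashes=3))
print(bloom.count("ACGT"), bloom.count("CGTA"))
```

Here "ACGT" occurs twice and "CGTA" once in the input, so the reported counts are at least 2 and 1 respectively; hash collisions can only push them higher.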

Citations: 27

Scalable Network Function Virtualization for Heterogeneous Middleboxes

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.24

Xuzhi Zhang, Xiaozhe Shao, George Provelengios, Naveen Kumar Dumpala, Lixin Gao, R. Tessier

Abstract: Over the past decade, a wide-ranging collection of network functions in middleboxes has been used to accommodate the needs of network users. Although the use of general-purpose processors has been shown to be feasible for this purpose, the serial nature of microprocessors limits network functional virtualization (NFV) performance. In this paper, we describe a new heterogeneous hardware-software approach to NFV construction that provides scalability and programmability, while supporting significant hardware-level parallelism and reconfiguration. Our computing platform uses both field-programmable gate arrays (FPGA) and microprocessors to implement numerous NFV operations that can be dynamically customized to specific network flow needs. As the number of required functions and their characteristics change, the hardware in the FPGA is automatically reconfigured to support the updated requirements. Traffic management and hardware reconfiguration functions are performed by a global coordinator which allows for the rapid sharing of middlebox state and continuous evaluation of network function needs. To evaluate our approach, a series of software tools and NFV modules have been implemented. Our system is shown to be scalable for collections of network functions exceeding one million shared states.

Citations: 6