Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles, Page 6

Machine-Learning driven Auto-Tuning of High-Level Synthesis for FPGAs (Abstract Only)

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847297

Li Ting, Harri Wijaya, Nachiket Kapre

Abstract: Modern High-Level Synthesis (HLS) tools allow C descriptions of computation to be compiled to optimized low-level RTL, but expose a range of manual optimization options, compiler directives and tweaks to the developer. In many instances, this results in a tedious iterative development flow to meet resource, timing and power constraints which defeats the purpose of adopting the high-level abstraction in the first place. In this paper, we show how to use Machine Learning routines to predict the impact of HLS compiler optimization on final FPGA utilization metrics. We compile multiple variations of the high-level C code across a range of compiler optimizations and pragmas to generate a large design space of candidate solutions. On the MachSuite benchmarks, we are able to train a linear regression model to predict resources, latency and frequency metrics with high accuracy (R² > 0.75). We expect such developer-assistance tools to (1) offer insight to drive manual selection of suitable directive combinations, and (2) automate the process of selecting directives in the complex design space of modern HLS design.
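The prediction step described in this abstract can be illustrated with a minimal sketch, assuming a single hypothetical feature (a loop unroll factor) and a made-up linear relation to LUT count. The data is synthetic and the names `fit_ols` and `r_squared` are illustrative; the abstract does not state which features, tools, or fitting library the authors used.

```python
import random

def fit_ols(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical design points: unroll factor -> LUT count.
# The relation (500 + 120 * unroll, plus Gaussian noise) is invented for illustration.
unroll = [1, 2, 4, 8, 16, 32] * 5
luts = [500 + 120 * u + rng.gauss(0, 150) for u in unroll]

slope, intercept = fit_ols(unroll, luts)
preds = [slope * u + intercept for u in unroll]
r2 = r_squared(luts, preds)
print(f"LUTs ~= {slope:.1f} * unroll + {intercept:.1f}, R^2 = {r2:.3f}")
```

A real flow would have one row per compiled design variant and several features per row (one per directive), with a separate model for each predicted metric.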

Citations: 0

Session details: Technical Session 5: Architecture and Tools

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/3250864

Jonathan Rose

Citations: 0

Session details: Technical Session 7: High-level Synthesis and Tools

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/3250866

David Biancolin

Citations: 0

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847276

Naveen Suda, V. Chandra, Ganesh S. Dasika, Abinash Mohanty, Yufei Ma, S. Vrudhula, Jae-sun Seo, Yu Cao

Abstract: Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications such as image classification, face detection, and video analysis, because of their ability to train and classify with high accuracy. Due to multiple convolution and fully-connected layers that are compute-/memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as fast turn-around-time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, typically implemented generic accelerators agnostic to the CNN configuration, where the reconfigurable capabilities of FPGAs are not fully leveraged to maximize the overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA resource constraints such as on-chip memory, registers, computational resources and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for convolution operation, and 117.8 GOPS for the entire VGG network that performs ImageNet classification on P395-D8 board.
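To put the quoted GOPS figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the usual convention of counting one multiply-accumulate as two operations, and a layer shaped like the first convolution of VGG-16 (3x3 kernels, 3 input channels, 64 output channels, 224x224 output map). The abstract does not give the layer shape or the counting convention, so both are assumptions.

```python
def conv_ops(k, c_in, c_out, h_out, w_out):
    """Operations in one convolution layer, counting each multiply-accumulate as 2 ops."""
    macs = k * k * c_in * c_out * h_out * w_out
    return 2 * macs

# Assumed layer shape (VGG-16-style first convolution); not stated in the abstract.
ops = conv_ops(k=3, c_in=3, c_out=64, h_out=224, w_out=224)

PEAK_CONV_GOPS = 136.5  # peak convolution throughput quoted in the abstract

# Lower bound on layer time if the accelerator sustained its peak rate throughout.
time_ms = ops / (PEAK_CONV_GOPS * 1e9) * 1e3
print(f"{ops / 1e6:.1f} Mops, at least {time_ms:.2f} ms per image at peak rate")
```

Sustained throughput on a real layer is typically below peak, so the printed time is a lower bound under these assumptions, not a measured result.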

Citations: 479

Just In Time Assembly of Accelerators

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847341

Sen Ma, Zeyad Aklah, D. Andrews

Abstract: Despite the significant advancements that have been made in High Level Synthesis, the reconfigurable computing community has failed at getting programmers to use Field Programmable Gate Arrays (FPGAs). Existing barriers that prevent programmers from using FPGAs include the need to work within vendor-specific CAD tools, knowledge of hardware programming models, and the requirement to pass each design through synthesis, place and route. In this paper we present a new approach that takes these barriers out of the design flows for programmers. Synthesis is eliminated from the application programmer's path by becoming part of the initial coding process when creating the programming patterns that define a Domain Specific Language (DSL). Programmers see no difference between creating software or hardware functionality when using the DSL. A run time interpreter is introduced that assembles hardware accelerators within a configurable tile array of partially reconfigurable slots at run time. Initial results show the approach allows hardware accelerators to be compiled 100x faster compared to the time required to synthesize the same functionality. Initial performance results further show a compilation/interpretation approach can achieve approximately equivalent performance for matrix operations and filtering compared to synthesizing a custom accelerator.

Citations: 18

Using Stochastic Computing to Reduce the Hardware Requirements for a Restricted Boltzmann Machine Classifier

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847340

Bingzhe Li, M. Najafi, D. Lilja

Abstract: Artificial neural networks are powerful computational systems with interconnected neurons. Generally, these networks have a very large number of computation nodes, which forces the designer to use software-based implementations. However, software-based implementations are offline and not suitable for portable or real-time applications. Experiments show that, compared with software-based implementations, FPGA-based systems can greatly reduce the computation time, making them suitable for real-time situations and portable applications. However, the FPGA implementation of neural networks with a large number of nodes is still a challenging task. In this paper, we exploit stochastic bit streams in the Restricted Boltzmann Machine (RBM) to implement the classification of the RBM handwritten digit recognition application completely on an FPGA. We use finite state machine-based (FSM) stochastic circuits to implement the required sigmoid function and use the novel stochastic computing approach to perform all large matrix multiplications. Experimental results show that the proposed stochastic architecture has much more potential for tolerating faults while requiring much less hardware compared to the currently un-implementable deterministic binary approach when the RBM consists of a large number of neurons. Exploiting the features of stochastic circuits, our implementation achieves much better performance than a software-based approach.
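For readers unfamiliar with stochastic bit streams, the following minimal sketch shows the textbook unipolar encoding, in which a value in [0, 1] is the probability that a bit is 1 and a single AND gate multiplies two independent streams. This is a generic illustration of the idea, not the paper's architecture: the abstract does not say which encoding is used, and the FSM-based sigmoid and the matrix-multiplication scheme are not modeled here. The function names are illustrative.

```python
import random

def to_stream(p, n_bits, rng):
    """Encode p in [0, 1] as a bit stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]

def from_stream(bits):
    """Decode a unipolar stream: the fraction of 1 bits estimates the value."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent unipolar streams estimates the product."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_bits = 100_000
a, b = 0.6, 0.5

product_bits = sc_multiply(to_stream(a, n_bits, rng), to_stream(b, n_bits, rng))
estimate = from_stream(product_bits)
print(f"exact {a * b:.3f}, stochastic estimate {estimate:.3f}")
```

The estimate is noisy and its error shrinks as the stream gets longer, which is the usual trade-off in stochastic computing: very cheap arithmetic hardware in exchange for longer computation and approximate results.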

Citations: 55

Stratix™ 10 High Performance Routable Clock Networks

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847279

C. Ebeling, D. How, D. Lewis, H. Schmit

Citations: 12

Intel Acquires Altera: How Will the World of FPGAs be Affected?

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2857658

Derek Chiou

Citations: 4

A 1 GSa/s, Reconfigurable Soft-core FPGA ADC (Abstract Only)

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847310

Stefan Visser, H. Homulle, E. Charbon

Abstract: There exist many applications where analog interfacing is abundant, e.g. sensor networks, automotive, industrial control, (quantum) physics, etc. In those fields the use of FPGAs is continuously growing; however, a direct link between the analog world and the digital FPGA is still missing (except for the newest generation of FPGAs, where analog-to-digital conversion is present, but limited in performance). External analog-to-digital converters (ADCs) are combined with the FPGA to form a complete, application-specific system. This system is thus limited in compactness, flexibility, and reconfigurability. To address those issues we propose an ADC architecture, implemented in an FPGA, that is fully reconfigurable and easy to calibrate. This allows the design to be altered according to the system requirements. Therefore it can be used in a wide range of operating conditions and adjusted to changes in supply voltage and FPGA temperature. This architecture employs time-to-digital converters (TDCs) and phase interpolation techniques to reach a sampling rate higher than the clock frequency (400 MHz) of up to 1.2 GSa/s. The resulting FPGA ADC can achieve a 6 bit resolution over a 0.6 to 1.9 V input range. The system non-linearities (INL, DNL) are less than 0.45 LSB. The main advantages of this architecture are its scalability and reconfigurability, enabling applications with changing demands on one single platform.
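The figures quoted in this abstract can be related with some simple arithmetic. The sketch below assumes an ideal uniform quantizer over the stated input range; the real converter is TDC-based and calibrated, so its transfer function will differ. The constants come from the abstract, while the function name `ideal_code` and the clamping behavior are illustrative assumptions.

```python
V_MIN, V_MAX = 0.6, 1.9   # input range in volts (from the abstract)
BITS = 6                  # resolution in bits (from the abstract)
F_CLK_HZ = 400e6          # clock frequency (from the abstract)
F_SAMPLE_HZ = 1.2e9       # peak sampling rate (from the abstract)

levels = 2 ** BITS
lsb_volts = (V_MAX - V_MIN) / levels          # ideal step size, about 20.3 mV
samples_per_clock = F_SAMPLE_HZ / F_CLK_HZ    # samples taken per clock period
sample_period_ns = 1e9 / F_SAMPLE_HZ          # time between samples

def ideal_code(v_in):
    """Ideal uniform quantizer (an assumption): voltage -> code in 0..levels-1, clamped."""
    code = int((v_in - V_MIN) / lsb_volts)
    return max(0, min(levels - 1, code))

print(f"LSB = {lsb_volts * 1e3:.1f} mV across {levels} levels")
print(f"{samples_per_clock:.0f} samples per clock period, one every {sample_period_ns:.3f} ns")
print(f"0.45 LSB of non-linearity is about {0.45 * lsb_volts * 1e3:.1f} mV")
```

Under these assumptions, reaching 1.2 GSa/s from a 400 MHz clock means taking three samples within each clock period, which is consistent with the abstract's mention of phase interpolation.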

Citations: 2

PRFloor: An Automatic Floorplanner for Partially Reconfigurable FPGA Systems

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2016-02-21 DOI: 10.1145/2847263.2847270

T. D. A. Nguyen, Akash Kumar

Abstract: Partial reconfiguration (PR) is gaining more attention from the research community because of its flexibility in dynamically changing some parts of the system at runtime. However, the current PR tools need the designer's involvement in manually specifying the shapes and locations for the PR regions (PRRs). This requires not only deep knowledge of the FPGA device and the system architecture, but also many trial-and-error attempts to find the best-possible floorplan. Therefore, many research works have been conducted to propose automatic floorplanners for PR systems. However, one of the most significant limitations of those works is that they only consider the PRRs and ignore all other static modules. In this paper, we propose a novel PR floorplanner called PRFloor. It takes into account all components in the system. The main idea behind PRFloor is a unique recursive pseudo-bipartitioning heuristic using a new, simple, yet effective Nonlinear Integer Programming-based bipartitioner. PRFloor performs very well in the experiments with various synthetic PR system setups with up to 130 modules, 24 PRRs and 85% of the FPGA resources. The average maximum clock frequency obtained for the actual PR systems implemented using PRFloor is even 3% higher than that of similar systems without PR capability.

Citations: 13