2018 International Conference on Field-Programmable Technology (FPT): Latest Papers, Page 3

GridGAS: An I/O-Efficient Heterogeneous FPGA+CPU Computing Platform for Very Large-Scale Graph Analytics

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00045

Yu Zou, Mingjie Lin

Abstract: In this paper, we develop a highly scalable approach to constructing an efficient heterogeneous graph processing engine that can handle graphs far larger than its on-board memory capacity. Our FPGA-based computing engine not only surpasses cutting-edge GPU-based engines in computing performance and energy efficiency, but also proves highly versatile, and can thus be applied to many types of low-latency, high-throughput graph analytic tasks central to next-generation graph-based machine learning. We analyze in detail the differences between GPU and FPGA architectures, give several fundamental reasons why FPGAs may surpass GPUs in computing latency and energy efficiency for irregular computations, and discuss some "golden rules" for designing an efficient FPGA+CPU heterogeneous platform, as well as the GPU's inefficiency when handling extremely large-scale graph datasets. To validate our approach, we implement our FPGA-based GridGAS computing engine on a Xilinx KC705 FPGA board, build a baseline implementation on a Quadro K420 GPU following the same approach, and test both with large-scale graph datasets. Using only PCIe 2.0 x8, our architecture achieves up to 170.4 MTEPS and a 14.8 times speedup over the GPU baseline for datasets exceeding 1.4 GB in size.
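
The abstract reports throughput in MTEPS (millions of traversed edges per second) on graphs larger than on-board memory, but does not describe GridGAS's internals. As a rough illustration only, the Python sketch below assumes a GridGraph-style layout: vertices are split into `p` intervals, edges are bucketed into a `p` x `p` grid by source and destination interval, and a level-synchronous BFS streams one block at a time, skipping rows that hold no active source vertex. This layout, and the names `grid_partition`, `bfs_grid`, and `teps`, are our assumptions rather than the paper's design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # directed edge (source, destination)


def grid_partition(edges: List[Edge], num_vertices: int, p: int):
    """Bucket edges into a p x p grid keyed by (source interval, destination interval)."""
    interval = -(-num_vertices // p)  # ceiling division: vertices per interval
    blocks: Dict[Tuple[int, int], List[Edge]] = defaultdict(list)
    for u, v in edges:
        blocks[(u // interval, v // interval)].append((u, v))
    return blocks, interval


def bfs_grid(edges: List[Edge], num_vertices: int, root: int, p: int):
    """Level-synchronous BFS that streams one grid block at a time.

    Returns (level of each vertex, number of edges traversed)."""
    blocks, interval = grid_partition(edges, num_vertices, p)
    level = [-1] * num_vertices
    level[root] = 0
    frontier = {root}
    depth = 0
    traversed = 0
    while frontier:
        active_rows = {u // interval for u in frontier}
        next_frontier = set()
        for i in range(p):
            if i not in active_rows:
                continue  # no active sources in this row: skip all of its blocks
            for j in range(p):
                # In an out-of-core engine, this block would be streamed in from host memory.
                for u, v in blocks.get((i, j), []):
                    if u in frontier:
                        traversed += 1
                        if level[v] == -1:
                            level[v] = depth + 1
                            next_frontier.add(v)
        frontier = next_frontier
        depth += 1
    return level, traversed


def teps(traversed_edges: int, seconds: float) -> float:
    """Traversed edges per second; divide by 1e6 for MTEPS."""
    return traversed_edges / seconds


EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
levels, traversed = bfs_grid(EDGES, num_vertices=8, root=0, p=2)
print(levels, traversed)  # [0, 1, 1, 2, 3, 4, 5, 6] 8
```

Skipping inactive rows is what lets an out-of-core engine avoid transferring blocks that cannot contribute to the next frontier, which matters when every block has to cross a PCIe link.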

Citations: 4

Distributed-Memory Based FPGA Debug: Design Timing Impact

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00071

R. Hale, B. Hutchings

Abstract: In FPGAs, debug observability is often achieved by attaching memory-based recording circuitry to user signals. Block-RAM (BRAM)-based embedded logic analyzers are often inserted into user circuits to observe circuit behavior. In contrast with BRAM-based approaches, distributed memory: 1) is almost always available (user circuits may consume all BRAMs, but even highly utilized circuits contain unused LUTs), and 2) can usually be physically located very near to user signals (LUTs are spread across the entire device, while BRAMs are located only in specific columns). Previous work has shown basic feasibility and demonstrated that distributed memories can provide debug observability for highly utilized circuits. This paper focuses on timing impacts and describes the quantitative tradeoff between FPGA device utilization, debug probe count, and clock frequency. For example, a design with 70% of LUTs utilized and no debug logic can operate at a minimum clock period of 5 ns. Instrumenting 300 debug probes increases this period to 7 ns, and 1500 probes to 8 ns. Placing trace buffers with a simulated annealing algorithm improved success rates from 20% to 50%, depending on the design and probe count.
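
The abstract names simulated annealing as the trace-buffer placement method but gives no formulation. The Python sketch below illustrates the general technique under simplifying assumptions that are ours, not the paper's: the device is a small grid of free LUT sites, each probe is pre-assigned to one trace buffer, at most one buffer may occupy a site, and the cost is the total Manhattan distance from probes to their buffers (a crude proxy for routing delay). The names `anneal` and `placement_cost`, and every parameter value, are hypothetical.

```python
import math
import random
from typing import List, Tuple

Site = Tuple[int, int]
Probe = Tuple[Site, int]  # (probe location, index of the trace buffer it feeds)


def placement_cost(placement: List[Site], probes: List[Probe]) -> int:
    """Total Manhattan distance from every probe to its assigned trace buffer."""
    return sum(abs(x - placement[b][0]) + abs(y - placement[b][1])
               for (x, y), b in probes)


def anneal(probes: List[Probe], free_sites: List[Site], num_buffers: int,
           seed: int = 0, t0: float = 10.0, alpha: float = 0.95, steps: int = 5000):
    rng = random.Random(seed)
    placement = rng.sample(free_sites, num_buffers)  # distinct random starting sites
    current = placement_cost(placement, probes)
    best, best_cost = list(placement), current
    t = t0
    for _ in range(steps):
        b = rng.randrange(num_buffers)
        site = rng.choice(free_sites)
        if site in placement:
            continue  # keep at most one buffer per site
        old = placement[b]
        placement[b] = site
        delta = placement_cost(placement, probes) - current
        # Always accept improvements; accept uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current += delta
            if current < best_cost:
                best, best_cost = list(placement), current
        else:
            placement[b] = old  # reject: undo the move
        t = max(t * alpha, 1e-3)  # geometric cooling with a floor
    return best, best_cost


SITES = [(x, y) for x in range(10) for y in range(10)]
PROBES = [((0, 0), 0), ((0, 2), 0), ((2, 0), 0), ((2, 2), 0),
          ((7, 7), 1), ((7, 9), 1), ((9, 7), 1), ((9, 9), 1)]
best, best_cost = anneal(PROBES, SITES, num_buffers=2)
print(best, best_cost)
```

Returning the best placement seen, rather than the final one, is a common safeguard: an uphill move accepted late in the schedule would otherwise be what the caller gets back.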

Citations: 1

Implementation of an Autonomous Driving System for FPT2018 FPGA Design Competition Using the Zynqberry Processing Board

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00086

Yohei Shimmyo, Maiko Arakawa, Shunsuke Mie, Hiroaki Saito, Y. Okuyama, Hiroki Yomogita

Citations: 3

[Publisher's information]

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/fpt.2018.00093

Citations: 0

Lattice-Based Scheduling for Multi-FPGA Systems

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00063

Teng Yu, Bo Feng, Mark Stillwell, Liucheng Guo, Yuchun Ma, John Thomson

Citations: 2

Face-off Between the CAESAR Lightweight Finalists: ACORN vs. Ascon

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00066

William Diehl, Farnoud Farahmand, Abubakr Abdulgadir, J. Kaps, K. Gaj

Abstract: Authenticated ciphers potentially provide resource savings and security improvements over the joint use of secret-key ciphers and message authentication codes. The CAESAR competition aims to choose the most suitable authenticated ciphers for several categories of applications, including a lightweight use case, for which the primary criteria are performance in resource-constrained devices and ease of protection against side-channel attacks (SCA). In March 2018, two candidates from this category, ACORN and Ascon, were selected as CAESAR contest finalists. In this research, we compare two sets of SCA-resistant FPGA implementations of ACORN and Ascon: one set has area consumption nearly equivalent to that of the de facto standard AES-GCM, and the other has throughput (TP) close to that of AES-GCM. The results show that protected implementations of ACORN and Ascon with area consumption less than, but close to, that of AES-GCM achieve 23.3 and 2.5 times, respectively, the TP of AES-GCM. Likewise, implementations of ACORN and Ascon with TP greater than, but close to, that of AES-GCM consume 18% and 74%, respectively, of the area of AES-GCM.
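
The abstract lists ease of protection against side-channel attacks as a primary lightweight criterion. One commonly cited reason Ascon suits masking is that its 5-bit S-box has algebraic degree 2 (it is affine-equivalent to the Keccak χ map), so it can be computed in bitsliced form with a few XOR, AND, and NOT operations, which generally keeps masked implementations cheap. The Python sketch below reproduces that bitsliced substitution layer as we understand it from the Ascon specification; it is illustrative only and is not the protected hardware evaluated in the paper. `ascon_sbox_layer` and `sbox_table` are our own names.

```python
from typing import List

MASK64 = (1 << 64) - 1


def ascon_sbox_layer(x: List[int]) -> List[int]:
    """Apply the Ascon 5-bit S-box to all 64 bit-columns of state words x0..x4 at once."""
    x0, x1, x2, x3, x4 = x
    # Input affine layer.
    x0 ^= x4
    x4 ^= x3
    x2 ^= x1
    # Keccak chi on 5 bits: the only AND gates, so the S-box has algebraic degree 2.
    t0 = ~x0 & x1
    t1 = ~x1 & x2
    t2 = ~x2 & x3
    t3 = ~x3 & x4
    t4 = ~x4 & x0
    x0 ^= t1
    x1 ^= t2
    x2 ^= t3
    x3 ^= t4
    x4 ^= t0
    # Output affine layer.
    x1 ^= x0
    x0 ^= x4
    x3 ^= x2
    x2 = ~x2
    # Python ints are unbounded, so mask the complemented words back to 64 bits.
    return [w & MASK64 for w in (x0, x1, x2, x3, x4)]


def sbox_table() -> List[int]:
    """Recover the 5-bit lookup table by feeding a single bit column (x0 is the MSB)."""
    table = []
    for v in range(32):
        words = [(v >> (4 - i)) & 1 for i in range(5)]
        out = ascon_sbox_layer(words)
        table.append(sum((out[i] & 1) << (4 - i) for i in range(5)))
    return table


print([hex(s) for s in sbox_table()[:4]])  # ['0x4', '0xb', '0x1f', '0x14']
```

Because the same Boolean expressions act on whole 64-bit words, a hardware datapath can evaluate all 64 S-boxes of a round in parallel, which is one way the throughput/area tradeoffs in the abstract can be tuned.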

Citations: 20

Digital Transformation of Automobile and Mobility Service

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00012

Hiroshi Miyata

Abstract: The traffic system for automobiles has not changed its physical, industrial, and social structures in the more than 100 years since its introduction to society. It has been deployed at a large scale and plays an important role in mobility. Its system elements, namely the driver, the automobile, and the road, are in physical contact with each other, and the system is managed only by humans. Advances in electrical and electronic technologies over more than 30 years have improved the performance of automobiles, but not that of drivers and roads. However, drivers, automobiles, and roads have begun to be connected to each other through digital data, and the traffic system is now starting to be managed not only by humans but also by information technology such as artificial intelligence. This shift is expected to change the system's value, size, range, and role dramatically. This is the digital transformation of automobiles and mobility services. The new CASE trends, i.e., connected cars, automated driving, car sharing, mobility as a service, and electrification, have driven large-scale innovation not only in automobiles and services but also in the automobile traffic system, the automotive industry, and society as a whole. This paper outlines these new system and service trends. It then describes the latest needs of the digital-transformation data cycle for improving these systems and services, since those needs change every year with the involvement of people. Moreover, the paper discusses why automobile digital transformation requires scalability, flexibility, security, traceability, safety, and reliability, and describes the expectations for field-programmable technology as a candidate to meet these requirements.

Citations: 3

Introduction of MNSTbot

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00083

Kyosuke Mori, Y. Saitoh, N. Nakasato

Citations: 2

Scaling Up Loop Pipelining for High-Level Synthesis: A Non-iterative Approach

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00020

L. Rosa, Vanderlei Bonato, C. Bouganis

Abstract: High-level synthesis is a powerful tool for increasing productivity in digital hardware design. However, as digital systems become larger and more complex, designers must consider a growing number of optimizations and directives that high-level synthesis tools offer to control the hardware generation process, resulting in a large design space to explore. One of the most impactful optimizations is loop pipelining, owing to the large improvement it brings to hardware throughput. Nevertheless, the modulo scheduling algorithms used for loop pipelining are computationally expensive, and applying them across the whole design space can make its exploration infeasible, leading to suboptimal solutions. Current state-of-the-art tools for modulo scheduling follow an iterative approach that solves O(n^2) optimization problems, where n is the loop code size. To address this problem, this work proposes a novel data-flow-based approach that solves exactly 2 optimization problems, independent of the loop code size. Results show orders-of-magnitude savings in computation time, leading to significant savings in design space exploration time compared with the state of the art. As a result, for large and complex loops, the proposed method produces hardware designs of higher performance than those produced by current state-of-the-art tools, while maintaining similar resource utilization.
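
Modulo scheduling, which the abstract identifies as the expensive step, looks for a schedule at some initiation interval (II), the number of cycles between the starts of successive loop iterations. Schedulers conventionally start from the minimum II, MII = max(ResMII, RecMII), where ResMII comes from resource contention and RecMII from loop-carried dependence cycles. The Python sketch below computes that textbook lower bound for a small hypothetical loop; it is not the paper's data-flow-based formulation, and `res_mii`, `rec_mii`, and the example operations, resources, and latencies are assumptions.

```python
import math
from typing import Dict, List, Tuple

# Dependence edge: (source op, destination op, latency in cycles, iteration distance).
Edge = Tuple[str, str, int, int]


def res_mii(op_resource: Dict[str, str], units: Dict[str, int]) -> int:
    """Resource bound: ceil(uses / available units), maximized over resource types."""
    uses: Dict[str, int] = {}
    for res in op_resource.values():
        uses[res] = uses.get(res, 0) + 1
    return max((math.ceil(n / units[r]) for r, n in uses.items()), default=1)


def has_positive_cycle(ops: List[str], edges: List[Edge], ii: int) -> bool:
    """True if some dependence cycle has sum(latency) - ii * sum(distance) > 0."""
    idx = {op: k for k, op in enumerate(ops)}
    n = len(ops)
    neg_inf = float("-inf")
    d = [[neg_inf] * n for _ in range(n)]
    for src, dst, lat, dist in edges:
        i, j = idx[src], idx[dst]
        d[i][j] = max(d[i][j], lat - ii * dist)
    # Floyd-Warshall on longest paths; a positive diagonal entry marks a violated cycle.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] > d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return any(d[i][i] > 0 for i in range(n))


def rec_mii(ops: List[str], edges: List[Edge]) -> int:
    """Smallest II at which no recurrence cycle is violated."""
    upper = max(1, sum(lat for _, _, lat, _ in edges))  # enough for any cycle with distance >= 1
    for ii in range(1, upper + 1):
        if not has_positive_cycle(ops, edges, ii):
            return ii
    raise ValueError("dependence cycle with zero iteration distance")


# Hypothetical accumulation loop: acc += a[i] * b; c[i] = acc
OPS = ["load", "mul", "add", "store"]
OP_RESOURCE = {"load": "mem", "mul": "mul", "add": "alu", "store": "mem"}
UNITS = {"mem": 1, "mul": 1, "alu": 1}
EDGES: List[Edge] = [
    ("load", "mul", 2, 0),
    ("mul", "add", 3, 0),
    ("add", "add", 3, 1),  # loop-carried: acc depends on the previous iteration's acc
    ("add", "store", 3, 0),
]
mii = max(res_mii(OP_RESOURCE, UNITS), rec_mii(OPS, EDGES))
print(mii)  # 3
```

Here the single memory port gives ResMII = 2, while the 3-cycle accumulator recurrence gives RecMII = 3. A modulo scheduler would then attempt a schedule at II = 3 and raise II if none exists; this sketch only computes the starting bound.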

Citations: 4

SAT Based Place-And-Route for High-Speed Designs on 2.5D FPGAs

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00027

C. Ravishankar, H. Fraisse, D. Gaitonde

Abstract: 2.5D stacking technology allows us to build high-performance, high-capacity FPGA devices at reasonable cost. Communication between multiple dies happens at high speed over a passive silicon interposer, which poses several interesting challenges. Due to clock skew characteristics across multiple dies and the increase in the min-max spread of delays, place-and-route tools need to address inter-die hold violations and optimize for performance. We implement a tractable SAT-based methodology that achieves this by minimally detouring data paths to meet all hold requirements while optimizing performance. We also confine the solution to a small window around each inter-die (Laguna) channel to reduce routing resource utilization and congestion, and to scale the methodology to any Laguna channel utilization. We improve performance across the interface by 11% compared to a state-of-the-art commercial flow and meet a 500 MHz spec on Xilinx® UltraScale+™ devices in the 2E speed grade. We address the scalability concerns of SAT and show how it can be used in practice with negligible runtimes in implementation tools. Our solution paves the way for FPGA-as-a-service platforms, where fast inter-die communication that does not interfere with user-specific logic is pivotal to their success.
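
The abstract describes choosing minimal detours for inter-die data paths so that every path meets hold while setup timing is preserved, with choices confined to a small window of routing resources. The Python sketch below poses a toy version of that selection problem: each path picks exactly one detour, each detour adds delay and occupies one routing track of limited capacity, and a path may use only detours that satisfy both its hold and setup bounds. Where the paper uses a SAT solver, this sketch substitutes plain backtracking search over the same one-choice-per-path constraints. The single-number hold requirement, the clock period, all delays, and the names `Path`, `Detour`, and `assign_detours` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Path:
    name: str
    min_delay_ps: int  # fast-corner delay, checked against hold
    max_delay_ps: int  # slow-corner delay, checked against setup


@dataclass(frozen=True)
class Detour:
    track: str  # hypothetical routing track inside the window around the channel
    extra_delay_ps: int


def assign_detours(paths: List[Path], detours: List[Detour], hold_ps: int,
                   period_ps: int, capacity: int) -> Optional[Dict[str, Detour]]:
    # Prune: keep only detours that fix hold without breaking setup for each path.
    legal = {
        p.name: sorted(
            (d for d in detours
             if p.min_delay_ps + d.extra_delay_ps >= hold_ps
             and p.max_delay_ps + d.extra_delay_ps <= period_ps),
            key=lambda d: d.extra_delay_ps,  # try the smallest detour first
        )
        for p in paths
    }
    order = sorted(paths, key=lambda p: len(legal[p.name]))  # most constrained first
    used: Dict[str, int] = {}
    chosen: Dict[str, Detour] = {}

    def search(k: int) -> bool:
        if k == len(order):
            return True
        p = order[k]
        for d in legal[p.name]:
            if used.get(d.track, 0) < capacity:  # track capacity constraint
                used[d.track] = used.get(d.track, 0) + 1
                chosen[p.name] = d
                if search(k + 1):
                    return True
                used[d.track] -= 1  # backtrack
                del chosen[p.name]
        return False

    return dict(chosen) if search(0) else None


DETOURS = [Detour("t0", 0), Detour("t1", 250), Detour("t2", 400)]
PATHS = [Path("A", 100, 1000), Path("B", 200, 1200), Path("C", 500, 1500)]
# 500 MHz clock -> 2000 ps period; hypothetical 300 ps hold requirement.
result = assign_detours(PATHS, DETOURS, hold_ps=300, period_ps=2000, capacity=1)
print({name: d.track for name, d in sorted(result.items())})  # {'A': 't1', 'B': 't2', 'C': 't0'}
```

Trying the smallest legal detour first mirrors the abstract's goal of detouring minimally, since every picosecond added to a path also eats into its setup slack.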

Citations: 3