2014 International Conference on Field-Programmable Technology (FPT): Latest Publications

Message from the General Chair and Program Co-Chairs

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2022-12-05 DOI: 10.1109/icfpt56656.2022.9974448

W. Zhang, R. Cheung, Yuning Liang, Hiroki Nakahara

Citations: 0

Accelerator-in-Switch: A Novel Cooperation Framework for FPGAs and GPUs

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00010

H. Amano

Citations: 0

FPGA Accelerated HPC and Data Analytics

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00009

M. Strickland

Citations: 1

Novel Neural Network Applications on New Python Enabled Platforms

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00011

K. Vissers

Citations: 0

FPGA as service in public Cloud: Why and how

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2016-12-01 DOI: 10.1109/FPT.2016.7929179

Yonghua Lin

Citations: 0

High-level synthesis - the right side of history

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2016-12-01 DOI: 10.1109/FPT.2016.7929177

J. Anderson

Abstract: High-level synthesis (HLS) was first proposed in the 1980s. After spending decades on the sidelines of mainstream RTL digital design, HLS has attracted tremendous buzz in recent years. Indeed, HLS is on the upswing as a design methodology for field-programmable gate arrays (FPGAs), both to improve designer productivity and, ultimately, to make FPGA technology accessible to software engineers with limited hardware expertise. The hope is that, down the road, software developers could use HLS to realize application-specific FPGA-based accelerators that work in tandem with standard processors to raise computational throughput and energy efficiency. The further hope is that such HLS-generated accelerators operate close to the speed and energy efficiency of accelerators designed by human experts. In this talk, I will give an overview of the trends behind the recent drive towards FPGA HLS and explain why the need for, and use of, HLS will only become more pronounced in the coming years. I will argue that HLS, as opposed to traditional RTL design, is on the "right side of history". The talk will highlight current HLS research directions and expose some of the challenges that may hinder the uptake of HLS in the digital design community. I will also describe work underway in the LegUp HLS project at the University of Toronto, a publicly available HLS tool that has been downloaded by over 4000 groups from around the world.

Pages: 1

Citations: 0

A universal FPGA-based floating-point matrix processor for mobile systems

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082766

Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang

Abstract: FPGA-based acceleration of matrix operations is a promising solution in mobile systems. However, most related work focuses on a single operation rather than a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS), and the matrix size is unlimited. The key component of the processor is a shared matrix cache, which enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves overall performance. To improve the performance of the whole system, an asynchronous instruction execution mechanism is further proposed in the hardware-software interface to reduce the workload of the master processor. We demonstrate the system on a DE3 development board and achieve a computing performance of about 19 GFLOPS. Experiments show that the proposed processor achieves higher performance and energy efficiency than some state-of-the-art embedded processors, including the ARM Cortex-A9 and the Nios II/f soft-core processor. The performance of the processor is even comparable to that of some desktop processors.

Pages: 139-146
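To put the reported figure of about 19 GFLOPS in context, the sketch below estimates how long one dense matrix-matrix product would take at that rate. It uses the standard operation count of 2·m·n·k floating-point operations for a Level 3 BLAS GEMM. The function names, the 1024×1024 problem size, and the assumption that the rate is sustained for the whole product are illustrative choices, not details taken from the paper.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C = A @ B with A (m x k) and B (k x n).

    Each of the m*n outputs needs k multiplies and k adds (standard 2*m*n*k count).
    """
    return 2 * m * n * k


def gemm_seconds(m: int, n: int, k: int, gflops: float) -> float:
    """Estimated run time, assuming the given rate is sustained (an assumption)."""
    return gemm_flops(m, n, k) / (gflops * 1e9)


# Hypothetical problem size; 19.0 is the approximate rate quoted in the abstract.
size = 1024
flops = gemm_flops(size, size, size)
seconds = gemm_seconds(size, size, size, gflops=19.0)
print(f"{flops} flops, about {seconds * 1000:.0f} ms at 19 GFLOPS")
```

The estimate ignores data movement between external memory and the shared matrix cache, so it is a lower bound on run time under the stated assumption rather than a prediction for the actual system.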

Citations: 6

Towards automatic partial reconfiguration in FPGAs

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082798

Fubing Mao, Wei Zhang, Bingsheng He

Citations: 3

Blokus Duo engine on a Zynq

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082824

Susumu Mashimo, K. Fukuda, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi

Citations: 0

Gigabyte-scale alignment acceleration of biological sequences via Ethernet streaming

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082781

T. Moorthy, S. Gopalakrishnan

Abstract: We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the acceleration is an FPGA implementation of the DIALIGN algorithm, which performs both global and local alignment of query biological sequences against relatively larger reference strands of biological sequences. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences or megabyte-length query sequences, due to the inherent limitations of available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe a novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it can handle larger query sequences as well. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate that is limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.

Pages: 227-230
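The quoted 38× speedup can be checked directly from the two runtimes given in the abstract. The short sketch below does the unit conversion and division; the function and variable names are illustrative only.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Ratio of baseline run time to accelerated run time (both in minutes)."""
    return baseline_minutes / accelerated_minutes


# Runtimes as stated in the abstract.
baseline_minutes = 18.36 * 60   # standalone PC: 18.36 hours, converted to minutes
accelerated_minutes = 28.61     # FPGA platform: 28.61 minutes

ratio = speedup(baseline_minutes, accelerated_minutes)
print(f"speedup = {ratio:.1f}x")
```

The ratio comes out slightly above 38, which matches the abstract's figure when truncated to a whole number.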

Citations: 1