2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)最新文献_第3页

FPGA Delay Model Considering Logic-Level and Transistor-Level Parameters 考虑逻辑级和晶体管级参数的FPGA延迟模型

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.16

Qiang Liu, HanJing Qian

引用次数: 0

Multi-FPGA Evaluation Platform for Disaggregated Computing 多fpga分解计算评估平台

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.20

D. Theodoropoulos, Nikolaos S. Alachiotis, D. Pnevmatikatos

引用次数: 7

The Potential of Dynamic Binary Modification and CPU-FPGA SoCs for Simulation 动态二进制修改和CPU-FPGA soc仿真的潜力

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.36

John Mawer, Oscar Palomar, Cosmin Gorgovan, A. Nisbet, W. Toms, M. Luján

{"title":"The Potential of Dynamic Binary Modification and CPU-FPGA SoCs for Simulation","authors":"John Mawer, Oscar Palomar, Cosmin Gorgovan, A. Nisbet, W. Toms, M. Luján","doi":"10.1109/FCCM.2017.36","DOIUrl":"https://doi.org/10.1109/FCCM.2017.36","url":null,"abstract":"In this paper we describe a flexible infrastructure that can directly interface unmodified application executables with FPGA hardware acceleration IP in order to 1), facilitate faster computer architecture simulation, and 2), to prototype microarchitecture or accelerator IP. Dynamic binary modification tool plugins are directly interfaced to the application under evaluation via flexible software interfaces provided by a userspace hardware control library that also manages access to a parameterised Bluespec IP library. We demonstrate the potential of our infrastructure with two use cases with unmodified application executables where, 1), an executable is dynamically instrumented to generate load/store and program counter events that are sent to FPGA hardware accelerated in-order microarchitecture pipeline, and memory hierarchy models, and 2), the design of a branch predictor is prototyped using an FPGA. The key features of our infrastructure are the ability to instrument at instruction level granularity, to code exclusively at the user level, and to dynamically discover and use available hardware models at run time, thus, we enable software developers to rapidly investigate and evaluate parameterised Bluespec microarchitecture and accelerator IP models. We present a comparison between our system and GEM5, the industry standard ARM architecture simulator, to demonstrate accuracy and relative performance, even though our system is implemented on an Xilinx Zynq 7000 FPGA board with tightly coupled FPGA and ARM Cortex A9 processors, it outperforms GEM5 running on a Xeon with 32GBs of RAM (400x vs 700x slowdown over native execution).","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132778291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

A Network-on-Chip Based H.264 Video Decoder Prototype Implemented on FPGAs 基于片上网络的H.264视频解码器的fpga实现

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.10

Ian J. Barge, Cristinel Ababei

{"title":"A Network-on-Chip Based H.264 Video Decoder Prototype Implemented on FPGAs","authors":"Ian J. Barge, Cristinel Ababei","doi":"10.1109/FCCM.2017.10","DOIUrl":"https://doi.org/10.1109/FCCM.2017.10","url":null,"abstract":"We present a field programmable gate array (FPGA) based implementation of the H.264 video decoder algorithm. The novelty of our design is that the communication between the decoder modules is done using a network-on-chip (NoC). This makes our design scalable and easily integrated within larger future NoC based systems, where the same hardware platform can host other algorithms such as compression, filtering, etc. Our primary objective is to study the achievable performance with a NoC based H.264 decoder solution. The design process involves primarily three main steps. First, the H.264 algorithm is split into eight different partitions, which are implemented as individual processing elements (PEs). These processing elements are attached to the routers of the regular mesh NoC and include: network abstraction layer (NAL) parser and entropy decoder, frame buffer and integer motion, inverse quantization inverse transform, intra prediction, luma sub-pixel motion, chroma sub-pixel motion, deblocking filter, and display driver. These PEs are described in VHDL with the first two being executed on Nios II softcores. The network-on-chip was generated with the Connect tool from Carnegie Mellon University and integrated within the top level design entity. Second, we specify the location of each of the PEs inside the regular mesh NoC. Because we use eight PEs, the NoC architecture needs to be a 3x3 regular mesh topology. When we specify the location of the PEs inside the mesh topology (i.e., specify the router to which a particular PE is attached), we effectively solve what is called the NoC mapping problem. To do that, we use manual mapping, which is done intelligently based on information about the internal structure of the decoding algorithm. This helps to reduce the number of routers that packets must travel through the network. Finally, the entire project is synthesized, placed, and routed with Quartus Prime Standard Edition 16.1 tool. The final design is tested and verified on the DE4 development board, which uses Altera's Stratix IV GX FPGA chip. The performance of the implementation at the time of the submission is that to decode 100 frames takes 33 seconds for a frame size of 192x144 pixels and to decode 100 frames takes 56 seconds for a resolution of 320x240 pixels per frame. Documentation and source codes of the entire project will be released to the public domain. We hope that this will enable other researchers to easily replicate and compare results to ours and that it will encourage and facilitate further research in the areas of image processing, computer vision, and advanced VHDL design and FPGAs.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132166418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

An Architecture for the Acceleration of a Hybrid Leaky Integrate and Fire SNN on the Convey HC-2ex FPGA-Based Processor 基于HC-2ex fpga处理器的泄漏集成与火灾SNN混合加速体系结构

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.51

Emmanouil Kousanakis, A. Dollas, E. Sotiriades, I. Papaefstathiou, D. Pnevmatikatos, Athanasia Papoutsi, P. Petrantonakis, Panayiota Poirazi, Spyridon Chavlis, George Kastellakis

{"title":"An Architecture for the Acceleration of a Hybrid Leaky Integrate and Fire SNN on the Convey HC-2ex FPGA-Based Processor","authors":"Emmanouil Kousanakis, A. Dollas, E. Sotiriades, I. Papaefstathiou, D. Pnevmatikatos, Athanasia Papoutsi, P. Petrantonakis, Panayiota Poirazi, Spyridon Chavlis, George Kastellakis","doi":"10.1109/FCCM.2017.51","DOIUrl":"https://doi.org/10.1109/FCCM.2017.51","url":null,"abstract":"Neuromorphic computing is expanding by leaps and bounds through custom integrated circuits (digital and analog), and large scale platforms developed by industry or government-funded projects (e.g. TrueNorth and BrainScaleS, respectively). Whereas the trend is for massive parallelism and neuromorphic computation in order to solve problems, such as those that may appear in machine learning and deep learning algorithms, there is substantial work on brain-like highly accurate neuromorphic computing in order to model the human brain. In such a form of computing, spiking neural networks (SNN) such as the Hodgkin and Huxley model are mapped to various technologies, including FPGAs. In this work, we present a highly efficient FPGA-based architecture for the detailed hybrid Leaky Integrate and Fire SNN that can simulate generic characteristics of neurons of the cerebral cortex. This architecture supports arbitrary, sparse O(n2) interconnection of neurons without need to re-compile the design, and plasticity rules, yielding on a four-FPGA Convey 2ex hybrid computer a speedup of 923x for a non-trivial data set on 240 neurons vs. the same model in the software simulator BRAIN on a Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, i.e. the reference state-of-the-art software. Although the reference, official software is single core, the speedup demonstrates that the application scales well among multiple FPGAs, whereas this would not be the case in general-purpose computers due to the arbitrary interconnect requirements. The FPGA-based approach leads to highly detailed models of parts of the human brain up to a few hundred neurons vs. a dozen or fewer neurons on the reference system.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134033379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

On Bit-Serial NoCs for FPGAs fpga的位串行noc

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.14

Nachiket Kapre

引用次数: 5

A Case for Common-Case: On FPGA Acceleration of Erasure Coding 一种通用情形:FPGA加速擦除编码

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.42

R. Nakhjavani, Jianwen Zhu

引用次数: 2

CAPSL: A Tool for Automatic Generation of Hardware Sandboxes for IP Security 用于IP安全的硬件沙盒自动生成工具

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.54

Taylor J. L. Whitaker, C. Bobda

引用次数: 1

Energy Efficient Loop Unrolling for Low-Cost FPGAs 低成本fpga的节能环路展开

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.22

Naveen Kumar Dumpala, S. B. Patil, Daniel E. Holcomb, R. Tessier

引用次数: 10

A Scalable FPGA-Based Accelerator for High-Throughput MCMC Algorithms 高通量MCMC算法的可扩展fpga加速器

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI: 10.1109/FCCM.2017.56

M. Hosseini, Rashidul Islam, A. Kulkarni, T. Mohsenin

{"title":"A Scalable FPGA-Based Accelerator for High-Throughput MCMC Algorithms","authors":"M. Hosseini, Rashidul Islam, A. Kulkarni, T. Mohsenin","doi":"10.1109/FCCM.2017.56","DOIUrl":"https://doi.org/10.1109/FCCM.2017.56","url":null,"abstract":"Markov Chain Monte Carlo (MCMC) algorithms are used to obtain samples from any target probability distribution and are widely used in stochastic processing techniques. Stochastic processing techniques such as machine learning and image processing need to compute large amounts of data in real-time, thus high throughput MCMC samplers are of utmost importance. Parallel Tempering (PT) MCMC has proven better mixing and convergence for high-dimensional and multi-modal distributions compared to other popular MCMC algorithms. In this paper, we employ a special case of Dth order Markov chains to modify the PT-MCMC algorithm, named \"Multiple Parallel Tempering\" (MPT). The modification converts one MCMC sampler into multiple independent samplers that generate and interleave their samples on one output line each clock cycle. A fully scalable and pipelined hardware accelerator for the PT and proposed MPT sampler is designed and implemented on Artix-7 Xilinx FPGA for chain numbers of 1, 2, and 8. The post-place and route FPGA implementation results indicate that the throughput of the proposed MPT sampler for chain numbers 1, 2, and 8 achieves 31x, 31x, and 28x respectively higher as compared to PT sampler with the same chain number configuration.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133719398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7