2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM): Latest Papers

FPGA Delay Model Considering Logic-Level and Transistor-Level Parameters
Qiang Liu, HanJing Qian
{"title":"FPGA Delay Model Considering Logic-Level and Transistor-Level Parameters","authors":"Qiang Liu, HanJing Qian","doi":"10.1109/FCCM.2017.16","DOIUrl":"https://doi.org/10.1109/FCCM.2017.16","url":null,"abstract":"Field programmable gate arrays (FPGAs) have been adopted in various fields due to their design flexibility and customizability. Different applications have different requirements in performance, hardware resources and cost, leading to demand for diverse FPGA architectures. Delay is an important metric for evaluating alternatives during FPGA architecture development. The existing analytical delay models for FPGAs mainly consider the logical architecture parameters. However, variations in the transistor-level parameters Vdd and Vt also have a great influence on delay, given the trend toward low-power design and deep sub-micron technology. To explore various design options at the early design stage with transistor-level accuracy, an FPGA delay model that considers Vdd and Vt is necessary. In this paper, an analytical model containing structural parameters of logic blocks and routing blocks, as well as Vdd and Vt, is built to estimate the FPGA critical path delay.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130556926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-FPGA Evaluation Platform for Disaggregated Computing
D. Theodoropoulos, Nikolaos S. Alachiotis, D. Pnevmatikatos
{"title":"Multi-FPGA Evaluation Platform for Disaggregated Computing","authors":"D. Theodoropoulos, Nikolaos S. Alachiotis, D. Pnevmatikatos","doi":"10.1109/FCCM.2017.20","DOIUrl":"https://doi.org/10.1109/FCCM.2017.20","url":null,"abstract":"We present a versatile FPGA-based evaluation platform for exploring alternative execution strategies on disaggregated environments for applications, considering different processing block types: compute cores, memory, and accelerators. Developers can interconnect different block types in order to create optimal configurations. A user-level software library allows quick mapping of applications on real hardware. We have implemented a fully working prototype using three ZC706 FPGA boards, and evaluated different software/hardware configurations of a matrix multiplication benchmark.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130642558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
The Potential of Dynamic Binary Modification and CPU-FPGA SoCs for Simulation
John Mawer, Oscar Palomar, Cosmin Gorgovan, A. Nisbet, W. Toms, M. Luján
{"title":"The Potential of Dynamic Binary Modification and CPU-FPGA SoCs for Simulation","authors":"John Mawer, Oscar Palomar, Cosmin Gorgovan, A. Nisbet, W. Toms, M. Luján","doi":"10.1109/FCCM.2017.36","DOIUrl":"https://doi.org/10.1109/FCCM.2017.36","url":null,"abstract":"In this paper we describe a flexible infrastructure that can directly interface unmodified application executables with FPGA hardware acceleration IP in order to (1) facilitate faster computer architecture simulation and (2) prototype microarchitecture or accelerator IP. Dynamic binary modification tool plugins are directly interfaced to the application under evaluation via flexible software interfaces provided by a userspace hardware control library that also manages access to a parameterised Bluespec IP library. We demonstrate the potential of our infrastructure with two use cases with unmodified application executables where (1) an executable is dynamically instrumented to generate load/store and program counter events that are sent to FPGA hardware accelerated in-order microarchitecture pipeline and memory hierarchy models, and (2) the design of a branch predictor is prototyped using an FPGA. The key features of our infrastructure are the ability to instrument at instruction level granularity, to code exclusively at the user level, and to dynamically discover and use available hardware models at run time; thus, we enable software developers to rapidly investigate and evaluate parameterised Bluespec microarchitecture and accelerator IP models. We present a comparison between our system and GEM5, the industry standard ARM architecture simulator, to demonstrate accuracy and relative performance. Even though our system is implemented on a Xilinx Zynq 7000 FPGA board with tightly coupled FPGA and ARM Cortex A9 processors, it outperforms GEM5 running on a Xeon with 32 GB of RAM (400x vs 700x slowdown over native execution).","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132778291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
A Network-on-Chip Based H.264 Video Decoder Prototype Implemented on FPGAs
Ian J. Barge, Cristinel Ababei
{"title":"A Network-on-Chip Based H.264 Video Decoder Prototype Implemented on FPGAs","authors":"Ian J. Barge, Cristinel Ababei","doi":"10.1109/FCCM.2017.10","DOIUrl":"https://doi.org/10.1109/FCCM.2017.10","url":null,"abstract":"We present a field programmable gate array (FPGA) based implementation of the H.264 video decoder algorithm. The novelty of our design is that the communication between the decoder modules is done using a network-on-chip (NoC). This makes our design scalable and easily integrated within larger future NoC based systems, where the same hardware platform can host other algorithms such as compression, filtering, etc. Our primary objective is to study the achievable performance with a NoC based H.264 decoder solution. The design process involves three main steps. First, the H.264 algorithm is split into eight different partitions, which are implemented as individual processing elements (PEs). These processing elements are attached to the routers of the regular mesh NoC and include: network abstraction layer (NAL) parser and entropy decoder, frame buffer and integer motion, inverse quantization inverse transform, intra prediction, luma sub-pixel motion, chroma sub-pixel motion, deblocking filter, and display driver. These PEs are described in VHDL with the first two being executed on Nios II softcores. The network-on-chip was generated with the Connect tool from Carnegie Mellon University and integrated within the top level design entity. Second, we specify the location of each of the PEs inside the regular mesh NoC. Because we use eight PEs, the NoC architecture needs to be a 3x3 regular mesh topology. When we specify the location of the PEs inside the mesh topology (i.e., specify the router to which a particular PE is attached), we effectively solve what is called the NoC mapping problem. To do that, we use manual mapping, which is done intelligently based on information about the internal structure of the decoding algorithm. This helps to reduce the number of routers that packets must traverse in the network. Finally, the entire project is synthesized, placed, and routed with the Quartus Prime Standard Edition 16.1 tool. The final design is tested and verified on the DE4 development board, which uses Altera's Stratix IV GX FPGA chip. At the time of submission, the implementation decodes 100 frames in 33 seconds for a frame size of 192x144 pixels and 100 frames in 56 seconds for a resolution of 320x240 pixels per frame. Documentation and source code of the entire project will be released to the public domain. We hope that this will enable other researchers to easily replicate and compare results to ours and that it will encourage and facilitate further research in the areas of image processing, computer vision, and advanced VHDL design and FPGAs.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132166418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
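The H.264 abstract above describes placing eight PEs on a 3x3 mesh and choosing the placement by hand to reduce the number of routers a packet passes through. The sketch below illustrates that idea only; it is not the authors' mapping. It assumes dimension-ordered (XY) routing on the mesh, a hypothetical pipeline traffic pattern in which PE i talks only to PE i+1, and two hypothetical placements (`row_major` and `snake`). None of these details come from the abstract.

```python
import math


def min_square_mesh_side(num_pes):
    """Smallest n such that an n x n mesh has at least num_pes routers."""
    if num_pes < 1:
        raise ValueError("need at least one PE")
    return math.isqrt(num_pes - 1) + 1


def xy_hops(src, dst):
    """Router-to-router hop count on a mesh under XY routing (assumed here)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mapping_cost(placement, traffic):
    """Total hops weighted by traffic volume for a given PE placement."""
    return sum(w * xy_hops(placement[a], placement[b])
               for (a, b), w in traffic.items())


SIDE = min_square_mesh_side(8)  # eight PEs need a 3x3 mesh

# Hypothetical traffic: a simple pipeline, PE i sends one unit to PE i+1.
pipeline_traffic = {(i, i + 1): 1 for i in range(7)}

# Hypothetical placement 1: fill the mesh row by row, left to right.
row_major = {i: (i % SIDE, i // SIDE) for i in range(8)}

# Hypothetical placement 2: reverse every other row so pipeline
# neighbours always sit on adjacent routers.
snake = {
    i: ((i % SIDE) if (i // SIDE) % 2 == 0 else SIDE - 1 - (i % SIDE),
        i // SIDE)
    for i in range(8)
}

print(SIDE, mapping_cost(row_major, pipeline_traffic),
      mapping_cost(snake, pipeline_traffic))
```

Under these assumptions the snake placement costs 7 hops against 11 for row-major, which is the kind of saving a structure-aware manual mapping aims for. For scale, the decode times reported in the abstract work out to roughly 100/33 ≈ 3.0 frames per second at 192x144 and 100/56 ≈ 1.8 frames per second at 320x240.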
An Architecture for the Acceleration of a Hybrid Leaky Integrate and Fire SNN on the Convey HC-2ex FPGA-Based Processor
Emmanouil Kousanakis, A. Dollas, E. Sotiriades, I. Papaefstathiou, D. Pnevmatikatos, Athanasia Papoutsi, P. Petrantonakis, Panayiota Poirazi, Spyridon Chavlis, George Kastellakis
{"title":"An Architecture for the Acceleration of a Hybrid Leaky Integrate and Fire SNN on the Convey HC-2ex FPGA-Based Processor","authors":"Emmanouil Kousanakis, A. Dollas, E. Sotiriades, I. Papaefstathiou, D. Pnevmatikatos, Athanasia Papoutsi, P. Petrantonakis, Panayiota Poirazi, Spyridon Chavlis, George Kastellakis","doi":"10.1109/FCCM.2017.51","DOIUrl":"https://doi.org/10.1109/FCCM.2017.51","url":null,"abstract":"Neuromorphic computing is expanding by leaps and bounds through custom integrated circuits (digital and analog), and large scale platforms developed by industry or government-funded projects (e.g. TrueNorth and BrainScaleS, respectively). Whereas the trend is for massive parallelism and neuromorphic computation in order to solve problems, such as those that may appear in machine learning and deep learning algorithms, there is substantial work on brain-like highly accurate neuromorphic computing in order to model the human brain. In such a form of computing, spiking neural networks (SNN) such as the Hodgkin and Huxley model are mapped to various technologies, including FPGAs. In this work, we present a highly efficient FPGA-based architecture for the detailed hybrid Leaky Integrate and Fire SNN that can simulate generic characteristics of neurons of the cerebral cortex. This architecture supports arbitrary, sparse O(n^2) interconnection of neurons without the need to recompile the design, and plasticity rules, yielding on a four-FPGA Convey 2ex hybrid computer a speedup of 923x for a non-trivial data set on 240 neurons vs. the same model in the software simulator BRAIN on an Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, i.e. the reference state-of-the-art software. Although the reference, official software is single core, the speedup demonstrates that the application scales well among multiple FPGAs, whereas this would not be the case in general-purpose computers due to the arbitrary interconnect requirements. The FPGA-based approach leads to highly detailed models of parts of the human brain up to a few hundred neurons vs. a dozen or fewer neurons on the reference system.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134033379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
On Bit-Serial NoCs for FPGAs
Nachiket Kapre
{"title":"On Bit-Serial NoCs for FPGAs","authors":"Nachiket Kapre","doi":"10.1109/FCCM.2017.14","DOIUrl":"https://doi.org/10.1109/FCCM.2017.14","url":null,"abstract":"We can build lightweight bit-serial FPGA NoC routers that cost 20 LUT, 17 FF per router and operate at 800–900 MHz speeds. Each bit-serial router implements deflection-routing on a unidirectional torus topology requiring 1b-wide connection per port. The key ideas that enable this implementation are (1) reformulation of the dimension-ordered routing (DOR) function using compact 1 LUT, 1 FF streaming pattern matchers, (2) compact retiming of the datapath signals into SRL16 blocks, and (3) careful FPGA layout to efficiently pack the router logic into small rectangular regions 2×4 SLICEs on the chip. We anticipate these bit-serial NoCs can be used in a variety of scenarios including overlay support for triggered debug, lightweight control signal dissemination, massively-parallel bit-serial processing.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"17 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133238767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
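The bit-serial NoC abstract above combines dimension-ordered routing (DOR) with deflection on a unidirectional torus. The following is a behavioural sketch of those two ideas only, not of the paper's LUT-level pattern matchers. It assumes an n x n torus in which packets can move only in +X or +Y (with wrap-around), an X-before-Y ordering, and a hypothetical fixed-priority arbiter; the function names are illustrative.

```python
def torus_hops(src, dst, n):
    """Hop count on an n x n unidirectional torus (moves are +X or +Y only)."""
    return (dst[0] - src[0]) % n + (dst[1] - src[1]) % n


def dor_port(cur, dst):
    """Preferred output under DOR: finish the X dimension, then Y, then exit."""
    if cur[0] != dst[0]:
        return "X"
    if cur[1] != dst[1]:
        return "Y"
    return "EXIT"


def step(cur, port, n):
    """Advance one hop through the given output port, wrapping around."""
    x, y = cur
    if port == "X":
        return ((x + 1) % n, y)
    if port == "Y":
        return (x, (y + 1) % n)
    raise ValueError("packet has already arrived")


def route(src, dst, n):
    """Routers visited by an uncontended packet, source and destination included."""
    path = [src]
    cur = src
    while cur != dst:
        cur = step(cur, dor_port(cur, dst), n)
        path.append(cur)
    return path


def arbitrate(first_pref, second_pref):
    """Two in-flight packets compete for the X and Y outputs.

    The first packet always gets its preference (assumed fixed priority);
    on a conflict the second is deflected to the other port instead of
    being buffered.
    """
    if second_pref == first_pref:
        second = "Y" if first_pref == "X" else "X"
    else:
        second = second_pref
    return first_pref, second


print(route((0, 0), (2, 1), 4))
print(arbitrate("X", "X"))
```

Because links are one-way, a destination one column "behind" the source costs n - 1 hops in that dimension, and a deflected packet travels around the ring and arrives later rather than being dropped. That trade is what lets a deflection router avoid buffers.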
A Case for Common-Case: On FPGA Acceleration of Erasure Coding
R. Nakhjavani, Jianwen Zhu
{"title":"A Case for Common-Case: On FPGA Acceleration of Erasure Coding","authors":"R. Nakhjavani, Jianwen Zhu","doi":"10.1109/FCCM.2017.42","DOIUrl":"https://doi.org/10.1109/FCCM.2017.42","url":null,"abstract":"Reliable storage is a central component of data centers that support private or public clouds. Erasure coding has become an increasingly popular alternative to replication for its ability to substantially cut disk cost while delivering the same reliability. This paper reports comprehensive results of using FPGAs to accelerate erasure encoding and decoding algorithms. In particular, to achieve the best efficiency in throughput delivered per thousand LUTs, we argue it is best to allocate more resources to the common case, which we show can be more than 90%, while reducing the performance target for the general case. With further innovations, we show, as an example, that for an RS(10,4) erasure code and a 1.3% disk failure probability, 6Gb/s/KLUT can be accomplished for 5 nines of reliability. In terms of power efficiency, our design is able to achieve 40Gb/s/Watt.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121673276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
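The erasure-coding abstract above argues for optimising the common case of an RS(10,4) code at a 1.3% disk failure probability, but it does not say how the common case is defined. The arithmetic sketch below therefore makes its own assumptions and should not be read as the paper's analysis: RS(10,4) is taken to mean 10 data blocks plus 4 parity blocks on 14 disks, each disk is assumed to fail independently with probability 0.013, and 3-way replication is a hypothetical baseline for the storage-cost comparison.

```python
from math import comb

DATA_BLOCKS = 10    # assumed meaning of the "10" in RS(10,4)
PARITY_BLOCKS = 4   # assumed meaning of the "4" in RS(10,4)
P_FAIL = 0.013      # per-disk failure probability quoted in the abstract


def failure_distribution(n, p):
    """P(exactly k of n disks failed), assuming independent failures."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


total_blocks = DATA_BLOCKS + PARITY_BLOCKS
dist = failure_distribution(total_blocks, P_FAIL)

# Raw storage consumed per byte of user data.
rs_overhead = total_blocks / DATA_BLOCKS   # 1.4x for RS(10,4)
replication_overhead = 3.0                 # hypothetical 3-way replication

# A stripe stays recoverable while at most PARITY_BLOCKS blocks are lost.
p_recoverable = sum(dist[: PARITY_BLOCKS + 1])

print(f"P(no failures in a stripe) = {dist[0]:.4f}")
print(f"P(at most one failure)     = {dist[0] + dist[1]:.4f}")
print(f"P(stripe recoverable)      = {p_recoverable:.6f}")
print(f"storage overhead           = {rs_overhead:.1f}x vs {replication_overhead:.1f}x")
```

Under these assumptions most stripes see zero or one failure, so reads rarely need the full general decoder. That is the general shape of a common-case argument, although the abstract's own 90% figure may be defined differently.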
CAPSL: A Tool for Automatic Generation of Hardware Sandboxes for IP Security
Taylor J. L. Whitaker, C. Bobda
{"title":"CAPSL: A Tool for Automatic Generation of Hardware Sandboxes for IP Security","authors":"Taylor J. L. Whitaker, C. Bobda","doi":"10.1109/FCCM.2017.54","DOIUrl":"https://doi.org/10.1109/FCCM.2017.54","url":null,"abstract":"We propose a design flow for automatic generation of hardware sandboxes. Our tool, the Component Authentication Process for Sandboxed Layouts (CAPSL), generates sandboxes capable of detecting trojan activation and nullifying potential damage to a system at run-time. Our approach captures the behavioral properties of non-trusted IPs with formal models that are translated to checker automata and implemented within an untrusted partition of the system to isolate sandbox-system interactions upon deviation from the behavioral checkers.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115763369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Energy Efficient Loop Unrolling for Low-Cost FPGAs
Naveen Kumar Dumpala, S. B. Patil, Daniel E. Holcomb, R. Tessier
{"title":"Energy Efficient Loop Unrolling for Low-Cost FPGAs","authors":"Naveen Kumar Dumpala, S. B. Patil, Daniel E. Holcomb, R. Tessier","doi":"10.1109/FCCM.2017.22","DOIUrl":"https://doi.org/10.1109/FCCM.2017.22","url":null,"abstract":"Many FPGA computations, including block ciphers, require repetitive loop operations that are difficult to parallelize. Sequential loop implementation leads to significant clock power while loop unrolling can lead to significant glitch power. In this paper, we provide a low overhead approach to unroll block ciphers and other loops in low-cost FPGAs to reduce energy consumption. A latch-based glitch filter is introduced for unrolled loops that reduces loop energy per operation by over an order of magnitude. Our filters and associated control for unrolled loops can be automatically instantiated as a macro for FPGA designs, allowing for easy designer use. We demonstrate our approach for SIMON-128 and AES-256 block ciphers implemented on a Xilinx Artix-7 FPGA.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116736032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A Scalable FPGA-Based Accelerator for High-Throughput MCMC Algorithms
M. Hosseini, Rashidul Islam, A. Kulkarni, T. Mohsenin
{"title":"A Scalable FPGA-Based Accelerator for High-Throughput MCMC Algorithms","authors":"M. Hosseini, Rashidul Islam, A. Kulkarni, T. Mohsenin","doi":"10.1109/FCCM.2017.56","DOIUrl":"https://doi.org/10.1109/FCCM.2017.56","url":null,"abstract":"Markov Chain Monte Carlo (MCMC) algorithms are used to obtain samples from any target probability distribution and are widely used in stochastic processing techniques. Stochastic processing techniques such as machine learning and image processing need to compute large amounts of data in real-time, thus high throughput MCMC samplers are of utmost importance. Parallel Tempering (PT) MCMC has shown better mixing and convergence for high-dimensional and multi-modal distributions compared to other popular MCMC algorithms. In this paper, we employ a special case of Dth order Markov chains to modify the PT-MCMC algorithm, named \"Multiple Parallel Tempering\" (MPT). The modification converts one MCMC sampler into multiple independent samplers that generate and interleave their samples on one output line each clock cycle. A fully scalable and pipelined hardware accelerator for the PT and proposed MPT sampler is designed and implemented on a Xilinx Artix-7 FPGA for chain numbers of 1, 2, and 8. The post-place and route FPGA implementation results indicate that the throughput of the proposed MPT sampler for chain numbers 1, 2, and 8 is 31x, 31x, and 28x higher, respectively, than that of the PT sampler with the same chain number configuration.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133719398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
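The MCMC abstract above builds on Parallel Tempering (PT). For readers unfamiliar with the baseline, here is a minimal software sketch of textbook PT. It is not the paper's MPT variant or its hardware pipeline. The one-dimensional double-well energy, the temperature ladder `betas`, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def swap_accept_prob(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the states of two tempered chains.

    Chain k targets a density proportional to exp(-beta_k * E(x)), so the
    exchange ratio is exp((beta_i - beta_j) * (E_i - E_j)).
    """
    log_ratio = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def double_well(x):
    """Hypothetical bimodal energy with minima at x = -1 and x = +1."""
    return 8.0 * (x * x - 1.0) ** 2


def parallel_tempering(energy, betas, steps, step_size=0.5, seed=0):
    """Return samples from the chain at betas[0] (the target temperature)."""
    rng = random.Random(seed)
    states = [0.0 for _ in betas]
    samples = []
    for _ in range(steps):
        # Local random-walk Metropolis update in every chain.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step_size)
            delta = energy(proposal) - energy(states[k])
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                states[k] = proposal
        # Propose one exchange between a random pair of adjacent temperatures.
        if len(betas) > 1:
            k = rng.randrange(len(betas) - 1)
            p = swap_accept_prob(betas[k], betas[k + 1],
                                 energy(states[k]), energy(states[k + 1]))
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
        samples.append(states[0])
    return samples


samples = parallel_tempering(double_well, betas=[1.0, 0.5, 0.25], steps=2000)
print(len(samples), min(samples), max(samples))
```

The hotter chains (smaller `beta`) cross the barrier between the two wells more easily, and accepted exchanges pass those states down to the target chain. This is the mixing benefit the abstract attributes to PT on multi-modal distributions.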