2014 24th International Conference on Field Programmable Logic and Applications (FPL): Latest Papers

Pipelined compressor tree optimization using integer linear programming
M. Kumm, P. Zipf
{"title":"Pipelined compressor tree optimization using integer linear programming","authors":"M. Kumm, P. Zipf","doi":"10.1109/FPL.2014.6927468","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927468","url":null,"abstract":"Compressor trees offer an effective realization of the multiple input addition needed by many arithmetic operations. However, mapping the commonly used carry save adders (CSA) of classical compressor trees to FPGAs suffers from poor resource utilization. This can be enhanced by using generalized parallel counters (GPCs). Prior work has shown that highly efficient GPCs can be constructed by exploiting the low-level structure of the FPGA. However, due to their irregular shape, the selection of those is not straightforward. Furthermore, the compressor tree has to be pipelined to achieve the potential FPGA performance. Then, a selection between registered GPCs or flip-flops has to be done to balance the pipeline. This work defines the pipelined compressor tree synthesis as an optimization problem and proposes a (resource) optimal method using integer linear programming (ILP). Besides that, two new GPC mappings with high efficiency are proposed for Xilinx FPGAs.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129076145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
High level programming framework for FPGAs in the data center
Oren Segal, M. Margala, S. R. Chalamalasetti, M. Wright
{"title":"High level programming framework for FPGAs in the data center","authors":"Oren Segal, M. Margala, S. R. Chalamalasetti, M. Wright","doi":"10.1109/FPL.2014.6927442","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927442","url":null,"abstract":"Heterogeneous computing offers a promising solution for energy efficient computing in the data center. FPGA based heterogeneous computing is an especially promising direction since it allows for the creation of custom hardware solutions for data centric parallel applications. One of the main issues delaying widespread adoption of FPGAs as mainstream high performance computing devices is the difficulty in programming them. OpenCL was meant to address the difficulties and the non-uniformity related to programming heterogeneous devices; unfortunately, because of its complexity, it sets the bar high for many software programmers, preventing them from directly benefiting from the computing power and energy efficiency that OpenCL and heterogeneous computing have to offer. This work presents an effort to bridge the gap by extending an existing Java programming framework (APARAPI), based on OpenCL, so that it can be used to program FPGAs at a high level of abstraction and increased ease of programmability. We run several real world algorithms to assess the performance of the APARAPI framework on both a low end and a high end system. On the low end and high end systems respectively, we find up to 78-80 percent power reduction and 4.8X-5.3X speed increase running NBody simulation, as well as up to 65-80 percent power reduction and 6.2X-7X speed increase for a K-Means MapReduce algorithm running on top of the Hadoop framework and APARAPI.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127835661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor
Chen Yang, Leibo Liu, Yansheng Wang, S. Yin, Peng Cao, Shaojun Wei
{"title":"Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor","authors":"Chen Yang, Leibo Liu, Yansheng Wang, S. Yin, Peng Cao, Shaojun Wei","doi":"10.1109/FPL.2014.6927439","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927439","url":null,"abstract":"This paper proposes three configuration approaches to improve computing efficiency of a coarse-grained reconfigurable array, including input data relocation, line-based context switching, and loop interval minimization. These proposed approaches fully exploit the parallelism and pipelining of the reconfigurable array, which reduce interval latency when switching the configuration contexts, and therefore greatly enhance computing efficiency. These proposed techniques are used in a coarse-grained reconfigurable multimedia system (REMUS). Measured results show that, owing to the proposed approaches, REMUS can achieve 1080p@30fps performance for H.264 high profile video decoding under 200MHz working frequency. When normalized to the same technology, REMUS outperforms XPP-III 6.98x in energy efficiency.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124472369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A fast and scalable FPGA damage diagnostic service for R3TOS using BIST cloning technique
Ali Ebrahim, T. Arslan, X. Iturbe
{"title":"A fast and scalable FPGA damage diagnostic service for R3TOS using BIST cloning technique","authors":"Ali Ebrahim, T. Arslan, X. Iturbe","doi":"10.1109/FPL.2014.6927386","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927386","url":null,"abstract":"This paper presents a new technique to be used in the context of reconfigurable computing to accelerate the online diagnosis of permanent damage on Xilinx FPGAs using Built-In Self Tests (BISTs). Detecting and locating permanently damaged resources with precision is central to keeping the system implemented on the FPGA flawless at all times; i.e. upcoming hardware tasks are mapped to available functional resources, circumventing the use of the damaged ones. The proposed diagnostic technique exploits the Multiple Frame Write (MFW) feature available in Xilinx FPGAs to “clone” (i.e. replicate) a single basic BIST circuit along arbitrarily sized and shaped areas on the FPGA without incurring large time overheads. Hence, the proposed technique allows for creating at runtime on-demand tailored BIST circuits to satisfy any diagnosis requirements that may arise. Moreover, the proposed solution allows for saving memory in the system as it only requires storing basic BIST circuits. Finally, the paper presents a diagnostic service for a Reliable Reconfigurable Real-Time Operating System (R3TOS) that is based on the BIST cloning technique and works in cooperation with the R3TOS fault-handling and recovery mechanisms.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124596459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Achieving portability and efficiency over chip heterogeneous multiprocessor systems
E. Cartwright, A. Sadeghian, Sen Ma, D. Andrews
{"title":"Achieving portability and efficiency over chip heterogeneous multiprocessor systems","authors":"E. Cartwright, A. Sadeghian, Sen Ma, D. Andrews","doi":"10.1109/FPL.2014.6927395","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927395","url":null,"abstract":"Emerging programming models for chip heterogeneous multiprocessor (CHMP) systems elevate architecture details up into the source code. This eliminates portability and requires designers to navigate a multidimensional search space when trying to optimize designs. In this paper, we present an approach that reinstates portability through a combination of polymorphic functions and an adaptive runtime system. Together they enable runtime profiling and dynamic scheduling of unaltered source code across systems with different combinations of heterogeneous resources. Our results verify the ability of our programming model and runtime system to re-enable the notion of writing code once and running it anywhere. Runtime results show how runtime tuning can increase resource utilization and provide performance increases as the number and heterogeneity of computing resources increases.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122630319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An efficient and flexible host-FPGA PCIe communication library
Jian Gong, Tao Wang, Jiahua Chen, Haoyang Wu, Fan Ye, Songwu Lu, J. Cong
{"title":"An efficient and flexible host-FPGA PCIe communication library","authors":"Jian Gong, Tao Wang, Jiahua Chen, Haoyang Wu, Fan Ye, Songwu Lu, J. Cong","doi":"10.1109/FPL.2014.6927459","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927459","url":null,"abstract":"A high-performance interconnection between a host processor and FPGA accelerators is in high demand. Among various interconnection methods, a PCIe bus is an attractive choice for loosely coupled accelerators. Because there is no standard host-FPGA communication library, FPGA developers have to write significant amounts of PCIe related code at both the FPGA side and the host processor side. A high-performance host-FPGA PCIe communication library holds the key to broadening the use of FPGA accelerators. In this paper we target efficiency and flexibility as two important features in such a library. We discuss the challenges in providing these features, and present our solution to these challenges. We propose EPEE, an efficient and flexible host-FPGA PCIe communication library and describe its design. We implemented EPEE in various generations of Xilinx FPGAs with up to 26.24 Gbps half-duplex and 43.02 Gbps full-duplex aggregate throughput in the PCIe Gen2 X8 mode; these are at the best utilization levels that a host-FPGA PCIe library can achieve. The EPEE library has been integrated into four different FPGA applications with different data usage patterns in various institutes.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131671568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Ultrasmall: The smallest MIPS soft processor
Hiroshi Nakatsuka, Yuichiro Tanaka, Thiem Van Chu, Shinya Takamaeda-Yamazaki, Kenji Kise
{"title":"Ultrasmall: The smallest MIPS soft processor","authors":"Hiroshi Nakatsuka, Yuichiro Tanaka, Thiem Van Chu, Shinya Takamaeda-Yamazaki, Kenji Kise","doi":"10.1109/FPL.2014.6927387","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927387","url":null,"abstract":"Soft processors have been commonly used in FPGA-based designs to perform various useful functions. Some of these functions are not performance-critical and are required to be implemented using very few FPGA resources. For such cases, it is desired to reduce circuit area of the soft processor as much as possible. This paper proposes Ultrasmall, a small soft processor for FPGAs. Ultrasmall supports a subset of the MIPS-I ISA and is designed for microcontrollers in FPGA-based SoCs. Ultrasmall employs an area efficient architecture to minimize the use of FPGA resources. While supporting the 32-bit ISA, Ultrasmall adopts the 2-bit wide serial ALU architecture. This approach significantly reduces the amount of FPGA resource usage. In addition to the device-independent optimizations for any FPGAs, we apply primitives-based optimizations for the Xilinx Spartan-3E FPGA series with 4-input LUTs, thereby further reducing the total number of occupied slices. The evaluation result shows that, on the Xilinx Spartan-3E XC3S500E FPGA, Ultrasmall occupies only 137 slices which is 84% of the number of occupied slices of Supersmall, a very small soft processor with the same design concept as Ultrasmall. On the other hand, in terms of performance, Ultrasmall is 2.9× faster than Supersmall.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128444284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Ready PCIe data streaming solutions for FPGAs
Thomas B. Preußer, R. Spallek
{"title":"Ready PCIe data streaming solutions for FPGAs","authors":"Thomas B. Preußer, R. Spallek","doi":"10.1109/FPL.2014.6927444","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927444","url":null,"abstract":"The PCIe attachment of FPGA accelerators within host workstations is convenient and offers a high-performance direct integration. FPGA-boards designed and equipped as PCIe extension cards are available off-the-shelf. This paper gives an overview on options to provide data streaming abstractions in user applications using PCIe technology. RIFFA and Xillybus are straightforward implementations, which aim at providing ready solutions at this level. This paper will describe these platforms and evaluate them in terms of their ease of use, their perceived maturity and the performance metrics bandwidth and latency. It concludes with providing a brief guideline on how to select the proper platform depending on the usage context.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122073460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Fast and accurate SEU-tolerance characterization method for Zynq SoCs
Igor Villata, U. Bidarte, Uli Kretzschmar, A. Astarloa, Jesús Lázaro
{"title":"Fast and accurate SEU-tolerance characterization method for Zynq SoCs","authors":"Igor Villata, U. Bidarte, Uli Kretzschmar, A. Astarloa, Jesús Lázaro","doi":"10.1109/FPL.2014.6927416","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927416","url":null,"abstract":"In this paper a new SEU (Single Event Upset) emulation method for testing fault tolerant systems in FPGAs is presented. It is implemented on a “Xilinx Zynq®-7000 All Programmable System on Chip (SoC)” device, which combines a hard microprocessor with programmable logic. An important new feature is that an internal hardware configuration interface controlled by this microprocessor is provided. This interface is used for injecting faults into the configuration bitstream in order to emulate radiation effects. Since both the processing system and the programmable logic are in the same chip, this method has the high speed characteristics of internal fault injection methods. As a hard internal configuration interface is provided, a configuration bit belonging to the internal interface port cannot be flipped and injection side effects are avoided. This method is especially suitable for testing complex real fault-tolerant FPGA designs because no substantial modifications need to be added to the original design. A universal verification system is proposed to avoid designing complex external application-dependent testbenches.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124897517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Using an OpenCL framework to evaluate interconnect implementations on FPGAs
Vincent Mirian, P. Chow
{"title":"Using an OpenCL framework to evaluate interconnect implementations on FPGAs","authors":"Vincent Mirian, P. Chow","doi":"10.1109/FPL.2014.6927440","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927440","url":null,"abstract":"Field Programmable Gate Arrays (FPGAs) are an ideal platform for building systems with custom hardware accelerators; however, managing these systems is still a major challenge. The OpenCL standard has become accepted as a good programming model for managing heterogeneous platforms due to its rich constructs. Although commercial OpenCL frameworks are now emerging, there is a need for an open-source OpenCL framework that facilitates the exploration of the overall system architecture and software, as well as the implementation and architectures of the custom hardware accelerators (devices). In this paper, we use an OpenCL framework to compare interconnect implementations for a simple multiprocessor accelerator.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116193368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4