2013 23rd International Conference on Field programmable Logic and Applications: Latest Papers

Bambu: A modular framework for the high level synthesis of memory-intensive applications
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645550
C. Pilato, Fabrizio Ferrandi
Abstract: This paper presents bambu, a modular framework for research on high-level synthesis currently under development at Politecnico di Milano. It can accept most of C constructs without requiring any three-state for their implementations by exploiting a novel and efficient memory architecture. It also allows the integration of floating-point units and thus it can deal with a wide range of data types. Finally, it allows to easily customize the synthesis flow (e.g., transformation passes, constraints, options, synthesis scripts) through an XML file and it automatically generates test-benches and validates the results against the corresponding software execution, supporting both ASIC and FPGA technologies.
Citations: 123
Demonstration of a heterogeneous multi-core processor with 3-D inductive coupling links
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645628
Yusuke Koizumi, N. Miura, Yasuhiro Take, Hiroki Matsutani, T. Kuroda, H. Amano, Ryuichi Sakamoto, M. Namiki, K. Usami, Masaaki Kondo, Hiroshi Nakamura
Abstract: Cube-1 is a heterogeneous multi-core processor which can achieve the required performance with the least energy consumption as possible. It can control the performance and energy with two levels: (1) the number of accelerators can be easily changed by increasing or decreasing the number of stacked chips after fabrication, as they are connected with inductive coupling links. (2) The supply voltage for PE array of the accelerator can be controlled by the host CPU so that the required performance can be obtained with a minimum supply voltage.
Citations: 0
Generation of multi-core systems from multithreaded software
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645582
Alexander Wold, J. Tørresen, Andreas Agne
Abstract: A heterogeneous system with soft CPU tailored to the individual threads of the application, while still software based, offers the potential for improved performance and resource utilization over a homogeneous system. In this paper we present a method to automatically create a heterogeneous multi-core system from a multithreaded software application. The resulting system consists of processing elements based on customized MIPS soft CPUs coupled with their respective programs. Using instruction set architecture (ISA) subsetting, we adapt the individual soft CPUs to the specific computations they have to perform. We have carried out a case study with a constraint solver application for which we find a performance increase of 1.54 accompanied with an area reduction of 22.5% compared to a homogeneous multi-core system. We also present an automated toolchain that generates synthesizable IP-cores from software threads with little additional development overhead.
Citations: 0
Automated synthesis of FPGA-based heterogeneous interconnect topologies
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645494
A. Cilardo, E. Fusella, L. Gallo, A. Mazzeo
Abstract: The choice of the communication topology in many systems is of vital importance because it affects the entire inter-component data traffic and impacts significantly the overall system performance and cost. On the other hand, there is a very large spectrum of interconnection topologies that potentially meet given communication requirements, determining various trade-offs between cost and performance. This work proposes an automated methodology to choose among all of these possibilities, avoiding a manual and time consuming design space search process. The methodology takes as input the description of the application communication requirements, and gives as output an on-chip synthesizable interconnection structure satisfying given area constraints. Targeted at FPGA technologies, the approach generates an interconnection structure combining crossbars and shared buses, connected through bridges, yielding a scalable, efficient structure. To the best of the authors' knowledge, it provides the first method to automatically generate FPGA-based communication architectures where heterogeneous communication elements, such as shared buses and crossbar switches, coexist in a network inherently supporting multiple communication paths. The resulting architecture improves the level of communication parallelism that can be exploited, while keeping area requirements low. The paper thoroughly describes the formalisms and the methodology used to derive such optimized heterogeneous topologies. It also discusses a couple of case-study applications emphasizing the impact of the proposed approach and highlighting the essential differences with a few other solutions in the literature.
Citations: 26
Accelerating solvers for global atmospheric equations through mixed-precision data flow engine
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645508
L. Gan, H. Fu, W. Luk, Chao Yang, Wei Xue, Xiaomeng Huang, Youhui Zhang, Guangwen Yang
Abstract: One of the most essential and challenging components in a climate system model is the atmospheric model. To solve the multi-physical atmospheric equations, developers have to face extremely complex stencil kernels. In this paper, we propose a hybrid CPU-FPGA algorithm that applies single and multiple FPGAs to compute the upwind stencil for the global shallow water equations. Through mixed-precision arithmetic, we manage to build a fully pipelined upwind stencil design on a single FPGA, which can perform 428 floating-point and 235 fixed-point operations per cycle. The CPU-FPGA algorithm using one Virtex-6 FPGA provides 100 times speedup over a 6-core CPU and 4 times speedup over a hybrid node with 12 CPU cores and a Fermi GPU card. The algorithm using four FPGAs provides 330 times speedup over a 6-core CPU; it is also 14 times faster and 9 times more power efficient than the hybrid CPU-GPU node.
Citations: 35
An open-source multi-FPGA modular system for fair benchmarking of True Random Number Generators
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645570
V. Fischer, F. Bernard, Patrick Haddad
Abstract: True Random Number Generators (TRNG) are cryptographic primitives that exploit intrinsic noise sources in electronic devices. Their quality is linked to the underlying technology, activity of the neighboring circuitry and device environment (temperature, power supply, electromagnetic emanations). Consequently, when comparing TRNGs, they should be tested in identical technology, system architecture and operating conditions. We present a unified hardware platform and related open source tools aimed at fair benchmarking of TRNGs implemented in different FPGA technologies. The platform is accessible remotely. Designers can download related tools from the web site and they can upload their configuration bitstream to the remote FPGA and download random data generated in the same hardware and in the same conditions as other concurrent designs and state-of-the-art generators. The proposed tools were approved in many applications and they guarantee safe acquisition of random sequences at data rates of up to 400 Mbits/s.
Citations: 14
Improving autonomous soft-error tolerance of FPGA through LUT configuration bit manipulation
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645498
Anup Das, Shyamsundar Venkataraman, Akash Kumar
Abstract: Soft-errors in LUT configuration bits of FPGAs can alter the functionality of an implemented design, rendering it useless, unless re-programmed. This paper proposes a technique to improve autonomous fault-masking capabilities of a design by maximizing the number of zeros or ones in LUTs. The technique utilizes spare resources (XOR gates and carry chain) of FPGA devices to selectively manipulate LUT contents using two operations - LUT restructuring and LUT decomposition. Experiments conducted with a wide set of benchmarks from MCNC, IWLS 2005 and ITC99 benchmark suite on Xilinx Virtex 6 FPGA board demonstrate that the proposed methodology maximizes logic 0/1 of LUTs by an average 20% achieving 80% fault-masking with no area overhead. The fault-rate of the entire design is reduced by 60% on average as compared to the existing techniques. Further, an additional 5% fault-masking can be achieved with a 7% increase in slice usage.
Citations: 15
Efficient implementation of Virtual Coarse Grained Reconfigurable Arrays on FPGAs
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645516
Karel Heyse, Tom Davidson, Elias Vansteenkiste, Karel Bruneel, D. Stroobandt
Abstract: Fine grained Field Programmable Gate Arrays (FPGA) are complex to program and therefore suffer from high development costs. To solve this problem, Virtual Coarse Grained Reconfigurable Arrays (Virtual CGRA), or CGRAs implemented on FPGAs, have been proposed. Conventional implementations of VCGRAs use functional FPGA resources, such as LookUp Tables, to implement the virtual switch blocks, registers and other components that make the VCGRA configurable. We show that this is a large overhead that can often be avoided by mapping these components directly on lower level FPGA resources such as physical switch blocks and configuration memory. We show how this can be achieved using the tool flow for parameterised FPGA configurations and illustrate the advantages of this method by showing that an area reduction of 50% is attainable for a VCGRA aimed at regular expression matching.
Citations: 22
Accurate and flexible flow-based monitoring for high-speed networks
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645557
Marco Forconesi, G. Sutter, S. López-Buedo, J. Aracil
Abstract: In this paper we present an FPGA-based architecture to export flows in 10 Gbps networks, implemented on the NetFPGA-10G platform. Flow-based monitoring is a powerful methodology to analyze and detect network issues, such as congested links or DDoS attacks. Our design provides the following advantages: (i) The architecture allows processing 10 Gbps links without sampling, even for the highest packet rate of 14.88 Mpps (Million packets per second) that corresponds to the shortest (64-byte) Ethernet frames; (ii) It is possible to manage up to 786,432 concurrent flows; (iii) The project is developed in an open-source hardware platform and the HDL code is open to the community; (iv) The proposed approach frees network routers from the burden of exporting flows.
Citations: 12
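Note: the 14.88 Mpps figure in the abstract above follows from standard Ethernet framing, where every frame on the wire also carries an 8-byte preamble/start delimiter and a 12-byte inter-frame gap. The sketch below is only an illustration of that arithmetic; the function and constant names are made up for this note and do not come from the paper.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet framing).
PREAMBLE_AND_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12


def max_packet_rate(link_bps: float, frame_bytes: int) -> float:
    """Return the highest packet rate (packets/s) a link can carry.

    Hypothetical helper: assumes back-to-back frames of a single size.
    """
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


# 10 Gbps link carrying minimum-size (64-byte) frames.
rate = max_packet_rate(10e9, 64)
print(f"{rate / 1e6:.2f} Mpps")  # 14.88 Mpps
```

At this rate a new packet arrives roughly every 67 ns, which is the per-packet time budget any design processing every packet without sampling has to meet.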
Energy efficient parameterized FFT architecture
2013 23rd International Conference on Field programmable Logic and Applications Pub Date : 2013-10-24 DOI: 10.1109/FPL.2013.6645545
Ren Chen, H. Le, V. Prasanna
Abstract: In this paper, we revisit the classic Fast Fourier Transform (FFT) for energy efficient designs on FPGAs. A parameterized FFT architecture is proposed to identify the design trade-offs in achieving energy efficiency. We first perform design space exploration by varying the algorithm mapping parameters, such as the degree of vertical and horizontal parallelism, that characterize decomposition based FFT algorithms. Then we explore an energy efficient design by empirical selection on the values of the chosen architecture parameters, including the type of memory elements, the type of interconnection network and the number of pipeline stages. The trade offs between energy, area, and time are analyzed using two performance metrics: the energy efficiency (defined as the number of operations per Joule) and the Energy×Area×Time (EAT) composite metric. From the experimental results, a design space is generated to demonstrate the effect of these parameters on the various performance metrics. For N-point FFT (16 ≤ N ≤ 1024), our designs achieve up to 28% and 38% improvement in the energy efficiency and EAT, respectively, compared with a state-of-the-art design.
Citations: 22
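Note: the two metrics named in the abstract above can be written down directly from their definitions there (operations per Joule, and the product Energy×Area×Time). The sketch below is only an illustration; the function names, units, and sample numbers are hypothetical and are not taken from the paper, which does not state its units or measurement setup in the abstract.

```python
def energy_efficiency(operations: float, energy_joules: float) -> float:
    """Operations per Joule; higher is better."""
    return operations / energy_joules


def eat(energy_joules: float, area: float, time_seconds: float) -> float:
    """Energy x Area x Time composite metric; lower is better.

    The area unit (e.g. slices or LUTs) is an assumption left to the caller.
    """
    return energy_joules * area * time_seconds


# Hypothetical design point, for illustration only.
ops, energy, area, latency = 1000.0, 2.0, 3.0, 4.0
print(energy_efficiency(ops, energy))  # 500.0
print(eat(energy, area, latency))      # 24.0
```

Because EAT multiplies its three terms, a design that halves energy at the cost of doubling area scores the same, which is why the abstract reports it alongside energy efficiency rather than instead of it.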