Latest articles from the 2014 International Conference on Field-Programmable Technology (FPT)

A flexible interface architecture for reconfigurable coprocessors in embedded multicore systems using PCIe Single-root I/O virtualization
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082780
O. Sander, S. Bähr, Enno Lübbers, T. Sandmann, Viet Vu Duy, J. Becker
{"title":"A flexible interface architecture for reconfigurable coprocessors in embedded multicore systems using PCIe Single-root I/O virtualization","authors":"O. Sander, S. Bähr, Enno Lübbers, T. Sandmann, Viet Vu Duy, J. Becker","doi":"10.1109/FPT.2014.7082780","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082780","url":null,"abstract":"Especially in complex system-of-systems scenarios, where multiple high-performance or real-time processing functions need to co-exist and interact, reconfigurable devices together with virtualization techniques show considerable promise to increase efficiency, ease integration and maintain functional and non-functional properties of the individual functions. In this paper, we propose a flexible interface architecture with low overhead for coupling reconfigurable coprocessors to high-performance general-purpose processors, allowing customized yet efficient construction of heterogeneous processing systems. Our implementation is based on PCI Express (PCIe) and optimized for virtualized systems, taking advantage of the SR-IOV capabilities in modern PCIe implementations. We describe the interface architecture and its fundamental technologies, detail the services provided to individual coprocessors and accelerator modules, and quantify key corner performance indicators relevant for virtualized applications.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"25 1","pages":"223-226"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79645248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
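As background for the SR-IOV coupling described above: a minimal sketch, assuming a Linux host and a placeholder PCIe address (0000:03:00.0, not taken from the paper), of how virtual functions are created and enumerated through the standard sysfs interface. It illustrates only the kernel-side mechanism, not the paper's coprocessor interface architecture.

```python
# Sketch: create and enumerate SR-IOV virtual functions via sysfs on Linux.
# The device address below is a placeholder; each VF then appears as an
# independent PCIe function that can be assigned to a guest or, in the
# paper's setting, bound to a coprocessor interface.
from pathlib import Path

PF = Path("/sys/bus/pci/devices/0000:03:00.0")  # physical function (placeholder BDF)

def enable_vfs(num_vfs: int) -> None:
    # Writing to sriov_numvfs asks the kernel to instantiate that many VFs.
    (PF / "sriov_numvfs").write_text(str(num_vfs))

def list_vfs() -> list[str]:
    # The kernel exposes each VF as a virtfnN symlink under the PF's directory.
    return sorted(link.resolve().name for link in PF.glob("virtfn*"))

if __name__ == "__main__":
    enable_vfs(4)
    print(list_vfs())  # e.g. ['0000:03:10.0', '0000:03:10.2', ...]
```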
HW acceleration of multiple applications on a single FPGA
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082797
Yidi Liu, Benjamin Carrión Schäfer
{"title":"HW acceleration of multiple applications on a single FPGA","authors":"Yidi Liu, Benjamin Carrión Schäfer","doi":"10.1109/FPT.2014.7082797","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082797","url":null,"abstract":"This works presents a fast and efficient method to map multiple computationally intensive kernels onto the same FPGA given the FPGA area and communication bandwidth constraint. FPGAs have grown to a size where multiple applications can now be mapped onto a single device. It is therefore important to develop methods than can efficiently decide which kernels of all of the applications under consideration should be mapped onto the FPGA in order to maximize the total system acceleration. Our method shows very good results compared to a standard genetic algorithm, which is often used for multi-objective optimization problems and against the optimal solution obtained using an exhaustive search method. Experimental results show that our method is very scalable and extremely fast.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"28 12 1","pages":"284-285"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87667772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
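To make the selection problem above concrete, here is a toy 0/1-knapsack sketch: choose the subset of kernels that maximizes total speedup within an area budget. The kernel names, areas and speedups are invented, the paper's fast heuristic and its communication-bandwidth constraint are not reproduced, and the exhaustive search below mirrors only the optimal baseline mentioned in the abstract.

```python
# Toy 0/1-knapsack view of "which kernels to map onto the FPGA" under an area
# budget. All numbers are illustrative, not from the paper.
from itertools import combinations

kernels = {          # name: (area in LUTs, speedup contribution)
    "fft":    (12000, 8.0),
    "matmul": (20000, 11.0),
    "sobel":  ( 6000, 3.5),
    "aes":    ( 9000, 5.0),
}
AREA_BUDGET = 30000

def best_subset(kernels, budget):
    # Exhaustive search, feasible only for a handful of kernels -- the
    # "optimal solution via exhaustive search" baseline in spirit.
    best, best_gain = (), 0.0
    for r in range(len(kernels) + 1):
        for subset in combinations(kernels, r):
            area = sum(kernels[k][0] for k in subset)
            gain = sum(kernels[k][1] for k in subset)
            if area <= budget and gain > best_gain:
                best, best_gain = subset, gain
    return best, best_gain

print(best_subset(kernels, AREA_BUDGET))  # (('fft', 'sobel', 'aes'), 16.5)
```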
Accelerating transfer entropy computation
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082754
Shengjia Shao, Ce Guo, W. Luk, Stephen Weston
{"title":"Accelerating transfer entropy computation","authors":"Shengjia Shao, Ce Guo, W. Luk, Stephen Weston","doi":"10.1109/FPT.2014.7082754","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082754","url":null,"abstract":"Transfer entropy is a measure of information transfer between two time series. It is an asymmetric measure based on entropy change which only takes into account the statistical dependency originating in the source series, but excludes dependency on a common external factor. Transfer entropy is able to capture system dynamics that traditional measures cannot, and has been successfully applied to various areas such as neuroscience, bioinformatics, data mining and finance. When time series becomes longer and resolution becomes higher, computing transfer entropy is demanding. This paper presents the first reconfigurable computing solution to accelerate transfer entropy computation. The novel aspects of our approach include a new technique based on Laplace's Rule of Succession for probability estimation; a novel architecture with optimised memory allocation, bit-width narrowing and mixed-precision optimisation; and its implementation targeting a Xilinx Virtex-6 SX475T FPGA. In our experiments, the proposed FPGA-based solution is up to 111.47 times faster than one Xeon CPU core, and 18.69 times faster than a 6-core Xeon CPU.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"1 1","pages":"60-67"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89142750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
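For reference, a plain-software sketch of the quantity being accelerated: transfer entropy from a source series Y to a target series X over discrete symbols, using Laplace's Rule of Succession (add-one smoothing) for the conditional probability estimates, as the abstract mentions. The data are made up and the paper's fixed-point, mixed-precision FPGA architecture is not modeled.

```python
# T(Y->X) = sum over (x_{t+1}, x_t, y_t) of
#   p(x_{t+1}, x_t, y_t) * log2[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ]
import math
from collections import Counter

def transfer_entropy(x, y, alphabet=2):
    triples  = Counter(zip(x[1:], x[:-1], y[:-1]))   # (x_{t+1}, x_t, y_t)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))          # (x_t, y_t)
    pairs_xx = Counter(zip(x[1:], x[:-1]))           # (x_{t+1}, x_t)
    singles  = Counter(x[:-1])                       # x_t
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        # Laplace-smoothed conditionals: (count + 1) / (total + |alphabet|)
        p_cond_xy = (c + 1) / (pairs_xy[(x0, y0)] + alphabet)
        p_cond_x  = (pairs_xx[(x1, x0)] + 1) / (singles[x0] + alphabet)
        te += (c / n) * math.log2(p_cond_xy / p_cond_x)
    return te

# Y drives X with a one-step lag, so T(Y->X) should be clearly positive.
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
x = [0] + y[:-1]   # x_{t+1} = y_t
print(round(transfer_entropy(x, y), 3))
```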
Logic emulation in the megaLUT era - Moore's Law beats Rent's Rule
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082742
M. Butts
{"title":"Logic emulation in the megaLUT era - Moore's Law beats Rent's Rule","authors":"M. Butts","doi":"10.1109/FPT.2014.7082742","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082742","url":null,"abstract":"Throughout its twenty-five year history, logic emulation architectures have been governed by Rent's Rule. This empirical observation, first used to build 1960s mainframes, predicts the average number of cut nets that result when a digital module is arbitrarily partitioned into multiple parts, such as the FPGAs of a logic emulator. A fundamental advantage of emulation is that, unlike most devices, FPGAs always grow in capacity according to Moore's Law, just as the designs to be emulated have grown. Unfortunately packaging technology advances at a far slower pace, leaving emulators short on the pins demanded by Rent's Rule. Many cut nets are now sent through each package pin, which costs speed, power and area. At today's system-on-chip level of design, the number of system-level modules is growing, while their sizes are remaining constant. In the meantime, FPGAs have grown from a handful of logic lookup tables (LUTs) at the beginning to over a million LUTs today. At this scale, an entire system-level module such as an advanced 64-bit CPU can fit inside a single FPGA. Fewer module-internal nets need be cut, so Rent's Rule constraints are relaxing. Fewer and higher-level cut nets means logic emulation with megaLUT FPGAs is becoming faster, cooler, smaller, cheaper, and more reliable. FPGA's Moore's Law scaling is escaping from Rent's Rule.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"36 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81317231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
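For context, Rent's Rule in its usual form relates the external terminal count T of a partition to the number of internal gates or LUTs G (standard formulation, not specific to this talk):

```latex
% t = average terminals per gate/LUT, p = Rent exponent
T = t \, G^{p}, \qquad 0 < p < 1
```

With illustrative (not paper-supplied) values t = 4 and p = 0.6, a 10^6-LUT partition would call for roughly 4·(10^6)^0.6 ≈ 16,000 terminals, far beyond what any package provides, which is why emulators multiplex many cut nets per pin; fitting whole system-level modules inside a single FPGA lowers the cut-net count instead.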
A complementary architecture for high-speed true random number generator
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082786
Xian-wei Yang, R. Cheung
{"title":"A complementary architecture for high-speed true random number generator","authors":"Xian-wei Yang, R. Cheung","doi":"10.1109/FPT.2014.7082786","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082786","url":null,"abstract":"In this paper, we introduce a novel FPGA-based design for true random number generator (TRNG). It is able to harvest the timing difference caused by the nonuniformity of the Integrated Circuits (ICs) and use it to generate the randomness. Compared with the previous related work, this design uses a complementary scheme that leads to a doubled data rated output. The proposed complementary design has improved entropy and achieved higher throughput. The prototype design has been implemented and verified on a Xilinx Virtex-6 ML605 evaluation board. As a result, the generated random number stream is able to pass the statistical NIST and DIEHARD test suites showing a reliable performance. Meanwhile, it can approach the maximum data rate as 50 Mbps stably.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"1 1","pages":"248-251"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76412274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Design re-use for compile time reduction in FPGA high-level synthesis flows
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082746
Marcel Gort, J. Anderson
{"title":"Design re-use for compile time reduction in FPGA high-level synthesis flows","authors":"Marcel Gort, J. Anderson","doi":"10.1109/FPT.2014.7082746","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082746","url":null,"abstract":"High-level synthesis (HLS) raises the level of abstraction for hardware design through the use of software methodologies. An impediment to productivity in HLS flows, however, is the run-time of the back-end toolflow - synthesis, packing, placement and routing - which can take hours or days for the largest designs. We propose a new back-end flow for HLS that makes use of pre-synthesized and placed \"macros\" for portions of the design, thereby reducing the amount of work to be done by the back-end tools, lowering run-time. A key aspect of our work is an analytical placement algorithm capable of handling large macros whose internal blocks have fixed relative placements, in conjunction with placing the surrounding individual logic blocks. In an experimental study, we consider the impact on run-time and quality-of-results of using macros: 1) in synthesis alone, and 2) in synthesis, packing and placement. Results show that the proposed approach reduces run-time by ~3x, on average, with a negative performance impact of ~5%.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"20 1","pages":"4-11"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81440057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
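To illustrate the placement formulation described above, here is a minimal one-dimensional sketch of quadratic (analytical) placement in which a rigid macro's internal blocks keep fixed offsets from a single movable anchor. All blocks, nets and offsets are invented, and this is a toy least-squares model, not the paper's placer.

```python
# Quadratic wirelength placement with one rigid macro: macro-internal blocks are
# expressed as anchor + fixed offset, so only the anchor is a free variable.
import numpy as np

var_index = {"M": 0, "a": 1, "b": 2}   # free x-coordinates: macro anchor M, blocks a, b

# Blocks: ("fixed", coord), ("free", var_name), or ("macro", anchor_name, offset).
blocks = {
    "io0": ("fixed", 0.0),
    "io1": ("fixed", 10.0),
    "a":   ("free", "a"),
    "b":   ("free", "b"),
    "m0":  ("macro", "M", 0.0),   # macro-internal block at anchor + 0
    "m1":  ("macro", "M", 2.0),   # macro-internal block at anchor + 2
}
nets = [("io0", "a"), ("a", "m0"), ("m1", "b"), ("b", "io1"), ("a", "b")]

def coeffs(name):
    """Return (row, const) such that x_name = row . vars + const."""
    row = np.zeros(len(var_index))
    kind = blocks[name][0]
    if kind == "fixed":
        return row, blocks[name][1]
    if kind == "free":
        row[var_index[blocks[name][1]]] = 1.0
        return row, 0.0
    _, anchor, offset = blocks[name]
    row[var_index[anchor]] = 1.0
    return row, offset

# Minimize sum over nets of (x_i - x_j)^2  ->  least squares ||A v - b||^2.
A, rhs = [], []
for i, j in nets:
    ri, ci = coeffs(i)
    rj, cj = coeffs(j)
    A.append(ri - rj)
    rhs.append(cj - ci)
v, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
print({name: round(float(v[k]), 2) for name, k in var_index.items()})
# -> roughly {'M': 4.0, 'a': 3.5, 'b': 6.5}
```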
Evaluation of SNMP-like protocol to manage a NoC emulation platform
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082776
O. A. D. L. Junior, V. Fresse, F. Rousseau
{"title":"Evaluation of SNMP-like protocol to manage a NoC emulation platform","authors":"O. A. D. L. Junior, V. Fresse, F. Rousseau","doi":"10.1109/FPT.2014.7082776","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082776","url":null,"abstract":"The Networks-on-Chip (NoCs) are currently the most appropriate communication structure for many-core embedded systems. An FPGA-based emulation platform can drastically reduce the time needed to evaluate a NoC, even if it is composed by tens or hundreds of distributed components. These components should be timely managed in order to execute an evaluation traffic scenario. There is a lack of standard protocols to drive FPGA-based NoC emulators. Such protocols could ease the integration of emulation components developed by different designers. In this paper, we evaluate a light version of SNMP (Simple Network Management Protocol) to manage an FPGA-based NoC emulation platform. The SNMP protocol and its related components are adapted to a hardware implementation. This facilitates the configuration of the emulation nodes without FPGA-resynthesis, as well as the extraction of emulation results. Some experiments highlight that this protocol is quite simple to implement and very efficient for a light resources overhead.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"8 1","pages":"199-206"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81493685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
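As a rough illustration of the management idea above, a sketch of an SNMP-like get/set exchange against a node's small register "MIB": the manager reads and writes named registers to configure a traffic scenario and extract counters without resynthesis. The message layout, register names and values are invented and do not reflect the paper's hardware protocol.

```python
# Minimal SNMP-like get/set against an emulation node's register map.
from dataclasses import dataclass, field

@dataclass
class EmulationNode:
    node_id: int
    mib: dict = field(default_factory=lambda: {"pkt_rate": 0,
                                               "dest_mask": 0xFF,
                                               "tx_count": 0})

    def handle(self, op: str, oid: str, value=None):
        # GET returns the current register value; SET overwrites it.
        if op == "GET":
            return ("RESPONSE", self.node_id, oid, self.mib[oid])
        if op == "SET":
            self.mib[oid] = value
            return ("RESPONSE", self.node_id, oid, value)
        raise ValueError(f"unsupported operation {op!r}")

# A manager configures a traffic scenario, then reads back a counter.
node = EmulationNode(node_id=3)
print(node.handle("SET", "pkt_rate", 100))
print(node.handle("GET", "tx_count"))
```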
Analyzing the impact of heterogeneous blocks on FPGA placement quality
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082750
Chang Xu, Wentai Zhang, Guojie Luo
{"title":"Analyzing the impact of heterogeneous blocks on FPGA placement quality","authors":"Chang Xu, Wentai Zhang, Guojie Luo","doi":"10.1109/FPT.2014.7082750","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082750","url":null,"abstract":"In this paper we propose a quantitative approach to analyze the impact of heterogeneous blocks (H-blocks) on the FPGA placement quality. The basic idea is to construct synthetic heterogeneous placement benchmarks with known optimal wire-length to facilitate the quantitative analysis. To the best of our knowledge, this is the first work that enables the construction of wirelength-optimal heterogeneous placement examples. Besides analyzing the quality of existing placers, we further decompose the impacts of H-blocks from the architectural aspect and netlist aspect. Our analysis shows that a heterogeneous design hides the wirelength degradation by a more compact netlist than its homogeneous version; however, the heterogeneity results in a optimality gap of 52% in wirelength, where 25% is from architectural heterogeneity and 27% is from netlist heterogeneity. Therefore, new heterogeneous placement algorithms are needed to bridge the optimality gap and improve design quality.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"222 1","pages":"36-43"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74947045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Improve memory access for achieving both performance and energy efficiencies on heterogeneous systems
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082759
Hongyuan Ding, Miaoqing Huang
{"title":"Improve memory access for achieving both performance and energy efficiencies on heterogeneous systems","authors":"Hongyuan Ding, Miaoqing Huang","doi":"10.1109/FPT.2014.7082759","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082759","url":null,"abstract":"Hardware accelerators are capable of achieving significant performance improvement for many applications. In this work we demonstrate that it is critical to provide sufficient memory access bandwidth for accelerators to improve the performance and reduce energy consumption. We use the scale-invariant feature transform (SIFT) algorithm as a case study in which three bottleneck stages are accelerated on hardware logic. Based on different memory access patterns of SIFT algorithms, two different approaches are designed to accelerate different functions in SIFT on the Xilinx Zynq-7045 device. In the first approach, convolution is accelerated by designing fully customized hardware accelerator. On top of it, three interfacing methods are analyzed. In the second approach, a distributed multi-processor hardware system with its programming model is built to handle inconsecutive memory accesses. Furthermore, the last level cache (LLC) on the host processor is shared by all slaves to achieve better performance. Experiment results on the Zynq-7045 device show that the hybrid design in which two approaches are combined can achieve ~10 times and better improvement for both performance improvement and energy reduction compared with the pure software implementation for the convolution stage and the SIFT algorithm, respectively.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"30 1","pages":"91-98"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75534152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
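For concreteness, below is the kind of 2D convolution the first approach offloads to a customized accelerator, written as naive software; the image size and Gaussian coefficients are illustrative only and not taken from the paper.

```python
# Naive "valid" 2D convolution: each output pixel is a weighted sum over a
# kernel-sized window; the inner loops are what a hardware accelerator unrolls.
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=image.dtype)
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

gaussian_3x3 = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
img = np.random.rand(8, 8)
print(convolve2d(img, gaussian_3x3).shape)  # (6, 6)
```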
FPGA-based high throughput XTS-AES encryption/decryption for storage area network
2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI: 10.1109/FPT.2014.7082791
Yi (Estelle) Wang, Akash Kumar, Yajun Ha
{"title":"FPGA-based high throughput XTS-AES encryption/decryption for storage area network","authors":"Yi (Estelle) Wang, Akash Kumar, Yajun Ha","doi":"10.1109/FPT.2014.7082791","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082791","url":null,"abstract":"The key issue to improve the performance for secure large-scale Storage Area Network (SAN) applications lies in the speed of its encryption/decryption module. Software-based encryption/decryption cannot meet throughput requirements. To solve this problem, we propose a FPGA-based XTS-AES encryption/decryption to suit the needs for secure SAN applications with high throughput requirements. Besides throughput, area optimization is also considered in this proposed design. First, we reuse the same AES encryption to produce the tweak value and unify the operations of AES encryption/decryption in XTS-AES encryption/decryption. Second, we transfer the computations of AES encryption/decryption from GF(28) to GF(24)2, which enables us move the map and the inverse map functions outside the AES round. Third, we propose to support the SubBytes and the inverse SubBytes by the same hardware component. Finally, pipelined registers have been inserted into the proposed unrolled architecture for XTS-AES encryption/decryption. The experiments show that the proposed design achieves 36.2 Gbits/s throughput using 6784 slices on XC6VLX240T FPGA.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"322 1","pages":"268-271"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76293414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
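For context, the XTS mode referenced above combines two AES keys, one for the data path and one for the per-block tweak. In the standard IEEE P1619 formulation (not necessarily this paper's notation), block j of a data unit with tweak value i is processed as:

```latex
% i = 128-bit tweak (e.g. sector number), j = block index within the data unit,
% \alpha = primitive element of GF(2^{128}), \otimes = multiplication in GF(2^{128})
T_j = \mathrm{AES}_{K_2}(i) \otimes \alpha^{j}, \qquad
C_j = \mathrm{AES}_{K_1}(P_j \oplus T_j) \oplus T_j
```

Because the tweak T_j is itself produced by an AES encryption, the same encryption core can serve both the tweak computation and the data path, which is what the first optimization in the abstract refers to.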