2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC): Latest Publications

[Copyright notice]

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/h2rc49586.2019.00002

Citations: 0

Accelerating Large Garbled Circuits on an FPGA-enabled Cloud

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00008

M. Leeser, Mehmet Güngör, Kai Huang, Stratis Ioannidis

Citations: 6

Performance and Energy Efficiency Analysis of Reverse Time Migration on a FPGA Platform

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00012

Joao Carlos Bittencourt, João Souza, Adhvan Furtado, E. Nascimento, Wagner Oliveira, A. Nascimento, L. Fialho, J. Oliveira, R. Tutu, Georgina Rojas, L. Jesus, André Lima

Abstract: Reverse time migration (RTM) modeling is a computationally intensive component in the seismic processing workflow of oil and gas exploration, often demanding the manipulation of terabytes of data. Therefore, the computational kernels of the RTM algorithms need to access a large range of memory locations. However, most of these accesses result in cache misses, degrading the overall system performance. GPGPUs and FPGAs are the two endpoints in the spectrum of acceleration platforms, since both can achieve better performance than CPUs on several high-performance applications. Recent literature highlights the better energy efficiency of FPGAs compared with GPGPUs. The present work proposes an FPGA-accelerated platform prototype targeting the computation of the RTM algorithm in an HPC environment. Experimental results show that speedups of 112x can be achieved compared with sequential execution on a CPU. Compared with a GPU, power consumption is reduced by up to 55%.

Citations: 3

It's All About Data Movement: Optimising FPGA Data Access to Boost Performance

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00006

Nick Brown, D. Dolman

Abstract: The use of reconfigurable computing, and FPGAs in particular, to accelerate computational kernels has the potential to be of great benefit to scientific codes and the HPC community in general. However, whilst recent advances in FPGA tooling have made the physical act of programming reconfigurable architectures much more accessible, in order to gain good performance the entire algorithm must be rethought and recast in a dataflow style. Reducing the cost of data movement for all computing devices is critically important, and in this paper we explore the most appropriate techniques for FPGAs. We do this by describing the optimisation of an existing FPGA implementation of an atmospheric model's advection scheme. By taking an FPGA code that was over four times slower than running on the CPU, mainly due to data movement overhead, we describe the profiling and optimisation strategies adopted to significantly reduce the runtime and bring the performance of our FPGA kernels to a much more practical level for real-world use. The result of this work is a set of techniques, steps, and lessons learnt that we have found significantly improve the performance of FPGA-based HPC codes and that others can adopt in their own codes to achieve similar results.

Citations: 15

Implementation and Impact of an Ultra-Compact Multi-FPGA Board for Large System Prototyping

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00010

Fabien Chaix, Georgios Ailamakis, Theocharis Vavouris, A. Damianakis, M. Katevenis, I. Mavroidis, Aggelos D. Ioannou, Nikolaos Kossifidis, Nikolaos Dimou, Giorgos Ieronymakis, M. Marazakis, Vassilis D. Papaefstathiou, Vassilis Flouris, Mihailis Ligerakis

Abstract: Efficient prototyping of a large complex system can be significantly facilitated by the use of a flexible and versatile physical platform where both new hardware and software components can readily be implemented and tightly integrated in a timely manner. Towards this end, we have developed the 120 × 130 mm QFDB board and associated firmware, including the system software environment. We developed a large system based on this advanced, dense, and modular building block. The QFDB features 4 interconnected Xilinx Zynq Ultrascale+ devices, each one consisting of an ARM-based subsystem tightly coupled with reconfigurable logic. Each Zynq Ultrascale+ is connected to 16 GB of DDR4 memory. In addition, one Zynq provides storage through an M.2 Solid State Disk (SSD). In this paper, we present the design and the implementation of this board, as well as the software environment for board operation. Moreover, we describe a 10 Gb Ethernet communication infrastructure for interconnecting multiple boards. Finally, we highlight the impact of this board on a number of ongoing research activities that leverage the QFDB versatility, both as a large-scale prototyping system for HPC solutions and as a host for the development of FPGA integration techniques.

Citations: 9

High-Throughput Multi-Threaded Sum-Product Network Inference in the Reconfigurable Cloud

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00009

Micha Ober, Jaco A. Hofmann, Lukas Sommer, Lukas Weber, A. Koch

Citations: 6

Combining Perfect Shuffle and Bitonic Networks for Efficient Quantum Sorting

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-11-01 DOI: 10.1109/H2RC49586.2019.00011

Naveed Mahmud, Bailey Srimoungchanh, Bennett Haase-Divine, Nolan Blankenau, Annika Kuhnke, E. El-Araby

Abstract: The emergence of quantum computers in the last decade has generated research interest in applications such as quantum sorting. Quantum sorting plays a critical role in creating ordered sets of data that can be better utilized, e.g., in quantum ordered search or quantum network switching. In this paper, we propose a quantum sorting algorithm that combines highly parallelizable bitonic merge networks with perfect shuffle permutations (PSP) for sorting data represented in the quantum domain. The combination of bitonic networks with PSP improves the temporal complexity of bitonic merge sorting, which is critical for reducing decoherence effects in quantum processing. We present space-efficient quantum circuits that can be used for quantum bit comparison and permutation. We also present a reconfigurable hardware quantum emulator for prototyping the proposed quantum algorithm. The emulator has a fully-pipelined architecture and supports double-precision floating-point computations, resulting in high throughput and accuracy. The proposed hardware architectures are implemented on a high-performance reconfigurable computer (HPRC). In our experiments, we emulated quantum sorting circuits of up to 31 fully-entangled quantum bits on a single FPGA node of the HPRC platform. To the best of our knowledge, our effort is the first to investigate a reconfigurable hardware emulation of quantum sorting using bitonic networks and perfect shuffle.
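For readers unfamiliar with the two building blocks named in this abstract, the following is a minimal classical sketch in Python. It is not the paper's quantum circuit or emulator, and the abstract does not say how the two pieces are wired together, so they are shown separately. The function names (`perfect_shuffle`, `bitonic_merge`, `bitonic_sort`) are hypothetical and chosen for illustration only; input lengths are assumed to be powers of two.

```python
def perfect_shuffle(items):
    """Interleave the two halves of a list: [a0, a1, b0, b1] -> [a0, b0, a1, b1]."""
    n = len(items)
    if n % 2:
        raise ValueError("length must be even")
    half = n // 2
    out = []
    for left, right in zip(items[:half], items[half:]):
        out.extend((left, right))
    return out


def bitonic_merge(items, ascending=True):
    """Sort a bitonic sequence (length a power of two) with compare-exchange steps."""
    n = len(items)
    if n == 1:
        return list(items)
    half = n // 2
    first, second = [], []
    # Compare element i with element i + half and route the pair by direction.
    for a, b in zip(items[:half], items[half:]):
        small, large = (a, b) if a <= b else (b, a)
        if ascending:
            first.append(small)
            second.append(large)
        else:
            first.append(large)
            second.append(small)
    # Both halves are again bitonic, so merge each one recursively.
    return bitonic_merge(first, ascending) + bitonic_merge(second, ascending)


def bitonic_sort(items, ascending=True):
    """Classical bitonic sort; assumes len(items) is a power of two."""
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(items)
    half = n // 2
    # An ascending run followed by a descending run forms a bitonic sequence.
    up = bitonic_sort(items[:half], True)
    down = bitonic_sort(items[half:], False)
    return bitonic_merge(up + down, ascending)
```

Every compare-exchange within one merge stage is independent of the others, which is why such networks are described as highly parallelizable. The perfect shuffle is a fixed wiring pattern, so it adds no data-dependent control.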

Citations: 1

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2019-10-15 DOI: 10.1109/H2RC49586.2019.00007

H. Zohouri, S. Matsuoka

Abstract: Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. A large amount of work has been done so far on loop and area optimizations for different applications on FPGAs using HLS. However, a comprehensive analysis of the behavior and efficiency of the memory controller of FPGAs is missing from the literature, which becomes even more crucial when the limited memory bandwidth of modern FPGAs compared to their GPU counterparts is taken into account. In this work, we will analyze the memory interface generated by Intel FPGA SDK for OpenCL with different configurations for input/output arrays, vector size, interleaving, kernel programming model, on-chip channels, operating frequency, padding, and multiple types of overlapped blocking. Our results point to multiple shortcomings in the memory controller of Intel FPGAs, especially with respect to memory access alignment, that can hinder the programmer's ability to maximize memory performance in their design. For some of these cases, we will provide work-arounds to improve memory bandwidth efficiency; however, a general solution will require major changes in the memory controller itself.

Citations: 11

Organization

2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) Pub Date : 2018-10-03 DOI: 10.1201/9781315220659-8

R. F. Tinder, S. Yanushkevich, C. Hamacher, Z. Vranesic, S. Zaky, J. Raymond

Citations: 0