2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig): Latest Papers

Hobbit — Smaller but faster than a dwarf: Revisiting lightweight SHA-3 FPGA implementations

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857176

Bernhard Jungk, Marc Stöttinger

Citations: 12

Power-efficiency analysis of accelerated BWA-MEM implementations on heterogeneous computing platforms

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857181

Ernst Houtgast, V. Sima, G. Marchiori, K. Bertels, Z. Al-Ars

Abstract: Next Generation Sequencing techniques have dramatically reduced the cost of sequencing genetic material, resulting in huge amounts of sequenced data. Processing this data poses major challenges from both a performance and a power-efficiency perspective. Heterogeneous computing can help on both fronts by enabling faster and more power-efficient solutions. In this paper, the power efficiency of the BWA-MEM algorithm, a popular tool for genomic data mapping, is studied on two heterogeneous architectures. The performance and power efficiency of an FPGA-based implementation using a single Xilinx Virtex-7 FPGA on the Alpha Data add-in card are compared with those of a GPU-based implementation using an NVIDIA GeForce GTX 970 and with the software-only baseline system. By offloading the Seed Extension phase to an accelerator, both implementations achieve a two-fold speedup in overall application-level performance over the software-only implementation. Moreover, the highly customizable nature of the FPGA results in much higher power efficiency, as the FPGA's power consumption is less than one fourth of the GPU's. To facilitate platform- and tool-agnostic comparisons, the unit of base pairs per Joule is introduced as a measure of power efficiency. The FPGA design maps up to 44 thousand base pairs per Joule, a 2.1x gain in power efficiency over the software-only baseline.

Citations: 13
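The base-pairs-per-Joule unit normalizes mapping throughput by the energy spent producing it. A minimal sketch, assuming energy is taken as average system power multiplied by wall-clock runtime (the abstract does not describe the measurement setup); the function names and the numbers in the example are hypothetical, not measurements from the paper.

```python
def base_pairs_per_joule(base_pairs: int, avg_power_w: float, runtime_s: float) -> float:
    """Power-efficiency metric: base pairs mapped per Joule of energy consumed.

    Assumption: energy (J) = average power (W) x wall-clock runtime (s).
    """
    if avg_power_w <= 0 or runtime_s <= 0:
        raise ValueError("power and runtime must be positive")
    energy_j = avg_power_w * runtime_s
    return base_pairs / energy_j


def efficiency_gain(accelerated_bp_per_j: float, baseline_bp_per_j: float) -> float:
    """Ratio of two bp/J figures, e.g. an accelerated system vs. a software-only baseline."""
    return accelerated_bp_per_j / baseline_bp_per_j


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic:
    # halving the runtime at similar system power roughly doubles bp/J.
    baseline = base_pairs_per_joule(base_pairs=1_000_000_000, avg_power_w=250.0, runtime_s=200.0)
    accelerated = base_pairs_per_joule(base_pairs=1_000_000_000, avg_power_w=240.0, runtime_s=100.0)
    print(f"baseline:    {baseline:,.0f} bp/J")
    print(f"accelerated: {accelerated:,.0f} bp/J")
    print(f"gain:        {efficiency_gain(accelerated, baseline):.2f}x")
```

Read the other way, the abstract's two figures together imply a software-only baseline of roughly 44,000 / 2.1, or about 21,000 base pairs per Joule.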

Breeze computing: A just in time (JIT) approach for virtualizing FPGAs in the cloud

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857159

Sen Ma, D. Andrews, Shanyuan Gao, Jaime Cummins

Abstract: In this paper, we introduce a new design flow and architecture that lets programmers replace synthesis with compilation to create custom accelerators within data-center and warehouse-scale computers that include reconfigurable many-core architectures. In our approach, we virtualize FPGAs into predefined partially reconfigurable tiles. We then define a run-time interpreter that assembles bitstream versions of programming patterns into the tiles. The bitstreams, as well as software executables, are maintained in libraries accessed by the application programmers. Synthesis occurs hand in hand with the initial coding of the software programming patterns, when a Domain Specific Language is first created for the application programmers. Initial results show that the approach allows hardware accelerators to be compiled 100x faster than the same functionality can be synthesized. Initial performance results further show that a compilation/interpretation approach can achieve approximately the same performance as a synthesized custom accelerator for matrix operations and filtering.

Citations: 6
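The key idea is that "compiling" an accelerator reduces to looking up pre-synthesized partial bitstreams and placing them into free tiles. A minimal sketch of such a run-time assembly step, assuming a simple first-free-tile placement policy; the `TiledFpga` class, the `assemble` function and the library format are hypothetical and do not reflect the paper's actual interpreter or any vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class TiledFpga:
    """Hypothetical model of an FPGA split into fixed partially reconfigurable tiles."""
    num_tiles: int
    tiles: list = field(init=False)  # pattern name loaded per tile, None = free

    def __post_init__(self) -> None:
        self.tiles = [None] * self.num_tiles


def assemble(program, library, fpga):
    """Place each pattern of `program` into a free tile using pre-synthesized bitstreams.

    No synthesis happens here: every pattern must already exist in `library`
    (name -> partial bitstream bytes), so the cost is a lookup plus a placement.
    Returns a list of (tile_index, pattern_name, bitstream) load commands.
    """
    free = [i for i, t in enumerate(fpga.tiles) if t is None]
    if len(program) > len(free):
        raise RuntimeError(f"program needs {len(program)} tiles, only {len(free)} free")
    plan = []
    for tile, pattern in zip(free, program):
        if pattern not in library:
            raise KeyError(f"no pre-synthesized bitstream for pattern {pattern!r}")
        plan.append((tile, pattern, library[pattern]))
    # Commit the placement only after the whole program has been validated.
    for tile, pattern, _ in plan:
        fpga.tiles[tile] = pattern
    return plan


if __name__ == "__main__":
    # Placeholder bytes stand in for real partial bitstreams.
    library = {"matmul": b"\x01\x02", "fir_filter": b"\x03\x04"}
    fpga = TiledFpga(num_tiles=4)
    for tile, name, _ in assemble(["matmul", "fir_filter"], library, fpga):
        print(f"load {name} into tile {tile}")
```

In this sketch the expensive work (synthesizing each pattern) is moved entirely offline into the library, which is the intuition behind the reported compile-time gap.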

ARM+FPGA platform to manage solid-state-smart transformer in smart grid application

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857155

N. Nila-Olmedo, F. Mendoza-Mondragón, A. Espinosa-Calderón, Moreno

Citations: 4

Towards FPGA-assisted spark: An SVM training acceleration case study

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857194

S. M. H. Ho, Maolin Wang, Ho-Cheung Ng, Hayden Kwok-Hay So

Abstract: A system that augments the Apache Spark data processing framework with FPGA accelerators is presented as a way to model and deploy FPGA-assisted applications in large-scale clusters. In the proposed framework, FPGAs can optionally be used in place of the host CPU for Resilient Distributed Dataset (RDD) transformations, allowing seamless integration between gateware and software processing. Using the training of a Support Vector Machine (SVM) cell image classifier as a case study, we explore the feasibility, benefits and challenges of this technique. In experiments where data communication between CPU and FPGA is tightly controlled, a consistent speedup of up to 1.6x is achieved for the target SVM training application as the cluster size increases. Hardware-software techniques that are crucial to achieving acceleration, such as managing the data partition size, are explored. We demonstrate the benefits of the proposed framework, while also illustrating the importance of careful hardware-software management to avoid excessive CPU-FPGA communication, which can quickly diminish the benefits of FPGA acceleration.

Citations: 3
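The abstract does not say which SVM formulation is trained. To show why partition granularity matters, here is a minimal sketch assuming a linear SVM (no bias term) trained by full-batch subgradient descent on the L2-regularized hinge loss: each partition computes a partial gradient, the step that could be offloaded, and only that small vector returns to the driver. Plain Python stands in for Spark; no PySpark or FPGA API is used, and all names are hypothetical.

```python
def partition(data: list, num_partitions: int) -> list:
    """Split a dataset round-robin into partitions (a stand-in for RDD partitions)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def partition_hinge_subgradient(w, part):
    """Per-partition work: summed hinge-loss subgradient over margin violators.

    In an FPGA-assisted setup this is the step one would offload; only the
    partial sum (len(w) floats) and the partition size go back to the driver.
    """
    g = [0.0] * len(w)
    for x, y in part:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # inside the margin: contributes -y * x
            for j, xj in enumerate(x):
                g[j] -= y * xj
    return g, len(part)


def train_linear_svm(data, num_partitions=4, lam=0.01, lr=0.1, epochs=50):
    """Full-batch subgradient descent on L2-regularized hinge loss (no bias term)."""
    dim = len(data[0][0])
    w = [0.0] * dim
    parts = partition(data, num_partitions)
    for _ in range(epochs):
        # "map": one partial gradient per partition; "reduce": sum on the driver
        partials = [partition_hinge_subgradient(w, p) for p in parts]
        n = sum(count for _, count in partials)
        grad = [lam * w[j] + sum(g[j] for g, _ in partials) / n for j in range(dim)]
        w = [w[j] - lr * grad[j] for j in range(dim)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    toy = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([-2.0, -2.0], -1), ([-3.0, -3.0], -1)]
    w = train_linear_svm(toy, num_partitions=2)
    print(w, [predict(w, x) for x, _ in toy])
```

In this sketch every epoch costs one round trip per partition, so fewer, larger partitions mean fewer host-accelerator transfers; that is one plausible reading of the partition-size trade-off the abstract highlights.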

Robust bitstream protection in FPGA-based systems through low-overhead obfuscation

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857187

Robert Karam, Tamzidul Hoque, S. Ray, M. Tehranipoor, S. Bhunia

Abstract: Reconfigurable hardware, such as Field Programmable Gate Arrays (FPGAs), is increasingly deployed in diverse application areas, including automotive systems, critical infrastructure, and the emerging Internet of Things (IoT), to implement customized designs. However, securing FPGA-based designs against piracy, reverse engineering, and tampering is challenging, especially for systems that require remote upgrades. In many cases, existing solutions based on bitstream encryption may not provide sufficient protection against these attacks. In this paper, we present a novel obfuscation approach for provably robust, low-overhead protection of FPGA bitstreams that goes well beyond the protection offered by bitstream encryption. The approach works with existing FPGA architectures and synthesis flows, and can be used together with encryption techniques, or on its own in power- and area-constrained systems. It leverages “FPGA dark silicon” (unused resources within the configurable logic blocks) to efficiently obfuscate the true functionality. We provide a detailed threat model and security analysis for the approach. We have developed a complete application mapping framework that integrates with the Altera Quartus II software. Using this CAD framework, we achieve provably strong security against all major attacks on FPGA bitstreams, with an average overhead of 13% in latency and 2% in total power for a set of benchmark circuits as well as several large-scale open-source IP blocks on a commercial FPGA.

Citations: 33
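One simplified way to picture hiding functionality in unused logic-block resources, and not necessarily the paper's exact scheme, is to route a spare LUT input to a secret key bit: one half of the LUT holds the real function and the other half a decoy. A minimal behavioral sketch, assuming a 2-input function packed into a 3-input LUT; the function names and the key-bit source are hypothetical.

```python
from itertools import product
from typing import Callable, List

Bool2 = Callable[[int, int], int]


def build_obfuscated_lut3(real: Bool2, decoy: Bool2, key_bit: int) -> List[int]:
    """Pack a 2-input function into a 3-input LUT, using the spare input as a key.

    The LUT is indexed by (k, a, b) -> k*4 + a*2 + b. The half selected by the
    correct key bit holds `real`; the other half holds `decoy`. Reading the 8 LUT
    bits from the bitstream alone reveals two plausible functions, not one.
    """
    lut = [0] * 8
    for k, a, b in product((0, 1), repeat=3):
        fn = real if k == key_bit else decoy
        lut[k * 4 + a * 2 + b] = fn(a, b)
    return lut


def eval_lut3(lut: List[int], k: int, a: int, b: int) -> int:
    """Evaluate the LUT; `k` would come from a (hypothetical) on-chip key source."""
    return lut[k * 4 + a * 2 + b]


if __name__ == "__main__":
    xor = lambda a, b: a ^ b
    nand = lambda a, b: 1 - (a & b)
    lut = build_obfuscated_lut3(real=xor, decoy=nand, key_bit=1)
    print("LUT contents:", lut)
    for a, b in product((0, 1), repeat=2):
        print(a, b, "correct key ->", eval_lut3(lut, 1, a, b), "| wrong key ->", eval_lut3(lut, 0, a, b))
```

In this toy model a single LUT leaves an attacker two candidates; with independent key bits across n LUTs the number of candidate designs grows as 2^n.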

FPGA implementation of optimized XBM specifications by transformation for AFSMs

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857171

Kledermon Garcia, D. L. Oliveira, R. d'Amore, L. Faria, J. L. V. Oliveira

Abstract: The asynchronous paradigm is an alternative approach to digital system design that eliminates problems related to the clock signal, such as clock skew, clock distribution and clock power dissipation. An interesting asynchronous design style, familiar to designers, divides the system into an asynchronous controller and a synchronous datapath. The Extended Burst-Mode (XBM) specification is the most suitable for describing the asynchronous controllers in this design style. An XBM specification must satisfy a number of properties to be implementable. One of them, signal polarity, may affect controller performance. To satisfy signal polarity, the designer must often introduce state transitions that perform no operation, which this paper calls “dead transitions”. Dead transitions in an XBM specification can reduce controller performance. In this paper, we propose an algorithm that eliminates dead transitions from an XBM specification by transforming the original specification, which improves system performance. The algorithm was applied to seven well-known benchmarks and achieved a reduction of up to 37% in processing time.

Citations: 3
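The abstract does not give the transformation itself. To make the notion of a dead transition concrete, a minimal sketch over a hypothetical tuple representation of an XBM-like machine: it folds a transition with an empty output burst into its single successor under a deliberately conservative rule. It ignores conditional signals, directed don't-cares and the signal-polarity constraint the paper's algorithm must preserve, and it assumes the environment may deliver the two merged input bursts as one burst.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    src: str
    inputs: frozenset   # input burst, e.g. {"req+", "ack-"}
    outputs: frozenset  # output burst; empty means the transition does no work
    dst: str


def signal(edge: str) -> str:
    """'req+' -> 'req' (strip the edge direction)."""
    return edge.rstrip("+-")


def merge_dead_transitions(transitions: List[Transition], initial: str) -> List[Transition]:
    """Repeatedly fold a dead transition A -(in1/{})-> B into B's single successor.

    Simplified rule (an assumption, not the paper's algorithm): merge only when B
    is not the initial state, has exactly one incoming and one outgoing transition,
    and the two input bursts touch disjoint signals.
    """
    ts = list(transitions)
    changed = True
    while changed:
        changed = False
        for t1 in ts:
            if t1.outputs:
                continue  # not dead
            b = t1.dst
            incoming = [t for t in ts if t.dst == b]
            outgoing = [t for t in ts if t.src == b]
            if b == initial or len(incoming) != 1 or len(outgoing) != 1:
                continue
            t2 = outgoing[0]
            if {signal(e) for e in t1.inputs} & {signal(e) for e in t2.inputs}:
                continue  # same signal in both bursts: merging would lose an edge
            merged = Transition(t1.src, t1.inputs | t2.inputs, t2.outputs, t2.dst)
            ts = [t for t in ts if t not in (t1, t2)] + [merged]
            changed = True
            break
    return ts


if __name__ == "__main__":
    T = Transition
    spec = [
        T("s0", frozenset({"req+"}), frozenset({"ld+"}), "s1"),
        T("s1", frozenset({"done+"}), frozenset(), "s2"),  # dead transition
        T("s2", frozenset({"req-"}), frozenset({"ld-"}), "s0"),
    ]
    for t in merge_dead_transitions(spec, initial="s0"):
        print(t.src, sorted(t.inputs), "/", sorted(t.outputs), "->", t.dst)
```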

A multi-functional memory unit with PLA-based reconfigurable decoder

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857145

Nobuyuki Yahiro, Bo Liu, Atsushi Nanri, S. Nakatake, Y. Takashima, Gong Chen

Abstract: Application-specific use of memory is a key factor in developing embedded systems for IoT devices. A functional memory unit such as content addressable memory (CAM) is a good solution for network-specific applications. This work proposes a novel functional memory unit whose memory decoder function can be reconfigured. In our reconfiguration mechanism, uni-switch cells, which can act either as logic or as a wire, are embedded in an SRAM memory array. A set of connected uni-switches constitutes a programmable logic array (PLA) unit. For a decoder, the PLA has the advantage that multi-input, multi-output functions can be realized in a smaller area than with a look-up table (LUT). Hence, extended decoder functions are realized by PLA units inside the memory array, and combining PLA units makes it possible to configure various functions on stored data, such as sorting, filtering, error correction, and encryption/decryption. In this paper, we present the basic architecture of our functional memory unit with PLA units, and demonstrate implementations of a 32-bit full adder and a 2-bit counter using PLA units.

Citations: 0
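A PLA realizes multi-output functions as two planes: an AND plane of shared product terms and an OR plane that selects terms per output. A behavioral sketch in Python (it models logic only, not the uni-switch circuit or the SRAM array), encoding a 1-bit full adder and the next-state logic of a 2-bit counter, then chaining 32 adder evaluations into a ripple-carry adder. The data layout and names are hypothetical, and the abstract does not say how the paper's 32-bit adder is partitioned across PLA units.

```python
from itertools import product

# A product term maps an input name to the literal it requires (1 = true, 0 = complemented).
# The AND plane is a list of terms; the OR plane lists, per output, the indices of terms it ORs.


def eval_pla(and_plane, or_plane, inputs):
    """Evaluate a two-level AND-OR programmable logic array."""
    terms = [all(inputs[v] == val for v, val in term.items()) for term in and_plane]
    return {out: int(any(terms[i] for i in idxs)) for out, idxs in or_plane.items()}


# 1-bit full adder: sum = a xor b xor cin, cout = majority(a, b, cin)
FA_AND = [
    {"a": 0, "b": 0, "cin": 1}, {"a": 0, "b": 1, "cin": 0},
    {"a": 1, "b": 0, "cin": 0}, {"a": 1, "b": 1, "cin": 1},    # minterms of sum
    {"a": 1, "b": 1}, {"a": 1, "cin": 1}, {"b": 1, "cin": 1},  # terms of cout
]
FA_OR = {"sum": [0, 1, 2, 3], "cout": [4, 5, 6]}

# 2-bit counter next-state logic: q0' = not q0, q1' = q1 xor q0
CNT_AND = [{"q0": 0}, {"q1": 1, "q0": 0}, {"q1": 0, "q0": 1}]
CNT_OR = {"q0_next": [0], "q1_next": [1, 2]}


def ripple_add(x: int, y: int, width: int = 32) -> int:
    """Chain `width` full-adder PLA evaluations into a ripple-carry adder (mod 2**width)."""
    carry, result = 0, 0
    for i in range(width):
        o = eval_pla(FA_AND, FA_OR, {"a": (x >> i) & 1, "b": (y >> i) & 1, "cin": carry})
        result |= o["sum"] << i
        carry = o["cout"]
    return result


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, eval_pla(FA_AND, FA_OR, {"a": a, "b": b, "cin": cin}))
    print(ripple_add(123456789, 987654321))
```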

The R2-D2 toolchain — Automated porting of safety-critical applications to FPGAs

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857192

Steffen Vaas, M. Reichenbach, Ulrich Margull, D. Fey

Abstract: Safety-critical applications require reliable hardware platforms with deterministic behavior. Given the increasing demand for performance, current single-core solutions are no longer sufficient. Classical multi-core processors are designed for general-purpose use and provide high performance at the expense of determinism and reliability. In safety-critical applications, all required tasks are already known at development time; they are specified by a system description such as AUTOSAR. Thus, a hardware architecture providing one core for each task and one physical link for each data exchange between tasks can be derived. However, no such highly application-specific architecture is available. The latest FPGA technologies now provide enough resources to integrate several soft-core processors on one low-cost chip, and the cores and their connections can be arranged flexibly within the FPGA. To bridge the gap between safety-critical applications and FPGAs, this work provides a toolchain, as an addition to existing AUTOSAR design tools, that automatically generates a specific hardware architecture from the metadata of an AUTOSAR description. By drastically reducing the complexity of the hardware platform, a reconfigurable, reliable, deterministic, distributed (R2-D2) hardware architecture can be created. The results show that safety-critical tasks can be executed deterministically and in parallel on one chip, and that multiple applications can be mapped to one low-cost FPGA. Furthermore, system latency was reduced substantially, opening up new application areas.

Citations: 3
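The derivation rule stated in the abstract (one core per task, one physical link per data exchange) can be expressed as a small mapping. A minimal sketch, assuming tasks and exchanges have already been extracted from the AUTOSAR description; the `DataExchange` class, the naming scheme and the output format are hypothetical and are not the R2-D2 toolchain's actual interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataExchange:
    """One data exchange between two tasks (hypothetical stand-in for AUTOSAR port metadata)."""
    sender: str
    receiver: str
    signal: str


def derive_architecture(tasks: List[str], exchanges: List[DataExchange]) -> Dict:
    """One soft core per task and one dedicated point-to-point link per data exchange."""
    known = set(tasks)
    for ex in exchanges:
        if ex.sender not in known or ex.receiver not in known:
            raise ValueError(f"exchange {ex.signal!r} references an unknown task")
    cores = {task: f"core{i}" for i, task in enumerate(tasks)}
    links = [
        {"name": f"link_{ex.signal}", "from": cores[ex.sender], "to": cores[ex.receiver]}
        for ex in exchanges
    ]
    return {"cores": cores, "links": links}


if __name__ == "__main__":
    arch = derive_architecture(
        ["sense", "control", "actuate"],
        [DataExchange("sense", "control", "speed"), DataExchange("control", "actuate", "torque")],
    )
    print(arch["cores"])
    for link in arch["links"]:
        print(link)
```

In this model every link has exactly one sender and one receiver, so no shared-bus arbitration is needed; that is one plausible reading of why dedicated links support deterministic communication.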

Automating structured matrix-matrix multiplication for stream processing

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857158

Thaddeus Koehn, P. Athanas

Citations: 2