Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles

Low-Resource Bluespec Design of a Modular Acquisition and Stimulation System for Neuroscience (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689137

Paulo Matias, R. T. Guariento, L. Almeida, J. Slaets

Abstract: We compared two resource arbitration architectures in our data acquisition and stimulus generation system for neuroscience research, which is specified entirely in a high-level Hardware Description Language (HDL). One was designed with a decoupled, latency-insensitive modular approach that allows easier code reuse, while the other adopted a centralized scheme built specifically for our application. Using a high-level HDL allowed straightforward, stepwise code modifications to transform one architecture into the other. Despite the logic complexity penalty of synthesizing hardware from a highly abstract language, both architectures were implemented in a very small programmable logic device without consuming all of its hardware resources. While the decoupled design showed more resilience to bursts of input activity, the centralized one saved about 10-15% of the device's logic elements. This system is not only useful for neuroscience protocols that require timing determinism and synchronous stimuli generation, but also demonstrates that high-level languages can be used effectively to synthesize hardware for small programmable devices.

Citations: 0

RapidSmith 2: A Framework for BEL-level CAD Exploration on Xilinx FPGAs

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689085

Travis Haroldsen, B. Nelson, B. Hutchings

Citations: 25

Technology Mapping into General Programmable Cells

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689082

A. Mishchenko, R. Brayton, Wenyi Feng, J. Greene

Citations: 15

Architecture of Reconfigurable-Logic Cell Array with Atom Switch: Cluster Size & Routing Fabrics (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689122

X. Bai, Y. Tsuji, A. Morioka, M. Miyamura, T. Sakamoto, M. Tada, N. Banno, K. Okamoto, N. Iguchi, H. Hada

Abstract: Emerging nonvolatile memories (NVMs) have the potential to overcome the issues of conventional static random-access memory (SRAM) based reconfigurable logic cell arrays (RLCAs). Replacing a CMOS switch element composed of an SRAM and a pass transistor with an NVM reduces chip size, and non-volatility reduces the stand-by power. More importantly, the compactness of NVM allows fine-grain logic cells (small cluster size), which enables highly efficient cell usage and results in compact circuits for applications. In this paper, we investigate the fine-grain cell architecture using the atom switch, which is one of the NVMs. We evaluate the effect of the cluster size and the segment length on the atom-switch-based RLCA to find the optimal point with respect to the area-delay product. The optimal cluster size is 4, which is smaller than that in the conventional SRAM- and multiplexer-based RLCA. This optimum arises because, for an atom-switch-based RLCA with a routing block formed by crossbar switches, the delay between clusters is only twice the delay within a cluster, owing to the very small capacitance and resistance of atom switches. On the other hand, the optimal segment length is 4, which is the same as that in the conventional SRAM- and multiplexer-based RLCA.

Citations: 0

MATCHUP: Memory Abstractions for Heap Manipulating Programs

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689073

F. Winterstein, Kermin Fleming, Hsin-Jung Yang, Samuel Bayliss, G. Constantinides

Abstract: Memory-intensive implementations often require access to an external, off-chip memory, which can substantially slow down an FPGA accelerator due to memory bandwidth limitations. Buffering frequently reused data on chip is a common approach to address this problem, and the optimization of the cache architecture introduces yet another complex design space. This paper presents a high-level synthesis (HLS) design aid that generates parallel application-specific multi-scratchpad architectures including on-chip caches. Our program analysis identifies non-overlapping memory regions, supported by private scratchpads, and regions which are shared by parallel units after parallelization and which are supported by coherent scratchpads and synchronization primitives. It also decides whether the parallelization is legal with respect to data dependencies. The novelty of this work is the focus on programs using dynamic, pointer-based data structures and dynamic memory allocation which, while common in software engineering, remain difficult to analyze and are beyond the scope of the overwhelming majority of HLS techniques to date. We demonstrate our technique with three case studies of applications using dynamically allocated data structures and use Xilinx Vivado HLS as an exemplary HLS tool. We show up to 10x speed-up after parallelization of the HLS implementations and the insertion of the application-specific distributed hybrid scratchpad architecture.

Citations: 36

Area Optimization of Arithmetic Units by Component Sharing for FPGAs (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689146

S. Tang, G. Lemieux

Citations: 0

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689060

Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, J. Cong

Abstract: Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further improved research and implementations. In particular, various accelerators for deep CNNs have been proposed based on FPGA platforms because of their high performance, reconfigurability, and fast development cycle. Although current FPGA accelerators have demonstrated better performance than generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance due to under-utilization of either logic resources or memory bandwidth. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome this problem, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and the lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which outperforms previous approaches significantly.
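
Note: the roofline selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes the usual roofline formulation, in which attainable performance is the smaller of the computational roof and the product of the computation-to-communication ratio and the memory bandwidth. The `Design` fields, the candidate values, the bandwidth figure, and the use of DSP slices as the resource measure are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    compute_roof_gflops: float  # peak throughput of the compute engine
    flop_per_byte: float        # computation-to-communication ratio
    dsp_slices: int             # hypothetical stand-in for FPGA resource cost


def attainable_gflops(design: Design, bandwidth_gb_s: float) -> float:
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(design.compute_roof_gflops, design.flop_per_byte * bandwidth_gb_s)


def pick_design(designs: list[Design], bandwidth_gb_s: float) -> Design:
    """Highest attainable performance first; fewer resources breaks ties."""
    return max(
        designs,
        key=lambda d: (attainable_gflops(d, bandwidth_gb_s), -d.dsp_slices),
    )


# Invented candidates, e.g. produced by different loop tiling factors.
candidates = [
    Design("A", compute_roof_gflops=80.0, flop_per_byte=10.0, dsp_slices=2000),
    Design("B", compute_roof_gflops=60.0, flop_per_byte=20.0, dsp_slices=1500),
    Design("C", compute_roof_gflops=60.0, flop_per_byte=40.0, dsp_slices=1200),
]
best = pick_design(candidates, bandwidth_gb_s=4.0)
print(best.name, attainable_gflops(best, 4.0))
```

In this invented example, design A has the highest computational roof but is bandwidth-bound, so a design with a lower roof and a higher computation-to-communication ratio wins.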

Citations: 1754

Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689066

James Arram, W. Luk, P. Jiang

Abstract: This paper proposes a novel reconfigurable architecture for accelerating DNA sequence alignment. This architecture is applied to bisulfite sequence alignment, a stage in recently developed bioinformatics pipelines for cancer and non-invasive prenatal diagnosis. Alignment is currently the bottleneck in such pipelines, accounting for over 50% of the total analysis time. Our design, Ramethy (Reconfigurable Acceleration of METHYlation data analysis), performs alignment of short reads with up to two mismatches. Ramethy is based on the FM-index, which we optimise to reduce the number of search steps and improve approximate matching performance. We implement Ramethy on a 1U Maxeler MPC-X1000 data flow node consisting of 8 Altera Stratix-V FPGAs. Measured results show a 14.9 times speedup compared to soap2 running with 16 threads on dual Intel Xeon E5-2650 CPUs, and 3.8 times speedup compared to soap3-dp running on an NVIDIA GTX 580 GPU. Upper-bound performance estimates for the MPC-X1000 indicate a maximum speedup of 88.4 times and 22.6 times compared to soap2 and soap3-dp respectively. In addition to runtime, Ramethy consumes over an order of magnitude lower energy while having accuracy identical to soap2 and soap3-dp, making it a strong candidate for integration into bioinformatics pipelines.
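
Note: for readers unfamiliar with the FM-index mentioned in this abstract, the following Python sketch shows the textbook backward search for exact matches only. It is not Ramethy's optimised design and does not handle mismatches; the function names and the naive index construction are assumptions made for illustration.

```python
def build_fm_index(text: str):
    """Build a naive FM-index (C table and occurrence table) for a short text."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(bwt)


def count_occurrences(pattern: str, index) -> int:
    """Backward search: one step per pattern character, processed right to left."""
    first, occ, n = index
    lo, hi = 0, n  # current half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGA")
print(count_occurrences("ACG", index))
```

Each search step touches only the two range boundaries, which is why the number of search steps, rather than the reference length, dominates the cost of a query.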

Citations: 27

Rapid Prototyping of Wireless Physical Layer Modules Using Flexible Software/Hardware Design Flow

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689084

James Chacko, Cem Sahin, Douglas Pfiel, Nagarajan Kandasamy, K. Dandekar

Citations: 4

FPGA Implementation of Trained Coarse Carrier Frequency Offset Estimation and Correction for OFDM Signals (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689128

Marko Jacovic, James Chacko, Doug Pfeil, Nagarajan Kandasamy, K. Dandekar

Citations: 1