2013 International Conference on Field-Programmable Technology (FPT): Latest Papers, Page 2

Reconfigurable filtered acceleration of short read alignment

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718408

James Arram, W. Luk, P. Jiang

Abstract: Recent trends in the cost of and demand for next-generation DNA sequencing (NGS) have revealed a great computational challenge in analysing the massive quantities of sequenced data produced. Given that the projected increase in sequenced data far outstrips Moore's Law, the current technologies used to handle the data are likely to become insufficient. This paper explores the use of reconfigurable hardware in accelerating short read alignment. In this application, the positions of millions of short DNA sequences (called reads) are located in a known reference genome. This work proposes a new general approach for accelerating suffix-trie based short read alignment methods using reconfigurable hardware. In the proposed approach, specialised filters are designed to align short reads to a reference genome with a specific edit distance. The filters are arranged in a pipeline in order of increasing edit distance, and short reads that a given filter cannot align are forwarded to the next filter in the pipeline for further processing. Run-time reconfiguration is used to fully populate an accelerator device with each filter in the pipeline in turn. In our implementation a single FPGA is populated with specialised filters based on a novel bidirectional backtracking version of the FM-index, and alignment is found to be up to 14.7 and 18.1 times faster than SOAP2 and BWA, respectively, running on dual Intel X5650 CPUs.
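The filter pipeline described in this abstract can be illustrated with a small software model. The sketch below is not the authors' design: it builds a toy FM-index from a naive suffix array, uses ordinary unidirectional backward search rather than the paper's bidirectional backtracking, and counts only substitutions (Hamming distance) instead of full edit distance. The function names `build_fm_index`, `search`, and `filter_pipeline` are hypothetical.

```python
"""Toy FM-index read aligner with a filter pipeline of increasing distance.

Assumptions (not from the paper): naive suffix-array construction,
unidirectional backward search, and substitution-only backtracking
(Hamming distance) instead of full edit distance.
"""


def build_fm_index(text):
    """Return (suffix array, C table, Occ table) for text + '$'."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive, O(n^2 log n)
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0
    alphabet = sorted(set(text))
    c_table, total = {}, 0  # C[ch]: number of characters in text smaller than ch
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}  # occ[ch][k]: count of ch in bwt[:k]
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, c_table, occ


def search(index, read, max_mismatches):
    """Backward search that backtracks over substitutions, up to a mismatch budget."""
    sa, c_table, occ = index
    hits = set()

    def step(lo, hi, pos, budget):
        # [lo, hi) is the suffix-array interval of suffixes starting with read[pos + 1:]
        if lo >= hi:
            return
        if pos < 0:
            hits.update(sa[lo:hi])
            return
        for ch in "ACGT":
            if ch not in c_table:
                continue
            cost = 0 if ch == read[pos] else 1
            if cost <= budget:
                step(c_table[ch] + occ[ch][lo], c_table[ch] + occ[ch][hi], pos - 1, budget - cost)

    step(0, len(sa), len(read) - 1, max_mismatches)
    return sorted(hits)


def filter_pipeline(index, reads, max_distance):
    """Filter d aligns reads with exactly d mismatches; failures move on to filter d + 1."""
    aligned, pending = {}, list(reads)
    for d in range(max_distance + 1):
        unresolved = []
        for read in pending:
            hits = search(index, read, d)
            if hits:
                aligned[read] = (d, hits)
            else:
                unresolved.append(read)
        pending = unresolved
    return aligned, pending


index = build_fm_index("ACGTACGGTACG")
aligned, unaligned = filter_pipeline(index, ["ACGT", "ACGA", "TTTT"], 1)
print(aligned)    # {'ACGT': (0, [0]), 'ACGA': (1, [0, 4])}
print(unaligned)  # ['TTTT']
```

Each read leaves the pipeline at the first filter whose mismatch budget admits a hit, which mirrors the forwarding of unaligned reads to the next filter; in the hardware version, run-time reconfiguration would swap one filter for the next instead of looping.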

Citations: 10

Maximum flow algorithms for maximum observability during FPGA debug

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718324

Eddie Hung, Al-Shahna Jamal, S. Wilton

Citations: 10

SOAP: Structural optimization of arithmetic expressions for high-level synthesis

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718340

Xitong Gao, Samuel Bayliss, G. Constantinides

Abstract: This paper introduces SOAP, a new tool that automatically optimizes the structure of arithmetic expressions for FPGA implementation as part of a high-level synthesis flow, taking into account axiomatic rules derived from real arithmetic, such as distributivity and associativity. We explicitly target an optimized area/accuracy trade-off, allowing arithmetic expressions to be rewritten automatically for this purpose. For the first time, we bring rigorous approaches from software static analysis, specifically formal semantics and abstract interpretation, to bear on source-to-source transformation for high-level synthesis. New abstract semantics are developed to generate a computable subset of equivalent expressions from an original expression. Using formal semantics, we calculate two objectives: the accuracy of the computation and an estimate of FPGA resource utilization. Optimizing these objectives produces a Pareto frontier consisting of a set of expressions. This gives the synthesis tool the flexibility to choose an implementation that satisfies constraints on both accuracy and resource usage. We thus go beyond the existing literature by not only optimizing the precision requirements of an implementation, but also changing the structure of the implementation itself. Using our tool to optimize the structure of a variety of real-world and artificially generated examples in single precision, we improve either their accuracy or their resource utilization by up to 60%.
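Two ideas in this abstract can be made concrete in a few lines: reassociating an expression changes its floating-point result, and the optimizer keeps only the rewrites that are Pareto-optimal in accuracy and area. This is a hedged sketch, not SOAP itself: it uses Python's double-precision floats rather than single precision, and the (error, area) figures for the hypothetical rewrites `v0` to `v3` are invented placeholders, not results from the paper.

```python
"""Two ingredients of structural expression optimization, sketched in Python.

Assumptions: IEEE-754 double precision (the paper targets single precision),
and made-up (error, area) figures for hypothetical rewrites v0..v3.
"""

# 1. Reassociation is exact over the reals but not in floating point.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # a + b is exactly 0.0, so the result is 1.0
right = a + (b + c)  # b + c rounds back to -1e16 (spacing is 2 here), so the result is 0.0
print(left, right)   # 1.0 0.0


# 2. Keep only the rewrites that are not dominated in (error, area).
def pareto_frontier(candidates):
    """Return non-dominated (name, error, area) tuples sorted by area; lower is better."""
    frontier = []
    for name, err, area in candidates:
        dominated = any(
            e <= err and ar <= area and (e < err or ar < area)
            for _, e, ar in candidates
        )
        if not dominated:
            frontier.append((name, err, area))
    return sorted(frontier, key=lambda cand: cand[2])


# Hypothetical equivalent rewrites of one expression: (name, error bound, area in LUTs)
candidates = [("v0", 4.0, 300), ("v1", 2.0, 300), ("v2", 3.0, 520), ("v3", 2.5, 260)]
print(pareto_frontier(candidates))  # [('v3', 2.5, 260), ('v1', 2.0, 300)]
```

A synthesis tool would then pick from the frontier whichever rewrite satisfies its accuracy and area constraints; `v0` and `v2` are discarded because `v1` is at least as good on both axes.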

Citations: 18

A hardware implementation of Bag of Words and Simhash for image recognition

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718403

Shengye Wang, Chen Liang, Xuegong Zhou, Wei Cao, Chen-Mie Wu, Xitian Fan, Lingli Wang

Abstract: Algorithms such as Bag of Words and Simhash have been widely used in image recognition. To achieve better performance and energy efficiency, this paper proposes a hardware implementation of these two algorithms. To the best of our knowledge, this is the first time these algorithms have been implemented in hardware for image recognition. The proposed implementation generates a fingerprint of an image and accurately finds the closest match in a database. It is implemented on a Xilinx Virtex-6 SX475T FPGA. Trade-offs between high performance and low hardware overhead are obtained through appropriate parallelization. Experimental results show that the proposed implementation can process 1,018 images per second, approximately 17.8x faster than software on a 12-thread Intel Xeon X5650 processor, while consuming 0.35x the power of the software implementation. Overall, the advantage in energy efficiency is as much as 46x. The proposed architecture is scalable and can meet various requirements of image recognition.
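A software model helps show what such a pipeline computes: descriptors are quantized to their nearest visual words, the resulting histogram is folded into a Simhash fingerprint, and the database entry with the smallest Hamming distance is returned. The descriptor format, the four-word vocabulary, the 64-bit fingerprint width, and the use of BLAKE2b as the per-word hash are all assumptions for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
"""Bag of Words + Simhash image fingerprinting, sketched in software.

Assumptions (not from the paper): 2-D toy descriptors, a 4-word vocabulary,
64-bit fingerprints, and BLAKE2b as the per-word hash function.
"""
import hashlib

BITS = 64


def word_hash(word_id):
    """Deterministic 64-bit hash of a visual-word index."""
    digest = hashlib.blake2b(str(word_id).encode(), digest_size=BITS // 8).digest()
    return int.from_bytes(digest, "big")


def bag_of_words(descriptors, vocabulary):
    """Histogram of nearest visual words (squared Euclidean distance)."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: sum((x - y) ** 2 for x, y in zip(d, vocabulary[k])))
        hist[nearest] += 1
    return hist


def simhash(hist):
    """Each fingerprint bit is the sign of a count-weighted vote over the words' hash bits."""
    votes = [0] * BITS
    for word_id, weight in enumerate(hist):
        if weight:
            h = word_hash(word_id)
            for bit in range(BITS):
                votes[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)


def hamming(x, y):
    return bin(x ^ y).count("1")


def closest_match(fingerprint, database):
    """Name of the database entry whose fingerprint is nearest in Hamming distance."""
    return min(database, key=lambda name: hamming(fingerprint, database[name]))


VOCABULARY = [(0, 0), (10, 0), (0, 10), (10, 10)]
images = {
    "cat": [(1, 1), (0, 2), (9, 1)],
    "dog": [(9, 9), (1, 9), (10, 8)],
}
database = {name: simhash(bag_of_words(desc, VOCABULARY)) for name, desc in images.items()}
query = [(0.5, 0.5), (1, 0), (10, 1)]
query_fp = simhash(bag_of_words(query, VOCABULARY))
print(closest_match(query_fp, database))  # cat
```

The query's descriptors fall into the same visual words as the "cat" entry, so both histograms and fingerprints are identical; in hardware, the per-bit voting and the Hamming comparisons against database entries are natural candidates for parallelization.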

Citations: 0

FlexGrip: A soft GPGPU for FPGAs

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718358

K. Andryc, Murtaza Merchant, R. Tessier

Citations: 77

From C to Blokus Duo with LegUp high-level synthesis

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718424

J. Cai, Ruolong Lian, Mengyao Wang, Andrew Canis, Jongsok Choi, B. Fort, Eric Hart, Emily Miao, Yanyan Zhang, Nazanin Calagar, S. Brown, J. Anderson

Citations: 6

Maximizing speed and density of tiled FPGA overlays via partitioning

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718360

Charles Eric LaForest, J. Gregory Steffan

Abstract: Common practice for large FPGA design projects is to divide sub-projects into separate synthesis partitions, allowing incremental recompilation as each sub-project evolves. In contrast, smaller design projects avoid partitioning to give the CAD tool the freedom to perform as many global optimizations as possible, since these optimizations normally improve performance and possibly area. In this paper, we show that for high-speed tiled designs composed of duplicated components, and hence having multi-localities (multiple instances of equivalent logic), a designer can use partitioning to preserve multi-locality and improve performance. In particular, we focus on the lanes of SIMD soft processors and multicore meshes composed of them, as compiled by Quartus 12.1 targeting a Stratix IV EP4SE230F29C2 device. We demonstrate that, with negligible impact on compile time (less than ±10%): (i) we can use partitioning to provide high-level information to the CAD tool about preserving multi-localities in a design, without low-level micro-management of the design description or CAD tool settings; (ii) by preserving multi-localities within SIMD soft processors, we can increase both frequency (by up to 31%) and compute density (by up to 15%); (iii) partitioning improves the density and speed (by up to 51% and 54%, respectively) of a mesh of soft processors, across many building-block configurations and mesh geometries; (iv) the improvements from partitioning increase as the number of tiled computing elements (SIMD lanes or mesh nodes) increases. As an example of the benefits of partitioning, a mesh of 102 scalar soft processors improves its operating frequency from 284 to 437 MHz and its peak performance from 28,968 to 44,574 MIPS, while increasing its logic area by only 0.85%.
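The peak-performance figures are internally consistent if each of the 102 scalar soft processors retires one instruction per cycle; that rate is inferred from the numbers rather than stated in the abstract:

```latex
\begin{aligned}
102 \times 284\,\text{MHz} \times 1\,\tfrac{\text{instr.}}{\text{cycle}} &= 28{,}968\ \text{MIPS} \\
102 \times 437\,\text{MHz} \times 1\,\tfrac{\text{instr.}}{\text{cycle}} &= 44{,}574\ \text{MIPS} \\
\tfrac{437}{284} = \tfrac{44{,}574}{28{,}968} &\approx 1.539
\end{aligned}
```

A ratio of about 1.539 corresponds to the 54% speed improvement quoted for meshes of soft processors.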

Citations: 7

Bitwidth-optimized hardware accelerators with software fallback

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718343

Ana Klimovic, J. Anderson

Citations: 8

Accelerating iterative algorithms with asynchronous accumulative updates on FPGAs

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718332

D. Unnikrishnan, S. Virupaksha, Lekshmi Krishnan, Lixin Gao, R. Tessier

Abstract: Iterative algorithms represent a pervasive class of data mining, web search and scientific computing applications. In iterative algorithms, a final result is derived by performing repetitive computations on an input data set. Existing techniques to parallelize such algorithms typically use software frameworks such as MapReduce and Hadoop to distribute the data for an iteration across multiple CPU-based workstations in a cluster and to collect per-iteration results. These platforms must synchronize data computations at iteration boundaries, which impedes system performance. In this paper, we demonstrate that FPGAs in distributed computing systems can play a vital role in breaking this synchronization barrier with the help of asynchronous accumulative updates. These updates allow intermediate results for numerous data points to be accumulated without iteration-based barriers, so that individual nodes in a cluster can independently make progress towards the final outcome. Computation is dynamically prioritized to accelerate algorithm convergence. A general class of iterative algorithms has been implemented on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU, and the system offers up to 154× speedup versus a standard Hadoop-based CPU workstation. Improved performance is achieved by clusters of FPGAs.
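The abstract does not name the iterative algorithms or give the update rule, so the following is a sketch under assumptions: PageRank stands in for the general class, each node keeps an accumulated value plus a pending delta, and processing the node with the largest pending delta first is used as one plausible form of dynamic prioritization. A barrier-synchronized iteration is included as a reference; both should converge to the same fixed point. This single-threaded model illustrates only the update rule, not the FPGA cluster, and the function names are hypothetical.

```python
"""Asynchronous accumulative (delta-based) updates, sketched for PageRank.

Assumptions: PageRank as the example iterative algorithm, damping d = 0.8,
and "largest pending delta first" as the prioritization rule. Every node
must appear as a key of `graph`.
"""


def accumulative_pagerank(graph, d=0.8, eps=1e-10):
    """Solve v_j = (1 - d) + d * sum_{i -> j} v_i / outdeg(i) without iteration barriers."""
    value = {n: 0.0 for n in graph}
    delta = {n: 1.0 - d for n in graph}  # pending, not-yet-applied contributions
    while sum(delta.values()) > eps:
        i = max(delta, key=delta.get)  # prioritize the node with the most pending work
        change, delta[i] = delta[i], 0.0
        value[i] += change  # accumulate immediately; no waiting for other nodes
        for j in graph[i]:
            delta[j] += d * change / len(graph[i])
    return value


def synchronous_pagerank(graph, d=0.8, iterations=200):
    """Barrier-synchronized reference: every node updates once per iteration."""
    value = {n: 0.0 for n in graph}
    for _ in range(iterations):
        new = {n: 1.0 - d for n in graph}
        for i, out in graph.items():
            for j in out:
                new[j] += d * value[i] / len(out)
        value = new  # global barrier between iterations
    return value


graph = {0: [1, 2], 1: [2], 2: [0]}
async_v = accumulative_pagerank(graph)
sync_v = synchronous_pagerank(graph)
print({n: round(async_v[n], 6) for n in graph})  # {0: 1.150943, 1: 0.660377, 2: 1.188679}
print({n: round(sync_v[n], 6) for n in graph})   # {0: 1.150943, 1: 0.660377, 2: 1.188679}
```

Because each update only adds a delta to a running total, updates can be applied in any order and at any time, which is what removes the need for a barrier at iteration boundaries.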

Citations: 4

Automated multi-device placement, I/O voltage supply assignment, and pin assignment in circuit board design

2013 International Conference on Field-Programmable Technology (FPT) Pub Date: 2013-12-01 DOI: 10.1109/FPT.2013.6718363

D. Seemuth, Katherine Morrow

Abstract: Embedded systems often contain many components, and some include multiple Field Programmable Gate Arrays (FPGAs). Designing Printed Circuit Boards (PCBs) for these systems can be a complex process that is often tedious, error-prone, and time-intensive. Existing computer-aided design tools require designers to manually insert components and explicitly define the connections between every component on the PCB, a cumbersome process. A fast PCB design framework requiring less designer time and effort would be particularly advantageous for rapid prototyping and short-production-run PCBs. This paper therefore proposes a novel, freely available, open-source framework that captures design intent and automatically implements the design details. Designers express connectivity at a higher level of abstraction than enumerating or drawing each individual trace between components. Given the components and connection requirements, the proposed framework automatically generates component placements, I/O voltage supply assignments, and FPGA pin assignments that minimize trace length. We also propose a novel method to improve trace length estimates during placement, before FPGA pins have actually been assigned to those connections. The proposed framework quickly explores large solution spaces, enabling rapid prototyping and design space exploration, and can lower the costs of design time and other non-recurring expenses. We demonstrate that it produces favorable results for various design requirements, which suggests that the framework will be especially appreciated by designers of systems with multiple FPGAs that have large numbers of flexible pins.
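The abstract does not describe the placement algorithm or the trace-length estimator, so the sketch below only illustrates the objective: components are assigned to fixed board slots, trace length is estimated as the number of nets between two components times the Manhattan distance between their slots, and every assignment is tried. Exhaustive search is an assumption that only works for a handful of components, and I/O voltage and pin assignment are omitted. All names and net counts are hypothetical.

```python
"""Placement that minimizes an estimated trace length, sketched by brute force.

Assumptions (not from the paper): components sit in fixed board slots,
trace length is estimated as (number of nets) x Manhattan distance between
slot centers, and the search is exhaustive, which only scales to tiny boards.
"""
from itertools import permutations


def estimated_trace_length(placement, connections, slots):
    """Sum over component pairs of nets * Manhattan distance between their slots."""
    total = 0
    for (a, b), nets in connections.items():
        (xa, ya), (xb, yb) = slots[placement[a]], slots[placement[b]]
        total += nets * (abs(xa - xb) + abs(ya - yb))
    return total


def best_placement(components, connections, slots):
    """Try every assignment of components to distinct slots; keep the cheapest."""
    best, best_cost = None, float("inf")
    for order in permutations(range(len(slots)), len(components)):
        placement = dict(zip(components, order))
        cost = estimated_trace_length(placement, connections, slots)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


slots = [(0, 0), (1, 0), (2, 0)]  # slot centers in arbitrary board units
components = ["fpga0", "fpga1", "adc"]
connections = {("fpga0", "fpga1"): 64, ("fpga0", "adc"): 16}  # hypothetical net counts
placement, cost = best_placement(components, connections, slots)
print(placement, cost)  # {'fpga0': 1, 'fpga1': 0, 'adc': 2} 80
```

The heavily connected `fpga0` ends up in the middle slot, next to both of its partners; a practical tool would replace the exhaustive loop with a heuristic search to explore the large solution spaces the abstract mentions.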

Citations: 2