2013 International Conference on Field-Programmable Technology (FPT): Latest Papers

Reconfigurable filtered acceleration of short read alignment
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718408
James Arram, W. Luk, P. Jiang
Abstract: Recent trends in the cost and demand of next generation DNA sequencing (NGS) have revealed a great computational challenge in analysing the massive quantities of sequenced data produced. Given that the projected increase in sequenced data far outstrips Moore's Law, the current technologies used to handle the data are likely to become insufficient. This paper explores the use of reconfigurable hardware in accelerating short read alignment. In this application, the positions of millions of short DNA sequences (called reads) are located in a known reference genome. This work proposes a new general approach for accelerating suffix-trie based short read alignment methods using reconfigurable hardware. In the proposed approach, specialised filters are designed to align short reads to a reference genome with a specific edit distance. The filters are arranged in a pipeline according to increasing edit distance, where short reads unable to be aligned by a given filter are forwarded to the next filter in the pipeline for further processing. Run-time reconfiguration is used to fully populate an accelerator device with each filter in the pipeline in turn. In our implementation a single FPGA is populated with specialised filters based on a novel bidirectional backtracking version of the FM-index, and it is found that in this particular implementation the alignment time can be up to 14.7 and 18.1 times faster than SOAP2 and BWA, respectively, run on dual Intel X5650 CPUs.
Citations: 10
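The filter pipeline described in this abstract can be illustrated with a minimal software sketch. It is not the authors' FPGA design: the FM-index is replaced by a naive sliding-window scan, edit distance is simplified to substitutions only, and the names `make_mismatch_filter` and `align_with_cascade` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Filter = Callable[[str, str], List[int]]


def make_mismatch_filter(k: int) -> Filter:
    """Build a stage that reports positions matching with exactly k substitutions.

    Assumption: a naive scan stands in for the FM-index based filter.
    """
    def stage(read: str, reference: str) -> List[int]:
        hits = []
        for pos in range(len(reference) - len(read) + 1):
            window = reference[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches == k:
                hits.append(pos)
        return hits
    return stage


def align_with_cascade(
    reads: List[str], reference: str, max_distance: int
) -> Dict[str, Optional[Tuple[int, List[int]]]]:
    """Run reads through filters ordered by increasing distance.

    Reads that a stage cannot align are forwarded to the next stage.
    """
    results: Dict[str, Optional[Tuple[int, List[int]]]] = {}
    pending = list(reads)
    for k in range(max_distance + 1):
        # Building a new stage stands in for reconfiguring the device.
        stage = make_mismatch_filter(k)
        unaligned = []
        for read in pending:
            hits = stage(read, reference)
            if hits:
                results[read] = (k, hits)
            else:
                unaligned.append(read)
        pending = unaligned
    for read in pending:
        results[read] = None  # not alignable within max_distance
    return results
```

Because most reads are expected to align at a small distance, the later and more expensive stages in such a cascade only see the reads left over by the earlier ones.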
Maximum flow algorithms for maximum observability during FPGA debug
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718324
Eddie Hung, Al-Shahna Jamal, S. Wilton
Abstract: Due to the ever-increasing density and complexity of integrated circuits, FPGA prototyping has become a necessary part of the design process. To enhance observability into these devices, designers commonly insert trace-buffers to record and expose the values on a small subset of internal signals during live operation to help root-cause errors. For dense designs, routing congestion will restrict the number of signals that can be connected to these trace-buffers. In this work, we apply optimal network flow graph algorithms, a well-studied technique, to the problem of transporting circuit signals to embedded trace-buffers for observation. Specifically, we apply a minimum cost maximum flow algorithm to gain maximum signal observability with minimum total wirelength. We showcase our techniques on both theoretical FPGA architectures using VPR, and with a Xilinx Virtex6 device, finding that for the latter, over 99.6% of all spare RAM inputs can be reclaimed for tracing across four large benchmarks.
Citations: 10
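The minimum cost maximum flow formulation mentioned in this abstract can be sketched as follows. This is a generic successive-shortest-path solver, not the authors' tool. The mapping of signals and trace-buffer inputs to graph nodes, and the use of edge cost as a wirelength proxy, are assumptions made for illustration.

```python
from collections import deque


class MinCostMaxFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n: int):
        self.n = n
        self.graph = [[] for _ in range(n)]  # node -> list of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> None:
        # Forward and reverse edges are stored as a pair (index e and e ^ 1).
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s: int, t: int):
        flow = total_cost = 0
        inf = float("inf")
        while True:
            dist = [inf] * self.n
            prev_edge = [-1] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not queued[v]:
                            queue.append(v)
                            queued[v] = True
            if dist[t] == inf:
                return flow, total_cost  # no augmenting path remains
            # Find the bottleneck along the cheapest path, then push flow.
            push, v = inf, t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]
```

In a toy model, a source feeds each candidate signal with capacity 1, each signal connects to the trace-buffer inputs it can reach at a cost equal to an estimated wirelength, and each trace-buffer input drains to the sink with capacity 1. The returned flow is then the number of observable signals and the cost is the total wirelength estimate.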
SOAP: Structural optimization of arithmetic expressions for high-level synthesis
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718340
Xitong Gao, Samuel Bayliss, G. Constantinides
Abstract: This paper introduces SOAP, a new tool to automatically optimize the structure of arithmetic expressions for FPGA implementation as part of a high-level synthesis flow, taking into account axiomatic rules derived from real arithmetic, such as distributivity, associativity and others. We explicitly target an optimized area/accuracy trade-off, allowing arithmetic expressions to be automatically re-written for this purpose. For the first time, we bring rigorous approaches from software static analysis, specifically formal semantics and abstract interpretation, to bear on source-to-source transformation for high-level synthesis. New abstract semantics are developed to generate a computable subset of equivalent expressions from an original expression. Using formal semantics, we calculate two objectives, the accuracy of computation and an estimate of resource utilization in FPGA. The optimization of these objectives produces a Pareto frontier consisting of a set of expressions. This gives the synthesis tool the flexibility to choose an implementation satisfying constraints on both accuracy and resource usage. We thus go beyond existing literature by not only optimizing the precision requirements of an implementation, but changing the structure of the implementation itself. Using our tool to optimize the structure of a variety of real world and artificially generated examples in single precision, we improve either their accuracy or the resource utilization by up to 60%.
Citations: 18
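The idea that expressions equivalent over the reals differ in accuracy and cost once implemented in floating point can be shown with a small sketch. It is not the SOAP tool: it evaluates candidates at a single input point instead of analysing input ranges by abstract interpretation, it counts operators as a crude area proxy, and the helpers `f32`, `score` and `pareto` are hypothetical.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def eval_f32(expr, env):
    """Evaluate an expression tree with single-precision rounding after each op."""
    if isinstance(expr, str):
        return f32(env[expr])
    op, lhs, rhs = expr
    a, b = eval_f32(lhs, env), eval_f32(rhs, env)
    return f32(a + b) if op == "+" else f32(a * b)


def eval_exact(expr, env):
    """Evaluate the same tree exactly with rational arithmetic."""
    if isinstance(expr, str):
        return Fraction(f32(env[expr]))
    op, lhs, rhs = expr
    a, b = eval_exact(lhs, env), eval_exact(rhs, env)
    return a + b if op == "+" else a * b


def op_count(expr) -> int:
    """Number of operators, used here as a stand-in for resource usage."""
    if isinstance(expr, str):
        return 0
    return 1 + op_count(expr[1]) + op_count(expr[2])


def score(expr, env):
    """Return (absolute rounding error, operator count) for one candidate."""
    error = abs(Fraction(eval_f32(expr, env)) - eval_exact(expr, env))
    return (error, op_count(expr))


def pareto(candidates, env):
    """Keep candidates that no other candidate beats on both objectives."""
    scored = [(score(e, env), e) for e in candidates]
    return [
        e for s, e in scored
        if not any(t[0] <= s[0] and t[1] <= s[1] and t != s for t, _ in scored)
    ]
```

With `a = 1e8`, `b = -1e8` and `c = 1.0`, the tree `("+", ("+", "a", "b"), "c")` is exact in single precision, while `("+", "a", ("+", "b", "c"))` loses the `1.0` entirely, so only the first survives on the frontier.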
A hardware implementation of Bag of Words and Simhash for image recognition
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718403
Shengye Wang, Chen Liang, Xuegong Zhou, Wei Cao, Chen-Mie Wu, Xitian Fan, Lingli Wang
Abstract: Algorithms such as Bag of Words and Simhash have been widely used in image recognition. To achieve better performance as well as energy-efficiency, a hardware implementation of these two algorithms is proposed in this paper. To the best of our knowledge, it is the first time that these algorithms have been implemented in hardware for image recognition. The proposed implementation is able to generate a fingerprint of an image and find the closest match in the database accurately. It is implemented on Xilinx's Virtex-6 SX475T FPGA. Tradeoffs between high performance and low hardware overhead are obtained through proper parallelization. The experimental result shows that the proposed implementation can process 1,018 images per second, approximately 17.8x faster than software on Intel's 12-thread Xeon X5650 processor. On the other hand, the power consumption is 0.35x that of the software-based implementation. Thus, the overall advantage in energy-efficiency is as much as 46x. The proposed architecture is scalable, and is able to meet various requirements of image recognition.
Citations: 0
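The fingerprint-and-match flow named in this abstract can be sketched in software. This is a generic Bag of Words plus Simhash illustration, not the paper's hardware architecture. The codebook, the `word_<index>` feature naming, the MD5-derived 64-bit hash, and the function names are all assumptions.

```python
import hashlib

BITS = 64  # fingerprint width assumed for this sketch


def bag_of_words(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword and count occurrences."""
    histogram = {}
    for d in descriptors:
        idx = min(
            range(len(codebook)),
            key=lambda i: sum((x - y) ** 2 for x, y in zip(d, codebook[i])),
        )
        key = f"word_{idx}"
        histogram[key] = histogram.get(key, 0) + 1
    return histogram


def simhash(histogram) -> int:
    """Fold a weighted feature histogram into a 64-bit fingerprint."""
    counts = [0] * BITS
    for feature, weight in histogram.items():
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            counts[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def closest(query_fp: int, database) -> str:
    """Return the database key whose fingerprint is nearest in Hamming distance."""
    return min(database, key=lambda name: hamming(query_fp, database[name]))
```

Matching by Hamming distance reduces to XOR and bit counting, which is one reason this kind of fingerprint maps naturally onto parallel hardware.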
FlexGrip: A soft GPGPU for FPGAs
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718358
K. Andryc, Murtaza Merchant, R. Tessier
Abstract: Over the past decade, soft microprocessors and vector processors have been extensively used in FPGAs for a wide variety of applications. However, it is difficult to straightforwardly extend their functionality to support conditional and thread-based execution characteristic of general-purpose graphics processing units (GPGPUs) without recompiling FPGA hardware for each application. In this paper, we describe the implementation of FlexGrip, a soft GPGPU architecture which has been optimized for FPGA implementation. This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU without hardware recompilation. Our architecture is customizable, thus providing the FPGA designer with a selection of GPGPU cores which display performance versus area tradeoffs. The benefits of our architecture are evaluated for a collection of five standard CUDA benchmarks which are compiled using standard GPGPU compilation tools. Speedups of up to 30× versus a MicroBlaze microprocessor are achieved for designs which take advantage of the conditional execution capabilities offered by FlexGrip.
Citations: 77
From C to Blokus Duo with LegUp high-level synthesis
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718424
J. Cai, Ruolong Lian, Mengyao Wang, Andrew Canis, Jongsok Choi, B. Fort, Eric Hart, Emily Miao, Yanyan Zhang, Nazanin Calagar, S. Brown, J. Anderson
Abstract: We apply high-level synthesis (HLS) to generate Blokus Duo game-playing hardware for the FPT 2013 Design Competition [3]. Our design, written in C, is synthesized using the LegUp open-source HLS tool to Verilog, then subsequently mapped using vendor tools to an Altera Cyclone IV FPGA on a DE2 board. Our software implementation is designed to be amenable to high-level synthesis, and includes a custom stack implementation, uses only integer arithmetic, and uses bitwise logical operations to improve overall computational performance. The underlying AI decision making is based on alpha-beta pruning [2]. The performance of our synthesizable solution is gauged by playing against Pentobi [8], a “known good” C++ software implementation.
Citations: 6
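Alpha-beta pruning, which this abstract names as the basis of the game AI, can be sketched generically as below. This is the textbook recursive form over an abstract game tree, not the authors' C code, which the abstract says uses a custom stack. The `children` and `evaluate` callbacks are hypothetical placeholders for move generation and board scoring.

```python
from math import inf


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Return the minimax value of `node`, skipping branches that cannot matter."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = -inf
        for kid in kids:
            best = max(
                best,
                alphabeta(kid, depth - 1, alpha, beta, False, children, evaluate),
            )
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return best
    best = inf
    for kid in kids:
        best = min(
            best,
            alphabeta(kid, depth - 1, alpha, beta, True, children, evaluate),
        )
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best
```

On a toy tree given as nested lists, `[[3, 5], [6, 9], [1, 2]]`, with leaves as scores, the search returns 6 and never evaluates the leaf `2`, because the third subtree is cut off as soon as the leaf `1` is seen.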
Maximizing speed and density of tiled FPGA overlays via partitioning
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718360
Charles Eric LaForest, J. Gregory Steffan
Abstract: Common practice for large FPGA design projects is to divide sub-projects into separate synthesis partitions to allow incremental recompilation as each sub-project evolves. In contrast, smaller design projects avoid partitioning to give the CAD tool the freedom to perform as many global optimizations as possible, knowing that the optimizations normally improve performance and possibly area. In this paper, we show that for high-speed tiled designs composed of duplicated components and hence having multi-localities (multiple instances of equivalent logic), a designer can use partitioning to preserve multi-locality and improve performance. In particular, we focus on the lanes of SIMD soft processors and multicore meshes composed of them, as compiled by Quartus 12.1 targeting a Stratix IV EP4SE230F29C2 device. We demonstrate that, with negligible impact on compile time (less than ±10%): (i) we can use partitioning to provide high-level information to the CAD tool about preserving multi-localities in a design, without low-level micro-managing of the design description or CAD tool settings; (ii) by preserving multi-localities within SIMD soft processors, we can increase both frequency (by up to 31%) and compute density (by up to 15%); (iii) partitioning improves the density and speed (by up to 51% and 54%, respectively) of a mesh of soft processors, across many building block configurations and mesh geometries; (iv) the improvements from partitioning increase as the number of tiled computing elements (SIMD lanes or mesh nodes) increases. As an example of the benefits of partitioning, a mesh of 102 scalar soft processors improves its operating frequency from 284 up to 437 MHz and its peak performance from 28,968 up to 44,574 MIPS, while increasing its logic area by only 0.85%.
Citations: 7
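The closing example in this abstract can be checked with a few lines of arithmetic. The sketch assumes each scalar soft processor retires one instruction per cycle at peak, which is an assumption that happens to reproduce the quoted MIPS figures. `peak_mips` and `percent_gain` are hypothetical helper names.

```python
def peak_mips(cores: int, freq_mhz: float, instr_per_cycle: float = 1.0) -> float:
    """Peak throughput in MIPS, assuming every core issues at the given rate."""
    return cores * freq_mhz * instr_per_cycle


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    print(peak_mips(102, 284))                   # unpartitioned mesh
    print(peak_mips(102, 437))                   # partitioned mesh
    print(round(percent_gain(284, 437), 1))      # frequency gain in percent
```

Under that assumption, 102 cores at 284 MHz and 437 MHz give 28,968 and 44,574 MIPS, and the frequency gain of about 53.9% is in line with the "up to 54%" speed improvement quoted for meshes.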
Bitwidth-optimized hardware accelerators with software fallback
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718343
Ana Klimovic, J. Anderson
Abstract: We propose the high-level synthesis of an FPGA-based hybrid computing system, where the implementations of compute-intensive functions are available both in software and as hardware accelerators. The accelerators are optimized to handle common-case inputs, as opposed to worst-case inputs, allowing accelerator area to be reduced by 28%, on average, while retaining the majority of performance advantages associated with a hardware versus software implementation. When inputs exceed the range that the hardware accelerators can handle, a software fallback is automatically triggered. Optimization of the accelerator area is achieved by reducing datapath widths based on application profiling of variable ranges in software (under typical datasets). The selected widths are passed to a high-level synthesis tool which generates the accelerator for a given function. The optimized accelerators with software fallback capability are generated automatically by our framework, with minimal user intervention. Our study explores the trade-offs of delay and area for benchmarks implemented on an Altera Cyclone II FPGA.
Citations: 8
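The profile-then-narrow scheme with a software fallback can be modelled in a few functions. This is a behavioural sketch only, not the authors' framework: the dot-product workload, the signed two's-complement range check, and all function names are assumptions.

```python
def required_width(samples) -> int:
    """Smallest signed two's-complement width that holds every profiled value."""
    magnitude_bits = max(
        v.bit_length() if v >= 0 else (~v).bit_length() for v in samples
    )
    return magnitude_bits + 1  # one extra bit for the sign


def fits(value: int, width: int) -> bool:
    """True if `value` is representable in `width` signed bits."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1


def software_dot(xs, ys) -> int:
    """Full-precision reference implementation (the fallback path)."""
    return sum(x * y for x, y in zip(xs, ys))


def accelerator_dot(xs, ys, width: int):
    """Model of a narrow datapath: returns (ok, result), ok=False on overflow."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
        if not (fits(x, width) and fits(y, width) and fits(acc, width)):
            return False, 0
    return True, acc


def hybrid_dot(xs, ys, width: int, stats) -> int:
    """Try the narrow accelerator first and fall back to software on overflow."""
    ok, result = accelerator_dot(xs, ys, width)
    if ok:
        stats["hw"] += 1
        return result
    stats["sw"] += 1
    return software_dot(xs, ys)
```

If profiling a typical dataset yields a width of 8 bits, `hybrid_dot([1, 2, 3], [4, 5, 6], 8, stats)` stays on the accelerator path, while `hybrid_dot([100, 100], [100, 100], 8, stats)` overflows the narrow accumulator and is recomputed in software, so results stay correct for rare out-of-range inputs.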
Accelerating iterative algorithms with asynchronous accumulative updates on FPGAs
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718332
D. Unnikrishnan, S. Virupaksha, Lekshmi Krishnan, Lixin Gao, R. Tessier
Abstract: Iterative algorithms represent a pervasive class of data mining, web search and scientific computing applications. In iterative algorithms, a final result is derived by performing repetitive computations on an input data set. Existing techniques to parallelize such algorithms typically use software frameworks such as MapReduce and Hadoop to distribute data for an iteration across multiple CPU-based workstations in a cluster and collect per-iteration results. These platforms are marked by the need to synchronize data computations at iteration boundaries, impeding system performance. In this paper, we demonstrate that FPGAs in distributed computing systems can serve a vital role in breaking this synchronization barrier with the help of asynchronous accumulative updates. These updates allow for the accumulation of intermediate results for numerous data points without the need for iteration-based barriers, allowing individual nodes in a cluster to independently make progress towards the final outcome. Computation is dynamically prioritized to accelerate algorithm convergence. A general class of iterative algorithms has been implemented on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU. The system offers up to 154× speedup versus a standard Hadoop-based CPU-workstation. Improved performance is achieved by clusters of FPGAs.
Citations: 4
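To make the contrast between barrier-synchronized iteration and asynchronous accumulative updates concrete, here is a single-machine sketch using PageRank as the example algorithm. PageRank is chosen for illustration and is not stated in the abstract. The delta-propagation scheme, the largest-delta-first priority rule, and the function names are assumptions, and every edge target is assumed to appear as a key in `out_edges`.

```python
def sync_pagerank(out_edges, damping=0.85, iterations=200):
    """Reference version: every vertex is updated once per global iteration."""
    rank = {n: 1.0 for n in out_edges}
    for _ in range(iterations):  # each pass ends in an implicit barrier
        nxt = {n: 1.0 - damping for n in out_edges}
        for n, targets in out_edges.items():
            for t in targets:
                nxt[t] += damping * rank[n] / len(targets)
        rank = nxt
    return rank


def async_pagerank(out_edges, damping=0.85, tol=1e-10):
    """Accumulative version: vertices absorb pending deltas in priority order."""
    nodes = list(out_edges)
    value = {n: 0.0 for n in nodes}
    delta = {n: 1.0 - damping for n in nodes}  # pending, not yet applied
    while True:
        # Prioritize the vertex holding the largest pending contribution.
        n = max(nodes, key=lambda k: delta[k])
        if delta[n] < tol:
            return value
        d, delta[n] = delta[n], 0.0
        value[n] += d  # accumulate locally, no iteration boundary needed
        targets = out_edges[n]
        for t in targets:
            delta[t] += damping * d / len(targets)
```

Both functions converge to the same ranks, but the second never waits for a global pass to finish: any vertex with a pending delta can be processed at any time, which is the property that lets independent nodes make progress without barriers.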
Automated multi-device placement, I/O voltage supply assignment, and pin assignment in circuit board design
2013 International Conference on Field-Programmable Technology (FPT) Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718363
D. Seemuth, Katherine Morrow
Abstract: Embedded systems often contain many components, some with multiple Field Programmable Gate Arrays (FPGAs). Designing Printed Circuit Boards (PCBs) for these systems can be a complex process that is often tedious, error-prone, and time-intensive. Existing computer-aided design tools require designers to manually insert components and explicitly define the connections between every component on the PCB, which is a cumbersome process. A fast PCB design framework requiring reduced designer time and effort would be particularly advantageous for rapid prototyping and short production run PCBs. Therefore, this paper proposes a novel, freely-available open-source framework to capture design intent and automatically implement the design details. Designers express connectivity at a higher level of abstraction than enumerating or drawing each individual trace between components. Given the components and connection requirements, the proposed framework automatically generates component placements, I/O voltage supply assignments, and FPGA pin assignments to minimize trace length. We also propose a novel method to improve trace length estimations during placement, before FPGA pins have actually been assigned to those connections. The proposed framework quickly explores large solution spaces, enabling rapid prototyping and design space exploration, and can lead to lower costs in design time and other non-recurring expenses. We demonstrate that it produces favorable results for various design requirements, which suggests the framework will be especially appreciated by designers of systems with multiple FPGAs having large numbers of flexible pins.
Citations: 2