FPGA. ACM International Symposium on Field-Programmable Gate Arrays: Latest Publications

Acceleration of the long read mapping on a PC-FPGA architecture (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435329
Peng Chen, Chao Wang, Xi Li, Xuehai Zhou
Abstract: Genome sequence alignment, in which a very large number of sequence reads must be compared against an enormously long reference, has long been a central challenge for biologists. In recent years, new sequencing technology has made it possible to generate longer reads (sequences of genome fragments), which appear more valuable for life science research. Long genome reads (longer than 200 base pairs) are expected to dominate the field in the near future. Unfortunately, most state-of-the-art aligners are optimized for, and only applicable to, short read mapping, while current long read aligners remain unsatisfactory in terms of speed. In this paper, we propose a novel PC-FPGA hybrid system to improve the performance of long read mapping. The BWA-SW algorithm is chosen as the alignment approach, and by accelerating the bottleneck of the algorithm, our solution achieves a significant improvement in speed. Experiments demonstrate that the described system is as accurate as the BWA-SW aligner and about 1.41-2.73 times faster for reads with lengths ranging from 500 bp to 2000 bp.
Pages: 271
Citations: 3
A novel multithread routing method for FPGAs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435322
Chun Zhu, Qiuli Li, Jian Wang, Jinmei Lai
Abstract: We propose a platform-independent multithreaded routing method for FPGAs with two aspects: a single high-fanout net is routed in parallel within itself, and several low-fanout nets are routed in parallel with one another. Routing a high-fanout net usually takes considerable time because of the large physical area enclosed by its bounding box and the tens of terminals to connect. Therefore, a high-fanout net is partitioned into several subnets with fewer terminals and smaller bounding boxes, which are routed in parallel. Low-fanout nets, however, have intrinsically small bounding boxes and few terminals and can hardly be divided. Instead, low-fanout nets whose bounding boxes do not overlap are routed concurrently. A new graph, named the bounding box graph, is used to select the nets to be routed concurrently. In this graph, each vertex represents a net, and an edge between two vertices means that the bounding boxes of the two nets overlap. Several strategies are introduced to balance the load among threads and ensure deterministic results. Routing time decreases as the number of threads increases. On a 4-core processor, this technique improves run-time by ~1.9× with routing quality degrading by no more than 2.3%.
Pages: 269
Citations: 0
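The bounding box graph described in the abstract above can be illustrated with a small sketch. The abstract does not say how the authors pick the concurrent set, so the greedy low-degree-first selection below is an assumption, and the names `overlaps`, `build_bbox_graph`, and `pick_concurrent_nets` are hypothetical. Fixed sort orders keep the result deterministic.

```python
from typing import Dict, List, Set, Tuple

# (xmin, ymin, xmax, ymax), inclusive coordinates; hypothetical representation.
BBox = Tuple[int, int, int, int]


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_bbox_graph(boxes: Dict[str, BBox]) -> Dict[str, Set[str]]:
    """One vertex per net; an edge joins two nets whose bounding boxes overlap."""
    graph: Dict[str, Set[str]] = {net: set() for net in boxes}
    nets = sorted(boxes)
    for i, u in enumerate(nets):
        for v in nets[i + 1:]:
            if overlaps(boxes[u], boxes[v]):
                graph[u].add(v)
                graph[v].add(u)
    return graph


def pick_concurrent_nets(graph: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick pairwise non-adjacent nets (assumed heuristic, not the paper's)."""
    chosen: List[str] = []
    blocked: Set[str] = set()
    # Lowest degree first, ties broken by name, so the order is reproducible.
    for net in sorted(graph, key=lambda n: (len(graph[n]), n)):
        if net not in blocked:
            chosen.append(net)
            blocked.add(net)
            blocked |= graph[net]
    return chosen


boxes = {"a": (0, 0, 2, 2), "b": (1, 1, 3, 3), "c": (5, 5, 6, 6)}
graph = build_bbox_graph(boxes)
batch = pick_concurrent_nets(graph)
```

Nets in `batch` have pairwise disjoint bounding boxes, so each could be handed to its own routing thread in this simplified model.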
Accelerating subsequence similarity search based on dynamic time warping distance with FPGA
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435277
Zilong Wang, Sitao Huang, Lanjun Wang, Hao Li, Yu Wang, Huazhong Yang
Abstract: Subsequence search, especially subsequence similarity search, is one of the most important subroutines in time series data mining algorithms, and there is increasing evidence that Dynamic Time Warping (DTW) is the best distance metric. However, in spite of great effort in software speedup techniques, including early abandoning strategies, lower bounds, indexing, and computation reuse, DTW still costs too much time for many applications, e.g. 80% of the total time. Since DTW is a two-dimensional sequential dynamic search with high data dependency, it is hard to accelerate with parallel hardware. In this work, we propose a novel framework for FPGA-based subsequence similarity search and a novel PE-ring structure for DTW calculation. The framework exploits the data reusability of continuous DTW calculations to reduce bandwidth and exploit coarse-grained parallelism, while guaranteeing accuracy with a two-phase precision reduction. The PE-ring supports on-line updating of patterns of arbitrary length and uses the hard-wired synchronization of the FPGA to realize fine-grained parallelism. It also offers a flexible degree of parallelism for performance-cost trade-offs. The experimental results show that we can achieve several orders of magnitude speedup in subsequence similarity search compared with the best software and current GPU/FPGA implementations on different datasets.
Pages: 53-62
Citations: 27
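For readers unfamiliar with DTW, the data dependency the abstract refers to is visible in the textbook dynamic program, sketched below. This is a plain software reference, not the paper's PE-ring or framework; the function name `dtw_distance` and the squared-difference local cost are assumptions made for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with a squared-difference local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # prev[j] holds the best warping cost aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Each cell depends on its left, upper, and upper-left neighbours,
            # which is the dependency that makes naive parallelization hard.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Cells on the same anti-diagonal do not depend on each other, which is the usual starting point for hardware parallelization of this recurrence.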
Location, location, location: the role of spatial locality in asymptotic energy minimization
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435291
A. DeHon
Abstract: Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic memories, can require $O(N^2)$ energy for an $N$-node graph. Specifically, it is important to exploit locality (1) to reduce the size of the description of the graph, (2) to reduce data movement, and (3) to reduce instruction movement. FPGAs exploit all three. FPGAs with a Rent exponent $p<0.5$ running designs with $p<0.5$ achieve asymptotically optimal $\Theta(N)$ energy. FPGA designs with $p>0.5$ and implementations with metal layers that grow as $O(N^{p-0.5})$ require only $O(N^{p+0.5})$ energy; this bound can be achieved with $O(1)$ metal layers with a novel multicontext design that has heterogeneous context depth. In contrast, a $p>0.5$ FPGA design on an implementation technology with $O(1)$ metal layers requires $O(N^{2p})$ energy.
Pages: 137-146
Citations: 10
FPGA bitstream compression and decompression using LZ and golomb coding (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435309
Jinsong Mao, Hao Zhou, Haijiang Ye, Jinmei Lai
Abstract: In this paper we propose an optimized bitstream compression algorithm based on LZ and a novel decompressor architecture. The proposed algorithm improves the compression ratio by fully exploiting the regularity of the configuration bits of the CLB (Configurable Logic Box) in the FPGA and by using variable-length Golomb coding. The experimental results show that the optimized method improves the compression ratio of LZSS by 32.3% for bitstreams with high regularity and by 10.3% for bitstreams with low regularity, and that our approach is more flexible than the BMC+RLE algorithm when compressing bitstreams with high regularity for various FPGAs. Moreover, we design a two-buffer-window decompressor to download the compressed bitstreams. To increase the throughput of the decompressor, we design a multi-stage data selector in it. Post-simulation of the decompressor shows a throughput of up to 9280 Mbps in a 65 nm CMOS process, and 4352 Mbps when verified on a Virtex-5 FPGA.
Pages: 265
Citations: 1
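The abstract above does not specify which Golomb variant is used or how it is combined with LZ, so the following is only a generic sketch of Golomb coding for a single non-negative integer: a unary quotient followed by a truncated-binary remainder. The function names and the bit-string representation are hypothetical choices for readability.

```python
def golomb_encode(n: int, m: int) -> str:
    """Encode n >= 0 with Golomb parameter m >= 1 as a string of '0'/'1'."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    unary = "1" * q + "0"          # quotient in unary, terminated by a zero
    if m == 1:
        return unary               # remainder is always 0, nothing to add
    b = (m - 1).bit_length()       # ceil(log2(m))
    cutoff = (1 << b) - m          # remainders below cutoff use b - 1 bits
    if r < cutoff:
        return unary + format(r, f"0{b - 1}b")
    return unary + format(r + cutoff, f"0{b}b")


def golomb_decode(bits: str, m: int) -> int:
    """Inverse of golomb_encode for a single codeword."""
    q = bits.index("0")            # length of the unary run of ones
    pos = q + 1
    if m == 1:
        return q
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    if r < cutoff:
        return q * m + r
    return q * m + int(bits[pos:pos + b], 2) - cutoff
```

Small values get short codewords, which is why Golomb codes suit fields such as match lengths or offsets that are usually small; whether the paper applies them to those fields is not stated in the abstract.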
An FPGA-based transient error simulator for evaluating resilient system designs (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435328
Chia-Hsiang Chen, Shiming Song, Zhengya Zhang
Abstract: Error-resilient designs have become more important with continued device scaling. One critical challenge of designing error-resilient systems is the lack of tools to quickly and accurately evaluate the effectiveness and performance of such systems. We propose an FPGA-based transient error simulator to accelerate transient error simulations incorporating accurate datapath delay models and realistic error models. Compared to conventional digital error simulators, the FPGA-based transient error simulator operates at a finer time step and captures intricate interactions between errors and the datapath under different circuit-level error detection and correction techniques. The error simulator is constructed using a configurable datapath delay model and error model, making it general-purpose and widely applicable. We demonstrate the capability of this simulator in the evaluation of two popular error-resilient design techniques, pre-edge and post-edge detection and correction, using a synthesized CORDIC processor and an Alpha processor that operate under soft error, coupling noise, and voltage droop models. The proposed error simulator uncovers insights to guide practical designs, including the choice of checking window in pre-edge designs and the optimal operating frequency in post-edge designs. The FPGA-based transient simulation will complement circuit simulation and system emulation for resilient system designs.
Pages: 271
Citations: 0
Towards simulator-like observability for FPGAs: a virtual overlay network for trace-buffers
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435272
Eddie Hung, S. Wilton
Abstract: The rising complexity of verification has led to an increase in the use of FPGA prototyping, which can run at significantly higher operating frequencies and achieve much higher coverage than logic simulation. However, a key challenge is observability into these devices, which can be addressed by embedding trace-buffers to record on-chip signal values. Rather than connecting a predetermined subset of circuit signals to dedicated trace-buffer inputs at compile-time, in this work we propose building a virtual overlay network to multiplex all on-chip signals to all on-chip trace-buffers. Subsequently, at debug-time, the designer can choose a signal subset for observation. To minimize its overhead, we build this network out of unused routing multiplexers, and by using optimal bipartite graph matching techniques, we show that any subset of on-chip signals can be connected to 80-90% of the maximum trace-buffer capacity in less than 50 seconds.
Pages: 19-28
Citations: 41
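The abstract above mentions bipartite graph matching without naming a specific algorithm. As a rough illustration only, the sketch below uses the standard augmenting-path method (often attributed to Kuhn) to assign selected signals to trace-buffer inputs. The adjacency map, the signal and input names, and the function name are all hypothetical; in the paper's setting the reachability would come from the overlay network.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a maximum matching as {signal: trace_buffer_input}.

    adj maps each signal to the trace-buffer inputs it can reach.
    """
    owner: Dict[str, str] = {}  # trace-buffer input -> signal currently using it

    def try_assign(signal: str, seen: Set[str]) -> bool:
        for inp in adj[signal]:
            if inp in seen:
                continue
            seen.add(inp)
            # Take a free input, or evict its owner if the owner can move elsewhere.
            if inp not in owner or try_assign(owner[inp], seen):
                owner[inp] = signal
                return True
        return False

    for signal in adj:
        try_assign(signal, set())
    return {sig: inp for inp, sig in owner.items()}


reach = {"s1": ["t1", "t2"], "s2": ["t1"], "s3": ["t2"]}
matching = max_bipartite_matching(reach)
```

Here three signals compete for two inputs, so at most two can be observed at once; the recursion uses Python's call stack, so very large graphs would need an iterative variant.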
Accelerating ncRNA homology search with FPGAs
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435276
Nathaniel McVicar, W. L. Ruzzo, S. Hauck
Abstract: Over the last decade, the number of known biologically important non-coding RNAs (ncRNAs) has increased by orders of magnitude. The function performed by a specific ncRNA is partially determined by its structure, defined by which nucleotides of the molecule form pairs. These correlations may span large and variable distances in the linear RNA molecule. Because of these characteristics, algorithms that search for ncRNAs belonging to known families are computationally expensive, often taking many CPU weeks to run. To improve the speed of this search, multiple search algorithms arranged into a series of progressively more stringent filters can be used. In this paper, we present an FPGA-based implementation of some of these algorithms. This is the first FPGA-based approach to attempt to accelerate multiple filters used in ncRNA search. The FPGA is reconfigured for each filter, resulting in a total system speedup of 25x when compared with a single CPU.
Pages: 43-52
Citations: 4
An FPGA based parallel architecture for music melody matching
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435305
Hao Wang, Jyh-Charn S. Liu
Abstract: We propose an FPGA-based high-performance parallel architecture for music retrieval through singing. The database consists of monophonic MIDI files, which are modeled as strings, and the user-sung query is modeled as a set of regular expressions (regexps), with consideration of possible key transpositions and tempo variations to tolerate imperfectly sung queries. An approximate regexp matching algorithm is developed to calculate the similarity between a regexp and a string, using edit distance as the metric. The algorithm supports user-sung queries starting anywhere in the database song, not necessarily from the beginning. Using the proposed formal models and algorithms, the similarity between the user-sung query and each song in the database can be evaluated, and the top-10 most similar results are reported. We designed the approximate regexp matching algorithm in such a way that all terms of the regexp can execute concurrently, which fits the massive parallelism provided by the FPGA. The FPGA-implemented melody matching engine (MME) is a parameterized modular architecture that can be reconfigured to implement different regexps by simply updating their parameter registers, and can therefore avoid time-consuming code re-synthesis. MME also includes an on-board DDR2 memory to store the database, so that songs can be read in to calculate edit distances locally on the board. This way, each MME forms a self-contained system, and multiple MMEs can be clustered to increase parallel processing power with virtually no overhead. MME is evaluated using the query corpus of ThinkIT with 355 sung files and a database of 5563 MIDI files. It achieves a top-10 hit rate of 90.7% and a runtime of 19.4 seconds, averaging 54.6 milliseconds for a single query. MME achieves significant speedup over software-based systems while providing the same level of flexibility.
Pages: 235-244
Citations: 1
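The idea of an edit-distance match that may start anywhere in the database string can be sketched in software with the well-known free-start variant of the edit distance recurrence. This simplification matches a plain string pattern rather than the regexps the abstract describes, and the function name and unit costs are assumptions made for illustration.

```python
def best_substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row for the empty pattern is all zeros: a match may begin at any position.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)   # matching i pattern symbols to nothing costs i
        for j, tc in enumerate(text, 1):
            cur[j] = min(
                prev[j] + 1,              # pattern symbol left unmatched
                cur[j - 1] + 1,           # extra symbol in the text
                prev[j - 1] + (pc != tc)  # match or substitution
            )
        prev = cur
    # The match may also end anywhere, so take the minimum over the last row.
    return min(prev)
```

Ranking every song by this score and keeping the ten smallest values would mirror the top-10 reporting in the abstract, under the simplifications stated above.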
Hardware acceleration of TEA and XTEA algorithms on FPGA, GPU and multi-core processors (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435326
V. Venugopal, D. Shila
Abstract: Field programmable gate arrays (FPGAs) are extensively used for rapid prototyping in embedded system applications. While hardware acceleration can be done via specialized processors like a Graphical Processing Unit (GPU), it can also be accomplished with FPGAs for more specialized scenarios. GPUs essentially consist of massively parallel cores and have high memory bandwidth; FPGAs, on the other hand, provide flexibility in terms of customizable I/O and computational resources. In this paper, we explore the usage of GPUs and FPGAs as cryptographic co-processors in streaming dataflow systems with a very high rate of incoming data. Two classic lightweight encryption algorithms, Tiny Encryption Algorithm (TEA) and Extended Tiny Encryption Algorithm (XTEA), are targeted for implementation on GPUs and FPGAs. The GPU implementations of TEA and XTEA in this study show a maximum speedup of 13x over a CPU-based implementation. The pipelined FPGA implementation is able to realize a throughput 6-9x higher than the GPU for small plaintext sizes.
Pages: 270
Citations: 4
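As background for the entry above, the sketch below shows the standard 32-cycle TEA block operation on one 64-bit block (two 32-bit words) with a 128-bit key (four 32-bit words). It is a plain software reference written for this listing, not the paper's GPU or FPGA implementation, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

MASK = 0xFFFFFFFF     # keep arithmetic in 32 bits, as unsigned C integers would
DELTA = 0x9E3779B9    # TEA's key-schedule constant
ROUNDS = 32


def tea_encrypt(block: Tuple[int, int], key: Sequence[int]) -> Tuple[int, int]:
    """Encrypt one 64-bit block given as two 32-bit words."""
    v0, v1 = block
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def tea_decrypt(block: Tuple[int, int], key: Sequence[int]) -> Tuple[int, int]:
    """Invert tea_encrypt by running the rounds in reverse order."""
    v0, v1 = block
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


key = (1, 2, 3, 4)
plain = (0x01234567, 0x89ABCDEF)
cipher = tea_encrypt(plain, key)
```

Each round uses only shifts, additions, and XORs on 32-bit words, which is the property that makes the cipher easy to unroll into a hardware pipeline.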