Latest papers from the 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines

Remote Execution in Distributed Memory MPSoC

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.30

Rémi Busseuil, Luciano Ost, Rafael Garibotti, G. Sassatelli, M. Robert

Citations: 1

Efficient Query Processing for Web Search Engine with FPGAs

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.28

Jing Yan, Zhanxiang Zhao, Ningyi Xu, Xi Jin, Lingmin Zhang, Feng-Hsiung Hsu

Citations: 12

Designing Flexible Reconfigurable Regions to Relocate Partial Bitstreams

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.51

Y. Ichinomiya, Sadaki Usagawa, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi

Citations: 10

Short-Read Mapping by a Systolic Custom FPGA Computation

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.37

Thomas B. Preußer, Oliver Knodel, R. Spallek

Abstract: The mapping of reads, i.e. short DNA base pair strings, to large genome databases has become a critical operation for genetic analysis and diagnosis. Although this mapping operation is a simple string search tolerant of some character mismatches, it is nonetheless extremely challenging due to the tremendous size of the searched genome databases. It is the heavy use of search heuristics such as BLAST, Maq and Bowtie that makes the economic deployment of read mappers possible. While these heuristics achieve feasible computation times, they also sacrifice the accuracy of the mapping results, which is itself of high value for reliable diagnostics. Traditional software implementations are unable to exploit the tremendous parallelism available in the mapping of thousands and millions of reads. Merely a handful of concurrent control flows, and thus searches, can be performed efficiently on contemporary multicores. Even GPU assistance only enables a few dozen parallel searches. This paper proposes a systolic custom computation on FPGA, which implements the read mapping on a massively parallel architecture. It implements a true search and guarantees to find all read mappings under a configurable threshold of base pair mismatches. The highly regular design built from compact string matchers enables the implementation of thousands of parallel search engines on a single FPGA device. The presented mapper platform combines the highest computational performance with excellent result accuracy. Its performance is more than twice as high as that of a recently published comparable FPGA mapper. Already when implemented on a contemporary mid-size FPGA, it meets the search speed of software heuristics, which only detect little more than half of the valid read mappings. The mapper easily scales to large FPGA devices, which can thus implement accurate high-performance volume mappers. Accurate mapping is made available in application domains that could only afford fuzzy heuristics until now.
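The "true search" described in this abstract, which reports every read mapping under a configurable mismatch threshold, can be illustrated in software with a minimal sketch. This is not the paper's systolic FPGA design: the function name `map_read`, the sequential loop, and the toy sequences are assumptions made for illustration. The hardware would evaluate many such comparisons in parallel, whereas this version checks one alignment at a time.

```python
def map_read(reference: str, read: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Exhaustively find every alignment of `read` in `reference`.

    Returns a list of (position, mismatch_count) pairs for all positions
    where the number of differing bases is at most `max_mismatches`.
    Hypothetical helper for illustration only.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        window = reference[pos:pos + len(read)]
        for ref_base, read_base in zip(window, read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # threshold exceeded, abandon this position
        else:
            # Loop finished without exceeding the threshold: valid mapping.
            hits.append((pos, mismatches))
    return hits


# Toy data (assumed): one exact hit and one hit with a single mismatch.
print(map_read("ACGTACCTTACG", "ACGT", max_mismatches=1))
```

Because every position is checked, no valid mapping is skipped, which is the property that separates an exhaustive search from the heuristics named in the abstract. The cost is a runtime proportional to reference length times read length per read.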

Citations: 22

PATS: A Performance Aware Task Scheduler for Runtime Reconfigurable Processors

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.43

L. Bauer, Artjom Grudnitsky, M. Shafique, J. Henkel

Abstract: Multi-tasking is one of the main requirements for complex embedded systems to fulfill user expectations (e.g. flexibility of the system), increase resource utilization, and thus increase system efficiency. In general, flexibility and efficiency can be increased by incorporating a fine-grained reconfigurable fabric (e.g. an embedded FPGA) that is coupled with a general-purpose processor and accelerates the computationally intensive kernels. This work focuses on reconfigurable processors that use a reconfigurable fabric to implement Special Instructions (SIs) that are invoked by the processor and process data-dominant parts. For each SI, the decision whether it is executed in hardware or emulated in software can be changed dynamically at runtime. In this paper, we present our novel Performance Aware Task Scheduler (PATS), which decides the task schedule at runtime while considering the specific system state of the reconfigurable processor. For instance, if a task t has to emulate several SI executions in software because the reconfiguration of the corresponding hardware implementations is not yet complete, then it might be more efficient to schedule other tasks first, depending on the soft deadlines of the tasks, until the reconfigurations of task t are completed. In comparison to other task schedulers (earliest deadline first, rate monotonic scheduling, and round robin), PATS achieves on average a 1.45x better system tardiness (i.e., the sum of cycles by which tasks miss their deadlines). Additionally, PATS reduces the makespan (i.e. the time when all tasks have completed all of their jobs) on average by 1.17x (up to 1.58x). Especially in challenging multi-tasking scenarios with tight deadlines or a small reconfigurable fabric, PATS performs significantly better than other task schedulers.
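The two evaluation metrics defined in this abstract, system tardiness and makespan, can be written down directly. The sketch below is an assumption-laden illustration: the `Job` record, its field names, and the sample numbers are hypothetical and not taken from the paper. Only the two definitions come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One completed job of a task (hypothetical record for illustration)."""
    task: str
    deadline: int    # cycle by which the job should have finished (soft deadline)
    completion: int  # cycle at which the job actually finished


def system_tardiness(jobs: list[Job]) -> int:
    """Sum of cycles by which jobs miss their deadlines; early jobs add 0."""
    return sum(max(0, job.completion - job.deadline) for job in jobs)


def makespan(jobs: list[Job]) -> int:
    """Time at which all tasks have completed all of their jobs."""
    return max(job.completion for job in jobs)


# Made-up schedule result: job "a" is early, "b" and "c" are late.
jobs = [Job("a", 100, 90), Job("b", 100, 130), Job("c", 200, 250)]
print(system_tardiness(jobs))  # 0 + 30 + 50
print(makespan(jobs))
```

A scheduler comparison along the lines of the abstract would compute both values for each scheduling policy over the same task set and report the ratios.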

Citations: 15

ZUMA: An Open FPGA Overlay Architecture

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.25

Alexander Brant, G. Lemieux

Citations: 113

Groundhog - A Serial ATA Host Bus Adapter (HBA) for FPGAs

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.45

L. Woods, Ken Eguro

Citations: 20

FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.23

Hélène Martorell, Nachiket Kapre

Abstract: Automated, offline precision analysis of dataflow computation containing elementary functions (e.g. exp) and if-then-else control flow operations enables accurate fixed-point FPGA implementation of SPICE device equations. We perform interval analysis of these equations using Gappa++ to statically compare error bounds of fixed-point and double-precision implementations. This is possible due to the limited dynamic range of physical voltage, current and conductance quantities in a SPICE simulation of real-world circuits. In contrast to previous custom-precision SPICE device mappings, our fixed-point implementation has the same accuracy as a double-precision implementation when compared to ideal arithmetic (reals). To deliver these implementations we develop FX-SCORE, a high-level framework based on the SCORE streaming FPGA framework, that automatically generates Gappa++ scripts and AutoESL circuits to explore the cost-quality tradeoffs of fixed-point FPGA implementations. Using our methodology, we can determine whether fixed-point is always better than a double-precision implementation at the same relative error. We demonstrate 35% geometric mean area improvement for different SPICE device models such as Diode, Level-1 MOSFET and an Approximate MOSFET when comparing custom fixed-point implementations with standard double-precision realizations.
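To make the idea of a statically known fixed-point error bound concrete, here is a minimal sketch of round-to-nearest quantization with its worst-case absolute error. It does not use Gappa++ and does not reproduce the paper's interval analysis. The helper names, the 16 fractional bits, and the sample values are assumptions chosen only to show that the error of a single quantization step is bounded in advance when the value range is limited.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a fixed-point integer with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def rounding_error_bound(frac_bits: int) -> float:
    """Worst-case absolute error of round-to-nearest: half a unit in the last place."""
    return 2.0 ** -(frac_bits + 1)


FRAC_BITS = 16  # assumed word format, not taken from the paper
for value in (-1.2, 0.026, 0.7, 3.3):  # made-up values in a small dynamic range
    error = abs(from_fixed(to_fixed(value, FRAC_BITS), FRAC_BITS) - value)
    print(value, error, error <= rounding_error_bound(FRAC_BITS))
```

A full analysis would propagate such bounds through every operation of a device equation, which is the role the abstract assigns to Gappa++.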

Citations: 10

Go Ahead: A Partial Reconfiguration Framework

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.17

Christian Beckhoff, Dirk Koch, J. Tørresen

Citations: 145

Fast Multi-Objective Algorithmic Design Co-Exploration for FPGA-based Accelerators

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2012-04-29 DOI: 10.1109/FCCM.2012.21

Kumud Nepal, O. Ulusel, R. I. Bahar, S. Reda

Abstract: The reconfigurability of Field Programmable Gate Arrays (FPGAs) makes them an attractive platform for accelerating algorithms. Accelerating a particular algorithm is a challenging task, as the large number of possible algorithmic and hardware design parameters leads to different accelerator variant implementations, each with its own characteristics in metrics such as performance, area, power, and arithmetic accuracy. To identify the parameters that optimize the accelerator for certain metrics, we propose techniques for fast design space exploration and non-linear multi-objective optimization (e.g., minimize power under arithmetic inaccuracy bounds). Our methodology samples a small part of the design space and uses measurements from the sampled implementations to train mathematical models for the different metrics. To automate and improve the model generation process, we propose the use of L1-regularized least squares regression techniques. To demonstrate the effectiveness of our approach, we implement a high-throughput real-time accelerator for image deblurring. We demonstrate the accuracy (e.g., within 8% for power modeling) of our modeling techniques and their ability to identify the optimal accelerator designs with large speed-ups (340×) in comparison to brute-force enumeration.
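L1-regularized least squares, which this abstract names as the model-fitting technique, minimizes (1/2)·||y − Xw||² + λ·||w||₁ and tends to set the weights of uninformative parameters exactly to zero. The sketch below solves it with cyclic coordinate descent in plain Python. The abstract does not say which solver the authors use, so the choice of coordinate descent, the function names, and the tiny data set are all assumptions for illustration.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize 0.5 * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of rows (one per sampled design), y the measured metric.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature column j
            for i in range(n_samples):
                pred_without_j = sum(
                    X[i][k] * w[k] for k in range(n_features) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Made-up samples: parameter 0 strongly affects the metric, parameter 1 barely.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=0.5))  # weight of parameter 1 is driven to 0
```

With λ = 0 the same routine reduces to ordinary least squares. Increasing λ trades fit accuracy for a sparser model, which is what makes the approach attractive for automatically selecting the design parameters that matter.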

Citations: 8