2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines: Latest Papers

Remote Execution in Distributed Memory MPSoC
Rémi Busseuil, Luciano Ost, Rafael Garibotti, G. Sassatelli, M. Robert
{"title":"Remote Execution in Distributed Memory MPSoC","authors":"Rémi Busseuil, Luciano Ost, Rafael Garibotti, G. Sassatelli, M. Robert","doi":"10.1109/FCCM.2012.30","DOIUrl":"https://doi.org/10.1109/FCCM.2012.30","url":null,"abstract":"Message-passing is an increasingly popular design style for MPSoCs that usually results in systems that perform better than external shared-memory designs in both performance and power, because of much reduced data transfers with external memory. This scheme relies on explicit communications between processing tasks that participate in the application. Contrary to shared-memory multiprocessors, tasks usually get assigned to processors at design-time. In order to cope with transient performance losses originating from various phenomena such as increased processing workload or peak traffic in the communication subsystem, various adaptation mechanisms based on task migration have been proposed in the literature. As message-passing systems usually use a PE-private memory architecture, these mechanisms imply migrating application code from processor to processor, which incurs a penalty in performance and power consumption. This paper proposes a local shared-memory strategy in which processors execute code hosted in a remote processor.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124657018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficient Query Processing for Web Search Engine with FPGAs
Jing Yan, Zhanxiang Zhao, Ningyi Xu, Xi Jin, Lingmin Zhang, Feng-Hsiung Hsu
{"title":"Efficient Query Processing for Web Search Engine with FPGAs","authors":"Jing Yan, Zhanxiang Zhao, Ningyi Xu, Xi Jin, Lingmin Zhang, Feng-Hsiung Hsu","doi":"10.1109/FCCM.2012.28","DOIUrl":"https://doi.org/10.1109/FCCM.2012.28","url":null,"abstract":"Web search engines are now using tens of thousands of index servers that consume a huge amount of power. In this paper, we investigate FPGAs as the implementation platform for power-efficient index serving. We propose the architecture of an FPGA-based inverted index search engine, as well as implementations of essential components, including decoder, matcher and ranker. We successfully boot up the FPGA-based search engine and run experiments on real-world data from a commercial search engine. The targeted FPGA-based hardware index server could achieve up to 19.52X power efficiency and 7.17X price efficiency over an Intel Xeon server with highly optimized software. This is the first complete work using FPGAs to implement query processing for Web search engines.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132236451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Designing Flexible Reconfigurable Regions to Relocate Partial Bitstreams
Y. Ichinomiya, Sadaki Usagawa, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi
{"title":"Designing Flexible Reconfigurable Regions to Relocate Partial Bitstreams","authors":"Y. Ichinomiya, Sadaki Usagawa, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi","doi":"10.1109/FCCM.2012.51","DOIUrl":"https://doi.org/10.1109/FCCM.2012.51","url":null,"abstract":"Current commercial SRAM-based FPGAs, such as Virtex-6 and Stratix-V, can perform dynamic partial reconfiguration (DPR). Partial reconfiguration (PR) can change a part of the device without reconfiguring the whole chip. Thus, we can switch part of the system while it continues operating. However, the authorized design flow by Xilinx creates a different PR bitstream (PRB) for each partially reconfigurable region (PRR) even if it is the same circuit. This indicates that N × M PRBs must be prepared to implement M types of modules on N PRRs. This increases design time and the memory needed to store PRBs. This paper presents a uniforming design technique for PRRs to relocate a PRB among them. In addition, uniformed PRRs can be used to implement a large module by combining adjacent PRRs. In this work, we use Xilinx Virtex-6 XC6VLX240T and Integrated Software Environment 13.3 (ISE) to verify the proposed technique.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115803896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
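The bitstream-count argument in the abstract above is simple arithmetic and can be checked with a short Python sketch. The function name `prb_count` and the example values are illustrative assumptions. The relocatable case assumes that uniform regions let one PRB per module serve every PRR, which is an inference from the abstract rather than a figure it states.

```python
def prb_count(num_regions: int, num_modules: int, relocatable: bool) -> int:
    """Number of partial bitstreams (PRBs) that must be stored.

    Standard flow: one PRB per (region, module) pair, i.e. N x M.
    Relocatable flow (assumption): one PRB per module, reused in any region.
    """
    if relocatable:
        return num_modules
    return num_regions * num_modules


# Hypothetical example: 4 reconfigurable regions, 6 module types.
print(prb_count(4, 6, relocatable=False))  # 24
print(prb_count(4, 6, relocatable=True))   # 6
```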
Short-Read Mapping by a Systolic Custom FPGA Computation
Thomas B. Preußer, Oliver Knodel, R. Spallek
{"title":"Short-Read Mapping by a Systolic Custom FPGA Computation","authors":"Thomas B. Preußer, Oliver Knodel, R. Spallek","doi":"10.1109/FCCM.2012.37","DOIUrl":"https://doi.org/10.1109/FCCM.2012.37","url":null,"abstract":"The mapping of reads, i.e. short DNA base pair strings, to large genome databases has become a critical operation for genetic analysis and diagnosis. Although this mapping operation is a simple string search tolerant of some character mismatches, it is yet extremely challenging due to the tremendous size of the searched genome databases. It is the heavy use of search heuristics such as BLAST, Maq and Bowtie, which makes the economic deployment of read mappers possible. While these heuristics achieve feasible computation times, they also sacrifice the accuracy of the mapping results, which is itself a high value for reliable diagnostics. The traditional software implementations are unable to exploit the tremendous parallelism, which is available in the mapping of thousands and millions of reads. Merely a handful of concurrent control flows, and thus searches, can be performed efficiently on contemporary multicores. Even GPU assistance only enables a few dozens of parallel searches. This paper proposes a systolic custom computation on FPGA, which implements the read mapping on a massively parallel architecture. It implements a true search and guarantees to find all read mappings under a configurable threshold of base pair mismatches. The highly regular design from compact string matchers enables the implementation of thousands of parallel search engines on a single FPGA device. The presented mapper platform combines highest computational performance with an excellent result accuracy. Its performance is more than twice as high as that of a recently published comparable FPGA mapper. Already when implemented on a contemporary mid-size FPGA, it meets the search speed of software heuristics, which only detect little more than half of the valid read mappings. The mapper easily scales to large FPGA devices, which can, thus, implement accurate high-performance volume mappers. Accurate mapping is made available in application domains that could only afford fuzzy heuristics by now.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116009904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
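The abstract above describes a "true search" that reports every position where a read matches within a configurable number of base mismatches. The Python sketch below shows that search criterion in its plainest sequential form. It is not the paper's systolic FPGA design; the function name `map_read` and the example sequences are illustrative assumptions.

```python
def map_read(reference: str, read: str, max_mismatches: int) -> list[int]:
    """Return every start offset where `read` aligns to `reference`
    with at most `max_mismatches` substituted bases (Hamming distance)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        window = reference[start:start + len(read)]
        for ref_base, read_base in zip(window, read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this offset
        else:
            # Inner loop finished without exceeding the threshold.
            hits.append(start)
    return hits


# Hypothetical example: exhaustive search finds both near-matches.
print(map_read("ACGTACGT", "ACGA", max_mismatches=1))  # [0, 4]
```

Each offset is checked independently of the others, which is the property that makes this kind of search amenable to many parallel matchers.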
PATS: A Performance Aware Task Scheduler for Runtime Reconfigurable Processors
L. Bauer, Artjom Grudnitsky, M. Shafique, J. Henkel
{"title":"PATS: A Performance Aware Task Scheduler for Runtime Reconfigurable Processors","authors":"L. Bauer, Artjom Grudnitsky, M. Shafique, J. Henkel","doi":"10.1109/FCCM.2012.43","DOIUrl":"https://doi.org/10.1109/FCCM.2012.43","url":null,"abstract":"Multi-tasking is one of the main requirements for complex embedded systems to fulfill user expectations (e.g. flexibility of the system), increase the resource utilization, and thus increase the system efficiency. In general, the flexibility and efficiency can be increased by incorporating a fine-grained reconfigurable fabric (e.g. an embedded FPGA) that is coupled with a general-purpose processor and accelerates the computationally intensive kernels. This work focuses on reconfigurable processors that use a reconfigurable fabric to implement Special Instructions (SIs) that are invoked by the processor and process data-dominant parts. For each SI the decision whether it is executed in hardware or emulated in software can be changed dynamically at runtime. In this paper, we present our novel Performance Aware Task Scheduler (PATS) that decides the task schedule at runtime while considering the specific system state of the reconfigurable processor. For instance, if a task t has to emulate several SI executions in software because reconfiguring the corresponding hardware implementations is not completed yet, then it might be more efficient to schedule other tasks first, depending on the soft-deadlines of the tasks, until the reconfigurations of that task t are completed. In comparison to other task schedulers (earliest deadline first, rate monotonic scheduling, and round robin), PATS achieves on average a 1.45x better system tardiness (i.e., the sum of cycles by which tasks miss their deadlines). Additionally, PATS reduces the makespan (i.e. the time when all tasks have completed all of their jobs) on average by 1.17x (up to 1.58x). Especially in challenging multi-tasking scenarios with tight deadlines or a small reconfigurable fabric PATS performs significantly better than other task schedulers do.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124039459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
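The two metrics defined in the abstract above, system tardiness (the sum of cycles by which tasks miss their deadlines) and makespan (the time at which all tasks have finished), can be computed as in the following Python sketch. The function names and the cycle counts in the example are illustrative assumptions, not data from the paper.

```python
def tardiness(finish_times: list[int], deadlines: list[int]) -> int:
    """Sum of cycles by which each job finishes after its deadline.
    Jobs that meet their deadline contribute zero."""
    return sum(max(0, finish - deadline)
               for finish, deadline in zip(finish_times, deadlines))


def makespan(finish_times: list[int]) -> int:
    """Time at which the last job completes."""
    return max(finish_times)


# Hypothetical schedule: three jobs, times in cycles.
finish = [120, 300, 450]
deadline = [100, 320, 400]
print(tardiness(finish, deadline))  # 70  (20 + 0 + 50)
print(makespan(finish))             # 450
```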
ZUMA: An Open FPGA Overlay Architecture
Alexander Brant, G. Lemieux
{"title":"ZUMA: An Open FPGA Overlay Architecture","authors":"Alexander Brant, G. Lemieux","doi":"10.1109/FCCM.2012.25","DOIUrl":"https://doi.org/10.1109/FCCM.2012.25","url":null,"abstract":"This paper presents the ZUMA open FPGA overlay architecture. It is an open-source, cross-compatible embedded FPGA architecture that is intended to overlay on top of an existing FPGA, in essence an “FPGA-on-an-FPGA.” This approach has a number of benefits, including bitstream compatibility between different vendors and parts, compatibility with open FPGA tool flows, and the ability to embed some programmable logic into systems on FPGAs without the need for releasing or recompiling the master netlist. These options can enhance design possibilities and improve designer productivity. Previous attempts to map an FPGA architecture into a commercial FPGA have had an area penalty of 100x at best [4]. Through careful architectural and implementation choices to exploit low-level elements of the host architecture, ZUMA reduces this penalty to as low as 40x. Using the VTR (VPR6) academic tool flow, we have been able to compile the entire MCNC benchmark suite to ZUMA. We invite authors of other tool flows to target ZUMA.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124846526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 113
Groundhog - A Serial ATA Host Bus Adapter (HBA) for FPGAs
L. Woods, Ken Eguro
{"title":"Groundhog - A Serial ATA Host Bus Adapter (HBA) for FPGAs","authors":"L. Woods, Ken Eguro","doi":"10.1109/FCCM.2012.45","DOIUrl":"https://doi.org/10.1109/FCCM.2012.45","url":null,"abstract":"This paper describes Groundhog, an open-source SATA host bus adapter (HBA) for FPGAs. This system makes it easy for FPGA-based applications to directly interact with permanent storage devices. This allows reconfigurable computing devices to be used in new applications that require bulk storage and presents additional opportunities to increase performance, reduce power consumption and improve system integration. In addition to standard disk sector read/write commands, this framework also supports more advanced concepts such as native command queuing (NCQ) introduced with SATA II. We test the system with latest-generation SSDs and demonstrate the potential performance advantages and trade-offs of direct hardware access to bulk storage devices.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129566091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++
Hélène Martorell, Nachiket Kapre
{"title":"FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++","authors":"Hélène Martorell, Nachiket Kapre","doi":"10.1109/FCCM.2012.23","DOIUrl":"https://doi.org/10.1109/FCCM.2012.23","url":null,"abstract":"Automated, offline precision-analysis of dataflow computation containing elementary functions (e.g. exp) and if-then-else control flow operations enables accurate fixed-point FPGA implementation of SPICE device equations. We perform interval analysis of these equations using Gappa++ to statically compare error bounds of fixed-point and double-precision implementations. This is possible due to the limited dynamic range of physical voltage, current and conductance quantities in a SPICE simulation of real-world circuits. In contrast to previous custom-precision SPICE device mappings, our fixed-point implementation has the same accuracy as double-precision implementation when compared to ideal arithmetic (reals). To deliver these implementations we develop FX-SCORE, a high-level framework based on the SCORE streaming FPGA framework, that automatically generates Gappa++ scripts and AutoESL circuits to explore the cost-quality tradeoffs of Fixed-point FPGA implementations. Using our methodology, we can determine whether fixed-point is always better than a double-precision implementation at the same relative error. We demonstrate 35% geometric mean area improvement for different SPICE device models such as Diode, Level-1 MOSFET and an Approximate MOSFET when comparing custom fixed-point implementations with standard double-precision realizations.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"3 17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129576680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Go Ahead: A Partial Reconfiguration Framework
Christian Beckhoff, Dirk Koch, J. Tørresen
{"title":"Go Ahead: A Partial Reconfiguration Framework","authors":"Christian Beckhoff, Dirk Koch, J. Tørresen","doi":"10.1109/FCCM.2012.17","DOIUrl":"https://doi.org/10.1109/FCCM.2012.17","url":null,"abstract":"Exploiting the benefits of partial run-time reconfiguration requires efficient tools. In this paper, we introduce the tool Go Ahead that is able to implement run-time reconfigurable systems for all recent Xilinx FPGAs. This includes in particular support for low cost and low power Spartan-6 FPGAs. Go Ahead assists during floorplanning and automates the constraint generation. It interacts with the Xilinx vendor tools and triggers the physical implementation phases all the way down to the final configuration bitstreams. Go Ahead enables the building of flexible systems for integrating many reconfigurable modules very efficiently into a system. The tool targets (re)usability, portability to future devices, and migration paths among reconfigurable systems featuring different FPGAs or even FPGA families. Moreover, it provides a scripting interface and all features can be accessed remotely.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121362865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 145
Fast Multi-Objective Algorithmic Design Co-Exploration for FPGA-based Accelerators
Kumud Nepal, O. Ulusel, R. I. Bahar, S. Reda
{"title":"Fast Multi-Objective Algorithmic Design Co-Exploration for FPGA-based Accelerators","authors":"Kumud Nepal, O. Ulusel, R. I. Bahar, S. Reda","doi":"10.1109/FCCM.2012.21","DOIUrl":"https://doi.org/10.1109/FCCM.2012.21","url":null,"abstract":"The reconfigurability of Field Programmable Gate Arrays (FPGAs) makes them an attractive platform for accelerating algorithms. Accelerating a particular algorithm is a challenging task as the large number of possible algorithmic and hardware design parameters lead to different accelerator variant implementations, each with its own metrics such as performance, area, power, and arithmetic accuracy characteristics. To identify these parameters that optimize the accelerator for certain metrics, we propose techniques for fast design space exploration and non-linear multi-objective optimization (e.g., minimize power under arithmetic inaccuracy bounds). Our methodology samples a small part of the design space and uses measurements from the sampled implementations to train mathematical models for the different metrics. To automate and improve the model generation process, we propose the use of L1-regularized least squares regression techniques. To demonstrate the effectiveness of our approach, we implement a high-throughput real-time accelerator for image deblurring. We demonstrate the accuracy (e.g., within 8% for power modeling) of our modeling techniques and their ability to identify the optimal accelerator designs with large speed-ups (340×) in comparison to brute-force enumeration.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124174079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8