2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines最新文献

筛选
英文 中文
The Effect of Compiler Optimizations on High-Level Synthesis for FPGAs 编译器优化对fpga高级合成的影响
Qijing Huang, Ruolong Lian, Andrew Canis, Jongsok Choi, R. Xi, S. Brown, J. Anderson
{"title":"The Effect of Compiler Optimizations on High-Level Synthesis for FPGAs","authors":"Qijing Huang, Ruolong Lian, Andrew Canis, Jongsok Choi, R. Xi, S. Brown, J. Anderson","doi":"10.1109/FCCM.2013.50","DOIUrl":"https://doi.org/10.1109/FCCM.2013.50","url":null,"abstract":"We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated FPGA hardware. Using a HLS tool implemented within the state-of-the-art LLVM [1] compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, Fmax, and wall-clock time. We evaluate 56 different compiler optimizations implemented within LLVM and show that some optimizations significantly affect hardware quality. Moreover, we show that hardware quality is also affected by the order in which optimizations are applied. We then present a new HLS-directed approach to compiler optimizations, wherein we execute partial HLS and profiling at intermittent points in the optimization process and use the results to judiciously undo the impact of optimization passes predicted to be damaging to the generated hardware quality. Results show that our approach produces circuits with 16% better speed performance, on average, versus using the standard -O3 optimization level.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121452588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 51
Hardware-Software Codesign for Embedded Numerical Acceleration 嵌入式数值加速的软硬件协同设计
Ranko Sredojevic, A. Wright, V. Stojanović
{"title":"Hardware-Software Codesign for Embedded Numerical Acceleration","authors":"Ranko Sredojevic, A. Wright, V. Stojanović","doi":"10.1109/FCCM.2013.27","DOIUrl":"https://doi.org/10.1109/FCCM.2013.27","url":null,"abstract":"In this work we aim to strike a balance between performance, power consumption and design effort for complex digital signal processing within the power and size constraints of embedded systems. Looking across the design stack, from algorithm formulation down to accelerator microarchitecture, we show that a high degree of flexibility and design reuse can be achieved without much performance sacrifice. The foundation of our design is a numerical accelerator template. Extensively parameterized, it allows us to develop the design while postponing microarchitectural decisions until program is known. Statically scheduling compiler provides a link between the algorithm and template instantiation parameters. Results show that the derived design can significantly outperform embedded processors for similar power cost and also approach the high-performance processor performance for a fraction of the power cost.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125512557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Memory Access Scheduling on the Convey HC-1 HC-1上的内存访问调度
Zheming Jin, J. Bakos
{"title":"Memory Access Scheduling on the Convey HC-1","authors":"Zheming Jin, J. Bakos","doi":"10.1109/FCCM.2013.55","DOIUrl":"https://doi.org/10.1109/FCCM.2013.55","url":null,"abstract":"In this paper we describe a technique for scheduling memory accesses to improve effective memory bandwidth on the Convey HC-1 platform.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115679441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Global Control and Storage Synthesis for a System Level Synthesis Approach 系统级综合方法的全局控制与存储综合
Shuo Li, Nasim Farahini, A. Hemani
{"title":"Global Control and Storage Synthesis for a System Level Synthesis Approach","authors":"Shuo Li, Nasim Farahini, A. Hemani","doi":"10.1109/FCCM.2013.61","DOIUrl":"https://doi.org/10.1109/FCCM.2013.61","url":null,"abstract":"SYLVA is a System Level Architectural Synthesis Framework that translates Synchronous Data Flow (SDF) models of DSP sub-systems like modems and codecs into hardware implementation in ASIC/Standard Cells, FPGAs or CGRAs (Coarse Grain Reconfigurable Fabric).","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131189903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Exploiting Input Parameter Uncertainty for Reducing Datapath Precision of SPICE Device Models 利用输入参数不确定性降低SPICE器件模型数据路径精度
Nachiket Kapre
{"title":"Exploiting Input Parameter Uncertainty for Reducing Datapath Precision of SPICE Device Models","authors":"Nachiket Kapre","doi":"10.1109/FCCM.2013.28","DOIUrl":"https://doi.org/10.1109/FCCM.2013.28","url":null,"abstract":"Double-precision computations operating on inputs with uncertainty margins can be compiled to lower precision fixed-point datapaths with no loss in output accuracy. We observe that ideal SPICE model equations based on device physics include process parameters which must be matched with real-world measurements on specific silicon manufacturing processes through a noisy data-fitting process. We expose this uncertainty information to the open-source FX-SCORE compiler to enable automated error analysis using the Gappa++ backend and hardware circuit generation using Vivado HLS. We construct an error model based on interval analysis to statically identify sufficient fixedpoint precision in the presence of uncertainty as compared to reference double-precision design. We demonstrate 1-16× LUT count improvements, 0.5-2.4× DSP count reductions and 0.9-4× FPGA power reduction for SPICE devices such as Diode, Level-1 MOSFET and an Approximate MOSFET designs. We generate confidence in our approach using Monte-Carlo simulations with auto-generated Matlab models of the SPICE device equations.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114678121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A Reconfigurable Architecture for 1-D and 2-D Discrete Wavelet Transform 一维和二维离散小波变换的可重构结构
Qing Sun, Jiang Jiang, Yongxin Zhu, Yuzhuo Fu
{"title":"A Reconfigurable Architecture for 1-D and 2-D Discrete Wavelet Transform","authors":"Qing Sun, Jiang Jiang, Yongxin Zhu, Yuzhuo Fu","doi":"10.1109/FCCM.2013.23","DOIUrl":"https://doi.org/10.1109/FCCM.2013.23","url":null,"abstract":"In this paper, we propose a novel architecture for DWT that can be reconfigured to be adapted to different kinds of filter banks and different sizes of inputs. High flexibility and generality are achieved by using the MAC loop based filter(MLBF). Classic methods, such as polyphase structure and fragment-based sample consumption, are used to enhance the parallelism of the system. The architecture can be reconfigured to 3 modes to deal with 1-D or 2-D DWT with different bandwidth and throughput requirements.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115721599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Latency-Optimized Networks for Clustering FPGAs 集群fpga的延迟优化网络
Trevor Bunker, S. Swanson
{"title":"Latency-Optimized Networks for Clustering FPGAs","authors":"Trevor Bunker, S. Swanson","doi":"10.1109/FCCM.2013.49","DOIUrl":"https://doi.org/10.1109/FCCM.2013.49","url":null,"abstract":"The data-intensive applications that will shape computing in the coming decades require scalable architectures that incorporate scalable data and compute resources and can support random requests to unstructured (e.g., logs) and semi-structured (e.g., large graph, XML) data sets. To explore the suitability of FPGAs for these computations, we are constructing an FPGAbased system with a memory capacity of 512 GB from a collection of 32 Virtex-5 FPGAs spread across 8 enclosures. This paper describes our work in exploring alternative interconnect technologies and network topologies for FPGA-based clusters. The diverse interconnects combine inter-enclosure high-speed serial links and wide, single-ended intra-enclosure on-board traces with network topologies that balance network diameter, network throughput, and FPGA resource usage. We discuss the architecture of high-radix routers in FPGAs that optimize for the asymmetry between the interand intra-enclosure links. We analyze the various interconnects that aim to efficiently utilize the prototype's total switching capacity of 2.43 Tb/s. The networks we present have aggregate throughputs up to 51.4 GB/s for random traffic, diameters as low as 845 nanoseconds, and consume less than 12% of the FPGAs' logic resources.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124484267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Minerva: Accelerating Data Analysis in Next-Generation SSDs Minerva:加速下一代ssd的数据分析
Arup De, M. Gokhale, Rajesh K. Gupta, S. Swanson
{"title":"Minerva: Accelerating Data Analysis in Next-Generation SSDs","authors":"Arup De, M. Gokhale, Rajesh K. Gupta, S. Swanson","doi":"10.1109/FCCM.2013.46","DOIUrl":"https://doi.org/10.1109/FCCM.2013.46","url":null,"abstract":"Emerging non-volatile memory (NVM) technologies have DRAM-like latency with storage-like density, offering unique capability to analyze large data sets significantly faster than flash or disk storage. However, the hybrid nature of these NVM technologies such as phase-change memory (PCM) make it difficult to use them to best advantage in the memory-storage hierarchy. These NVMs lack the fast write latency required of DRAM and are thus not suitable as DRAM equivalent on the memory bus, yet their low latency even in random access patterns is not easily exploited over an I/O bus. In this work, we describe an FPGA-based system to execute application-specific operations in the NVM controller and evaluate its performance on two microbenchmarks and a keyvalue store. Our system Minerva1extends the conventional solidstate drive (SSD) architecture to offload data or I/O intensive application code to the SSD to exploit the low latency and high internal bandwidth of NVMs. Performing computation in the FPGA-based NVM storage controller significantly reduces data traffic between the host and storage and serves as an offload engine for data analysis workloads. A runtime library enables the programmer to offload computations to the SSD without dealing with the complications of the underlying architecture and inter-controller communication management. We have implemented a prototype of Minerva on the BEE3 FPGA system. We compare the performance of Minerva to a state of the art PCIe-attached PCM-based SSD. Minerva improves performance by an order of magnitude on two microbenchmarks. Minerva based key-value store performs up to 5.2 M get operations/s and 4.0 M set operations/s which is 7.45× and 9.85× higher than the PCM-based SSD that uses the conventional I/O architecture. This huge improvement comes from the reduction of data transfer between the storage to the host and the FPGA-based data processing in the SSD.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116611081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 68
An Evaluation of High-Performance Embedded Processing on MPPAs MPPAs的高性能嵌入式处理评价
Zain-ul-Abdin, B. Svensson
{"title":"An Evaluation of High-Performance Embedded Processing on MPPAs","authors":"Zain-ul-Abdin, B. Svensson","doi":"10.1109/FCCM.2013.44","DOIUrl":"https://doi.org/10.1109/FCCM.2013.44","url":null,"abstract":"Embedded signal processing is facing the challenges of increased performance as well as to achieve energy efficiency. Massively parallel processor arrays (MPPAs) consisting of hundreds of processing cores offer the possibility of meeting the growing performance demand in an energy efficient way by exploiting parallelism instead of scaling the clock frequency of a single processor. In this paper we evaluate two selected commercial architectures belonging to the category of MPPA. The adopted approach for the evaluation is to implement a real, industrial application in the form of compute-intensive parts of Synthetic Aperture Radar (SAR) systems.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134233826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Impact of Hardware Communication on a Heterogeneous Computing System 硬件通信对异构计算系统的影响
Shanyuan Gao, Bin Huang, R. Sass
{"title":"The Impact of Hardware Communication on a Heterogeneous Computing System","authors":"Shanyuan Gao, Bin Huang, R. Sass","doi":"10.1109/FCCM.2013.43","DOIUrl":"https://doi.org/10.1109/FCCM.2013.43","url":null,"abstract":"This paper designed a MPI-like Message Passing Engine (MPE) as part of the on-chip network, providing point-to-point and collective communication primitives in hardware. On one hand, the MPE offloads the communication workload from the general processing elements. On the other hand, the MPE provides direct interface to the heterogeneous processing elements which can eliminate the data path going around the OS and libraries. The experimental results have shown that the MPE can significantly reduce the communication time and improve the overall performance, especially for heterogeneous computing systems.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127057712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信