Benjamin Gojman, Sirisha Nalmela, Nikil Mehta, N. Howarth, A. DeHon. "GROK-LAB: generating real on-chip knowledge for intra-cluster delays using timing extraction." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, pp. 81-90. DOI: 10.1145/2435264.2435281
Abstract: Timing Extraction identifies the delay of fine-grained components within an FPGA. From these computed delays, the delay of any path can be calculated. Moreover, a comparison of the fine-grained delays allows a detailed understanding of the amount and type of process variation that exists in the FPGA. To obtain these delays, Timing Extraction measures, using only resources already available in the FPGA, the delay of a small subset of the total paths in the FPGA. We apply Timing Extraction to the Logic Array Block (LAB) on an Altera Cyclone III FPGA to obtain a view of the delay down to near individual LUT granularity, characterizing components with delays on the order of a few hundred picoseconds with a resolution of ±3.2 ps. This information reveals that the 65 nm process used has, on average, random variation of σ/μ = 4.0%, with components having an average maximum spread of 83 ps. Timing Extraction also shows that as VDD decreases from 1.2 V to 0.9 V in a Cyclone IV 60 nm FPGA, paths slow down and variation increases from σ/μ = 4.3% to σ/μ = 5.8%, a clear indication that lowering VDD magnifies the impact of random variation.
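The core idea of Timing Extraction can be illustrated in a few lines: each measurable path delay is the sum of the component delays along it, so measuring a small set of overlapping paths yields a linear system whose solution is the per-component delays. The sketch below is a hypothetical toy example (three components, three paths, made-up picosecond values), not the paper's actual measurement infrastructure or solver:

```python
def solve_component_delays(paths, measurements):
    """Solve the square linear system (incidence matrix) * delays = measurements
    by Gaussian elimination. `paths` lists, for each measured path, the indices
    of the components it traverses."""
    n = len(measurements)
    # Incidence matrix: row i has a 1 for every component on path i.
    a = [[0.0] * n for _ in range(n)]
    for i, comps in enumerate(paths):
        for c in comps:
            a[i][c] = 1.0
    b = list(measurements)
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

# Three paths through components A, B, C (indices 0, 1, 2), delays in ps:
# A+B = 500, B+C = 600, A+C = 700  =>  A = 300, B = 200, C = 400.
delays = solve_component_delays([[0, 1], [1, 2], [0, 2]], [500.0, 600.0, 700.0])
```

With enough independent path measurements the same scheme recovers delays at near-LUT granularity, which is what lets the paper report per-component variation statistics.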
P. Zemčík, Roman Juránek, Petr Musil, M. Musil, Michal Hradiš. "High performance architecture for object detection in streamed video (abstract only)." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, p. 268. DOI: 10.1145/2435264.2435319
Abstract: Object detection is one of the key tasks in computer vision. It is computationally intensive, and it is therefore reasonable to accelerate it in hardware. The possible benefits of acceleration are a reduced computational load on the host computer system, higher overall application performance, and lower power consumption. We present a novel architecture for multi-scale object detection in video streams. The architecture uses scanning-window classifiers produced by the WaldBoost learning algorithm together with simple image features. It employs a small image buffer for the data under processing, and on-the-fly scaling units enable detection of objects at multiple scales. The whole processing chain is pipelined, so multiple image windows are processed in parallel. We implemented the engine in a Spartan 6 FPGA and show that it can process 640x480 pixel video streams at over 160 frames per second without external memory. The design takes only a fraction of the resources of comparable state-of-the-art approaches.
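The scanning-window approach the abstract describes can be sketched in software; the hardware pipeline keeps many of these window evaluations in flight at once. This is an illustrative toy (the classifier below is a made-up mean threshold, not WaldBoost):

```python
def scan(image, w, h, classify):
    """Slide a w-by-h window over a 2-D list and yield the top-left corners
    of windows the classifier accepts."""
    rows, cols = len(image), len(image[0])
    for y in range(rows - h + 1):
        for x in range(cols - w + 1):
            # In the FPGA engine these windows come from a small image buffer
            # and are evaluated in a pipeline, many positions in parallel.
            window = [row[x:x + w] for row in image[y:y + h]]
            if classify(window):
                yield (x, y)

# Toy "classifier": accept 2x2 windows whose mean brightness exceeds 5.
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0]]
hits = list(scan(img, 2, 2, lambda win: sum(map(sum, win)) / 4 > 5))
```

Multi-scale detection then amounts to running the same scan over on-the-fly rescaled copies of the frame, which is exactly what the paper's scaling units provide.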
Udit Dhawan, A. DeHon. "Area-efficient near-associative memories on FPGAs." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, pp. 191-200. DOI: 10.1145/2435264.2435298
Abstract: Associative memories can map sparsely used keys to values with low latency but can incur heavy area overheads. The lack of customized hardware for associative memories in today's mainstream FPGAs exacerbates the cost of building these memories out of fixed-address-match BRAMs. In this paper, we develop a new, FPGA-friendly memory architecture based on a multiple-hash scheme that achieves near-associative performance (less than 5% of evictions due to conflicts) without the area overheads of a fully associative memory on FPGAs. Using the proposed architecture as a 64KB L1 data cache, we show that it achieves near-associative miss rates for a set of benchmark programs from the SPEC2006 suite while consuming 6-7× less FPGA memory resources than fully associative memories generated by the Xilinx Coregen tool. Benefits increase with match width, allowing area reduction up to 100×. At the same time, the new architecture has lower latency than the fully associative memory: 3.7 ns for a 1024-entry flat version, or 6.1 ns for an area-efficient version, compared to 8.8 ns for a fully associative memory with a 64b key.
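The multiple-hash scheme can be sketched behaviorally: each key has one candidate slot in each of d direct-mapped banks (one BRAM-friendly table per hash), so a conflict in one bank is usually absorbed by a free slot in another, and only when all d candidates are occupied does an eviction occur. The following is a hypothetical software model, not the paper's exact design (the per-bank hash here is a toy multiplier):

```python
class MultiHashMemory:
    """d direct-mapped banks, each indexed by a different hash of the key;
    a key may live in any bank, so most single-bank conflicts are absorbed."""

    def __init__(self, banks=4, slots_per_bank=256):
        self.slots = slots_per_bank
        self.banks = [dict() for _ in range(banks)]  # slot index -> (key, value)
        self.evictions = 0

    def _index(self, key, bank):
        # Toy per-bank hash (odd multiplier); a real design uses stronger hashes.
        return (key * (2 * bank + 1)) % self.slots

    def insert(self, key, value):
        # Prefer any bank whose slot is free or already holds this key.
        for b, bank in enumerate(self.banks):
            i = self._index(key, b)
            if i not in bank or bank[i][0] == key:
                bank[i] = (key, value)
                return
        # All d candidate slots are taken by other keys: conflict eviction.
        self.evictions += 1
        self.banks[0][self._index(key, 0)] = (key, value)

    def lookup(self, key):
        # In hardware all banks are probed in parallel; the key (tag) compare
        # selects the matching bank's value.
        for b, bank in enumerate(self.banks):
            entry = bank.get(self._index(key, b))
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

m = MultiHashMemory(banks=2, slots_per_bank=8)
for k in range(10):
    m.insert(k, 2 * k)
```

Note that keys 8 and 9 collide with keys 0 and 1 in bank 0 but land in bank 1, so no evictions occur; a single direct-mapped table of the same total capacity would have evicted both.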
L. Pouchet, Peng Zhang, P. Sadayappan, J. Cong. "Polyhedral-based data reuse optimization for configurable computing." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, pp. 29-38. DOI: 10.1145/2435264.2435273
Abstract: Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in execution time can be achieved with effective utilization of on-chip (scratchpad) memories, associated with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop-transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching. We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to a bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.
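The kind of reuse transformation the framework automates can be shown on a deliberately tiny case. This is an illustrative hand-written sketch (a 1-D 3-point stencil with a hypothetical block size), not output of the paper's tool: each block of the input, plus a one-element halo, is copied into a scratchpad once, and all three reads per output point are then served on-chip instead of from off-chip memory.

```python
def stencil_with_reuse(a, block):
    """1-D 3-point stencil; each block is fetched once (with halo) into a
    scratchpad list, and all reuses are served from that local copy."""
    n = len(a)
    out = [0] * n
    offchip_reads = 0
    for start in range(0, n, block):
        # One burst transfer per block: the block plus a one-element halo.
        lo, hi = max(0, start - 1), min(n, start + block + 1)
        scratch = a[lo:hi]
        offchip_reads += hi - lo
        # All three reads per point now hit the scratchpad.
        for i in range(start, min(n, start + block)):
            j = i - lo
            left = scratch[j - 1] if i > 0 else 0
            right = scratch[j + 1] if i < n - 1 else 0
            out[i] = left + scratch[j] + right
    return out, offchip_reads

out, reads = stencil_with_reuse(list(range(8)), block=4)
# Naively, each of the 8 points would issue its own 2-3 off-chip reads;
# with blocking, only 10 off-chip element reads occur in total.
```

Choosing the block size is exactly the trade-off the paper optimizes: a larger block means fewer halo re-fetches but a bigger scratchpad, subject to the on-chip memory budget.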
Jingfei Jiang, Rongdong Hu, M. Luján. "Effect of fixed-point arithmetic on deep belief networks (abstract only)." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, p. 273. DOI: 10.1145/2435264.2435331
Abstract: Deep Belief Networks (DBNs) are state-of-the-art learning algorithms built from stacked Restricted Boltzmann Machines (RBMs). DBNs are computationally intensive, raising the question of whether they can be accelerated on FPGAs. Fixed-point arithmetic can have an important influence on both the execution time and the prediction accuracy of a DBN. Previous studies have focused only on customized RBM accelerators with a fixed data-width. Our experiments demonstrate that variable data-widths can obtain similar performance levels, and that the most suitable data-widths for different types of DBN are neither unique nor fixed. From this we conclude that a DBN accelerator should support various data-widths rather than the single fixed width used in previous work. The processing performance of DBN accelerators on FPGAs is almost always constrained not by the capacity of the processing units but by on-chip RAM capacity and speed. We propose an efficient memory subsystem combining junction and padding methods to reduce bandwidth usage in DBN accelerators, which shows that supporting various data-widths is not as difficult as it may sound: the hardware cost is small and does not affect the critical path. We also design a generation tool that lets users flexibly reconfigure the memory subsystem for arbitrary data-widths; the tool can further serve as an advanced IP core generator above the FPGA memory controller, supporting parallel memory access at irregular data-widths for other applications.
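The data-width trade-off the abstract studies is easy to demonstrate: quantizing weights to a signed fixed-point format with fewer fractional bits saves storage and bandwidth at the cost of precision. A minimal sketch, with made-up weight values and Q-formats chosen purely for illustration (not the paper's DBN configurations):

```python
def to_fixed(x, int_bits, frac_bits):
    """Round x to a signed fixed-point value with (int_bits + frac_bits)
    total bits (sign included in int_bits), saturating at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits - 1))
    hi = (1 << (int_bits + frac_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale

weights = [0.1, -0.375, 0.8125, -1.2]
w8  = [to_fixed(w, 2, 6)  for w in weights]   # 8-bit format  (Q2.6)
w16 = [to_fixed(w, 2, 14) for w in weights]   # 16-bit format (Q2.14)

# Worst-case absolute quantization error at each width.
err8  = max(abs(a - b) for a, b in zip(weights, w8))
err16 = max(abs(a - b) for a, b in zip(weights, w16))
```

An accelerator fixed at one width pays either the 16-bit bandwidth everywhere or the 8-bit error everywhere; supporting several widths, as the paper argues, lets each network layer pick the narrowest format that preserves its prediction accuracy.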
Meng Yang, J. Tong, A. Almaini. "Indirect connection aware attraction for FPGA clustering (abstract only)." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, p. 265. DOI: 10.1145/2435264.2435310
Abstract: An indirect-connection-aware attraction clustering algorithm is proposed for the clustered FPGA architecture model to optimize several performance metrics simultaneously. A new cost function considers the attraction of candidate basic logic elements (BLEs) to the selected cluster, the number of pins already used in the cluster, and the critical-path delay. The attractions of BLEs that are both directly and indirectly connected to the selected cluster are taken into account. As a result, more external nets are absorbed into clusters, and fewer pins per cluster and fewer clusters are required; hence a smaller channel width suffices for routing and the speed of the design improves. Detailed performance comparisons are carried out against the state-of-the-art clustering techniques iRAC (interconnect resource aware clustering) and MO-Pack (many-objective clustering). Results show that the proposed algorithm outperforms these two approaches, reducing channel width by 38.8% and 42.2%, respectively, and the number of external nets by 40.1% and 44.8%, with no critical-path or area overhead.
Satoshi Jo, A. M. Gharehbaghi, Takeshi Matsumoto, M. Fujita. "Rectification of advanced microprocessors without changing routing on FPGAs (abstract only)." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, p. 279. DOI: 10.1145/2435264.2435347
Abstract: We propose a method for rectifying bugs in microprocessors implemented on FPGAs by changing only the configuration of LUTs, without any modification to the routing. Correcting a bug therefore does not require resynthesis, which can take very long for complex microprocessors due to possible timing closure problems; and since the structure of the circuit is preserved, the correction does not affect the timing of the circuit. In the design phase, we may add extra LUTs to the original circuit so that they are available in the correction phase. After a bug is found, we perform two tasks. First, we find the candidate control signals as well as the change required to correct their behavior, using symbolic simulation and equivalence checking between the formal specification and the erroneous formal model of the processor. Then, we map the corrected functionality onto the existing LUT structure. This is done by a novel method that formulates the problem as a QBF (Quantified Boolean Formula) problem and solves it by repeatedly applying normal SAT solvers, instead of QBF solvers, under a CEGAR (Counter-Example Guided Abstraction Refinement) paradigm. We show the effectiveness of our method by correcting bugs in two complex out-of-order superscalar processors with two different timing-error recovery mechanisms.
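The second task above, remapping a corrected function onto fixed wiring, has a simple core: the routing decides which k signals feed a LUT, and the question is whether some setting of the LUT's 2^k configuration bits realizes the corrected function of those signals. The paper answers this with QBF/SAT under CEGAR; the exhaustive toy search below (hypothetical, k = 2 only) just illustrates the shape of the problem:

```python
from itertools import product

def remap_lut(k, target):
    """Return LUT configuration bits (tuple of 2**k entries) that implement
    `target` on the LUT's fixed inputs, or None if the wiring cannot express it."""
    for config in product((0, 1), repeat=2 ** k):
        # The input pattern, read as a binary number, addresses the config bit.
        if all(config[int("".join(map(str, ins)), 2)] == target(*ins)
               for ins in product((0, 1), repeat=k)):
            return config
    return None

# Corrected control signal: the fix needs XOR where the buggy LUT computed OR.
cfg = remap_lut(2, lambda a, b: a ^ b)
```

Because the search space is 2^(2^k) configurations per LUT, and candidate fixes span many LUTs, the brute force above does not scale; encoding "exists a configuration that works for all inputs" as a QBF instance and refining with plain SAT solvers, as the paper does, is what makes the approach practical.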
Xinyu Niu, T. Chau, Qiwei Jin, W. Luk, Qiang Liu. "Automating resource optimisation in reconfigurable design (abstract only)." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, p. 275. DOI: 10.1145/2435264.2435338
Abstract: A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce the Configuration Data Flow Graph, a hierarchical graph structure that enables reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filtering, and reverse time migration, are used to evaluate the proposed approach. The run-time solutions approximate the theoretical performance by eliminating idle functions, and are 1.61 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.
J. Fowers, G. Stitt. "Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, pp. 201-210. DOI: 10.1145/2435264.2435300
Abstract: Although high-level synthesis improves FPGA productivity by enabling designers to use high-level code, the resulting performance is often significantly worse than that of register-transfer-level designs. One cause of this limited optimization is that high-level synthesis tools are restricted by multiple possible dependencies arising from the undecidability of alias analysis. In this paper, we introduce the Dynafuse optimization, which analyzes dependencies dynamically to resolve aliases and enable runtime circuit optimizations. To resolve aliases, Dynafuse provides a specialized software data structure that dynamically determines definition-use chains between FPGA functions. In addition, Dynafuse statically creates a reconfigurable overlay network that uses the detected dependencies to dynamically adjust connections between functions and memories in order to fuse pipelines and exploit data locality. Experimental results show that Dynafuse sped up two existing FPGA applications by 1.6-1.8x when exploiting locality and by 3-5x when fusing pipelines. Furthermore, the speedup from pipeline fusion increases linearly with the number of fused functions, which suggests larger applications will see larger improvements.
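The definition-use tracking idea is worth making concrete. A hypothetical minimal model (the names `DepTracker`, `on_write`, `on_read`, and the buffer/function names are illustrative, not Dynafuse's API): a runtime table records the last function to write each buffer, so when another function later reads that buffer, a producer-consumer edge is known exactly, even though static alias analysis could not prove the two accesses touch the same memory. Such edges are what tell an overlay which pipelines are safe to fuse.

```python
class DepTracker:
    """Track definition-use chains between functions at runtime."""

    def __init__(self):
        self.last_writer = {}  # buffer identity -> name of last writing function
        self.def_use = []      # discovered (producer, consumer) edges

    def on_write(self, func, buf):
        # Record this function as the current definition of the buffer.
        self.last_writer[id(buf)] = func

    def on_read(self, func, buf):
        # A read of a buffer some other function defined is a def-use edge.
        w = self.last_writer.get(id(buf))
        if w is not None and w != func:
            self.def_use.append((w, func))

deps = DepTracker()
frame = bytearray(16)
deps.on_write("blur", frame)       # "blur" produces the buffer
deps.on_read("threshold", frame)   # "threshold" consumes it: a fusion candidate
```

Once the edge ("blur", "threshold") is observed, the two stages can be connected directly through the overlay, streaming data producer-to-consumer instead of round-tripping through memory.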
Yuanjie Huang, P. Ienne, O. Temam, Yunji Chen, Chengyong Wu. "Elastic CGRAs." FPGA: ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2013, pp. 171-180. DOI: 10.1145/2435264.2435296
Abstract: Vital technology trends such as voltage scaling and homogeneous multicore scaling have reached their limits, and architects are turning to alternate computing paradigms, such as heterogeneous and domain-specialized solutions. Coarse-Grain Reconfigurable Arrays (CGRAs) promise the performance of massively spatial computing while offering interesting trade-offs between flexibility and energy efficiency. Yet configuring and scheduling execution for CGRAs generally runs into the classic difficulties that have hampered Very Long Instruction Word (VLIW) architectures: efficient schedules are difficult to generate, especially for applications with complex control flow and data structures, and they are inherently static, and thus ill-adapted to variable-latency components (such as the read ports of caches). Over the years, VLIWs have been relegated to important but specific application domains where such issues are more under the designers' control; similarly, statically scheduled CGRAs may prove inadequate for future general-purpose computing systems. In this paper, we introduce Elastic CGRAs, the superscalar processors of computing fabrics: no complex schedule needs to be computed at configuration time, and operations execute dynamically in the CGRA when their data are ready, exploiting the data parallelism an application offers. We designed, down to a manufacturable layout, a simple CGRA in which we demonstrated and optimized our elastic control circuitry. We also built a complete compilation toolchain that transforms arbitrary C code into a configuration for the array. The area overhead (26.2%), critical-path overhead (8.2%), and energy overhead (53.6%) of Elastic CGRAs over non-elastic CGRAs are significantly lower than the overheads of superscalar processors over VLIWs, while providing the same benefits. At such moderate costs, elasticity may prove to be one of the key enablers for widespread adoption of CGRAs.