GraphStep: A System Architecture for Sparse-Graph Algorithms
Michael DeLorimier, Nachiket Kapre, Nikil Mehta, Dominic Rizzo, I. Eslick, Raphael Rubin, Tomás E. Uribe, T. Knight, A. DeHon
2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 24 April 2006. DOI: 10.1109/FCCM.2006.45
Abstract: Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this "memory wall," we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet knowledge base, a sample application, this translates into an order-of-magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
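The spreading-activation model the abstract refers to can be sketched in software. The sketch below is an illustration of one synchronous "graph step" (active nodes fire decayed updates along their edges; destinations accumulate them), not the paper's FPGA implementation; the `decay` and `threshold` parameters and the toy ConceptNet-style graph are assumptions for the example.

```python
# One synchronous spreading-activation step: every sufficiently active
# node propagates a decayed activation to its neighbours, which
# accumulate the incoming contributions. Parameters are illustrative.

def graph_step(adj, activation, decay=0.5, threshold=0.1):
    """Apply one spreading-activation step over adjacency lists."""
    incoming = {v: 0.0 for v in adj}
    for node, level in activation.items():
        if level < threshold:            # only active nodes fire
            continue
        for nbr in adj[node]:
            incoming[nbr] += decay * level   # accumulate at destination
    # merge the new contributions into the existing activation levels
    return {v: max(activation.get(v, 0.0), incoming[v]) for v in adj}

adj = {"dog": ["pet", "bark"], "pet": ["animal"], "bark": [], "animal": []}
act = {"dog": 1.0, "pet": 0.0, "bark": 0.0, "animal": 0.0}
act = graph_step(adj, act)   # activation reaches "pet" and "bark"
act = graph_step(adj, act)   # activation reaches "animal"
```

In the GraphStep architecture each node's state and edge list live in a small embedded memory next to a processing element, so the inner loop above runs in parallel across the graph rather than serially against a main-memory bottleneck.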
Scalable Hardware Architecture for Real-Time Dynamic Programming Applications
B. Matthews, I. Elhanany
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.61
Abstract: This paper introduces a novel architecture for performing the core computations required by dynamic programming (DP) techniques. The latter pertain to a vast range of applications that necessitate an optimal sequence of decisions to be issued. An underlying assumption is that a complete model of the environment is provided, whereby the dynamics are governed by a Markov decision process (MDP). Existing DP implementations have traditionally been realized in software. Here, we present a method for exploiting the data parallelism associated with computing both the value function and optimal action set. An optimal policy is obtained four orders of magnitude faster than traditional software-based schemes, establishing the viability of the approach for real-time applications.
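The value-function computation the abstract describes is the classic value-iteration recurrence for an MDP; the hardware evaluates the max-over-actions update for many states in parallel. A minimal software reference, with a toy two-state MDP as an assumed example:

```python
def value_iteration(P, R, gamma=0.9, eps=1e-6):
    """Solve V(s) = max_a [ R(s,a) + gamma * sum_t P(t|s,a) * V(t) ].

    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
    immediate reward. Iterates until the value function stops changing.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(V, V_new)) < eps:
            return V_new
        V = V_new

# Toy MDP: state 0 can stay (reward 0) or move to the absorbing state 1
# (reward 1); state 1 only stays with reward 0.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [0.0]]
V = value_iteration(P, R)
```

The hardware's advantage comes from evaluating the bracketed expression for all states concurrently per sweep, whereas this software loop visits states one at a time.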
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths
Uday Bondhugula, A. Devulapalli, James Dinan, Joseph A. Fernando, P. Wyckoff, E. Stahlberg, P. Sadayappan
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.48
Abstract: Field-programmable gate arrays (FPGAs) are being employed in high-performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Parallel FPGA-based designs often yield a very high speedup. Applications using these designs on reconfigurable supercomputers involve software on the system managing computation on the FPGA. To extract maximum performance from an FPGA design at the application level, it becomes necessary to minimize associated data-movement costs on the system. We address this hardware/software integration challenge in the context of the all-pairs shortest-paths (APSP) problem in a directed graph. We employ a parallel FPGA-based design using a blocked algorithm to solve large instances of APSP. With appropriate design choices and optimizations, experimental results on the Cray XD1 show that the FPGA-based implementation sustains an application-level speedup of 15 over an optimized CPU-based implementation.
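The blocked APSP algorithm the abstract mentions is the standard tiled Floyd-Warshall: each round processes a pivot tile, then the pivot row and column, then all remaining tiles as min-plus updates, so large instances stream through a fixed-size compute block. A software sketch of that structure (the tile size and example graph are assumptions, not the paper's configuration):

```python
INF = float("inf")

def tile(D, bi, bj, B):
    """Copy out the B x B tile at block row bi, block column bj."""
    return [row[bj*B:(bj+1)*B] for row in D[bi*B:(bi+1)*B]]

def put(D, bi, bj, B, T):
    for i in range(B):
        D[bi*B + i][bj*B:(bj+1)*B] = T[i]

def fw_kernel(C, A, Bm, B):
    """Min-plus update: C[i][j] = min(C[i][j], A[i][k] + Bm[k][j])."""
    for k in range(B):
        for i in range(B):
            for j in range(B):
                d = A[i][k] + Bm[k][j]
                if d < C[i][j]:
                    C[i][j] = d

def blocked_apsp(D, B):
    """In-place blocked Floyd-Warshall on an n x n distance matrix."""
    nb = len(D) // B
    for kb in range(nb):
        P = tile(D, kb, kb, B)              # phase 1: pivot tile
        fw_kernel(P, P, P, B)
        put(D, kb, kb, B, P)
        for jb in range(nb):                # phase 2: pivot row/column
            if jb == kb:
                continue
            C = tile(D, kb, jb, B); fw_kernel(C, P, C, B); put(D, kb, jb, B, C)
            C = tile(D, jb, kb, B); fw_kernel(C, C, P, B); put(D, jb, kb, B, C)
        for ib in range(nb):                # phase 3: remaining tiles
            if ib == kb:
                continue
            A = tile(D, ib, kb, B)
            for jb in range(nb):
                if jb == kb:
                    continue
                C = tile(D, ib, jb, B)
                fw_kernel(C, A, tile(D, kb, jb, B), B)
                put(D, ib, jb, B, C)
    return D

D = [[0, 1, INF, 10],
     [INF, 0, 1, INF],
     [INF, INF, 0, 1],
     [INF, INF, INF, 0]]
blocked_apsp(D, 2)   # shortest path 0 -> 3 becomes 3 via 0 -> 1 -> 2 -> 3
```

On the FPGA the `fw_kernel` corresponds to the fixed hardware core, and the surrounding tile traffic is exactly the host-to-FPGA data movement the paper sets out to minimize.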
Systematic Characterization of Programmable Packet Processing Pipelines
Michael Attig, G. Brebner
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.67
Abstract: This paper considers the elaboration of custom pipelines for network packet processing, built upon flexible programmability of pipeline-stage granularity. A systematic procedure for accurately characterizing the throughput, latency, and FPGA resource requirements of different programmed pipeline variants is presented. This procedure may be exploited at design time, configuration time, or run time, to program pipeline architectures to meet specific networking application requirements. The procedure is illustrated using three case studies drawn from real-life packet processing at different levels of networking protocol. Detailed results are presented, demonstrating that the procedure estimates pipeline characteristics well, thus allowing rapid architecture-space exploration prior to elaboration.
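The kind of characterization the abstract describes can be illustrated with a first-order analytical model: a fully pipelined design clocks at the rate of its slowest stage, latency is the total register depth at that clock, and resources add up. This is an assumed simplification for illustration, not the paper's actual procedure, and the stage names and figures below are invented.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    depth: int        # pipeline registers (cycles of latency)
    fmax_mhz: float   # standalone maximum clock of the stage
    luts: int         # FPGA LUTs consumed

def characterize(stages):
    """First-order pipeline estimate: the clock is limited by the
    slowest stage, latency is total depth at that clock, and resource
    use is the sum over stages (one packet word per cycle assumed)."""
    clk = min(s.fmax_mhz for s in stages)
    depth = sum(s.depth for s in stages)
    return {
        "clock_mhz": clk,
        "latency_ns": depth * 1000.0 / clk,
        "throughput_mwords_s": clk,     # fully pipelined: one per cycle
        "luts": sum(s.luts for s in stages),
    }

# Hypothetical three-stage packet pipeline: parse, lookup, edit.
est = characterize([Stage("parse", 4, 200.0, 900),
                    Stage("lookup", 8, 160.0, 2400),
                    Stage("edit", 5, 180.0, 1100)])
```

A model of this shape is what makes rapid architecture-space exploration possible: variants can be compared arithmetically before any of them is elaborated to hardware.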
A Scalable FPGA-based Multiprocessor
A. Patel, Christopher A. Madill, Manuel Saldaña, C. Comis, R. Pomès, P. Chow
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.17
Abstract: It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive large-scale computing applications, such as molecular dynamics simulations of biological systems, underscore the need for even greater speedups to address relevant length and time scales. In this work, we propose an architecture for a scalable computing machine built entirely from FPGA computing nodes. The machine enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network. Parallelism at multiple levels of granularity within an application can be exploited to obtain the maximum computational throughput. By focusing on applications that exhibit a high computation-to-communication ratio, we narrow the extent of this investigation to the development of a suitable communication infrastructure for our machine, as well as an appropriate programming model and design flow for implementing applications. By providing a simple, abstracted communication interface with the objective of being able to scale to thousands of FPGA nodes, the proposed architecture appears to the programmer as a unified, extensible FPGA fabric. A programming model based on the MPI message-passing standard is also presented as a means for partitioning an application into independent computing tasks that can be implemented on our architecture. Finally, we demonstrate the first use of our design flow by developing a simple molecular dynamics simulation application for the proposed machine, which runs on a small platform of development boards.
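The MPI-style programming model amounts to partitioning the application into ranked tasks that interact only through explicit send/receive operations, so each task can be mapped to a processor or a hardware accelerator without changing the program. A toy software analogue of that model, using threads and queues (the `Comm` class and the ring example are illustrative inventions, not the paper's API):

```python
import threading
import queue

class Comm:
    """Toy MPI-like communicator: blocking send/recv between ranks,
    each rank owning one mailbox queue."""
    def __init__(self, nranks):
        self.boxes = [queue.Queue() for _ in range(nranks)]
    def send(self, data, dest):
        self.boxes[dest].put(data)
    def recv(self, rank):
        return self.boxes[rank].get()

def worker(comm, rank, n, values, out):
    """Ring exchange: send my value right, receive from the left,
    combine. Only message passing couples the ranks."""
    comm.send(values[rank], (rank + 1) % n)
    left_val = comm.recv(rank)
    out[rank] = values[rank] + left_val

comm = Comm(4)
vals = [1, 2, 3, 4]
out = [0] * 4
threads = [threading.Thread(target=worker, args=(comm, r, 4, vals, out))
           for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# out[r] = vals[r] + vals[(r - 1) % 4]
```

Because the tasks share no state beyond the communicator, the same decomposition works whether a rank is an embedded microprocessor or a hardware engine behind the abstracted communication interface.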
Parrotfish: Task Distribution in a Low Cost Autonomous ad hoc Sensor Network through Dynamic Runtime Reconfiguration
D. Efstathiou, Konstantinos Kazakos, A. Dollas
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.56
Abstract: The Parrotfish project is a low-cost, distributed environment for (partial) reconfiguration of distributed field-programmable systems, e.g. sensor networks. In this paper we present architectures and results in which the wireless nodes of a distributed system can undergo runtime task-reversals under triggering from external conditions, using Bluetooth as a low-cost wireless medium. The project gets its name from the small fish found in Florida waters, which can change gender as needed under group dynamics.
Enabling a Uniform Programming Model Across the Software/Hardware Boundary
E. Anderson, J. Agron, W. Peck, Jim Stevens, Fabrice Baijot, E. Komp, R. Sass, D. Andrews
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.40
Abstract: In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single pthreads multithreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hthreads system is unique within the reconfigurable computing community in that it abstracts the CPU/FPGA components into a unified custom threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the hardware thread interface (HWTI) component, which frees the designer from having to specify and embed platform-specific instructions to form customized hardware/software interactions. Instead, the hardware thread interface supports the generalized pthreads API semantics and allows passing of abstract data types between hardware and software threads. Thus the hardware thread interface provides an abstract, platform-independent compilation target that enables thread- and instruction-level parallelism across the software/hardware boundary.
Floating-Point Accumulation Circuit for Matrix Applications
M.R. Bodnar, J. Humphrey, P. Curt, J. Durbano, D. Prather
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.41
Abstract: Many scientific algorithms require floating-point reduction operations, or accumulations, including matrix-vector multiply (MVM), vector dot-products, and the discrete cosine transform (DCT). Because FPGA implementations of each of these algorithms are desirable, it is clear that a high-performance, floating-point accumulation unit is necessary. However, this type of circuit is difficult to design in an FPGA environment due to the deep pipelining of the floating-point arithmetic units, which is needed in order to attain high-performance designs (Durbano et al., 2004; Leeser and Wang, 2004). A deep pipeline requires special handling in feedback circuits because of the long delay, which is further complicated by a continuous input data stream. Accumulator architectures that overcome such performance bottlenecks are described in Zhuo et al. (2005) and Zhuo and Prasanna (2005). This paper presents a floating-point accumulation circuit that is a natural evolution of this work. The system can handle streams of arbitrary length, requires modest area, and can handle interrupted data inputs. In contrast to the designs proposed by Zhuo et al., the proposed architecture maintains buffers for partial-result storage which utilize significantly less embedded memory resources, while maintaining fixed size and speed characteristics regardless of stream length. The results for both single- and double-precision accumulation architectures were verified in a Virtex-II 8000-4 part clocked at more than 150 MHz, and the power of this design was demonstrated in a computationally intense matrix-matrix-multiply application.
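The feedback problem the abstract describes arises because an adder with an L-cycle pipeline cannot add a new input to a running sum every cycle. A common workaround, shown here as a software model (an illustration of the general technique, not the paper's specific circuit), is to keep L interleaved partial sums so consecutive inputs land in different lanes, then reduce the lanes at the end of the stream:

```python
class PipelinedAccumulator:
    """Models accumulation through an adder with an L-cycle latency by
    maintaining L interleaved partial sums fed round-robin, so no lane
    is reused before its previous sum has left the adder pipeline.
    A hardware version keeps the partials in a small on-chip buffer."""
    def __init__(self, latency):
        self.latency = latency
        self.partials = [0.0] * latency
        self.slot = 0
    def push(self, x):
        # Round-robin lane assignment: by the time a lane is revisited,
        # latency cycles have passed and its sum is available again.
        self.partials[self.slot] += x
        self.slot = (self.slot + 1) % self.latency
    def result(self):
        # Final reduction of the lanes (a log-depth adder tree in HW).
        return sum(self.partials)

acc = PipelinedAccumulator(latency=8)
for x in [0.5, 1.25, -0.75, 2.0, 3.5]:
    acc.push(x)
```

Note that such interleaving reorders the floating-point additions, so results can differ from a strictly sequential sum in the last bits; the fixed number of lanes is what gives the architecture its stream-length-independent size.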
Intrinsic Hardware Evolution of Neural Networks in Reconfigurable Analogue and Digital Devices
John Maher, Brian McGinley, P. Rocke, F. Morgan
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.53
Abstract: In this paper, a genetic algorithm (GA) has been developed to evolve a neural network (NN) implementation of a two-input XOR function. This GA is subsequently used to contrast the relative difficulties of implementing the XOR NN on FPGAs and FPAAs, respectively. Two case studies are presented to demonstrate intrinsic evolution of the XOR network on reconfigurable analogue and digital devices. In both cases the GA evolves the synaptic weights and threshold values for an NN implemented on both field-programmable gate array (FPGA) and field-programmable analogue array (FPAA) hardware platforms.
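The GA loop the abstract describes can be sketched in simulation: a population of weight/threshold vectors for a small 2-2-1 network is scored on the four XOR cases and refined by selection and mutation (in the paper the fitness evaluation runs intrinsically on the FPGA/FPAA itself). The network shape, population size, mutation rate, and activation functions below are assumptions for the sketch:

```python
import math
import random

def forward(w, x1, x2):
    """2-2-1 network: tanh hidden units, sigmoid output. w packs
    6 hidden weights/thresholds and 3 output weights/threshold."""
    h1 = math.tanh(w[0]*x1 + w[1]*x2 - w[2])
    h2 = math.tanh(w[3]*x1 + w[4]*x2 - w[5])
    return 1.0 / (1.0 + math.exp(-(w[6]*h1 + w[7]*h2 - w[8])))

XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def error(w):
    """Sum-of-squares error over the four XOR cases (the fitness)."""
    return sum((forward(w, a, b) - t) ** 2 for a, b, t in XOR)

def evolve(pop_size=40, gens=300, sigma=0.4, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=error)
        parents = pop[:pop_size // 4]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            child = rng.choice(parents)[:]
            child[rng.randrange(9)] += rng.gauss(0, sigma)  # mutate one gene
            children.append(child)
        pop = parents + children               # elitism: parents survive
    return min(pop, key=error)

best = evolve()
```

Intrinsic evolution replaces `error` with a measurement of the physical device configured with the candidate weights, which is why the same GA can drive both the digital and the analogue platform.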
DSynth: A Pipeline Synthesis Environment for FPGAs
M. Wirthlin, Welson Sun
FCCM 2006, 24 April 2006. DOI: 10.1109/FCCM.2006.37
Abstract: A synthesis environment called DSynth has been created for synthesizing high-performance pipelined circuits for FPGAs from synchronous data-flow specifications. The goal of this work is to generate the minimum-size circuit that meets the throughput constraint of the data-flow model. To achieve this constraint efficiently, the approach relies heavily upon a library of pre-characterized pipelined circuit modules. In addition, resource sharing is used extensively to reduce the overall hardware cost.