ACM/IEEE SC 2006 Conference (SC'06)最新文献_第3页

Multiple Range Query Optimization with Distributed Cache Indexing 多范围查询优化与分布式缓存索引

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188560

Beomseok Nam, H. Andrade, A. Sussman

引用次数: 19

Designing a Runtime System for Volunteer Computing 志愿计算运行时系统设计

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188586

David P. Anderson, C. Christensen, B. Allen

引用次数: 101

Quantifying the Potential Benefit of Overlapping Communication and Computation in Large-Scale Scientific Applications 量化大规模科学应用中重叠通信和计算的潜在效益

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188585

J. Sancho, K. Barker, D. Kerbyson, K. Davis

{"title":"Quantifying the Potential Benefit of Overlapping Communication and Computation in Large-Scale Scientific Applications","authors":"J. Sancho, K. Barker, D. Kerbyson, K. Davis","doi":"10.1145/1188455.1188585","DOIUrl":"https://doi.org/10.1145/1188455.1188585","url":null,"abstract":"The design and implementation of a high performance communication network are critical factors in determining the performance and cost-effectiveness of a large-scale computing system. The major issues center on the trade-off between the network cost and the impact of latency and bandwidth on application performance. One promising technique for extracting maximum application performance given limited network resources is based on overlapping computation with communication, which partially or entirely hides communication delays. While this approach is not new, there are few studies that quantify the potential benefit of such overlapping for large-scale production scientific codes. We address this with an empirical method combined with a network model to quantify the potential overlap in several codes and examine the possible performance benefit. Our results demonstrate, for the codes examined, that a high potential tolerance to network latency and bandwidth exists because of a high degree of potential overlap. Moreover, our results indicate that there is often no need to use fine-grained communication mechanisms to achieve this benefit, since the major source of potential overlap is found in independent work-computation on which pending messages does not depend. This allows for a potentially significant relaxation of network requirements without a consequent degradation of application performance","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"70 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129654081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 72

Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI 大规模容错MPI的阻塞与非阻塞协调检查点

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188587

Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, F. Cappello

{"title":"Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI","authors":"Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, F. Cappello","doi":"10.1145/1188455.1188587","DOIUrl":"https://doi.org/10.1145/1188455.1188587","url":null,"abstract":"A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault programming environments should be used to guarantee the safe execution of critical applications. Research in fault tolerant MPI has led to the development of several fault tolerant MPI environments. Different approaches are being proposed using a variety of fault tolerant message passing protocols based on coordinated checkpointing or message logging. The most popular approach is with coordinated checkpointing. In the literature, two different concepts of coordinated checkpointing have been proposed: blocking and non-blocking. However they have never been compared quantitatively and their respective scalability remains unknown. The contribution of this paper is to provide the first comparison between these two approaches and a study of their scalability. We have implemented the two approaches within the MPICH environments and evaluate their performance using the NAS parallel benchmarks","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132395281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 163

A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches 大基数交叉开关近乎最优的实时硬件调度程序

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188554

R. Hoare, Zhu Ding, A.K. Jones

{"title":"A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches","authors":"R. Hoare, Zhu Ding, A.K. Jones","doi":"10.1145/1188455.1188554","DOIUrl":"https://doi.org/10.1145/1188455.1188554","url":null,"abstract":"The maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar based interconnection networks. Unfortunately, maximum matching requires O(N3) time for an N times N communication system, which has limited its application to real-time network scheduling. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By taking advantage of the inherent parallelism available in custom hardware design, we introduce three maximum matching implementations in hardware and show how we can trade design complexity for performance. Specifically, we examine a pure logic scheduler with three dimensions of parallelism, a matrix scheduler with two dimensions of parallelism and a vector scheduler with one dimension of parallelism. These designs reduce the algorithmic time complexity down to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N-1 steps, our simulation results show that the scheduler can achieve 99% of the optimal schedule when K=9. We examine hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for various sized crossbars, ranging from 8 times 8 to 256 times 256, can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024 times 1024 the scheduling can be completed in approximately 10 s with current technology and could reach under 90 ns with future technologies","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131859556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference PBPI:贝叶斯系统发育推理的高性能实现

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188535

Xizhou Feng, K. Cameron, D. Buell

引用次数: 33

The Design Space of Data-Parallel Memory Systems 数据并行存储系统的设计空间

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188540

Jung Ho Ahn, M. Erez, W. Dally

{"title":"The Design Space of Data-Parallel Memory Systems","authors":"Jung Ho Ahn, M. Erez, W. Dally","doi":"10.1145/1188455.1188540","DOIUrl":"https://doi.org/10.1145/1188455.1188540","url":null,"abstract":"Data-parallel memory systems must maintain a large number of outstanding memory references to fully use increasing DRAM bandwidth in the presence of rising latencies. Additionally, throughput is increasingly sensitive to the reference patterns due to the rising latency of issuing DRAM commands, switching between reads and writes, and precharging/activating internal DRAM banks. We study the design space of data-parallel memory systems in light of these trends of increasing concurrency, latency, and sensitivity to access patterns. We perform a detailed performance analysis of scientific and multimedia applications and micro-benchmarks, varying DRAM parameters and the memory-system configuration. We identify the interference between concurrent read and write memory-access threads, and bank conflicts, both within a single thread and across multiple threads, as the most critical factors affecting performance. We then develop hardware techniques to minimize throughput degradation. We advocate either relying on multiple concurrent accesses from a single memory-reference thread only, while sacrificing load-balance, or introducing new hardware to maintain both locality of reference and load-balance between multiple DRAM channels with multiple threads. We show that a low-cost configuration with only 16 channel-buffer entries achieves over 80% of peak throughput in most cases","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124080402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 40

100 Years of Digital Data 100年的数字数据

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188531

F. Berman, Robert Chadduck, William G. LeFurgy, D. Atkins, A. Hey

引用次数: 1

CellSs: a Programming Model for the Cell BE Architecture Cell: Cell BE体系结构的编程模型

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188546

Pieter Bellens, Josep M. Pérez, Rosa M. Badia, J. Labarta

引用次数: 352

High-Performance Dynamic Graphics Streaming for Scalable Adaptive Graphics Environment 用于可伸缩自适应图形环境的高性能动态图形流

ACM/IEEE SC 2006 Conference (SC'06) Pub Date : 2006-11-11 DOI: 10.1145/1188455.1188568

Byungil Jeong, L. Renambot, R. Jagodic, Rajvikram Singh, Julieta Aguilera, Andrew E. Johnson, J. Leigh

引用次数: 158