Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07): Latest Papers, Page 6

Performance adaptive power-aware reconfigurable optical interconnects for high-performance computing (HPC) systems

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362631

Avinash Karanth Kodi, A. Louri

Abstract: As communication distances and bit rates increase, optoelectronic interconnects are being deployed to build high-bandwidth, low-latency interconnection networks for high-performance computing (HPC) systems. While bandwidth scaling with efficient multiplexing techniques (wavelength, time, and space) is available, static assignment of wavelengths can be detrimental to network performance for non-uniform (adversarial) workloads. Dynamic bandwidth re-allocation based on the actual traffic pattern can improve network performance by utilizing idle resources. While dynamic bandwidth re-allocation (DBR) techniques can alleviate interconnection bottlenecks, power consumption also increases considerably. In this paper, we propose to improve the performance of optical interconnects using DBR techniques and simultaneously optimize power consumption using Dynamic Power Management (DPM) techniques. DBR re-allocates idle channels to busy channels (wavelengths) to improve throughput, and DPM regulates the bit rates and supply voltages of the individual channels. A reconfigurable opto-electronic architecture and a performance-adaptive algorithm for implementing DBR and DPM are proposed in this paper. Our proposed reconfiguration algorithm achieves a significant reduction in power consumption and a considerable improvement in throughput with a marginal increase in latency for various traffic patterns.
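The abstract describes DBR only at a high level: idle channels are re-allocated to busy ones. The Python sketch below illustrates that general idea and is not the authors' algorithm. The function name, the per-link utilization inputs, and the `low`/`high` thresholds are all hypothetical choices made for illustration.

```python
from typing import Dict


def reallocate_wavelengths(
    alloc: Dict[str, int],
    util: Dict[str, float],
    low: float = 0.2,   # hypothetical "idle" threshold
    high: float = 0.8,  # hypothetical "busy" threshold
) -> Dict[str, int]:
    """Move one wavelength from each idle link to a busy link.

    alloc maps a link name to the number of wavelengths it currently holds;
    util maps the same link names to a utilization in [0, 1].
    The total number of wavelengths is preserved.
    """
    new_alloc = dict(alloc)
    # Idle links can donate, but each keeps at least one wavelength.
    donors = [c for c in alloc if util[c] < low and alloc[c] > 1]
    # Busy links receive, most heavily loaded first.
    busy = sorted(
        (c for c in alloc if util[c] > high),
        key=lambda c: util[c],
        reverse=True,
    )
    if not busy:
        return new_alloc
    # Spread the donated wavelengths over the busy links round-robin.
    for i, donor in enumerate(donors):
        receiver = busy[i % len(busy)]
        new_alloc[donor] -= 1
        new_alloc[receiver] += 1
    return new_alloc
```

For example, with `alloc = {"n0": 4, "n1": 4, "n2": 4}` and `util = {"n0": 0.05, "n1": 0.95, "n2": 0.5}`, the sketch moves one wavelength from `n0` to `n1` and leaves `n2` unchanged. A DPM step, which the abstract says adjusts bit rates and supply voltages, is deliberately left out because the abstract gives no detail on its policy.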

Citations: 10

DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-10 DOI: 10.1145/1362622.1362643

Qi Gao, Feng Qin, D. Panda

Abstract: While software reliability in large-scale systems becomes increasingly important, debugging in large-scale parallel systems remains a daunting task. This paper proposes an innovative technique that automatically finds hard-to-detect software bugs, which can cause severe problems such as data corruption and deadlocks in parallel programs, by detecting their abnormal behavior in data movements. Based on the observation that data movements in parallel programs typically follow certain patterns, our idea is to extract data movement (DM)-based invariants at program runtime and check for violations of these invariants. These violations indicate potential bugs, such as data races and memory corruption bugs, that manifest themselves in data movements. We have built a tool, called DMTracker, based on this idea: it automatically extracts DM-based invariants and detects violations of them. Our experiments with two real-world bug cases in MVAPICH/MVAPICH2, a popular MPI library, have shown that DMTracker can effectively detect them and report abnormal data movements to help programmers quickly diagnose the root causes of bugs. In addition, DMTracker incurs very low runtime overhead, from 0.9% to 6.0%, in our experiments with High Performance Linpack (HPL) and NAS Parallel Benchmarks (NPB), which indicates that DMTracker can be deployed in production runs.

Citations: 58

Application development on hybrid systems

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-10 DOI: 10.1145/1362622.1362690

R. Chamberlain, M. Franklin, Eric J. Tyson, J. Buhler, S. Gayen, P. Crowley, J. Buckley

Citations: 19

Multi-level tiling: M for the price of one

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-10 DOI: 10.1145/1362622.1362691

DaeGon Kim, Lakshminarayanan Renganarayanan, D. Rostron, S. Rajopadhye, M. Strout

Citations: 83

Low-constant parallel algorithms for finite element simulations using linear octrees

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-10 DOI: 10.1145/1362622.1362656

H. Sundar, R. Sampath, Santi S. Adavani, C. Davatzikos, G. Biros

Abstract: In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance constraint enforcement; Golomb-Rice encoding for compression, representing the octree and element connectivity as a Uniquely Decodable Code (UDC); overlapping communication and computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it using a direct-indexing regular grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors on a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction time is under a minute, in contrast to previous implementations that required several minutes; the evaluation of the discretization of a variable-coefficient Laplacian takes only a few seconds.
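The abstract names Golomb-Rice encoding as its compression scheme but gives no parameters or data layout. As a minimal sketch of the general technique, and not the authors' implementation, the Python below encodes a non-negative integer with Rice parameter `k`: the quotient `n >> k` is written in unary, followed by the low `k` bits in binary. The function names and the use of a bit string (rather than packed bytes) are assumptions made for readability.

```python
from typing import Tuple


def rice_encode(n: int, k: int) -> str:
    """Return the Rice codeword for n (parameter k) as a string of '0'/'1'."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                  # quotient, written in unary
    r = n & ((1 << k) - 1)      # remainder, written in k binary digits
    unary = "1" * q + "0"       # q ones terminated by a zero
    binary = format(r, "0{}b".format(k)) if k > 0 else ""
    return unary + binary


def rice_decode(bits: str, k: int) -> Tuple[int, int]:
    """Decode one codeword from the start of bits.

    Returns (value, number_of_bits_consumed) so that a caller can
    decode a concatenated stream codeword by codeword.
    """
    q = 0
    while bits[q] == "1":       # count the unary prefix
        q += 1
    pos = q + 1                 # skip the terminating zero
    r = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (q << k) | r, pos + k
```

Because every codeword is self-delimiting, codewords can be concatenated and decoded unambiguously, which is the uniquely decodable property the abstract refers to. For instance, `rice_encode(9, 2)` returns `"11001"`, and `rice_decode("11001", 2)` returns `(9, 5)`.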

Citations: 65

A job scheduling framework for large computing farms

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-10 DOI: 10.1145/1362622.1362695

R. Baraglia, Gabriele Capannini, Patrizio Dazzi, Giancarlo Pagano

Citations: 45

Optimization of sparse matrix-vector multiplication on emerging multicore platforms

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-07-31 DOI: 10.1145/1362622.1362674

Samuel Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, J. Demmel

Abstract: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
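For readers unfamiliar with the kernel the abstract studies, the sketch below shows an unoptimized SpMV, `y = A x`, in Python. It assumes compressed sparse row (CSR) storage, a common baseline format that the abstract itself does not name, and it implements none of the paper's multicore optimizations. The function and parameter names are illustrative.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr has one entry per row plus one; the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], with columns given by col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x: each nonzero costs a load of the
            # value, a load of the column index, and a gather from x.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

Each nonzero contributes only one multiply and one add but requires several memory loads, which is why the abstract characterizes this class of algorithm as memory-bound. As a small check, the matrix `[[4, 0, 1], [0, 2, 0], [3, 0, 5]]` is stored as `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 1, 0, 2]`, `values = [4, 1, 2, 3, 5]`; multiplying by `x = [1, 2, 3]` gives `[7.0, 4.0, 18.0]`.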

Citations: 831

Programming bits and atoms

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2001-12-03 DOI: 10.1145/1362622.1362624

N. Gershenfeld

Citations: 3