Proceedings of the IEEE/ACM SC95 Conference: Latest Papers, Page 3

Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224306

I. Banicescu, S. F. Hummel

Abstract: Although N-body simulation algorithms are amenable to parallelization, performance gains from execution on parallel machines are difficult to obtain due to load imbalances caused by irregular distributions of bodies. In general, there is a tension between balancing processor loads and maintaining locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Fractiling is based on a probabilistic analysis and thus accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, fractiling improved the performance of N-body simulation codes by as much as 53%. Performance improvements were obtained on uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system-induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.
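The abstract does not state how fractiling sizes the work it hands out. As a rough illustration of dynamic scheduling with decreasing chunk sizes, the sketch below assumes a factoring-style rule: each batch distributes about half of the remaining work, split evenly across the processors. That rule, the function name `factoring_chunks`, and its parameters are assumptions made for illustration, not the authors' stated method, and the sketch ignores the locality-preserving layout that the abstract attributes to fractiling.

```python
import math


def factoring_chunks(n_iterations, n_procs):
    """Return the sequence of chunk sizes handed out to idle processors.

    Assumed rule (hypothetical, for illustration): every batch consists of
    n_procs chunks of equal size, and a batch covers roughly half of the
    iterations still unassigned when the batch starts.
    """
    chunks = []
    remaining = n_iterations
    while remaining > 0:
        # Half of the remaining work, divided evenly among the processors.
        size = max(1, math.ceil(remaining / (2 * n_procs)))
        for _ in range(n_procs):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


if __name__ == "__main__":
    print(factoring_chunks(100, 4))
```

Large early chunks keep scheduling overhead low, while the small late chunks let processors that finish early absorb leftover work, which is the general motivation for decreasing-chunk schemes.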

Citations: 77

A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224302

John Plevyak, V. Karamcheti, Xingbin Zhang, A. Chien

Abstract: While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution, and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.

Citations: 44

Server-Directed Collective I/O in Panda

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224371

K. Seamons, Ying Chen, P. Jones, J. Jozwiak, M. Winslett

Citations: 247

Predicting Application Behavior in Large Scale Shared-Memory Multiprocessors

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224356

Karim Harzallah, K. Sevcik

Citations: 5

Distributed Information Management in the National HPCC Software Exchange

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224211

S. Browne, J. Dongarra, G. Fox, K. Hawick, K. Kennedy, R. Stevens, R. Olson, T. Rowan

Citations: 0

The Living Textbook and the K-12 Classroom of the Future

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224196

Kim Mills, Geoffrey C. Fox, P. Coddington, Barbara Mihalas, M. Podgorny, Barbara Shelly, Steven Bossert

Citations: 15

Distributing a Chemical Process Optimization Application Over a Gigabit Network

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224310

R. Clay, P. Steenkiste

Citations: 6

Surface Fitting Using GCV Smoothing Splines on Supercomputers

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224192

Alan Williams, K. Burrage

Citations: 10

Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224275

S. Lumetta, A. Krishnamurthy, D. Culler

Abstract: We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
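The local phase named in the abstract, a classic depth-first search over the subgraph held by one processor, can be sketched as follows. This is a minimal single-process illustration only: the function name `connected_components`, the adjacency-dictionary input, and the integer labels are assumptions made here, and the global Shiloach-Vishkin phase that merges components across processors is not shown.

```python
def connected_components(adjacency):
    """Label each vertex of an undirected graph with a component id.

    `adjacency` maps every vertex to an iterable of its neighbours
    (hypothetical input format chosen for this sketch). An explicit
    stack is used instead of recursion so that large components do
    not exhaust the call stack.
    """
    labels = {}
    next_label = 0
    for start in adjacency:
        if start in labels:
            continue  # already reached from an earlier start vertex
        labels[start] = next_label
        stack = [start]
        while stack:
            vertex = stack.pop()
            for neighbour in adjacency.get(vertex, ()):
                if neighbour not in labels:
                    labels[neighbour] = next_label
                    stack.append(neighbour)
        next_label += 1
    return labels


if __name__ == "__main__":
    graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
    print(connected_components(graph))
```

In a distributed setting, each processor would run a search like this on its own vertices, and the resulting local labels would then be reconciled along the edges that cross processor boundaries.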

Citations: 17

Large Eddy Simulation of a Spatially-Developing Boundary Layer

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224408

Xiaohua Wu, K. Squires, T. Lund

Abstract: A method for generation of a three-dimensional, time-dependent turbulent inflow condition for simulation of spatially-developing boundary layers is described. Assuming self-preservation of the boundary layer, a quasi-homogeneous coordinate is defined along which streamwise inhomogeneity is minimized (Spalart 1988). Using this quasi-homogeneous coordinate and decomposition of the velocity into a mean and periodic part, the velocity field at a location near the exit boundary of the computational domain is re-introduced at the inflow boundary at each time step. The method was tested using large eddy simulations of a flat-plate boundary layer for momentum thickness Reynolds numbers ranging from 1470 to 1700. Subgrid scale stresses were modeled using the dynamic eddy viscosity model of Germano et al. (1991). Simulation results demonstrate that the essential features of spatially-developing turbulent boundary layers are reproduced using the present approach without the need for a prolonged and computationally expensive laminar-turbulent transition region. Boundary layer properties such as skin friction and shape factor as well as mean velocity profiles and turbulence intensities are in good agreement with experimental measurements and results from direct numerical simulation. Application of the method for calculation of spatially-developing complex turbulent boundary layers is also described.

Citations: 11