Proceedings of the IEEE/ACM SC95 Conference: Latest Papers

Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224306
I. Banicescu, S. F. Hummel
Abstract: Although N-body simulation algorithms are amenable to parallelization, performance gains from execution on parallel machines are difficult to obtain due to load imbalances caused by irregular distributions of bodies. In general, there is a tension between balancing processor loads and maintaining locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Fractiling is based on a probabilistic analysis, and thus accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, performance of N-body simulation codes was improved by as much as 53% by fractiling. Performance improvements were obtained on uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system-induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.
Citations: 77
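The abstract above does not spell out how work is handed to processors. As a rough illustration of dynamic scheduling with shrinking chunks, the sketch below uses a factoring-style rule: each batch gives every processor an equal share of half the remaining work. This rule, the function name `factoring_chunks`, and its parameters are assumptions made for illustration. The sketch does not model the locality-preserving, fractal-ordered tiling that the abstract attributes to fractiling.

```python
import math


def factoring_chunks(total_items, num_procs):
    """Return chunk sizes for a factoring-style dynamic schedule.

    Assumed rule (not taken from the abstract): each batch consists of up to
    `num_procs` chunks, each of size ceil(remaining / (2 * num_procs)), so
    every batch covers roughly half of the work that is still unassigned.
    """
    chunks = []
    remaining = total_items
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * num_procs)))
        for _ in range(num_procs):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


# Hypothetical workload: 100 work items shared by 4 processors.
schedule = factoring_chunks(100, 4)
```

Large early chunks keep scheduling overhead low, while the small late chunks let lightly loaded processors absorb whatever imbalance remains near the end of the run.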
A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224302
John Plevyak, V. Karamcheti, Xingbin Zhang, A. Chien
Abstract: While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
Citations: 44
Server-Directed Collective I/O in Panda
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224371
K. Seamons, Ying Chen, P. Jones, J. Jozwiak, M. Winslett
Abstract: We present the architecture and implementation results for Panda 2.0, a library for input and output of multidimensional arrays on parallel and sequential platforms. Panda achieves remarkable performance levels on the IBM SP2, showing excellent scalability as data size increases and as the number of nodes increases, and provides throughputs close to the full capacity of the AIX file system on the SP2 we used. We argue that this good performance can be traced to Panda's use of server-directed i/o (a logical-level version of disk-directed i/o [Kotz94b]) to perform array i/o using sequential disk reads and writes, a very high level interface for collective i/o requests, and built-in facilities for arbitrary rearrangements of arrays during i/o. Other advantages of Panda's approach are ease of use, easy application portability, and a reliance on commodity system software.
Citations: 247
Predicting Application Behavior in Large Scale Shared-Memory Multiprocessors
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224356
Karim Harzallah, K. Sevcik
Abstract: In this paper we present an analytical framework for parallel program performance prediction. The main thrust of this work is to provide a means for treating realistic applications within a single unified framework. Our approach is based upon the specification of a set of non-linear equations which describe the application, processor configuration, network and memory operations. These equations are solved iteratively since the application execution rate depends on the communication latencies. The iterative solution technique is found to be efficient as it typically requires only a few iterations to reach convergence. Our modeling methodology achieves a good balance between abstraction and accuracy. This is attained by accounting for both time and space dimensions of memory references, while maintaining a simple description of the workload. We demonstrate both the practicality and the accuracy of our approach by comparing predicted results with measurements taken on a commercial multiprocessor system. We found the model to be faithful in reflecting changes in processor speed, and changes in the number and placement of allocated processors.
Citations: 5
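The circular dependency mentioned in the abstract above (execution rate depends on latency, and latency depends on the traffic that the execution rate generates) is the kind of problem a fixed-point iteration handles. The sketch below is a toy model only: the equations, the function name `predict`, and every parameter value are assumptions chosen for illustration, not the equations of the paper.

```python
def predict(procs, cpu_time=1.0, miss_rate=0.02, base_latency=20.0,
            service_time=4.0, tol=1e-9, max_iter=100):
    """Solve a toy latency/execution-rate model by fixed-point iteration.

    Hypothetical model (not from the paper):
      time per instruction = cpu_time + miss_rate * latency
      network utilization  = procs * (miss_rate / time per instruction) * service_time
      latency              = base_latency + service_time * u / (1 - u)
    Returns (latency, time_per_instruction, iterations_used).
    """
    latency = base_latency
    for iteration in range(1, max_iter + 1):
        time_per_instr = cpu_time + miss_rate * latency
        utilization = procs * (miss_rate / time_per_instr) * service_time
        if utilization >= 1.0:
            raise ValueError("network saturated in this toy model")
        new_latency = base_latency + service_time * utilization / (1.0 - utilization)
        if abs(new_latency - latency) < tol:
            return new_latency, time_per_instr, iteration
        latency = new_latency
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical configuration: 8 processors with the default toy parameters.
latency, time_per_instr, iterations = predict(procs=8)
```

With these assumed parameters each step shrinks the error by roughly an order of magnitude, so the loop settles quickly. A heavily loaded configuration could oscillate or saturate, in which case a damped update would be the usual remedy.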
Distributed Information Management in the National HPCC Software Exchange
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224211
S. Browne, J. Dongarra, G. Fox, K. Hawick, K. Kennedy, R. Stevens, R. Olson, T. Rowan
Abstract: The National HPCC Software Exchange is a collaborative effort by member institutions of the Center for Research on Parallel Computation to provide network access to HPCC-related software, documents, and data. Challenges for the NHSE include identifying, organizing, filtering, and indexing the rapidly growing wealth of relevant information available on the Web. The large quantity of information necessitates performing these tasks using automatic techniques, many of which make use of parallel and distributed computation, but human intervention is needed for intelligent abstracting, analysis, and critical review tasks. Thus, major goals of NHSE research are to find the right mix of manual and automated techniques, and to leverage the results of manual efforts to the maximum extent possible. This paper describes our current information gathering and processing techniques, as well as our future plans for integrating the manual and automated approaches. The NHSE home page is accessible at http://www.netlib.org/nhse/.
Citations: 0
The Living Textbook and the K-12 Classroom of the Future
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224196
Kim Mills, Geoffrey C. Fox, P. Coddington, Barbara Mihalas, M. Podgorny, Barbara Shelly, Steven Bossert
Abstract: The Living Textbook creates a unique learning environment enabling teachers and students to use educational resources on multimedia information servers, supercomputers, parallel databases, and network testbeds. We have three innovative educational software applications running in our laboratory, and under test in the classroom. Our education-focused goal is to learn how new, learner-driven, explorative models of learning can be supported by these high bandwidth, interactive applications and ultimately how they will impact the classroom of the future.
Citations: 15
Distributing a Chemical Process Optimization Application Over a Gigabit Network
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224310
R. Clay, P. Steenkiste
Abstract: We evaluate the impact of a gigabit network on the implementation of a distributed chemical process optimization application. The optimization problem is formulated as a stochastic Linear Assignment Problem and was solved using the Thinking Machines CM-2 (SIMD) and the Cray C-90 (vector) computers at PSC, and the Intel iWarp (MIMD) system at CMU, connected by the Gigabit Nectar testbed. We report our experience distributing the application across this heterogeneous set of systems and present measurements that show how the communication requirements of the application depend on the structure of the application. We use detailed traces to build an application performance model that can be used to estimate the elapsed time of the application for different computer system and network combinations. Our results show that the application benefits from the high-speed network, and that the need for high network throughput is increasing as computer systems get faster. We also observed that supporting high burst rates is critical, although structuring the application so that communication is overlapped with computation relaxes the bandwidth requirements.
Citations: 6
Surface Fitting Using GCV Smoothing Splines on Supercomputers
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224192
Alan Williams, K. Burrage
Abstract: The task of fitting smoothing spline surfaces to meteorological data such as temperature or rainfall observations is computationally intensive. The Generalised Cross Validation (GCV) smoothing algorithm is O(n³) computationally, and memory requirements are O(n²). Fitting a spline to a moderately sized data set of, for example, 1080 observations and calculating an output surface grid of dimension 220 × 220 involves approximately 5 billion floating point operations, and takes approximately 19 minutes of execution time on a Sun SPARC2 workstation. Since fitting a surface to data collected from the whole of Australia could conceivably involve data sets with approximately 10000 points, and because it is desirable to be able to fit surfaces of at least 1000 data points in 1 to 5 seconds for use in interactive visualisations, it is crucial to be able to take advantage of supercomputing resources. This paper describes the adaptation of the surface fitting program to different supercomputing platforms, and the results achieved.
Citations: 10
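The figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below derives the sustained floating point rate those figures imply and extrapolates to the 10000-point case. Two things are assumptions rather than statements from the abstract: that cost scales purely as n³ from the quoted data point (the quoted operation count also covers evaluating the 220 × 220 output grid), and that the O(n²) storage is one dense n × n matrix of 8-byte values.

```python
# Figures quoted in the abstract.
n_obs = 1080        # observations in the example data set
total_flops = 5e9   # approximate floating point operations
runtime_s = 19 * 60 # approximate runtime on the workstation, in seconds

# Size of the cubic term for the quoted data set.
cubic_term = n_obs ** 3

# Sustained rate implied by the quoted operation count and runtime.
sustained_mflops = total_flops / runtime_s / 1e6


def scaled_runtime_s(n_new):
    """Runtime estimate, assuming cost scales purely as n^3 (an assumption)."""
    return runtime_s * (n_new / n_obs) ** 3


def dense_matrix_bytes(n, bytes_per_value=8):
    """Storage for one dense n-by-n matrix of 8-byte values (an assumption)."""
    return n * n * bytes_per_value


runtime_10k_days = scaled_runtime_s(10_000) / 86_400
memory_10k_bytes = dense_matrix_bytes(10_000)
```

Under these assumptions the quoted workstation run sustains only a few MFLOPS, and a 10000-point fit would take on the order of days on the same machine, which is consistent with the abstract's case for supercomputing resources.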
Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224275
S. Lumetta, A. Krishnamurthy, D. Culler
Abstract: We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256 processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
Citations: 17
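The two-phase structure described in the abstract above (a local search inside each processor's subgraph, followed by a global merge across subgraphs) can be sketched sequentially. The sketch below is illustrative only: it runs in a single process, the function names and the `partitions` layout are assumptions, and the global phase uses a plain union-find in place of the Shiloach-Vishkin variant named in the abstract.

```python
from collections import defaultdict


def local_components(nodes, edges):
    """Local phase: label one partition's components with an iterative DFS."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in label:
                    label[v] = start
                    stack.append(v)
    return label


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_components(partitions, edges):
    """Label every vertex with a representative of its global component."""
    owner = {v: i for i, part in enumerate(partitions) for v in part}
    local_edges = [[] for _ in partitions]
    cut_edges = []  # edges whose endpoints live in different partitions
    for u, v in edges:
        if owner[u] == owner[v]:
            local_edges[owner[u]].append((u, v))
        else:
            cut_edges.append((u, v))

    # Local phase: each partition is handled independently.
    label = {}
    for part, part_edges in zip(partitions, local_edges):
        label.update(local_components(part, part_edges))

    # Global phase: merge local components joined by a cut edge
    # (union-find stands in for the Shiloach-Vishkin variant).
    parent = {root: root for root in set(label.values())}
    for u, v in cut_edges:
        ru, rv = find(parent, label[u]), find(parent, label[v])
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return {v: find(parent, label[v]) for v in label}


# Hypothetical input: six vertices split across two "processors".
labels = connected_components([[0, 1, 2], [3, 4, 5]], [(0, 1), (3, 4), (1, 3)])
```

Only the cut edges take part in the global phase, which fits the abstract's observation that surface-to-volume ratio is one of the factors that dominate performance.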
Large Eddy Simulation of a Spatially-Developing Boundary Layer
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224408
Xiaohua Wu, K. Squires, T. Lund
Abstract: A method for generation of a three-dimensional, time-dependent turbulent inflow condition for simulation of spatially-developing boundary layers is described. Assuming self-preservation of the boundary layer, a quasi-homogeneous coordinate is defined along which streamwise inhomogeneity is minimized (Spalart 1988). Using this quasi-homogeneous coordinate and decomposition of the velocity into a mean and periodic part, the velocity field at a location near the exit boundary of the computational domain is re-introduced at the inflow boundary at each time step. The method was tested using large eddy simulations of a flat-plate boundary layer for momentum thickness Reynolds numbers ranging from 1470 to 1700. Subgrid scale stresses were modeled using the dynamic eddy viscosity model of Germano et al. (1991). Simulation results demonstrate that the essential features of spatially-developing turbulent boundary layers are reproduced using the present approach without the need for a prolonged and computationally expensive laminar-turbulent transition region. Boundary layer properties such as skin friction and shape factor as well as mean velocity profiles and turbulence intensities are in good agreement with experimental measurements and results from direct numerical simulation. Application of the method for calculation of spatially-developing complex turbulent boundary layers is also described.
Citations: 11