Proceedings of the IEEE/ACM SC95 Conference: Latest Publications

Interprocedural Compilation of Irregular Applications for Distributed Memory Machines
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224336
G. Agrawal, J. Saltz
{"title":"Interprocedural Compilation of Irregular Applications for Distributed Memory Machines","authors":"G. Agrawal, J. Saltz","doi":"10.1145/224170.224336","DOIUrl":"https://doi.org/10.1145/224170.224336","url":null,"abstract":"Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routine and collective communication routines inserted for managing communication in such codes. We also present two new interprocedural optimizations, placement of scatter routines and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115689410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
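The runtime preprocessing that the IPRE pass places and hoists follows the inspector/executor pattern. Below is a rough Python sketch of that pattern, not the Fortran D/CHAOS routines: the function names (build_gather_schedule, executor_step) are invented, and communication is simulated by copying from a global array. The point is that the inspector runs once and its schedule is reused across time steps and call sites.

```python
# Minimal inspector/executor sketch. The inspector is built once (hoisted out of
# the time-step loop) and reused, which is the kind of redundancy the paper's
# interprocedural placement eliminates. Names here are hypothetical.
import numpy as np

def build_gather_schedule(index_array, my_lo, my_hi):
    """Inspector: collect the off-process global indices this process must fetch."""
    return np.unique(index_array[(index_array < my_lo) | (index_array >= my_hi)])

def executor_step(x_global, index_array, schedule, my_lo, my_hi):
    """Executor: 'gather' scheduled elements, then run the irregular loop.

    The gather is simulated with a copy from a global array; on a distributed
    memory machine it would be a communication routine.
    """
    ghost = {g: x_global[g] for g in schedule}          # simulated communication
    y = np.zeros(my_hi - my_lo)
    for i, g in enumerate(index_array):                 # irregular access y[i] += x[ia[i]]
        y[i] += ghost[g] if g in ghost else x_global[g]
    return y

x = np.arange(100, dtype=float)
ia = np.random.default_rng(0).integers(0, 100, size=25)
sched = build_gather_schedule(ia, my_lo=0, my_hi=25)    # inspector runs once
for _ in range(3):                                      # schedule reused each step
    y = executor_step(x, ia, sched, my_lo=0, my_hi=25)
```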
Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224471
Anshul Gupta, Vipin Kumar
{"title":"Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems","authors":"Anshul Gupta, Vipin Kumar","doi":"10.1145/224170.224471","DOIUrl":"https://doi.org/10.1145/224170.224471","url":null,"abstract":"A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of parallel complexity and scalability of the best of these algorithms and the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers are quite unscalable due to a high communication to computation ratio. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126782265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
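For context on where the parallelism in a sparse triangular solve comes from, here is a minimal level-scheduling sketch in Python: rows within one level depend only on rows in earlier levels and could be solved concurrently. This serial code only illustrates that structure under an assumed row-list sparse format; it is not the paper's algorithm or its Cray T3D implementation.

```python
# Level-scheduled forward substitution for a sparse lower-triangular system Lx = b.
# rows[i] is a list of (j, L_ij) pairs with j <= i.
import numpy as np

def level_schedule(rows):
    """Assign each row to a level; rows in the same level are mutually independent."""
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [level[j] + 1 for j, _ in rows[i] if j != i]
        level[i] = max(deps, default=0)
    buckets = [[] for _ in range(max(level) + 1)]
    for i in range(n):
        buckets[level[i]].append(i)
    return buckets

def forward_substitution(rows, b):
    x = np.zeros(len(b))
    for lev in level_schedule(rows):
        for i in lev:                       # rows in one level could run in parallel
            s, diag = 0.0, None
            for j, lij in rows[i]:
                if j == i:
                    diag = lij
                else:
                    s += lij * x[j]
            x[i] = (b[i] - s) / diag
    return x

# 4x4 example: L = [[2,0,0,0],[1,2,0,0],[0,0,3,0],[1,0,1,4]]
rows = [[(0, 2.0)], [(0, 1.0), (1, 2.0)], [(2, 3.0)], [(0, 1.0), (2, 1.0), (3, 4.0)]]
b = np.array([2.0, 4.0, 3.0, 9.0])
print(forward_substitution(rows, b))        # expect [1.0, 1.5, 1.0, 1.75]
```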
Relative Debugging and its Application to the Development of Large Numerical Models
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224350
D. Abramson, Ian T Foster, J. Michalakes, R. Sosič
{"title":"Relative Debugging and its Application to the Development of Large Numerical Models","authors":"D. Abramson, Ian T Foster, J. Michalakes, R. Sosič","doi":"10.1145/224170.224350","DOIUrl":"https://doi.org/10.1145/224170.224350","url":null,"abstract":"Because large scientific codes are rarely static objects, developers are often faced with the tedious task of accounting for discrepancies between new and old versions. In this paper, we describe a new technique called relative debugging that addresses this problem by automating the process of comparing a modified code against a correct reference code. We examine the utility of the relative debugging technique by applying a relative debugger called Guard to a range of debugging problems in a large atmospheric circulation model. Our experience confirms the effectiveness of the approach. Using Guard, we are able to validate a new sequential version of the atmospheric model, and to identify the source of a significant discrepancy in a parallel version in a short period of time.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116227369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
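The underlying idea can be illustrated with a toy harness that runs a reference and a modified version of the same computation side by side and reports the first divergence beyond a tolerance. The functions, tolerance, and injected discrepancy below are invented for the example and do not reflect Guard's assertion language or debugger integration.

```python
# Toy "relative debugging" comparison: step both versions and flag divergence.
import numpy as np

def reference_step(u, dt=0.1):
    return u + dt * np.gradient(u)           # "correct" reference code

def modified_step(u, dt=0.1):
    g = np.gradient(u)
    g[0] = 0.0                                # a deliberately introduced discrepancy
    return u + dt * g

def relative_debug(ref_fn, mod_fn, u0, steps, tol=1e-12):
    u_ref, u_mod = u0.copy(), u0.copy()
    for t in range(steps):
        u_ref, u_mod = ref_fn(u_ref), mod_fn(u_mod)
        err = np.max(np.abs(u_ref - u_mod))
        if err > tol:
            bad = int(np.argmax(np.abs(u_ref - u_mod)))
            print(f"step {t}: max |ref - mod| = {err:.3e} at index {bad}")
            return
    print("no divergence detected")

relative_debug(reference_step, modified_step, np.linspace(0.0, 1.0, 16), steps=5)
```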
Quantum Chromodynamics Simulation on NWT
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224403
M. Yoshida, A. Nakamura, M. Fukuda, Takashi Nakamura, S. Hioki
{"title":"Quantum Chromodynamics Simulation on NWT","authors":"M. Yoshida, A. Nakamura, M. Fukuda, Takashi Nakamura, S. Hioki","doi":"10.1145/224170.224403","DOIUrl":"https://doi.org/10.1145/224170.224403","url":null,"abstract":"A portable QCD simulation program running on NWT with 128 PE's shows the performance 0.032 micro sec/link update, which is about 189 times faster than a highly optimized code on CRAY X-MP/48 with four processors. The performance corresponds to 178 GFLOPS sustained speed.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128258047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
High-Performance Incremental Scheduling on Massively Parallel Computers — A Global Approach
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224358
Minyou Wu, W. Shu
{"title":"High-Performance Incremental Scheduling on Massively Parallel Computers — A Global Approach","authors":"Minyou Wu, W. Shu","doi":"10.1145/224170.224358","DOIUrl":"https://doi.org/10.1145/224170.224358","url":null,"abstract":"Runtime incremental parallel scheduling (RIPS) is a new approach for load balancing. In parallel scheduling, all processors cooperate together to balance the workload. Parallel scheduling accurately balances the load by using global load information. In incremental scheduling, the system scheduling activity alternates with the underlying computation work. RIPS produces high-quality load balancing and adapts to applications of nonuniform structures. This paper presents methods for scheduling a single job on a dedicated parallel machine.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133414439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
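A single global balancing round of the kind the abstract alludes to (use global load information to compute the average, then shift surplus work units from overloaded to underloaded processors) can be sketched as follows. This is a simplified illustration of balancing with global information, not the RIPS algorithm itself.

```python
# One global load-balancing round: compute the average load and plan transfers.
def global_balance_round(loads):
    """loads[p] = number of work units on processor p; returns (moves, new_loads)."""
    n = len(loads)
    target = sum(loads) // n
    surplus = [(p, loads[p] - target) for p in range(n) if loads[p] > target]
    deficit = [(p, target - loads[p]) for p in range(n) if loads[p] < target]
    new, moves, si = list(loads), [], 0
    for dp, need in deficit:
        while need > 0 and si < len(surplus):
            sp, extra = surplus[si]
            give = min(need, extra)
            moves.append((sp, dp, give))     # move `give` units from sp to dp
            new[sp] -= give
            new[dp] += give
            need -= give
            extra -= give
            surplus[si] = (sp, extra)
            if extra == 0:
                si += 1
    return moves, new

moves, balanced = global_balance_round([12, 3, 9, 0])
print(moves)      # [(0, 1, 3), (0, 3, 3), (2, 3, 3)]
print(balanced)   # [6, 6, 6, 6]
```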
Lazy Release Consistency for Hardware-Coherent Multiprocessors
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224398
L. Kontothanassis, M. Scott, R. Bianchini
{"title":"Lazy Release Consistency for Hardware-Coherent Multiprocessors","authors":"L. Kontothanassis, M. Scott, R. Bianchini","doi":"10.1145/224170.224398","DOIUrl":"https://doi.org/10.1145/224170.224398","url":null,"abstract":"Release consistency is a widely accepted memory model for distributed shared memory systems. Eager release consistency represents the state of the art in release consistent protocols for hardware-coherent multiprocessors, while lazy release consistency has been shown to provide better performance for software distributed shared memory (DSM). Several of the optimizations performed by lazy protocols have the potential to improve the performance of hardware-coherent multiprocessors as well, but their complexity has precluded a hardware implementation. With the advent of programmable protocol processors it may become possible to use them after all. We present and evaluate a lazy release-consistent protocol suitable for machines with dedicated protocol processors. This protocol admits multiple concurrent writers, sends write notices concurrently with computation, and delays invalidations until acquire operations. We also consider a lazier protocol that delays sending write notices until release operations. Our results indicate that the first protocol outperforms eager release consistency by as much as 20% across a variety of applications. The lazier protocol, on the other hand, is unable to recoup its high synchronization overhead. This represents a qualitative shift from the DSM world, where lazier protocols always yield performance improvements. Based on our results, we conclude that machines with flexible hardware support for coherence should use protocols based on lazy release consistency, but in a less ''aggressively lazy'' form than is appropriate for DSM.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124813053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
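A toy, single-process model of the key idea (delay invalidations until an acquire) is sketched below. It deliberately glosses over vector timestamps, multiple concurrent writers, and the protocol-processor implementation; the class and method names are made up for the illustration.

```python
# Toy model of lazy release consistency: write notices are recorded when writes
# happen, but a reader's cached copies are invalidated only at acquire time.
class LazyRCModel:
    def __init__(self):
        self.memory = {}            # home copies: address -> value
        self.pending_notices = []   # write notices, applied lazily at acquire

    def write(self, proc_cache, addr, value):
        proc_cache[addr] = value
        self.memory[addr] = value
        self.pending_notices.append(addr)

    def acquire(self, proc_cache):
        # Deferred invalidation: drop every cached line named in a write notice,
        # so the next read fetches the up-to-date value from the home copy.
        for addr in self.pending_notices:
            proc_cache.pop(addr, None)

    def read(self, proc_cache, addr):
        if addr not in proc_cache:
            proc_cache[addr] = self.memory.get(addr, 0)   # miss: fetch from home
        return proc_cache[addr]

m = LazyRCModel()
cache_p0, cache_p1 = {}, {}
print(m.read(cache_p1, "x"))    # 0, and "x" is now cached by P1
m.write(cache_p0, "x", 42)      # P0 writes inside a critical section
print(m.read(cache_p1, "x"))    # still 0: P1 has not acquired yet
m.acquire(cache_p1)             # acquire applies the deferred invalidation
print(m.read(cache_p1, "x"))    # 42
```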
MONSTER — The Ghost in the Connection Machine: Modularity of Neural Systems in Theoretical Evolutionary Research
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224226
Nigel Snoad, T. Bossomaier
{"title":"MONSTER — The Ghost in the Connection Machine: Modularity of Neural Systems in Theoretical Evolutionary Research","authors":"Nigel Snoad, T. Bossomaier","doi":"10.1145/224170.224226","DOIUrl":"https://doi.org/10.1145/224170.224226","url":null,"abstract":"Both genetic algorithms (GAs) and artificial neural networks (ANNs) (connectionist learning models) are effective generalisations of successful biological techniques to the artificial realm. Both techniques are inherently parallel and seem ideal for implementation on the current generation of parallel supercomputers. We consider how the two techniques complement each other and how combining them (i.e. evolving artificial neural networks with a genetic algorithm), may give insights into the evolution of structure and modularity in biological brains. The incorporation of evolutionary and modularity concepts into artificial systems has the potential to decrease the development time of ANNs for specific image and information processing applications. General considerations when genetically encoding ANNs are discussed, and a new encoding method developed, which has the potential to simplify the generation of complex modular networks. The implementation of this technique on a CM-5 parallel supercomputer raises many practical and theoretical questions in the application and use of evolutionary models with artificial neural networks.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129434892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
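The basic GA-plus-ANN combination the abstract starts from can be sketched with a plain direct weight encoding evolved on the XOR problem. This generic illustration is not the modular genetic encoding the paper develops, and all parameters (population size, mutation rate, network shape) are arbitrary choices for the demo.

```python
# Evolve the 13 weights/biases of a 2-3-1 network with a simple GA (XOR task).
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])
GENOME = 13   # 2*3 input weights + 3 hidden biases + 3 output weights + 1 output bias

def forward(genome, x):
    w1, b1, w2, b2 = genome[:6].reshape(2, 3), genome[6:9], genome[9:12], genome[12]
    h = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

def fitness(genome):
    return -np.mean((forward(genome, X) - Y) ** 2)   # higher is better

def evolve(pop_size=60, generations=200, sigma=0.3):
    pop = rng.normal(0.0, 1.0, size=(pop_size, GENOME))
    for _ in range(generations):
        scores = np.array([fitness(g) for g in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, GENOME)
            child = np.concatenate([a[:cut], b[cut:]])             # one-point crossover
            child += rng.normal(0.0, sigma, GENOME) * (rng.random(GENOME) < 0.1)
            children.append(child)
        pop = np.vstack([parents, children])
    return max(pop, key=fitness)

best = evolve()
print(np.round(forward(best, X), 2))   # should approach [0, 1, 1, 0]
```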
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224410
D. Jayasimha, M. Hayder, S. K. Pillay
{"title":"Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms","authors":"D. Jayasimha, M. Hayder, S. K. Pillay","doi":"10.1145/224170.224410","DOIUrl":"https://doi.org/10.1145/224170.224410","url":null,"abstract":"We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies — the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128295216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Parallel Matrix-Vector Product Using Approximate Hierarchical Methods
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224487
A. Grama, Vipin Kumar, A. Sameh
{"title":"Parallel Matrix-Vector Product Using Approximate Hierarchical Methods","authors":"A. Grama, Vipin Kumar, A. Sameh","doi":"10.1145/224170.224487","DOIUrl":"https://doi.org/10.1145/224170.224487","url":null,"abstract":"Matrix-vector products (mat-vecs) form the core of iterative methods used for solving dense linear systems. Often, these systems arise in the solution of integral equations used in electromagnetics, heat transfer, and wave propagation. In this paper, we present a parallel approximate method for computing mat-vecs used in the solution of integral equations. We use this method to compute dense mat-vecs of hundreds of thousands of elements. The combined speedups obtained from the use of approximate methods and parallel processing represent an improvement of several orders of magnitude over exact mat-vecs on uniprocessors. We demonstrate that our parallel formulation incurs minimal parallel processing overhead and scales up to a large number of processors. We study the impact of varying the accuracy of the approximate mat-vec on overall time and on parallel efficiency. Experimental results are presented for 256 processor Cray T3D and Thinking Machines CM5 parallel computers. We have achieved computation rates in excess of 5 GFLOPS on the T3D.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129651477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
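The flavor of the approximation can be shown with a one-dimensional Barnes-Hut-style toy: the mat-vec with a kernel matrix A[i, j] = k(x_i, x_j) is evaluated by summing distant clusters through a single aggregate term while nearby clusters are refined or summed exactly. The kernel, acceptance criterion, and class below are chosen purely for illustration and are not the authors' parallel formulation.

```python
# 1-D tree-code sketch of an approximate kernel mat-vec y_i = sum_j k(x_i, x_j) v_j.
import numpy as np

def kernel(xi, xj):
    return 1.0 / (1.0 + abs(xi - xj))

class Cluster:
    def __init__(self, pts, vals, lo, hi):
        self.lo, self.hi = lo, hi
        idx = [i for i, p in enumerate(pts) if lo <= p < hi]
        self.points = [(pts[i], vals[i]) for i in idx]
        self.weight = sum(v for _, v in self.points)                     # aggregate "charge"
        self.center = (sum(p for p, _ in self.points) / len(idx)) if idx else 0.5 * (lo + hi)
        self.children = []
        if len(idx) > 4:                                                  # refine dense boxes
            mid = 0.5 * (lo + hi)
            self.children = [Cluster(pts, vals, lo, mid), Cluster(pts, vals, mid, hi)]

    def evaluate(self, x, theta=0.5):
        if not self.points:
            return 0.0
        size, dist = self.hi - self.lo, abs(x - self.center)
        if dist > 0 and size / dist < theta:
            return self.weight * kernel(x, self.center)                   # far field: 1 term
        if not self.children:
            return sum(v * kernel(x, p) for p, v in self.points)          # leaf: exact sum
        return sum(c.evaluate(x, theta) for c in self.children)           # near field: recurse

rng = np.random.default_rng(2)
pts = rng.random(200).tolist()
vals = rng.random(200).tolist()
root = Cluster(pts, vals, 0.0, 1.0)
approx = np.array([root.evaluate(x) for x in pts])
exact = np.array([sum(v * kernel(x, p) for p, v in zip(pts, vals)) for x in pts])
print(float(np.max(np.abs(approx - exact) / np.abs(exact))))   # max relative error of the approximation
```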
Index Array Flattening Through Program Transformation
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224420
R. Das, P. Havlak, J. Saltz, K. Kennedy
{"title":"Index Array Flattening Through Program Transformation","authors":"R. Das, P. Havlak, J. Saltz, K. Kennedy","doi":"10.1145/224170.224420","DOIUrl":"https://doi.org/10.1145/224170.224420","url":null,"abstract":"This paper presents techniques for compiling loops with complex, indirect array accesses into loops whose array references have at most one level of indirection. The transformation allows prefetching of array indices for more efficient structuring of communication on distributed-memory machines. It can also improve performance on other architectures by enabling prefetching of data between levels of the memory hierarchy or exploitation of hardware support for vectorized gather/scatter. Our techniques are implemented in a compiler for Fortran D and execution speed improvements are given for multiprocessor and vector machines.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"02 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127462353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
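The transformation itself is easy to show in miniature: a reference with two levels of indirection, x[ia[ib[i]]], is split into a separate gather pass that composes the index arrays and a compute loop with a single level of indirection. The arrays and sizes below are invented for the example; the paper performs this as a compiler transformation on Fortran D programs, not by hand in Python.

```python
# Flattening a two-level indirect reference into a gather pass plus a
# single-indirection compute loop.
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(1000)
ia = rng.integers(0, 1000, size=500)
ib = rng.integers(0, 500, size=200)

# Original form: two levels of indirection inside the compute loop.
y1 = np.zeros(len(ib))
for i in range(len(ib)):
    y1[i] = 2.0 * x[ia[ib[i]]]

# Flattened form: the index composition is hoisted into its own gather pass,
# which can be prefetched or mapped to hardware gather support; the compute
# step then sees at most one level of indirection.
flat_index = ia[ib]               # gather pass: flat_index[i] = ia[ib[i]]
y2 = 2.0 * x[flat_index]          # single-indirection compute

assert np.allclose(y1, y2)
```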