Compiling and Optimizing for Decoupled Architectures
N. Topham, A. Rawsthorne, C. McLean, M. Mewissen, P. L. Bird
In Proceedings of the IEEE/ACM SC95 Conference, December 1995. DOI: 10.1145/224170.224301

Abstract: Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in a decoupled mode, the perceived memory latency at the processor is zero; effectively, the entire physical memory has an access time equivalent to that of the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that, with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
{"title":"Message Passing Versus Distributed Shared Memory on Networks of Workstations","authors":"Honghui Lu, S. Dwarkadas, A. Cox, W. Zwaenepoel","doi":"10.1145/224170.224285","DOIUrl":"https://doi.org/10.1145/224170.224285","url":null,"abstract":"The message passing programs are executed with the Parallel Virtual Machine (PVM) library and the shared memory programs are executed using TreadMarks. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS) and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR), Traveling Salesman (TSP), and Quicksort (QSORT). Two different input data sets were used for Water (Water-288 and Water-1728), IS (IS-Small and IS-Large), and SOR (SOR-Zero and SOR-NonZero). Our execution environment is a set of eight HP735 workstations connected by a 100Mbits per second FDDI network. For Water-1728, EP, ILINK, SOR-Zero, and SOR-NonZero, the performance of TreadMarks is within 10%of PVM. For IS-Small, Water-288, Barnes-Hut, 3-D FFT, TSP, and QSORT, differences are on the order of 10%to 30%. Finally, for IS-Large, PVM performs two times better than TreadMarks. More messages and more data are sent in TreadMarks, explaining the performance differences. This extra communication is caused by 1) the separation of synchronization and data transfer, 2) extra messages to request updates for data by the invalidate protocol used in TreadMarks, 3) false sharing, and 4) diff accumulation for migratory data in TreadMarks.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116302175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel Computer
T. Sterling, D. Savarese, P. MacNeice, K. Olson, C. Mobarry, B. Fryxell, P. Merkey
In Proceedings of the IEEE/ACM SC95 Conference, December 1995. DOI: 10.1145/224170.285573

Abstract: The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of eight HP PA-RISC 7100 microprocessors employ an internal crossbar switch and a directory-based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a four-ring network incorporating a full hardware implementation of the SCI protocol, for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies using both synthetic test codes and full applications from the Earth and space sciences to characterize the performance properties of this new architecture. It is shown that the overheads and latencies of global primitive mechanisms, while low in absolute time, are significantly more costly than those of similar functions local to an individual processor ensemble.
{"title":"The Use of Cellular Automata in the Classroom","authors":"H. A. Lilly","doi":"10.1145/224170.224204","DOIUrl":"https://doi.org/10.1145/224170.224204","url":null,"abstract":"The paper explains what a cellular automaton is and why schools would want to integrate the study of cellular automata into their curricula. Examples are given and suggestions for sample exercises follow. Each example is given a title, a discipline to which it relates, a source from which the example or the motivation for the example was taken, and a recommended grade level--middle school or high school. Source code in Microsoft's FORTRAN PowerStation, Version 1.0 is available for all of the examples. Each of the programs show a visualization of a particular cellular automaton over time. A cellular automaton is a modeling tool that can be used in the classroom with either pencil and paper or on computers. Cellular automata can be important in motivating students, reaching students with certain learning styles, helping students develop modeling skills, and in the development of curricula for teaching certain computer technologies.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"01 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127449738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Astrophysical N-Body Simulations on the GRAPE-4 Special-Purpose Computer","authors":"J. Makino, M. Taiji","doi":"10.1145/224170.224400","DOIUrl":"https://doi.org/10.1145/224170.224400","url":null,"abstract":"We report on resent astrophysical N-body simulations performed on the GRAPE-4 (GRAvity PipE 4) system, a special-purpose computer for astrophysical N-body simulations. We first review the astrophysical motivation, the algorithm, the structure of the GRAPE system, and the actual performance. The GRAPE-4 system consists of 1692 pipeline processors. The peak speed of one pipeline processor is 523 Mflops and that of the total system is 884 Gflops. The performance obtained is 529 Gflops for the simulation of two massive black holes in the core of a galaxy with 700,000 stars.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121464069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicast Virtual Topologies for Collective Communication in MPCs and ATM Clusters","authors":"Y. Huang, Chengchang Huang, P. McKinley","doi":"10.1145/224170.224188","DOIUrl":"https://doi.org/10.1145/224170.224188","url":null,"abstract":"This paper defines and describes the properties of a multicast virtual topology, the M-array and a resource-efficient variation, the REM-array. It is shown how several collective operations can be implemented efficiently using these virtual topologies, while maintaining low complexity. Because the methods are applicable to any parallel computing environment that supports multicast communication in hardware, they provide a framework for collective communication libraries that are portable and yet take advantage of such low-level hardware functionality. In particular, the paper describes the practical issues of using these methods in wormhole-routed massively parallel computers (MPCs) and in workstation clusters connected by Asynchronous Transfer Mode (ATM) networks. Performance results are given for both environments.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122772680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pittsburgh Supercomputing Center High School Initiative in Computational Science Report on Findings School Years: 1991-92, 1992-93, 1993-4","authors":"C. Porto","doi":"10.1145/224170.224200","DOIUrl":"https://doi.org/10.1145/224170.224200","url":null,"abstract":"The purpose of the Pittsburgh Supercomputing Center's High School Initiative was to motivate students to pursue careers in science, mathematics, engineering and computer science. The initiative generated excitement among teachers and their students by providing them with the opportunity to work on a project of their choosing using the world's fastest supercomputer — the same machine used by leading researchers working on today's most challenging scientific problems. The program gave teachers the means and support to institutionalize their computational science project into the curriculum so that the impact of the program would continue from year to year with each new class of students.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131301639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"I/O Limitations in Parallel Molecular Dynamics","authors":"T. Clark, L. R. Scott, S. Wlodek, J. McCammon","doi":"10.1145/224170.224220","DOIUrl":"https://doi.org/10.1145/224170.224220","url":null,"abstract":"We discuss data production rates and their impact on the performance of scientific applications using parallel computers. On one hand, too high rates of data production can be overwhelming, exceeding logistical capacities for transfer, storage and analysis. On the other hand, the rate limiting step in a computationally-based study should be the human-guided analysis, not the calculation. We present performance data for a biomolecular simulation of the enzyme, acetylcholinesterase, which uses the parallel molecular dynamics program EulerGROMOS. The actual production rates are compared against a typical time frame for results analysis where we show that the rate limiting step is the simulation, and that to overcome this will require improved output rates.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128988791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Level Algorithm For Partitioning Graphs","authors":"B. Hendrickson, R. Leland","doi":"10.1145/224170.224228","DOIUrl":"https://doi.org/10.1145/224170.224228","url":null,"abstract":"The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the decomposition of data structures for parallel computation, the placement of circuit elements and the ordering of sparse matrix computations. We present a multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs. The smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs. A variant of the Kernighan-Lin algorithm is applied periodically to refine the partition. The entire algorithm can be implemented to execute in time proportional to the size of the original graph. Experiments indicate that, relative to other advanced methods, the multilevel algorithm produces high quality partitions at low cost.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123779319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Software Infrastructure for Structured Adaptive Mesh Methods","authors":"S. Kohn, S. Baden","doi":"10.1145/224170.224283","DOIUrl":"https://doi.org/10.1145/224170.224283","url":null,"abstract":"Structured adaptive mesh algorithms dynamically allocate computational resources to accurately resolve interesting portions of a numerical calculation. Such methods are difficult to implement and parallelize because they rely on dynamic, irregular data structures. We have developed an efficient, portable, parallel software infrastructure for adaptive mesh methods; our software provides computational scientists with high-level facilities that hide low-level details of parallelism and resource management. We have applied our software infrastructure to the solution of adaptive eigenvalue problems arising in materials design. We describe our software infrastructure and analyze its performance. We also present computational results which indicate that the uniformity restrictions imposed by a data parallel Fortran implementation of a structured adaptive mesh application would significantly impact performance.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128174520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}