Proceedings of the IEEE/ACM SC95 Conference: Latest Articles

Compiling and Optimizing for Decoupled Architectures
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224301
N. Topham, A. Rawsthorne, Callum McLean, M. Mewissen, Peter L. Bird
Abstract: Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in a decoupled mode the perceived memory latency at the processor is zero; effectively the entire physical memory has an access time equivalent to the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation, and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
Citations: 20
Message Passing Versus Distributed Shared Memory on Networks of Workstations
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224285
Honghui Lu, S. Dwarkadas, A. Cox, W. Zwaenepoel
Abstract: The message passing programs are executed with the Parallel Virtual Machine (PVM) library and the shared memory programs are executed using TreadMarks. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS) and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR), Traveling Salesman (TSP), and Quicksort (QSORT). Two different input data sets were used for Water (Water-288 and Water-1728), IS (IS-Small and IS-Large), and SOR (SOR-Zero and SOR-NonZero). Our execution environment is a set of eight HP735 workstations connected by a 100 Mbits per second FDDI network. For Water-1728, EP, ILINK, SOR-Zero, and SOR-NonZero, the performance of TreadMarks is within 10% of PVM. For IS-Small, Water-288, Barnes-Hut, 3-D FFT, TSP, and QSORT, differences are on the order of 10% to 30%. Finally, for IS-Large, PVM performs two times better than TreadMarks. More messages and more data are sent in TreadMarks, explaining the performance differences. This extra communication is caused by 1) the separation of synchronization and data transfer, 2) extra messages to request updates for data by the invalidate protocol used in TreadMarks, 3) false sharing, and 4) diff accumulation for migratory data in TreadMarks.
Citations: 112
A Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel Computer
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.285573
T. Sterling, D. Savarese, P. MacNeice, K. Olson, C. Mobarry, B. Fryxell, P. Merkey
Abstract: The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors employ an internal cross-bar switch and directory based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a 4-ring network incorporating a full hardware implementation of the SCI protocol for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies using both synthetic test codes and full applications for the Earth and space sciences to characterize the performance properties of this new architecture. It is shown that overhead and latencies of global primitive mechanisms, while low in absolute time, are significantly more costly than similar functions local to an individual processor ensemble.
Citations: 5
The Use of Cellular Automata in the Classroom
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224204
H. A. Lilly
Abstract: The paper explains what a cellular automaton is and why schools would want to integrate the study of cellular automata into their curricula. Examples are given and suggestions for sample exercises follow. Each example is given a title, a discipline to which it relates, a source from which the example or the motivation for the example was taken, and a recommended grade level (middle school or high school). Source code in Microsoft's FORTRAN PowerStation, Version 1.0 is available for all of the examples. Each of the programs shows a visualization of a particular cellular automaton over time. A cellular automaton is a modeling tool that can be used in the classroom with either pencil and paper or on computers. Cellular automata can be important in motivating students, reaching students with certain learning styles, helping students develop modeling skills, and in the development of curricula for teaching certain computer technologies.
Citations: 7
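To make the idea of "a visualization of a particular cellular automaton over time" concrete, here is a minimal sketch of a one-dimensional, two-state automaton. It is written in Python rather than the FORTRAN PowerStation sources the abstract mentions, and everything in it is an assumption chosen for illustration: the function names, the wrap-around edges, the Wolfram rule numbering, and the choice of Rule 90. None of it is taken from the paper.

```python
def step(cells, rule):
    """Apply one update of an elementary (1-D, two-state) cellular automaton.

    cells: list of 0/1 values; the row wraps around at its edges.
    rule:  Wolfram rule number in 0..255 (illustrative convention, not from the paper).
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # value in 0..7
        new_cells.append((rule >> neighbourhood) & 1)        # look up that bit of the rule
    return new_cells


def run(width, generations, rule):
    """Start from a single live cell in the middle and return every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        history.append(cells)
    return history


def render(history):
    """Turn the generations into text, one row per time step."""
    return "\n".join("".join("#" if c else "." for c in row) for row in history)


if __name__ == "__main__":
    # Rule 90 sets each cell to the XOR of its two neighbours.
    print(render(run(width=31, generations=15, rule=90)))
```

The same update can be carried out by hand on squared paper, which matches the abstract's point that the tool works with pencil and paper as well as on a computer.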
Astrophysical N-Body Simulations on the GRAPE-4 Special-Purpose Computer
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224400
J. Makino, M. Taiji
Abstract: We report on recent astrophysical N-body simulations performed on the GRAPE-4 (GRAvity PipE 4) system, a special-purpose computer for astrophysical N-body simulations. We first review the astrophysical motivation, the algorithm, the structure of the GRAPE system, and the actual performance. The GRAPE-4 system consists of 1692 pipeline processors. The peak speed of one pipeline processor is 523 Mflops and that of the total system is 884 Gflops. The performance obtained is 529 Gflops for the simulation of two massive black holes in the core of a galaxy with 700,000 stars.
Citations: 17
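The peak and sustained figures quoted in the GRAPE-4 abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the abstract; the constant and function names are illustrative and not from the paper.

```python
PIPELINES = 1692                 # pipeline processors, from the abstract
PEAK_PER_PIPELINE_MFLOPS = 523   # peak speed of one pipeline, from the abstract
SUSTAINED_GFLOPS = 529           # performance reported for the black-hole run


def aggregate_peak_gflops(pipelines, mflops_each):
    """Peak speed of the whole machine in Gflops (1 Gflops = 1000 Mflops)."""
    return pipelines * mflops_each / 1000.0


def efficiency(sustained_gflops, peak_gflops):
    """Fraction of the peak speed that the application actually achieved."""
    return sustained_gflops / peak_gflops


if __name__ == "__main__":
    peak = aggregate_peak_gflops(PIPELINES, PEAK_PER_PIPELINE_MFLOPS)
    print(f"aggregate peak: {peak:.3f} Gflops")
    print(f"efficiency:     {efficiency(SUSTAINED_GFLOPS, peak):.1%}")
```

The product 1692 x 523 Mflops is 884.916 Gflops, consistent with the 884 Gflops quoted, and 529 Gflops sustained corresponds to roughly 60% of that peak.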
Multicast Virtual Topologies for Collective Communication in MPCs and ATM Clusters
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224188
Y. Huang, Chengchang Huang, P. McKinley
Abstract: This paper defines and describes the properties of a multicast virtual topology, the M-array, and a resource-efficient variation, the REM-array. It is shown how several collective operations can be implemented efficiently using these virtual topologies, while maintaining low complexity. Because the methods are applicable to any parallel computing environment that supports multicast communication in hardware, they provide a framework for collective communication libraries that are portable and yet take advantage of such low-level hardware functionality. In particular, the paper describes the practical issues of using these methods in wormhole-routed massively parallel computers (MPCs) and in workstation clusters connected by Asynchronous Transfer Mode (ATM) networks. Performance results are given for both environments.
Citations: 12
Pittsburgh Supercomputing Center High School Initiative in Computational Science Report on Findings School Years: 1991-92, 1992-93, 1993-4
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224200
C. Porto
Abstract: The purpose of the Pittsburgh Supercomputing Center's High School Initiative was to motivate students to pursue careers in science, mathematics, engineering and computer science. The initiative generated excitement among teachers and their students by providing them with the opportunity to work on a project of their choosing using the world's fastest supercomputer, the same machine used by leading researchers working on today's most challenging scientific problems. The program gave teachers the means and support to institutionalize their computational science project into the curriculum so that the impact of the program would continue from year to year with each new class of students.
Citations: 1
I/O Limitations in Parallel Molecular Dynamics
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224220
T. Clark, L. R. Scott, S. Wlodek, J. McCammon
Abstract: We discuss data production rates and their impact on the performance of scientific applications using parallel computers. On one hand, excessively high rates of data production can be overwhelming, exceeding logistical capacities for transfer, storage and analysis. On the other hand, the rate-limiting step in a computationally based study should be the human-guided analysis, not the calculation. We present performance data for a biomolecular simulation of the enzyme acetylcholinesterase, which uses the parallel molecular dynamics program EulerGROMOS. The actual production rates are compared against a typical time frame for results analysis, where we show that the rate-limiting step is the simulation, and that overcoming this will require improved output rates.
Citations: 3
A Multi-Level Algorithm For Partitioning Graphs
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224228
B. Hendrickson, R. Leland
Abstract: The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the decomposition of data structures for parallel computation, the placement of circuit elements and the ordering of sparse matrix computations. We present a multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs. The smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs. A variant of the Kernighan-Lin algorithm is applied periodically to refine the partition. The entire algorithm can be implemented to execute in time proportional to the size of the original graph. Experiments indicate that, relative to other advanced methods, the multilevel algorithm produces high-quality partitions at low cost.
Citations: 1300
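The coarsen, partition, and project-back cycle described in the abstract can be sketched in a few dozen lines. This is a rough illustration under stated assumptions and not the authors' implementation: the spectral partitioner is replaced here by a simple breadth-first region-growing step, the Kernighan-Lin variant by a greedy single-vertex move pass, and the coarsening uses a plain greedy matching. All function names and the integer-vertex adjacency-dict representation are hypothetical.

```python
from collections import deque


def edge_cut(adj, side):
    """Total weight of edges whose endpoints lie on different sides (integer vertices)."""
    return sum(w for u in adj for v, w in adj[u].items() if u < v and side[u] != side[v])


def coarsen(adj, vwgt):
    """Contract a greedy maximal matching; return (coarse adj, coarse weights, fine->coarse map)."""
    cmap, next_id = {}, 0
    for u in adj:
        if u in cmap:
            continue
        cmap[u] = next_id
        for v in adj[u]:
            if v not in cmap:            # first unmatched neighbour becomes u's partner
                cmap[v] = next_id
                break
        next_id += 1
    cadj = {c: {} for c in range(next_id)}
    cwgt = {c: 0 for c in range(next_id)}
    for u in adj:
        cwgt[cmap[u]] += vwgt[u]
        for v, w in adj[u].items():
            cu, cv = cmap[u], cmap[v]
            if cu != cv:                 # edges inside a merged pair disappear
                cadj[cu][cv] = cadj[cu].get(cv, 0) + w
    return cadj, cwgt, cmap


def initial_bisection(adj, vwgt):
    """Stand-in for the spectral step: grow side 0 breadth-first to half the total weight."""
    half = sum(vwgt.values()) / 2
    order, seen = [], set()
    for start in adj:                    # outer loop copes with disconnected graphs
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    side, grown = {u: 1 for u in adj}, 0
    for u in order:
        if grown >= half:
            break
        side[u] = 0
        grown += vwgt[u]
    return side


def refine(adj, vwgt, side):
    """Stand-in for Kernighan-Lin: move single vertices while that lowers the cut."""
    limit = sum(vwgt.values()) / 2 + max(vwgt.values())   # loose balance bound (assumption)
    load = [0, 0]
    for u in adj:
        load[side[u]] += vwgt[u]
    improved = True
    while improved:
        improved = False
        for u in adj:
            here, there = side[u], 1 - side[u]
            gain = sum(w if side[v] == there else -w for v, w in adj[u].items())
            if gain > 0 and load[there] + vwgt[u] <= limit:
                side[u] = there
                load[here] -= vwgt[u]
                load[there] += vwgt[u]
                improved = True
    return side


def multilevel_bisect(adj, vwgt=None, coarsest=4):
    """Coarsen recursively, bisect the smallest graph, then project back and refine."""
    if not adj:
        return {}
    vwgt = vwgt or {u: 1 for u in adj}
    if len(adj) > coarsest:
        cadj, cwgt, cmap = coarsen(adj, vwgt)
        if len(cadj) < len(adj):         # coarsening made progress
            cside = multilevel_bisect(cadj, cwgt, coarsest)
            return refine(adj, vwgt, {u: cside[cmap[u]] for u in adj})
    return refine(adj, vwgt, initial_bisection(adj, vwgt))


def demo_graph():
    """Two 4-cliques (vertices 0-3 and 4-7) joined by the single edge 3-4."""
    edges = [(a, b) for base in (0, 4) for a in range(base, base + 4)
             for b in range(a + 1, base + 4)] + [(3, 4)]
    adj = {v: {} for v in range(8)}
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


if __name__ == "__main__":
    g = demo_graph()
    parts = multilevel_bisect(g)
    print(parts, "cut =", edge_cut(g, parts))
```

On the demo graph the sketch separates the two cliques and cuts only the bridging edge. Vertex weights are carried through the coarsening so that a merged pair counts as two original vertices when the balance is checked.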
A Parallel Software Infrastructure for Structured Adaptive Mesh Methods
Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI: 10.1145/224170.224283
S. Kohn, S. Baden
Abstract: Structured adaptive mesh algorithms dynamically allocate computational resources to accurately resolve interesting portions of a numerical calculation. Such methods are difficult to implement and parallelize because they rely on dynamic, irregular data structures. We have developed an efficient, portable, parallel software infrastructure for adaptive mesh methods; our software provides computational scientists with high-level facilities that hide low-level details of parallelism and resource management. We have applied our software infrastructure to the solution of adaptive eigenvalue problems arising in materials design. We describe our software infrastructure and analyze its performance. We also present computational results which indicate that the uniformity restrictions imposed by a data parallel Fortran implementation of a structured adaptive mesh application would significantly impact performance.
Citations: 31