{"title":"High Performance Visualization of Time-Varying Volume Data over a Wide-Area Network","authors":"K. Ma, David Camp","doi":"10.1109/SC.2000.10000","DOIUrl":"https://doi.org/10.1109/SC.2000.10000","url":null,"abstract":"This paper presents an end-to-end, low-cost solution for visualizing time-varying volume data rendered on a parallel computer located at a remote site. Pipelining and careful grouping of processors are used to hide I/O time and to maximize processors utilization. Compression is used to significantly cut down the cost of transferring output images from the parallel computer to a display device through a widearea network. This complete rendering pipeline makes possible highly efficient rendering and remote viewing of high resolution time-varying data sets in the absence of high-speed network and parallel I/O support. To study the performance of this rendering pipeline and to demonstrate high-performance remote visualization, tests were conducted on a PC cluster in Japan as well as an SGI Origin 2000 operated at the NASA Ames Research Center with the display located at UC Davis.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132403914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural and Performance Evaluation of GigaNet and Myrinet Interconnects on Clusters of Small-Scale SMP Servers","authors":"J. Hsieh, T. Leng, V. Mashayekhi, R. Rooholamini","doi":"10.1109/SC.2000.10027","DOIUrl":"https://doi.org/10.1109/SC.2000.10027","url":null,"abstract":"GigaNet and Myrinet are two of the leading interconnects for clusters of commodity computer systems. Both provide memory-protected user-level network interface access, and deliver low-latency and high-bandwidth communication to applications. GigaNet is a connection-oriented interconnect based on a hardware implementation of Virtual Interface (VI) Architecture and Asynchronous Transfer Mode (ATM) technologies. Myrinet is a connection-less interconnect which leverages packet switching technologies from experimental Massively Parallel Processors (MPP) networks. This paper investigates their architectural differences and evaluates their performance on two commodity clusters based on two generations of Symmetric Multiple Processors (SMP) servers. The performance measurements reported here suggest that the implementation of Message Passing Interface (MPI) significantly affects the cluster performance. Although MPICH-GM over Myrinet demonstrates lower latency with small messages, the polling-driven implementation of MPICH-GM often leads to tight synchronization between communication processes and higher CPU overhead.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114417659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Pormann, C. Henriquez, J. Board, D. Rose, D. Harrild, Alexandra P. Henriquez
{"title":"Computer Simulations of Cardiac Electrophysiology","authors":"J. Pormann, C. Henriquez, J. Board, D. Rose, D. Harrild, Alexandra P. Henriquez","doi":"10.1109/SC.2000.10032","DOIUrl":"https://doi.org/10.1109/SC.2000.10032","url":null,"abstract":"CardioWave is a modular system for simulating wavefront conduction in the heart. These simulations may be used to investigate the factors that generate and sustain life-threatening arrhythmias such as ventricular fibrillation. The user selects a set of modules which most closely reflects the simulation they are interested in and the simulator is built automatically. Thus, we do not present one monolithic simulator, but rather a simulator-generator which allows the researcher to make the trade-offs of complexity versus performance. The results presented here are from simulations run on an IBM SP parallel computer and a cluster of workstations. The performance numbers show excellent scalability up through 128 processors. With the larger memory of the parallel machines, we have been able to perform highly realistic simulations of the human atria. These simulations include realistic, 3-D geometries with inhomogeneity and anisotropy as we as highly complex membrane dynamics.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125269690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Fault-Tolerant Distributed Shared Memory","authors":"F. Sultan, Thu D. Nguyen, L. Iftode","doi":"10.1109/SC.2000.10014","DOIUrl":"https://doi.org/10.1109/SC.2000.10014","url":null,"abstract":"This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a home-based lazy release consistency (HLRC) DSM system with independent check- pointing and logging to volatile memory, targeting shared-memory computing on very large LAN-based clusters. In these environments, where global coordination may be expensive, independent checkpointing becomes critical to scalability. However, independent checkpointing is only practical if we can control the size of the log and checkpoints in the absence of global coordination. In this paper we describe the design of our fault-tolerant DSM system and present our solutions to the problems of checkpoint and log management. We also present experimental results showing that our fault tolerance support is light-weight, adding only low messaging, logging and checkpointing overheads, and that our management algorithms can be expected to effectively bound the size of the checkpoints and logs or real applications.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121084631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomization, Speculation, and Adaptation in Batch Schedulers","authors":"Dejan Perkovic, P. Keleher","doi":"10.5555/370049.370063","DOIUrl":"https://doi.org/10.5555/370049.370063","url":null,"abstract":"This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the \"backfilling order\" in priority-based and randomized fashions. We examine the effectiveness of guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a \"no-guarantee\" algorithm can be significantly better when combined with extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less then 15% of conservative backfilling.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121290806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PARALLEL UNSTEADY TURBOPUMP SIMULATIONS FOR LIQUID ROCKET ENGINES","authors":"C. Kiris, D. Kwak, W. Chan","doi":"10.1109/SC.2000.10053","DOIUrl":"https://doi.org/10.1109/SC.2000.10053","url":null,"abstract":"This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI, hybrid MPI/Open-MP, and MLP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Unsteady computations for SSME turbo-pump, which contains 101 zones with 31 Million grid points, are carried on Origin 2000 systems at NASA Ames Research Center. The approach taken for these simulations, and the performance of the parallel versions of the code are presented.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129610697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically Tuned Collective Communications","authors":"Sathish S. Vadhiyar, G. Fagg, J. Dongarra","doi":"10.1109/SC.2000.10024","DOIUrl":"https://doi.org/10.1109/SC.2000.10024","url":null,"abstract":"The performance of the MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give good performance on all systems due to the differences in architectures, network parameters and the storage capacity of the underlying MPI implementation. In this paper, we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on the system. We also discuss a dynamic topology method that uses the tuned static topology shape, but re-orders the logical addresses to compensate for changing run time variations. A series of experiments were conducted comparing our tuned collective communication operations to various native vendor MPI implementations. The use of the tuned collective communications resulted in about 30%-650% improvement in performance over the native MPI implelementations.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114264329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maozhen Li, O. Rana, Matthew S. Shields, D. Walker
{"title":"A Wrapper Generator for Wrapping High Performance Legacy Codes as Java/CORBA Components","authors":"Maozhen Li, O. Rana, Matthew S. Shields, D. Walker","doi":"10.5555/370049.370072","DOIUrl":"https://doi.org/10.5555/370049.370072","url":null,"abstract":"This paper describes a Wrapper Generator for wrapping high performance legacy codes as Java/CORBA components for use in a distributed component-based problem- solving environment. Using the Wrapper Generator we ave automatically wrapped an MPI-based legacycode as a single CORBA object, and implemented a problem- solving environment for molecular dynamic simulations. Performance comparisons between runs of the CORBA object and the original legacy code on a cluster of workstations and on a parallel computer are also presented.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130302471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Humphreys, I. Buck, Matthew Eldridge, P. Hanrahan
{"title":"Distributed Rendering for Scalable Displays","authors":"G. Humphreys, I. Buck, Matthew Eldridge, P. Hanrahan","doi":"10.1109/SC.2000.10003","DOIUrl":"https://doi.org/10.1109/SC.2000.10003","url":null,"abstract":"We describe a novel distributed graphics system that allows an application to render to a large tiled display. Our system, called WireGL, uses a cluster of off-the-shelf PCs connected with a high-speed network. WireGL allows an unmodified existing application to achieve scalable output resolution on such a display. This paper presents an efficient sorting algorithm which minimizes the network traffic for a scalable display. We will demonstrate that for most applications, our system provides scalable output resolution with minimal performance impact.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133731628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Bustamante, G. Eisenhauer, K. Schwan, Patrick M. Widener
{"title":"Efficient Wire Formats for High Performance Computing","authors":"F. Bustamante, G. Eisenhauer, K. Schwan, Patrick M. Widener","doi":"10.1109/SC.2000.10046","DOIUrl":"https://doi.org/10.1109/SC.2000.10046","url":null,"abstract":"High performance computing is being increasingly utilized in non-traditional circumstances where it must interoperate with other applications. For example, online visualization is being used to monitor the progress of applications, and real-world sensors are used as inputs to simulations. Whenever these situations arise, there is a question of what communications infrastructure should be used to link the different components. Traditional HPC-style communications systems such as MPI offer relatively high performance, but are poorly suited for developing these less tightly-coupled cooperating applications. Object-based systems and meta-data formats like XML offer substantial plug-and-play flexibility, but with substantially lower performance. We observe that the flexibility and baseline performance of all these systems is strongly determined by their `wire format', or how they represent data for transmission in a heterogeneous environment. We examine the performance implications of different wire formats and present an alternative with significant advantages in terms of both performance and flexibility.","PeriodicalId":228250,"journal":{"name":"ACM/IEEE SC 2000 Conference (SC'00)","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126017700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}