D. Reed, D. Padua, Ian T Foster, Dennis Gannon, B. Miller
{"title":"Delphi: an integrated, language-directed performance prediction, measurement and analysis environment","authors":"D. Reed, D. Padua, Ian T Foster, Dennis Gannon, B. Miller","doi":"10.1109/FMPC.1999.750595","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750595","url":null,"abstract":"Despite the construction of powerful parallel systems and networked computational grids, achieving a large fraction of peak performance for a range of applications has proven very difficult. In this paper, we describe the components of Delphi, an integrated performance measurement and prediction environment that places system design on a solid performance engineering basis.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"427 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130950447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel algorithms on the rotation-exchange network-a trivalent variant of the star graph","authors":"C. Yeh, Emmanouel Varvarigos","doi":"10.1109/FMPC.1999.750613","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750613","url":null,"abstract":"We investigate a trivalent Cayley graph, which we call the rotation-exchange (RE) network, and present communication algorithms to perform one-to-one routing, single-node broadcasting, multinode broadcasting, and total exchange in it. The RE network can be viewed as a star-graph counterpart to the hypercubic shuffle-exchange network, with the important difference that the RE network is regular and symmetric. We show that RE networks can efficiently embed and emulate star graphs, meshes, hypercubes, cube connected cycles (CCC), pancake graphs, bubble-sort graphs, complete transposition graphs, and the shuffle-exchange permutation graphs. We also show that the performance of RE networks can be significantly improved for a variety of applications if the transmission rate of on-chip links is considerably higher than that of off-chip links.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133921889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Cactus computational collaboratory: enabling technologies for relativistic astrophysics, and a toolkit for solving PDE's by communities in science and engineering","authors":"Gabrielle Allen, T. Goodale, E. Seidel","doi":"10.1109/FMPC.1999.750582","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750582","url":null,"abstract":"We are developing a system for collaborative research and development for a distributed group of researchers at different institutions around the world. In a new paradigm for collaborative computational science, the computer code and supporting infrastructure itself becomes the collaborating instrument, just as an accelerator becomes the collaborating tool for large numbers of distributed researchers in particle physics. The design of this \"Collaboratory\" allows many users, with very different areas of expertise, to work coherently together on distributed computers around the world. Different supercomputers may be used separately, or for problems exceeding the capacity of any single system, multiple supercomputers may be networked together through high speed gigabit networks. Central to this Collaboratory is a new type of community simulation code, called \"Cactus\". The scientific driving force behind this project is the simulation of Einstein's equations for studying black holes, gravitational waves, and neutron stars, which has brought together researchers in very different fields from many groups around the world to make advances in the study of relativity and astrophysics. But the system is also being developed to provide scientists and engineers, without expert knowledge of parallel or distributed computing, mesh refinement, and so on, with a simple framework for solving any system of partial differential equations on many parallel computer systems, from traditional supercomputers to networks of workstations.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133930904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The priority broadcast scheme for dynamic broadcast in hypercubes and related networks","authors":"C. Yeh, Emmanouel Varvarigos, Hua Lee","doi":"10.1109/FMPC.1999.750612","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750612","url":null,"abstract":"Dynamic broadcast is a communication problem where each node in a parallel computer generates packets to be broadcast to all the other nodes according to a certain random process. The lower bound on the average time required by any oblivious dynamic broadcast algorithm in an n-dimensional hypercube is Ω(n+1/(1-ρ)) when packets are generated according to a Poisson process, where ρ is the load factor. The best previous algorithms, however, only achieve Ω(n/(1-ρ)) time, which is suboptimal by a factor of Θ(n). In this paper we propose the priority broadcast scheme for designing dynamic broadcast algorithms that require optimal O(n+1/(1-ρ)) time in an n-dimensional hypercube. We apply the routing scheme to other network topologies, including k-ary n-cubes, meshes, tori, star graphs, generalized hypercubes, as well as any symmetric network, for efficient dynamic broadcast. In particular, the algorithms for star graphs, generalized hypercubes, and k-ary n-cubes with k=O(1) are also asymptotically optimal. We also propose a method for assigning priority classes to packets, called optimal priority assignment, which achieves the best possible performance for dynamic multiple broadcast in any network topology.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131172997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data sieving and collective I/O in ROMIO","authors":"R. Thakur, W. Gropp, E. Lusk","doi":"10.1109/FMPC.1999.750599","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750599","url":null,"abstract":"The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications (an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC)) on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128714003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}