{"title":"Characterizing the Balance of Parallel 1/0 Systems","authors":"J. French","doi":"10.1109/DMCC.1991.633363","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633363","url":null,"abstract":"High pzrformance 110 subsystems are a key element in parallel computer systems that intend to compete with traditional supercomputers in the solution of large scientific and engineering problems. The hardware and software organization and perfomiance of these U0 subsystems are fundamental issues in parallel computer systems that have not been explored in depth. Two commercially available parallel file systems are the Intel iPSC/2 Concurrent File System (CFS) and the NCUBE/ten NChannel board and disk farm. Both systcms are aimed at support of high volume, large block I/O of the type typical of large scientific computations. The evaluation of these systems has proved difficult. There are many parameters affecting performance and the system dynamics are quite complex. In this paper we examine a method of quantifying the balance of an 1/0 system, that is, how well it services I/O rcquests with respect to fairness and distribution of overheads. One may gauge the degree of balance in a systcrn by asking: When resources become saturated, is the bottleneck felt equally by each process or are some processes given preferential service? This paper explores a simple yardstick of system balance. 1. Quantifying and Measuring Parallel I/O Suppose that we have p processes reading (writing) a file of N bytes in parallel. Each process i reads (writes) N n bytes in time ti where n = -. The individual data P transfer rate of a particular processor i is given by ri = z. The average individual data transfer rate is ti given by 7 = 1 firi. P I = There are at least two reasonable measures of the aggregate data transfer rate of the p processors. In the first case, we sum the data rates of the individual processors. This gives rise to the quantity ri called the maximum sustained aggregate rate (ma-SAR). We call this I = 4 tThis research was supported in part by JPL Contract #957721 and by the Department of Energy under Grant DE-FG05-88ER25063. measure the “maximum” rate because, by construction, it assumes that each processor i contributes a rate ri and all processors contribute during the same time instant, however brief. From the definition of F above, we see that max-SAR = ri = p 7 . (1) I = f i This interpretation is illustrated in Figure l(a). In the second case, we consider that all N bytes move through the system in z = max ti time units. That is, the entire file is not transferred until the slowest processor finishes reading (writing) its partition of the file. This gives rise to the quantity called the minimum sustained aggregate rate (min-SAR). We call this a “minimum” rate since this is the rate that an outside observer will perceive as the rate at which the entire processor ensemble is operating. From the definitions above, we see that 1","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. 
Proceedings","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115500873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing between Subcubes in a Hypercube","authors":"S. Padmanabhan","doi":"10.1109/DMCC.1991.633149","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633149","url":null,"abstract":"As hypercube sizes and node capabilities increase, applications such as database processing and task management which utilize parallelism within a task and between tasks are becoming important. These applications require a new routing paradigm, where data (or programs) residing an a subcube are transferred to another subcube. In this paper, we describe and analyze an algorithm for routing data from the nodes of a k-dimension subcube to the nodes of any other kdimension subcube in the hypercube. We show that the algorithm enables data transfer between the iwo subcubes to be performed optimally in the current generatie:% direct-connect hypercubes. Also, the algorithm is very simple and can be executed in @(n) steps, where n is the dimension of the hypercube.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126877600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing over Networks: An Illustrated Example","authors":"Bernd Bruegge, Hiroshi Nishikawa, Peter Steenkiste","doi":"10.1109/DMCC.1991.633138","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633138","url":null,"abstract":"With the advances in high-speed networking, partitioning applications over a group of computer systems is becoming an attractive way of exploiting parallelism. Programming general multicomputers is however very challenging: nodes are typically heterogeneous and shared with other users, making the availability of computing cycles on the nodes and communication bandwidth on the network unpredictable, This environment often requires users to use a programming model based on dynamic load balancing. In this paper, we use an flow field generation application to look at the problems that come up in a network environment. We use BEE, a monitoring system that allows programmers to interactively monitor their application, to show the behavior of the program under different conditions.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"710 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121998608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scatter Scheduling for Problems with Unpredictable Structures","authors":"Minyou Wu, W. Shu","doi":"10.1109/DMCC.1991.633106","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633106","url":null,"abstract":"An extended scatter scheduling was applied to problems with unpredictable, asynchronous struc- tures. It has been found that with this simple schedul- ing strategy, good load balance can be reached with- out incurring much runtime overhead. This scheduling algorithm has been implemented on hypercube ma- chines, and its performance is compared with other scheduling strategies.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130363949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ALDIMS - A Language for Programming Distributed Memory Multiprocessors","authors":"K. G. Kumar, D. Kulkarni, A. Basu, A. Paulraj","doi":"10.1109/DMCC.1991.633132","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633132","url":null,"abstract":"In this paper we present ALDIMS, a language that combines the expressibility of general functional (MIMD) parallelism with compact expressibility of data (SPMD) parallelism. It uses distributed data structures for specifying data partitions and single assignment variables as abstract means of inter-process communication. Constructs for unstructured parallelism and process placement specifications make general MIMD parallelism expressible. We describe the issues of implementing process invocation and communication primitive generation. We also discuss source level parallelization and optimization issues and strategies.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132614714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Precedence-Constrained Simulation Tasks for a Parallel Environment","authors":"J. Sartor, G. Lamont, R. Hammell, T. Hartrum","doi":"10.1109/DMCC.1991.633067","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633067","url":null,"abstract":"The Mapping Problem Classical results on the deterministic precedence- constrained scheduling problem are almost exclusively concerned with a single iteration of the task system. This paper explores the problem of mapping deter- ministic tasks to processors in a parallel simulation environment, with each task iterating multiple times. Counterexamples are shown to demonstrate that mul- tiple passes through an optimal mapping for one iter- ation of a task system may produce less-than-optimal results when compared to mappings based on the it- erative nature of the simulation. A level strategy for assigning iterative tasks to processors is developed, and theoretical and experimental results are discussed for different mapping strategies in a VHDL simulation. This paper examines the classical multiprocessor scheduling problem for application to deterministic simulation systems. The tasks in these systems are characterized by iterative executions: each task exe- cutes more than once in the course of a simulation run. The general task scheduling problem and its relation- ship to the mapping problem for simulation tasks are introduced. The problem space is constrained, lim- iting the scope of the study to systems which map equal-execution time tasks into identical processors. A theoretical basis for the level strategy of iterative task assignment is summarized, and a polynomial- time algorithm based on this strategy is given. The results of hypercube experiments based on different mapping strategies are discussed with application to VHDL logic simulation.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133487266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static Program Assignment in Circuit Switched Multiprocessors","authors":"J. Lindberg","doi":"10.1109/DMCC.1991.633136","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633136","url":null,"abstract":"Exploiting the performance of distributed memory multiprocessors necessitates efficient algorithms for assigning concurrently executable program tasks to processors the well known mapping problem. Faditionally, solutions to the mapping problem have been based on a model of inter-processor communication where communication cost increases linearly with distance, and it is this cost that is the principal determinant ofper$ormance. Therefore, most existing algorithms attempt to find assignments that minimize the graph theoretic distance between communicating processors. In circuit switched multiprocessors, it is typically circuit blocking and not the inter-processor communication latency that dominates. We propose the use of an adaptive variant of simulated annealing to search for an acceptable assignment. This algorithm is useful for determining assignments for multiprocessor architectures implementing both the circuit switched and store-and-forward model of inter-processor communication.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131118851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Approach To Solving a 3-D Finite Element Problem on a Distributed Memory MIMD Machine","authors":"A. Amin, A. Chaudhary, P. Sadayappan","doi":"10.1109/DMCC.1991.633157","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633157","url":null,"abstract":"'A three-dimensional nonlinear rigid-viscoplastic metal forming finite element package, ALPID-3D, is being developed io run on distributed-memory MZMD parallel computers. Efficient parallelization of the applicarion requires identification and efficient mapping of the compute intensive part of the finite jlement code on the parallel machine. This primarily includes the generation and solution of finite element matrix governing equations within each nonlinear iteration. The Element By Element Preconditioned Conjugate Gradient (EBE-PCG) method is used for solving the finite element matrix equations. An approach to minimizing the communication overhead during the EBE-PCG iterations and timing results are presented.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133865075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Spreadsheets in a Graphical Symbolic Debugger for the Ncube","authors":"A. Couch, D.W. Krumme","doi":"10.1109/DMCC.1991.633125","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633125","url":null,"abstract":"We consider the problem of data presentation in a command-oriented debugger for a distributed system. We present an extension of the command syntax found in most serial and several parallel debuggers, whereby a spreadsheet of textual information may be constructed from data of varying types obtained from distributed locations. This spreadsheet is presented to the user in a window which is scrollable in four independent dimensions under keypad control. This extension is implemented in the Seeplane debugger for the Nculie/2.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133792901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processor-Time Tradeoffs for Cayley Graph Interconnection Networks","authors":"Marc Baumslagt, A. Rosenberg","doi":"10.1109/DMCC.1991.633348","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633348","url":null,"abstract":"We show that every processor array whose interconnection network is based on a Cayley graph of nonso that the graph's underlying group has a nontrivia size subgroup) can be emulated in a workpreserving manner, on general computations, by a (smaller) quotient array. If the underlying group has nontrivial snbgroups of several orders, one thus can choose among several matchups of time and hardware requirements. Our emulations gain efficiency when additional structural uniformity is present.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132642884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}