{"title":"Waveform Krylov Subspace Methods on a Massively Parallel Computer","authors":"W. Luk, O. Wing","doi":"10.1142/S0129053397000076","DOIUrl":"https://doi.org/10.1142/S0129053397000076","url":null,"abstract":"Recently, the waveform generalized minimal residual method (WGMRES) was proposed for solving differential-algebraic equations problems. Based on this, several waveform Krylov subspace methods are developed for comparison. Particularly, we propose using an adjoint operator for the waveform bi-conjugate gradient method and the waveform quasi-minimal residual method. The difficulties of developing the adjoint operator will be addressed. Furthermore, these methods are applied to solve a large sparse linear system of ordinary differential equations arising from a parabolic partial differential equation on a DECmpp 12000/Sx parallel computer for comparison. Numerical results show that the WGMRES method and the waveform bi-conjugate gradient stabilized method can achieve better performance than the conventional waveform relaxation methods.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131177437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Row Based Parallel Gaussian Elimination Algorithm for the Connection Machine CM-2","authors":"S. Noh, Soo-Mook Moon","doi":"10.1142/S0129053397000039","DOIUrl":"https://doi.org/10.1142/S0129053397000039","url":null,"abstract":"This paper presents an algorithm for the Gaussian elimination problem that reduces the length of the critical path compared to the algorithm of Lord et al. This is done by redefining the notion of a task. For all practical purposes, the issues of communication overhead and pivoting cannot be overlooked. We consider these issues for the new algorithm as well. Timing results of this algorithm as executed on the CM-2 model of the Connection Machine are presented. Another contribution of this paper is the use of logical pivoting for stable computation of the Gaussian elimination algorithm. Pivoting is essential in producing stable results. When pivoting occurs, an interchange of two rows is required. A physical interchange of the values can be avoided by providing a permutation vector in a globally accessible location. We show experimental results that substantiate the use of logical pivoting.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115928976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constant Time Algorithms for Graph Connectivity Problems on Reconfigurable Meshes Using Fewer Processors","authors":"T. Kao, S. Horng, Yi-Hong Guo","doi":"10.1142/S0129053396000215","DOIUrl":"https://doi.org/10.1142/S0129053396000215","url":null,"abstract":"This paper makes an efficient improvement of processor complexity while solving some connectivity problems on a reconfigurable meshes. We first derive two constant time algorithms in the proposed parallel processing system for computing the dominators and the dominator tree of an undirected graph either using a 3-D n×n×n processors or a 2-D n²×n² processors, where n is the number of vertices of the graph. Then based on the dominator tree algorithm, we also solve many other graph connectivity problems in a constant time. They are the articulation points, bridges, biconnected components, and bridge-connected components problem in undirected graphs.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134443804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Sorting Algorithms on a Hyper-Channel Broadcast Communication Model","authors":"Horng-Ren Tsai, S. Horng, Shung-Shing Lee, S. Tsai, T. Kao","doi":"10.1142/S0129053396000173","DOIUrl":"https://doi.org/10.1142/S0129053396000173","url":null,"abstract":"This paper presents a new improved architecture, named a hyper-channel broadcast communication model, as a computational model. The hyper-channel broadcast communication model consists of processors shared by some channels, and there are no local links between processors. Based on such an improved architecture, we first design two O(log N) cycles basic operations for finding the maximum/minimum of N real numbers and the ranks of a linked list using N and N×N processors, respectively. Then based on these proposed operations, two O(log N) cycles sorting algorithms are derived by either using N×N processors for the concurrent-write case or using N×N×N processors for the conflict-free case, respectively.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124906334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Three-Dimensional Incompressible Navier-Stokes Solver with a Parallel Multigrid Kernel","authors":"J. Lou, R. Ferraro","doi":"10.1142/S0129053396000185","DOIUrl":"https://doi.org/10.1142/S0129053396000185","url":null,"abstract":"The development and applications of a parallel, time-dependent, three-dimensional incompressible Navier-Stokes flow solver and a parallel multigrid elliptic kernel are described. The flow solver is based on a second-order projection method applied to a staggered finite-difference grid. The multigrid algorithms implemented in the parallel elliptic kernel, which is used by the flow solver, are V-cycle and full V-cycle schemes. A grid-partition strategy is used in the parallel implementations of both the flow solver and the multigrid kernel on all fine and coarse grids. Numerical experiments and parallel performance measurements show the parallel solver package is numerically stable, physically robust and computationally efficient. Both the multigrid kernel and the flow solver scale well to a large number of processors on Intel Paragon and Cray T3D/T3E for two- and three-dimensional problems with moderate granularity. The solver package has been carefully designed and implemented so that it can be easily adapted to solve a variety of interesting scientific and engineering flow problems. The code is portable to parallel computers that support MPI, PVM and NX for interprocessor communications.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130246339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Genetic Task Allocation Algorithm for Distributed Computing Systems Incorporating Problem Specific Knowledge","authors":"A. Tripathi, D. P. Vidyarthi, A. Mantri","doi":"10.1142/S0129053396000203","DOIUrl":"https://doi.org/10.1142/S0129053396000203","url":null,"abstract":"Distributed Computing Systems (DCS) promise a convenient platform for parallel processing and consequently can be expected to provide highly improved throughput and turnaround characteristics for all types of computing jobs. Task allocation in DCS remains to be an important and relevant problem attracting the attention of researchers in the discipline. Genetic Algorithms (GA) have successfully been used to solve various optimization problems. A GA based task allocation model for multiprocessors has been proposed by Hou, Ansari & Ren [3]. We present a Genetic Task Allocation Algorithm for DCS, wherein we have considered the underlying interconnection network of the processors, communication requirements among modules of the tasks apart from the precedence relation of the task graph that has been considered in [3] also. We have also considered multiprogramming at every processing nodes with related characteristic values. We have, intentionally, made use of the finding [4] that the incorporation of the problem specific knowledge in construction of GAs improves the initial population structures. The model and algorithm proposed by us is sufficiently simple and adequately usable for the purpose of simulation experiments and its possible incorporation in future operating systems of DCS.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117215913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Performance Analysis of Multistage Interconnection Networks Using a Recursive Multicast Algorithm","authors":"Jaehyung Park, H. Yoon","doi":"10.1142/S0129053396000197","DOIUrl":"https://doi.org/10.1142/S0129053396000197","url":null,"abstract":"In this paper, we study issues of the multicast communication in the multistage interconnection networks (MINs) for large-scale multicomputers. In addition to point-to-point communication among processing nodes, efficient collective communication is critical to the performance of multicomputers. Multicast communication in which the same packet is delivered from a source node to an arbitrary number of destination nodes is fundamental in supporting collective communication primitives including broadcast, reduction, and barrier synchronization operations. This paper presents a new approach to support multicast communication, on the basis of a restricted address encoding scheme which constructs a short fixed-size multicast header, and a recursive scheme that recycles a multicast packet one or more times through the MIN to send it to the desired destination nodes. We propose a recursive multicast algorithm which provides deadlock-freedom for multiple multicast packets in MIN-based multicomputers. We also present performance model for the unbuffered MIN using the multicast algorithm and analyze its performance in terms of the network throughput, where several multicast communications are considered. The proposed algorithm can be easily applied to wormhole or virtual cut-through MIN-based multicomputers.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126668034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance Adviser for the Development of Parallel Programs","authors":"Kei-Chun Li, Kang Zhang","doi":"10.1142/S0129053396000136","DOIUrl":"https://doi.org/10.1142/S0129053396000136","url":null,"abstract":"The increasing complexity of parallel computing systems has brought about a crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization become necessary parts of programming environments for parallel computers. In this paper we describe a tool — which we call the Performance Adviser — that offers two different levels of performance information (high level and low level), guides the users to specific problem areas in the source code, and suggests actions to improve the performance of their parallel programs. Working behind the Performance Adviser is an expert system which derives high level concepts from the source code and a critical path analysis metric which derives low level performance information from the performance data collected in the execution of the program.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125830227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Vectorizable Discrete Sampling Methods in Monte Carlo Applications","authors":"R. Sarno, V. Bhavsar, E. Hussein","doi":"10.1142/S0129053396000161","DOIUrl":"https://doi.org/10.1142/S0129053396000161","url":null,"abstract":"The performance of various vectorizable discrete random-sampling methods, along with the commonly used inverse sampling method, is assessed on a vector machine. Monte Carlo applications involving, one-dimensional, two-dimensional and multi-dimensional probability tables are used in the investigation. Various forms of the weighted sampling method and methods that transform the original probability table are examined. It is found that some form of weighted sampling is efficient, when the original probability distribution is not far from uniform or can be approximated analytically. Table transformation methods, though requiring additional memory storage, are best suited in applications where multidimensional tables are involved.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124165037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Linear Recurrence Equations onto Systolic Architectures","authors":"Ladan Kazerouni, B. Rajan, R. Shyamasundar","doi":"10.1142/S0129053396000148","DOIUrl":"https://doi.org/10.1142/S0129053396000148","url":null,"abstract":"In this paper, we describe a methodology for mapping normal linear recurrence equations onto a spectrum of systolic architectures. First, we provide a method for mapping a system of directed recurrence equations, a subclass of linear recurrence equations, onto a very general architecture referred to as basic systolic architecture and establish correctness of the implementation. We also show how efficient transformations/implementations of programs for different systolic architectures can be obtained through transformations such as projections and translations. Next, we show that the method can be applied for the class of normal linear recurrence equations using the method for the class of directed recurrence equations. Finally, we provide a completely automated procedure called cubization to achieve better performance while mapping such equations. The method is illustrated with examples and a comparative evaluation is made with other works.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116435125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}