{"title":"On the h- and p-Versions of the Extrapolated Gordon's Projector with Applications to Elliptic Equations","authors":"J. Hennart, E. Mund","doi":"10.1137/0909052","DOIUrl":"https://doi.org/10.1137/0909052","url":null,"abstract":"This paper outlines a new application of Gordon's blending technique for the finite element approximation of elliptic boundary value problems. The algebraic structure of the discrete blended interpolation projector with its first- (or coarse-) and second- (or fine-) discretization levels suggests making linear combinations of independent calculations. For the interpolation of smooth data, each of the separate calculations yields a low-order result while their combination gives a high-order result. It is conjectured that this property holds for the approximation of regular boundary value problems (BVPs). The algorithm might therefore be viewed as an extrapolation procedure. Two different versions of the algorithm are proposed, which relate to the so-called h- and p-versions of finite element approximations. The computational complexities compare favorably with classical schemes. The implementation on parallel computers is discussed. Numerical results for some bivariate problems (both regular and singular) are presented. They indicate that for smooth problems, the algorithms behave as expected.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121620377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Optimal Circulant Preconditioner for Toeplitz Systems","authors":"T. Chan","doi":"10.1137/0909051","DOIUrl":"https://doi.org/10.1137/0909051","url":null,"abstract":"Given a Toeplitz matrix A, we derive an optimal circulant preconditioner C in the sense of minimizing ${\\|C - A\\|}_F$. It is in general different from the one proposed earlier by Strang [\"A proposal for Toeplitz matrix calculations,\" Stud. Appl. Math., 74(1986), pp. 171–176], except in the case when A is itself circulant. The new preconditioner is easy to compute and in preliminary numerical experiments performs better than Strang's preconditioner in terms of reducing the condition number of $C^{-1}A$ and comparably in terms of clustering the spectrum around unity.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124564760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures","authors":"G. Geist, C. Romine","doi":"10.1137/0909042","DOIUrl":"https://doi.org/10.1137/0909042","url":null,"abstract":"In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126754114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel and Vector Variant of the Cyclic Reduction Algorithm","authors":"R. Sweet","doi":"10.1137/0909050","DOIUrl":"https://doi.org/10.1137/0909050","url":null,"abstract":"The Buneman variant of the block cyclic reduction algorithm begins as a highly parallel algorithm, but collapses with each reduction to a very serial one. Using partial fraction expansions of rational matrix functions, it is shown how to regain the parallelism. The resulting algorithm using $n^2$ processors runs in $O(\\log^2 n)$ time.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121974716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Error Estimation in the Method of Lines for Parabolic Equations","authors":"M. Berzins","doi":"10.1137/0909045","DOIUrl":"https://doi.org/10.1137/0909045","url":null,"abstract":"A method is described for obtaining an indication of the error in the numerical solution of parabolic partial differential equations using the method of lines. The error indicator is derived by using a combination of existing global error estimating algorithms for initial value problems in ordinary differential equations with estimates for the PDE truncation error. An implementation of the algorithm is described and numerical examples are used to illustrate the reliability of the error estimates that are obtained.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114630911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Adaptive Multipole Algorithm for Particle Simulations","authors":"J. Carrier, L. Greengard, V. Rokhlin","doi":"10.1137/0909044","DOIUrl":"https://doi.org/10.1137/0909044","url":null,"abstract":"This paper describes an algorithm for the rapid evaluation of the potential and force fields in systems involving large numbers of particles whose interactions are described by Coulomb's law. Unlike previously published schemes, the algorithm of this paper has an asymptotic CPU time estimate of $O(N)$, where N is the number of particles in the simulation, and does not depend on the statistics of the distribution for its efficient performance. The numerical examples we present indicate that it should be an algorithm of choice in many situations of practical interest.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125346761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite-Element Methods for the Streamfunction-Vorticity Equations","authors":"M. Gunzburger, J. Peterson","doi":"10.1137/0909043","DOIUrl":"https://doi.org/10.1137/0909043","url":null,"abstract":"Finite-element methods for the approximation of the solution of streamfunction-vorticity equations are considered. Among the issues dealt with are multiply connected domains, the use of low-order elements, the incorporation of a variety of boundary conditions into the methodology, error estimates, and the recovery of the primitive variables. Various numerical examples are also provided.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125223825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of Parallel Methods for a $1024$-Processor Hypercube","authors":"J. Gustafson, G. Montry, R. Benner","doi":"10.1137/0909041","DOIUrl":"https://doi.org/10.1137/0909041","url":null,"abstract":"We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125131717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Algorithm for the MSAE and the MMAE Regression Problems","authors":"S. Narula, J. F. Wellington","doi":"10.1137/0909047","DOIUrl":"https://doi.org/10.1137/0909047","url":null,"abstract":"In the past quarter century, the minimum sum of absolute errors (MSAE) regression and the minimization of the maximum absolute error (MMAE) regression have attracted much attention as alternatives to the popular least squares regression. For the multiple linear regression, the MSAE and the MMAE regression problems can be formulated and solved as linear programming problems. Several efficient special purpose algorithms have been developed for each problem. Thus one needs two different algorithms and two separate computer codes to find the MSAE and the MMAE regression equations. In this paper, we develop an efficient algorithm to solve both problems. The proposed algorithm exploits the special structure of and the similarities between the problems. We illustrate it with a simple numerical example.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128401847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Direct Solution of Weighted and Equality Constrained Least-Squares Problems","authors":"J. Barlow, S. Handy","doi":"10.1137/0909046","DOIUrl":"https://doi.org/10.1137/0909046","url":null,"abstract":"We consider methods to solve two closely related linear least-squares problems. The first problem is that of minimizing ${\\|f - Ex\\|}_2$ subject to the constraint $Cx = g$. We call this the equality-constrained least-squares (LSE) problem. The second is that of minimizing $\\left\\| \\left( \\begin{array}{c} \\tau g \\\\ f \\end{array} \\right) - \\left( \\begin{array}{c} \\tau C \\\\ E \\end{array} \\right) x \\right\\|_2$ for some large weight $\\tau$. This second problem is called the weighted least-squares (WLS) problem. A column-pivoting strategy based entirely upon the constraint matrix C is developed for solving the WLS problem. This strategy allows the user to perform the factorization of $\\left( \\begin{array}{c} \\tau C \\\\ E \\end{array} \\right)$ in a stable fashion while needing to access no more than one row of E at a time. Moreover, if the matrix E is changed without changing its sparsity pattern or the matrix C, then the pivoting need not be redone; we can simply reuse the same column ordering. This kind of computation frequently arises in optimization contexts. An error analysis of the method is presented. It is shown to be closely related to the error analysis of a procedure attributed to Björck and Golub for solving the LSE problem. The sparsity properties of the algorithm are demonstrated on some Harwell test matrices.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127988839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}