{"title":"A block-circulant preconditioner for domain decomposition algorithm for the solution of the elliptic problems by second order finite elements","authors":"B. Kiss, G. Molnárka","doi":"10.1016/0956-0521(95)00039-9","DOIUrl":"10.1016/0956-0521(95)00039-9","url":null,"abstract":"<div><p>A preconditioned conjugate gradient domain decomposition method was given Refs 1 and 2 for the solution of a system of linear equations arising in the finite element method applied to the elliptic Dirichlet, Neumann and mixed boundary value problems. We have proved that the construction can be generalized<sup>2</sup> for higher order finite element method. Here we give a construction and theoretical investigation of preconditioners for second order finite elements. A method and the the results of calculation is given. The presented numerical experiments show that this preconditioner works well.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 369-376"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00039-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90824717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualising parallel numerical software performance on a shared memory multiprocessor","authors":"Pete Lee, Chris Phillips","doi":"10.1016/0956-0521(95)00034-8","DOIUrl":"10.1016/0956-0521(95)00034-8","url":null,"abstract":"<div><p>We consider here the use of a software package which can be used to monitor and visualise the behaviour, with respect to data accesses, of parallel software in a multiprocessor environment supporting shared memory. The purpose of such monitoring is two-fold: to aid in the understanding of the behaviour of a given algorithm, and to support the debugging of that software. To illustrate the use of this package we analyse its facilities in connection with a parallel implementation of block Gaussian elimination to solve a system of equations which arises when a certain spectral method is employed to solve an elliptic partial differential equation in two dimensions. We outline the method, indicate the synchronisation mechanisms which are necessary to ensure that the correct sequence of operations take place, and briefly describe the facilities provided by Encore Parallel Fortran which support these mechanisms. We then examine the facilities of the visualisation software and indicate how these were adapted to monitor accesses to a packed storage representation of a block sparse array. Finally we illustrate the use of the software in the context of the solution of a particular partial differential equation.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 351-356"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00034-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72963611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel implementation of an interactive ray-tracing algorithm","authors":"A.Augusto Sousa, F.Nunes Ferreira","doi":"10.1016/0956-0521(95)00037-2","DOIUrl":"10.1016/0956-0521(95)00037-2","url":null,"abstract":"<div><p>One of the most-used rendering algorithms in Computer Graphics is the Ray-Tracing. The “standard” (Whited like) Ray-Tracing is a good rendering algorithm but with a drawback: the time necessary to produce an image is too large (several hours of CPU time are necessary to make a good picture of a moderately sophisticated 3D scene) and the image is only ready to be observed at the end of processing. This kind of situation is difficult to accept in systems where interactivity is the first goal. “Increasing Realism” in Ray-Tracing tries to avoid the problem by supplying the user with a preview of the final image. This preview can be calculated in a considerably shorter time but permits that, with some margin of error, the user can imagine (even see, sometimes) some final effects. With more processing time the image quality continues improving without loss of previous results. The user can, at any time, interrupt the session if the image does not match what he wants. Simultaneously with the above idea, it is necessary to accelerate image production. Parallelism is then justified by the need of more processing power. The aim of this text is to describe the Interactive Ray-Tracing Algorithm implementation, using a parallel architecture based on Transputers. An overview of the architecture used is presented and the main parallel processes and related problems are discussed.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 409-414"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00037-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85788527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effects on communication of data representation of nested preconditionings for massively parallel architectures","authors":"J.C. Díaz, F. Pradeau","doi":"10.1016/0956-0521(95)00032-1","DOIUrl":"10.1016/0956-0521(95)00032-1","url":null,"abstract":"<div><p>The effect which the representation of the data (matrices and vectors) has on the communication patterns of preconditionings for exploitation of massively parallel architectures is discussed. Preconditioned iterative methods are used to solve the sparse linear systems generated by discretizations of partial differential equations in many areas of science and engineering. The preconditionings considered are based on nested incomplete factorization with approximate tridiagonal inverses using a two color line ordering of the discretization grid. These preconditionings can be described in terms of <em>vector-vector to vector</em> operations of dimension equal to half the total number of grid points.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 437-441"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00032-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75600762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent attribute evaluation","authors":"João Saraiva, Pedro Henriques","doi":"10.1016/0956-0521(95)00028-3","DOIUrl":"10.1016/0956-0521(95)00028-3","url":null,"abstract":"<div><p>This text presents an implementation of a concurrent attribute evaluator system. This system was developed with the main objective of allowing the implementation of several strategies of concurrent attribute evaluation and not to build a faster compiler to a specific case. The system is implemented in a tightly-coupled machine. One realistic compiler was built and the first results are discussed.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 451-457"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00028-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80472537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of a QR algorithm implementation on a multicluster of transputers","authors":"Fernando José Ferreira , Paulo B. Vasconcelos , Filomena D. d'Almeida","doi":"10.1016/0956-0521(95)00025-9","DOIUrl":"10.1016/0956-0521(95)00025-9","url":null,"abstract":"<div><p>Some results of an implementation of the QR factorization by Householder reflectors, on a multicluster transputer system with distributed memory are presented, that show how important is the communication time between processor in the performance of the algorithm. The QR factorization was chosen as test method because it is required for many real life applications, for instance in least squares problems. We use a version of Householder transformation that is the basis for numerically stable QR factorization. The machine used was the MultiCluster 2 model of Parsytec which is distributed memory system with 16 Inmos T800 processors. The Helios operating system was chosen because it provides transparency in CPU management. However it limits the sets of connecting topologies to be used. The results are presented in terms of speedup and efficiency, showing the importance of the communication time on the total elapsed time.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 363-367"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00025-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77247019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory optimization for parallel functional programs","authors":"Balaram Sinharoy , Boleslaw Szymanski","doi":"10.1016/0956-0521(95)00030-5","DOIUrl":"10.1016/0956-0521(95)00030-5","url":null,"abstract":"<div><p>Parallel functional languages use single valued variables to avoid semantically irrelevant data dependence constraints. Programs containing iterations that redefine variables in a procedural language have the corresponding variables declared with additional dimensions in a single assignment language. This extra temporal dimension, unless optimized, requires an exorbitant amount of memory and in parallel programs imposes a large delay between the data producer and consumers. For certain loop arrangements, a window containing a few elements of the dimension can be created. Usually, there are many ways for defining a loop arrangement in an implementation of a functional program and a trade-off between the memory saving and the needed level of parallelism has to be taken into account when selecting the implementation. In this paper we prove that the problem of determining the best loop arrangement by partitioning the dependence graph is NP-hard. In addition, we describe a heuristic for solving this problem. Finally, we present examples of parallel functional programs in which the memory optimization results in reducing the local and shared memory requirements and communication delays.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 415-422"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00030-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80862790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient solution of fluid flow using the generalised conjugate grandient algorithm on a transputer-based machine","authors":"B.A. Tanyi, R.W. Thatcher","doi":"10.1016/0956-0521(95)00026-7","DOIUrl":"10.1016/0956-0521(95)00026-7","url":null,"abstract":"<div><p>The discretisation of the equations governing fluid flow gives rise to coupled, quasi-linear and non-symmetric systems. The solution is usually obtained by iteration using a guess-and-correct procedure where each iteration aims to improve the solution of the previous step. Each step or outer iteration of the process involves the solution of nominally linear algebraic systems. These systems are normally solved using methods based on the Gauss-Seidel iteration—such as the TDMA. However, these methods generally converge very slowly and can be very time consuming for realistic applications. In this paper, these equations are solved using the Generalised Conjugate Gradient (GCG) algorithm with a simple-to-implement Gauss-Seidel-based preconditioner on a distributed memory message-passing machine. We take advantage of the fact that only tentative improvements to the flow-field are sought during each iteration and study the convergence behaviour of the parallel implementation on a multi-processor environment.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 319-324"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00026-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80074528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aurora vs. Muse: a portability study of two or-parallel Prolog systems","authors":"Manuel Eduardo Correia , Fernando M.A. Silva, Vítor Santos Costa","doi":"10.1016/0956-0521(95)00042-9","DOIUrl":"10.1016/0956-0521(95)00042-9","url":null,"abstract":"<div><p>Prolog programs have explicit parallelism, that is, parallelism which can be exploited by a machine with minimal user effort. Or-parallelism is one such form of parallelism, and is particularly useful in that it is present in the many Prolog applications where several alternatives need to be considered. Or-parallelism has been exploited successfully in several systems, and especially in the Aurora and Muse systems. In this paper we analyze the portability of these two parallel systems onto a commercial shared memory parallel computer, a Sun SPARCcenter 2000 with 8 processors, running the Solaris 2.2 Operating System. We also analyze both systems' performance for classical benchmark programs and for two large Prolog applications.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 345-349"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00042-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73563687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Columnwise block LU factorization using blas kernels on VAX 6520/2VP","authors":"Paulo B. Vasconcelos , Filomena D. D'Almeida","doi":"10.1016/0956-0521(95)00049-6","DOIUrl":"10.1016/0956-0521(95)00049-6","url":null,"abstract":"<div><p>The LU factorization of a matrix <em>A</em> is a widely used algorithm, for instance in the solution of linear systems <em>Ax</em> = <em>b</em>. The increasing capacities of high performance computers allow us to use direct methods for systems of large and dense matrices. To build portable and efficient LU codes for vector and parallel computers, this method is rewritten in block versions and BLAS (Basic Linear Algebra Subprograms) kernels are used to mask the architectural details and allow good performance of codes such as the LAPACK (Linear Algebra PACKage) library. In the references it was proved that this strategy leads to portability and efficiency of codes using tuned BLAS kernels. After a short description of the block versions we will present some results obtained on the VAX 6520/2VP, comparing the block algorithm versus point algorithm, and vectorized versions versus scalar versions. The three columnwise versions of the block algorithm showed similar performance for this computer and large matrix dimensions. The block size used is a crucial parameter for these algorithms and the results show that the best performance is obtained with block size 64 (for large matrices) which is the vector registered size of the machine used.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 423-429"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00049-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85349666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}