{"title":"A block-circulant preconditioner for domain decomposition algorithm for the solution of the elliptic problems by second order finite elements","authors":"B. Kiss, G. Molnárka","doi":"10.1016/0956-0521(95)00039-9","DOIUrl":"10.1016/0956-0521(95)00039-9","url":null,"abstract":"<div><p>A preconditioned conjugate gradient domain decomposition method was given Refs 1 and 2 for the solution of a system of linear equations arising in the finite element method applied to the elliptic Dirichlet, Neumann and mixed boundary value problems. We have proved that the construction can be generalized<sup>2</sup> for higher order finite element method. Here we give a construction and theoretical investigation of preconditioners for second order finite elements. A method and the the results of calculation is given. The presented numerical experiments show that this preconditioner works well.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 369-376"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00039-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90824717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualising parallel numerical software performance on a shared memory multiprocessor","authors":"Pete Lee, Chris Phillips","doi":"10.1016/0956-0521(95)00034-8","DOIUrl":"10.1016/0956-0521(95)00034-8","url":null,"abstract":"<div><p>We consider here the use of a software package which can be used to monitor and visualise the behaviour, with respect to data accesses, of parallel software in a multiprocessor environment supporting shared memory. The purpose of such monitoring is two-fold: to aid in the understanding of the behaviour of a given algorithm, and to support the debugging of that software. To illustrate the use of this package we analyse its facilities in connection with a parallel implementation of block Gaussian elimination to solve a system of equations which arises when a certain spectral method is employed to solve an elliptic partial differential equation in two dimensions. We outline the method, indicate the synchronisation mechanisms which are necessary to ensure that the correct sequence of operations take place, and briefly describe the facilities provided by Encore Parallel Fortran which support these mechanisms. We then examine the facilities of the visualisation software and indicate how these were adapted to monitor accesses to a packed storage representation of a block sparse array. Finally we illustrate the use of the software in the context of the solution of a particular partial differential equation.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 351-356"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00034-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72963611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel implementation of an interactive ray-tracing algorithm","authors":"A.Augusto Sousa, F.Nunes Ferreira","doi":"10.1016/0956-0521(95)00037-2","DOIUrl":"10.1016/0956-0521(95)00037-2","url":null,"abstract":"<div><p>One of the most-used rendering algorithms in Computer Graphics is the Ray-Tracing. The “standard” (Whited like) Ray-Tracing is a good rendering algorithm but with a drawback: the time necessary to produce an image is too large (several hours of CPU time are necessary to make a good picture of a moderately sophisticated 3D scene) and the image is only ready to be observed at the end of processing. This kind of situation is difficult to accept in systems where interactivity is the first goal. “Increasing Realism” in Ray-Tracing tries to avoid the problem by supplying the user with a preview of the final image. This preview can be calculated in a considerably shorter time but permits that, with some margin of error, the user can imagine (even see, sometimes) some final effects. With more processing time the image quality continues improving without loss of previous results. The user can, at any time, interrupt the session if the image does not match what he wants. Simultaneously with the above idea, it is necessary to accelerate image production. Parallelism is then justified by the need of more processing power. The aim of this text is to describe the Interactive Ray-Tracing Algorithm implementation, using a parallel architecture based on Transputers. An overview of the architecture used is presented and the main parallel processes and related problems are discussed.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 409-414"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00037-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85788527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effects on communication of data representation of nested preconditionings for massively parallel architectures","authors":"J.C. Díaz, F. Pradeau","doi":"10.1016/0956-0521(95)00032-1","DOIUrl":"10.1016/0956-0521(95)00032-1","url":null,"abstract":"<div><p>The effect which the representation of the data (matrices and vectors) has on the communication patterns of preconditionings for exploitation of massively parallel architectures is discussed. Preconditioned iterative methods are used to solve the sparse linear systems generated by discretizations of partial differential equations in many areas of science and engineering. The preconditionings considered are based on nested incomplete factorization with approximate tridiagonal inverses using a two color line ordering of the discretization grid. These preconditionings can be described in terms of <em>vector-vector to vector</em> operations of dimension equal to half the total number of grid points.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 437-441"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00032-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75600762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent attribute evaluation","authors":"João Saraiva, Pedro Henriques","doi":"10.1016/0956-0521(95)00028-3","DOIUrl":"10.1016/0956-0521(95)00028-3","url":null,"abstract":"<div><p>This text presents an implementation of a concurrent attribute evaluator system. This system was developed with the main objective of allowing the implementation of several strategies of concurrent attribute evaluation and not to build a faster compiler to a specific case. The system is implemented in a tightly-coupled machine. One realistic compiler was built and the first results are discussed.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 451-457"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00028-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80472537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of a QR algorithm implementation on a multicluster of transputers","authors":"Fernando José Ferreira , Paulo B. Vasconcelos , Filomena D. d'Almeida","doi":"10.1016/0956-0521(95)00025-9","DOIUrl":"10.1016/0956-0521(95)00025-9","url":null,"abstract":"<div><p>Some results of an implementation of the QR factorization by Householder reflectors, on a multicluster transputer system with distributed memory are presented, that show how important is the communication time between processor in the performance of the algorithm. The QR factorization was chosen as test method because it is required for many real life applications, for instance in least squares problems. We use a version of Householder transformation that is the basis for numerically stable QR factorization. The machine used was the MultiCluster 2 model of Parsytec which is distributed memory system with 16 Inmos T800 processors. The Helios operating system was chosen because it provides transparency in CPU management. However it limits the sets of connecting topologies to be used. The results are presented in terms of speedup and efficiency, showing the importance of the communication time on the total elapsed time.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 363-367"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00025-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77247019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory optimization for parallel functional programs","authors":"Balaram Sinharoy , Boleslaw Szymanski","doi":"10.1016/0956-0521(95)00030-5","DOIUrl":"10.1016/0956-0521(95)00030-5","url":null,"abstract":"<div><p>Parallel functional languages use single valued variables to avoid semantically irrelevant data dependence constraints. Programs containing iterations that redefine variables in a procedural language have the corresponding variables declared with additional dimensions in a single assignment language. This extra temporal dimension, unless optimized, requires an exorbitant amount of memory and in parallel programs imposes a large delay between the data producer and consumers. For certain loop arrangements, a window containing a few elements of the dimension can be created. Usually, there are many ways for defining a loop arrangement in an implementation of a functional program and a trade-off between the memory saving and the needed level of parallelism has to be taken into account when selecting the implementation. In this paper we prove that the problem of determining the best loop arrangement by partitioning the dependence graph is NP-hard. In addition, we describe a heuristic for solving this problem. Finally, we present examples of parallel functional programs in which the memory optimization results in reducing the local and shared memory requirements and communication delays.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 415-422"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00030-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80862790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient solution of fluid flow using the generalised conjugate grandient algorithm on a transputer-based machine","authors":"B.A. Tanyi, R.W. Thatcher","doi":"10.1016/0956-0521(95)00026-7","DOIUrl":"10.1016/0956-0521(95)00026-7","url":null,"abstract":"<div><p>The discretisation of the equations governing fluid flow gives rise to coupled, quasi-linear and non-symmetric systems. The solution is usually obtained by iteration using a guess-and-correct procedure where each iteration aims to improve the solution of the previous step. Each step or outer iteration of the process involves the solution of nominally linear algebraic systems. These systems are normally solved using methods based on the Gauss-Seidel iteration—such as the TDMA. However, these methods generally converge very slowly and can be very time consuming for realistic applications. In this paper, these equations are solved using the Generalised Conjugate Gradient (GCG) algorithm with a simple-to-implement Gauss-Seidel-based preconditioner on a distributed memory message-passing machine. We take advantage of the fact that only tentative improvements to the flow-field are sought during each iteration and study the convergence behaviour of the parallel implementation on a multi-processor environment.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 319-324"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00026-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80074528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aurora vs. Muse: a portability study of two or-parallel Prolog systems","authors":"Manuel Eduardo Correia , Fernando M.A. Silva, Vítor Santos Costa","doi":"10.1016/0956-0521(95)00042-9","DOIUrl":"10.1016/0956-0521(95)00042-9","url":null,"abstract":"<div><p>Prolog programs have explicit parallelism, that is, parallelism which can be exploited by a machine with minimal user effort. Or-parallelism is one such form of parallelism, and is particularly useful in that it is present in the many Prolog applications where several alternatives need to be considered. Or-parallelism has been exploited successfully in several systems, and especially in the Aurora and Muse systems. In this paper we analyze the portability of these two parallel systems onto a commercial shared memory parallel computer, a Sun SPARCcenter 2000 with 8 processors, running the Solaris 2.2 Operating System. We also analyze both systems' performance for classical benchmark programs and for two large Prolog applications.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 345-349"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00042-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73563687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Columnwise block LU factorization using blas kernels on VAX 6520/2VP","authors":"Paulo B. Vasconcelos , Filomena D. D'Almeida","doi":"10.1016/0956-0521(95)00049-6","DOIUrl":"10.1016/0956-0521(95)00049-6","url":null,"abstract":"<div><p>The LU factorization of a matrix <em>A</em> is a widely used algorithm, for instance in the solution of linear systems <em>Ax</em> = <em>b</em>. The increasing capacities of high performance computers allow us to use direct methods for systems of large and dense matrices. To build portable and efficient LU codes for vector and parallel computers, this method is rewritten in block versions and BLAS (Basic Linear Algebra Subprograms) kernels are used to mask the architectural details and allow good performance of codes such as the LAPACK (Linear Algebra PACKage) library. In the references it was proved that this strategy leads to portability and efficiency of codes using tuned BLAS kernels. After a short description of the block versions we will present some results obtained on the VAX 6520/2VP, comparing the block algorithm versus point algorithm, and vectorized versions versus scalar versions. The three columnwise versions of the block algorithm showed similar performance for this computer and large matrix dimensions. The block size used is a crucial parameter for these algorithms and the results show that the best performance is obtained with block size 64 (for large matrices) which is the vector registered size of the machine used.</p></div>","PeriodicalId":100325,"journal":{"name":"Computing Systems in Engineering","volume":"6 4","pages":"Pages 423-429"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0956-0521(95)00049-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85349666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}