{"title":"Deriving structured parallel implementations for numerical methods","authors":"Thomas Rauber, Gudula Rünger","doi":"10.1016/0165-6074(96)00007-5","DOIUrl":"10.1016/0165-6074(96)00007-5","url":null,"abstract":"<div><p>The numerical solution of differential equations is an important problem in the natural sciences and engineering. But the computational effort to find a solution with the desired accuracy is usually quite large. This suggests the use of powerful parallel machines which often use a distributed memory organization. In this article, we present a parallel programming methodology to derive structured parallel implementations of numerical methods that exhibit two levels of potential parallelism, a coarse-grain method parallelism and a medium grain parallelism on data or systems. The derivation process is subdivided into three stages: The first stage identifies the potential for parallelism in the numerical method, the second stage fixes the implementation decisions for a parallel program and the third stage derives the parallel implementation for a specific parallel machine. The derivation process is supported by a group-SPMD computational model that allows the prediction of runtimes for a specific parallel machine. This enables the programmer to test different alternatives and to implement only the most promising one. We give several examples for the derivation of parallel implementations and of the performance prediction. Experiments on an Intel iPSC/860 confirm the accuracy of the runtime predictions. 
The parallel programming methodology separates the software issues from the architectural details, enables the design of well-structured, reusable and portable software and supplies a formal basis for automatic support.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 589-608"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00007-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132860160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation and optimization in low-cost cellular SIMD systems","authors":"Alberto Broggi , Francesco Gregoretti","doi":"10.1016/0165-6074(96)00008-7","DOIUrl":"10.1016/0165-6074(96)00008-7","url":null,"abstract":"<div><p>Low-cost massively parallel architectures are generally characterized by a number of processors which is often far lower that the size of the data set, and by a limited amount of memory owned by each Processing Element. As a consequence, low-cost mesh-connected architectures can utilize only a specific processor virtualization mechanism which is based on the sequential scanning of the data set stored in an external memory. As a consequence of this virtualization mechanism, applications must be developed according to some precise criteria. This paper presents the optimization of some key parameters for the improvement of system performance. These optimizations are validated through an image processing case study.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 659-678"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00008-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116159906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scope: An extensible interactive environment for the performance evaluation of parallel systems","authors":"Yves Arrouye","doi":"10.1016/0165-6074(96)00003-8","DOIUrl":"10.1016/0165-6074(96)00003-8","url":null,"abstract":"<div><p>This paper presents Scope, an environment for the performance analysis of parallel systems based on the analysis of execution traces. Scope's design stresses scalability and easy extensibility. It does encourage interactive and non-linear exploration of the studied system's execution.</p><p>We first explain our motivation for developing yet another performance evaluation tool, and see what the strong points of our environment are; we then give a non-technical, high-level overview of the design of some of the most interesting features of Scope and the current realizations. This presentation ends with some perspectives on the developments and experiments that will be done in the immediate future.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 609-623"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00003-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132703642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling of optimal load balancing strategy using queueing theory","authors":"François Spies","doi":"10.1016/0165-6074(95)00006-2","DOIUrl":"10.1016/0165-6074(95)00006-2","url":null,"abstract":"<div><p>The aim of this article is to present an original modeling of dynamic load balancing, using queueing theory and probabilities. After briefly presenting the dynamic load balancing techniques, we model the <em>optimal</em> strategy. We verify the analytical results by using simulation techniques. This modeling method is applicable to other strategies, incorporating a greater number of variables. The analysis of the results obtained by the optimal model allows us to progress to the elaboration of other strategies to improve load balancing efficiency.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 555-570"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00006-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132246636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing parallel programs by the graphical language GRAPNEL","authors":"Péter Kacsuk, Gábor Dózsa, Tibor Fadgyas","doi":"10.1016/0165-6074(96)00005-1","DOIUrl":"10.1016/0165-6074(96)00005-1","url":null,"abstract":"<div><p>We propose a new visual programming language, called GRAPNEL (GRAphical Process's NEt Language), for designing distributed parallel programs based on the message passing programming paradigm. GRAPNEL supports graphically the Process Group abstraction and the automatic generation of several regular process topology based on predefined topology templates. Dynamic process creation and destruction are possible but can be applied only in a well structured manner.</p><p>GRAPNEL is a hybrid language, where the communication related parts of the program are described using graphical symbols but textual descriptions are applied where they are more appropriate. The first prototype of the GRAPNEL programming environment uses the PVM as the basis of the message passing mechanism. Textual program parts can be written in standard C. Other message passing libraries (e.g. MPI) and ordinary textual languages (e.g. FORTRAN) are to be supported in the future.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 625-643"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00005-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115443053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting partial replication in unbalanced parallel loop scheduling on multicomputer","authors":"Salvatore Orlando , Raffaele Perego","doi":"10.1016/0165-6074(96)00002-6","DOIUrl":"10.1016/0165-6074(96)00002-6","url":null,"abstract":"<div><p>We consider the problem of scheduling parallel loops whose iterations operate on large array data structures and are characterized by highly varying execution times (<em>unbalanced or non-uniform</em> parallel loops). A general parallel loop implementation template for message-passing distributed-memory multiprocessors (<em>multicomputers</em>) is presented. Assuming that it is impossible to statically determine the distribution of the computational load on the data accessed, the template exploits a hybrid scheduling strategy. The data are partially replicated on the processor's local memories and iterations are statically scheduled until first load imbalances are detected. At this point an effective dynamic scheduling technique is adopted to move iterations among nodes holding the same data. Most of the communications needed to implement dynamic load balancing are overlapped with computations, as a very effective prefetching policy is adopted. The template scales very well, since knowing where data are replicated makes it possible to balance the load without introducing high overheads.</p><p>In the paper a formal characterization of load imbalance related to a generic problem instance is also proposed. 
This characterization is used to derive an analytical cost model for the template, and in particular, to tune those parameters of the template that depend on the costs related to the specific features of the target machine and the specific problem.</p><p>The template and the related cost model are validated by experiments conducted on a 128-node nCUBE 2, whose results are reported and discussed.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 645-658"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00002-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125448052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel systems engineering","authors":"Peter Milligan, Stephen Winter","doi":"10.1016/S0165-6074(96)90000-9","DOIUrl":"10.1016/S0165-6074(96)90000-9","url":null,"abstract":"","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 523-524"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0165-6074(96)90000-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129130444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two-level programming strategy for distributed systems","authors":"D. Conde, R. Menéndez, M. González Harbour, J.A. Gregorio","doi":"10.1016/0165-6074(95)00032-1","DOIUrl":"10.1016/0165-6074(95)00032-1","url":null,"abstract":"<div><p>In this paper we present a global approach for programming distributed multiprocessor systems. In this approach, applications are developed as a global parallel program that is independent of the particular hardware architecture, and is represented through an extended Petri net model. The building blocks for the global program are tasks that are implemented using standard programming languages. A highly automated tool is used to allocate the different tasks to processing nodes in a near-optimum way, minimizing message traffic in the interconnection network and balancing the execution workload in the different nodes. The combined use of this tool with analysis and simulation tools for Petri nets allows us to obtain information about the performance and behavior of the global program. The tool divides the original extended Petri net into several subnets that are distributed among the different nodes, and provides for the installation, execution, and monitoring of the program. 
An example is presented in which our programming strategy is compared to PVM, which is a widely extended software tool for the distribution of programs in a network of computers.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 541-554"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00032-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134071098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From transformations to methodology in parallel program development: A case study","authors":"Sergei Gorlatch","doi":"10.1016/0165-6074(96)00004-X","DOIUrl":"10.1016/0165-6074(96)00004-X","url":null,"abstract":"<div><p>The Bird-Meertens formalism (BMF) of higher-order functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straight-forward mathematical specification and arriving at the target processor topology together with a program for each processor of it. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc. are governed by formal type analysis and performance estimation rather than made <em>ad hoc</em>. The parallel target implementation is parameterized for an arbitrary number of processors; for the particular number, the target program is both time and cost-optimal. We compare our results with systolic solutions to polynomial multiplication.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 8","pages":"Pages 571-588"},"PeriodicalIF":0.0,"publicationDate":"1996-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(96)00004-X","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125315894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}