{"title":"A Portable Software Architecture for Mesh-Independent Particle Tracking Algorithms","authors":"Jing-Ru C. Cheng, Mark T. Jones, P. Plassmann","doi":"10.1080/10637190410001725472","DOIUrl":"https://doi.org/10.1080/10637190410001725472","url":null,"abstract":"Particle tracking methods are central to a wide spectrum of scientific computing applications. To support such applications, this paper presents a compact software architecture that can be used to interface parallel particle tracking software to computational mesh management systems. A detailed description is presented of the in-element particle tracking framework supported by this software architecture, a framework that encompasses most particle tracking applications. The use of this parallel software architecture is illustrated through the implementation of two differential equation solvers, the forward Euler method and an implicit trapezoidal method, on a distributed, unstructured computational mesh. A design goal of this software effort has been to interface to software libraries such as Scalable Unstructured Mesh Algorithms and Applications (SUMAA3d) in addition to application codes (e.g. FEMWATER). This goal of portability is achieved through a software architecture that specifies a lightweight functional interface that maintains the full functionality required by particle–mesh methods. The use of this approach in parallel programming environments written in C and Fortran is demonstrated.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122996404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPJava: Programming Support for High-Performance Grid-Enabled Applications","authors":"Han-Ku Lee, Bryan Carpenter, Geoffrey C. Fox, S. Lim","doi":"10.1080/10637190410001725481","DOIUrl":"https://doi.org/10.1080/10637190410001725481","url":null,"abstract":"The paper begins by considering what a grid computing environment might be, why it is needed, and how the authors' HPspmd programming model fits into this picture. We then review our HPJava environment as a contribution towards programming support for high-performance grid-enabled environments. Future grid computing systems will need to provide programming models. In a proper programming model for grid-enabled environments and applications, high performance on multi-processor systems is a critical issue. We describe the features of HPJava, including its run-time communication library, compilation strategies and optimization schemes. Through experiments, we compare HPJava programs against FORTRAN and ordinary Java programs. We aim to demonstrate that HPJava can be used “anywhere”: not only for high-performance parallel computing, but also for grid-enabled applications.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121584508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the double-vertex-cycle-connectivity of crossed cubes","authors":"Xiaofan Yang, G. Megson","doi":"10.1080/1063719042000208836","DOIUrl":"https://doi.org/10.1080/1063719042000208836","url":null,"abstract":"The crossed cube, a variant of the hypercube, is a candidate interconnection network topology for parallel computing systems because it has nearly half the diameter of the hypercube and stronger subgraph embedding capabilities. The existence of various cycles (rings) in an interconnection network is essential for parallel algorithms that communicate data in token-ring mode. This paper addresses the existence of cycles with some specified properties in an n-dimensional crossed cube, CQ_n. We first propose the notion of double-vertex-cycle-connectivity for a graph, which provides a new measure of the cycle embedding capability of the graph. We then prove that, for any two distinct vertices on CQ_n at a distance of d apart and each integer l satisfying CQ_n contains a cycle of length l that goes through the two vertices. Since the hypercube does not share these properties, the crossed cube shows stronger cycle embedding capability than the hypercube.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117202140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tools for Regularizing Array Designs","authors":"M. Manjunathaiah, G. Megson","doi":"10.1080/1063719042000208854","DOIUrl":"https://doi.org/10.1080/1063719042000208854","url":null,"abstract":"The theory of synthesis for designing regular array architectures has been established for some time, and design tools that automate some of the design steps have been developed. However, the design process is complicated by the lack of high-level tools for regularizing a design. This article presents a tool for regularizing systems of affine recurrence equations (SARE) into uniform recurrence format. Such formats are suitable for direct application of synthesis techniques for designing regular array architectures. The main difficulties in regularizing a design, such as choosing regularization vectors and verifying the consistency of the transformed system, are overcome through a set of high-level transformations. These transformations, which are currently lacking in design environments, simplify the designer's task of specifying algorithms for synthesizing regular arrays. Examples are presented to illustrate the use of these high-level transformations in regularizing array designs.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130336256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel four points modified explicit group algorithm on shared memory multiprocessors","authors":"M. Othman, A. Abdullah, D. J. Evans","doi":"10.1080/1063719042000208818","DOIUrl":"https://doi.org/10.1080/1063719042000208818","url":null,"abstract":"The four points modified explicit group (MEG) method for solving the 2D Poisson equation was introduced by Othman and Abdullah [“An efficient Four Points Modified Explicit Group Poisson Solver”, Int. J. Comput. Math., 76 (2000) 203–217] and was shown to be superior to the four points explicit decoupled group (EDG) and explicit group (EG) methods due to Abdullah [“The Four Explicit Decoupled Group (EDG) Method: A Fast Poisson Solver”, Int. J. Comput. Math., 38 (1991) 60–70] and Evans and Biggins [“The solution of elliptic partial differential equations by A New Block Over-Relaxation Technique”, Int. J. Comput. Math., 10 (1982) 269–282], respectively. These methods were found to be suitable for parallel implementation [see Evans, D.J. and Yousif, W.S. “The implementation of the Explicit Block Iterative Methods on the balance 8000 parallel computer”, Parallel Computing, 16 (1990) 81–97; Yousif, W.S. and Evans, D.J. “Explicit De-coupled Group Iterative Methods and their parallel implementations”, Parallel Algorithms and Applications, 7 (1995) 53–71]. In this paper, the implementation of the parallel four points MEG algorithm with the red-black and four-color ordering strategies for solving the same equation on shared memory multiprocessors is presented. Experimental results for the test problem are included and compared with those of the parallel four points EG and EDG algorithms.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130591325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"O(log⁴ n) time parallel maximal matching algorithm using linear number of processors","authors":"Alak Kumar Datta, Ranjan Kumar Sen","doi":"10.1080/1063719042000208827","DOIUrl":"https://doi.org/10.1080/1063719042000208827","url":null,"abstract":"Computing a maximal matching of a graph having n vertices in parallel within time using a linear number of processors on the EREW-PRAM was an open problem [Karp, R.M. and Ramachandran, V. (1990) “Parallel algorithms for shared-memory machines”, In: van Leeuwen, J., ed., Handbook of Theoretical Computer Science, Vol. A, pp 869–941]. In this paper, we resolve this by presenting a parallel algorithm on the EREW-PRAM that works in time using O(n) processors, where Δ(G) is the maximum degree of the graph.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122504736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient binary and grey level morphological operations on a massively parallel processor","authors":"Andreas I. Svolos, C. Konstantopoulos, C. Kaklamanis","doi":"10.1080/1063719042000208845","DOIUrl":"https://doi.org/10.1080/1063719042000208845","url":null,"abstract":"One of the most important features in image analysis and understanding is shape. Mathematical morphology is the branch of image processing that deals with shape analysis. All morphological transformations are defined in terms of two primitive operations, dilation and erosion. Since many applications require morphological problems to be solved in real time, research into time-efficient algorithms for these two operations is crucial. In this paper, efficient algorithms for binary as well as grey level dilation and erosion are presented and evaluated for an advanced associative processor. Simulation results show that this architecture is near optimal in the binary case and is as efficient as an array processor with a 2D-mesh interconnection in the grey level case. Finally, it is proven that the implementation of this image processing machine is economically feasible.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115505083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible parallelization of fast wavelet transforms","authors":"J. Ford, Ke Chen, N. Ford","doi":"10.1080/10637190310001633637","DOIUrl":"https://doi.org/10.1080/10637190310001633637","url":null,"abstract":"In this paper, we present a new parallel algorithm for fast wavelet transforms (FWT) of a matrix of arbitrary size using any given number of parallel processors. Optimal load balancing is achieved through complexity analysis and flop-count minimization. This makes parallel implementation of the FWT feasible and efficient on distributed memory machines with only a small number of processors and on local area networks. The new algorithm is tested by numerical experiments.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125112573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trinomial-tree Based Parallel Option Price Valuations","authors":"A. Gerbessiotis","doi":"10.1080/10637190310001633655","DOIUrl":"https://doi.org/10.1080/10637190310001633655","url":null,"abstract":"We examine how trinomial-tree based computations, such as those involved in American or European-style option price valuations, can be performed in parallel. To this end, we introduce a parallel algorithm for performing such computations on trinomial trees. The algorithm is described and analyzed in an architecture-independent setting, achieves optimal theoretical speedup O(p), and is thus within a multiplicative factor of the corresponding sequential method. We verify the practicality and plausibility of the designed algorithm by carrying out an experimental study of an implementation of the algorithm on a high-latency parallel system, a cluster of PC workstations. The algorithmic and programming methodology used to design and analyze the algorithm allows its implementation to work, with only recompilation of the source code, under two parallel programming libraries, MPI (LAM-MPI) and BSPlib, thus making the implementation independent not only of the architecture but also of the communication library.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121121322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Parallel Algorithm for Polynomial Evaluation","authors":"Przemysław Stpiczyński","doi":"10.1080/10637190310001633673","DOIUrl":"https://doi.org/10.1080/10637190310001633673","url":null,"abstract":"We present a new efficient parallel algorithm for polynomial evaluation based on a previously introduced divide-and-conquer method for solving linear recurrence systems with constant coefficients, which is formulated in terms of the level 1 BLAS (Basic Linear Algebra Subprograms) routine AXPY. We also discuss its platform-independent implementation with OpenMP and finally present the results of experiments performed on a dual-processor Pentium III computer running the Linux operating system, with ATLAS as an efficient implementation of BLAS. The sequential version of the algorithm is up to three times faster than Horner's scheme.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116619093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}