{"title":"EMPLOYING K-ARY n-CUBES FOR PARALLEL LAGRANGE INTERPOLATION","authors":"H. Sarbazi-Azad, M. Ould-Khaoua, L. Mackenzie","doi":"10.1080/01495730108935275","DOIUrl":"https://doi.org/10.1080/01495730108935275","url":null,"abstract":"This paper proposes a parallel algorithm for computing an N (= k^n)-point Lagrange interpolation on k-ary n-cube networks. The algorithm consists of three phases: initialisation, main and final. There is no computation in the initialisation phase. The main phase is composed of N/2 steps, each consisting of four multiplications and four subtractions, and an additional step including one division and one multiplication. Communication in the main phase is based on an all-to-all broadcast algorithm on a Hamiltonian ring embedded in a k-ary n-cube. The final phase is carried out in n × ⌊k/2⌋ steps, each requiring one addition. A performance evaluation of the proposed algorithm reveals a near-optimum speedup for a typical range of system parameters used in current state-of-the-art implementations. Our study also reveals that, when implementation cost is taken into account, low-dimensional k-ary n-cubes achieve better speedup than their higher-dimensional counterparts.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ON THE PARALLEL IMPLEMENTATION OF A GENERALIZED BROADCAST","authors":"C. Wedler, C. Lengauer","doi":"10.1080/01495730108935266","DOIUrl":"https://doi.org/10.1080/01495730108935266","url":null,"abstract":"We prove the correctness of optimized parallel implementations of a generalized broadcast, in which a value b is distributed to a sequence of processors, indexed from 0 upwards, such that processor i receives g^i b (i.e., some function g applied i times to b). Its straightforward implementation is of linear time complexity in the number of processors. This type of broadcast occurs when combining scans with an ordinary broadcast. The optimized parallel implementation we describe is based on an odd-even tree and has logarithmic time complexity.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133719144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SELF-STABILIZING DEPTH-FIRST MULTI-TOKEN CIRCULATION IN TREE NETWORKS","authors":"G. Antonoiu, P. Srimani","doi":"10.1080/01495730108935264","DOIUrl":"https://doi.org/10.1080/01495730108935264","url":null,"abstract":"Self-stabilization is one of the new paradigms for investigating fault tolerance in distributed algorithm design. The multi-token or l-exclusion problem is the logical generalization of the standard mutual exclusion problem, in which l processes can enter their critical sections at the same time. Self-stabilizing single-token circulation, or the standard mutual exclusion problem, has been investigated by a number of authors; recently, the authors of [16] proposed a self-stabilizing multi-token protocol for ring networks. We propose a new self-stabilizing algorithm (protocol) for depth-first circulation of multiple tokens in a tree network. The algorithm uses only bounded integers, and its correctness is proved by induction.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134284579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EIGENPAIRS OF SYMMETRIC MATRICES USING THE QUADRATIC METHOD AND THE METHOD OF SUBDEFINITE CALCULATIONS","authors":"A. Kharab","doi":"10.1080/01495730108935277","DOIUrl":"https://doi.org/10.1080/01495730108935277","url":null,"abstract":"This paper is concerned with the calculation of the eigenpairs of real symmetric matrices. A new method based on using the quadratic method along with the method of subdefinite calculations is presented. The quadratic method consists of solving a quadratic system whose solutions are obtained using the method of subdefinite calculations. Both methods are presented and numerical results are given.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124451874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EFFECTIVE PARALLELIZATION TECHNIQUES FOR LOOP NESTS WITH NON-UNIFORM DEPENDENCES","authors":"Der-Lin Pean, Cheng Chen","doi":"10.1080/01495730108935265","DOIUrl":"https://doi.org/10.1080/01495730108935265","url":null,"abstract":"The parallelism of loop nests with non-uniform dependences is difficult to extract and is ineffectively explored by existing parallelization schemes. In this paper, we propose new efficient techniques for extracting the parallelism of loop nests with non-uniform dependences by exploiting their irregularity. In this way, current highly parallel multiprocessor systems, such as multithreaded and clustering multiprocessor systems, can be fully utilized. The four mechanisms are (a) parallelization part splitting, (b) partial parallelization decomposition, (c) irregular loop interchange and (d) growing pattern detection. They explore the parallelism of special parallel patterns for nested loops with non-uniform dependences. The loop transformations used in uniform loops are also applied to non-uniform dependence loops after legality tests. We apply the results of classical convex theory and detect special parallel patterns of dependence vectors. We also propose an algorithm that combines the above mechanisms to enhance parallelism. We demonstrate that our technique gives much better speedup and extracts more parallelism than the existing techniques. Thus, we are encouraged by these apparent enhancements to pursue further development.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121551011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EFFICIENT PARALLEL RANGE SEARCHING AND PARTITIONING ALGORITHMS*","authors":"A. Datta","doi":"10.1080/01495730108935276","DOIUrl":"https://doi.org/10.1080/01495730108935276","url":null,"abstract":"We present an optimal parallel construction of the range tree data structure and use this construction to solve several geometric partitioning problems. In the range tree, we show how to perform a count-mode orthogonal range query in O(log n) time by a single processor and a report-mode orthogonal range query in O(log n) time using O(1 + log n) processors, where k is the number of points inside the query range. We consider partitioning problems of the following nature. Given a planar point set S (|S| = n), a measure μ acting on S and a pair of values μ1 and μ2, the task is to find a partition of S into two components S1 and S2 (S = S1 ∪ S2) such that μ(Si) = μi for i = 1, 2. We consider several measures, such as the diameter under the L∞ and L1 metrics; the area and perimeter of the smallest enclosing axes-parallel rectangle; and the side length of the smallest enclosing axes-parallel square. All our parallel algorithms for partitioning problems run in O(log n) time using O(n) processors. Our algorithms are designed for the CREW PRAM model of parallel computation.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128394699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NESTED ALGORITHMIC SKELETONS FROM HIGHER ORDER FUNCTIONS","authors":"G. Michaelson, N. Scaife, Paul Bristow, P. King","doi":"10.1080/01495730108935271","DOIUrl":"https://doi.org/10.1080/01495730108935271","url":null,"abstract":"Algorithmic skeletons provide a promising basis for the automatic utilisation of parallelism at sites of higher order function use through static program analysis. However, decisions about whether or not to realise particular higher order function instances as skeletons must be based on information about processing resources available at runtime. In principle, nested higher order functions may be realised as nested skeletons. However, where higher order function arguments result from partially applied functions, free-variable bindings must be identified and communicated through the corresponding skeleton hierarchy to where those arguments are actually applied. Here, a skeleton based parallelising compiler for Standard ML is presented. Hybrid skeletons, which can change from parallel to serial evaluation at runtime, are considered and mechanisms for their nesting are discussed. The main compilation stages are illustrated for simple examples. A nested higher order function based algorithm for multiplying matrices of arbitrary length integers is presented along with performance figures for compiled code running on a Fujitsu AP3000.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127745747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LINEAR ARRAY FOR A CLASS OF NON UNIFORM RECURRENCE EQUATIONS","authors":"S. Fidanova","doi":"10.1080/01495730108935274","DOIUrl":"https://doi.org/10.1080/01495730108935274","url":null,"abstract":"In this paper we present a new scheduling algorithm for solving a class of non-uniform recurrence equations on a linear systolic array. The main idea is to use the fact that the data dependency graph of the dynamic programming function of the problem is an irregular mesh. We propose a scheduling algorithm for irregular meshes on the linear array, defined by an allocation function and an initial time for each node.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116729584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MODELS AND TRENDS IN PARALLEL PROGRAMMING","authors":"D. Talia","doi":"10.1080/01495730108935270","DOIUrl":"https://doi.org/10.1080/01495730108935270","url":null,"abstract":"This paper introduces and discusses programming models for parallel processing and recent trends in the area of parallel programming. The paper discusses different parallel programming languages and tools that reflect various parallel computation models. These languages differ in expressiveness, portability and performance. Software design and implementation vary considerably depending on the language used, which can make the programmer's task easy or complex. We describe here the design goals and the main issues of parallel programming models and languages belonging to the following categories: shared-space based languages, message-based languages, parallel toolkits, data-parallel languages, parallel declarative languages, parallel object-oriented languages, and parallel composition-based languages. Tools and languages such as HPF, Linda, Java, OpenMP, PVM, MPI, Parallel C++, Sisal, Orca, Mentat, SkieCL, BSP and others are described in some detail. Their main features for the design and implementation of high performance applications are discussed. Finally, we outline directions of research and development in the parallel programming area, with special attention to novel approaches based on high-level programming structures that make the architectural details of parallel computing machines transparent to users.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128210444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DESIGN OF A PARALLEL INTERCONNECT BASED ON COMMUNICATION PATTERN CONSIDERATIONS","authors":"N. Suri, A. Mendelson","doi":"10.1080/01495730108935273","DOIUrl":"https://doi.org/10.1080/01495730108935273","url":null,"abstract":"An interconnect's versatility is usually described by its ability to support a variety of algorithmic patterns based on the physical or logical embeddings within its topology that match the desired algorithmic patterns. However, most such embeddings are available on a discrete basis, though a particular algorithm may require a variety of embeddings during different phases of its operation. To provide for such varying embedded topology needs, we propose a simple and VLSI realizable interconnect structure, termed a Union Graph (UG), which combines two discrete interconnects, with very individual and distinctive capabilities, through a union operation. We present the union of the binary deBruijn graph (BDG) and a Torus to demonstrate the effectiveness of this approach. The focus is on providing practical usability of the network for algorithmic support rather than on graph properties. We highlight the importance of communication aspects of different execution phases in designing an algorithmically specialized interconnect. A set of examples is used to demonstrate the UG's versatility for algorithmic support.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123509859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}