{"title":"AN IMPROVED PARALLEL ALGORITHM FOR A GEOMETRIC MATCHING PROBLEM WITH APPLICATION TO TRAPEZOID GRAPHS","authors":"M. H. Alsuwaiyel","doi":"10.1080/10637190208941437","DOIUrl":"https://doi.org/10.1080/10637190208941437","url":null,"abstract":"Let B be a set of n b blue points and R a set of nrred points in the plane, where nb + nr = n. A blue point b and a red point r can be matched if r dominates b, that is, if x(b) ≤ x(r) and y( b) ≤ y(r). We consider the problem of finding a maximum cardinality matching between the points in B and the points in R. We give an adaptive parallel algorithm to solve this problem that runs in O(log2n) time using the CREW PRAM with O(n2+ε/log n) processors for some ε,0 < ε < 1.It follows that finding the minimum number of colors to color a trapezoid graph can be solved within these resource bounds","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115258426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HARDWARE/SOFTWARE MODELING OF FPGA-BASED SYSTEMS","authors":"V. Sklyarov","doi":"10.1080/10637190208941432","DOIUrl":"https://doi.org/10.1080/10637190208941432","url":null,"abstract":"This paper discusses methods and tools for the modeling of FPGA-based systems for different kinds of practical applications, such as computational and peripheral devices, embedded controllers, etc. The system to be modeled is presented in the form of a traditional composition of an execution unit (a datapath) and a control part. They are implemented partially in reconfigurable hardware (in FPGA) and partially in software running on PC. The interaction between hardware/software components is organized through a defined software/hardware interface. For the majority of the applications considered, the control unit has been implemented entirely in FPGA. This allows the speed of execution of the respective algorithms to be increased and systems to be constructed with distributed control. Although the execution unit is emulated mainly in software, it can be partially implemented in hardware and the boundary between hardware/software components is considered to be fuzzy. The software components have been designed with the aid of object-oriented libraries and programs written in Visual C++. The models considered have been used for experiments with FPGAs and for educational purposes.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126873698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCALABILITY ANALYSIS OF PARALLEL GMRES IMPLEMENTATIONS","authors":"M. Sosonkina, D. Allison, L. Watson","doi":"10.1080/01495730208941444","DOIUrl":"https://doi.org/10.1080/01495730208941444","url":null,"abstract":"Abstract Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k) based on different data distributions and using Householder reflections in the orthogonalization phase are analyzed with respect to scalability (their ability to maintain fixed efficiency with an increase in problem size and number of processors). A theoretical algorithm-machine model for scalability of GMRES(k) with fixed k is derived and validated by experiments on three parallel computers, each with different machine characteristics. The analysis for an adaptive version of GMRES(k), in which the restart value k is adapted to the problem, is also presented and scalability results for this case are briefly discussed.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"29 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114110207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STATIC PERFORMANCE PREDICTION OF SKELETAL PARALLEL PROGRAMS","authors":"Yasushi Hayashi, M. Cole","doi":"10.1080/10637190208941434","DOIUrl":"https://doi.org/10.1080/10637190208941434","url":null,"abstract":"We demonstrate that the run time of implicitly parallel programs can be statically predicted with considerable accuracy when expressed within the constraints of a skeletal, shapely parallel programming language. Our work constitutes the first completely static system to account for both computation and communication in such a context. We present details of our language and its BSP implementation strategy together with an account of the analysis mechanism. We examine the accuracy of our predictions against the performance of real parallel programs.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115659844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PARALLEL SORT-HASH OBJECT-ORIENTED COLLECTION JOIN ALGORITHMS FOR SHARED-MEMORY MACHINES","authors":"D. Taniar, J. Rahayu","doi":"10.1080/10637190208941435","DOIUrl":"https://doi.org/10.1080/10637190208941435","url":null,"abstract":"Collection join queries are join queries based on collection attributes (i.e. non-atomic attributes), which are common in object-oriented databases. We have identified three different kinds of collection join queries, namely; cullection-equi join, collection-intersect join, and sub-collection join. In this paper, we propose parallel join algorithms for these three collection join query types based on a combination of sort and hash methods, which we call parallel sort-hash, collection join algorithms. The proposed join algorithms play an important role in parallel object-oriented query processing, due to their superiority over the conventional join methods which are usually in a form of relational division, and also the inefficiency of the original join predicate processing. In our implementation of these algorithms on a shared-memory machine, we show that the combination between sort and hash methods is proven to be better than the conventional sort-merge and nested-loop based parallel join processing","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115369275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PERFORMANCE OF PDE SOLVERS ON A SELF-OPTIMIZING NUMA ARCHITECTURE","authors":"S. Holmgren, Markus Nordén, J. Rantakokko, Dan Wallin","doi":"10.1080/01495730208941445","DOIUrl":"https://doi.org/10.1080/01495730208941445","url":null,"abstract":"Abstract The performance of shared-memory (OpenMP) implementations of three different PDE solver kernels representing finite difference methods, finite volume methods and spectral methods has been investigated. The experiments have been performed on a self-optimizing NUMA system, the Sun Orange prototype, using different data placement and thread scheduling strategies. The results show that correct data placement is very important for the performance for all solvers. However, the Orange system has a unique capability of automatically changing the data distribution at run time through both migration and replication of data. For reasonable large PDE problems, we find that the time to do this is negligible compared to the total solve time. Also, the performance after the migration and replication process has reached steady-state is the same as what is achieved if data is optimally placed at the beginning of the execution using hand tuning. This shows that, for the application studied, the self-optimizing features are successful, and shared memory code without explicit data distribution directives yields good performance.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131342927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SELF-STABILIZING DISTRIBUTED SORTING IN TREE NETWORKS","authors":"A. Datta, S. Tixeuil","doi":"10.1080/01495730108935263","DOIUrl":"https://doi.org/10.1080/01495730108935263","url":null,"abstract":"This paper presents a self-stabilizing distributed sorting algorithm for tree networks. The distributed sorting problem can be informally described as follows: Nodes cooperate to reach a global configuration where every node, depending on its identifier, is assigned a specific final value taken from a set of input values distributed across all nodes. The input values may change in time. In our solution, the system reaches its final configuration in a finite time after the input values are stable and the faults cease. The fault-tolerance and the adaptivity to changing input is achieved using Dijkstra's paradigm of self-stabilization. A self-stabilizing algorithm, regardless of the initial system state, will converge in finite time to a set of legitimate states without the need for explicit exception handlers or backward recovery. Our solution is based on a continuous broadcast with acknowledgment along the tree edges to achieve the synchronization among processes in the system. It has 0(n ×h) time complexity and only 0(log(n) × ) memory requirement where h is the degree of the tree and h is the height of the tree.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114664504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TOWARDS PARALLEL PROGRAMMING BY TRANSFORMATION: THE FAN SKELETON FRAMEWORK","authors":"Marco Aldinucci, S. Gorlatch, C. Lengauer, S. Pelagatti","doi":"10.1080/01495730108935268","DOIUrl":"https://doi.org/10.1080/01495730108935268","url":null,"abstract":"A Functional Abstract Notation (FAN) is proposed for the specification and design of parallel algorithms by means of skeletons - high-level patterns with parallel semantics. The main weakness of the current programming systems based on skeletons ii that the user is still responsible for finding the most appropriate skeleton composition for a given application and a given parallel architecture We describe a transformational framework for the development of skeletal programs which is aimed at filling this gap. The framework makes use of transformation rules which are semantic equivalences among skeleton compositions. For a given problem, an initial, possibly inefficient skeleton specification is refined by applying a sequence of transformations. Transformations are guided by a set of performance prediction models which forecast the behavior of each skeleton and the performance benefits of different rules. The design process is supported by a graphical tool which locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space. We give an overview of the FAN framework and exemplify its use with performance-directed program derivations for simple case studies. Our experience can be viewed as a first feasibility study of methods and tools for transformational, performance-directed parallel programming using skeletons.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126799385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMPROVED RESOURCE UTILIZATION WITH BUFFERED COSCHEDULING","authors":"F. Petrini, Wu-chun Feng","doi":"10.1080/01495730108935269","DOIUrl":"https://doi.org/10.1080/01495730108935269","url":null,"abstract":"We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet ;.till expressive parallel programming model which offloads many resource-management tasks to the operating system. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads and is relatively insensitive to the local process scheduling strategy.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131962000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DISTRIBUTED SIMULATION OF HIGH-LEVEL ALGEBRAIC PETRI NETS WITH LIMITED CAPACITY PLACES","authors":"K. Djemame","doi":"10.1080/01495730108935272","DOIUrl":"https://doi.org/10.1080/01495730108935272","url":null,"abstract":"The aim of this paper is to search for techniques to accelerate simulations exploiting the parallelism available in current multicomputers, and to use these techniques to study a class of Petri nets called high-level algebraic nets. These nets exploit the rich theory of algebraic specifications for high-level Petri nets. They also gain a great deal of modelling power by representing dynamically changing items as structured tokens whereas algebraic specifications turned out to be an adequate and flexible instrument for handling structured items. We focus on ECATNets (Extended Concurrent Algebraic Term Nets), a kind of high-level algebraic Petri nets with limited capacity places Three distributed simulation techniques have been considered: asynchronous conservative, asynchronous optimistic and synchronous. These algorithms have been implemented in a network of workstations with MPI (Message Passing Interface). The influence that factors such as the characteristics of the simulated models, the organisation of the simulators and the characteristics of the target multicomputer have in the performance of the simulations have been measured and characterized It is concluded that distributed simulation of ECATNets on a multicomputer system can in fact gain speedup over the sequential simulation, and this can be achieved even for small scale simulation models.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}