{"title":"Cost-effective modeling for natural resource distribution systems","authors":"A. Al-Ayyoub","doi":"10.1080/10637190412331295148","DOIUrl":"https://doi.org/10.1080/10637190412331295148","url":null,"abstract":"Pipe systems are in the cores of many real life applications including water, oil and gas distribution as well as air-conditioning and compressed air management. Modeling and analysis of flow in pipe networks is of great practical significance in all these areas. Pipe networks are usually made up of thousands of components such as pipes, pumps, valves, tanks and reservoirs. One common way to model these networks is by using systems of linear equations. Practical sizes for these systems usually involve exhaustive calculations that require high computational power. This work emphasizes the design and evaluation of a concurrent system for modeling pipe networks using linear algebraic methods. The proposed approach offers low-cost and high-speed alternative to traditional solutions. It uses a unified row mapping method that exploits the properties of the pipe network matrix in order to achieve a balanced load distribution. This approach is based on cluster computing as a viable alternative to the expensive massively parallel processing systems. The performance of the proposed approach is investigated on a cluster of workstations connected by general-purpose networks.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132196995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study of explicit group iterative solvers on a cluster of workstations","authors":"Norhashidah Hj. Mohd Ali, Rosni Abdullah †, Kok Jun Lee ‡","doi":"10.1080/10637190412331295157","DOIUrl":"https://doi.org/10.1080/10637190412331295157","url":null,"abstract":"In this paper, a group iterative scheme based on rotated (cross) five-point finite difference discretisation, i.e. the four-point explicit decoupled group (EDG) is considered in solving a second order elliptic partial differential equation (PDE). This method was firstly introduced by Abdullah [“The four point EDG method: a fast poisson solver”, Int. J. Comput. Math., 38 (1991) 61–70], where the method was found to be more superior than the common existing methods based on the standard five-point finite difference discretisation. The method was further extended to different type of PDE's, where similar improved results were established [Ali, N.H.M., Abdullah, A.R. Four Point EDG: A Fast Solver For The Navier–Stokes Equation, M.H.Hamza (ed.) Proceedings of the IASTED International Conference on Modelling Simulation And Optimization, Gold Coast, Australia, May 6–9 (1996) (CD Rom-File 242-165.pdf), ISBN: 0-88986-197-8; Ali, N.H.M., Abdullah, A.R. New Parallel Point Iterative Solutions For the Diffusion-Convection Equation Proceedings of the International Conference on Parallel and Distributed Computing and Networks Singapore, Aug. 11–13 (1997) 136–139; Ali, N.H.M., Abdullah, A.R. “New rotated iterative algorithms for the solution of a coupled system of elliptic equations” Int. J. Comput. Math. 74 (1999) 223–251]. These new iterative algorithms had been developed to run on the Sequent Balance, a shared memory parallel computer [A.R. Abdullah, N.M. Ali, The Comparative Study of Parallel Strategies For The Solution of Elliptic PDE's Parallel Algorithms and Applications Vol. 10 (1996) 93–103; Ali, N.H.M., Abdullah, A.R. “Parallel four point explicit decoupled group (EDG) method for elliptic PDE's” Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems (1995) 302–304 (Washington DC); Ali, N.H.M., Abdullah, A.R. New Parallel Point Iterative Solutions For the Diffusion-Convection Equation Proceedings of the International Conference on Parallel and Distributed Computing and Networks, Singapore, Aug. 11–13 (1997) 136–139; Yousif, W.S., Evans, D.J.“Explicit decoupled group iterative methods and their parallel implementations” Parallel Algorithms and Applications 7 (1995) 53–71] where they were shown to be suitable to be implemented in parallel. In this work, the four-point group algorithm was ported to run on a cluster of Sun workstations using a parallel virtual machine (PVM) programming environment together with the four-point explicit group (EG) method [Evans, D.J., Yousif, W.S. “The implementation of the explicit block iterative methods on the balance 8000 parallel computer” Parallel Computing 16 (1990) 81–97]. We describe the parallel implementations of these methods in solving the Poisson equation and the results of some computational experiments are compared and reported. 
rosni@cs.usm.my kokjl@hotmail.com","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125661288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
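For readers unfamiliar with the rotated (cross) discretisation that the EDG method builds on, the sketch below runs a plain point iteration on the rotated five-point stencil for the Poisson equation on the unit square. It only illustrates the stencil: the actual EDG method groups points in fours, exploits the decoupling of the rotated grid, and was run in parallel under PVM, none of which is reproduced here; the grid size and right-hand side are made up.

```python
# Sketch: Jacobi-style iteration on the rotated (cross) five-point stencil
# for u_xx + u_yy = f on the unit square with zero Dirichlet boundaries.
# Toy problem; the paper's four-point EDG grouping and PVM code are not shown.
import numpy as np

n = 33                      # grid points per side (hypothetical size)
h = 1.0 / (n - 1)
f = np.ones((n, n))         # made-up right-hand side
u = np.zeros((n, n))        # boundary values stay zero

for _ in range(2000):
    u_new = u.copy()
    # Rotated stencil: each interior point couples only to its four diagonal
    # neighbours, which lie a distance h*sqrt(2) away.
    u_new[1:-1, 1:-1] = (u[2:, 2:] + u[2:, :-2] + u[:-2, 2:] + u[:-2, :-2]
                         - 2 * h * h * f[1:-1, 1:-1]) / 4.0
    u = u_new

print("centre value:", u[n // 2, n // 2])
```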
{"title":"Fast and scalable parallel matrix computations with reconfigurable pipelined optical buses","authors":"Keqin Li","doi":"10.1080/10637190410001700604","DOIUrl":"https://doi.org/10.1080/10637190410001700604","url":null,"abstract":"We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the powers, the inverse, the characteristic polynomial, the determinant, the rank and an LU- and a QR-factorization of a matrix; multiplying a chain of matrices; and solving linear systems of equations. These computations are based on efficient implementation of the fastest sequential matrix multiplication algorithm, and are highly scalable over a wide range of system size. Such fast and scalable parallel matrix computations were not seen before on distributed memory parallel computing systems.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121591175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA implementation of a Cholesky algorithm for a shared-memory multiprocessor architecture","authors":"Satchidanand G. Haridas, Sotirios G. Ziavras","doi":"10.1080/10637190412331279957","DOIUrl":"https://doi.org/10.1080/10637190412331279957","url":null,"abstract":"Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are on to develop moreaffordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K series Field-Programmable Gate Array (FPGA) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memoryMIMD architecture and was built using the configurable Nios™ processor core which was also developed by Altera. Our system was developed using Altera's System-On-a-Programmable-Chip (SOPC) Quartus II development environment. Our Cholesky implementation is based on an algorithm described by George et al. [6]. This algorithm is scalable and uses a “queue of tasks” approach to ensure dynamic load-balancing among the processing elements. Our implementation assumes dense matrices in the input. We present performance results for uniprocessor and multiprocessor implementations. Our results show that the implementation of multiprocessors inside FPGAs can benefit matrix operations, such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of MPI-IO in Parallel Particle Transport Monte-Carlo Simulation","authors":"Mo Ze-yao, Huang Zhengfeng","doi":"10.1080/10637190412331295166","DOIUrl":"https://doi.org/10.1080/10637190412331295166","url":null,"abstract":"Parallel computers are increasingly being used to run large-scale applications that also have huge input/output (I/O) requirements. However, many applications usually obtain poor I/O performance on parallel machines. In this paper, we will address the parallel I/O of a parallel particle transport Monte-Carlo simulation code (PTMC) on a parallel computer. This paper shows that, without careful treatments, the I/O overheads will ultimately dominate the elapsed simulation time. Fortunately, we have successfully designed the parallel MPI I/O methods for it. In particular, for a benchmark application MAP6 with 105 steps of 100,000 samples, we have elevated the speedup from 10 with 64 processors to 56 with 90 processors. Moreover, our method is scalable for a larger number of CPUs and a larger number of samples.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126307270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Distributed Normalized Explicit Preconditioned Conjugate Gradient Method","authors":"G. Gravvanis, K. M. Giannoutakis, Nikolaos Missirlis","doi":"10.1080/10637190412331279975","DOIUrl":"https://doi.org/10.1080/10637190412331279975","url":null,"abstract":"A new parallel normalized explicit preconditioned conjugate gradient method in conjunction with normalized approximate inverse matrix techniques is presented for solving efficiently sparse linear systems on multi-computer systems. Application of the proposed method on a three dimensional boundary value problem is discussed and numerical results are given. The implementation and performance on a distributed, memory MIMD machine, using message passing interface (MPI) is also investigated. E-mail: nmis@di.uoa.gr","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122185320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Journal of Parallel Algorithms and Applications: Special Issue on Parallel and Distributed Algorithms","authors":"G. Gravvanis, H. Arabnia","doi":"10.1080/10637190410001725445","DOIUrl":"https://doi.org/10.1080/10637190410001725445","url":null,"abstract":"The Journal of Parallel Algorithms and Applications publishes original quality research throughout various areas, including Parallel and Distributed Algorithms. The scope of the journal includes novel applications as well as fundamental contributions to the field. This Special Issue of The Journal of Parallel Algorithms and Applications contains selected articles presented at The International Multi-Conference in Computer Science and Computer Engineering; title of track: The 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2003; June 23–26, 2003, Las Vegas, Nevada, USA). The main objective of the International Multi-Conference in Computer Science and Computer Engineering series is to create an international scientific forum for presentation and discussion of current research topics of Computer Science and Engineering. The six papers appearing in this special issue provide a variety and wealth of contributions and approaches in the field: In this special issue, Schimmler M., Schmidt B. and Lang H.W. present the design of a new bit-serial floating-point unit (FPU), which has been developed for the processors of the Instruction Systolic Array parallel computer model. The bit-serial approach requires a different data format. The proposed floating-point unit uses an IEEE compliant internal floating-point format that allows a fast least significant bit (LSB)-first arithmetic that can be efficiently implemented in hardware. Mohamed A.S. and Baydogan V.S. propose a broader generic application/language/ model independent multi-agent framework for dynamic load balancing. The framework is intended to handle varying levels of load changes in computations, I/O, and/or synchronization throughout the application run and it is an open-architecture that currently supports four multi-level parallel programming models. An open-architecture multi-agent load-balancing capability is proposed that currently makes use of a leading geometric partitioner engine at runtime. It has been shown that the framework is effective in monitoring, tuning, and rebalancing emerging computational, I/O and synchronization sources of load imbalance. Zafar B., Pinkston T.M., Bermudez A. and Duato J. discuss InfiniBand architecture which is a newly established general-purpose interconnect standard. A method for applying the Double Scheme over InfiniBand networks is proposed. The Double Scheme provides","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"85 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deadlock-free dynamic reconfiguration over InfiniBand™ NETWORKS","authors":"B. Zafar, T. Pinkston, Aurelio Bermúdez, J. Duato","doi":"10.1080/10637190410001725463","DOIUrl":"https://doi.org/10.1080/10637190410001725463","url":null,"abstract":"InfiniBand Architecture (IBA) is a newly established general-purpose interconnect standard applicable to local area, system area and storage area networking and I/O. Networks based on this standard should be capable of tolerating topological changes due to resource failures, link/switch activations, and/or hot swapping of components. In order to maintain connectivity, the network's routing function may need to be reconfigured on each topological change. Although the architecture has various mechanisms useful for configuring the network, no strategy or procedure is specified for ensuring deadlock freedom during dynamic network reconfiguration. In this paper, a method for applying the Double Scheme over InfiniBand networks is proposed. The Double Scheme provides a systematic way of reconfiguring a network dynamically while ensuring freedom from deadlocks. We show how features and mechanisms available in IBA for other purposes can also be used to implement dynamic network reconfiguration based on the Double Scheme. We also propose new mechanisms that may be considered in future versions of the IBA specification for making dynamic reconfiguration and other subnet management operations more efficient.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124578584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A bit-serial floating-point unit for a massively parallel system on a chip","authors":"M. Schimmler, B. Schmidt, Hans-Werner Lang","doi":"10.1080/10637190410001725454","DOIUrl":"https://doi.org/10.1080/10637190410001725454","url":null,"abstract":"This paper presents the design of a new bit-serial floating-point unit (FPU). It has been developed for the processors of the instruction systolic array (ISA) parallel computer model. In contrast to conventional bit-parallel FPUs the bit-serial approach requires a different data format. Our FPU uses an IEEE compliant internal floating-point format that allows a fast least significant bit (LSB)-first arithmetic and can be efficiently implemented in hardware. Tel.:+49-431-880-4480. Fax:+49-431-880-4054masch@informatik.uni-kiel.de Tel.:+49-461-8051235. Fax:+49-461-8051527lang@fh-flensburg.de","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Locality-conscious load-balancer based on negotiations in dynamic unstructured mesh computations","authors":"A. Mohamed, Veysel S. Baydogan","doi":"10.1080/10637190412331279966","DOIUrl":"https://doi.org/10.1080/10637190412331279966","url":null,"abstract":"Recently hybrid/multi-level parallel programming models are gaining lots of momentum basically because they have proven to provide better scalability, speedup and utilization than any single parallel programming model alone. In such models, load balancing should not only mean balancing the computational loads (as it has always been perceived), but should also mean balancing I/O imbalance as well as synchronization imbalance. In this paper, we propose a broader generic application/language/model independent multi-agent framework for dynamic load balancing. It takes most of the load-balancing burden away from programmers. It is not a library but a runtime support system that is not hardwired to the parallel applications. The framework is intended to handle varying levels of load changes in computations, I/O and/or synchronization throughout the application run and it is an open-architecture that currently supports four multi-level parallel programming models. It has a clean interface to the application, runs in parallel and provides additional functionality such as determination of when to balance load and provide interface to end users. The proposed open-architecture multi-agent load-balancing capability currently makes use of a leading geometric partitioner engine (Chaco) at runtime. A mesh solver may initially create hundreds of lightweight threads, each handling a small submesh by calling Chaco partitioning engine in a pre-processing stage. This partitioner engine might be called again by these light-weight threads if a divide-and-conquer process is deemed necessary when the sub-domain (submesh) served by this thread grows out beyond certain threshold limits and thus creates an imbalance. In the proposed framework, the multi-agent is a set of SMP-based load balancers (agents) that do not have to share any data structure with the parallel application threads. They just monitor and collect system and application data frequently from the outside of the multi-threaded parallel application solver and send adjustments and negotiation plans to the SMP-load balancers and the application threads whenever a need for load balancing arises. The proposed framework has been deployed in four hybrid/multi-level parallel programming models and its capabilities of issuing corrective actions against emerging imbalances were tested in the context of an adaptive mesh refinement application. Experimental results show that the framework is effective in monitoring, tuning and rebalancing emerging computational, I/O and synchronization sources of load imbalance.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131512258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}