{"title":"Distributed electric field approximation","authors":"D. Trybus, Z. Kucerovsky, A. Ieta, T. Doyle","doi":"10.1109/HPCSA.2002.1019167","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019167","url":null,"abstract":"Grid or mesh techniques are frequently used to approximate continuous entities that behave in a wave or fluid-like fashion. Partial Differential Equations (PDEs) are usually involved in the description of such entities or processes. Distributed parallel computation was used in various computer cluster configurations to calculate PDE solutions of electrostatic field. The study of the efficacy of the selected architecture using mesh techniques was intended. The match between the algorithm and the architecture in achieving maximum computational performance was also investigated. The developed architectures, algorithms, and findings are presented in the paper.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123678957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The characterization of parallel real-time optimization problems","authors":"S. Bruda, S. Akl","doi":"10.1109/HPCSA.2002.1019137","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019137","url":null,"abstract":"We identify the class of optimization problem expressible as independence systems that can be solved in real time using a parallel machine with polynomially bounded resources as being exactly the class of matroid for which the size of the optimal solution can be computed in parallel real time. We also extend previous results, showing that the solution obtained by a parallel algorithm is arbitrarily better than the solution reported by a sequential one not only for the real-time minimum-weight spanning tree (as previously known). Indeed, we show that, for all practical purposes, such a property does in fact hold for any optimization problem that falls into the aforementioned class.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122760379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel shared memory simulator for command and control","authors":"Christophe Jaillet, M. Krajecki, J. Fugère","doi":"10.1109/HPCSA.2002.1019162","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019162","url":null,"abstract":"Introduces a military application in the command and control field. The main feature of this study is the parallelization of the simulator. The simulator is object-oriented and written in C++. It uses the OpenMP standard for the parallel version. To produce an efficient parallel simulator, we have to deal with the dynamic load balancing problem.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"45 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120839442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An evaluation of thread migration for exploiting distributed array locality","authors":"S. Jenks, J. Gaudiot","doi":"10.1109/HPCSA.2002.1019154","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019154","url":null,"abstract":"Thread migration is one approach to remote memory accesses on distributed memory parallel computers. In thread migration, threads of control migrate between processors to access data local to those processors, while conventional approaches tend to move data to the threads that need them. Migration approaches enhance spatial locality by making large address spaces local, but are less adept at exploiting temporal locality. Data-moving approaches, such as cached remote memory fetches or distributed shared memory, can use both types of locality. We present experimental evaluation of thread migration's ability to reduce the impact of remote array accesses across distributed-memory computers. Nomadic Threads uses compiler-generated fine-grain threads which either migrate to make data local or fetch cache lines, tolerating latency with multithreading. We compare these alternatives using various array access patterns.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132543386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient static assignment parallelization scheme for algebraic fractals","authors":"F. C. MacPhee, V. Bhavsar","doi":"10.1109/HPCSA.2002.1019160","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019160","url":null,"abstract":"Although parallelization of algebraic fractal computations has been done in the past, the issue of efficient parallel computation has not been fully addressed in the literature. The objective of this paper is to examine the computational characteristics of algebraic fractal computations and determine an efficient scheme for parallel computation.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122694790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical methods for molecular photonics - the nonlinear nonperturbative response of molecules in intense laser fields","authors":"A. Bandrauk, Hon Chem, Le Conseil","doi":"10.1109/HPCSA.2002.1019174","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019174","url":null,"abstract":"Different approaches are presented to simulate the nonlinear nonperturbative response of molecules to short intense laser fields. Different time scales, electronic vs. nuclear and transitions from bound (discrete spectra) to continuum states are inherent problems in current simulations. An overview of these problems encountered and estimates of computer requirements are made for this new branch of science and high technology: molecular photonics.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114987679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The IBiCGStab method on bulk synchronous parallel architectures","authors":"L. Yang, R. E. Shaw","doi":"10.1109/HPCSA.2002.1019147","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019147","url":null,"abstract":"In this paper, an improved version of the BiCGStab method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that all inner products of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication can be significantly reduced. In this paper, the bulk synchronous parallel (BSP) model is used to design a fully efficient, scalable and portable parallel proposed algorithm and to provide accurate performance prediction of the algorithm for a wide range of architectures including the Cray T3D, the Parsytec, and a cluster of workstations connected by an Ethernet. This performance model provides us useful insight in the time complexity of the method using only a few system dependent parameters based on a simple and accurate cost modelling. The theoretical performance prediction are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130671699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler-controlled parallelism-independent scheduling method for cluster computing systems","authors":"K. Nikolova, M. Sowa","doi":"10.1109/HPCSA.2002.1019153","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019153","url":null,"abstract":"We propose a hybrid parallelism-independent scheduling method, predominantly performed at compile time, which generates a machine code efficiently executable on any number of workstations or PCs in a cluster computing environment. Our scheduling algorithm called the dynamical level parallelism-independent scheduling algorithm (DLPIS) is applicable for distributed computer systems because additionally to the task scheduling, we perform message communication scheduling. It provides an explicit task synchronization mechanism guiding the task allocation and data dependency solution at run time at reduced overhead. Furthermore, we provide a mechanism allowing the self-adaptation of the machine code to the degree of parallelism of the system at run-time. Therefore our scheduling method supports the variable number of processors in the users' computing systems and the adaptive parallelism, which may occur in distributed computing systems due to computer or link failure.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129698283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel sorting on heterogeneous platforms","authors":"G. Mateescu","doi":"10.1109/HPCSA.2002.1019143","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019143","url":null,"abstract":"We present a method for load balancing parallel sorting on heterogeneous networks of workstations and clusters. Load balancing is achieved by exploiting information about the available throughput of the processors. First, the problem is partitioned into subproblems such that the times taken by the processors to solve the subproblems are balanced. Determining the partition involves solving a nonlinear system for finding the subproblem sizes. Second, the data are sorted by each process and are merged by choosing a processor topology which minimizes the critical path.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"46 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114129246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronism for iterative algorithms in a global computing environment","authors":"J. Bahi, S. Contassot-Vivier, R. Couturier","doi":"10.1109/HPCSA.2002.1019139","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019139","url":null,"abstract":"The subject of this paper is to show the very high power of asynchronism for iterative algorithms in the context of global computing, that is to say, with machines scattered all around the world. The question is whether or not asynchronism helps to reduce the communication penalty and the overall computation time of a given parallel algorithm. The asynchronous programming model is applied to a given problem implemented with a multi-threaded environment and tested over two kinds of clusters of workstations; a homogeneous local cluster and a heterogeneous non-local one. The main features of this programming model are exhibited and the high efficiency and interest of such algorithms is pointed out.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123790693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}