{"title":"The design and analysis of parallel algorithms","authors":"C. Rodríguez","doi":"10.1109/EMPDP.2002.994219","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994219","url":null,"abstract":"omputing Models provide frames for the analysis and design of algorithms. Unfortunately, the balance required between simplicity and realism makes it difficult to guarantee the necessary accuracy for the whole range of algorithms and machines. Simplicity implies a minimal number of architecture parameters (usually including computational power, bandwidth and latency). Accuracy implies just the opposite. The short history of Parallel Computing has seen the arrival (and the departure) of many proposals. Undoubtedly, the best known among those is the Parallel Random Access Machine (PRAM), the Postal/LogP Model and the Bulk Synchronous Parallel Model (BSP). From these three, the oldest one, the PRAM model, has been discarded as unrealistic. The other two, LogP and BSP, remain but do not escape of those aforementioned conflicts. Each model enforces/matches a different parallel programming style. To make the situation worse, none of these two styles agrees completely with the currently dominant style in parallel and distributed programming: MPI message passing. The talk will make emphasis on BSP, its weakness and strengths. As developing examples, we will use two programming paradigms: nested data parallelism and pipelining. 
C","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131772524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting the multilevel parallelism and the problem structure in the numerical solution of stiff ODEs","authors":"J. M. Mantas, J. Ortega, J. Carrillo","doi":"10.1109/EMPDP.2002.994262","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994262","url":null,"abstract":"A component-based methodology to derive parallel stiff ordinary differential equation (ODE) solvers for multicomputers is presented. The methodology allows the exploitation of the multilevel parallelism of this kind of numerical algorithm and the particular structure of ODE systems by using parallel linear algebra modules. The approach promotes the reusability of design specifications and clear structuring of the derivation process. Two types of components are defined to enable the separate treatment of different aspects during the derivation of a parallel stiff ODE solver. The approach has been applied to the implementation of an advanced numerical stiff ODE solver on a PC cluster. Following the approach, the parallel numerical scheme has been optimized and adapted to the solution of two modelling problems which involve stiff ODE systems with dense and narrow banded structures respectively. Numerical experiments have been performed to compare the solver with the state-of-the-art sequential stiff ODE solver. 
The results show that the parallel solver performs especially well with dense ODE systems and reasonably well with narrow banded systems.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124052278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of local sort on parallel sorting algorithms","authors":"Daniel Jiménez-González, J. Navarro, J. Larriba-Pey","doi":"10.1109/EMPDP.2002.994310","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994310","url":null,"abstract":"We show the importance of sequential sorting in the context of in-memory parallel sorting of large data sets of 64-bit keys. First, we analyze several sequential strategies, like Straight Insertion, Quick sort, Radix sort and Cache-Conscious Radix sort (CC-Radix sort). As a consequence of the analysis, we propose a new algorithm that we call the Sequential Counting Split Radix sort (SCS-Radix sort). This is a combination of some of the algorithms analyzed and other new ideas. There are three important contributions in SCS-Radix sort: first, the work saved by detecting data skew dynamically; second, the exploitation of the memory hierarchy done by the algorithm; and third, the execution time stability of SCS-Radix when sorting data sets with different characteristics. We evaluate the use of SCS-Radix sort in the context of a parallel sorting algorithm on an SGI Origin 2000. The parallel algorithm is 1.2 to 45 times faster using the SCS-Radix sort than using the Radix sort or Quick sort.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115890074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration","authors":"M. Acacio, José González, José M. García, J. Duato","doi":"10.1109/EMPDP.2002.994312","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994312","url":null,"abstract":"Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller and the network interface. In this paper, we exploit such an integration scale, presenting a new three-level directory architecture aimed at reducing the long L2 miss latencies and the memory overhead that characterize cc-NUMA machines and limit their scalability. The proposed architecture is based on the integration into the processor chip of the directory controller and a small first-level directory cache that stores precise information for the most recently referenced memory lines, as the means to reduce miss latencies. The second- and third-level directories are located near the main memory and they are only accessed when a directory entry for a certain memory line is not present in the first-level directory. This off-chip structure achieves the performance of a large and non-scalable full-map directory with a very significant reduction in the memory overhead. Using execution-driven simulations, we show that substantial latency reductions can be obtained by using the proposed directory architecture. Load, store and read-modify-write misses are significantly accelerated (latency reductions of more than 35% in some cases). 
These reductions translate into important improvements on the final application performance (reductions up to 20% in execution time).","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115772045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems","authors":"Valentin Puente, J. Gregorio, R. Beivide","doi":"10.1109/EMPDP.2002.994207","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994207","url":null,"abstract":"An environment has been developed which is capable of determining the impact that a multiprocessor interconnection subsystem causes on real application execution time. A general-purpose interconnection network simulator, called SICOSYS, able to capture essential aspects of the low-level implementation, has been integrated into two execution driven simulators for multiprocessors: RSIM and SimOS. The enhancement of both tools allows the analysis of new proposals for the interconnection subsystem of a cc-NUMA machine, from the VLSI level up to the real application level. Any new proposal can be translated to a specific message router architecture and by using a low-level implementation tool, the parameter delays of a detailed router model to be used by SICOSYS can be obtained.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122019503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient implementation of reduce-scatter in MPI","authors":"M. Bernaschi, G. Iannello, Mario Lauria","doi":"10.1109/EMPDP.2002.994296","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994296","url":null,"abstract":"We discuss the efficient implementation of the MPI collective operation called reduce-scatter. We describe the implementation issues and the performance characterization of two algorithms for the reduce-scatter that have been proven to be highly efficient in theory under the assumption of fully connected parallel system. A performance comparison with existing mainstream implementations of the operation is presented which confirms the practical advantage of the new algorithms. Experiments show that the two algorithms have different characteristics which make them complementary in providing a performance gain over standard algorithms.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128258576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile agents in a distributed heterogeneous database system","authors":"Balázs Goldschmidt, Z. László, M. Döller, H. Kosch","doi":"10.1109/EMPDP.2002.994247","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994247","url":null,"abstract":"The purpose of this paper is to present a new infrastructure for multimedia database searches based on CORBA and mobile agent technology. A new mobile agent system, called Vagabond, was implemented in pure Java using only standard CORBA facilities. The fundamental agent design and architecture is introduced. Measurements demonstrated the merits of Vagabond, namely the simple design, the implicit heterogeneity inherited from CORBA, and its speed. The system (renamed as M/sup 3/) was implanted inside an Oracle8i database system which is able to run Java code as a stored procedure. Further measurements have justified the idea presented above, ie. sending agents directly inside the database can decrease the response time of multimedia content search and retrieval. However, the required modifications made the embedded agency accessible for clients using only Aurora, Oracle's modified Visi-broker ORB. 
On the basis of the Proxy design pattern, the paper presents a proxy solution that encapsulates the specific protocol issues that restricted interoperability, and thus provides the user of the infrastructure with the benefits of a truly heterogeneous environment.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132636313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating MPI and nanothreads programming model","authors":"P. Hadjidoukas, E. D. Polychronopoulos, T. Papatheodorou","doi":"10.1109/EMPDP.2002.994297","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994297","url":null,"abstract":"This paper presents a prototype runtime system that integrates MPI, used on distributed memory systems, and nanothreads programming model (NPM), a programming model for shared memory multiprocessors. This integration does not alter the independence of the two models, since the runtime system is based on a multilevel design that supports each of them individually but offers the capability of combining their advantages. Existing MPI codes can be executed without any changes, codes for shared memory machines can be used directly, while the concurrent use of both models is easy. Major feature of the runtime system is portability, as it is based exclusively on calls to MPI and Nthlib, a user-level threads library that has been ported to several operating systems. The runtime system supports the hybrid-programming model (MPI+OpenMP), providing also a solution for better load balancing in MPI applications. Moreover, it extends the AN and the multiprogramming functionality of the NPM on clusters of multiprocessors and can support an extension of the OpenMP standard on distributed memory multiprocessors.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132847234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programming distributed systems with Group_IO","authors":"F. Santana, Javier Miranda, J. M. Santos, Ernestina Martel, Luis Hernández, E. Pulido","doi":"10.1109/EMPDP.2002.994266","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994266","url":null,"abstract":"This paper describes Group_IO, a software library written in Ada which facilitates the construction of distributed applications by means of the group paradigm, an abstraction which considers a set of processes as an individual entity. Group_IO provides support for replicated as well as cooperative groups. Group_IO offers a straightforward interface to reliable, atomic, causal, and uniform multicast services, and it allows client-server interactions where the client may be a process group. It relies on an own consensus protocol to implement the uniform broadcast protocols. Group_IO provides support for the client/server group (1-to-M) communication, client group/server (N-to-1) and client group/group server (N-to-M) communication. Group_IO is the basis on which the programming language Drago has been implemented.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127285743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An observation mechanism of distributed objects in Java","authors":"Amer Bouchi, R. Olejnik, B. Toursel","doi":"10.1109/EMPDP.2002.994246","DOIUrl":"https://doi.org/10.1109/EMPDP.2002.994246","url":null,"abstract":"We present an observation mechanism of distributed objects in the context of irregular applications developed in distributed Java. This mechanism predicts the tendencies of the communication between these objects. To ensure a good effectiveness of the execution, the obtained predictions are integrated into a distribution mechanism for the objects of the application.","PeriodicalId":126071,"journal":{"name":"Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127650400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}