{"title":"Restructuring the flow of image and video processing programs to increase instruction level parallelism","authors":"M. Maresca, N. Zingirian","doi":"10.1109/EMPDP.2001.905066","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905066","url":null,"abstract":"This paper addresses the problem of preparing efficient implementations of Image Processing (IP) tasks for Instruction Level Parallel (ILP, i.e., superscalar and pipelined) architectures. First it shows an accurate analysis of ILP architectures and IP task structures. This analysis allows identifying specific sources of inefficiency that affect typical implementations of IP programs for ILP architectures. Then, it introduces a novel processing model, named Bucket Processing (BP), aimed at reducing the inefficiencies of IP programs characterized by the presence of nested loops, typical of image processing, and by the presence of conditional statements in the innermost loop bodies. Finally, it describes how BP restructures the program flow in such a way to deliver significant speed up in programs running on real ILP platforms.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124823038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programming cooperative systems in Drago","authors":"Javier Miranda, F. Santana, A. Alvarez, S. Arévalo","doi":"10.1109/EMPDP.2001.905045","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905045","url":null,"abstract":"Drago is an experimental Ada extension designed to facilitate the implementation of fault-tolerant and cooperative distributed applications. It is the result of an effort to impose discipline and give linguistic support to the main concepts of the group communication paradigm. In this paper we focus our attention on the Drago linguistic support for the implementation of distributed cooperative applications. We introduce Drago and give some simple examples of its use.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125061211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coarse reconfigurable multimedia unit extension","authors":"Stephan Wong, S. Cotofana, S. Vassiliadis","doi":"10.1109/EMPDP.2001.905048","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905048","url":null,"abstract":"In this paper we introduce a coarse reconfigurable multimedia functional unit (rMFU) extension to a superscalar general-purpose processor (GPP) and a set of specialized multimedia instructions to extend the GPPs instruction set. Two multimedia operations, the DCT operation and the Huffman encoding operation, were chosen to assess the expected performance of our proposal. The performance of the extended processor including the rMFU was evaluated by utilizing modified versions of the ijpeg and mpeg2enc benchmarks and a cycle accurate simulator. Our experiments suggest that the usage of the rMFU in an out-of-order superscalar processor (without increasing the cycle time) is able to decrease the total number of execution cycles by a value between 12.40% and 23.72% when compared to the same processor without such an unit. Moreover, the number of executed instructions are reduced by between 13.67% and 23.61% and the executed branches by between 9.83% and 15.98%.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128875098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI collective communication operations on large shared memory systems","authors":"M. Bernaschi, G. Richelli","doi":"10.1109/EMPDP.2001.905038","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905038","url":null,"abstract":"Collective communication performance is critical in a number of MPI applications yet relatively few results are available to assess the performance of MPI implementations specially for shared memory multiprocessors. In this paper we focus on the most widely used primitive, broadcast, and present experimental results for the Sun Enterprise 10000. We compare the performance of the Sun MPI primitives with our implementation based on a quasi-optimal algorithm. Our tests highlight advantages and drawbacks of vendors' implementations of collective communication primitives and suggest that the choice of the best algorithm may depend on exogenous factors like load balancing among tasks.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121407180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SDAARC architecture","authors":"R. Moore, B. Klauer, K. Waldschmidt","doi":"10.1109/EMPDP.2001.905071","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905071","url":null,"abstract":"While traditional parallel computing systems are still struggling to gain a wider acceptance, the largest parallel computer that has ever been available is currently growing with the communication resource Internet. Unfortunately it is also rarely used in the parallel computation field. The reason for the rejection of parallel computers is mainly the difficulty of parallel programming. In this paper we propose the Self Distributing Associative ARChitecture (SDAARC). It has been derived from the Cache Only Memory Architecture (COMA). COMAs provide a distributed shared memory (DSM) with automatic distribution of data. We show how this paradigm of data distribution can be extended to the automatic distribution of instruction sequences (microthreads). We show how microthreads can be extracted from legacy C code to produce code that can automatically be parallelized by SDAARC at run time. We also discuss how SDAARC can be implemented on a rightly coupled multiprocessor systems on heterogenous LAN based computer networks (Intranet) and on WANs of computing resources.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126223014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous matrix-matrix multiplication or partitioning a square into rectangles: NP-completeness and approximation algorithms","authors":"Olivier Beaumont, Vincent Boudet, Arnaud Legrand, F. Rastello, Y. Robert","doi":"10.1109/EMPDP.2001.905056","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905056","url":null,"abstract":"In this paper, we deal with two geometric problems arising from heterogeneous parallel computing: how to partition the unit square into p rectangles of given area s/sub 1/, s/sub 2/, ..., s/sub p/ (such that /spl Sigma//sub i=1//sup p/ s/sub i/=1), so as to minimize (i) either the sum of the p perimeters of the rectangles (ii) or the largest perimeter of the p rectangles. For both problems, we prove NP-completeness and we introduce approximation algorithms.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114296736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel simulated annealing for the delivery problem","authors":"Z. Czech","doi":"10.1109/EMPDP.2001.905046","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905046","url":null,"abstract":"A delivery problem which reduces to an NP-complete set-partitioning problem is considered. Two algorithms of parallel simulated annealing, i.e. the simultaneous independent searches and the simultaneous periodically interacting searches are investigated. The objective is to improve the accuracy of solutions to the problem by applying parallelism. The accuracy of a solution is meant as its proximity to the optimum solution. The empirical evidence supported by the statistical analysis indicates that the interaction of processes in parallel simulated annealing can yield more accurate solutions to the delivery problem as compared to the case when the processes run independently.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114411193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COOPE: a tool for representing concurrent object-oriented program execution through visualisation","authors":"Hugo Leroux, C. Exton","doi":"10.1109/EMPDP.2001.905016","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905016","url":null,"abstract":"There has been a move to introduce concurrency and object-orientation in the undergraduate curriculum. However, both bring forth challenging new concepts to the students. Despite these challenges, the benefits gained from learning concurrent object-oriented programming are numerous. Visualisation holds great promise in expediting comprehension of such complex issues. The aim of this paper is to discuss the potential of our visualisation tool, COOPE, to assist the students in comprehending the complexities of concurrent object-oriented programs. We thus present some broad requirements of a visualisation tool and discuss the design and implementation of COOPE.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128708534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the relative behavior of source and distributed routing in NOWs using Up*/Down* routing schemes","authors":"J. Sancho, A. Robles, J. Duato","doi":"10.1109/EMPDP.2001.904962","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.904962","url":null,"abstract":"Networks of workstations (NOWs) are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, a simple and effective methodology to compute up*/down* routing tables has been proposed by us. The resulting up*/down* routing scheme increases the number of alternative paths between every pair of switches and allows most messages to follow minimal paths. Also, up*/down* routing is suitable to be implemented using source or distributed routing. Source routing provides a safer and lower cost implementation of up*/down* routing than that provided by distributed routing. However distributed routing may benefit from routing messages through alternative paths to reach their destination. In this paper we evaluate the performance of up*/down* routing when using two methodologies to compute routing tables, and when both source and distributed routing are used. Evaluation results show that it is not worth to implement up*/down* routing in a distributed way in a NOW environment, since its performance is very close to that achieved by implementing it with source routing when a traffic-balancing algorithm is used. Moreover it is shown that a greater improvement in performance can be achieved by modifying the method to compute up*/down* routing tables when source routing is used.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117204836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a data stabilizing software tool","authors":"V. D. Florio, Geert Deconinck, R. Lauwereins, S. Graeber","doi":"10.1109/EMPDP.2001.905009","DOIUrl":"https://doi.org/10.1109/EMPDP.2001.905009","url":null,"abstract":"We describe a software tool which implements a software system for stabilizing data values, capable of tolerating both permanent faults in memory and transient faults affecting computation, input and memory devices by means of a strategy coupling temporal and spatial redundancy. The tool maximizes data integrity allowing a new value to enter the system only after a user-parameterizable stabilization procedure has been successfully passed. Designed and developed in the framework of the ESPRIT project TIRAN, the tool can be used stand-alone but can also be coupled with other dependable mechanisms developed within that project. Its use is being currently investigated within ENEL, the main Italian electricity supplier in order to replace a hardware stable storage device adopted in their high-voltage sub-stations.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128207851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}