{"title":"An optimal parallel algorithm for the Euclidean distance maps of binary images","authors":"A. Fujiwara, T. Masuzawa, H. Fujiwara","doi":"10.1109/ICAPP.1995.472293","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472293","url":null,"abstract":"The Euclidean distance map (EDM) of a black and white n×n binary image is the n×n map where each element has the Euclidean distance between the corresponding pixel and the nearest black pixel. The EDM plays an important role in machine vision, pattern recognition and robotics. Many algorithms have been proposed for computing the EDM. In recent years, O(n^2) time sequential algorithms were presented for computing the EDM. Hirata and Kato (1994) showed that their algorithm can be parallelized to run in O(n^2/p) time using p processors (1 ≤ p ≤ n) on the EREW PRAM. We present a parallel algorithm for computing the EDM. The algorithm runs in O(log n) time using n^2/log n processors on the EREW PRAM and in O(log n/log log n) time using n^2 log log n/log n processors on the common CRCW PRAM, respectively. The algorithm is optimal in the sense that the product of the time and the number of processors is equal to the lower bound of the sequential time for computing the EDM.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance implications of virtualisation of massively parallel algorithm implementation","authors":"C. A. Farrell, D. Kieronska, M. Korda","doi":"10.1109/ICAPP.1995.472275","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472275","url":null,"abstract":"In this paper we investigate the accuracy of performance prediction for virtualised implementations of parallel algorithms on massively parallel SIMD architectures. The main contributions of this paper are the adaption and practical evaluation of the best known algorithms for merging and sorting.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114596763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPSF: a horizontally-divided parallel signature file method","authors":"Jeong-Ki Kim, Jae-Woo Chang","doi":"10.1109/ICAPP.1995.472242","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472242","url":null,"abstract":"In order to achieve good performance, the signature file approach has been required to support parallel database processing. Therefore, in this paper we propose a horizontally-divided parallel signature file method (HPSF) using extendible hashing and frame-slicing techniques. In addition, we propose a heuristic processor allocation method so that we may assign signatures to a given number of processors in a uniform way. To show the efficiency of HPSF, we evaluate the performance of HPSF in terms of retrieval time, storage overhead, and insertion time. Finally, we show from the performance results that HPSF outperforms the conventional parallel signature file methods on retrieval performance as well as insertion time.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115092047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural characteristics and hardware cost of a class of interconnection networks","authors":"M. Hamdi","doi":"10.1109/ICAPP.1995.472177","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472177","url":null,"abstract":"A new class of interconnection networks is proposed for interconnecting the processors of a general purpose parallel computer which is based on the hierarchical application of a complete graph compound. The systematic construction of this new class of interconnection networks, RCC, is shown and its properties are derived and are compared favorably to other interconnection networks. A specific instance of this class, RCC-CUBE, is shown to have desirable network properties such as small diameter, small degree, high density, and high bandwidth. The hardware cost and physical time performance are estimated for RCC-CUBE and compared to those of the hypercube and the 2-D mesh demonstrating an overall cost-effectiveness for RCC-CUBE. Thus, the RCC-CUBE appears to be a good candidate for next generation massively parallel computer systems.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116924657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling of precedence constrained tasks on multiprocessor systems","authors":"Chih-Ming Yen, S. Tseng, Chao-Tung Yang","doi":"10.1109/ICAPP.1995.472208","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472208","url":null,"abstract":"The problem of scheduling a set of precedence-constrained tasks onto a finite number of identical processors with and without communication overhead is studied. The objective is to minimize the makespan. In this paper, we are concerned with a priority-list scheduling method. A new policy for ranking the priority of each task is proposed. Under this priority policy, two heuristic algorithms are proposed to solve task scheduling problems with and without communication overhead. Experiments show that our algorithm for solving the problem without communication overhead improves on the previous result by about 20%; for problems with communication overhead the improvement is about 70% over previous work.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116396379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improvement to dynamic stream handling in dataflow computers","authors":"V. Lakshmi, C. Arnold","doi":"10.1109/ICAPP.1995.472306","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472306","url":null,"abstract":"This paper presents a new method of implementing dynamic streams of streams using token relabelling which reduces the complexity and drawbacks of the previously proposed method due to Gaudiot. Consider a sequence of tokens, Vi_[ui], which will appear in sequence on the stream-carrying arc. Two tokens Va_[ux] and Vb_[uy] will be considered as belonging to the same stream if they have the same context: [ux]=[uy]. Elements within a stream are ordered according to the sequence in time that they appear on the arc. Let the highest level of streams have the context [uO], that of the surrounding block. Thus the highest level stream is the sequence of values Vi_[uO]. Each element of this stream has as its value a unique context, namely, that of the stream that it represents. So the token Vi_[uO] identifies as a stream the sequence of tokens whose context is [Vi].","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129003729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduced reachability graphs with parallel actions and dynamic replacement","authors":"H. Mountassir","doi":"10.1109/ICAPP.1995.472231","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472231","url":null,"abstract":"Analysis of communication protocols by conventional state exploration is a well known technique. It is implemented in several validation tools. The major problem of this technique is its restricted applicability, which depends on the available memory. The number of reachable states is often large and sometimes infinite. In this paper we discuss a reduction technique to build graphs that are as small as possible while preserving the same properties. To this end, vectors of executable actions are proposed to eliminate redundancy of sequences and intermediate states. The depth-first and the breadth-first algorithms based on the concept of dynamic replacement are used in order to reduce the final graphs. Two major questions are discussed: the finiteness of the graphs and the verification of the communication properties.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130887462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithmic aspects and computing trends in computational electromagnetics using massively parallel architectures","authors":"C. Rowell, V. Shankar, W. Hall, A. Mohammadian","doi":"10.1109/ICAPP.1995.472266","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472266","url":null,"abstract":"Accurate and rapid evaluation of radar signature for alternative aircraft/store configurations would be of substantial benefit in the evolution of integrated designs that meet radar cross section requirements across the threat spectrum. Finite-volume time domain methods offer the possibility of modeling the whole aircraft, including penetrable regions and stores, at longer wavelengths on today's supercomputers and at typical airborne radar wavelengths on the massively parallel teraflop computers of tomorrow. To realize this potential, practical means are being developed for the rapid generation of grids on and around the aircraft, and numerical algorithms that maintain high order accuracy on such grids are being constructed. A structured grid and an unstructured grid based finite-volume, time-domain Maxwell's equation solver has been developed incorporating modeling techniques for general radar absorbing materials. Using this work as a base, the goal of the computational electromagnetics effort is to define, implement, and evaluate rapid prototype signature prediction, addressing many issues related to (1) physics of electromagnetics, (2) efficient and higher-order accurate algorithms, (3) boundary condition procedures, (4) geometry and gridding (structured and unstructured), (5) computer architecture, and (6) validation.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125349277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization in a hierarchical distributed performance monitoring system","authors":"Ling Shi, O. de Vel, Jiannong Cao, M. Cosnard","doi":"10.1109/ICAPP.1995.472238","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472238","url":null,"abstract":"Monitoring program execution in a distributed system can generate large quantities of data, and the collection and processing of the monitoring data is one of the primary factors that contribute to the complexity of distributed monitoring. In order to reduce such complexity, a hierarchical distributed performance monitoring system has been developed. In this paper we describe an optimization method to improve the efficiency of the monitoring system. By considering the topology used by the application program and the distribution of monitoring records, an optimized grouping can be determined to obtain an improved performance for the monitoring system. The experiments presented in this paper have demonstrated such an improvement in performance.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115215497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two dynamic performance tuning methods for portable parallel programs","authors":"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi","doi":"10.1109/ICAPP.1995.472244","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472244","url":null,"abstract":"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers: a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"33 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120999263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}