{"title":"Parallel algorithms for extracting ridges and ravines","authors":"R. Huang, T. Kunii","doi":"10.1109/AISPAS.1995.401362","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401362","url":null,"abstract":"This paper proposes two parallel algorithms, an even region parallel algorithm (ERPA) and an even strip parallel algorithm (ESPA), for extracting the ridge and ravine geometric features of a surface. The parallel programs were implemented on a GCel-1/64, a T805 transputer-based parallel machine with up to 64 transputers. The performance of the two algorithms is reported and analyzed with respect to load balance and communication overheads. Efficiency and speed-up are shown and discussed as functions of the number of transputers used and the problem size.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122080774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Program transformations and skeletons: formal derivation of parallel programs","authors":"A. Geerling","doi":"10.1109/AISPAS.1995.401332","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401332","url":null,"abstract":"The paper describes, from a software engineering perspective, a framework for the formal development of parallel algorithms on arbitrary architectures. The algorithms are synthesised transformationally, i.e. by applying correctness-preserving rewrite rules to a formal specification. The architectures are modelled by skeletons: higher-order functions that represent elementary computations on a given architecture. It is shown that combining transformational programming with skeletons encourages the reuse of program derivations. Furthermore, inter-skeleton transformations will provide the means for architecture-independent program development.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123906259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the effect of spare positioning on the reconfigurability of two-dimensional processor arrays","authors":"V. Obac Roda, T. Lin","doi":"10.1109/AISPAS.1995.401343","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401343","url":null,"abstract":"We investigated some reconfiguration and routing aspects of fault tolerant processing arrays. An interconnection topology with disjoint buses for the horizontal and vertical connections, called \"double bus array\", was adopted. Reconfiguration of the array after diagnosis encompasses the allocation of spare units to replace the faulty processors, renaming of the processor elements and interconnecting (routing) data through the operating processors according to the initial specified operation. We fully simulated reconfiguration and routing for arrays of size N, from 5 to 25 processors and faults from 1 to 2N+1. Faults were generated randomly to simulate defects on a wafer. We present the results of the simulations and discuss the possible reasons for reliability improvements.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116057455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvement of duplication scheduling heuristic algorithm with nonstrict triggering of program graph nodes","authors":"B. Benko, M. Ojsteršek, V. Zumer","doi":"10.1109/AISPAS.1995.401321","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401321","url":null,"abstract":"The problem of multiprocessor scheduling can be stated as finding a schedule for a general task graph to be executed on a multiprocessor system so that the schedule length is minimised. This scheduling problem is known to be NP-hard, and heuristic algorithms have been proposed to obtain optimal and suboptimal solutions. The duplication scheduling heuristic solves the max-min problem of parallel processor scheduling by duplicating selected scheduled tasks on some PEs. The max-min problem arises from the trade-off between maximum parallelism and minimum communication delay. This paper introduces an extension of the near-optimal scheduling heuristic based on duplication scheduling. We have focused our research efforts on three main extensions of the original heuristic.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115512750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable performance analysis tool for PowerPC based MPP systems","authors":"O. Hansen, J. Krammer","doi":"10.1109/AISPAS.1995.401352","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401352","url":null,"abstract":"This paper introduces a tool for optimizing programs on massively parallel computing systems. The tool has been implemented for a PowerPC-based parallel computing platform. It is scalable both in its implementation and in the way it presents performance data. A major feature contributing to the scalable representation of performance data is the ability to focus measurements on points of interest in the program execution by specifying behavioral attributes. Behavioral attributes are given as thresholds on the results of other measurements. Thus the results of different measurements can be linked directly, which enables the user to relate global system behavior to the execution of individual program parts.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114265673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelizing a PDE solver: experiences with PISCES-MP","authors":"B. Herndon, A. Raefsky, R. Dutton","doi":"10.1109/AISPAS.1995.401327","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401327","url":null,"abstract":"The paper presents a methodology for adapting dusty deck PDE solvers for parallel execution. Our approach minimizes changes to existing code and data structures, thereby preserving the value captured within dusty decks. This scheme uses the single program multiple data (SPMD) programming paradigm on message-passing distributed-memory architectures. To demonstrate the viability of our methodology, the commercially available dusty deck semiconductor device modeling program PISCES has been adapted for parallel execution. Simulating realistic complex device structures, we have achieved excellent performance gains over high-performance serial workstations. In addition, the scalability of the parallel simulator allows the simulation of structures too large for our existing serial computers.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115560750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hamiltonicity, vertex symmetry, and broadcasting of uni-directional hypercubes","authors":"S. Chern, Tai-Ching Tuan, J. Jwo","doi":"10.1109/AISPAS.1995.401339","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401339","url":null,"abstract":"We show that the two uni-directional n-cubes UHC1_n and UHC2_n, proposed by Chou and Du (1990) as interconnection schemes, are Hamiltonian. In addition, we show that (1) if n is even, both architectures are vertex symmetric; and (2) if n is odd, both architectures have exactly two vertex-symmetric components. By studying symmetry, we further prove that the maximum delay of one-port one-to-all broadcasting for either architecture is at most 1.5n.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131694331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel polygon rendering on the graphics computer VC-1","authors":"T. Kunii, S. Nishimura","doi":"10.1109/AISPAS.1995.401361","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401361","url":null,"abstract":"This paper describes a parallel polygon rendering method on the graphics computer VC-1. The architecture of the VC-1 is a loosely-coupled array of general-purpose processors, each of which is equipped with a local frame buffer. The contents of the local frame buffers are merged into one in real time, with visibility resolved by screen depth. In our polygon rendering method, polygons are distributed among the processors and each processor independently computes the image of its assigned polygons using the Z-buffer method. To achieve load balancing, a technique called adaptive parallel rasterization is developed. Adaptive parallel rasterization automatically selects the appropriate parallelizing approach according to the estimated size of the polygons displayed on the screen. The measured rendering performance of VC-1 using this polygon rendering method is shown.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130754650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cohesion: an efficient distributed shared memory system supporting multiple memory consistency models","authors":"C. Shieh, An-Chow Lai, Jyh-Chang Ueng, Tyng-Yeu Liang, Tzu-Chiang Chang, Su-Cheong Mac","doi":"10.1109/AISPAS.1995.401322","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401322","url":null,"abstract":"This paper describes a prototype DSM system called Cohesion, which supports two memory consistency models, Sequential consistency and Release consistency, within a single program, both to improve performance and to support a wide variety of parallel programs. Sequentially consistent memory is further divided into object-based and conventional (page-based) memory, implemented at user level and kernel level, respectively. In object-based memory, shared data are kept consistent at the granularity of an object; it is provided to improve the performance of fine-grained parallel applications that may incur significant overhead in conventional or release memory, and to eliminate unnecessary movement of pages that are protected in a critical section. The Release consistency model, in turn, is supported in Cohesion to address excessive network traffic and false sharing. Cohesion programs are written in C++, and shared objects are annotated for release and object-based memory by inheriting from a system-provided base class. Finally, three application programs, Matrix Multiplication, SOR, and Nbody, were used to evaluate the efficiency of Cohesion. In addition, a Producer-Consumer program is tested to show that object-based memory is beneficial inside critical sections.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"513 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123433492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault tolerant routing in toroidal networks","authors":"Q. Gu, S. Peng","doi":"10.1109/AISPAS.1995.401342","DOIUrl":"https://doi.org/10.1109/AISPAS.1995.401342","url":null,"abstract":"We give an O(r^2) time algorithm for constructing a fault-free routing path of optimal length between any two non-faulty nodes of an r-dimensional torus with 2r-1 faulty nodes. We show that the Rabin diameter of an r-dimensional torus is its diameter plus one. We also describe a cluster fault tolerant (CFT) routing model and give an efficient algorithm for node-to-node CFT routing.","PeriodicalId":321580,"journal":{"name":"Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133195075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}