{"title":"Mapping pyramids into 3-D meshes","authors":"K. Chung, Yu-Wei Chen","doi":"10.1109/ICPADS.1994.590361","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590361","url":null,"abstract":"Embedding one parallel architecture into another is very important in the area of parallel processing because parallel architectures can vary widely. Given a pyramid architecture of $(4^{N}-1)/3$ nodes and height $N$, this paper presents a mapping method to embed the pyramid architecture into a $2^{N-1-k} \\times 2^{N-1-k} \\times (4^{k+1}+2)/3$ mesh for $0 \\le k \\le N-1$. Our method has dilation max{$4^{k}$, $2^{N-2-k}$} and expansion $1+2/4^{k+1}$. When setting $k=(N-2)/3$, the pyramid can be embedded into a $2^{(2N-1)/3} \\times 2^{(2N-1)/3} \\times [4^{(N+1)/3}+2]/3$ mesh, and it has dilation $4^{(N-2)/3}$ and expansion $1+2/4^{(N+1)/3}$. This result has optimal expansion when $N$ is sufficiently large and is superior to the previous mapping methods under the same measures.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116853025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A tree-based distributed algorithm for the K-entry critical section problem","authors":"S. Wang, S. Lang","doi":"10.1109/ICPADS.1994.590400","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590400","url":null,"abstract":"We present a token-based algorithm for solving the K-entry critical section problem. Based on Raymond's (1989) tree-based approach, we regard the nodes as being arranged in a directed tree structure, and all messages used in the algorithm are sent along the directed edges of the tree. There are K tokens in the system; we use a bag structure at each node to record the collection of the neighboring nodes, possibly with multiple occurrences of the same node, through which the K tokens can be located. As a result, there are K paths from each node leading to the K tokens in the system. Our algorithm requires at most 2KD messages for a node to enter the critical section (CS), where D is the diameter of the tree. Therefore, when the diameter D is much smaller than N, the number of nodes, e.g. D=O(1) as in a star or D=O(log N) as in a binary tree, our algorithm's upper bound on the number of messages per CS is smaller than those previously reported.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"26 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131726982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CCT: a new VLSI architecture for parallel processing","authors":"S. K. Basu, J. Dattagupta, R. Dattagupta","doi":"10.1109/ICPADS.1994.590444","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590444","url":null,"abstract":"We propose a VLSI-implementable architecture called the Cube Connected Tree, which has the advantages of both trees and hypercubes. This structure has fixed low-degree nodes for any size of network, unlike hypercubes, where the node degree depends on the size of the hypercube. The complexity of the VLSI layout of this structure is addressed within the grid model of C.D. Thompson (1984). By using spare links and PEs, the fault-tolerance capabilities of the system have been enhanced. Easy programmability of this structure has been demonstrated by designing polyalgorithmic algorithms for sorting and discrete Fourier transform.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122996901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of cache hashing functions for symbolic applications in micro-parallel processors","authors":"Ching-Long Su, Chin-Chi Teng, A. Despain","doi":"10.1109/ICPADS.1994.590367","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590367","url":null,"abstract":"This paper presents a study of cache hashing functions for micro-parallel processors (e.g., superpipelined and superscalar processors). Several novel cache hashing functions are evaluated experimentally. Our simulation results show that an unconventional cache hashing function applied to a direct-mapped cache yields hit rates as good as those of a two-way set-associative cache with traditional mapping, while the cache hit times are as fast as those of a direct-mapped cache with traditional mapping.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123338922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient fault tolerance: an approach to deal with transient faults in multiprocessor architectures","authors":"A. Bondavalli, S. Chiaradonna, F. Giandomenico","doi":"10.1109/ICPADS.1994.590322","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590322","url":null,"abstract":"Dynamic error processing approaches are an important mechanism to increase the reliability of a multiprocessor system while making efficient use of the available resources. To this end, dynamic error processing must be integrated with a fault treatment approach aiming at optimising resource utilisation. In this paper we propose a diagnosis approach that, accounting for transient faults, tries to remove units very cautiously and to balance two conflicting requirements. The first is to avoid removing units that have experienced transient faults and can still be useful for the system; the second is to avoid keeping failed units whose usage may lead to a premature failure of the system. The proposed fault treatment approach is integrated with a mechanism for dynamic error processing in a complete fault tolerance strategy. Reliability analyses based on the Markov approach and an efficiency evaluation performed by simulation are carried out.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124894466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Torus with slotted rings architecture for a cache-coherent multiprocessor","authors":"J.-H. Chuang, W.-C. Chao","doi":"10.1109/ICPADS.1994.589899","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.589899","url":null,"abstract":"The slotted ring is a point-to-point unidirectional connection for multiprocessor systems which resolves most of the problems associated with the bus system. However, the cycle time of the ring becomes the bottleneck when the system grows. A torus with slotted rings, which is composed of multiple rings, is proposed to reduce the cycle time of the resulting system. It is similar to the Wisconsin Multicube, which is built from a grid of buses. The proposed architecture adopts a ring-map directory cache coherence scheme to avoid occupying too many rings during invalidation. Performance evaluation shows that the torus with slotted rings using the ring-map directory scheme performs better than the Wisconsin Multicube using the pure snooping scheme.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123979737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouping array layouts to reduce communication and improve locality of parallel programs","authors":"Tien-Pao Shih, E. Davidson","doi":"10.1109/ICPADS.1994.590375","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590375","url":null,"abstract":"A data layout method, array grouping, is proposed to improve communication efficiency and cache utilization of parallel programs containing indirect array references or nonunit stride indexing. Conditions on where to apply this technique are specified in a series of theorems. The technique is then applied to a real finite element application. The experimental results show that communication is reduced by 15%, and data subcache misses by 40% on 56 processors of the KSR1 parallel computer.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"600 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120876090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient technique to remove transformations [program codes]","authors":"C. Dow, M. Soffa, Shi-Kuo Chang","doi":"10.1109/ICPADS.1994.590343","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590343","url":null,"abstract":"Although the application of code transformations is critical to exploit parallelism in program code, few guidelines or tools are provided to determine what transformations should be applied and where they should be applied. In this paper, we approach this problem by first providing a taxonomy of code transformations to assist the user in parallelizing programs. We then present an efficient technique to remove transformations from the code when it is determined that they are ineffective or prevent more beneficial transformations from being applied. The technique to remove transformations employs inverse primitive actions, making it transformation independent. The technique uses the program dependence graph as the intermediate representation, making it language independent.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116777385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convexity problems on reconfigurable meshes","authors":"Chia-Long Lee, Wen-Tsuen Chen","doi":"10.1109/ICPADS.1994.589893","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.589893","url":null,"abstract":"The reconfigurable mesh, a parallel system with bus autonomy, can support various interconnection schemes during execution of an algorithm. It offers very efficient computation power in many application domains. In digital image processing and computer vision, convexity is a natural shape descriptor and a classifier for objects in the image space. We first show that the problem of identifying extreme points of convex hulls can be solved in O(1) time on the proposed reconfigurable mesh. Furthermore, we present constant-time algorithms for a number of convexity-related problems on reconfigurable meshes. These problems include point inclusion, interior detection, area, and width of convex hulls.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129374053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient implementation of sorting algorithms on asynchronous distributed-memory machines","authors":"B. Zhou, R. Brent, A. Tridgell","doi":"10.1109/icpads.1994.590058","DOIUrl":"https://doi.org/10.1109/icpads.1994.590058","url":null,"abstract":"The problem of merging two sequences of elements which are stored separately in two processing elements (PEs) occurs in the implementation of many existing sorting algorithms. We describe efficient algorithms for the merging problem on asynchronous distributed-memory machines. The algorithms reduce the cost of the merge operation and of communication, as well as partly solving the problem of load balancing. Experimental results on a Fujitsu AP1000 are reported.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127138246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}