{"title":"A tree-based distributed algorithm for the K-entry critical section problem","authors":"S. Wang, S. Lang","doi":"10.1109/ICPADS.1994.590400","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590400","url":null,"abstract":"We present a token-based algorithm for solving the K-entry critical section problem. Based on Raymond's (1989) tree-based approach, we regard the nodes as being arranged in a directed tree structure, and all messages used in the algorithm are sent along the directed edges of the tree. There are K tokens in the system; we use a bag structure at each node to record the collection of the neighboring nodes, possibly with multiple occurrences of the same node, through which the K tokens can be located. As a result, there are K paths from each node leading to the K tokens in the system. Our algorithm requires at most 2KD messages for a node to enter the CS, where D is the diameter of the tree. Therefore, when the diameter D is much smaller than N, the number of nodes, e.g. D=O(1) as in a star or D=O(logN) as in a binary tree, our algorithm's upper bound on the number of messages per CS is smaller than those previously reported.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"26 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131726982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of a tree-structured vector quantizer for image compression on the MasPar MP-1 parallel machine","authors":"R. G. Palmer, H. Siegel, Janet M. Siegel, J. Antonio","doi":"10.1109/ICPADS.1994.590302","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590302","url":null,"abstract":"The transmission of digitized images over limited bandwidth channels motivates the use of data compression techniques. Many data compression techniques are not suitable for such applications because compression ratios of more than 20:1 are often required. One technique that can provide this level of compression is vector quantization. The processes of codebook generation and, especially, encoding and decoding are tasks well suited for execution on a massively parallel machine. For codebook generation, an SIMD algorithm is developed whose control flow is based on sequencing through the training data, rather than the tree structure, to achieve improved performance. Results from execution on the 16384 processor MasPar MP-1 SIMD machine are presented. The approaches taken could be adapted for other SIMD as well as MIMD machines.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123830243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load balancing and query optimization in dataflow parallel evaluation of Datalog programs","authors":"J. F. A. Montes, E. Alba, J. M. Troya","doi":"10.1109/ICPADS.1994.590436","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590436","url":null,"abstract":"A dataflow model to obtain parallelism in the evaluation of Datalog is presented. This model performs query evaluation as a dataflow through a network of communicating concurrent processes capable of solving the query. This process network is based on the intensional database definition, plus the concrete query to be evaluated. A cost model to cope with the load balancing problem is described. A load balancing algorithm is presented and discussed. An algorithm to optimize the evaluation is described which is based on process network rewriting. This utilizes information in the query bindings to be evaluated in order to optimize the dataflow graph.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114631324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient fault tolerance: an approach to deal with transient faults in multiprocessor architectures","authors":"A. Bondavalli, S. Chiaradonna, F. Giandomenico","doi":"10.1109/ICPADS.1994.590322","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590322","url":null,"abstract":"Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To this end, dynamic error processing must be integrated with a fault treatment approach aiming at optimising resource utilisation. In this paper we propose a diagnosis approach that, accounting for transient faults, tries to remove units very cautiously and to balance between two conflicting requirements. The first is to avoid removing units that have experienced transient faults and can still be useful to the system; the second is to avoid keeping failed units whose usage may lead to a premature failure of the system. The proposed fault treatment approach is integrated with a mechanism for dynamic error processing in a complete fault tolerance strategy. Reliability analyses based on the Markov approach and an efficiency evaluation performed by simulation are carried out.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124894466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouping array layouts to reduce communication and improve locality of parallel programs","authors":"Tien-Pao Shih, E. Davidson","doi":"10.1109/ICPADS.1994.590375","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590375","url":null,"abstract":"A data layout method, array grouping, is proposed to improve communication efficiency and cache utilization of parallel programs containing indirect array references or nonunit stride indexing. Conditions on where to apply this technique are specified in a series of theorems. The technique is then applied to a real finite element application. The experimental results show that communication is reduced by 15%, and data subcache misses by 40% on 56 processors of the KSR1 parallel computer.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"600 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120876090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of cache hashing functions for symbolic applications in micro-parallel processors","authors":"Ching-Long Su, Chin-Chi Teng, A. Despain","doi":"10.1109/ICPADS.1994.590367","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590367","url":null,"abstract":"This paper presents a study of cache hashing functions for micro-parallel processors (e.g., superpipelined and superscalar processors). Several novel cache hashing functions are evaluated experimentally. Our simulation results show that an unconventional cache hashing function applied to a direct-mapped cache results in hit rates as good as those of a two-way set-associative cache with traditional mapping, while the cache hit times are as fast as those of a direct-mapped cache with traditional mapping.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123338922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient technique to remove transformations [program codes]","authors":"C. Dow, M. Soffa, Shi-Kuo Chang","doi":"10.1109/ICPADS.1994.590343","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590343","url":null,"abstract":"Although the application of code transformations is critical to exploit parallelism in program code, few guidelines or tools are provided to determine what transformations should be applied and where they should be applied. In this paper, we approach this problem by first providing a taxonomy of code transformations to assist the user in parallelizing programs. We then present an efficient technique to remove transformations from the code when it is determined that they are ineffective or prevent more beneficial transformations from being applied. The technique to remove transformations employs inverse primitive actions, making it transformation independent. The technique uses the program dependence graph as the intermediate representation, making it language independent.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116777385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Torus with slotted rings architecture for a cache-coherent multiprocessor","authors":"J.-H. Chuang, W.-C. Chao","doi":"10.1109/ICPADS.1994.589899","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.589899","url":null,"abstract":"The slotted ring is a point-to-point unidirectional connection for multiprocessor systems which resolves most of the problems associated with the bus system. However, the cycle time of the ring becomes the bottleneck when the system grows. Torus with slotted rings which is composed of multiple rings is proposed to reduce the cycle time of the resulting system. It is similar to the Wisconsin Multicube built by a grid of buses. The proposed architecture adopts a ring-map directory cache coherence scheme to avoid occupying too many rings during invalidation. Through performance evaluation, it is verified that the torus with slotted rings with ring-map directory scheme is better than the Wisconsin Multicube with the pure snooping scheme.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123979737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On parallel transaction processing in a coupled system","authors":"Ming-Syan Chen, Tao-Heng Yang, Philip S. Yu, Tze-Shiu Liu","doi":"10.1109/ICPADS.1994.590422","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590422","url":null,"abstract":"A performance study is conducted on parallel transaction processing in a coupled system, which is a multi-node system with a shared global buffer. We develop a multiple system simulator and obtain several performance results from it. This simulator has been run against three workloads, and the coupled system behavior with these three different inputs is studied. Several statistics, including those on local and global buffer hits, page writes to the global buffer, cross-invalidations and castouts, are comparatively analysed, and their relationship to the degree of data skew is explored.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128824721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CCT: a new VLSI architecture for parallel processing","authors":"S. K. Basu, J. Dattagupta, R. Dattagupta","doi":"10.1109/ICPADS.1994.590444","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590444","url":null,"abstract":"We propose a VLSI implementable architecture called Cube Connected Tree having advantages of both trees and hypercubes. This structure has fixed low degree nodes for any size of network, unlike hypercubes, where the node degree is dependent on the size of the hypercube. The complexity of the VLSI layout of this structure has been addressed within the grid model of C.D. Thompson (1984). By using spare links and PEs, the fault-tolerance capabilities of the system have been enhanced. Easy programmability of this structure has been demonstrated by designing polyalgorithmic algorithms for sorting and discrete Fourier transform.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122996901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}