{"title":"Algorithms for node disjoint paths in incomplete star networks","authors":"Q. Gu, S. Peng","doi":"10.1109/ICPADS.1994.590312","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590312","url":null,"abstract":"We give efficient algorithms for node disjoint path problems in incomplete star graphs which are defined in this paper to reduce the large gaps in the size of systems based on star graph topologies. Four disjoint path paradigms in incomplete star graphs are discussed: (1) disjoint paths between a pair of nodes s and t, (2) disjoint paths from a node s to a set T of nodes, (3) disjoint paths from a set S of nodes to a set T of nodes, and (4) disjoint paths between node pairs (s_i, t_i). We give algorithms which can find the maximum number of disjoint paths for these paradigms in optimal time. For an n-dimensional incomplete star graph G_{n,m}, the length of the disjoint paths constructed by our algorithms is at most d(G_{n,m})+c, where d(G_{n,m}) is the diameter of G_{n,m} and c is a small constant. This paper also shows that the k-wide-diameter d_{n-2}^{W}(G_{n,m}), k-Rabin-diameter d_{n-2}^{R}(G_{n,m}), k-set-diameter d_{n-2}^{S}(G_{n,m}), and k-pair-diameter d_{n-2}^{P}(G_{n,m}) of G_{n,m} are at most d(G_{n,m})+c.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"294 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132545316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel execution of nested loops in band parallelism","authors":"Z. Chen, C.-C. Chang, C. Tsai","doi":"10.1109/ICPADS.1994.590316","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590316","url":null,"abstract":"How to execute a nested loop in band parallelism on a multiprocessor system is addressed. The mathematical models of the waveband method, the hyperplane method, the modified hyperplane method and the linear band method are derived and compared. Since the structures of the real multiprocessor systems are at most 3-dimensional, in order to map the loop into these systems, an efficient algorithm for finding the optimal linear band in 2-dimensional index space, instead of a high dimensional index space, is proposed.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133688611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of a tree-structured vector quantizer for image compression on the MasPar MP-1 parallel machine","authors":"R. G. Palmer, H. Siegel, Janet M. Siegel, J. Antonio","doi":"10.1109/ICPADS.1994.590302","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590302","url":null,"abstract":"The transmission of digitized images over limited bandwidth channels motivates the use of data compression techniques. Many data compression techniques are not suitable for such applications because compression ratios of more than 20:1 are often required. One technique that can provide this level of compression is vector quantization. The processes of codebook generation and, especially, encoding and decoding are tasks well suited for execution on a massively parallel machine. For codebook generation, an SIMD algorithm is developed whose control flow is based on sequencing through the training data, rather than the tree structure, to achieve improved performance. Results from execution on the 16384 processor MasPar MP-1 SIMD machine are presented. The approaches taken could be adapted for other SIMD as well as MIMD machines.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123830243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new approach for finding loop transformation matrices","authors":"Hua Lin, Mi Lu, J. Fang","doi":"10.1109/ICPADS.1994.590342","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590342","url":null,"abstract":"Traditional approach for generating loop transformation matrix, which is based upon the computation of distance vectors or direction vectors, does not work for those nested loops whose distance vectors are uncomputable and direction vectors contain no useful information. In this paper, we present a new technique for generating transformation matrix that is based upon identifying certain types of linear equations or inequalities of distance vectors. Two issues related to this technique are discussed in this paper: 1) Given a nested loop how to identify these linear equations or inequalities; 2) Given such a linear equation or inequality how to generate a legal and unimodular transformation matrix for the purpose of loop parallelization.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121081848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed routing schemes for strictly nonblocking networks","authors":"Fong-Chih Shao, A. Oruç","doi":"10.1109/ICPADS.1994.590408","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590408","url":null,"abstract":"Strictly nonblocking networks may hold the key for high performance multiprocessor systems. While a number of strictly nonblocking networks have been reported in the literature, their use in multiprocessors is hampered by a lack of efficient distributed routing algorithms to set paths in these networks. As a step in this direction, the paper presents two distributed routing algorithms for D.G. Cantor's (1971) strictly nonblocking network. For N inputs, the first routing algorithm takes O(t)+O(log t log N) steps to route t requests in parallel. While this algorithm performs quite well for t=O(log^2 N), for larger values of t, we present a randomized version of the same algorithm with an expected time complexity of O(log^2 N) for any number of requests. These results, when combined with the crosspoints and depth complexities of a Cantor network, give a strictly nonblocking network with O(N log^2 N) crosspoints, O(log N) depth and O(log^2 N) routing time.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126600122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient emulation for tree-connected networks","authors":"Daw-Jong Shyu, Biing-Feng Wang, C. Tang","doi":"10.1109/ICPADS.1994.590362","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590362","url":null,"abstract":"Efficient emulations provide general methods to convert algorithms designed on a network into algorithms on smaller networks (with the same interconnection structure). In this paper, an optimal emulation for trees is proposed. With slight modification, our emulation can be applied to X-trees without loss of any efficiency. By the strategy of our emulation, optimal emulations for m-ary trees and pyramids can be obtained. An extended problem of the emulation problem on trees is to emulate a weighted tree, in which every node is associated with a weight by a smaller tree. In this paper, we also consider the extended problem and show that the problem is NP-hard.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126612238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of relaxed memory consistency models for multithreaded multiprocessors","authors":"Y. Chong, K. Hwang","doi":"10.1109/ICPADS.1994.590358","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590358","url":null,"abstract":"Stochastic timed Petri nets are developed to evaluate the relative performance of distributed shared memory models for scalable multithreaded multiprocessors. The shared memory models evaluated include the Sequential Consistency (SC), the Weak Consistency (WC), the Processor Consistency (PC) and the Release Consistency (RC) models. Under saturated conditions, we found that multithreading contributes more than 50% of the performance improvement, while the improvement from memory consistency models varies between 20% to 40% of the total performance gain. Our analytical results reveal the lowest performance of the SC model. The PC model requires to use larger write buffers and may perform even lower than the SC model if a small buffer was used. The performance of the WC model depends heavily on the synchronization rate in user code. For a low synchronization rate, the WC model performs as well as the RC model. With sufficient multithreading and network bandwidth, the RC model shows the best performance among the four models.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114929722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the development paradigm of distributed applications","authors":"Chih-Ping Chu, Chi-Jen Tzeng","doi":"10.1109/ICPADS.1994.590458","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590458","url":null,"abstract":"We propose an ideal development paradigm facilitating the implementation of distributed applications. In this paradigm a developer focuses his mind only on the application itself and does not need to spend time on application-unrelated activities. The programming style is nearly consistent with that of centralized system. The mechanism of the general support environment to this paradigm is described. An example explaining the implementation for an environment using the Sun RPC facility as the underlying communication component is also presented.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129640973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delayed precise invalidation-a software cache coherence scheme","authors":"Tang-Show Hwang, C. Chung","doi":"10.1109/ICPADS.1994.590365","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590365","url":null,"abstract":"Software-based cache coherence scheme is very desirable in scalable multiprocessor as well as massively parallel processor designs. We propose a software-based cache coherence scheme named delayed precise invalidation. The delayed precise invalidation is based on compiler time markings of references and a hardware-based local explicit invalidation of stale data in parallel and selectively. With a small amount of additional hardware and a small set of cache management instructions, the delayed precise invalidation provides more cacheability and allows invalidation of partial elements in an array, overcoming some of the inefficiencies and deficiencies of previous schemes. A correctness proof and a qualitative performance evaluation of the proposed scheme are also presented. Finally, the simulated cache hit ratios of the delayed precise invalidation and the parallel explicit invalidation scheme are given. Simulation results show that the delayed precise invalidation outperforms the parallel explicit invalidation scheme by 10%.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122109521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance tuning of message passing programs through visual analysis","authors":"S. Lei, Kang Zhang","doi":"10.1109/ICPADS.1994.590455","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590455","url":null,"abstract":"Designing a parallel program to fully utilise the processing power of a multiprocessor machine requires a series of performance analysis and tuning. The paper describes a performance tuning tool for message passing parallel programs. The tool combines the advantages of relational databases and spreadsheets to organise the performance data and analyse the program performance through visualisation. Various graphical displays which assist the user to fine tune the performance of message passing programs are discussed.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122894202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}