{"title":"Simulation and performance evaluation of a modularly configurable attached processor","authors":"Yi-Chieh Chang, G. Gibson, Claudia Ayala","doi":"10.1109/ICPADS.1994.590056","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590056","url":null,"abstract":"A new architecture for high-performance parallel attached processors is studied in this paper. The unique features are that the attached processor can be configured to match a set of algorithms and its memory controllers can be programmed to fit the access patterns required by the algorithms. As a result, high utilization of the processing logic for given sets of algorithms can be obtained. A simulator with interactive graphic interface is designed to study the performance of the proposed architecture. An example based on matrix multiplication is used for illustration. The simulation results show that a sustained execution rate as high as 95% of the peak speed for matrices with a size of 128×128 can be achieved in the proposed attached processor architecture. If CMOS technology is chosen to implement the MCAP architecture, a sustained speed of 190 MFLOPS can be obtained for matrix multiplication with four multipliers and four adders.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125552232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full-color real-time video broadcasting over ATM LAN","authors":"Chia-Yiu Maa","doi":"10.1109/ICPADS.1994.590112","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590112","url":null,"abstract":"In this paper our experience of broadcasting full-color, full-size, real-time video over an ATM LAN is reported. The broadcasting system is adaptive in that each receiver can view video at its highest affordable frame rate. Effort has been made to guarantee continuous audio and best-effort audio and video synchronization. The key factors of the major components of the system which limit the performance of real-time ATM video applications are then discussed. Suggestions are given to address these limitations.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114175926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plenary Address 2: Computing in the '90s, Microsoft, and Supercomputers","authors":"George Spix","doi":"10.1109/ICPADS.1994.589876","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.589876","url":null,"abstract":"After 50 years of building high performance scientific computers, two major architectures exist: (1) clusters of “Cray-style” vector supercomputers; (2) clusters of scalar uni- and multi-processors. Clusters are in transition from (a) massively parallel computers and clusters running proprietary software to (b) proprietary clusters running standard software, and (c) do-it-yourself Beowulf clusters built from commodity hardware and software. In 2001, only five years after its introduction, Beowulf has mobilized a community around a standard architecture and tools. Beowulf’s economics and sociology are poised to kill off the other architectural lines – and will likely affect traditional super-computer centers as well. Peer-to-peer and Grid communities are beginning to provide significant advantages for embarrassingly parallel problems and sharing vast numbers of files. The Computational Grid can federate systems into supercomputers far beyond the power of any current computing center. The centers will become super-data and super-application centers. While these trends make high-performance computing much less expensive and much more accessible, there is a dark side. Clusters perform poorly on applications that require large shared memory. Although there is vibrant computer architecture activity on microprocessors and on high-end cellular architectures, we appear to be entering an era of super-computing mono-culture. Investing in next generation software and hardware supercomputer architecture is essential to improve the efficiency and efficacy of systems.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128223811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two expansible multistage interconnection networks","authors":"C. S. Yang, L. Zu","doi":"10.1109/ICPADS.1994.590333","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590333","url":null,"abstract":"Two new construction methods for multistage interconnection networks (MINs) are proposed. These methods provide low overhead in enlarging the size of MIN scheme and they have the same features as those of MIN schemes by typically designed method. The first proposed method proves that, in enlarging the size of MIN scheme, the full access interconnection property and the self-routing ability are also available in the enlarged scheme and the hardware and reconstruction overhead is low. However, the requests to be accepted by the MIN scheme depend not only on the configuration of MIN but also on another mechanism in the first proposed method. It results in low success probability for any request. The second proposed method releases this disadvantage. The following are the features of our two proposed schemes: (1) The full-access interconnection property; (2) Simple and distributed self-routing ability; (3) The least hardware cost; (4) Low reconstruction overhead.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124017617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimal fault-tolerant design approach for array processors","authors":"Chang Nian Zhang, T. M. Bachtiar, W. Chou","doi":"10.1109/ICPADS.1994.590320","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590320","url":null,"abstract":"A systematic approach for designing fault tolerant systolic array using space/time redundancy is proposed. The approach is based upon a fault tolerant mapping theory which relates space-time mapping and concurrent error detection techniques. By this design approach, the resulting systolic array is fault tolerant and optimal. Besides, it has the capability to compute more problem instances simultaneously without extra cost.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121302423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Obtaining nondominated k-coteries for fault-tolerant distributed k-mutual exclusion","authors":"Jehn-Ruey Jiang, Shing-Tsaan Huang","doi":"10.1109/ICPADS.1994.590392","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590392","url":null,"abstract":"A k-coterie is a family of sets (called quorums) in which any (k+1) quorums contain at least a pair of quorums intersecting each other. K-coteries can be used to develop distributed k-mutual exclusion algorithms that are resilient to node and/or communication link failures. A k-coterie is said to dominate another k-coterie if and only if every quorum in the latter is a super set of some quorum in the former. Obviously the dominating one has more chance than the dominated one for a quorum to be formed successfully in an error-prone environment. Thus, we should always concentrate on nondominated k-coteries that no k-coterie can dominate. We introduce a theorem for checking the nondomination of k-coteries, define a class of special nondominated k-coteries, the strongly nondominated (SND) k-coteries, and propose two operations to generate new SND k-coteries from known SND k-coteries.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126322412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sorting networks with built-in error correction","authors":"Y. Hsu, E. Swartzlander","doi":"10.1109/ICPADS.1994.590339","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590339","url":null,"abstract":"A sorting network with built-in error correction is proposed in this paper. A time shared TMR scheme is used to achieve the error correcting capability. A quarter of the original sorting network based on perfect shuffle is triplicated and voted in each stage. The hardware complexity of this time shared TMR error correcting sorting network is a little more than the original sorting network. The price is that the delay time increases by a factor of 4. However, the throughput penalty can be minimized by pipelining. A technology-independent gate level analysis of hardware complexity and delay time is included in this paper. Possible variations of the basic design are also discussed.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132801304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of a portable parallelizing compiler with loop partition","authors":"M.-C. Hsiao, S. Tseng, Chao-Tung Yang, C.-S. Chen","doi":"10.1109/ICPADS.1994.590318","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590318","url":null,"abstract":"We have implemented a portable FORTRAN parallelizing compiler with loop partition on our experimental target system, Acer Altos 10000, running OSF/1 operating system. We have defined a minimal set of thread-related functions and data types, called B Threads, that is required to support the execution of this parallelizing compiler. Our compiler is highly modularized so that the porting to other platforms will be very easy, and it can partition parallel loops into multithreaded codes based on several loop partition algorithms. We have also proposed a general model of parallel compilers, which is an extension from previous model and is useful in constructing a parallelizing compiler for a particular language. The experimental results show that the best speedups are 3.75, 3.46, and 3.81 for matrix multiplication, adjoint convolution, and increasing workload sample, respectively, when the number of processors is four. It has been shown that this approach works and the experimental results are satisfied.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127324468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel programming environment based on message passing","authors":"Y. Wen, D. Wang, M. Shen, W. Zhen","doi":"10.1109/ICPADS.1994.590452","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590452","url":null,"abstract":"With the development of parallel processing technology, more and more high-performance parallel computer systems have been developed. The convenient and flexible parallel programming environment plays an important role in the spread of parallel computing. How to write efficient parallel codes and how to convert the existing sequential applications into parallel codes have become a very important issue in parallel processing. We introduce a parallel programming environment based on message passing, which is simple to develop parallel applications and has high performance.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124163569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Varietal hypercube-a new interconnection network topology for large scale multicomputer","authors":"S.-Y. Cheng, J.-H. Chuang","doi":"10.1109/ICPADS.1994.590445","DOIUrl":"https://doi.org/10.1109/ICPADS.1994.590445","url":null,"abstract":"The paper proposes a new interconnection network topology, called varietal hypercube for large scale multicomputer systems. An n-dimensional varietal hypercube is constructed by two (n-1)-dimensional varietal hypercubes in a way similar to that for the hypercube except for some minor modifications. The resulting network has the same number of nodes and links as the hypercube, and has most of the desirable properties of the hypercube, including recursive structure, partitionability, strong connectivity, and the ability to embed other architectures such as ring and mesh. The diameter of the varietal hypercube is about two thirds of the diameter of the hypercube. The average distance of the varietal hypercube is also smaller than that of the hypercube. Optimal routing and broadcasting algorithms which guarantee the shortest path communication are developed. Comparisons with other variations of the hypercube, such as twisted cube, folded hypercube, and crossed cube, are also included.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116873965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}