{"title":"Bounded-response-time self-stabilizing OPS5 production systems","authors":"A. Cheng, Seiya Fujii","doi":"10.1109/IPDPS.2000.846012","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846012","url":null,"abstract":"This paper examines the task of constructing bounded-time self-stabilizing rule-based systems that take their input from an external environment. Bounded response-time and self-stabilization are essential for rule-based programs that must be highly fault-tolerant and perform in a real-time environment. We present an approach for solving this problem using the OPS5 programming language as it is one of the most expressive and widely used rule-based programming languages. Bounded response-time of the program is ensured by constructing the state space graph so that the programmer can visualize the control flow of the program execution, and any possible infinite execution leaps should be detected. Both the input variables and internal variables are made fault tolerant from corruption caused by transient faults via the introduction of new self-stabilizing rules in the program. Finally the timing analysis of the self-stabilizing OPS5 program is shown in terms of the number of rule firings and the comparisons performed in the Rete network.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115431566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semigroup and prefix computations on improved generalized mesh-connected computers with multiple buses","authors":"Y. Pan, Si-Qing Zheng, Keqin Li, Hong Shen","doi":"10.1109/IPDPS.2000.845992","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845992","url":null,"abstract":"Various augmenting mechanisms have been proposed to enhance the communication efficiency of mesh-connected computers (MCCs). One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). In this paper, we propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). Each processor in an IMMB is connected to exactly two buses. We show the power of IMMBs by considering semigroup and prefix computations. Specifically, we show that semigroup and prefix computations on N operands, and data broadcasting all take O(log N) time on IMMBs. This is the first O(log N) time algorithm for these problems on arrays with fixed broadcasting buses.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127210922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On optimal fill-preserving orderings of sparse matrices for parallel Cholesky factorizations","authors":"Wen-Yang Lin, Chuen-Liang Chen","doi":"10.1109/IPDPS.2000.846067","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846067","url":null,"abstract":"In this paper, we consider the problem of finding fill-preserving ordering of a sparse symmetric and positive definite matrix such that the reordered matrix is suitable for parallel factorization. We extended the unit-cost fill-preserving ordering into a generalized class that can adopt various aspects in parallel factorization, such as computation, communication and algorithmic diversity. Based on the elimination tree model, we show that as long as the node cost function for factoring a column/row satisfies two mandatory properties, we can deploy a greedy-based algorithm to find the corresponding optimal ordering. The complexity of our algorithm is O(q log q + κ), where q denotes the number of maximal cliques, and κ the sum of all maximal clique sizes in the filled graph. Our experiments reveal that on the average, our minimum completion cost ordering (MinCP) would reduce up to 17% the cost to factor than minimum height ordering (Jess-Kees).","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127275748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiency of dynamic load balancing based on permanent cells for parallel molecular dynamics simulation","authors":"R. Hayashi, S. Horiguchi","doi":"10.1109/IPDPS.2000.845968","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845968","url":null,"abstract":"This paper addresses a dynamic load balancing method of domain decomposition for 3-dimensional molecular dynamics on parallel computers. In order to reduce interprocessor communication overhead, we are introducing a concept of permanent cells to the dynamic load balancing method. Molecular dynamics simulations on a parallel computer T3E prove that the proposed method using load balancing much improves the execution time. Furthermore, we analyze theoretical effective ranges of the dynamic load balancing method, and compare them with experimental effective ranges obtained by parallel molecular dynamics simulations. As a result, the theoretical upper bounds predict experimental effective ranges and are also valid on commercial parallel computers.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115047304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of the IBM general parallel file system","authors":"T. Jones, A. Koniges, R. K. Yates","doi":"10.1109/IPDPS.2000.846052","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846052","url":null,"abstract":"We measure the performance and scalability of IBM's General Parallel File System (GPFS) under a variety of conditions. The measurements are based on benchmark programs that allow us to vary block sizes, access patterns, etc., and to measure aggregate throughput rates. We use the data to give performance recommendations for application development and as a guide to the improvement of parallel file systems.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129777456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicomputer algorithms for wavelet packet image decomposition","authors":"M. Feil, A. Uhl","doi":"10.1109/IPDPS.2000.846066","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846066","url":null,"abstract":"In this paper we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wavelet packet subband structure (as required for adaptive applications), on the other hand algorithms generating the lowest subband level only (as required for numerical applications). We investigate several optimizations and generalizations of corresponding message passing algorithms and finally compare the results obtained on a Cray T3D and a Parsytec GCel 1024.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128589723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support for recoverable memory in the distributed virtual communication machine","authors":"Marcel-Catalin Rosu, K. Schwan","doi":"10.1109/IPDPS.2000.845981","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845981","url":null,"abstract":"Distributed Virtual Communication Machine (DVCM) is a software communication architecture for clusters of workstations equipped with programmable network interfaces (NIs) for high-speed networks. DVCM is an extensible architecture, which promotes the transfer of application modules to the NI. By executing \"closer\" to the network, on the NI CoProcessor, these modules can communicate with significantly higher message rates and lower latencies than achievable at the CPU-level. This paper describes how DVCM modules can be used to enhance the performance of the Cluster Recoverable Memory system (CRMem), a transaction-processing kernel for memory-resident databases. By using the NI CoProcessor for CRMem's remote operations, our implementation achieves more than 3,000 trans/sec on a simplified TpcB benchmark.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128266769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable parallel matrix multiplication on distributed memory parallel computers","authors":"Keqin Li","doi":"10.1109/IPDPS.2000.846000","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846000","url":null,"abstract":"Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2<α≤3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1≤p≤N^α/log N, multiplying two N×N matrices can be performed by a DMPC with p processors in O(N^α/p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α/log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121468384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using postordering and static symbolic factorization for parallel sparse LU","authors":"M. Cosnard, L. Grigori","doi":"10.1109/IPDPS.2000.846068","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846068","url":null,"abstract":"In this paper we present several improvements of widely used parallel LU factorization methods on sparse matrices. First we introduce the LU elimination forest and then we characterize the L, U factors in terms of their corresponding LU elimination forest. This characterization can be used as a compact storage scheme of the matrix as well as of the task dependence graph. To improve the use of BLAS in the numerical factorization, we perform a postorder traversal of the LU elimination forest, thus obtaining larger supernodes. To expose more task parallelism for a sparse matrix, we build a more accurate task dependence graph that includes only the least necessary dependences. Experiments compared favorably our methods against methods implemented in the S* environment on the SGI's Origin2000 multiprocessor.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128175174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple and efficient mechanism to prevent saturation in wormhole networks","authors":"Elvira Baydal, P. López, J. Duato","doi":"10.1109/IPDPS.2000.846043","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846043","url":null,"abstract":"Both deadlock avoidance and recovery techniques suffer from severe performance degradation when the network is close to or beyond saturation. This performance degradation appears because messages block in the network faster than they are drained by the escape paths in the deadlock avoidance strategies or the deadlock recovery mechanism. Many parallel applications produce bursty traffic that may saturate the network during some intervals, significantly increasing execution time. Therefore, the use of techniques that prevent network saturation are of crucial importance. Although several mechanisms have been proposed in the literature to reach this goal, some of them introduce some penalty when the network is not fully saturated, require complex hardware to be implemented or do not behave well under all network load conditions. In this paper we propose a new mechanism to avoid network saturation that overcomes these drawbacks.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130982795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}