{"title":"The performance of consistent checkpointing in distributed shared memory systems","authors":"G. Cabillic, Gilles Muller, I. Puaut","doi":"10.1109/RELDIS.1995.526217","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526217","url":null,"abstract":"This paper presents the design and implementation of a consistent checkpointing scheme for distributed shared memory (DSM) systems. Our approach relies on the integration of checkpoints within synchronization barriers already existing in applications; this avoids the need to introduce an additional synchronization mechanism. The main advantage of our checkpointing mechanism is that performance degradation arises only when a checkpoint is being taken; hence, the programmer can adjust the trade-off between the cost of checkpointing and the cost of longer rollbacks by adjusting the time between two successive checkpoints. The paper compares several implementations of the proposed consistent checkpointing mechanism (incremental, non-blocking, and pre-flushing) on the Intel Paragon multicomputer for several parallel scientific applications. Performance measures show that a careful optimization of the checkpointing protocol can reduce the time overhead of checkpointing from 8% to 0.04% of the application duration for a 6 min checkpointing interval.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"295 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124736483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A correctness criterion for advanced transaction models","authors":"A. Rakotonirainy","doi":"10.1109/RELDIS.1995.518720","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.518720","url":null,"abstract":"The transaction concept was originally applied to database applications. Serializability theory captured transaction correctness and database object consistency properties in a single notion. Today, increasingly sophisticated information requires new correctness criteria due to the limitation of classical serialisability theory, which allows only a limited cooperation between its components. Several models relaxing the ACID (Atomicity, Consistency, Isolation, Durability) properties in a controlled manner have been developed. These approaches separately exploit the semantic properties of operations (object semantic approach) and application semantics (transaction interleaving approach). The notion of correctness can be refined with the help of the two previous approaches whilst increasing concurrency. In this paper, we bridge the gap between transaction and object semantic correctness criteria. We define a new class of schedules called Multilevel Relative Serialisability (MLRS) to combine the two approaches. This class of schedules preserves correctness properties defined in terms of object and transaction semantics. We use the ACTA formalism to express object consistency, transaction correctness and MLRS. This work merges existing 'relaxed' transaction models into a unified concept. This concept is useful for long-lived, cooperative and hierarchical transaction models.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"145 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125851980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of a regeneration-based dynamic voting algorithm","authors":"Robert J. Hilderman, Howard J. Hamilton","doi":"10.1109/RELDIS.1995.526227","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526227","url":null,"abstract":"RVC2 is a consistency control algorithm for replicated data objects in a distributed computing system. It is a dynamic voting algorithm which utilizes selective regeneration and recovery mechanisms for failed copies. Virtual copies which record information about the current state of a data object, but do not contain actual data, are used to reduce network and storage overhead. Experimental results for availability, storage cost, and message cost, obtained through simulation, are discussed. Our results show that the replacement of real copies with virtual copies has no significant impact on the availability of a data object. Neither does varying the generation threshold. We also show that high availability can be maintained without regeneration. We conclude that regeneration makes no significant contribution to the high availability of RVC2.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126350685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new deadlock detection algorithm for distributed real-time database systems","authors":"C. Yeung, S. Hung","doi":"10.1109/RELDIS.1995.526222","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526222","url":null,"abstract":"Recently, the concurrency control of real-time transactions has been gaining increasing attention from researchers in the database community. One of the major design issues in concurrency control of real-time transactions is the resolution of local as well as distributed deadlocks while at the same time meeting the timing requirements of the transactions. In this paper, a new deadlock detection algorithm specially designed for distributed real-time database systems is proposed. The performance of the proposed algorithm is evaluated through extensive simulation experiments. Studies have also been carried out to compare the performance of the real-time deadlock detection algorithm with a non real-time algorithm for both firm and soft real-time transactions. Results indicated that the real-time deadlock detection algorithm performs better than the non real-time deadlock detection algorithm. Results also indicated that the performance of the new algorithm is substantially better for soft real-time systems than for firm real-time systems.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127767334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting semantics-based transaction processing in mobile database applications","authors":"Gary D. Walborn, Panos K. Chrysanthis","doi":"10.1109/RELDIS.1995.518721","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.518721","url":null,"abstract":"Advances in computer and telecommunication technologies have made mobile computing a reality. However, greater mobility implies a more tenuous network connection and a higher rate of disconnection. In order to tolerate disconnections as well as to reduce the delays and cost of wireless communication, it is necessary to support autonomous mobile operations on data shared by stationary hosts. This would allow the part of a computation executing on a mobile host to continue executing while the mobile host is not connected to the network. In this paper, we examine whether object semantics can be exploited to facilitate autonomous and disconnected operation in mobile database applications. We define the class of fragmentable objects which may be split among a number of sites, operated upon independently at each site, and then recombined in a semantically consistent fashion. A number of objects with such characteristics are presented and an implementation of fragmentable stacks is shown and discussed.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127315502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Failure detection algorithms for a reliable execution of parallel programs","authors":"S. Chabridon, E. Gelenbe","doi":"10.1109/RELDIS.1995.526230","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526230","url":null,"abstract":"We report on the design and simulation of novel algorithms which will ensure that application software runs correctly on a MIMD system in which processing units (PU) can fail. The effect of these algorithms is evaluated for random task graphs using simulation as failure rates increase. An example of a specific application is also examined (the Fast Fourier Transform) for which we construct the task graph and then simulate its execution under various values of the failure rates of processors.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123110117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integer programming approach for assigning votes in a distributed system","authors":"D. Venkaiah, P. Jalote","doi":"10.1109/RELDIS.1995.526220","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526220","url":null,"abstract":"Voting is a general approach to maintain consistency of replicated data under node failures and network partitions. In voting, each node is assigned a particular number of votes, and any group with a majority of votes can perform operations. Votes assigned to the nodes have a significant impact on the performance of a voting system. In this report, we propose an integer programming approach for determining the vote assignment for maximizing the throughput. We use Monte-Carlo simulation to find the most likely groups formed due to partition failures and use these groups to formulate vote assignment as an integer programming problem. We have developed a tool called vote assignment tool (VAT) that implements this approach. VAT takes as input the configuration of the network, and after formulating the problem as an integer programming exercise, solves it to output a vote assignment. We have tried this approach for different networks and have found that in many cases this approach assigns votes equivalent to or better than the best vote assignment given by the various heuristics.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"34 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133517105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hierarchy of totally ordered multicasts","authors":"U. Wilhelm, A. Schiper","doi":"10.1109/RELDIS.1995.526218","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526218","url":null,"abstract":"The increased interest in protocols that provide a total order on message delivery has led to several different definitions of total order. In this paper we investigate these different definitions and propose a hierarchy that helps to better understand the implications of the different possibilities in terms of guarantees and communication cost. We identify two definitions: weak total order and strong total order, which are at the extremes of the proposed hierarchy, and incorporate them into a consistent design. Finally, we propose high-level algorithms based on a virtually synchronous communication environment that implement the given definitions.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"307 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123926999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum and minimum consistent global checkpoints and their applications","authors":"Yi-Min Wang","doi":"10.1109/RELDIS.1995.526216","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.526216","url":null,"abstract":"This paper considers the problem of constructing the maximum and the minimum consistent global checkpoints that contain a target set of checkpoints, and identifies it as a generic issue in recovery-related applications. We formulate the problem as a reachability analysis problem on a directed rollback-dependency graph, and develop efficient algorithms to calculate the two consistent global checkpoints for both general nondeterministic executions and piecewise deterministic executions. We also demonstrate that the approach provides a generalization and unifying framework for many existing and potential applications including software error recovery, mobile computing recovery, parallel debugging and output commits.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124405078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the design of systems of cooperating functional processes","authors":"Claus Aßmann, W. Kluge","doi":"10.1109/RELDIS.1995.518723","DOIUrl":"https://doi.org/10.1109/RELDIS.1995.518723","url":null,"abstract":"This paper describes a design concept for systems of cooperating distributed processes based on a variant of coloured Petri-nets. It cleanly separates graphical specification of processes and their interaction (or communication) from the algorithmic specifications of the computations that need to be performed by the individual processes. Designing complex process systems is aided by abstractions similar to those that are available in programming languages. In conjunction with a small set of well-defined interaction schemes for process communication it ensures well-behaving systems largely by construction. Essential invariance properties of small subsystems which in incremental steps may either be verified by formal methods or validated by simulation are not corrupted when embedding them in the context of larger systems. The paper focuses particularly on the construction of large systems by recursive abstractions of small net templates which, at execution time, may be recursively expanded to distribute application problems evenly over several processing sites for concurrent processing.","PeriodicalId":275219,"journal":{"name":"Proceedings. 14th Symposium on Reliable Distributed Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131686104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}