{"title":"32-bit cyclic redundancy codes for Internet applications","authors":"P. Koopman","doi":"10.1109/DSN.2002.1028931","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028931","url":null,"abstract":"Standardized 32-bit cyclic redundancy codes provide fewer bits of guaranteed error detection than they could, achieving a Hamming Distance (HD) of only 4 for maximum-length Ethernet messages, whereas HD=6 is possible. Although research has revealed improved codes, exploring the entire design space has previously been computationally intractable, even for special-purpose hardware. Moreover, no CRC polynomial has yet been found that satisfies an emerging need to attain both HD=6 for 12K bit messages and HD=4 for message lengths beyond 64 Kbits. This paper presents results from the first exhaustive search of the 32-bit CRC design space. Results from previous research are validated and extended to include identifying all polynomials achieving a better HD than the IEEE 802.3 CRC-32 polynomial. A new class of polynomials is identified that provides HD=6 up to nearly 16K bit and HD=4 up to 114K bit message lengths, providing the best achievable design point that maximizes error detection for both legacy and new applications, including potentially iSCSI and application-implemented error checks.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"7 1","pages":"459-468"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82952295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A portable and fault-tolerant microprocessor based on the SPARC v8 architecture","authors":"J. Gaisler","doi":"10.1109/DSN.2002.1028926","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028926","url":null,"abstract":"The architecture and implementation of the LEON-FT processor is presented. LEON-FT is a fault-tolerant 32 bit processor based on the SPARC V8 instruction set. The processors tolerates transient SEU errors by using techniques such as TMR registers, on-chip EDAC, parity, pipeline restart, and forced cache miss. The first prototypes were manufactured on the Atmel ATC35 0.35 /spl mu/m CMOS process, and subjected to heavy-ion fault-injection at the Louvain Cyclotron. The heavy-ion tests showed that all of the injected errors (>100,000) were successfully corrected without timing or software impact. The device SEU threshold was measured to be below 6 MeV while ion energy-levels of up to 110 MeV were used for error injection.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"1 1","pages":"409-415"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82296899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Franceschinis, M. Gribaudo, M. Iacono, V. Vittorini, Claudio Bertoncello
{"title":"DrawNet++: a flexible framework for building dependability models","authors":"G. Franceschinis, M. Gribaudo, M. Iacono, V. Vittorini, Claudio Bertoncello","doi":"10.1109/DSN.2002.1028961","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028961","url":null,"abstract":"The DrawNet++ project addresses the compositional construction of dependability models. Its main goals are to provide: a) a GUI to any graph-based formalism; b) a support to the design process of dependability models, according to concepts inspired by object orientation (OO); c) a user friendly front-end for different classes of analysis/simulation tools.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"28 1","pages":"540-"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81518648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Haverkort, L. Cloth, H. Hermanns, J. Katoen, C. Baier
{"title":"Model checking performability properties","authors":"B. Haverkort, L. Cloth, H. Hermanns, J. Katoen, C. Baier","doi":"10.1109/DSN.2002.1028891","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028891","url":null,"abstract":"Model checking has been introduced as an automated technique to verify whether functional properties, expressed in a formal logic like computational tree logic (CTL), do hold in a formally-specified system. We present a number of computational procedures to perform model checking of continuous stochastic reward logic (CSRL) over finite Markov reward models, thereby stressing their computational complexity (time and space) and applicability from a practical point of view (accuracy, stability). A case study in the area of ad hoc mobile computing under power constraints shows the merits of CSRL and the new computational procedures.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"62 1","pages":"103-112"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82377931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ditto processor","authors":"S. Lai, Shih-Lien Lu, J. Peir","doi":"10.1109/DSN.2002.1028947","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028947","url":null,"abstract":"Concentration of design effort for current single-chip commercial-off-the-shelf (COTS) microprocessors has been directed towards performance. Reliability has not been the primary focus. As supply voltage scales to accommodate technology scaling and to lower power consumption, transient errors are more likely to be introduced. The basic idea behind any error tolerance scheme involves some type of redundancy. Redundancy techniques can be categorized in three general categories: (1) hardware redundancy, (2) information redundancy, and (3) time redundancy. Existing time redundant techniques for improving reliability of a superscalar processor utilize the otherwise unused hardware resources as much as possible to hide the overhead of program re-execution and verification. However, our study reveals that re-executing of long latency operations contributes to performance loss. We suggest a method to handle short and long latency instructions in slightly different ways to reduce the performance degradation. Our goal is to minimize the hardware overhead and performance degradation while maximizing the fault detection coverage. Experimental studies through microarchitecture simulation are used to compare performance lost due to the proposed scheme with non-fault tolerant design and different existing time redundant fault tolerant schemes. Fourteen integer and floating-point benchmarks are simulated with 1.8/spl sim/13.3% performance loss when compared with non-fault-tolerant superscalar processor.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"33 2 1","pages":"525-534"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83650597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Madeira, R. Some, F. Moreira, D. Costa, D. Rennels
{"title":"Experimental evaluation of a COTS system for space applications","authors":"H. Madeira, R. Some, F. Moreira, D. Costa, D. Rennels","doi":"10.1109/DSN.2002.1028916","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028916","url":null,"abstract":"This paper evaluates the impact of transient errors in the operating system of a COTS-based system (CETIA board with two PowerPC 750 processors running LynxOS) and quantifies their effects at both the OS and at the application level. The study has been conducted using a Software-Implemented Fault Injection tool (Xception) and both realistic programs and synthetic workloads (to focus on specific OS features) have been used. The results provide a comprehensive picture of the impact of faults on LynxOS key features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"29 1","pages":"325-330"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85287621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the use of disaster prediction for failure-tolerance in feedback control systems","authors":"J. Cunha, M. Z. Rela","doi":"10.1109/DSN.2002.1028893","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028893","url":null,"abstract":"Feedback control algorithms are inherently designed to compensate for external disturbances that the controlled system may suffer. This resilience is also extensible to late or wrong control actions produced by a failed controller computer, providing a degree of fault tolerance without the use of any particular mechanism. However, some controller failures, due to their duration or value, may indeed collapse the system, and thus other recovery measures must be taken. This paper proposes the inclusion of an Oracle that calculates, in a timely manner, the controlled system behavior under a failed controller, and triggers recovery when the control algorithm is predictably no more able to compensate for a particular controller failure. The systems so built follow the Fail-Bounded model. The main contribution of this paper is to show how this model can be implemented in a practical way for the very important class of applications based on feedback control, thus turning that model into a technique that can be used effectively to build production systems. The method was validated experimentally through fault injection on the controller computer of an inverted pendulum, one of the most time-demanding control system benchmarks.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"85 1","pages":"123-132"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76172983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the cost of group communication with semantic view synchrony","authors":"J. Pereira, L. Rodrigues, R. Oliveira","doi":"10.1109/DSN.2002.1028913","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028913","url":null,"abstract":"View synchrony (VS) is a powerful abstraction in the design and implementation of dependable distributed systems. By ensuring that processes deliver the same set of messages in each view, it allows them to maintain consistency across membership changes. However experience indicates that it is hard to combine strong reliability guarantees as offered by VS with stable high performance. In this paper we propose a novel abstraction, semantic view synchrony (SVS), that exploits the application's semantics to cope with high throughput applications. This is achieved by allowing some messages to be dropped while still preserving consistency when new views are installed. Thus, SVS inherits the elegance of view synchronous communication. The paper describes how SVS can be implemented and illustrates its usefulness in the context of distributed multi-player games.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"1 1","pages":"293-302"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78609756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A realistic look at failure detectors","authors":"C. Delporte-Gallet, H. Fauconnier, R. Guerraoui","doi":"10.1109/DSN.2002.1028919","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028919","url":null,"abstract":"This paper shows that, in an environment where we do not bound the number of faulty processes, the class P of perfect failure detectors is the weakest (among realistic failure detectors) to solve fundamental agreement problems like uniform consensus, atomic broadcast, and terminating reliable broadcast (also called Byzantine generals). Roughly speaking, in this environment, we collapse the Chandra-Toueg failure detector hierarchy, by showing that P ends up being the only class to solve those agreement problems. This contributes in explaining why most reliable distributed systems we know of do rely on some group membership service that precisely aims at emulating P. As an interesting side effect of our work, we show that, in our general environment, uniform consensus is strictly harder than consensus, and we revisit the view that uniform consensus and atomic broadcast are strictly weaker than terminating reliable broadcast.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"20 1","pages":"345-353"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73718510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of availability models in RAScad","authors":"D. Tang, Ji Zhu, Roy Andrada","doi":"10.1109/DSN.2002.1028935","DOIUrl":"https://doi.org/10.1109/DSN.2002.1028935","url":null,"abstract":"RAScad is a Sun internal Web-based reliability, availability, serviceability (RAS) architecture modeling and analysis tool for use in the computer system design and development phase. Two major goals of RAScad are: making availability modeling possible for design engineers without background in mathematical modeling and making availability modeling efficient for RAS engineers who understand underlying mathematical models. To achieve these goals, RAScad integrates two modules: Model Generator (MG) which provides automatic model generation specific to Sun product RAS characteristics, and Graphical Model Builder (GMB) which provides general, graphical Markov, semi-Markov and reliability block diagram modeling capabilities. An MG model is a hierarchical specification, in terms of an engineering language (MTBF, MTTR, redundancy, etc.), of the constituent components and associated parameters for the modeled system and the user does not have to understand the underlying mathematical models generated by RAScad.","PeriodicalId":93807,"journal":{"name":"Proceedings. International Conference on Dependable Systems and Networks","volume":"8 1","pages":"488-492"},"PeriodicalIF":0.0,"publicationDate":"2002-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84777085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}