{"title":"Residual fault density prediction using regression methods","authors":"J. A. Morgan, G. Knafl","doi":"10.1109/ISSRE.1996.558706","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558706","url":null,"abstract":"Regression methods are used to model residual fault density in terms of several product and testing process measures. Process measures considered include discovered fault density, test set size and various coverage measures such as block, decision and all-uses coverage. Product measures considered include lines of code as well as block, decision and all-uses counts. The relative importance of these product/process measures for predicting residual fault density is assessed for a specific data set. Only selected testing process measures, in particular discovered fault density and decision coverage, are important predictors in this case while all product measures considered are important. These results are based on consideration of a substantial family of models, specifically, the family of quadratic response surface models with two-way interaction. Model selection is based on \"leave one out at a time\" cross-validation using the predicted residual sum of squares (PRESS) criterion.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122133804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Primary-shadow consistency issues in the DRB scheme and the recovery time bound","authors":"K. Kim, L. Bacellar, C. Subbaraman","doi":"10.1109/ISSRE.1996.558888","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558888","url":null,"abstract":"The distributed recovery block (DRB) scheme is an approach for realizing both hardware and software fault tolerance in real time distributed and parallel computer systems. We point out that in order for the DRB scheme to yield a high fault coverage and a low recovery time bound, some important consistency requirements must be satisfied by the replicated application tasks in a DRB computing station. Newly identified approaches for meeting the consistency requirements, which involve, among other things, integration of network surveillance and reconfiguration (NSR) techniques with the DRB scheme, are presented. The paper then presents an analysis of the recovery time bound of the DRB scheme. The analysis is based on a modular structured concrete implementation model of the DRB scheme for local area network (LAN) based distributed computer systems, which is called the DRB/T LAN scheme and incorporates an NSR scheme and the newly identified consistency ensuring mechanisms. Finally, we consider approaches for applying the DRB scheme to new types of application computation segments that were not considered before and then discuss approaches for meeting the consistency requirements in such DRB stations. These approaches broaden the application range of the DRB scheme significantly.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122162245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing reliable systems from reliable components using the context-dependent constraint concept","authors":"Peter Molin","doi":"10.1109/ISSRE.1996.558738","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558738","url":null,"abstract":"The problem of composing a system from well-behaving components is discussed. Specifically, necessary conditions for preserving the behaviour in a system context are analysed in this paper. Such conditions are defined as Context-Dependent Constraints (CDC). A non-formal approach is taken based on common system integration errors. It is suggested that the identification and verification of CDCs should be part of any development method based on component verification. The CDCs can also serve as an aid for designing reliable and maintainable systems, where the goal of the design process is to reduce the number of CDCs.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"80 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125890117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software reliability engineering for client-server systems","authors":"N. Schneidewind","doi":"10.1109/ISSRE.1996.558829","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558829","url":null,"abstract":"Too often, when doing software reliability modeling and prediction, the assumption is made that the software involves either a single module or a single node. The reality in today's increasing use of multi-node client-server systems is that there are multiple software entities that execute on multiple nodes that must be modeled in a system context, if realistic reliability predictions and assessments are to be made. For example, if there are N/sub c/ clients and N/sub x/ servers in a client-server system, it is not necessarily the case that a software failure in any of the N/sub c/ clients or N/sub x/ servers will cause the system to fail. Thus, if such a system were to be modeled as a single entity, the predicted reliability would be much lower than the true reliability, because the prediction would not account for criticality and redundancy. The first factor accounts for the possibility that the survivability of some clients and servers will be more critical to continued system operation than others, while the second factor accounts for the possibility of using redundant nodes to allow for system recovery should a critical node fail. To address this problem, we must identify which nodes-clients and servers-are critical and which are not critical, as defined by whether these nodes are used for critical or non-critical functions, respectively.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129810363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability and availability of a wide area network-based education system","authors":"P. Dixit, M. Vouk, D. Bitzer, Christopher Alix","doi":"10.1109/ISSRE.1996.558815","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558815","url":null,"abstract":"An important class of quality of service (QoS)-dependent network-based applications are computer-based education systems. A successful network-based education (NBE) system needs to provide appropriate QoS at the user level. This includes adequate end-to-end response delay and adequate system reliability and availability. This paper presents results from a reliability and availability evaluation of NovaNET. NovaNET is a successful low-overhead multimedia education system which serves thousands of users on a daily basis. We analyze eight years of failure data and examine correlations among system failure events. The NovaNET data are used to discuss practical bounds on the reliability and availability of an NBE system.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117328341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Method for designing and placing check sets based on control flow analysis of programs","authors":"S. Geoghegan, D. Avresky","doi":"10.1109/ISSRE.1996.558838","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558838","url":null,"abstract":"Proposes a formal approach for adding fault detection to software. An assertion-based formalism is used to represent algorithm specifications. This representation is then used to generate a flowgraph or decision-to-decision graph (ddgraph), which is used to construct an execution path tree. The information gained from this algorithm representation is used to aid in the design of software-based fault tolerance techniques. Algorithm-based fault tolerance (ABFT) techniques are used to detect data structure-corrupting faults and checks are added to detect program flow errors. Flowgraph and ddgraph representations provide information to predict future program flow from the current flow. During execution, the current program location is recorded, along with the expected flow. Checks are placed to verify that the program flow follows the predicted flow. Fault coverage has been estimated through experiments with SOFIT (SOftware-based Fault Injection Tool), and the data is presented to demonstrate the effectiveness of the method.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128579702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DQS's experience with SRE","authors":"W. Everett, J. Gobat","doi":"10.1109/ISSRE.1996.558820","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558820","url":null,"abstract":"This is an experience report on the application of software reliability engineering (SRE) in developing the Data Quality System (DQS). DQS is a knowledge-based software system developed by Lucent Technologies. It was developed to synchronize data spread among several disparate databases. The first application of DQS was to telecommunications network databases. Because the development of DQS was a large custom software development project, SRE was employed to assist in measuring product quality and readiness for production use. In this report, we first describe what we wanted to accomplish in applying SRE methods in the DQS project. The AT&T best current practice outlined the prerequisites for applying SRE during testing. We describe how we addressed these prerequisites in the DQS project. In particular, we discuss specifics on: what assumptions were made regarding an operational profile; how test cases were selected to conform to operational profile usage; and how test log forms were designed and used to collect failure data and to manage the test effort itself. Next, we share the results of reliability growth modeling during the system testing and field trials. Finally, we highlight what we learned in our initial implementation of SRE and what plans were recommended for subsequent implementations.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121582262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An on-line algorithm for checkpoint placement","authors":"A. Ziv, Jehoshua Bruck","doi":"10.1109/ISSRE.1996.558869","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558869","url":null,"abstract":"Checkpointing is a common technique for reducing the time to recover from faults in computer systems. By saving intermediate states of programs in a reliable storage device, checkpointing enables one to reduce the processing time loss caused by faults. The length of the intervals between the checkpoints affects the execution time of the programs. Long intervals lead to a long re-processing time, while too-frequent checkpointing leads to a high checkpointing overhead. In this paper, we present an online algorithm for the placement of checkpoints. The algorithm uses online knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. We show how the execution time of a program using this algorithm can be analyzed. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only online knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm that uses the complete knowledge of the checkpointing cost.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129099982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability of a commercial telecommunications system","authors":"M. Kaâniche, K. Kanoun","doi":"10.1109/ISSRE.1996.558807","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558807","url":null,"abstract":"Analyzes data collected on a commercial telecommunications system and summarizes some of the lessons learned from this study. The data correspond to failure and fault information recorded during system validation and operation: 3,063 trouble reports, corresponding to a five-year period during which five versions of the system have been developed and more than 100 systems have been introduced in the field. The failure information includes software failures as well as hardware failures due to design faults.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130027562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unification of finite failure non-homogeneous Poisson process models through test coverage","authors":"S. Gokhale, T. Philip, P. Marinos, Kishor S. Trivedi","doi":"10.1109/ISSRE.1996.558886","DOIUrl":"https://doi.org/10.1109/ISSRE.1996.558886","url":null,"abstract":"A number of analytical software reliability models have been proposed for estimating the reliability growth of a software product. We present an Enhanced Non-Homogeneous Poisson Process (ENHPP) model and show that previously reported Non-Homogeneous Poisson Process (NHPP) based models, with bounded mean valve functions, are special cases of the ENHPP model. The ENHPP model differs from previous models in that it incorporates explicitly the time varying test coverage function in its analytical formulation, and provides for defective fault detection and test coverage during the testing and operational phases. The ENHPP model is validated using several available failure data sets.","PeriodicalId":441362,"journal":{"name":"Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127943481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}