{"title":"将软件故障行为与规范联系起来","authors":"Frederick T. Sheldon, K. Kavi","doi":"10.1109/WIEM.1994.654403","DOIUrl":null,"url":null,"abstract":"POSITION S TATEMENT For the past two decades, various software fault-tolerance (IT) schemes have been proposed, e.g., N-Version Programming, Self-checking and Recovery Block schemes among others. Yet, few real systems have incorporated software fault-tolerance schemes in practice. The reluctance to use software fault-tolerance schemes stems from some formidable sources: (a) inherent complexity and development risk, (b) high cost of HW lk SW redundancy, (c) realization of acceptability test logic (including the overhead imposed on performancej, and (d) lack of trustworthy evaluation methods for determining system reliability. We seek to better understand the source and mechanism of software failures, and to identify the software fault-tolerance mechanism most appropriate for a articular class of failures. We are attempting to relate the failure behavior of software to the formal specification of the software system at higher levels. RELATED WORK: RELIABILITY GROWTH TESTING Current approaches utilize reliability growth testing which is highly dependent on the predictive validity of the model, test coverage and operational profile. These approaches often employ goodness of fit and recalibration techniques to enable the user to gauge how well the model is working. Software reliability can be predicted based on measurable characteristics of the software development process and artifacts. A program's failure rate is related to the fault hazard rate profile. Unfortunately, the hazard rate profile is usually determined by \"fault seeding\" or by retrospective failure analysis. Under a particular operational profile, provided the same information (i.e., frequency with which potential faults are encountered) can be provided by adding randomly placed counters within the code. SYSTEMS LEVEL: INHERENT RELIABILITY OF A SW FT DESIGN CANDIDATE Thee different result spaces are possible from software fault-tolerance (see Figure 1): 1) Intended or correct results, shown by the horizontally oriented oval, which fulfill the intention of the user a d are defined by system requirements, 2) Actuai results, those produced by the system (Ova at 4s0), and 3) Accepted results. those admitted by the error detection module as being tolerable (vertical ovai). The relationshp between these three result sets make possible four state categones (see tree structure): i) No error: actual result is correct and accepted. ii) False alarm: X t U a l result is correct but not accepted, iii) Missing alarm: actual result is not correct but accepted, and iv) Detected error: actual result is not correct and not accepted.","PeriodicalId":386840,"journal":{"name":"Third Int'l Workshop on Integrating Error Models with Fault Injection","volume":"121 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Linking Software Failure Behavior To Specification\",\"authors\":\"Frederick T. Sheldon, K. Kavi\",\"doi\":\"10.1109/WIEM.1994.654403\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"POSITION S TATEMENT For the past two decades, various software fault-tolerance (IT) schemes have been proposed, e.g., N-Version Programming, Self-checking and Recovery Block schemes among others. 
Yet, few real systems have incorporated software fault-tolerance schemes in practice. The reluctance to use software fault-tolerance schemes stems from some formidable sources: (a) inherent complexity and development risk, (b) high cost of HW lk SW redundancy, (c) realization of acceptability test logic (including the overhead imposed on performancej, and (d) lack of trustworthy evaluation methods for determining system reliability. We seek to better understand the source and mechanism of software failures, and to identify the software fault-tolerance mechanism most appropriate for a articular class of failures. We are attempting to relate the failure behavior of software to the formal specification of the software system at higher levels. RELATED WORK: RELIABILITY GROWTH TESTING Current approaches utilize reliability growth testing which is highly dependent on the predictive validity of the model, test coverage and operational profile. These approaches often employ goodness of fit and recalibration techniques to enable the user to gauge how well the model is working. Software reliability can be predicted based on measurable characteristics of the software development process and artifacts. A program's failure rate is related to the fault hazard rate profile. Unfortunately, the hazard rate profile is usually determined by \\\"fault seeding\\\" or by retrospective failure analysis. Under a particular operational profile, provided the same information (i.e., frequency with which potential faults are encountered) can be provided by adding randomly placed counters within the code. SYSTEMS LEVEL: INHERENT RELIABILITY OF A SW FT DESIGN CANDIDATE Thee different result spaces are possible from software fault-tolerance (see Figure 1): 1) Intended or correct results, shown by the horizontally oriented oval, which fulfill the intention of the user a d are defined by system requirements, 2) Actuai results, those produced by the system (Ova at 4s0), and 3) Accepted results. those admitted by the error detection module as being tolerable (vertical ovai). The relationshp between these three result sets make possible four state categones (see tree structure): i) No error: actual result is correct and accepted. 
ii) False alarm: X t U a l result is correct but not accepted, iii) Missing alarm: actual result is not correct but accepted, and iv) Detected error: actual result is not correct and not accepted.\",\"PeriodicalId\":386840,\"journal\":{\"name\":\"Third Int'l Workshop on Integrating Error Models with Fault Injection\",\"volume\":\"121 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1994-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Third Int'l Workshop on Integrating Error Models with Fault Injection\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WIEM.1994.654403\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Third Int'l Workshop on Integrating Error Models with Fault Injection","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WIEM.1994.654403","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
POSITION STATEMENT

For the past two decades, various software fault-tolerance (FT) schemes have been proposed, e.g., N-Version Programming, Self-Checking, and Recovery Block schemes, among others. Yet few real systems have incorporated software fault-tolerance schemes in practice. The reluctance to use them stems from several formidable sources: (a) inherent complexity and development risk, (b) the high cost of HW and SW redundancy, (c) the difficulty of realizing acceptability-test logic (including the overhead it imposes on performance), and (d) the lack of trustworthy evaluation methods for determining system reliability. We seek to better understand the sources and mechanisms of software failures, and to identify the software fault-tolerance mechanism most appropriate for a particular class of failures. We are attempting to relate the failure behavior of software to the formal specification of the software system at higher levels.

RELATED WORK: RELIABILITY GROWTH TESTING

Current approaches rely on reliability growth testing, which is highly dependent on the predictive validity of the model, the test coverage, and the operational profile. These approaches often employ goodness-of-fit and recalibration techniques to let the user gauge how well the model is working. Software reliability can be predicted from measurable characteristics of the software development process and its artifacts. A program's failure rate is related to its fault hazard rate profile. Unfortunately, the hazard rate profile is usually determined by "fault seeding" or by retrospective failure analysis. Under a particular operational profile, the same information (i.e., the frequency with which potential faults are encountered) can be obtained by adding randomly placed counters within the code.

SYSTEMS LEVEL: INHERENT RELIABILITY OF A SW FT DESIGN CANDIDATE

Three different result spaces are possible from software fault tolerance (see Figure 1):

1) Intended (correct) results, shown by the horizontally oriented oval, which fulfill the intention of the user and are defined by the system requirements;
2) Actual results, those produced by the system (oval at 45°); and
3) Accepted results, those admitted by the error detection module as being tolerable (vertical oval).

The relationships between these three result sets make four state categories possible (see tree structure):

i) No error: the actual result is correct and accepted;
ii) False alarm: the actual result is correct but not accepted;
iii) Missing alarm: the actual result is not correct but accepted; and
iv) Detected error: the actual result is not correct and not accepted.
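As a rough illustration (not taken from the paper), the four state categories follow mechanically from two booleans: whether the actual result matches the intended one, and whether the error detection module accepts it. The Python sketch below makes that 2x2 classification concrete; the names `Outcome` and `classify` and the range-based acceptance test are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    NO_ERROR = "no error"              # correct and accepted
    FALSE_ALARM = "false alarm"        # correct but rejected
    MISSING_ALARM = "missing alarm"    # incorrect but accepted
    DETECTED_ERROR = "detected error"  # incorrect and rejected


def classify(actual, intended, acceptance_test) -> Outcome:
    """Classify one run of a fault-tolerant component.

    `intended` is the correct result defined by the requirements;
    `acceptance_test` is the error detection module's predicate.
    """
    correct = (actual == intended)
    accepted = acceptance_test(actual)
    if correct and accepted:
        return Outcome.NO_ERROR
    if correct:
        return Outcome.FALSE_ALARM
    if accepted:
        return Outcome.MISSING_ALARM
    return Outcome.DETECTED_ERROR


# A crude acceptance test that only checks a range, so it can both
# raise false alarms and miss genuine errors.
in_range = lambda x: 0.0 <= x <= 100.0

print(classify(42.0, 42.0, in_range))   # Outcome.NO_ERROR
print(classify(-1.0, -1.0, in_range))   # Outcome.FALSE_ALARM (correct but rejected)
print(classify(37.5, 42.0, in_range))   # Outcome.MISSING_ALARM (wrong but accepted)
print(classify(1e9, 42.0, in_range))    # Outcome.DETECTED_ERROR
```

Similarly, the abstract's remark about randomly placed counters can be sketched: counters inserted at randomly chosen program points, exercised under an assumed operational profile, record how often each point is executed and hence how frequently a fault located there would be encountered. The driver loop, the roughly 10%-negative input profile, and the site names below are illustrative assumptions, not the authors' instrumentation.

```python
import random
from collections import Counter

counters = Counter()


def probe(site: str) -> None:
    """Increment the counter for one randomly placed instrumentation site."""
    counters[site] += 1


def process(request: float) -> float:
    probe("entry")                  # site 1: reached on every call
    if request < 0:
        probe("negative-branch")    # site 2: reached only for negative inputs
        return 0.0
    probe("main-path")              # site 3: reached on the normal path
    return request ** 0.5


# Drive the program under a simple operational profile (~10% negative requests).
random.seed(0)
for _ in range(10_000):
    process(random.uniform(-1, 9))

# Relative execution frequency of each site approximates how often a fault
# located there would be encountered under this profile.
total = counters["entry"]
for site, hits in counters.items():
    print(f"{site:16s} {hits:6d}  ({hits / total:.2%} of executions)")
```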