Space shuttle fault tolerance: Analog and digital teamwork
H. Blair-Smith
2009 IEEE/AIAA 28th Digital Avionics Systems Conference, published 2009-12-04
https://doi.org/10.1109/DASC.2009.5347450
The Space Shuttle control system (including the avionics suite) was developed during the 1970s to meet stringent survivability requirements that were then extraordinary but today may serve as a standard against which modern avionics can be measured. In 30 years of service, only two major malfunctions have occurred, both due to failures far beyond the reach of fault tolerance technology: the explosion of an external fuel tank, and the destruction of a launch-damaged wing by re-entry friction. The Space Shuttle is among the earliest systems (if not the earliest) designed to a “FO-FO-FS” criterion, meaning that it had to Fail (fully) Operational after any one failure, then Fail Operational after any second failure (even of the same kind of unit), then Fail Safe after most kinds of third failure. The computer system had to meet this criterion using a Redundant Set of 4 computers plus a backup of the same type, which was (ostensibly!) a COTS type. Quadruple redundancy was also employed in the hydraulic actuators for elevons and rudder. Sensors were installed with quadruple, triple, or dual redundancy. For still greater fault tolerance, these three redundancies (sensors, computers, actuators) were made independent of each other so that the reliability criterion applies to each category separately. The mission rule for Shuttle flights, as distinct from the design criterion, became “FO-FS,” so that a mission continues intact after any one failure, but is terminated with a safe return after any second failure of the same type. To avoid an unrecoverable flat spin during the most dynamic flight phases, the overall system had to continue safe operation within 400 msec of any failure, but the decision to shut down a computer had to be made by the crew. 
Among the interesting problems to be solved were “control slivering” and “sync holes.” The first flight test (Approach and Landing only) was the proof of the pudding: when a key wire harness solder joint was jarred loose by the Shuttle's being popped off the back of its 747 mother ship, one of the computers “went bananas” (actual quote from an IBM expert).
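The two rules described above, the "FO-FO-FS" design criterion and the "FO-FS" mission rule, can be pictured as simple functions of the failure count within each independent category. The sketch below is only an illustration of one reading of the abstract, not the Shuttle's actual redundancy-management logic. The names `Status`, `design_status`, and `mission_continues` are hypothetical, and the "beyond criterion" state is an assumption added to cover failure counts the abstract does not discuss.

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    FAIL_SAFE = "fail-safe"
    BEYOND_CRITERION = "beyond criterion"  # assumption: not covered by the abstract


def design_status(failures: int) -> Status:
    """FO-FO-FS design criterion for one category (sensors, computers, or actuators).

    Operational after a first and a second failure, fail-safe after a third.
    The abstract says "most kinds of third failure", so FAIL_SAFE is the
    intended outcome here, not a guarantee.
    """
    if failures < 0:
        raise ValueError("failure count cannot be negative")
    if failures <= 2:
        return Status.OPERATIONAL
    if failures == 3:
        return Status.FAIL_SAFE
    return Status.BEYOND_CRITERION


def mission_continues(failures_by_category: dict[str, int]) -> bool:
    """FO-FS mission rule: continue after any one failure, but terminate with a
    safe return after a second failure of the same type. Categories are counted
    separately because the three redundancies are independent of each other.
    """
    return all(count < 2 for count in failures_by_category.values())


if __name__ == "__main__":
    counts = {"sensors": 1, "computers": 1, "actuators": 0}
    for category, n in counts.items():
        print(category, design_status(n).value)
    print("mission continues:", mission_continues(counts))
```

Under this reading, one failed sensor plus one failed computer leaves the mission running, because the failures are of different types. A second computer failure would still leave the vehicle operational by design, but the mission rule would call for a safe return.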