{"title":"计算网格的误差范围:理论与实践","authors":"D. Thain, M. Livny","doi":"10.1109/HPDC.2002.1029919","DOIUrl":null,"url":null,"abstract":"Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error's scope, defined as the portion of a system that it invalidates. With this theory in hand, we recognized that the expanded system did not properly consider the scope of errors it discovered. We modified the system according to our theory, and succeeded in making it a more robust platform for distributed computing.","PeriodicalId":279053,"journal":{"name":"Proceedings 11th IEEE International Symposium on High Performance Distributed Computing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"Error scope on a computational grid: theory and practice\",\"authors\":\"D. Thain, M. Livny\",\"doi\":\"10.1109/HPDC.2002.1029919\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error's scope, defined as the portion of a system that it invalidates. With this theory in hand, we recognized that the expanded system did not properly consider the scope of errors it discovered. We modified the system according to our theory, and succeeded in making it a more robust platform for distributed computing.\",\"PeriodicalId\":279053,\"journal\":{\"name\":\"Proceedings 11th IEEE International Symposium on High Performance Distributed Computing\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 11th IEEE International Symposium on High Performance Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPDC.2002.1029919\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 11th IEEE International Symposium on High Performance Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPDC.2002.1029919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Error scope on a computational grid: theory and practice
Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error's scope, defined as the portion of a system that it invalidates. With this theory in hand, we recognized that the expanded system did not properly consider the scope of errors it discovered. We modified the system according to our theory, and succeeded in making it a more robust platform for distributed computing.