Coordinated Selective Rejuvenation for Distributed Services
Guanhua Tian, Dan Meng
2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010-12-08
DOI: 10.1109/ICPADS.2010.10 (https://doi.org/10.1109/ICPADS.2010.10)
Service availability and QoS, measured by customer-affecting performance metrics, are crucial for service systems. However, the increasing complexity of distributed service systems leaves hidden room for software faults, which undermine system availability and lead to failures or even downtime. In this paper, we introduce a composite technique, Coordinated Selective Rejuvenation, that automates the whole process of faulty-component identification and rejuvenation arbitration in order to guarantee a distributed service system's customer-affecting metrics. We evaluate it with fault-injection experiments on RUBiS, which simulates a distributed e-commerce site modeled on eBay.com. The results indicate that our request path analysis approach and system modeling technique are effective for locating faulty components, and that the Bayesian network technique is feasible for pinpointing faults in the context of request tracing. Meanwhile, the arbitration scheme can effectively guarantee system QoS by identifying and rejuvenating the tier most likely to be causing the performance fault before the degradation of the customer-affecting performance metrics becomes severe.
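
The arbitration idea summarized above (rank tiers by how likely each is to be at fault, then rejuvenate the top candidate before the customer-facing metric degrades badly) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the Bayesian network's structure, so the sketch assumes a single-fault model with one hypothesis per tier and applies plain Bayes' rule. The tier names, probabilities, threshold, and function names are all hypothetical.

```python
from typing import Dict, Optional


def posterior_fault_probabilities(
    prior: Dict[str, float], likelihood: Dict[str, float]
) -> Dict[str, float]:
    """Apply Bayes' rule over mutually exclusive single-tier fault hypotheses.

    prior[tier]      -- assumed P(tier is the faulty one) before the observation
    likelihood[tier] -- assumed P(observed request traces | tier is faulty)
    """
    unnormalized = {tier: prior[tier] * likelihood[tier] for tier in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {tier: p / total for tier, p in unnormalized.items()}


def choose_tier_to_rejuvenate(
    posterior: Dict[str, float],
    response_time_ms: float,
    warning_threshold_ms: float,
) -> Optional[str]:
    """Return the most likely faulty tier, or None while the metric is healthy.

    The warning threshold is a hypothetical early-warning level, set below the
    point where customers would see severe degradation.
    """
    if response_time_ms < warning_threshold_ms:
        return None
    return max(posterior, key=posterior.get)


# Hypothetical three-tier deployment; all numbers are made up for illustration.
prior = {"web": 0.2, "app": 0.5, "db": 0.3}
likelihood = {"web": 0.1, "app": 0.6, "db": 0.3}
posterior = posterior_fault_probabilities(prior, likelihood)
candidate = choose_tier_to_rejuvenate(
    posterior, response_time_ms=900.0, warning_threshold_ms=800.0
)
print(candidate)
```

Separating the ranking step from the threshold check keeps the rejuvenation decision tied to the customer-affecting metric, so a tier is restarted only when degradation has actually begun.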