Authors: M. A. Hakamian
Venue: 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Publication date: 2020-10-01
DOI: 10.1109/ISSREW51248.2020.00054
URL: https://doi.org/10.1109/ISSREW51248.2020.00054
Engineering Resilience: Predicting The Change Impact on Performance and Availability of Reconfigurable Systems
Modern distributed systems are supposed to be resilient and to continue operating according to the agreed-on Quality of Service (QoS) despite the failure of a few services or variations in workload. Real-world incidents show that systems still undergo unacceptable QoS degradations or significant service outages. The main causes are updates to the system or its infrastructural services and, subsequently, faulty recovery logic. Frequent updates combined with faulty recovery logic produce a correlated set of failure modes that impact the system's QoS. Software architects need assurance that the system satisfies the agreed-on QoS despite updates to the system or its infrastructural services. In this research, we propose to systematically identify the risk that updates trigger a correlated set of failure modes causing unacceptable performance degradation or a service outage. Following the Architecture Tradeoff Analysis Method (ATAM), we propose to formulate the collected risks as scenarios so that resilience requirements are characterized precisely. Furthermore, we propose model-based prediction methods for scenario-based resilience evaluation of the system. As a result, the software architect obtains a measurement-based evaluation of system resilience and can use the evaluation result to improve system resilience further or to specify a precise service level agreement.
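The abstract does not give a concrete scenario format, so the following is only a minimal sketch of how one collected risk might be written down as a scenario. It assumes the six-part quality attribute scenario form commonly used with ATAM (source, stimulus, artifact, environment, response, response measure). The class name `ResilienceScenario`, its field names, the example service, and the latency and availability thresholds are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResilienceScenario:
    """One resilience requirement in six-part scenario form (hypothetical sketch)."""
    source: str        # who or what triggers the change
    stimulus: str      # the update and the failure modes it correlates with
    artifact: str      # the part of the system that is affected
    environment: str   # operating conditions when the stimulus arrives
    response: str      # what the system is expected to do
    # Response measure, split into two assumed QoS bounds:
    max_p95_latency_ms: float  # performance bound (illustrative value only)
    min_availability: float    # availability bound (illustrative value only)

    def is_satisfied(self, p95_latency_ms: float, availability: float) -> bool:
        """Check predicted or measured QoS values against the response measure."""
        return (p95_latency_ms <= self.max_p95_latency_ms
                and availability >= self.min_availability)


# Hypothetical scenario: an infrastructure update combined with faulty recovery logic.
scenario = ResilienceScenario(
    source="deployment pipeline",
    stimulus="rolling update of an infrastructure service with a misconfigured retry policy",
    artifact="checkout service",
    environment="peak workload",
    response="requests keep being served within the agreed-on QoS",
    max_p95_latency_ms=500.0,
    min_availability=0.999,
)

# Values that a prediction model or an experiment might report (made up here).
print(scenario.is_satisfied(p95_latency_ms=420.0, availability=0.9995))  # True
print(scenario.is_satisfied(p95_latency_ms=730.0, availability=0.9995))  # False
```

Keeping the response measure as explicit numeric bounds is what would let a scenario be checked against predicted or measured values, and the same bounds could later be reused when drafting a service level agreement.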