{"title":"Towards failure correlation for improved cloud application service resilience","authors":"D. Mathews, Mudit Verma, P. Aggarwal, J. Lakshmi","doi":"10.1145/3492323.3495586","journal":{"name":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion","volume":"37 1","pages":"0"},"publicationDate":"2021-12-06","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3492323.3495586"}
Towards failure correlation for improved cloud application service resilience
Dealing with disruptions autonomously is necessary for maintaining the quality of a cloud application service. A fault, error, or failure in any component of the application service stack can disrupt service delivery. Fault localization and failure prediction are essential techniques for managing service failures. Emerging cloud computing paradigms push application services to be built as loosely coupled distributed components that scale independently. However, such architectures expose the limits of existing approaches to fault localization and failure prediction. Most prior work on fault localization and failure prediction focuses on a specific layer of the cloud service architecture, a subset of service components, or specific fault types. These approaches restrict the view of a fault's impact on the application service and preclude more intelligent methods for localizing faults or predicting failures, and thus for dealing with service disruptions efficiently and autonomously. This paper examines the propagation of faults in multi-tiered architectures such as clouds and uses a real-world disruption scenario to emphasize the need to correlate faults across service layers in order to gain insights for end-to-end fault analysis of cloud application services.