Data-Driven Monitoring for Cloud Compute Systems
Daniel Gehberger, P. Mátray, G. Németh
2016 IEEE/ACM 9th International Conference on Utility and Cloud Computing (UCC), 2016-12-06
DOI: 10.1145/2996890.2996893 (https://doi.org/10.1145/2996890.2996893)

End-to-end monitoring of interdependent applications in the cloud is challenging. The difficulties stem from the complexity of the computations and the highly distributed nature of the deployment. Without a comprehensive observability solution, it is hard to apply autonomous mechanisms that ensure service guarantees in the cloud. To tackle this problem, we propose data-driven monitoring, a method that provides a detailed, live view of how data flows through a possibly complex compute system. The method is based on tracing individual input events and collecting resource usage metrics along their paths. By reconstructing causal and temporal relationships, we can detect performance degradations, pinpoint root causes, and apply corrective actions before end-to-end requirements are endangered. To demonstrate the potential of the concept, we created a prototype implementation in a big data compute platform and developed two automated optimization algorithms.
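
The abstract does not describe the prototype's data model, so the following is only a minimal sketch of the general idea: per-event trace records are grouped, ordered causally, and checked against an end-to-end latency budget. The `Span` schema, the stage names, the `latency_budget_ms` parameter, and the "slowest stage" heuristic are all assumptions made for illustration; none of them come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One processing step observed for one traced input event (hypothetical schema)."""
    event_id: str
    stage: str
    parent: Optional[str]  # stage that produced this step's input; None for the entry stage
    start_ms: float
    end_ms: float
    cpu_ms: float          # example resource usage metric collected along the path


def reconstruct_path(spans):
    """Order one event's spans causally by following parent links from the entry stage.

    Assumes the parent links form a tree or DAG without cycles.
    """
    children = defaultdict(list)
    for span in spans:
        children[span.parent].append(span)
    ordered, frontier = [], list(children[None])
    while frontier:
        frontier.sort(key=lambda s: s.start_ms)  # temporal order between parallel branches
        span = frontier.pop(0)
        ordered.append(span)
        frontier.extend(children[span.stage])
    return ordered


def summarize(spans, latency_budget_ms):
    """Report path, end-to-end latency, and slowest stage for every traced event."""
    by_event = defaultdict(list)
    for span in spans:
        by_event[span.event_id].append(span)
    report = {}
    for event_id, event_spans in by_event.items():
        path = reconstruct_path(event_spans)
        if not path:  # no entry span was recorded for this event
            continue
        latency = max(s.end_ms for s in path) - min(s.start_ms for s in path)
        slowest = max(path, key=lambda s: s.end_ms - s.start_ms)
        report[event_id] = {
            "path": [s.stage for s in path],
            "latency_ms": latency,
            "cpu_ms": sum(s.cpu_ms for s in path),
            "slowest_stage": slowest.stage,
            "over_budget": latency > latency_budget_ms,
        }
    return report


# Made-up trace records, deliberately out of order to exercise the reconstruction.
SPANS = [
    Span("e2", "aggregate", "parse", 180.0, 190.0, 4.0),
    Span("e1", "parse", "ingest", 5.0, 12.0, 3.0),
    Span("e1", "ingest", None, 0.0, 5.0, 1.0),
    Span("e2", "ingest", None, 100.0, 104.0, 1.0),
    Span("e1", "aggregate", "parse", 12.0, 20.0, 4.0),
    Span("e2", "parse", "ingest", 104.0, 180.0, 40.0),
]

REPORT = summarize(SPANS, latency_budget_ms=50.0)
```

In this made-up data, event `e2` exceeds the assumed 50 ms budget and its `parse` stage is flagged as the slowest, which is the kind of signal a corrective action could act on. A real system would also need to handle missing spans, clock skew between hosts, and sampling.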