Leveraging Data to Improve Cloud Services
Ranjita Bhagwan
Proceedings of the ACM Symposium on Cloud Computing (SoCC), 2021-11-01
DOI: 10.1145/3472883.3517038 (https://doi.org/10.1145/3472883.3517038)

Today's cloud services are large, complex, and dynamic, often supporting billions of users. Such an environment poses several challenges, such as ensuring fast and secure development and deployment and resolving service disruptions promptly. At the same time, new opportunities to address these challenges have emerged. Large-scale services generate petabytes of code-, test-, and usage-related data in a single day, and this data can be harnessed to give engineers valuable insights into how to improve service performance, security, and reliability. However, extracting the important information from such vast amounts of systems data is a formidable task. Over the last few years, we have developed many analysis tools that leverage code, test logs, and telemetry to address these challenges. In this talk, I will describe our experience building such tools and our journey from determining the right problems to solve, through making research contributions, to widespread deployment across Microsoft's services.