WuKong: effective diagnosis of bugs at large system scales

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming. Pub Date: 2013-02-23. DOI: 10.1145/2442516.2442563

Bowen Zhou, Milind Kulkarni, S. Bagchi


Citations: 3

Abstract


A key challenge in developing large-scale applications (both in system size and in input size) is finding bugs that are latent at the small scales of testing, only manifesting when a program is deployed at large scales. Traditional statistical techniques fail because no error-free run is available at deployment scales for training purposes. Prior work used scaling models to detect anomalous behavior at large scales without being trained on correct behavior at that scale. However, that work cannot localize bugs automatically. In this paper, we extend that work in three ways: (i) we develop an automatic diagnosis technique, based on feature reconstruction; (ii) we design a heuristic to effectively prune the feature space; and (iii) we validate our design through one fault-injection study, finding that our system can effectively localize bugs in a majority of cases.
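The abstract does not give the mechanics of feature reconstruction, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each behavioral feature follows a power law in the system scale, fits that model on small-scale runs, extrapolates ("reconstructs") the expected value at the large scale, and ranks features by relative reconstruction error. The power-law form, the function names, the feature names, and all numbers are hypothetical choices made for illustration.

```python
import math


def fit_power_law(scales, values):
    """Least-squares fit of log(value) = a + b * log(scale); returns (a, b).

    Assumption for this sketch: scales and values are strictly positive.
    """
    xs = [math.log(s) for s in scales]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def reconstruct(model, scale):
    """Extrapolate the expected feature value at the given scale."""
    a, b = model
    return math.exp(a + b * math.log(scale))


def rank_features(training, observed, scale):
    """Return feature names, most suspicious first.

    training: {feature: [(scale, value), ...]} from small, bug-free runs.
    observed: {feature: value} measured in the large-scale run.
    """
    errors = {}
    for name, runs in training.items():
        model = fit_power_law([s for s, _ in runs], [v for _, v in runs])
        expected = reconstruct(model, scale)
        errors[name] = abs(observed[name] - expected) / expected
    return sorted(errors, key=errors.get, reverse=True)


# Hypothetical small-scale training runs (scale = number of processes).
training = {
    "send_calls": [(4, 40.0), (8, 80.0), (16, 160.0)],      # grows linearly
    "buffer_bytes": [(4, 32.0), (8, 128.0), (16, 512.0)],   # grows quadratically
}

# Hypothetical large-scale run in which buffer_bytes deviates from its trend.
observed = {"send_calls": 10240.0, "buffer_bytes": 4096.0}

ranking = rank_features(training, observed, 1024)
print(ranking)
```

In this sketch the feature whose observed value departs most from its extrapolated trend is listed first, which is the sense in which a reconstruction error can point toward the part of the program where a scale-dependent bug manifests.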
