Addressing software dependability with statistical and machine learning techniques

A. Fox
Proceedings of the 27th International Conference on Software Engineering, published 2005-05-15
DOI: 10.1145/1062455.1062462 (https://doi.org/10.1145/1062455.1062462)
Citations: 6

Abstract

Our ability to design and deploy large, complex systems is outpacing our ability to understand their behavior. How do we detect and recover from "heisenbugs," which account for up to 40% of failures in complex Internet systems, without extensive application-specific coding? Which users were affected, and for how long? How do we diagnose and correct problems caused by configuration or operator errors? Although these problems are posed at a high level of abstraction, usually all we can measure directly are low-level behaviors, which is analogous to driving a car while looking through a magnifying glass. Machine learning can bridge this gap with techniques that learn "baseline" models automatically or semi-automatically, making it possible to characterize and monitor systems whose structure is not well understood a priori. I'll discuss initial successes and future challenges in using machine learning for failure detection and diagnosis, configuration troubleshooting, attribution (identifying which low-level properties appear to be correlated with an observed high-level effect such as decreased performance), and failure forecasting.
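The abstract does not say which models are used, so the following is only a minimal sketch of the general idea of learning a "baseline" from low-level measurements and flagging deviations from it. It assumes a simple per-metric mean and standard deviation with a z-score threshold; the function names, the metric names (`latency_ms`, `errors`), and the threshold of 3.0 are hypothetical choices for illustration, not the author's method.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Fit a per-metric (mean, standard deviation) pair from samples
    collected while the system is believed to be behaving normally.

    `samples` is a list of dicts mapping metric name -> numeric value.
    """
    baseline = {}
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        baseline[metric] = (mean(values), stdev(values))
    return baseline


def find_anomalies(baseline, observation, threshold=3.0):
    """Return {metric: z_score} for metrics that deviate from the baseline
    by more than `threshold` standard deviations (an assumed cutoff)."""
    flagged = {}
    for metric, (mu, sigma) in baseline.items():
        if sigma == 0:
            continue  # a constant metric carries no scale to compare against
        z = (observation[metric] - mu) / sigma
        if abs(z) > threshold:
            flagged[metric] = z
    return flagged


# Hypothetical low-level measurements taken during normal operation.
normal = [{"latency_ms": 100 + i % 5, "errors": i % 2} for i in range(20)]
baseline = learn_baseline(normal)

# A new observation with a latency far outside the learned baseline.
flagged = find_anomalies(baseline, {"latency_ms": 250, "errors": 1})
print(sorted(flagged))
```

Because the baseline is learned from data rather than written by hand, a sketch like this needs no application-specific coding, which is the property the abstract emphasizes. A real system would need a more robust model and a way to decide which periods count as normal.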