Online Behavior Identification in Distributed Systems

J. '. Cid-Fuentes, Claudia Szabo, K. Falkner
{"title":"Online Behavior Identification in Distributed Systems","authors":"J. '. Cid-Fuentes, Claudia Szabo, K. Falkner","doi":"10.1109/SRDS.2015.16","DOIUrl":null,"url":null,"abstract":"The diagnosis, prediction, and understanding of unexpected behavior is crucial for long running, large scale distributed systems. However, existing works focus on the identification of faults in specific time moments preceded by significantly abnormal metric readings, or require a previous analysis of historical failure data. In this work, we propose an online behavior classification system to identify a wide range of undesired behaviors, which may appear even in healthy systems, and their evolution over time. We employ a two-step process involving two online classifiers on periodically collected system metrics to identify at runtime normal and anomalous behaviors such as deadlock, starvation and livelock, without any previous analysis of historical failure data. Our approach achieves over 80% accuracy in detecting unexpected behaviors and over 90% accuracy in identifying their type with a short delay after the anomalies appear, and with minimal expert intervention. Our experimental analysis uses system execution traces obtained from a Google cluster and from our in-house distributed system with varied behaviors, and shows the benefits of our approach as well as future research challenges.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2015.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The diagnosis, prediction, and understanding of unexpected behavior is crucial for long running, large scale distributed systems. However, existing works focus on the identification of faults in specific time moments preceded by significantly abnormal metric readings, or require a previous analysis of historical failure data. In this work, we propose an online behavior classification system to identify a wide range of undesired behaviors, which may appear even in healthy systems, and their evolution over time. We employ a two-step process involving two online classifiers on periodically collected system metrics to identify at runtime normal and anomalous behaviors such as deadlock, starvation and livelock, without any previous analysis of historical failure data. Our approach achieves over 80% accuracy in detecting unexpected behaviors and over 90% accuracy in identifying their type with a short delay after the anomalies appear, and with minimal expert intervention. Our experimental analysis uses system execution traces obtained from a Google cluster and from our in-house distributed system with varied behaviors, and shows the benefits of our approach as well as future research challenges.
分布式系统中的在线行为识别
对意外行为的诊断、预测和理解对于长期运行的大规模分布式系统至关重要。然而,现有的工作侧重于在明显异常的度量读数之前的特定时间时刻识别故障,或者需要事先分析历史故障数据。在这项工作中,我们提出了一个在线行为分类系统来识别广泛的不良行为,这些行为甚至可能出现在健康系统中,并随着时间的推移而演变。我们采用两步流程,包括两个在线分类器对定期收集的系统指标进行分类,以在运行时识别正常和异常行为,如死锁、饥饿和活锁,而无需对历史故障数据进行任何先前的分析。我们的方法在检测异常行为方面达到80%以上的准确率,在异常出现后的短时间内识别其类型的准确率超过90%,并且专家干预最少。我们的实验分析使用了从Google集群和我们内部具有不同行为的分布式系统获得的系统执行跟踪,并显示了我们方法的好处以及未来研究的挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信