Identifying Changed or Sick Resources from Logs

2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W). Pub Date: 2018-09-01. DOI: 10.1109/FAS-W.2018.00030

A. Harutyunyan, A. Poghosyan, Naira Grigoryan, N. Kushmerick, Harutyun Beybutyan

Citations: 3

Abstract

Identifying important changes in a complex distributed system is a challenging data science problem. Solving it is critical for tools that manage modern cloud infrastructure stacks and other large, complex distributed systems. In this paper, we investigate two specific approaches to using log data to solve this problem. The first approach compares a source's current behavior with its past behavior. Some solutions that perform anomaly detection on numeric data from the data center inevitably rely on global change point detection concepts. Log data, on the other hand, promises significantly different perspectives and dimensions for accomplishing a similar task, yet state-of-the-art solutions lack the capability to automatically detect significant change points in the log stream of an event source by learning its behavioral patterns. Such change points indicate the most important times at which the source's behavior differs significantly from the past. A second, complementary approach to real-time change detection compares a source's current behavior with the current behavior of its peers in a population of sources serving a common role in the data center. Employing the concept of event types of log messages introduced earlier, we propose algorithms for each of these approaches that apply classical statistical and machine learning techniques to data capturing the distribution of those constructs. We demonstrate experimental results from our prototype algorithms.
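
The abstract does not say which statistical or machine learning techniques the authors use, so the following is only an illustrative sketch of the two comparisons it describes, not the paper's algorithms. It assumes each log message has already been mapped to an event type label, and it swaps in Jensen-Shannon divergence with a fixed threshold as a stand-in change measure. All names (`distribution`, `js_divergence`, `change_points`, `sick_peers`), the smoothing constant, and the threshold value are hypothetical choices made for this example.

```python
import math
from collections import Counter


def distribution(event_types, vocabulary, alpha=1.0):
    """Smoothed relative frequency of each event type (alpha is an assumed constant)."""
    counts = Counter(event_types)
    total = sum(counts[v] for v in vocabulary) + alpha * len(vocabulary)
    return [(counts[v] + alpha) / total for v in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def change_points(windows, vocabulary, threshold=0.2):
    """Approach 1 (assumed form): flag windows that differ from the previous window."""
    dists = [distribution(w, vocabulary) for w in windows]
    return [
        i
        for i in range(1, len(dists))
        if js_divergence(dists[i - 1], dists[i]) > threshold
    ]


def sick_peers(current_by_source, vocabulary, threshold=0.2):
    """Approach 2 (assumed form): flag sources that differ from the mean of their peers."""
    dists = {s: distribution(ev, vocabulary) for s, ev in current_by_source.items()}
    if len(dists) < 2:
        return []  # a source with no peers cannot be compared
    flagged = []
    for source, dist in dists.items():
        others = [d for s, d in dists.items() if s != source]
        peer_mean = [sum(col) / len(others) for col in zip(*others)]
        if js_divergence(dist, peer_mean) > threshold:
            flagged.append(source)
    return sorted(flagged)


# Made-up event type streams, one list per time window of a single source.
vocabulary = ["A", "B", "C"]
windows = [
    ["A"] * 8 + ["B"] * 2,
    ["A"] * 7 + ["B"] * 3,
    ["C"] * 9 + ["A"],  # behavior shifts here
    ["C"] * 10,
]
# Made-up current windows for four hosts serving the same role.
hosts = {
    "h1": ["A"] * 8 + ["B"] * 2,
    "h2": ["A"] * 7 + ["B"] * 3,
    "h3": ["A"] * 9 + ["B"],
    "h4": ["C"] * 9 + ["A"],  # behaves unlike its peers
}
changed = change_points(windows, vocabulary)
sick = sick_peers(hosts, vocabulary)
```

In this toy data the third window and host `h4` are the ones dominated by event type `C`. A fixed threshold is the simplest possible decision rule; a real system would more plausibly learn it from each source's history.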
