Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data

C. Hong, M. Caesar, N. Duffield, Jia Wang
Published in: 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 173-182
Publication date: 2012-03-09
DOI: 10.1109/ICDCS.2012.30 (https://doi.org/10.1109/ICDCS.2012.30)
Citations: 18

Abstract

Operational network data, that is, management data such as customer care call logs and equipment system logs, is an important source of information for network operators to detect problems in their networks. Unfortunately, there is a lack of efficient tools to automatically track and detect anomalous events in operational data, which forces ISP operators to rely on manual inspection of this data. While anomaly detection has been widely studied in the context of network data, operational data presents several new challenges, including the volatility and sparseness of the data and the need for fast detection, which complicates the application of schemes that require offline processing or large, stable data sets to converge. To address these challenges, we propose Tiresias, an automated approach to locating anomalous events in hierarchical operational data. Tiresias leverages the hierarchical structure of operational data to identify high-impact aggregates (e.g., locations in the network, failure modes) likely to be associated with anomalous events. To accommodate different kinds of operational network data, Tiresias uses an online detection algorithm with low time and space complexity while preserving high detection accuracy. We present results from two case studies using operational data collected at a large commercial IP network operated by a Tier-1 ISP: customer care call logs and set-top box crash logs. By comparing with a reference set verified by the ISP's operational group, we validate that Tiresias can achieve >94% accuracy in locating anomalies. Tiresias also discovered several previously unknown anomalies in the ISP's customer care cases, demonstrating its effectiveness.
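
To make the idea of locating anomalies in hierarchical data concrete, the following Python sketch counts events at every level of a hierarchy and flags aggregates whose count jumps well above a running forecast. It is not the Tiresias algorithm, which the abstract does not specify. The exponentially weighted forecast, the threshold parameters (alpha, ratio, min_count), and the names aggregate, EwmaDetector, and most_specific are all assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate(events):
    """Count events at every prefix of their hierarchical path.

    Each event is a tuple such as ("west", "sf"); the empty tuple is the root.
    """
    counts = defaultdict(int)
    for path in events:
        for depth in range(len(path) + 1):
            counts[path[:depth]] += 1
    return counts


class EwmaDetector:
    """Hypothetical online detector: one running forecast per aggregate."""

    def __init__(self, alpha=0.3, ratio=3.0, min_count=5):
        self.alpha = alpha          # smoothing weight for the newest window
        self.ratio = ratio          # flag when count > ratio * forecast
        self.min_count = min_count  # ignore aggregates with very few events
        self.forecast = {}

    def update(self, counts):
        """Consume one time window of counts and return the flagged nodes."""
        flagged = []
        # Include nodes seen earlier but absent now, so their forecast decays.
        for node in set(counts) | set(self.forecast):
            c = counts.get(node, 0)
            f = self.forecast.get(node)
            if f is not None and c >= self.min_count and c > self.ratio * f:
                flagged.append(node)
            self.forecast[node] = c if f is None else self.alpha * c + (1 - self.alpha) * f
        return sorted(flagged)


def most_specific(flagged):
    """Keep only flagged nodes that have no flagged descendant."""
    return [n for n in flagged
            if not any(m != n and m[:len(n)] == n for m in flagged)]
```

Feeding the detector one window at a time keeps the state to a single number per aggregate, which is the kind of low time and space cost an online setting calls for. Passing the flagged nodes through most_specific narrows a burst that shows up at the root, a region, and a city down to the city alone.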