DOVE: Diagnosis-driven SLO Violation Detection

Yiran Lei, Yu Zhou, Yunsenxiao Lin, Mingwei Xu, Yangyang Wang
2021 IEEE 29th International Conference on Network Protocols (ICNP), published 2021-11-01
DOI: 10.1109/ICNP52444.2021.9651986 (https://doi.org/10.1109/ICNP52444.2021.9651986)
Citations: 1

Abstract
Service-level objectives (SLOs), typically network performance requirements on delay and packet loss, must be guaranteed for a growing number of high-performance applications, e.g., telesurgery and cloud gaming. However, SLO violations are common and destructive in today’s network operation. Detection, which monitors performance to discover anomalies, and diagnosis, which analyzes the causes of SLO violations, are both crucial for fast recovery. Unfortunately, existing diagnosis approaches require exhaustive causal information to function, while existing detection tools either incur large overhead or provide only limited information for diagnosis. This paper presents DOVE, a diagnosis-driven SLO detection system with high accuracy and low overhead. The key idea is to selectively and efficiently identify the information that diagnosis needs and report it from the data plane along with SLO violation alerts. Network segmentation is introduced to balance scalability and accuracy. For fine-grained SLO detection, novel algorithms that measure packet loss and percentile delay are implemented entirely on the data plane, without involving the control plane. We implement and deploy DOVE on Tofino and the P4 software switch (BMv2) and show its effectiveness with a use case. Compared with ground truth, the reported SLO violation alerts and diagnosis information show high accuracy (>97%). Our evaluation also shows that DOVE introduces up to two orders of magnitude less traffic overhead than NetSight. In addition, its memory utilization and required processing capability are low enough for it to be deployable in real network topologies.
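The abstract does not describe how DOVE's percentile-delay algorithm works, so the following is only an illustrative sketch of one generic way a percentile-delay SLO can be checked with counters alone, which is the kind of constraint a switch data plane imposes. It is not the paper's algorithm. The class name `PercentileDelayMonitor`, its fields, the microsecond unit, and the fixed packet-count window are all assumptions made for this example. The idea is that the p-th percentile delay exceeds a bound exactly when more than (100 - p)% of packets exceed it, so no sorting or per-packet storage is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PercentileDelayMonitor:
    """Counter-only check of a percentile-delay SLO (hypothetical sketch, not DOVE's algorithm)."""

    threshold_us: int  # delay bound of the SLO; microseconds is an assumed unit
    percentile: int    # e.g. 99 means "99% of packets must meet the bound"
    window: int        # packets per measurement window (assumed windowing scheme)
    total: int = 0     # packets seen in the current window
    late: int = 0      # packets in the current window whose delay exceeded the bound

    def observe(self, delay_us: int) -> Optional[bool]:
        """Record one packet delay.

        Returns None while the window is still open, True if the window
        that just closed violated the SLO, and False if it met the SLO.
        """
        self.total += 1
        if delay_us > self.threshold_us:
            self.late += 1
        if self.total < self.window:
            return None
        # Integer-only form of: late / total > (100 - percentile) / 100
        violated = self.late * 100 > (100 - self.percentile) * self.total
        # Reset both counters for the next window.
        self.total = 0
        self.late = 0
        return violated


if __name__ == "__main__":
    monitor = PercentileDelayMonitor(threshold_us=100, percentile=90, window=10)
    for delay in [50] * 8 + [200] * 2:
        verdict = monitor.observe(delay)
    print("window violated SLO:", verdict)
```

The comparison uses only integer multiplication and two counters per monitored flow or segment, a shape chosen here because programmable switches generally lack floating-point arithmetic and sorting; whether DOVE uses anything similar is not stated in the abstract.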