SampleHST-X: A Point and Collective Anomaly-Aware Trace Sampling Pipeline with Approximate Half Space Trees

IF 4.1 · CAS Zone 3, Computer Science · JCR Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Alim Ul Gias, Yicheng Gao, Matthew Sheldon, José A. Perusquía, Owen O’Brien, Giuliano Casale
Journal: Journal of Network and Systems Management
Published: 2024-04-16
DOI: https://doi.org/10.1007/s10922-024-09818-8
Citations: 0

Abstract

The storage requirement for distributed tracing can be reduced significantly by sampling only the anomalous or interesting traces that occur rarely at runtime. In this paper, we introduce an unsupervised sampling pipeline for distributed tracing that ensures high sampling accuracy while reducing the storage requirement. The proposed method, SampleHST-X, extends our recent work, SampleHST. It operates under a budget that limits the percentage of traces to be sampled, and it adjusts the storage quota of normal and anomalous traces depending on the size of this budget. The sampling process relies on accurately defining clusters of normal and anomalous traces by leveraging the distribution of mass scores obtained from a forest of Half Space Trees (HST); these scores characterize the probability of observing different traces. In our experiments, using traces from a cloud data center, SampleHST yields 2.3\(\times\) to 9.5\(\times\) better sampling performance. SampleHST-X further extends the SampleHST approach by incorporating a novel class of Half Space Trees, namely Approximate HST, that uses approximate counters to update the mass scores. These counters significantly reduce the space requirement of the HST while the sampling performance remains similar. In addition to this extension, SampleHST-X includes a trace characterization component based on the Family of Graph Spectral Distances (FGSD), which enables it to sample traces with collective anomalies as well as point anomalies. For such traces, we observe that the SampleHST-X approach can yield 1.2\(\times\) to 19\(\times\) better sampling performance.
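The abstract does not say which approximate counter Approximate HST uses. As an illustration only, the sketch below shows one well-known scheme, the Morris counter, which stores just an exponent and so needs roughly log log n bits instead of log n. The class name `MorrisCounter` and the helper `average_estimate` are hypothetical names introduced for this example; they are not taken from the paper.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only an exponent (Morris counter).

    Assumption: this is one possible approximate counter, shown for
    illustration; the paper's actual counter may differ.
    """

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2^(-exponent), so the
        # stored value grows logarithmically with the true count.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2^exponent - 1 is an unbiased estimate of the true count.
        return 2 ** self.exponent - 1


def average_estimate(n_events, n_counters, seed=0):
    """Average several independent counters to reduce the variance."""
    rng = random.Random(seed)
    counters = [MorrisCounter(rng) for _ in range(n_counters)]
    for _ in range(n_events):
        for counter in counters:
            counter.increment()
    return sum(c.estimate() for c in counters) / n_counters


if __name__ == "__main__":
    print(average_estimate(n_events=1000, n_counters=200))
```

A single Morris counter is noisy, which is why the helper averages many of them. Whether a similar averaging effect across the trees of an HST forest explains the "similar sampling performance" reported above is a guess, not something the abstract states.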

Source journal
CiteScore: 7.60
Self-citation rate: 16.70%
Articles published: 65
Review time: >12 weeks
About the journal: Journal of Network and Systems Management features peer-reviewed original research, as well as case studies in the fields of network and system management. The journal regularly disseminates significant new information on both the telecommunications and computing aspects of these fields, as well as their evolution and emerging integration. This outstanding quarterly covers architecture, analysis, design, software, standards, and migration issues related to the operation, management, and control of distributed systems and communication networks for voice, data, video, and networked computing.