Enhanced State History Tree (eSHT): A Stateful Data Structure for Analysis of Highly Parallel System Traces

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI:10.1109/BigDataCongress.2016.19

Loic Prieur-Drevon, R. Beamonte, Naser Ezzati-Jivan, M. Dagenais


Citations: 2

Abstract

Behaviors of distributed systems with many cores and/or many threads are difficult to understand. This is why dynamic analysis tools such as tracers are useful to collect run-time data and help programmers debug and optimize complex programs. However, manual analysis of very large traces with billions of events is a difficult problem, which automated trace visualizers and analyzers aim to solve. Trace analysis and visualization software needs fast access to data, which it cannot achieve by searching through the entire trace for every query. A number of solutions have adopted stateful analysis, which rearranges events into more query-friendly structures after a single pass through the trace. In this paper, we look into current implementations and model the behavior of previous work, the State History Tree (SHT), on traces with many thread creations and deletions. This allows us to identify which properties of the SHT are responsible for inefficient disk usage and high memory consumption. We then propose a more efficient data structure, the enhanced State History Tree (eSHT), to store and query computed states, in order to limit disk usage and reduce the query time for any state. Next, we compare the use of SHT and eSHT on traces with many attributes. Finally, we verify the scalability of our new data structure according to trace size. As shown by our results, the proposed solution makes near-optimal use of disk space, reduces the algorithm's memory usage logarithmically for previously problematic cases, and speeds up queries on traces with many attributes by an order of magnitude. The proposed solution builds upon our previous work, enabling it to easily scale up to traces containing a million threads.
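The idea of stateful analysis mentioned in the abstract can be illustrated with a small sketch. The code below is not the SHT or the eSHT: it is a flat, in-memory interval store, written only to show how one pass over a trace can turn events into state intervals that are then queried without rescanning the trace. The class name `StateHistory`, the event tuple layout `(timestamp, attribute, value)`, and the attribute names are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


class StateHistory:
    """Illustrative interval store (hypothetical, not the paper's tree).

    Built in a single pass over time-ordered events of the form
    (timestamp, attribute, value). Each attribute ends up with a sorted
    list of half-open intervals [start, end) carrying the state value.
    """

    def __init__(self, events, end_time):
        open_state = {}                # attribute -> (start, value) still ongoing
        intervals = defaultdict(list)  # attribute -> closed (start, end, value)

        for ts, attr, value in events:
            if attr in open_state:
                start, old_value = open_state[attr]
                if ts > start:         # skip zero-length intervals
                    intervals[attr].append((start, ts, old_value))
            open_state[attr] = (ts, value)

        # Close every state that is still open when the trace ends.
        for attr, (start, value) in open_state.items():
            if end_time > start:
                intervals[attr].append((start, end_time, value))

        self._intervals = dict(intervals)
        # Start times are kept separately so queries can binary-search them.
        self._starts = {
            attr: [iv[0] for iv in ivs] for attr, ivs in self._intervals.items()
        }

    def query(self, attr, t):
        """Return the value of `attr` at time `t`, or None if unknown."""
        starts = self._starts.get(attr)
        if not starts:
            return None
        i = bisect_right(starts, t) - 1   # last interval starting at or before t
        if i < 0:
            return None
        start, end, value = self._intervals[attr][i]
        return value if start <= t < end else None


# Hypothetical trace: two threads changing state over time.
events = [
    (0, "thread/1", "RUNNING"),
    (3, "thread/2", "RUNNING"),
    (5, "thread/1", "BLOCKED"),
    (9, "thread/1", "RUNNING"),
]
history = StateHistory(events, end_time=12)
```

With this sketch, `history.query("thread/1", 4)` answers from the stored intervals with a binary search rather than a scan of the event list. A flat store like this keeps everything in memory, which is exactly what does not scale to billions of events; the tree structures discussed in the abstract address on-disk storage and query cost, which this example does not attempt to model.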
