JetStream: Cluster-Scale Parallelization of Information Flow Queries

Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). Pub Date: 2016-11-02. DOI: 10.5555/3026877.3026912

Andrew Quinn, David Devecsery, Peter M. Chen, J. Flinn

Pages: 451-466

Citations: 25

Abstract

Dynamic information flow tracking (DIFT) is an important tool in many domains, such as security, debugging, forensics, provenance, configuration troubleshooting, and privacy tracking. However, the usability of DIFT is currently limited by its high overhead; complex information flow queries can take up to two orders of magnitude longer than the original execution of the program. This precludes interactive uses in which users iteratively refine queries to narrow down bugs, leaks of private data, or performance anomalies.

JetStream applies cluster computing to parallelize and accelerate information flow queries over past executions. It uses deterministic record and replay to time-slice executions into distinct, contiguous chunks of execution called epochs, and it tracks information flow for each epoch on a separate core in the cluster. It structures the aggregation of information flow data from each epoch as a streaming computation. Epochs are arranged in a sequential chain from the beginning to the end of program execution; relationships to program inputs (sources) are streamed forward along the chain, and relationships to program outputs (sinks) are streamed backward. JetStream is the first system to parallelize DIFT across a cluster. Our results show that JetStream queries scale to at least 128 cores over a wide range of applications. JetStream accelerates DIFT queries to run 12-48 times faster than sequential queries; in most cases, queries run faster than the original execution of the program.
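The forward/backward streaming over a chain of epochs can be illustrated with a minimal Python sketch. It assumes a hypothetical, heavily simplified model: each epoch's local DIFT run (the part JetStream spreads across cores) has already produced two maps, one from each location live at the end of the epoch to its dependencies, and one from each sink emitted in the epoch to its dependencies, where a dependency is either a source read during the epoch or a location's value at the start of the epoch. The names `EpochResult`, `resolve`, `forward_pass`, and `backward_pass` are illustrative only and are not JetStream's API, and the aggregation here is a plain sequential loop rather than a distributed, pipelined computation.

```python
from dataclasses import dataclass, field

# Hypothetical encoding (not JetStream's): a dependency is ("src", name) for a
# program input read during the epoch, or ("loc", addr) for the value that
# location held at the START of the epoch.
Dep = tuple[str, str]


@dataclass
class EpochResult:
    """Local DIFT result for one epoch; every epoch can compute this independently."""
    # Location live at epoch end -> its dependencies. Assumes every live
    # location is listed, with untouched ones mapping to {("loc", itself)}.
    live_out: dict[str, set[Dep]] = field(default_factory=dict)
    # Sink (program output) emitted during the epoch -> its dependencies.
    sinks: dict[str, set[Dep]] = field(default_factory=dict)


def resolve(deps: set[Dep], start_state: dict[str, set[str]]) -> set[str]:
    """Rewrite deps in terms of sources only, using the source sets that
    earlier epochs streamed forward for each location."""
    sources: set[str] = set()
    for kind, name in deps:
        if kind == "src":
            sources.add(name)
        else:
            sources |= start_state.get(name, set())
    return sources


def forward_pass(epochs: list[EpochResult]) -> dict[str, set[str]]:
    """Stream source relationships forward along the epoch chain.
    Returns: sink -> set of sources whose data reaches it."""
    state: dict[str, set[str]] = {}  # before epoch 0, no location depends on a source
    flows: dict[str, set[str]] = {}
    for ep in epochs:
        for sink, deps in ep.sinks.items():
            flows[sink] = resolve(deps, state)
        # The new state is fully computed from the old one before rebinding.
        state = {loc: resolve(deps, state) for loc, deps in ep.live_out.items()}
    return flows


def backward_pass(epochs: list[EpochResult]) -> list[set[str]]:
    """Stream sink relationships backward along the epoch chain.
    Returns, per epoch, the locations at its start whose values reach a sink."""
    needed: set[str] = set()  # nothing after the last epoch consumes data
    reaching: list[set[str]] = [set() for _ in epochs]
    for i in range(len(epochs) - 1, -1, -1):
        ep = epochs[i]
        starts: set[str] = set()
        for deps in ep.sinks.values():  # feeds a sink in this epoch
            starts |= {name for kind, name in deps if kind == "loc"}
        for loc in needed:  # feeds a location that a later epoch needs
            starts |= {name for kind, name in ep.live_out.get(loc, set()) if kind == "loc"}
        reaching[i] = starts
        needed = starts
    return reaching


# Hypothetical two-epoch run: epoch 0 reads "input.txt" into a and sets b to a
# constant; epoch 1 computes c = a + b and writes it out as sink "write#1".
epochs = [
    EpochResult(live_out={"a": {("src", "input.txt")}, "b": set()}),
    EpochResult(
        live_out={"a": {("loc", "a")}, "b": {("loc", "b")},
                  "c": {("loc", "a"), ("loc", "b")}},
        sinks={"write#1": {("loc", "a"), ("loc", "b")}},
    ),
]
print(forward_pass(epochs))   # {'write#1': {'input.txt'}}
print(backward_pass(epochs))  # epoch 0: set(); epoch 1: {'a', 'b'}
```

In this model, each epoch's result is expressed relative to its own start state, so every epoch can be analyzed without waiting for its predecessors; only the comparatively cheap composition step has to follow the chain, forward for sources and backward for sinks.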


