A Hypothesis Testing Approach to Sharing Logs with Confidence

Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy
Pub Date: 2020-03-16
DOI: 10.1145/3374664.3375743

Yunhui Long, Le Xu, Carl A. Gunter

Abstract

Logs generated by systems and applications contain a wide variety of heterogeneous information that is important for performance profiling, failure detection, and security analysis. There is a strong need to share logs among different parties, either to outsource the analysis or to improve system and security research. However, sharing logs may inadvertently leak confidential or proprietary information. Besides sensitive information that is saved directly in logs, such as user identifiers and software versions, indirect evidence like performance metrics can also leak sensitive information about the physical machines and the system. In this work, we introduce a game-based definition of the risk of exposing sensitive information through released logs. We propose log indistinguishability, a property that is met only when the logs leak little information about the protected sensitive attributes. We design an end-to-end framework that allows a user to identify the risk of information leakage in logs, to limit the exposure with log redaction and obfuscation, and to release the logs with a much lower risk of exposing the sensitive attribute. Our framework contains a set of statistical tests to identify violations of the log indistinguishability property and a variety of obfuscation methods to prevent the leakage of sensitive information. The framework treats the log-generating process as a black box and can therefore be applied to different systems and processes. We perform case studies on two different types of log datasets: Spark event logs and hardware counters. We show that our framework is effective in preventing the leakage of the sensitive attribute with a reasonable testing time and an acceptable utility loss in logs.
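
The abstract does not say which statistical tests the framework uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a single numeric log metric (a hypothetical task duration in milliseconds) collected under two values of a sensitive attribute (say, two hardware types), and uses a two-sample permutation test on the difference of means as a stand-in for an indistinguishability check. The names `permutation_test` and `simulate_durations`, the synthetic data, and the 0.01 threshold mentioned afterward are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    Returns an estimated p-value for the null hypothesis that both
    samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


def simulate_durations(mean_ms, n=50, std_ms=5.0, seed=0):
    """Hypothetical log metric: task durations (ms) drawn from a Gaussian."""
    rng = random.Random(seed)
    return [rng.gauss(mean_ms, std_ms) for _ in range(n)]


if __name__ == "__main__":
    # Logs produced under two values of the (assumed) sensitive attribute.
    logs_machine_a = simulate_durations(mean_ms=100.0, seed=1)
    logs_machine_b = simulate_durations(mean_ms=130.0, seed=2)
    p_value = permutation_test(logs_machine_a, logs_machine_b)
    print(f"p-value for raw durations: {p_value:.4f}")
```

In this sketch, a small p-value (for instance, below 0.01) would suggest that the metric distinguishes the two attribute values and that the log should be redacted or obfuscated before release; after obfuscation, the same test could be rerun to check whether the difference is still detectable.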
