Enforcing k-anonymity in Web Mail Auditing

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining | Pub Date: 2016-02-08 | DOI: 10.1145/2835776.2835803

Dotan Di Castro, L. Lewin-Eytan, Y. Maarek, R. Wolff, Eyal Zohar


Citations: 26

Abstract

We study the problem of k-anonymization of mail messages in the realistic scenario of auditing mail traffic in a major commercial Web mail service. Mail auditing is necessary in various Web mail debugging and quality assurance activities, such as anti-spam or the qualitative evaluation of novel mail features. It is conducted by trained professionals, often referred to as "auditors", who are shown messages that could expose personally identifiable information. We address here the challenge of k-anonymizing such messages, focusing on machine-generated mail messages, which represent more than 90% of today's mail traffic. We introduce a novel message signature, Mail-Hash, specifically tailored to identifying structurally similar messages, which allows us to put such messages in the same equivalence class. We then define a process that generates, for each class, masked mail samples that can be shown to auditors while guaranteeing the k-anonymity of users. The productivity of auditors is measured by the amount of non-hidden mail content they can see every day under normal working conditions, which limit the number of mail samples they can review. In addition, we consider k-anonymity over time since, by definition of k-anonymity, every new release places additional constraints on the assignment of samples. We describe in detail the results we obtained over actual Yahoo mail traffic, and thus demonstrate that our methods are feasible at Web mail scale. Given the constantly growing concern of users over their email being scanned by others, we argue that it is critical to devise algorithms that guarantee k-anonymity, and to implement associated processes, in order to restore the trust of mail users.
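
The pipeline outlined in the abstract (group structurally similar messages into equivalence classes, then show auditors only masked samples) can be illustrated with a minimal Python sketch. This is not the paper's method: `toy_signature` is a hypothetical stand-in for Mail-Hash (the abstract does not give its definition), and the masking rule, which keeps a token only if it appears at the same position in messages of at least k distinct users, is a simplifying assumption. The names `Mail`, `toy_signature`, and `masked_samples` are invented for illustration, and the sketch ignores k-anonymity over time.

```python
from collections import defaultdict
from typing import NamedTuple

MASK = "[MASKED]"


class Mail(NamedTuple):
    user: str    # recipient identifier
    sender: str  # sending address
    body: str    # plain-text body


def toy_signature(mail: Mail) -> tuple[str, int]:
    # Hypothetical stand-in for a structural signature (NOT Mail-Hash):
    # two messages match if they share a sender and a token count.
    return (mail.sender, len(mail.body.split()))


def masked_samples(mails: list[Mail], k: int) -> dict[tuple[str, int], str]:
    # Step 1: place messages with the same signature in one equivalence class.
    classes = defaultdict(list)
    for mail in mails:
        classes[toy_signature(mail)].append(mail)

    samples = {}
    for sig, members in classes.items():
        # Step 2: skip classes that cover fewer than k distinct users.
        if len({m.user for m in members}) < k:
            continue
        rows = [m.body.split() for m in members]
        masked = []
        # Step 3: keep a token only if at least k distinct users share it
        # at that position; otherwise hide it from the auditor.
        for pos in range(sig[1]):
            users_per_token = defaultdict(set)
            for m, row in zip(members, rows):
                users_per_token[row[pos]].add(m.user)
            token = rows[0][pos]
            masked.append(token if len(users_per_token[token]) >= k else MASK)
        samples[sig] = " ".join(masked)
    return samples


if __name__ == "__main__":
    demo = [
        Mail("u1", "shop@example.com", "Hello Alice your order 1001 shipped"),
        Mail("u2", "shop@example.com", "Hello Bob your order 1002 shipped"),
        Mail("u3", "shop@example.com", "Hello Carol your order 1003 shipped"),
        Mail("u4", "bank@example.com", "Balance for Dave is low"),
    ]
    for sample in masked_samples(demo, k=3).values():
        print(sample)
```

With k = 3, the three shop messages form one class whose sample keeps the shared template text and masks the name and order number, while the single bank message is dropped because its class covers only one user. The fraction of unmasked tokens in such samples is one rough way to think about the auditor productivity measure the abstract mentions.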


