Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity

2014 IEEE Symposium on Security and Privacy. Pub Date: 2014-04-30. DOI: 10.1109/SP.2014.38

Sai Teja Peddinti, A. Korolova, Elie Bursztein, Geetanjali Sampemane

Citations: 42

Abstract

Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity

Most of what we understand about data sensitivity comes from user self-report (e.g., surveys). This paper is the first to use behavioral data to determine content sensitivity, via the clues that users give, through their use of privacy-enhancing product features, as to what information they consider private or sensitive. We perform a large-scale analysis of user anonymity choices during their activity on Quora, a popular question-and-answer site. We identify categories of questions for which users are more likely to exercise anonymity and explore several machine learning approaches towards predicting whether a particular answer will be written anonymously. Our findings validate the viability of the proposed approach towards an automatic assessment of data sensitivity, show that data sensitivity is a nuanced measure that should be viewed on a continuum rather than as a binary concept, and advance the idea that machine learning over behavioral data can be effectively used to develop product features that help keep users safe.
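As a rough illustration of treating sensitivity as a continuum rather than a binary label, the sketch below computes the fraction of answers posted anonymously per topic and ranks topics by that fraction. It is only a minimal sketch under stated assumptions: the function name `anonymity_rate_by_topic`, the record layout, and the toy records are all hypothetical, and nothing here reproduces the paper's data, features, or models.

```python
from collections import defaultdict


def anonymity_rate_by_topic(answers):
    """Return {topic: fraction of answers posted anonymously}.

    `answers` is an iterable of (topic, is_anonymous) pairs. This layout
    is an assumption made for illustration, not the paper's data format.
    """
    totals = defaultdict(int)
    anonymous = defaultdict(int)
    for topic, is_anonymous in answers:
        totals[topic] += 1
        if is_anonymous:
            anonymous[topic] += 1
    # A rate in [0, 1] gives a graded score instead of a yes/no label.
    return {topic: anonymous[topic] / totals[topic] for topic in totals}


# Toy, made-up records: (topic, was the answer posted anonymously?)
sample_answers = [
    ("health", True), ("health", True), ("health", False),
    ("cooking", False), ("cooking", False),
    ("finance", True), ("finance", False),
]

rates = anonymity_rate_by_topic(sample_answers)

# Rank topics from most to least often answered anonymously.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for topic, rate in ranked:
    print(f"{topic}: {rate:.2f}")
```

A per-topic rate like this could serve as one simple input to a classifier that predicts whether an individual answer will be anonymous; which features and models the authors actually used is not stated in the abstract.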

Source publication

2014 IEEE Symposium on Security and Privacy

Self-citation rate

0.00%

Publication count