Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity

Sai Teja Peddinti, A. Korolova, Elie Bursztein, Geetanjali Sampemane
Published in: 2014 IEEE Symposium on Security and Privacy (2014-04-30)
DOI: 10.1109/SP.2014.38 (https://doi.org/10.1109/SP.2014.38)
Citations: 42

Abstract

Most of what we understand about data sensitivity comes from user self-report (e.g., surveys). This paper is the first to use behavioral data to determine content sensitivity, via the clues that users give, through their use of privacy-enhancing product features, as to what information they consider private or sensitive. We perform a large-scale analysis of user anonymity choices during their activity on Quora, a popular question-and-answer site. We identify categories of questions for which users are more likely to exercise anonymity and explore several machine learning approaches to predicting whether a particular answer will be written anonymously. Our findings validate the viability of the proposed approach to automatically assessing data sensitivity, show that data sensitivity is a nuanced measure that should be viewed on a continuum rather than as a binary concept, and advance the idea that machine learning over behavioral data can be used effectively to develop product features that help keep users safe.
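
As a rough illustration of the idea of treating sensitivity as a continuum, the sketch below computes the fraction of anonymous answers per topic. It is a minimal example under stated assumptions, not the authors' method: the records, the topic names, and the function `anonymity_rate_by_topic` are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical toy records of (topic, was_posted_anonymously).
# These values are invented for illustration and are NOT data from the paper.
answers = [
    ("health", True), ("health", True), ("health", False),
    ("programming", False), ("programming", False),
    ("programming", False), ("programming", True),
]

def anonymity_rate_by_topic(records):
    """Return {topic: fraction of answers in that topic posted anonymously}."""
    totals = defaultdict(int)
    anonymous = defaultdict(int)
    for topic, is_anon in records:
        totals[topic] += 1
        if is_anon:
            anonymous[topic] += 1
    return {topic: anonymous[topic] / totals[topic] for topic in totals}

rates = anonymity_rate_by_topic(answers)

# Rank topics from most to least often anonymous: a continuous score,
# rather than a binary sensitive / not-sensitive label.
for topic, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic}: {rate:.2f}")
```

A per-topic rate like this is only a descriptive signal; predicting whether an individual answer will be anonymous, as the abstract describes, would require a trained classifier over richer features.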