Fighting authorship linkability with crowdsourcing

M. A. Mishari, Ekin Oguz, G. Tsudik
{"title":"Fighting authorship linkability with crowdsourcing","authors":"M. A. Mishari, Ekin Oguz, G. Tsudik","doi":"10.1145/2660460.2660486","DOIUrl":null,"url":null,"abstract":"Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy.\n In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers into the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews that reflect sufficiently different stylometric characteristics such that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. 
Finally, we explore the combination of crowdsourcing and machine translation and report on results.","PeriodicalId":304931,"journal":{"name":"Conference on Online Social Networks","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Online Social Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2660460.2660486","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy. In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers into the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews that reflect sufficiently different stylometric characteristics such that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on results.
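The "simple stylometric features" referred to above can be illustrated with a minimal sketch. This is not the paper's actual feature set or linkability classifier (which the abstract does not specify); it is a hypothetical example using function-word frequencies, one classic stylometric signal, compared with cosine similarity:

```python
# Illustrative sketch (not the authors' code): linking texts by a simple
# stylometric signal -- relative frequencies of common English function
# words -- compared via cosine similarity. Two reviews sharing an author's
# habitual function-word profile score higher than an unrelated review.
import math
import re

FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "is", "it", "that", "was"]

def stylometric_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    return [words.count(w) / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical reviews: a and b mimic one author's style; c is unrelated.
a = "The food was great and the service, to be honest, was quick."
b = "The room was clean and the staff, to their credit, was friendly."
c = "Terrible! Never again. Worst place ever!!!"

print(cosine(stylometric_vector(a), stylometric_vector(b)))  # high: same profile
print(cosine(stylometric_vector(a), stylometric_vector(c)))  # low: no overlap
```

A real attack would use many more features (character n-grams, punctuation habits, sentence-length statistics) and a trained classifier, but the sketch shows why rewriting by a stranger or by machine translation helps: it perturbs exactly these habitual, content-independent frequencies.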