Who Knows I Like Jelly Beans? An Investigation Into Search Privacy

Daniel Kats, David Silva, Johann Roturier
Proceedings on Privacy Enhancing Technologies, 2022(1), pp. 426-446. Published 2022-03-03. DOI: 10.2478/popets-2022-0053
Citations: 1 (per Semantic Scholar)

Abstract

Internal site search is an integral part of how users navigate modern sites, from restaurant reservations to house hunting to searching for medical solutions. Search terms on these sites may contain sensitive information such as location, medical information, or sexual preferences; when further coupled with a user’s IP address or a browser’s user agent string, this information can become very specific, and in some cases possibly identifying. In this paper, we measure the various ways in which search terms are sent to third parties when a user submits a search query. We developed a methodology for identifying and interacting with search components, which we implemented on top of an instrumented headless browser. We used this crawler to visit the Tranco top 1 million websites and analyzed search term leakage across three vectors: URL query parameters, request payloads, and the Referer HTTP header. Our crawler found that 512,701 of the top 1 million sites had internal site search. We found that 81.3% of websites with internal site search sent (or, from a user’s perspective, leaked) our search terms to third parties in some form. We then compared these results against what would be expected from a natural language analysis of the privacy policies of those leaking websites (where available) and found that about 87% of those privacy policies do not explicitly mention search terms. However, about 75% of these privacy policies appear to mention sharing some information with third parties in a generic manner. We also present a few countermeasures, including a browser extension that warns users about imminent search term leakage to third parties. We conclude by making recommendations on clarifying the privacy implications of internal site search to end users.
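To make the three leakage vectors concrete, here is a minimal Python sketch that checks one captured outgoing request for a search term in its URL query parameters, its payload, and its Referer header. It is not the authors' crawler. The `Request` type, the `site_of` and `leak_vectors` helpers, and the example domains are all hypothetical. Treating the last two hostname labels as the "site" is a simplification; a real crawler would consult the Public Suffix List. The sketch only detects plain or URL-encoded copies of the term, so hashed or otherwise transformed terms would be missed.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qsl, unquote_plus, urlsplit


@dataclass
class Request:
    """A captured outgoing request (hypothetical shape, not the paper's schema)."""
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def site_of(url: str) -> str:
    """Naive 'site' key: the last two hostname labels.

    Simplification: a real crawler should use the Public Suffix List,
    otherwise e.g. a.co.uk and b.co.uk would collapse to the same site.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def leak_vectors(request: Request, page_url: str, term: str) -> set:
    """Return the vectors through which `term` reaches a third party.

    Only plain and URL-encoded copies of the term are detected; hashed,
    base64-encoded, or otherwise transformed terms are missed.
    """
    if site_of(request.url) == site_of(page_url):
        return set()  # first-party request: not counted as leakage
    needle = term.lower()
    vectors = set()

    # Vector 1: URL query parameters (parse_qsl decodes %XX and '+').
    params = parse_qsl(urlsplit(request.url).query)
    if any(needle in value.lower() for _, value in params):
        vectors.add("url")

    # Vector 2: request payload (form-encoded or JSON text).
    if needle in unquote_plus(request.body).lower():
        vectors.add("payload")

    # Vector 3: Referer header (header names are case-insensitive).
    referer = next(
        (v for k, v in request.headers.items() if k.lower() == "referer"), ""
    )
    if needle in unquote_plus(referer).lower():
        vectors.add("referer")
    return vectors


if __name__ == "__main__":
    page = "https://shop.example/search?q=jelly+beans"
    captured = [
        Request("https://shop.example/api/search?q=jelly+beans", {"Referer": page}),
        Request("https://tracker.test/collect?kw=jelly%20beans", {"Referer": page}),
        Request("https://cdn.test/pixel.gif", {"Referer": "https://shop.example/"}),
        Request("https://analytics.test/event", body='{"query": "jelly beans"}'),
    ]
    for req in captured:
        print(site_of(req.url), sorted(leak_vectors(req, page, "jelly beans")))
    # shop.example []
    # tracker.test ['referer', 'url']
    # cdn.test []
    # analytics.test ['payload']
```

In this toy run, the first-party API call is ignored, and the CDN request is not flagged because its Referer was trimmed to the origin. One third-party request can still leak the term through more than one vector at once.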