Who Knows I Like Jelly Beans? An Investigation Into Search Privacy

Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium. Published: 2022-03-03. DOI: 10.2478/popets-2022-0053

Daniel Kats, David Silva, Johann Roturier

Volume 2022, Issue 1, pages 426-446.

Citations: 1

Abstract

Internal site search is an integral part of how users navigate modern sites, from restaurant reservations to house hunting to searching for medical solutions. Search terms on these sites may contain sensitive information such as location, medical information, or sexual preferences; when further coupled with a user's IP address or a browser's user agent string, this information can become very specific and, in some cases, possibly identifying. In this paper, we measure the various ways in which search terms are sent to third parties when a user submits a search query. We developed a methodology for identifying and interacting with search components, which we implemented on top of an instrumented headless browser. We used this crawler to visit the Tranco top one million websites and analyzed search term leakage across three vectors: URL query parameters, payloads, and the Referer HTTP header. Our crawler found that 512,701 of the top one million sites had internal site search. We found that 81.3% of websites containing internal site search sent (or, from a user's perspective, leaked) our search terms to third parties in some form. We then compared our results to the expected results based on a natural language analysis of the privacy policies of those leaking websites (where available) and found that about 87% of those privacy policies do not mention search terms explicitly. However, about 75% of these privacy policies seem to mention the sharing of some information with third parties in a generic manner. We also present a few countermeasures, including a browser extension that warns users about imminent search term leakage to third parties. We conclude by making recommendations on clarifying the privacy implications of internal site search to end users.
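The abstract names the three leakage vectors (URL query parameters, payloads, and the Referer header) but not the matching logic behind them. The following is a minimal Python sketch, under our own assumptions, of how a single captured request could be checked for a search term on each vector. The `Request` type, the helper names, the hostname-based third-party test, and the choice of encodings checked are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, parse_qsl, quote, quote_plus


@dataclass
class Request:
    """A captured outgoing request (hypothetical shape, not the paper's data model)."""
    url: str
    body: str = ""
    headers: dict = field(default_factory=dict)


def _host(url: str) -> str:
    # urlsplit(...).hostname is already lowercased; "" if the URL has no host.
    return urlsplit(url).hostname or ""


def is_third_party(request_url: str, site_url: str) -> bool:
    # Simplifying assumption: any different hostname counts as a third party.
    # A real crawler would likely compare registrable domains (eTLD+1) instead.
    return _host(request_url) != _host(site_url)


def term_variants(term: str) -> set:
    # Raw term plus two common URL encodings (an assumption, not exhaustive).
    return {term, quote(term), quote_plus(term)}


def leakage_vectors(req: Request, site_url: str, term: str) -> set:
    """Return which of the three vectors carry the search term to a third party."""
    if not is_third_party(req.url, site_url):
        return set()
    variants = term_variants(term)
    found = set()

    # 1. URL query parameters: parse_qsl decodes percent-escapes and '+'.
    query_values = [value for _, value in parse_qsl(urlsplit(req.url).query)]
    if any(term in value for value in query_values):
        found.add("url")

    # 2. Request payload: look for the raw or URL-encoded term in the body text.
    if any(v in req.body for v in variants):
        found.add("payload")

    # 3. Referer header: the search results page URL may embed the term.
    #    Header names are case-insensitive, so normalize them first.
    headers = {k.lower(): v for k, v in req.headers.items()}
    referer = headers.get("referer", "")
    if any(v in referer for v in variants):
        found.add("referer")

    return found
```

For example, with `site = "https://shop.example/"` and the term `"jelly beans"`, a beacon to `https://tracker.example/collect?q=jelly+beans` is flagged on the `url` vector, while a pixel request whose `Referer` is `https://shop.example/search?q=jelly+beans` is flagged on the `referer` vector. Requests back to `shop.example` itself are ignored, because only third-party recipients count as leakage in this sketch.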
