Exploring User Privacy Awareness on GitHub: An Empirical Study

arXiv - CS - Software Engineering Pub Date : 2024-09-06 DOI:arxiv-2409.04048

Costanza Alfieri, Juri Di Rocco, Phuong T. Nguyen, Paola Inverardi



Abstract

GitHub provides developers with a practical way to distribute source code and collaboratively work on common projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. However, despite the endless effort, the platform still faces various issues related to the privacy of its users. This paper presents an empirical study delving into the GitHub ecosystem. Our focus is on investigating the utilization of privacy settings on the platform and identifying various types of sensitive information disclosed by users. Leveraging a dataset comprising 6,132 developers, we report and analyze their activities by means of comments on pull requests. Our findings indicate an active engagement by users with the available privacy settings on GitHub. Notably, we observe the disclosure of different forms of private information within pull request comments. This observation has prompted our exploration into sensitivity detection using a large language model and BERT, to pave the way for a personalized privacy assistant. Our work provides insights into the utilization of existing privacy protection tools, such as privacy settings, along with their inherent limitations. Essentially, we aim to advance research in this field by providing both the motivation for creating such privacy protection tools and a proposed methodology for personalizing them.


