Exploring User Privacy Awareness on GitHub: An Empirical Study
Costanza Alfieri, Juri Di Rocco, Phuong T. Nguyen, Paola Inverardi
arXiv - CS - Software Engineering, arXiv:2409.04048, 2024-09-06
Abstract
GitHub provides developers with a practical way to distribute source code and collaborate on shared projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. Despite these ongoing efforts, however, the platform still faces various issues related to the privacy of its users. This paper presents an empirical study of the GitHub ecosystem. Our focus is on investigating how privacy settings are used on the platform and identifying the types of sensitive information disclosed by users. Using a dataset of 6,132 developers, we report on and analyze their activities through their comments on pull requests. Our findings indicate that users actively engage with the privacy settings available on GitHub. Notably, we observe the disclosure of different forms of private information within pull request comments. This observation prompted us to explore sensitivity detection using a large language model and BERT, paving the way for a personalized privacy assistant. Our work provides insights into the use of existing privacy protection tools, such as privacy settings, along with their inherent limitations. Ultimately, we aim to advance research in this field by providing both the motivation for creating such privacy protection tools and a proposed methodology for personalizing them.
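
The abstract mentions sensitivity detection on pull request comments with BERT but does not describe the pipeline. The following is a minimal sketch of how such a classifier could be applied, using the Hugging Face transformers library; the base checkpoint, the binary label mapping, and the is_sensitive helper are illustrative assumptions for exposition, not the authors' actual implementation or fine-tuned model.

```python
# Sketch: binary sensitivity classification of a pull-request comment.
# Assumes a BERT checkpoint fine-tuned so that label 1 means "sensitive";
# "bert-base-uncased" is a hypothetical stand-in for such a checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder, not the paper's model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()  # inference only

def is_sensitive(comment: str) -> bool:
    """Return True if the classifier labels the comment as sensitive."""
    inputs = tokenizer(comment, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1  # label 1 = "sensitive" by assumption

# Example usage on a comment that leaks personal contact information:
print(is_sensitive("Reach me at my personal email about the release schedule."))
```

In a study setting, a loop over the collected pull request comments would feed each one through is_sensitive and aggregate the flagged disclosures per developer; with an un-fine-tuned base model, the predictions above are effectively random, so a labeled training step would be required first.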