Adversarial Authorship Attribution in Open-Source Projects

Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy. Pub Date: 2019-03-13. DOI: 10.1145/3292006.3300032

A. Matyukhina, Natalia Stakhanova, M. Preda, Celine Perley


Citations: 6

Abstract



Open-source software is open to anyone by design, whether a community of developers, hackers, or malicious users. Authors of open-source software typically hide their identity behind nicknames and avatars. However, they have no protection against authorship attribution techniques, which can build a software author profile simply by analyzing characteristics of the software. In this paper we present an author imitation attack that deceives current authorship attribution systems by mimicking the coding style of a target developer. Within this context we explore the potential of existing attribution techniques to be deceived. Our results show that we are able to imitate the coding style of developers based on data collected from the popular source code repository GitHub. To subvert the author imitation attack, we propose a novel author obfuscation approach that allows us to hide the coding style of the author. Unlike existing obfuscation tools, this new obfuscation technique uses transformations that preserve code readability. We assess the effectiveness of our attacks on several datasets produced by actual developers from GitHub and by participants of the Google Code Jam competition. Throughout our experiments we show that author hiding can be achieved by making sensible transformations that reduce the likelihood of current authorship attribution systems identifying the author's style to 0%.
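To make the idea of a readability-preserving style transformation concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not list the features or transformations used, so the three layout features, the function names `layout_features` and `normalize_layout`, and the C-like sample are all assumptions chosen for illustration. The sketch measures a few layout habits that a stylometric profile could pick up, then rewrites the layout to a neutral convention so those habits no longer show.

```python
def layout_features(source: str) -> dict:
    """Compute a few toy layout features (hypothetical, for illustration only)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    total = len(lines) or 1  # avoid division by zero on empty input
    return {
        # Share of non-blank lines indented with a tab.
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in lines) / total,
        # Share of non-blank lines that consist of a lone opening brace.
        "brace_own_line_ratio": sum(ln.strip() == "{" for ln in lines) / total,
        "avg_line_length": sum(len(ln) for ln in lines) / total,
    }


def normalize_layout(source: str, indent: str = "    ") -> str:
    """Rewrite leading tabs as spaces and attach a lone '{' to the previous line.

    Assumption: this works on raw text, so it is only safe when the line
    before a lone '{' is code rather than a comment. A real tool would
    operate on a parse tree instead.
    """
    out = []
    for ln in source.splitlines():
        stripped = ln.lstrip("\t")
        depth = len(ln) - len(stripped)  # number of leading tabs
        ln = indent * depth + stripped.rstrip()
        if ln.strip() == "{" and out:
            out[-1] = out[-1] + " {"
        else:
            out.append(ln)
    return "\n".join(out)


# Hypothetical C-like snippet with tab indentation and a brace on its own line.
sample = "int main()\n{\n\tint x = 0;\n\treturn x;\n}\n"

before = layout_features(sample)
after = layout_features(normalize_layout(sample))
print(before)
print(after)
```

After normalization the tab-indentation and brace-placement features both drop to zero for this sample, while the code stays as readable as before. Layout is only one of the style signals an attribution system might use, so a sketch like this shows the principle rather than a full defense.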

