Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, J. Krinke, Gazi Oznacar, Robert White
Python Coding Style Compliance on Stack Overflow
2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 210-214
Published: 2019-05-26
DOI: 10.1109/MSR.2019.00042 (https://doi.org/10.1109/MSR.2019.00042)
Citations: 15
Abstract
Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Researchers also use SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a large number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not be representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong negative correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.
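The quantities named in the abstract (a minimum of 6 statements per snippet, violations per statement, and a correlation coefficient r) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which style checker or statement counter was used, so the sketch assumes that statements are counted with the standard `ast` module and uses a toy two-rule checker (over-long lines and trailing whitespace) as a stand-in for a real one. All function names are hypothetical.

```python
import ast
import math


def count_statements(source):
    """Count statement nodes in a snippet; return None if it does not parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Snippets that are not valid Python 3 (e.g. Python 2 code) end up here.
        return None
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


def count_violations(source, max_len=79):
    """Toy stand-in for a style checker (assumption, not the paper's tool)."""
    violations = 0
    for line in source.splitlines():
        if len(line) > max_len:      # line longer than the assumed limit
            violations += 1
        if line != line.rstrip():    # trailing whitespace
            violations += 1
    return violations


def violations_per_statement(source, min_statements=6):
    """Violation ratio for snippets with enough statements, else None."""
    n = count_statements(source)
    if n is None or n < min_statements:
        return None
    return count_violations(source) / n


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Six statements, one of which has trailing whitespace.
snippet = "a = 1\nb = 2 \nc = 3\nd = 4\ne = 5\nprint(a + b + c + d + e)\n"
ratio = violations_per_statement(snippet)
```

Under these assumptions, a correlation like the one reported would be obtained by averaging `violations_per_statement` over all snippets in posts with the same vote score and passing the scores and the averages to `pearson_r`. A negative r means that higher-scored posts tend to have fewer violations per statement.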