{"title":"Is This Code the Best? Or Can It Be Further Improved? Developer Stats to the Rescue","authors":"Waseem Ahmed;Ahmed Harbaoui","doi":"10.1109/ACCESS.2024.3472481","DOIUrl":null,"url":null,"abstract":"Is the given code the best? Or can it be further improved? And if so, by how much? To answer these three questions, code cannot be seen in isolation from it’s developer as the developer factor plays a vital role in determining code quality. However, no universally accepted metric or developer stat currently exists that provides an objective indicator to a developer’s ability to produce code benchmarked against an expert developer. While traditional developer stats like rank, position, rating and experience published on Online Judges (OJs) provide various insights into a developer’s behavior and ability, they do not help us in answering these three questions. Moreover, unless code quality can be numerically quantified this may not be possible. Towards this end, we conducted an empirical study of over 72 million submissions made by 143,853 users in 1876 contests on Codeforces, a popular OJ, to analyze their code in terms of its correctness, completeness and performance efficiency (code quality characteristics listed in the ISO/IEC 25010 product quality model) measured against the given requirements regardless of the programming language used. First, we investigated ways to predict code quality given a developer’s traditional stats using various ML regression models. To quantify and compare code quality, new concepts like score and contest scorecard, had to be introduced. Second, we identified causes that led to poor predictability. Our analysis helped classify user’s performance in contests based on our discovery of erratic or temperamental behavior of users during contests. Third, we formulated a quality index or \n<inline-formula> <tex-math>$q\\text {-}index$ </tex-math></inline-formula>\n of a developer, a new and unique developer stat to indicate the ability of a developer in producing quality code, and to help increase the predictability of the ML models by mitigating the negative effect of temperamental behavior of users during contests. Among the ML models used, our results suggest that the GradientBoost regressor is the most suited ML model to predict code quality which gave us a high prediction accuracy of around 99.55%. We also demonstrated the uniqueness of \n<inline-formula> <tex-math>$q\\text {-}index$ </tex-math></inline-formula>\n over traditional stats and described how it can complement the usefulness of traditional developer stats in decision making.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"12 ","pages":"144395-144411"},"PeriodicalIF":3.4000,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10703058","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Access","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10703058/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citation count: 0
Abstract
Is the given code the best? Or can it be further improved? And if so, by how much? To answer these three questions, code cannot be seen in isolation from its developer, as the developer factor plays a vital role in determining code quality. However, no universally accepted metric or developer stat currently exists that provides an objective indicator of a developer's ability to produce code benchmarked against an expert developer. While traditional developer stats such as rank, position, rating, and experience published on Online Judges (OJs) provide various insights into a developer's behavior and ability, they do not help us answer these three questions. Moreover, unless code quality can be numerically quantified, this may not be possible. Towards this end, we conducted an empirical study of over 72 million submissions made by 143,853 users in 1,876 contests on Codeforces, a popular OJ, to analyze their code in terms of its correctness, completeness, and performance efficiency (code quality characteristics listed in the ISO/IEC 25010 product quality model), measured against the given requirements regardless of the programming language used. First, we investigated ways to predict code quality from a developer's traditional stats using various ML regression models. To quantify and compare code quality, new concepts such as the score and the contest scorecard had to be introduced. Second, we identified causes of poor predictability. Our analysis helped classify users' performance in contests based on our discovery of erratic or temperamental behavior of users during contests. Third, we formulated a quality index, or $q\text{-}index$, of a developer, a new and unique developer stat that indicates a developer's ability to produce quality code and helps increase the predictability of the ML models by mitigating the negative effect of temperamental user behavior during contests. Among the ML models used, our results suggest that the GradientBoost regressor is the most suitable for predicting code quality, giving a high prediction accuracy of around 99.55%. We also demonstrated the uniqueness of the $q\text{-}index$ over traditional stats and described how it can complement their usefulness in decision making.
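To make the modeling step concrete, the sketch below shows how one might regress a numeric code-quality score on traditional developer stats with scikit-learn's GradientBoostingRegressor, the model family the abstract reports as most accurate. This is a minimal illustration on synthetic data, not the authors' pipeline: the feature names (rating, rank, experience_years, contests_played), the target definition, and the hyperparameters are all assumptions made here for demonstration.

# Illustrative sketch only: the paper's actual features, target (its score /
# contest scorecard), and preprocessing are not given in the abstract; the
# feature names and synthetic data below are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 5000

# Hypothetical "traditional" developer stats from an Online Judge.
rating = rng.normal(1500, 350, n)          # contest rating
rank = rng.integers(1, 10000, n)           # final standing in a contest
experience_years = rng.uniform(0, 10, n)   # years active on the platform
contests_played = rng.integers(1, 300, n)  # number of rated contests

X = np.column_stack([rating, rank, experience_years, contests_played])

# Hypothetical code-quality score in [0, 100]; in the paper this would come
# from the proposed contest scorecard, not from a synthetic formula.
quality = (0.04 * rating - 0.002 * rank + 2.0 * experience_years
           + 0.05 * contests_played + rng.normal(0, 5, n))
y = np.clip(quality, 0, 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted regression trees, the model family the abstract reports as best.
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)

print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))

Running the sketch simply verifies that such stats can be fed to a boosted-tree regressor and scored on held-out data; the 99.55% accuracy reported in the abstract refers to the authors' own dataset and target, not to this toy setup.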
IEEE Access | COMPUTER SCIENCE, INFORMATION SYSTEMS | ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published per year: 6673
Review time: 6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, and interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.