Don’t settle for the first! How many GitHub Copilot solutions should you check?

Impact factor 4.3 | Zone 2, Computer Science | JCR Q2, Computer Science, Information Systems

Information and Software Technology Pub Date : 2025-04-08 DOI:10.1016/j.infsof.2025.107737

Julian Oertel, Jil Klünder, Regina Hebig

Information and Software Technology, Volume 183, Article 107737. https://www.sciencedirect.com/science/article/pii/S095058492500076X


Abstract

Context:

With the integration of generative artificial intelligence (GenAI) tools such as GitHub Copilot into development processes, developers can be supported when writing code.

Objectives:

GitHub Copilot can provide up to ten solutions at once. We explore how developers should approach those solutions, with the goal of providing recommendations that achieve a suitable trade-off between finding correct solutions and the effort of checking solutions.

Methods:

In this study, we analyze a total of 2025 coding problems provided by LeetCode and 17,048 Python solutions to these problems generated by GitHub Copilot. We focus on three key issues: firstly, whether it is beneficial to consider multiple solutions; secondly, the impact of the position of a solution; and thirdly, the number of solutions that should be checked by a developer.

Results:

Overall, our results point to the following observations: (1) solutions are not less likely to be correct if they appear at later positions; (2) when looking for a solution to a common problem, checking four to five solutions is generally enough; (3) novel or difficult problems are unlikely to be solved by GitHub Copilot; (4) skipping the first solution is advised when considering only one solution, as the first solution is less likely to be correct; and (5) checking all solutions is necessary to not miss correct solutions, but the effort is usually not justified.
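The kind of analysis behind observations (1), (2), and (4) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the data layout (one list of pass/fail flags per problem, ordered by the position at which the solution was offered), and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed layout: problem name -> correctness flags in the order the
# solutions were offered (index 0 is the first suggestion).
Results = Dict[str, List[bool]]


def solved_within_k(results: Results, k: int) -> float:
    """Fraction of problems with at least one correct solution among the first k."""
    if not results:
        return 0.0
    solved = sum(1 for flags in results.values() if any(flags[:k]))
    return solved / len(results)


def correctness_by_position(results: Results) -> List[float]:
    """Share of correct solutions at each position.

    Problems that have no solution at a given position are left out of
    that position's denominator.
    """
    max_len = max((len(flags) for flags in results.values()), default=0)
    rates = []
    for pos in range(max_len):
        flags_at_pos = [f[pos] for f in results.values() if len(f) > pos]
        rates.append(sum(flags_at_pos) / len(flags_at_pos))
    return rates


# Hypothetical toy data, invented for this sketch (not from the study).
toy: Results = {
    "two-sum": [False, True, True, False],
    "lru-cache": [False, False, False, True, True],
    "hard-novel": [False, False, False],
    "valid-parens": [True, True],
}

if __name__ == "__main__":
    for k in (1, 2, 4, 10):
        print(f"checked first {k:2d}: {solved_within_k(toy, k):.2f} of problems solved")
    print("correct share by position:", correctness_by_position(toy))
```

On real data, the point where `solved_within_k` stops growing as `k` increases marks where checking further solutions no longer pays off, and `correctness_by_position` shows whether early positions are in fact more reliable than later ones.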

Conclusion:

Based on our study, we conclude that there is potential for improvement in better supporting developers. For instance, there are few cases where ten generated solutions provide more value than fewer solutions. Depending on the use scenario, it could be more useful if GitHub Copilot allowed developers to request a single, comprehensive solution.


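Source journal

Information and Software Technology (Engineering and Technology: Computer Science, Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks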

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.