Evaluating Copilot on CS1 Code Writing Problems with Suppressed Specifications

Varshini Venkatesh, Vaishnavi Venkatesh, Viraj Kumar
{"title":"Evaluating Copilot on CS1 Code Writing Problems with Suppressed Specifications","authors":"Varshini Venkatesh, Vaishnavi Venkatesh, Viraj Kumar","doi":"10.1145/3627217.3627235","DOIUrl":null,"url":null,"abstract":"Code writing problems in introductory programming (CS1) courses typically ask students to write simple functions or programs based on detailed natural-language specifications. These details can be leveraged by large language models (LLMs), accessible to students via tools such as GitHub Copilot, to generate solutions that are often correct. CS1 instructors who are unwilling or unable to prohibit such usage must consider variants of traditional code writing problems that align with their learning objectives but are more difficult for LLMs to solve. Since LLMs are sensitive to the level of details in their prompts, it is natural to consider variants where details are progressively trimmed from the specifications of traditional code writing problems, and consequent ambiguities are clarified via examples. We consider an extreme variant, where all natural language is suppressed except for meaningful names of functions and their arguments. We evaluate the performance of Copilot on suppressed specification versions of 153 such problems drawn from the CodeCheck repository. If Copilot initially fails to generate a correct solution, we augment each suppressed specification with as few clarifying examples as possible to obtain a correct solution. Copilot solves 134 problems (87%) with just 0.7 examples on average, requiring no examples in 78 instances. Thus, modifying traditional code-writing problems by merely trimming specification details is unlikely to thwart sophisticated LLMs such as GitHub Copilot.","PeriodicalId":508655,"journal":{"name":"Proceedings of the 16th Annual ACM India Compute Conference","volume":"23 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th Annual ACM India Compute Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3627217.3627235","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Code writing problems in introductory programming (CS1) courses typically ask students to write simple functions or programs based on detailed natural-language specifications. These details can be leveraged by large language models (LLMs), accessible to students via tools such as GitHub Copilot, to generate solutions that are often correct. CS1 instructors who are unwilling or unable to prohibit such usage must consider variants of traditional code writing problems that align with their learning objectives but are more difficult for LLMs to solve. Since LLMs are sensitive to the level of details in their prompts, it is natural to consider variants where details are progressively trimmed from the specifications of traditional code writing problems, and consequent ambiguities are clarified via examples. We consider an extreme variant, where all natural language is suppressed except for meaningful names of functions and their arguments. We evaluate the performance of Copilot on suppressed specification versions of 153 such problems drawn from the CodeCheck repository. If Copilot initially fails to generate a correct solution, we augment each suppressed specification with as few clarifying examples as possible to obtain a correct solution. Copilot solves 134 problems (87%) with just 0.7 examples on average, requiring no examples in 78 instances. Thus, modifying traditional code-writing problems by merely trimming specification details is unlikely to thwart sophisticated LLMs such as GitHub Copilot.
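
To make the setup concrete, the sketch below illustrates what a suppressed specification might look like for a hypothetical CS1 problem. The function name, parameter name, and examples are illustrative and are not taken from the paper's CodeCheck problem set. In the study's setting, only the signature (and, if the first attempt fails, a small number of clarifying examples) would be shown to Copilot; the body here is just a reference solution so the snippet is self-checking.

# Suppressed specification: no natural-language problem statement, only a
# signature with meaningful names. Hypothetical problem, for illustration only.
def count_vowels(sentence):
    # Reference solution included so this snippet runs as written; in the
    # study, the LLM would be asked to produce this body from the signature
    # (and optional examples) alone.
    return sum(1 for ch in sentence.lower() if ch in "aeiou")

# Clarifying examples of the kind added one at a time when the first attempt
# is incorrect: they resolve ambiguity (e.g., case-insensitivity) without
# reintroducing natural language.
assert count_vowels("Hello World") == 3
assert count_vowels("xyz") == 0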