ChatGPT and Python programming homework

Impact factor 0.8, JCR Q3 (Education & Educational Research)

Decision Sciences-Journal of Innovative Education. Pub Date: 2024-01-19. DOI: 10.1111/dsji.12306

Michael E. Ellis, K. Mike Casey, Geoffrey Hill

Volume 22, Issue 2, pages 74-87. Journal article, not open access. https://onlinelibrary.wiley.com/doi/10.1111/dsji.12306

Citation count: 0

Abstract

Large Language Model (LLM) artificial intelligence tools present a unique challenge for educators who teach programming languages. While LLMs like ChatGPT have been well documented for their ability to complete exams and create prose, there is a noticeable lack of research into their ability to solve problems using high-level programming languages. Like many other university educators, those teaching programming courses would like to detect if students submit assignments generated by an LLM. To investigate grade performance and the likelihood of instructors identifying code generated by artificial intelligence (AI) tools, we compare code generated by students and ChatGPT for introductory Python homework assignments. Our research reveals mixed results on both counts, with ChatGPT performing like a mid-range student on assignments and seasoned instructors struggling to detect AI-generated code. This indicates that although AI-generated results may not always be identifiable, they do not currently yield results approaching those of diligent students. We describe our methodology for selecting and evaluating the code examples, the results of our comparison, and the implications for future classes. We conclude with recommendations for how instructors of programming courses can mitigate student use of LLM tools as well as articulate the inherent value of preserving students’ individual creativity in producing programming languages.




Source journal

Decision Sciences-Journal of Innovative Education (Education & Educational Research)

CiteScore

3.60

Self-citation rate

36.80%

Number of publications