{"title":"通过大学真题评估 Chat-GPT 的 Swift 编程能力","authors":"Zizhuo Zhang, Lian Wen, Yanfei Jiang, Yongli Liu","doi":"10.1002/spe.3330","DOIUrl":null,"url":null,"abstract":"In this study, we evaluate the programming capabilities of OpenAI's GPT‐3.5 and GPT‐4 models using Swift‐based exam questions from a third‐year university course. The results indicate that both GPT models generally outperform the average student score, yet they do not consistently exceed the performance of the top students. This comparison highlights areas where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances where GPT‐3.5 outperforms GPT‐4, suggesting complex variations in AI model capabilities. By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.","PeriodicalId":21899,"journal":{"name":"Software: Practice and Experience","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluate Chat‐GPT's programming capability in Swift through real university exam questions\",\"authors\":\"Zizhuo Zhang, Lian Wen, Yanfei Jiang, Yongli Liu\",\"doi\":\"10.1002/spe.3330\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this study, we evaluate the programming capabilities of OpenAI's GPT‐3.5 and GPT‐4 models using Swift‐based exam questions from a third‐year university course. The results indicate that both GPT models generally outperform the average student score, yet they do not consistently exceed the performance of the top students. This comparison highlights areas where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances where GPT‐3.5 outperforms GPT‐4, suggesting complex variations in AI model capabilities. 
By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.\",\"PeriodicalId\":21899,\"journal\":{\"name\":\"Software: Practice and Experience\",\"volume\":\"64 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Software: Practice and Experience\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/spe.3330\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Software: Practice and Experience","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/spe.3330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In this study, we evaluate the programming capabilities of OpenAI's GPT-3.5 and GPT-4 models using Swift-based exam questions from a third-year university course. The results indicate that both GPT models generally score above the student average, yet neither consistently exceeds the performance of the top students. This comparison highlights where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances in which GPT-3.5 outperforms GPT-4, suggesting complex variations in AI model capabilities. By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.
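For readers unfamiliar with the course context, the sketch below illustrates the kind of protocol-oriented task a third-year Swift exam might pose. It is a hypothetical example for illustration only; the actual exam questions are not reproduced in this abstract, and all names here (Shape, Circle, largestShape) are our own, not the paper's.

```swift
// Hypothetical exam-style task (illustrative only, not from the paper):
// define a Shape protocol, two conforming value types, and a function
// that returns the shape with the largest area.

protocol Shape {
    var area: Double { get }
}

struct Circle: Shape {
    let radius: Double
    var area: Double { Double.pi * radius * radius }
}

struct Rectangle: Shape {
    let width: Double
    let height: Double
    var area: Double { width * height }
}

// Returns nil for an empty input; otherwise the shape whose area is largest.
func largestShape(in shapes: [any Shape]) -> (any Shape)? {
    shapes.max { $0.area < $1.area }
}

let shapes: [any Shape] = [Circle(radius: 1.0), Rectangle(width: 2.0, height: 3.0)]
if let largest = largestShape(in: shapes) {
    print("Largest area:", largest.area)   // prints "Largest area: 6.0"
}
```

Tasks of this shape exercise both Swift-specific idioms (protocols, value types, optionals) and general algorithmic reasoning, which is plausibly the kind of axis along which the paper compares GPT-3.5, GPT-4, and student submissions.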