Comparison of generative AI performance on undergraduate and postgraduate written assessments in the biomedical sciences

IF 8.6 · CAS Tier 1 (Education) · Q1 EDUCATION & EDUCATIONAL RESEARCH
Andrew Williams
{"title":"比较生成式人工智能在生物医学本科生和研究生书面评估中的表现","authors":"Andrew Williams","doi":"10.1186/s41239-024-00485-y","DOIUrl":null,"url":null,"abstract":"<p>The value of generative AI tools in higher education has received considerable attention. Although there are many proponents of its value as a learning tool, many are concerned with the issues regarding academic integrity and its use by students to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools, ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI generated output was then evaluated by four independent markers, according to specified marking criteria and matched to the Frameworks for Higher Education Qualifications (FHEQ) of UK level descriptors. Percentage scores and ordinal grades were given for each marking criteria across AI generated papers, inter-rater reliability was calculated using Kendall’s coefficient of concordance and generative AI performance ranked. Across all UG and PG levels, ChatGPT performed better than Bing or Bard in areas of scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared to UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.</p>","PeriodicalId":13871,"journal":{"name":"International Journal of Educational Technology in Higher Education","volume":"28 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of generative AI performance on undergraduate and postgraduate written assessments in the biomedical sciences\",\"authors\":\"Andrew Williams\",\"doi\":\"10.1186/s41239-024-00485-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The value of generative AI tools in higher education has received considerable attention. Although there are many proponents of its value as a learning tool, many are concerned with the issues regarding academic integrity and its use by students to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools, ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI generated output was then evaluated by four independent markers, according to specified marking criteria and matched to the Frameworks for Higher Education Qualifications (FHEQ) of UK level descriptors. Percentage scores and ordinal grades were given for each marking criteria across AI generated papers, inter-rater reliability was calculated using Kendall’s coefficient of concordance and generative AI performance ranked. 
Across all UG and PG levels, ChatGPT performed better than Bing or Bard in areas of scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared to UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.</p>\",\"PeriodicalId\":13871,\"journal\":{\"name\":\"International Journal of Educational Technology in Higher Education\",\"volume\":\"28 1\",\"pages\":\"\"},\"PeriodicalIF\":8.6000,\"publicationDate\":\"2024-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Educational Technology in Higher Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1186/s41239-024-00485-y\",\"RegionNum\":1,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Educational Technology in Higher Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1186/s41239-024-00485-y","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

The value of generative AI tools in higher education has received considerable attention. Although many regard them as valuable learning tools, many others are concerned about academic integrity and students' use of these tools to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools: ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI-generated output was then evaluated by four independent markers against specified marking criteria and matched to the UK Frameworks for Higher Education Qualifications (FHEQ) level descriptors. Percentage scores and ordinal grades were given for each marking criterion across the AI-generated papers; inter-rater reliability was calculated using Kendall's coefficient of concordance, and generative AI performance was ranked. Across all UG and PG levels, ChatGPT performed better than Bing or Bard in scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared with UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.
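
The inter-rater reliability statistic named in the abstract, Kendall's coefficient of concordance (W), measures how closely several markers' rankings of the same set of papers agree: W = 1 means perfect agreement, W = 0 means none. The paper does not publish its analysis code, so the following is only a minimal Python sketch of the uncorrected statistic; the four markers' percentage scores are invented for illustration, and the adjustment for tied ranks is omitted.

import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's W for an (m markers x n papers) score matrix, without tie correction."""
    m, n = ratings.shape
    # Convert each marker's scores to within-marker ranks (ties get average ranks).
    ranks = np.apply_along_axis(rankdata, 1, ratings)
    # S: squared deviations of each paper's rank sum from the mean rank sum.
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Invented example: 4 markers each scoring 6 anonymised AI-generated papers (%).
scores = np.array([
    [72, 65, 58, 80, 61, 55],
    [70, 62, 60, 78, 64, 50],
    [75, 60, 55, 82, 63, 52],
    [68, 66, 57, 79, 60, 54],
])
print(f"Kendall's W = {kendalls_w(scores):.2f}")  # ~0.97: markers rank papers in nearly the same order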

Source journal: International Journal of Educational Technology in Higher Education
CiteScore: 19.30
Self-citation rate: 4.70%
Articles per year: 59
Review time: 76.7 days
Journal description: This journal seeks to foster the sharing of critical scholarly works and information exchange across diverse cultural perspectives in the fields of technology-enhanced and digital learning in higher education. It aims to advance scientific knowledge on the human and personal aspects of technology use in higher education, while keeping readers informed about the latest developments in applying digital technologies to learning, training, research, and management.