Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.

IF 3.3 | CAS Tier 2 (Education) | JCR Q1: EDUCATION, SCIENTIFIC DISCIPLINES
Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal
{"title":"用GPT-3.5和GPT-4评估非创伤性脊髓损伤教育中解释的可读性和答案的可靠性。","authors":"Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal","doi":"10.1080/0142159X.2024.2430365","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.</p><p><strong>Material and methods: </strong>We utilized a textbook designed for ABPMR certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).</p><p><strong>Results: </strong>Our analysis revealed statistically significant differences in readability scores, with the textbook achieving 14.5 (SD = 2.5) compared to GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (<i>p</i> < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly higher than the textbook's 58% (<i>p</i> = 0.006). GPT-4 successfully demonstrated adaptability by reducing the mean readability score of the top-nine most complex explanations, maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (<i>p</i> = 0.046).</p><p><strong>Conclusions: </strong>Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.</p>","PeriodicalId":18643,"journal":{"name":"Medical Teacher","volume":" ","pages":"1-8"},"PeriodicalIF":3.3000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.\",\"authors\":\"Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal\",\"doi\":\"10.1080/0142159X.2024.2430365\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.</p><p><strong>Material and methods: </strong>We utilized a textbook designed for ABPMR certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).</p><p><strong>Results: </strong>Our analysis revealed statistically significant differences in readability scores, with the textbook achieving 14.5 (SD = 2.5) compared to GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (<i>p</i> < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly higher than the textbook's 58% (<i>p</i> = 0.006). 
GPT-4 successfully demonstrated adaptability by reducing the mean readability score of the top-nine most complex explanations, maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (<i>p</i> = 0.046).</p><p><strong>Conclusions: </strong>Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.</p>\",\"PeriodicalId\":18643,\"journal\":{\"name\":\"Medical Teacher\",\"volume\":\" \",\"pages\":\"1-8\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical Teacher\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1080/0142159X.2024.2430365\",\"RegionNum\":2,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Teacher","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/0142159X.2024.2430365","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract


Purpose: Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.

Material and methods: We utilized a textbook designed for American Board of Physical Medicine and Rehabilitation (ABPMR) certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).

Results: Our analysis revealed a statistically significant difference in readability scores: the textbook scored 14.5 (SD = 2.5) versus GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (p < 0.001). On the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly more than the textbook's 58% (p = 0.006). GPT-4 demonstrated adaptability by reducing the mean readability score of the nine most complex explanations while maintaining their word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (p = 0.046).
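
The abstract reports grade-level readability scores (14.5 vs. 17.3) without naming the index; the Flesch-Kincaid Grade Level is one established index on that scale. Below is a minimal sketch of this kind of scoring using the Python `textstat` package. The choice of textstat, the Flesch-Kincaid index, and the conventional Flesch Reading Ease cutoff of below 30 for 'Very difficult' are all assumptions, not the authors' stated pipeline.

```python
# A minimal sketch of the readability scoring described above, using the
# `textstat` package (an assumption; the paper does not name its tooling).
# The grade-level index is assumed to be Flesch-Kincaid, and 'Very
# difficult' is taken as the conventional Flesch Reading Ease band < 30.
import textstat


def readability_profile(explanation: str) -> dict:
    """Score one explanation on a grade-level index and Flesch Reading Ease."""
    grade = textstat.flesch_kincaid_grade(explanation)  # ~14.5 = college level
    fre = textstat.flesch_reading_ease(explanation)     # 0-100, lower = harder
    return {
        "grade_level": grade,
        "flesch_reading_ease": fre,
        "very_difficult": fre < 30,  # conventional 'Very difficult' cutoff
    }


if __name__ == "__main__":
    sample = (
        "Autonomic dysreflexia is a potentially life-threatening syndrome "
        "of acute, uncontrolled hypertension in patients with spinal cord "
        "injury at or above the sixth thoracic level."
    )
    print(readability_profile(sample))
```

Averaging such scores over the 50 textbook explanations and the 50 GPT-4 explanations, then comparing the two groups, would yield the kind of mean (SD) contrast reported above.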

Conclusions: Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.
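
The Results do not state which test produced p = 0.046 for the reliability comparison. Assuming the percentages correspond to 42/50 correct answers for GPT-3.5 and 48/50 for GPT-4, a one-sided Fisher's exact test is one plausible reconstruction that lands near the reported value; the sketch below uses these hypothetical counts with SciPy.

```python
# Hypothetical reconstruction of the reliability comparison: 84% and 96%
# of 50 questions imply 42/50 (GPT-3.5) and 48/50 (GPT-4) correct answers.
# The paper does not state its test; a one-sided Fisher's exact test on
# these counts happens to land near the reported p = 0.046.
from scipy.stats import fisher_exact

table = [
    [42, 8],  # GPT-3.5: correct, incorrect
    [48, 2],  # GPT-4:   correct, incorrect
]
# alternative="less" tests whether GPT-3.5's odds of a correct answer are
# lower than GPT-4's (odds ratio of the table < 1).
odds_ratio, p_value = fisher_exact(table, alternative="less")
print(f"odds ratio = {odds_ratio:.3f}, one-sided p = {p_value:.3f}")
# -> one-sided p of about 0.046 with these counts; a reconstruction,
#    not the authors' confirmed method.
```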

Source journal
Medical Teacher (Medicine: Health Care)
CiteScore: 7.80
Self-citation rate: 8.50%
Annual articles: 396
Review turnaround: 3-6 weeks
Journal description: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum—from medical procedures to policy changes in health care provision—is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.