Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.

IF 3.3 | CAS Tier 2 (Education) | JCR Q1: EDUCATION, SCIENTIFIC DISCIPLINES
Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal
{"title":"用GPT-3.5和GPT-4评估非创伤性脊髓损伤教育中解释的可读性和答案的可靠性。","authors":"Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal","doi":"10.1080/0142159X.2024.2430365","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.</p><p><strong>Material and methods: </strong>We utilized a textbook designed for ABPMR certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).</p><p><strong>Results: </strong>Our analysis revealed statistically significant differences in readability scores, with the textbook achieving 14.5 (SD = 2.5) compared to GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (<i>p</i> < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly higher than the textbook's 58% (<i>p</i> = 0.006). GPT-4 successfully demonstrated adaptability by reducing the mean readability score of the top-nine most complex explanations, maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (<i>p</i> = 0.046).</p><p><strong>Conclusions: </strong>Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.</p>","PeriodicalId":18643,"journal":{"name":"Medical Teacher","volume":" ","pages":"1-8"},"PeriodicalIF":3.3000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.\",\"authors\":\"Alejandro García-Rudolph, David Sanchez-Pinsach, Mark Andrew Wright, Eloy Opisso, Joan Vidal\",\"doi\":\"10.1080/0142159X.2024.2430365\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.</p><p><strong>Material and methods: </strong>We utilized a textbook designed for ABPMR certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).</p><p><strong>Results: </strong>Our analysis revealed statistically significant differences in readability scores, with the textbook achieving 14.5 (SD = 2.5) compared to GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (<i>p</i> < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly higher than the textbook's 58% (<i>p</i> = 0.006). 
GPT-4 successfully demonstrated adaptability by reducing the mean readability score of the top-nine most complex explanations, maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (<i>p</i> = 0.046).</p><p><strong>Conclusions: </strong>Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.</p>\",\"PeriodicalId\":18643,\"journal\":{\"name\":\"Medical Teacher\",\"volume\":\" \",\"pages\":\"1-8\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical Teacher\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1080/0142159X.2024.2430365\",\"RegionNum\":2,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Teacher","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/0142159X.2024.2430365","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract


Purpose: Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.

Material and methods: We utilized a textbook designed for American Board of Physical Medicine and Rehabilitation (ABPMR) certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).

Results: Our analysis revealed a statistically significant difference in readability scores: the textbook scored 14.5 (SD = 2.5) versus GPT-4's 17.3 (SD = 1.9), indicating that GPT-4's explanations are generally more complex (p < 0.001). On the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly more than the textbook's 58% (p = 0.006). GPT-4 demonstrated adaptability by reducing the mean readability score of the nine most complex explanations while maintaining their word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (p = 0.046).
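
The abstract reports grade-level readability scores (14.5 vs. 17.3) without naming the index; the Flesch-Kincaid Grade Level is one established index on that scale. Below is a minimal sketch of this kind of scoring using the Python `textstat` package. The choice of textstat, the Flesch-Kincaid index, and the conventional Flesch Reading Ease cutoff of below 30 for 'Very difficult' are all assumptions, not the authors' stated pipeline.

```python
# A minimal sketch of the readability scoring described above, using the
# `textstat` package (an assumption; the paper does not name its tooling).
# The grade-level index is assumed to be Flesch-Kincaid, and 'Very
# difficult' is taken as the conventional Flesch Reading Ease band < 30.
import textstat


def readability_profile(explanation: str) -> dict:
    """Score one explanation on a grade-level index and Flesch Reading Ease."""
    grade = textstat.flesch_kincaid_grade(explanation)  # ~14.5 = college level
    fre = textstat.flesch_reading_ease(explanation)     # 0-100, lower = harder
    return {
        "grade_level": grade,
        "flesch_reading_ease": fre,
        "very_difficult": fre < 30,  # conventional 'Very difficult' cutoff
    }


if __name__ == "__main__":
    sample = (
        "Autonomic dysreflexia is a potentially life-threatening syndrome "
        "of acute, uncontrolled hypertension in patients with spinal cord "
        "injury at or above the sixth thoracic level."
    )
    print(readability_profile(sample))
```

Averaging such scores over the 50 textbook explanations and the 50 GPT-4 explanations, then comparing the two groups, would yield the kind of mean (SD) contrast reported above.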

Conclusions: Our results confirmed GPT-4's potential in medical education by providing highly accurate yet often complex explanations for NTSCI, which were successfully simplified without losing accuracy.
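
The Results do not state which test produced p = 0.046 for the reliability comparison. Assuming the percentages correspond to 42/50 correct answers for GPT-3.5 and 48/50 for GPT-4, a one-sided Fisher's exact test is one plausible reconstruction that lands near the reported value; the sketch below uses these hypothetical counts with SciPy.

```python
# Hypothetical reconstruction of the reliability comparison: 84% and 96%
# of 50 questions imply 42/50 (GPT-3.5) and 48/50 (GPT-4) correct answers.
# The paper does not state its test; a one-sided Fisher's exact test on
# these counts happens to land near the reported p = 0.046.
from scipy.stats import fisher_exact

table = [
    [42, 8],  # GPT-3.5: correct, incorrect
    [48, 2],  # GPT-4:   correct, incorrect
]
# alternative="less" tests whether GPT-3.5's odds of a correct answer are
# lower than GPT-4's (odds ratio of the table < 1).
odds_ratio, p_value = fisher_exact(table, alternative="less")
print(f"odds ratio = {odds_ratio:.3f}, one-sided p = {p_value:.3f}")
# -> one-sided p of about 0.046 with these counts; a reconstruction,
#    not the authors' confirmed method.
```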

Source journal
Medical Teacher (Medicine: Health Care)
CiteScore: 7.80
Self-citation rate: 8.50%
Annual articles: 396
Review turnaround: 3-6 weeks
Journal description: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum—from medical procedures to policy changes in health care provision—is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.