Real-world implementation of an AI learning tool-MetaGP-Edu in medical education: A multi-center cohort study
Authors: Yili Sun, Fei Liu
Journal: Computers & Education, volume 237, Article 105388 (Impact Factor 10.5; JCR Q1, Computer Science, Interdisciplinary Applications)
Published: 2025-06-24
DOI: 10.1016/j.compedu.2025.105388
URL: https://www.sciencedirect.com/science/article/pii/S0360131525001563
Abstract
This study aimed to evaluate the real-world educational impact associated with the implementation of MetaGP-Edu, a bespoke generative artificial intelligence tool fine-tuned for medical learning, within the undergraduate Internal Medicine curriculum. We conducted a large-scale, multi-center retrospective cohort study utilizing historical academic records from six major medical schools in China (N = 1632). We evaluated student performance across multiple dimensions, including final scores that assessed both foundational knowledge recall and clinical reasoning—defined as the cognitive process of analyzing patient data to formulate a diagnosis and management plan. Formative in-tool skill metrics were also included. These outcomes were then compared between pre- and post-implementation cohorts (Pre-MetaGP-Edu vs. Post-MetaGP-Edu) using adjusted multivariable regression models. Analysis also included usage patterns and embedded competency test scores for the post-implementation cohort. Results indicated that students with access to MetaGP-Edu achieved significantly higher overall Internal Medicine scores (Adjusted Mean Difference: +8.2 points, P < 0.001). This improvement was primarily associated with significantly higher scores in clinical reasoning assessments (P < 0.001), with no significant difference observed in knowledge recall scores (P > 0.05). The positive association also varied across clinical topics, being more pronounced in complex system modules. Furthermore, within the post-implementation cohort, significant skill development was observed over time, and higher total usage time significantly predicted greater skill gains (Adjusted OR = 2.42, P < 0.001). In conclusion, supplementary integration of a domain-specific AI educational tool like MetaGP-Edu shows a positive association with enhanced medical student performance, particularly for higher-order reasoning skills, although student engagement appears critical to realizing these benefits.
About the journal:
Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.