Evaluating a Custom Chatbot in Undergraduate Medical Education: Randomised Crossover Mixed-Methods Evaluation of Performance, Utility, and Perceptions

Isaac Sung Him Ng, Anthony Siu, Claire Soo Jeong Han, Oscar Sing Him Ho, Johnathan Sun, Anatoliy Markiv, Stuart Knight, Mandeep Gill Sagoo

Behavioral Sciences, 15(9), published 2025-09-19. DOI: 10.3390/bs15091284. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467370/pdf/
Background: While large language model (LLM) chatbots are gaining popularity in medical education, their pedagogical impact remains under-evaluated. This study examined the effects of a domain-specific chatbot on performance, perception, and cognitive engagement among medical students.
Methods: Twenty first-year medical students completed two academic tasks using either a custom-built educational chatbot (Lenny AI by qVault) or conventional study methods in a randomised crossover design. Performance was assessed through Single Best Answer (SBA) questions, while post-task surveys (Likert scales) and focus groups were employed to explore user perceptions. Statistical tests compared performance and perception metrics; qualitative data underwent thematic analysis with independent coding (κ = 0.403–0.633).
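The abstract reports inter-rater reliability for the independent thematic coding as κ values between 0.403 and 0.633 but does not publish the analysis code. As a minimal sketch of how such a figure is computed, assuming Cohen's kappa over two coders' category labels (the coder data and category names below are invented for illustration, not taken from the study):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed proportion of items on which the coders agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance if each coder labelled independently
    # at their own marginal category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical thematic codes assigned independently by two coders to
# the same focus-group excerpts (categories invented for illustration).
coder_1 = ["retrieval", "trust", "reasoning", "trust", "retrieval", "usability"]
coder_2 = ["retrieval", "trust", "trust", "trust", "retrieval", "usability"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.3f}")
```

Values in the reported 0.403–0.633 range indicate moderate-to-substantial agreement under the common Landis and Koch benchmarks.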
Results: Participants rated the chatbot significantly higher than conventional resources for ease of use, satisfaction, engagement, perceived quality, and clarity (p < 0.05). Lenny AI use was positively correlated with perceived efficiency and confidence but was not associated with significant performance gains. Thematic analysis revealed accelerated factual retrieval but limited support for higher-level cognitive reasoning. Students expressed high functional trust but raised concerns about transparency.
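The abstract does not name the specific statistical tests used. For paired ordinal (Likert) ratings from a crossover design, a non-parametric paired comparison such as the Wilcoxon signed-rank test, plus a rank correlation for the usage–efficiency association, would be conventional choices. The sketch below uses invented ratings and is not the authors' analysis:

```python
import numpy as np
from scipy.stats import wilcoxon, spearmanr

# Hypothetical 5-point Likert ease-of-use ratings from the same 20
# students under both conditions (values invented for illustration).
rng = np.random.default_rng(0)
chatbot = rng.integers(3, 6, size=20)       # ratings 3-5
conventional = rng.integers(1, 5, size=20)  # ratings 1-4

# Paired non-parametric test, appropriate for ordinal crossover data.
stat, p = wilcoxon(chatbot, conventional)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")

# Rank correlation between chatbot ratings and perceived efficiency,
# analogous to the reported positive correlation (data hypothetical).
efficiency = rng.integers(2, 6, size=20)
rho, p_rho = spearmanr(chatbot, efficiency)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.4f}")
```

With only 20 participants, a null performance result like the one reported is also consistent with limited statistical power, which is one reason the perception and performance findings can diverge.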
Conclusions: The custom chatbot improved usability; effects on deeper learning were not detected within the tasks studied. Future designs should support adaptive scaffolding, transparent sourcing, and critical engagement to improve educational value.