Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.

IF 3.3 | Zone 2, Education | JCR Q1: EDUCATION, SCIENTIFIC DISCIPLINES

Medical Teacher | Pub Date: 2025-09-07 | DOI: 10.1080/0142159X.2025.2555353

Iben Bang Andersen, Morten Bo Søndergaard Svendsen, Anne Line Risgaard, Christian Sander Danstrup, Tobias Todsen, Martin G Tolsgaard, Mikkel Lønborg Friis


Citations: 0

Abstract


Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.

Background: Assessing skills in simulated settings is resource-intensive and lacks validated metrics. Advances in AI offer the potential for automated competence assessment, addressing these limitations. This study aimed to develop and validate a machine learning AI model for automated evaluation during simulation-based thyroid ultrasound (US) training.

Methods: Videos from eight experts and 21 novices performing thyroid US on a simulator were analyzed. Frames were processed into sequences of 1, 10, and 50 seconds. A convolutional neural network with a pre-trained ResNet-50 base and a long short-term memory layer analyzed these sequences. The model was trained to distinguish competence levels (competent=1, not competent=0) using fourfold cross-validation, with performance metrics including precision, recall, F1 score, and accuracy. Bayesian updating and adaptive thresholding assessed performance over time.
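The abstract does not say how the Bayesian updating or the thresholding were implemented. As a minimal sketch of the general idea, the Python below treats each sequence-level prediction (competent=1, not competent=0) as evidence that updates a running probability of competence, then counts how long that probability stays above a cut-off. The function names, the prior of 0.5, the sensitivity and specificity values, and the fixed (non-adaptive) threshold are all illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def bayes_update(prior: float, predicted_competent: bool,
                 sensitivity: float, specificity: float) -> float:
    """Return P(competent) after one new sequence-level prediction.

    sensitivity = P(pred=1 | competent), specificity = P(pred=0 | not competent).
    Both are assumed values in (0, 1), not figures from the study.
    """
    if predicted_competent:
        like_competent = sensitivity
        like_not_competent = 1.0 - specificity
    else:
        like_competent = 1.0 - sensitivity
        like_not_competent = specificity
    numerator = like_competent * prior
    return numerator / (numerator + like_not_competent * (1.0 - prior))


def posterior_trace(predictions: Sequence[int], prior: float = 0.5,
                    sensitivity: float = 0.8,
                    specificity: float = 0.7) -> List[float]:
    """Apply bayes_update once per prediction and record the running belief."""
    trace: List[float] = []
    belief = prior
    for pred in predictions:
        belief = bayes_update(belief, bool(pred), sensitivity, specificity)
        trace.append(belief)
    return trace


def seconds_above(trace: Sequence[float], threshold: float,
                  step_seconds: float) -> float:
    """Total time the belief exceeds a fixed threshold (hypothetical rule)."""
    return step_seconds * sum(1 for p in trace if p > threshold)


if __name__ == "__main__":
    # Made-up predictions for five consecutive sequences, one per second.
    window_predictions = [1, 1, 0, 1, 1]
    trace = posterior_trace(window_predictions)
    print([round(p, 3) for p in trace])
    print(seconds_above(trace, threshold=0.8, step_seconds=1.0))
```

A single contrary prediction lowers the running belief without resetting it, which is one reason a sequential update can be preferable to scoring each sequence in isolation.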

Results: The AI model effectively differentiated expert and novice US performance. The 50-second sequences achieved the highest accuracy (70%) and F1 score (0.76). Experts showed significantly longer durations above the threshold (15.71 s) compared to novices (9.31 s, p = .030).
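For readers less familiar with the reported metrics, the following sketch shows how precision, recall, F1 score, and accuracy are derived from binary labels and predictions. The function name and the example labels are hypothetical and purely illustrative; they are not the study's data and do not reproduce its figures.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Illustrative labels only: 1 = competent, 0 = not competent.
    truth = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(classification_metrics(truth, predicted))
```

Because F1 ignores true negatives while accuracy counts them, the two can diverge when the classes are imbalanced, so an F1 score above the accuracy is not contradictory.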

Conclusions: A long short-term memory-based AI model provides near real-time, automated assessments of competence in US training. Utilizing temporal video data enables detailed micro-assessments of complex procedures, which may enhance interpretability and be applied across various procedural domains.


Source journal: Medical Teacher (Medicine: Health Care)

CiteScore: 7.80

Self-citation rate: 8.50%

Articles published: 396

Review time: 3-6 weeks

About the journal: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up to date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum (from medical procedures to policy changes in health care provision) is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.