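Deep learning in the marking of medical student short answer question examinations: Student perceptions and pilot accuracy assessment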

Focus on Health Professional Education: A Multi-Professional Journal. Pub Date: 2023-03-31. DOI: 10.11157/fohpe.v24i1.531

Lily Hollis-Sando, Charlotte Pugh, Kyle B. Franke, Toby Zerner, Yiran Tan, G. Carneiro, A. van den Hengel, Ian Symonds, P. Duggan, S. Bacchi



Abstract



Introduction: Machine learning has previously been applied to text analysis. There are limited data regarding the acceptability or accuracy of such applications in medical education. This project examined medical student opinion regarding computer-based marking and evaluated the accuracy of deep learning (DL), a subtype of machine learning, in the scoring of medical short answer questions (SAQs).

Methods: Fourth- and fifth-year medical students undertook an anonymised online examination. Prior to the examination, students completed a survey gauging their opinion on computer-based marking. Questions were marked by humans, and then a DL analysis was conducted using convolutional neural networks. In the DL analysis, following preprocessing, data were split into a training dataset (on which models were developed using 10-fold cross-validation) and a test dataset (on which performance analysis was conducted).

Results: One hundred and eighty-one students completed the examination (participation rate 59.0%). While students expressed concern regarding the accuracy of computer-based marking, the majority agreed that computer marking would be more objective than human marking (67.0%) and reported that they would not object to computer-based marking (55.5%). Regarding automated marking of SAQs, classification accuracy for 1-mark questions was consistently high (mean accuracy 0.98). For the more complex 2-mark and 3-mark SAQs, which required multiclass classification, accuracy was lower (mean 0.65 and 0.59, respectively).

Conclusions: Medical students may be supportive of computer-based marking due to its objectivity. DL has the potential to provide accurate marking of written questions; however, further research into DL marking of medical examinations is required.
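The evaluation protocol described in the Methods (a held-out test set, with models developed on the training set using 10-fold cross-validation, and accuracy as the metric) can be illustrated with a minimal sketch. This is not the authors' code. The 20% test fraction, the function names, and the `MajorityClassifier` stand-in are all assumptions made for illustration; the study used convolutional neural networks, which are replaced here by a trivial placeholder so that the example stays self-contained.

```python
import random
from collections import Counter


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle and hold out a test set. The 20% fraction is an assumption."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) for k contiguous folds; assumes n >= k."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predicted marks that equal the human-assigned marks."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MajorityClassifier:
    """Hypothetical placeholder (not a CNN): predicts the most common mark."""

    def fit(self, texts, marks):
        self.mark = Counter(marks).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.mark] * len(texts)


def cross_validate(texts, marks, k=10):
    """Mean validation accuracy over k folds of the training data."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(texts), k):
        model = MajorityClassifier().fit(
            [texts[i] for i in train_idx], [marks[i] for i in train_idx]
        )
        preds = model.predict([texts[i] for i in val_idx])
        scores.append(accuracy([marks[i] for i in val_idx], preds))
    return sum(scores) / len(scores)
```

In this sketch a 1-mark question corresponds to a binary label (0 or 1), whereas 2-mark and 3-mark questions have three and four possible labels. This is one plausible reading of why the abstract describes the latter as requiring multiclass classification.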

