ChatGPT Generated Otorhinolaryngology Multiple-Choice Questions: Quality, Psychometric Properties, and Suitability for Assessments.

Impact Factor: 1.8 | JCR Quartile: Q2 (Otorhinolaryngology)
OTO Open | Published: 2024-09-26 | eCollection: 2024-07-01 | DOI: 10.1002/oto2.70018
Cecilia Lotto, Sean C Sheppard, Wilma Anschuetz, Daniel Stricker, Giulia Molinari, Sören Huwendiek, Lukas Anschuetz
{"title":"ChatGPT Generated Otorhinolaryngology Multiple-Choice Questions: Quality, Psychometric Properties, and Suitability for Assessments.","authors":"Cecilia Lotto, Sean C Sheppard, Wilma Anschuetz, Daniel Stricker, Giulia Molinari, Sören Huwendiek, Lukas Anschuetz","doi":"10.1002/oto2.70018","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To explore Chat Generative Pretrained Transformer's (ChatGPT's) capability to create multiple-choice questions about otorhinolaryngology (ORL).</p><p><strong>Study design: </strong>Experimental question generation and exam simulation.</p><p><strong>Setting: </strong>Tertiary academic center.</p><p><strong>Methods: </strong>ChatGPT 3.5 was prompted: \"Can you please create a challenging 20-question multiple-choice questionnaire about clinical cases in otolaryngology, offering five answer options?.\" The generated questionnaire was sent to medical students, residents, and consultants. Questions were investigated regarding quality criteria. Answers were anonymized and the resulting data was analyzed in terms of difficulty and internal consistency.</p><p><strong>Results: </strong>ChatGPT 3.5 generated 20 exam questions of which 1 question was considered off-topic, 3 questions had a false answer, and 3 questions had multiple correct answers. Subspecialty theme repartition was as follows: 5 questions were on otology, 5 about rhinology, and 10 questions addressed head and neck. The qualities of focus and relevance were good while the vignette and distractor qualities were low. The level of difficulty was suitable for undergraduate medical students (n = 24), but too easy for residents (n = 30) or consultants (n = 10) in ORL. Cronbach's <i>α</i> was highest (.69) with 15 selected questions using students' results.</p><p><strong>Conclusion: </strong>ChatGPT 3.5 is able to generate grammatically correct simple ORL multiple choice questions for a medical student level. However, the overall quality of the questions was average, needing thorough review and revision by a medical expert to ensure suitability in future exams.</p>","PeriodicalId":19697,"journal":{"name":"OTO Open","volume":"8 3","pages":"e70018"},"PeriodicalIF":1.8000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11424880/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"OTO Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/oto2.70018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/7/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective: To explore Chat Generative Pretrained Transformer's (ChatGPT's) capability to create multiple-choice questions about otorhinolaryngology (ORL).

Study design: Experimental question generation and exam simulation.

Setting: Tertiary academic center.

Methods: ChatGPT 3.5 was prompted: "Can you please create a challenging 20-question multiple-choice questionnaire about clinical cases in otolaryngology, offering five answer options?" The generated questionnaire was sent to medical students, residents, and consultants. The questions were assessed against quality criteria. Answers were anonymized, and the resulting data were analyzed in terms of difficulty and internal consistency.
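
The abstract does not state which interface the authors used to prompt ChatGPT 3.5. As a hedged illustration only, the same prompt could be issued programmatically; the sketch below assumes the OpenAI Python SDK (openai>=1.0) and uses the "gpt-3.5-turbo" API model as a stand-in for "ChatGPT 3.5":

```python
# A minimal sketch, assuming the OpenAI Python SDK; "gpt-3.5-turbo" is an
# assumption standing in for "ChatGPT 3.5" (the study's exact interface is
# not specified in the abstract).
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

PROMPT = (
    "Can you please create a challenging 20-question multiple-choice "
    "questionnaire about clinical cases in otolaryngology, offering "
    "five answer options?"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)
```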

Results: ChatGPT 3.5 generated 20 exam questions, of which 1 was considered off-topic, 3 had an incorrect answer, and 3 had multiple correct answers. The subspecialty theme distribution was as follows: 5 questions on otology, 5 on rhinology, and 10 on head and neck. Focus and relevance were good, whereas vignette and distractor quality were low. The level of difficulty was suitable for undergraduate medical students (n = 24), but too easy for residents (n = 30) or consultants (n = 10) in ORL. Cronbach's α was highest (.69) for a subset of 15 selected questions, computed from the students' results.
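
For the internal-consistency figure, Cronbach's α for k items is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch of this computation on a hypothetical 0/1 correctness matrix (random placeholder data, not the study's responses):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / var(total scores))."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance per item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Placeholder matrix: 24 students x 15 items, 1 = correct, 0 = incorrect
# (random data for illustration only).
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(24, 15))
print(f"alpha = {cronbach_alpha(responses):.2f}")
```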

Conclusion: ChatGPT 3.5 can generate grammatically correct, simple ORL multiple-choice questions at the medical-student level. However, the overall quality of the questions was average, and thorough review and revision by a medical expert is needed to ensure their suitability for future exams.

Source journal
OTO Open (Medicine-Surgery)
CiteScore: 2.70
Self-citation rate: 0.00%
Articles per year: 115
Review time: 15 weeks