Keeping humans in the loop efficiently by generating question templates instead of questions using AI: Validity evidence on Hybrid AIG.

IF 3.3 | Zone 2, Education | Q1, EDUCATION, SCIENTIFIC DISCIPLINES

Medical Teacher Pub Date : 2025-04-01 Epub Date: 2024-11-27 DOI:10.1080/0142159X.2024.2430360

Yavuz Selim Kıyak, Emre Emekli, Özlem Coşkun, Işıl İrem Budakoğlu

Medical Teacher, pages 744-747. Publication type: Journal Article.

Citations: 0

Abstract

Background: Manually creating multiple-choice questions (MCQs) is inefficient. Automatic item generation (AIG) offers a scalable solution, with two main approaches: template-based and non-template-based (AI-driven). Template-based AIG ensures accuracy but requires significant expert input to develop templates. In contrast, AI-driven AIG can generate questions quickly but may introduce inaccuracies. Hybrid AIG combines the strengths of both methods. However, no MCQs have yet been generated using the Hybrid AIG approach, and no validity evidence has been provided for it.

Methods: We generated MCQs using the Hybrid AIG approach and investigated the validity evidence of these questions by determining whether experts could identify the correct answers. We used a custom ChatGPT to develop an item template, which was then fed into Gazitor, a template-based (non-AI) AIG software. A panel of medical doctors identified the answers.
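
As an illustration of the template-based step described above, the following is a minimal sketch of how a single item template can be expanded into many MCQs. It is not Gazitor's implementation or the authors' template: the template structure, the `generate_items` function, and all placeholder values ("finding A", "diagnosis A", and so on) are hypothetical and chosen only to show the mechanism.

```python
from itertools import product

# Hypothetical item template; placeholders and values are illustrative only
# and do not come from the study.
TEMPLATE = {
    "stem": "A {age}-year-old patient presents with {finding}. "
            "What is the most likely diagnosis?",
    "variables": {
        "age": [25, 40, 60],
        "finding": ["finding A", "finding B"],
    },
    # Expert-verified key: the correct option for each finding.
    "answer_key": {"finding A": "diagnosis A", "finding B": "diagnosis B"},
    "distractors": ["diagnosis C", "diagnosis D"],
}


def generate_items(template):
    """Expand one template into every combination of its variable values."""
    names = list(template["variables"])
    items = []
    for values in product(*(template["variables"][n] for n in names)):
        slots = dict(zip(names, values))
        correct = template["answer_key"][slots["finding"]]
        # Options: the keyed answer, the other keyed answers, fixed distractors.
        others = [d for d in template["answer_key"].values() if d != correct]
        options = sorted([correct, *others, *template["distractors"]])
        items.append({
            "stem": template["stem"].format(**slots),
            "options": options,
            "answer": correct,
        })
    return items


items = generate_items(TEMPLATE)
print(len(items))        # number of generated questions
print(items[0]["stem"])  # first generated stem
```

Because the answer key is fixed in the template rather than produced per question, an expert only needs to review the template once, which is the efficiency argument the abstract makes for the hybrid approach.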

Results: Of 105 decisions, 101 (96.2%) matched the software's correct answer. In all MCQs (100%), the experts reached a consensus on the correct answer. The evidence corresponds to the 'Relations to Other Variables' in Messick's validity framework.
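
The reported agreement rate can be checked with a short calculation. The sketch below uses only the two counts stated in the Results; the function name `agreement_rate` is an illustrative choice.

```python
def agreement_rate(matched: int, total: int) -> float:
    """Share of expert decisions that matched the software's keyed answer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matched / total


rate = agreement_rate(101, 105)
print(f"{rate:.1%}")  # formatted as a percentage with one decimal place
```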

Conclusions: The Hybrid AIG approach can enhance the efficiency of MCQ generation while maintaining accuracy. It mitigates concerns about hallucinations while benefiting from AI.


Source journal: Medical Teacher (Medicine, Health Care)

CiteScore: 7.80
Self-citation rate: 8.50%
Articles published: 396
Review time: 3-6 weeks

About the journal: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum, from medical procedures to policy changes in health care provision, is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.