Artificial Intelligence Meets Item Analysis (AI meets IA): A Study of Chatbot Training and Performance in detecting and correcting MCQ Flaws.

IF 1.2 | Zone 4 (Medicine) | Q2 Medicine, General & Internal

Pakistan Journal of Medical Sciences Pub Date : 2025-03-01 DOI:10.12669/pjms.41.3.11224

Mashaal Sabqat, Rehan Ahmed Khan, Masood Jawaid, Madiha Sajjad

Volume 41, Issue 3, pp. 652-656. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911725/pdf/

Citations: 0

Abstract

Objective: To explore the potential of AI-powered chatbots, specifically ChatGPT, to identify and correct flaws in multiple-choice questions (MCQs).

Methods: A three-phase interventional study was conducted from February to August 2023 at Riphah International University, Islamabad. In Phase 1, flawed MCQs were selected from the NBME item-writing guide and submitted to ChatGPT, which identified item flaws and suggested corrections. In Phase 2, ChatGPT was trained to detect MCQ flaws using text from the NBME item-writing guide. In Phase 3, ChatGPT was tested again on detecting flaws and correcting MCQs. Data were analyzed using SPSS version 26, reported as percentages, and compared using McNemar's test with the exact conditional method.
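The exact conditional form of McNemar's test named in the Methods can be illustrated with a minimal sketch. It is not the authors' SPSS procedure; the function name `mcnemar_exact` and the example counts are assumptions chosen for illustration, and the study's actual discordant-pair counts are not given in this abstract. The test looks only at the discordant pairs (items whose outcome changed between the two phases) and asks how likely such a split is if a change in either direction is equally probable.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs that were correct before training but not after (hypothetical name)
    c: pairs that were correct after training but not before (hypothetical name)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # Under the null hypothesis, the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2 * tail)


# Illustrative counts only, not taken from the study.
print(mcnemar_exact(0, 4))  # 0.125
print(mcnemar_exact(2, 2))  # 1.0
```

With small numbers of discordant pairs the exact test has little power, so large p-values of this kind are unsurprising in a small item set.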

Results: ChatGPT could identify and correct flaws such as the use of "None of the above," "grammatical cues," "absolute terms," and "inconsistently presented numerical data." However, it struggled with flaws related to "complicated stems," "long or complex options," and "vague frequency terms." After training, ChatGPT became better at identifying and correcting flaws related to complicated stems and absolute terms. It had difficulty recognizing "nonparallel options," "convergence," and "word repetition" both before and after training. ChatGPT's performance deteriorated during peak hours. The test of significance showed no measurable improvement in ChatGPT's ability to detect item flaws (p = 1.00) or to correct them (p = 0.125).

Conclusion: AI is revolutionizing industries and improving efficiency, but limitations exist in complex conversations, analysis, accuracy, and error prevention. Ongoing research is vital to unlocking AI's potential, especially in education.

Source journal

Pakistan Journal of Medical Sciences (Medicine: General & Internal)

CiteScore: 4.10
Self-citation rate: 9.10%
Articles published: 363
Review time: 3-6 weeks

About the journal: It is a peer-reviewed medical journal published regularly since 1984. Until December 31, 1999, it was known as the quarterly "SPECIALIST". It publishes original research articles, review articles, current practices, short communications, and case reports. It attracts manuscripts not only from within Pakistan but also from more than fifty other countries. Copies of PJMS are sent to all the important medical libraries across Pakistan and overseas, particularly in South East Asia and the Asia Pacific, as well as in WHO EMRO Region countries. Eminent members of the medical profession at home and abroad regularly contribute their write-ups and manuscripts to our publications. We pursue an independent editorial policy, which gives healthcare professionals an opportunity to express their views without fear or favour. That is why many opinion makers in the medical and pharmaceutical professions use this publication to communicate their viewpoints.