Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot.

IF 1.2 | Medicine, Zone 4 | Q2 MEDICINE, GENERAL & INTERNAL
Iskender Aksoy, Merve Kara Arslan
{"title":"人工智能工具在急诊医学答题库中的性能比较:ChatGPT 4.0、谷歌Gemini和Microsoft Copilot。","authors":"Iskender Aksoy, Merve Kara Arslan","doi":"10.12669/pjms.41.4.11178","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Using artificial intelligence tools that work with different software architectures for both clinical and educational purposes in the medical field has been a subject of considerable interest recently. In this study, we compared the answers given by three different artificial intelligence chatbots to the Emergency Medicine question pool obtained from the questions asked in the Turkish National Medical Specialization Exam. We tried to investigate the effects on the answers given by classifying the questions in terms of content and form and examining the question sentences.</p><p><strong>Methods: </strong>The questions related to emergency medicine of the Medical Specialization Exam questions between 2015-2020 were recorded. The questions were asked to artificial intelligence models, including ChatGPT-4, Gemini, and Copilot. The length of the questions, the question type and the topics of the wrong answers were recorded.</p><p><strong>Results: </strong>The most successful chatbot in terms of total score was Microsoft Copilot (7.8% error margin), while the least successful was Google Gemini (22.9% error margin) (p<0.001). It was important that all chatbots had the highest error margins in questions about trauma and surgical approaches and made mistakes in burns and pediatrics. The increase in the error rates in questions containing the root \"probability\" also showed that the question style affected the answers given.</p><p><strong>Conclusions: </strong>Although chatbots show promising success in determining the correct answer, we think that they should not see chatbots as a primary source for the exam, but rather as a good auxiliary tool to support their learning processes.</p>","PeriodicalId":19958,"journal":{"name":"Pakistan Journal of Medical Sciences","volume":"41 4","pages":"968-972"},"PeriodicalIF":1.2000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12022595/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot.\",\"authors\":\"Iskender Aksoy, Merve Kara Arslan\",\"doi\":\"10.12669/pjms.41.4.11178\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Using artificial intelligence tools that work with different software architectures for both clinical and educational purposes in the medical field has been a subject of considerable interest recently. In this study, we compared the answers given by three different artificial intelligence chatbots to the Emergency Medicine question pool obtained from the questions asked in the Turkish National Medical Specialization Exam. We tried to investigate the effects on the answers given by classifying the questions in terms of content and form and examining the question sentences.</p><p><strong>Methods: </strong>The questions related to emergency medicine of the Medical Specialization Exam questions between 2015-2020 were recorded. The questions were asked to artificial intelligence models, including ChatGPT-4, Gemini, and Copilot. 
The length of the questions, the question type and the topics of the wrong answers were recorded.</p><p><strong>Results: </strong>The most successful chatbot in terms of total score was Microsoft Copilot (7.8% error margin), while the least successful was Google Gemini (22.9% error margin) (p<0.001). It was important that all chatbots had the highest error margins in questions about trauma and surgical approaches and made mistakes in burns and pediatrics. The increase in the error rates in questions containing the root \\\"probability\\\" also showed that the question style affected the answers given.</p><p><strong>Conclusions: </strong>Although chatbots show promising success in determining the correct answer, we think that they should not see chatbots as a primary source for the exam, but rather as a good auxiliary tool to support their learning processes.</p>\",\"PeriodicalId\":19958,\"journal\":{\"name\":\"Pakistan Journal of Medical Sciences\",\"volume\":\"41 4\",\"pages\":\"968-972\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12022595/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pakistan Journal of Medical Sciences\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.12669/pjms.41.4.11178\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pakistan Journal of Medical Sciences","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.12669/pjms.41.4.11178","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract


Objective: The use of artificial intelligence tools built on different software architectures for clinical and educational purposes in medicine has attracted considerable interest recently. In this study, we compared the answers given by three different artificial intelligence chatbots to an emergency medicine question pool drawn from the Turkish National Medical Specialization Exam. We investigated how the answers were affected by question content, form, and phrasing by classifying the questions and examining the question sentences.

Methods: Emergency medicine questions from the Medical Specialization Exams administered between 2015 and 2020 were collected. The questions were posed to three artificial intelligence models: ChatGPT-4, Gemini, and Copilot. Question length, question type, and the topics of incorrectly answered questions were recorded.
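
The abstract does not describe how the questions were actually submitted to the chatbots; the authors may simply have used the public chat interfaces. Purely as a hedged sketch of how such a pipeline could be automated for one model, the Python snippet below poses multiple-choice questions and records the fields the study tracked. The questions.csv file, its column names ("question", "options", "answer_key", "topic"), and the grading logic are all invented for illustration.

# Hypothetical sketch only: one way to pose a multiple-choice question pool
# to a single chat model and grade the replies. The CSV layout is invented.
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

results = []
with open("questions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        prompt = (
            f"{row['question']}\n{row['options']}\n"
            "Answer with the letter of the single best option."
        )
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        # Take the first character of the reply as the chosen option letter.
        answer = reply.choices[0].message.content.strip()[:1].upper()
        results.append(
            {
                "topic": row["topic"],
                "length": len(row["question"]),  # question length, as the study records
                "correct": answer == row["answer_key"].upper(),
            }
        )

error_rate = 1 - sum(r["correct"] for r in results) / len(results)
print(f"Error rate: {error_rate:.1%}")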

Results: In terms of total score, the most successful chatbot was Microsoft Copilot (7.8% error rate), while the least successful was Google Gemini (22.9% error rate) (p<0.001). Notably, all chatbots had their highest error rates on questions about trauma and surgical approaches, and all made mistakes on burns and pediatrics questions. The higher error rates on questions containing the root "probability" also showed that question style affected the answers given.
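
The abstract reports the two error rates and p<0.001 but does not state which statistical test was used or how many questions were in the pool. Purely as an illustration of how such a three-way comparison could be checked, the sketch below runs a chi-square test of independence on a correct/incorrect contingency table; the counts are invented placeholders chosen only to approximate the reported 7.8% and 22.9% error rates, and are not the study's data.

# Illustrative only: chi-square test comparing error counts across three
# chatbots. The counts are invented placeholders (N=179 per model) chosen to
# approximate the reported error rates; they are NOT the study's actual data.
from scipy.stats import chi2_contingency

# rows: one chatbot each; columns: [correct, incorrect]
table = [
    [165, 14],  # "Copilot": 14/179 ~ 7.8% errors
    [152, 27],  # "ChatGPT-4": placeholder middle value
    [138, 41],  # "Gemini": 41/179 ~ 22.9% errors
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")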

Conclusions: Although the chatbots showed promising success in identifying correct answers, we think that examinees should not treat chatbots as a primary source for exam preparation, but rather as a useful auxiliary tool to support their learning.

Source journal: Pakistan Journal of Medical Sciences (Medicine: General & Internal)
CiteScore: 4.10
Self-citation rate: 9.10%
Articles published per year: 363
Review time: 3-6 weeks
Journal description: Pakistan Journal of Medical Sciences is a peer-reviewed medical journal published regularly since 1984; it was previously known as the quarterly "SPECIALIST" until December 31, 1999. It publishes original research articles, review articles, current practices, short communications, and case reports, attracting manuscripts not only from within Pakistan but also from over fifty countries abroad. Copies of PJMS are sent to all the important medical libraries across Pakistan and overseas, particularly in South East Asia, the Asia Pacific, and the WHO EMRO region countries. Eminent members of the medical profession at home and abroad regularly contribute manuscripts. The journal pursues an independent editorial policy, which gives healthcare professionals an opportunity to express their views without fear or favour; this is why many opinion makers in the medical and pharmaceutical professions use this publication to communicate their viewpoints.