{"title":"人工智能工具在急诊医学答题库中的性能比较:ChatGPT 4.0、谷歌Gemini和Microsoft Copilot。","authors":"Iskender Aksoy, Merve Kara Arslan","doi":"10.12669/pjms.41.4.11178","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Using artificial intelligence tools that work with different software architectures for both clinical and educational purposes in the medical field has been a subject of considerable interest recently. In this study, we compared the answers given by three different artificial intelligence chatbots to the Emergency Medicine question pool obtained from the questions asked in the Turkish National Medical Specialization Exam. We tried to investigate the effects on the answers given by classifying the questions in terms of content and form and examining the question sentences.</p><p><strong>Methods: </strong>The questions related to emergency medicine of the Medical Specialization Exam questions between 2015-2020 were recorded. The questions were asked to artificial intelligence models, including ChatGPT-4, Gemini, and Copilot. The length of the questions, the question type and the topics of the wrong answers were recorded.</p><p><strong>Results: </strong>The most successful chatbot in terms of total score was Microsoft Copilot (7.8% error margin), while the least successful was Google Gemini (22.9% error margin) (p<0.001). It was important that all chatbots had the highest error margins in questions about trauma and surgical approaches and made mistakes in burns and pediatrics. The increase in the error rates in questions containing the root \"probability\" also showed that the question style affected the answers given.</p><p><strong>Conclusions: </strong>Although chatbots show promising success in determining the correct answer, we think that they should not see chatbots as a primary source for the exam, but rather as a good auxiliary tool to support their learning processes.</p>","PeriodicalId":19958,"journal":{"name":"Pakistan Journal of Medical Sciences","volume":"41 4","pages":"968-972"},"PeriodicalIF":1.2000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12022595/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot.\",\"authors\":\"Iskender Aksoy, Merve Kara Arslan\",\"doi\":\"10.12669/pjms.41.4.11178\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Using artificial intelligence tools that work with different software architectures for both clinical and educational purposes in the medical field has been a subject of considerable interest recently. In this study, we compared the answers given by three different artificial intelligence chatbots to the Emergency Medicine question pool obtained from the questions asked in the Turkish National Medical Specialization Exam. We tried to investigate the effects on the answers given by classifying the questions in terms of content and form and examining the question sentences.</p><p><strong>Methods: </strong>The questions related to emergency medicine of the Medical Specialization Exam questions between 2015-2020 were recorded. The questions were asked to artificial intelligence models, including ChatGPT-4, Gemini, and Copilot. 
The length of the questions, the question type and the topics of the wrong answers were recorded.</p><p><strong>Results: </strong>The most successful chatbot in terms of total score was Microsoft Copilot (7.8% error margin), while the least successful was Google Gemini (22.9% error margin) (p<0.001). It was important that all chatbots had the highest error margins in questions about trauma and surgical approaches and made mistakes in burns and pediatrics. The increase in the error rates in questions containing the root \\\"probability\\\" also showed that the question style affected the answers given.</p><p><strong>Conclusions: </strong>Although chatbots show promising success in determining the correct answer, we think that they should not see chatbots as a primary source for the exam, but rather as a good auxiliary tool to support their learning processes.</p>\",\"PeriodicalId\":19958,\"journal\":{\"name\":\"Pakistan Journal of Medical Sciences\",\"volume\":\"41 4\",\"pages\":\"968-972\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12022595/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pakistan Journal of Medical Sciences\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.12669/pjms.41.4.11178\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pakistan Journal of Medical Sciences","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.12669/pjms.41.4.11178","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot.
Objective: The use of artificial intelligence tools built on different software architectures for both clinical and educational purposes in medicine has recently attracted considerable interest. In this study, we compared the answers given by three different artificial intelligence chatbots to an emergency medicine question pool compiled from questions asked in the Turkish National Medical Specialization Exam. We also investigated how the content, form, and wording of the question sentences affected the answers given, by classifying the questions accordingly.
Methods: Emergency medicine questions from the Medical Specialization Exams administered between 2015 and 2020 were recorded. These questions were posed to three artificial intelligence models: ChatGPT-4, Google Gemini, and Microsoft Copilot. The length of each question, the question type, and the topics of the incorrectly answered questions were recorded.
Results: In terms of total score, the most successful chatbot was Microsoft Copilot (7.8% error rate), while the least successful was Google Gemini (22.9% error rate) (p<0.001). Notably, all chatbots had their highest error rates on questions about trauma and surgical approaches, and all made mistakes on burns and pediatrics questions. The higher error rates on questions containing the root word "probability" also showed that question style affected the answers given.
Conclusions: Although chatbots show promising success in determining the correct answer, we think that examinees should not treat chatbots as a primary source for exam preparation, but rather as a useful auxiliary tool to support their learning.
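As an illustration only, not the authors' reported analysis: the error-rate comparison in the Results could be tested with a chi-square test of independence on a chatbot-by-correctness contingency table. The minimal Python sketch below uses hypothetical question counts, since the abstract reports only percentage error rates and the p-value, not the raw counts or the exact statistical test used.

# Hypothetical sketch: comparing chatbot error rates with a chi-square test.
# The counts below are illustrative placeholders, NOT the study's actual data.
from scipy.stats import chi2_contingency

# Rows: ChatGPT-4, Google Gemini, Microsoft Copilot
# Columns: [correct answers, wrong answers] out of an assumed 96-question pool
observed = [
    [82, 14],   # ChatGPT-4 (placeholder counts)
    [74, 22],   # Google Gemini (roughly the 22.9% error rate in the abstract)
    [88, 8],    # Microsoft Copilot (roughly the 7.8% error rate in the abstract)
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")

With such a table, a p-value below 0.05 would indicate that the proportion of wrong answers differs significantly across the three chatbots; the actual counts and test choice would have to come from the full paper.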
Journal introduction:
It is a peer-reviewed medical journal published regularly since 1984. It was previously known as the quarterly "SPECIALIST" until December 31, 1999. It publishes original research articles, review articles, current practices, short communications, and case reports. It attracts manuscripts not only from within Pakistan but also from over fifty countries abroad.
Copies of PJMS are sent to all the important medical libraries across Pakistan and overseas, particularly in South East Asia and the Asia Pacific, as well as the WHO EMRO Region countries. Eminent members of the medical profession at home and abroad regularly contribute their write-ups and manuscripts to our publications. We pursue an independent editorial policy, which gives healthcare professionals an opportunity to express their views without fear or favour. That is why many opinion makers in the medical and pharmaceutical professions use this publication to communicate their viewpoints.