Will ChatGPT pass the Polish specialty exam in radiology and diagnostic imaging? Insights into strengths and limitations.

Polish Journal of Radiology. Published: 2023-09-18. eCollection date: 2023-01-01. DOI: 10.5114/pjr.2023.131215

Jakub Kufel, Iga Paszkiewicz, Michał Bielówka, Wiktoria Bartnikowska, Michał Janik, Magdalena Stencel, Łukasz Czogalik, Katarzyna Gruszczyńska, Sylwia Mielcarska

Volume 88, pages e430-e434. Publication type: Journal Article.


Abstract

Purpose: The rapid development of artificial intelligence has aroused curiosity about its potential applications in the medical field. The purpose of this article was to present the performance of ChatGPT, a state-of-the-art language model, in relation to the pass rate of the national specialty examination (PES) in radiology and imaging diagnostics within the Polish education system. Additionally, the study aimed to identify the strengths and limitations of the model through a detailed analysis of the issues raised by the exam questions.

Material and methods: The study used a PES exam consisting of 120 questions, provided by the Medical Examinations Center in Lodz. Questions were administered through the openai.com platform, which grants free access to the GPT-3.5 model. All questions were categorized according to Bloom's taxonomy to assess their complexity and difficulty. After answering each exam question, ChatGPT was asked to rate its confidence on a scale of 1 to 5, as a self-assessment of the accuracy of its response.

Results: ChatGPT did not reach the pass rate threshold of the PES exam (52%); however, it came close in certain question categories. No significant differences were observed in the percentage of correct answers across question types and subtypes.

Conclusions: Whether the ChatGPT model can reach the pass rate of the PES exam in radiology and imaging diagnostics in Poland is yet to be determined and requires further research on improved versions of ChatGPT.
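
The evaluation described in the abstract (per-question correctness, a Bloom's taxonomy category, and a 1 to 5 confidence rating) could be tallied roughly as in the sketch below. This is only an illustration, not the authors' analysis code: the function name `summarize`, the record layout, the category labels, and the threshold value passed in the usage example are all assumptions, and the sample records are made up.

```python
from collections import defaultdict


def summarize(records, pass_threshold):
    """Tally accuracy overall, per category, and per confidence level.

    records: iterable of (category, is_correct, confidence) tuples, where
    confidence is the model's self-rating from 1 to 5 (hypothetical layout).
    pass_threshold: fraction of correct answers needed to pass (assumption).
    """
    records = list(records)
    if not records:
        raise ValueError("at least one graded question is required")

    # Each counter maps a key to [number correct, number asked].
    by_category = defaultdict(lambda: [0, 0])
    by_confidence = defaultdict(lambda: [0, 0])
    correct_total = 0

    for category, is_correct, confidence in records:
        hit = 1 if is_correct else 0
        correct_total += hit
        by_category[category][0] += hit
        by_category[category][1] += 1
        by_confidence[confidence][0] += hit
        by_confidence[confidence][1] += 1

    overall = correct_total / len(records)
    return {
        "overall": overall,
        "passed": overall >= pass_threshold,
        "by_category": {k: c / n for k, (c, n) in by_category.items()},
        "by_confidence": {k: c / n for k, (c, n) in by_confidence.items()},
    }


if __name__ == "__main__":
    # Made-up records for illustration only; not data from the study.
    sample = [
        ("memory", True, 5),
        ("memory", False, 3),
        ("comprehension", True, 4),
        ("comprehension", True, 2),
    ]
    # 0.6 is a placeholder threshold chosen for this example.
    print(summarize(sample, pass_threshold=0.6))
```

Grouping accuracy by confidence level gives a quick check of whether higher self-rated confidence goes with more correct answers. Testing for significant differences between question categories would need a separate statistical test, which this sketch does not attempt.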


