Assessing knowledge about medical physics in language-generative AI with large language model: using the medical physicist exam.

Impact factor: 1.7 | JCR quartile: Q3 (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

Radiological Physics and Technology, pp. 929-937. Published: 2024-12-01 (Epub: 2024-09-10). DOI: 10.1007/s12194-024-00838-2

Noriyuki Kadoya, Kazuhiro Arai, Shohei Tanaka, Yuto Kimura, Ryota Tozuka, Keisuke Yasui, Naoki Hayashi, Yoshiyuki Katsuta, Haruna Takahashi, Koki Inoue, Keiichi Jingu



Abstract

This study aimed to evaluate how well language-generative AI based on large language models performs on the Japanese medical physicist examination, and to provide a benchmark of its knowledge of medical physics. We used questions from Japan's 2018, 2019, 2020, 2021, and 2022 medical physicist board examinations, which covered various question types, including multiple-choice questions, and focused mainly on general medicine and medical physics. ChatGPT-3.5 and ChatGPT-4.0 (OpenAI) were evaluated, and their answers were compared with the correct answers. The average accuracy rates were 42.2 ± 2.5% for ChatGPT-3.5 and 72.7 ± 2.6% for ChatGPT-4; ChatGPT-4 was significantly more accurate than ChatGPT-3.5 in all categories (p < 0.05) except radiation-related laws and recommendations/medical ethics. Even ChatGPT-4, the more accurate model, scored below 60% in two categories: radiation metrology (55.6%) and radiation-related laws and recommendations/medical ethics (40.0%). These data provide a benchmark of ChatGPT's knowledge of medical physics and can serve as baseline data for developing medical physics tools that use ChatGPT (e.g., radiation therapy support tools with Japanese input).

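The abstract reports each model's accuracy as a mean ± SD but does not spell out how that figure was computed. The Python sketch below shows one plausible procedure under stated assumptions: (1) a question counts as correct only if the selected options exactly match the answer key, (2) accuracy is computed separately for each exam year, and (3) the ± value is the sample standard deviation across years. These assumptions, the function names (`is_correct`, `accuracy_percent`, `mean_sd`), and the toy answers are hypothetical and are not taken from the study. It requires Python 3.9+ for the built-in generic type hints.

```python
from statistics import mean, stdev


def is_correct(selected: set[str], key: set[str]) -> bool:
    # Hypothetical scoring rule: the chosen options must exactly match the key.
    return selected == key


def accuracy_percent(answers: list[set[str]], keys: list[set[str]]) -> float:
    # Percentage of questions answered correctly in one exam year.
    if len(answers) != len(keys):
        raise ValueError("answers and keys must have the same length")
    if not keys:
        raise ValueError("cannot score an empty exam")
    n_correct = sum(is_correct(a, k) for a, k in zip(answers, keys))
    return 100.0 * n_correct / len(keys)


def mean_sd(per_year: dict[int, float]) -> tuple[float, float]:
    # Mean and sample SD of per-year accuracy (assumed aggregation; needs >= 2 years).
    values = list(per_year.values())
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Toy data for illustration only; not the study's questions or model outputs.
    keys_2018 = [{"a"}, {"b", "c"}, {"e"}, {"b"}]
    model_2018 = [{"a"}, {"b", "c"}, {"d"}, {"a"}]
    keys_2019 = [{"c"}, {"a"}, {"d", "e"}, {"b"}]
    model_2019 = [{"c"}, {"a"}, {"d", "e"}, {"c"}]

    per_year = {
        2018: accuracy_percent(model_2018, keys_2018),  # 2 of 4 correct -> 50.0
        2019: accuracy_percent(model_2019, keys_2019),  # 3 of 4 correct -> 75.0
    }
    m, s = mean_sd(per_year)
    print(f"accuracy: {m:.1f} ± {s:.1f}%")  # prints: accuracy: 62.5 ± 17.7%
```

Replacing `stdev` with `pstdev` would give the population SD instead; the abstract does not state which convention, or which significance test, the authors used.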

Source journal

Radiological Physics and Technology (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore

3.00

Self-citation rate

12.50%

Publication volume

Journal description: The purpose of Radiological Physics and Technology is to provide a forum for sharing new knowledge from research and development in radiological science and technology, including medical physics and radiological technology in diagnostic radiology, nuclear medicine, radiation therapy, and other radiological disciplines, and to contribute to progress and improvement in medical practice and patient health care.