Performance Assessment of GPT 4.0 on the Japanese Medical Licensing Examination.

IF 2 | Zone 4 (Medicine) | Q3 MEDICINE, RESEARCH & EXPERIMENTAL

Current Medical Science Pub Date : 2024-12-01 Epub Date: 2024-10-26 DOI:10.1007/s11596-024-2932-9

Hong-Lin Wang, Hong Zhou, Jia-Yao Zhang, Yi Xie, Jia-Ming Yang, Ming-di Xue, Zi-Neng Yan, Wen Li, Xi-Bao Zhang, Yong Wu, Xiao-Ling Chen, Peng-Ran Liu, Lin Lu, Zhe-Wei Ye

Journal: Current Medical Science, pages 1148-1154. Publication type: Journal Article.

Citations: 0

Abstract

Objective: To evaluate, across multiple dimensions, the accuracy and parsing ability of GPT 4.0 on the Japanese medical practitioner qualification examination, and thereby to assess the accuracy and comprehensiveness of its medical knowledge.

Methods: We evaluated the performance of GPT 4.0 on Japanese Medical Licensing Examination (JMLE) questions (2021-2023). Questions were categorized by difficulty and type, with distinctions between general and clinical parts, as well as between single-choice (MCQ1) and multiple-choice (MCQ2) questions. Difficulty levels were determined on the basis of correct rates provided by the JMLE Preparatory School. The accuracy and quality of the GPT 4.0 responses were analyzed via improved Global Quality Scale (GQS) scores, considering both the chosen options and the accompanying analysis. Descriptive statistics and Pearson chi-square tests were used to examine performance across exam years, question difficulty, type, and choice. GPT 4.0 ability was evaluated via the GQS, with comparisons made via the Mann-Whitney U or Kruskal-Wallis test.
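As a rough illustration of the Pearson chi-square test named in the Methods, the sketch below computes the statistic for a 2x2 table of correct and incorrect answers by question type. The counts are hypothetical and are not taken from the paper, the function names are made up for this example, and the p-value helper is valid only for one degree of freedom; the authors' actual software and settings are not stated in the abstract.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts (NOT from the paper).
# Rows: MCQ1, MCQ2. Columns: correct, incorrect.
counts = [[90, 10], [60, 40]]

stat, dof = pearson_chi_square(counts)
p = p_value_one_dof(stat)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p:.2e}")
```

For tables with more than one degree of freedom (for example three exam years), the statistic is computed the same way, but the p-value needs the general chi-square distribution rather than this one-degree-of-freedom shortcut.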

Results: The correct response rate and parsing ability of GPT 4.0 on the JMLE questions reached the qualification level (80.4%). Accuracy differed significantly across both difficulty levels and option types. The GQS scores for the GPT 4.0 responses to all the JMLE questions varied according to year and choice type.
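To make the GQS comparison by choice type more concrete, here is a minimal sketch of the Mann-Whitney U statistic for two groups of ordinal ratings, using average ranks for ties. The ratings are hypothetical, not the paper's data, and the helper names are invented for this example; it computes only the U statistics and does not derive a p-value.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the two values always sum to len(a) * len(b)."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical GQS ratings on a 1-5 scale (NOT from the paper).
gqs_mcq1 = [5, 4, 5, 3, 4]
gqs_mcq2 = [3, 2, 4, 3, 1]

u_a, u_b = mann_whitney_u(gqs_mcq1, gqs_mcq2)
print(f"U(MCQ1) = {u_a}, U(MCQ2) = {u_b}")
```

A U value close to the product of the group sizes (25 here) indicates that one group's ratings are almost always higher than the other's. Comparing three exam years would call for the Kruskal-Wallis test instead.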

Conclusion: GPT 4.0 performs well in providing basic support for medical education and medical research, but it still needs to be trained on a large amount of medical data to improve the accuracy of its medical knowledge output. Further integration of ChatGPT with the medical field could open new opportunities for medicine.


Source journal

Current Medical Science (Biochemistry, Genetics and Molecular Biology: Genetics)

CiteScore: 4.70

Self-citation rate: 0.00%

Articles published: 126

About the journal: Current Medical Science provides a forum for peer-reviewed papers in the medical sciences, to promote academic exchange between Chinese researchers and doctors and their foreign counterparts. The journal covers biomedical subjects such as physiology, biochemistry, molecular biology, pharmacology, pathology, and pathophysiology, as well as clinical research in areas such as surgery, internal medicine, obstetrics and gynecology, pediatrics, and otorhinolaryngology. The articles appearing in Current Medical Science are mainly in English, with a very small number of papers in German as a tribute to its German founder. This journal is the only medical periodical in Western languages sponsored by an educational institution located in the central part of China.