Performance Assessment of GPT 4.0 on the Japanese Medical Licensing Examination

Hong-Lin Wang, Hong Zhou, Jia-Yao Zhang, Yi Xie, Jia-Ming Yang, Ming-di Xue, Zi-Neng Yan, Wen Li, Xi-Bao Zhang, Yong Wu, Xiao-Ling Chen, Peng-Ran Liu, Lin Lu, Zhe-Wei Ye

Current Medical Science (2024), pp. 1148-1154. DOI: 10.1007/s11596-024-2932-9. Epub 2024-10-26.
Abstract
Objective: To evaluate, in a multidimensional way, the accuracy and parsing ability of GPT 4.0 on the Japanese medical practitioner qualification examination, and to investigate the accuracy and comprehensiveness of its responses to medical knowledge questions.
Methods: We evaluated the performance of GPT 4.0 on Japanese Medical Licensing Examination (JMLE) questions (2021-2023). Questions were categorized by difficulty and type, distinguishing between the general and clinical parts of the examination and between single-choice (MCQ1) and multiple-choice (MCQ2) questions. Difficulty levels were determined from the correct-answer rates provided by the JMLE Preparatory School. The accuracy and quality of the GPT 4.0 responses were analyzed via an improved Global Quality Scale (GQS) score, considering both the chosen options and the accompanying analysis. Descriptive statistics and Pearson chi-square tests were used to examine performance across exam years, question difficulty, type, and choice. The parsing ability of GPT 4.0 was evaluated via the GQS, with comparisons made via the Mann-Whitney U or Kruskal-Wallis test.
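The statistical workflow described above can be illustrated with a minimal Python sketch (this is not the authors' code; the per-question records, group labels, and GQS values below are hypothetical, and only the choice of tests follows the Methods):

from scipy.stats import chi2_contingency, mannwhitneyu, kruskal

# Hypothetical per-question outcomes: 1 = GPT 4.0 answered correctly, 0 = incorrect.
correct_by_difficulty = {
    "easy":   [1, 1, 1, 0, 1, 1, 1, 1],
    "medium": [1, 0, 1, 1, 0, 1, 1, 0],
    "hard":   [0, 1, 0, 0, 1, 0, 1, 0],
}

# Pearson chi-square test: is accuracy independent of difficulty level?
# Build a (difficulty x correct/incorrect) contingency table.
table = [
    [sum(v), len(v) - sum(v)]  # [n_correct, n_incorrect] per difficulty level
    for v in correct_by_difficulty.values()
]
chi2, p_chi, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_chi:.3f}")

# GQS scores are ordinal, so nonparametric tests are used.
# Mann-Whitney U compares two independent groups (e.g., MCQ1 vs. MCQ2) ...
gqs_mcq1 = [5, 4, 5, 3, 4, 5, 4]  # hypothetical GQS scores
gqs_mcq2 = [3, 4, 2, 4, 3, 3, 4]
u_stat, p_u = mannwhitneyu(gqs_mcq1, gqs_mcq2, alternative="two-sided")
print(f"Mann-Whitney U={u_stat}, p={p_u:.3f}")

# ... and Kruskal-Wallis compares three or more groups (e.g., exam years 2021-2023).
gqs_2021 = [4, 5, 3, 4, 4]
gqs_2022 = [5, 4, 4, 5, 3]
gqs_2023 = [3, 3, 4, 2, 4]
h_stat, p_h = kruskal(gqs_2021, gqs_2022, gqs_2023)
print(f"Kruskal-Wallis H={h_stat:.2f}, p={p_h:.3f}")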
Results: The correct response rate of GPT 4.0 on the JMLE questions reached the qualification level (80.4%). Its accuracy differed significantly across both difficulty levels and option types. According to the GQS scores for the GPT 4.0 responses to all the JMLE questions, parsing performance also varied by exam year and choice type.
Conclusion: GPT 4.0 performs well in providing basic support for medical education and medical research, but it still requires training on large amounts of medical data to improve the accuracy of its medical knowledge output. Further integration of ChatGPT with the medical field could open new opportunities for medicine.
Journal Introduction:
Current Medical Science provides a forum for peer-reviewed papers in the medical sciences, to promote academic exchange between Chinese researchers and doctors and their foreign counterparts. The journal covers biomedical subjects such as physiology, biochemistry, molecular biology, pharmacology, pathology, and pathophysiology, as well as clinical research in areas such as surgery, internal medicine, obstetrics and gynecology, pediatrics, and otorhinolaryngology. The articles appearing in Current Medical Science are mainly in English, with a very small number in German, in tribute to its German founder. This journal is the only medical periodical in Western languages sponsored by an educational institution located in the central part of China.