{"title":"Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.","authors":"Zelin Wu, Wenyi Gan, Zhaowen Xue, Zhengxin Ni, Xiaofei Zheng, Yiyi Zhang","doi":"10.2196/52746","DOIUrl":"10.2196/52746","url":null,"abstract":"<p><strong>Background: </strong>The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, which shows great potential in medical education due to its powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance in handling questions for the National Nursing Licensure Examination (NNLE) in China and the United States, including the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the NNLE.</p><p><strong>Objective: </strong>This study aims to examine how well LLMs respond to the NCLEX-RN and the NNLE multiple-choice questions (MCQs) in various language inputs. To evaluate whether LLMs can be used as multilingual learning assistance for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice.</p><p><strong>Methods: </strong>First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate NCLEX-RN questions from English to Chinese and NNLE questions from Chinese to English. Finally, the original version and the translated version of the MCQs were inputted into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. Different LLMs were compared according to the accuracy rate, and the differences between different language inputs were compared.</p><p><strong>Results: </strong>The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Despite the statistical significance of the difference (P=.03), the correct rate was generally satisfactory. Around 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs were correctly answered by ChatGPT 4.0. The accuracy of ChatGPT 4.0 in processing NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, and there was no statistically significant difference between the results of text input in different languages. ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates for nursing-related MCQs than ChatGPT 4.0 in English input. English accuracy was higher when compared with ChatGPT 3.5's Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). 
Whether the NCLEX-RN and NNLE MCQs were submitted in Chinese or English, ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs.</p><p><strong>Conclusions: </strong>This study, focusing on 618 nursing MCQs including NCLEX-RN and ","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e52746"},"PeriodicalIF":3.2,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466054/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142373145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
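Editor's note: the abstract above reports P values for the same model answering the same questions in two languages but does not name the statistical test. A minimal sketch of one standard approach for paired binary outcomes, McNemar's exact test, is given below; the 0/1 scoring vectors are placeholders that only match the reported totals, so the printed P value will not reproduce the paper's P=.03, which depends on the actual question-level pairing.

```python
"""Illustrative sketch (not the authors' code): comparing one model's accuracy on
the same MCQs posed in two languages with an exact McNemar test.
Per-question correctness is assumed to be scored as 0/1 for each language."""

from scipy.stats import binomtest

def accuracy(scores):
    return sum(scores) / len(scores)

def mcnemar_exact(scores_a, scores_b):
    """Exact McNemar test on paired 0/1 outcomes (same questions, two conditions)."""
    only_a = sum(1 for x, y in zip(scores_a, scores_b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(scores_a, scores_b) if x == 0 and y == 1)
    if only_a + only_b == 0:
        return 1.0
    # Under the null, discordant pairs split 50/50 between the two conditions.
    return binomtest(only_a, only_a + only_b, 0.5).pvalue

# Placeholder vectors matching the reported totals for the 150 NCLEX-RN MCQs:
english = [1] * 133 + [0] * 17   # 88.7% correct with English input
chinese = [1] * 119 + [0] * 31   # 79.3% correct with Chinese-translated input

print(f"English {accuracy(english):.1%}, Chinese {accuracy(chinese):.1%}, "
      f"McNemar P={mcnemar_exact(english, chinese):.4f}")
```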
{"title":"Bridging the Telehealth Digital Divide With Collegiate Navigators: Mixed Methods Evaluation Study of a Service-Learning Health Disparities Course.","authors":"Zakaria Nadeem Doueiri, Rika Bajra, Malathi Srinivasan, Erika Schillinger, Nancy Cuan","doi":"10.2196/57077","DOIUrl":"10.2196/57077","url":null,"abstract":"<p><strong>Background: </strong>Limited digital literacy is a barrier for vulnerable patients accessing health care.</p><p><strong>Objective: </strong>The Stanford Technology Access Resource Team (START), a service-learning course created to bridge the telehealth digital divide, trained undergraduate and graduate students to provide hands-on patient support to improve access to electronic medical records (EMRs) and video visits while learning about social determinants of health.</p><p><strong>Methods: </strong>START students reached out to 1185 patients (n=711, 60% from primary care clinics of a large academic medical center and n=474, 40% from a federally qualified health center). Registries consisted of patients without an EMR account (at primary care clinics) or patients with a scheduled telehealth visit (at a federally qualified health center). Patient outcomes were evaluated by successful EMR enrollments and video visit setups. Student outcomes were assessed by reflections coded for thematic content.</p><p><strong>Results: </strong>Over 6 academic quarters, 57 students reached out to 1185 registry patients. Of the 229 patients contacted, 141 desired technical support. START students successfully established EMR accounts and set up video visits for 78.7% (111/141) of patients. After program completion, we reached out to 13.5% (19/141) of patients to collect perspectives on program utility. The majority (18/19, 94.7%) reported that START students were helpful, and 73.7% (14/19) reported that they had successfully connected with their health care provider in a digital visit. Inability to establish access included a lack of Wi-Fi or device access, the absence of an interpreter, and a disability that precluded the use of video visits. Qualitative analysis of student reflections showed an impact on future career goals and improved awareness of health disparities of technology access.</p><p><strong>Conclusions: </strong>Of the patients who desired telehealth access, START improved access for 78.7% (111/141) of patients. Students found that START broadened their understanding of health disparities and social determinants of health and influenced their future career goals.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e57077"},"PeriodicalIF":3.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142366793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Mapping and Global Trends in the Field of the Objective Structured Clinical Examination: Bibliometric and Visual Analysis (2004-2023).","authors":"Hongjun Ba, Lili Zhang, Xiufang He, Shujuan Li","doi":"10.2196/57772","DOIUrl":"10.2196/57772","url":null,"abstract":"<p><strong>Background: </strong>The Objective Structured Clinical Examination (OSCE) is a pivotal tool for assessing health care professionals and plays an integral role in medical education.</p><p><strong>Objective: </strong>This study aims to map the bibliometric landscape of OSCE research, highlighting trends and key influencers.</p><p><strong>Methods: </strong>A comprehensive literature search was conducted for materials related to OSCE from January 2004 to December 2023, using the Web of Science Core Collection database. Bibliometric analysis and visualization were performed with VOSviewer and CiteSpace software tools.</p><p><strong>Results: </strong>Our analysis indicates a consistent increase in OSCE-related publications over the study period, with a notable surge after 2019, culminating in a peak of activity in 2021. The United States emerged as a significant contributor, responsible for 30.86% (1626/5268) of total publications and amassing 44,051 citations. Coauthorship network analysis highlighted robust collaborations, particularly between the United States and the United Kingdom. Leading journals in this domain-BMC Medical Education, Medical Education, Academic Medicine, and Medical Teacher-featured the highest volume of papers, while The Lancet garnered substantial citations, reflecting its high impact factor (to be verified for accuracy). Prominent authors in the field include Sondra Zabar, Debra Pugh, Timothy J Wood, and Susan Humphrey-Murto, with Ronaldo M Harden, Brian D Hodges, and George E Miller being the most cited. The analysis of key research terms revealed a focus on \"education,\" \"performance,\" \"competence,\" and \"skills,\" indicating these are central themes in OSCE research.</p><p><strong>Conclusions: </strong>The study underscores a dynamic expansion in OSCE research and international collaboration, spotlighting influential countries, institutions, authors, and journals. These elements are instrumental in steering the evolution of medical education assessment practices and suggest a trajectory for future research endeavors. Future work should consider the implications of these findings for medical education and the potential areas for further investigation, particularly in underrepresented regions or emerging competencies in health care training.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e57772"},"PeriodicalIF":3.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474118/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching Digital Medicine to Undergraduate Medical Students With an Interprofessional and Interdisciplinary Approach: Development and Usability Study.","authors":"Annabelle Mielitz, Ulf Kulau, Lucas Bublitz, Anja Bittner, Hendrik Friederichs, Urs-Vito Albrecht","doi":"10.2196/56787","DOIUrl":"10.2196/56787","url":null,"abstract":"<p><strong>Background: </strong>An integration of digital medicine into medical education can help future physicians shape the digital transformation of medicine.</p><p><strong>Objective: </strong>We aim to describe and evaluate a newly developed course for teaching digital medicine (the Bielefeld model) for the first time.</p><p><strong>Methods: </strong>The course was held with undergraduate medical students at Medical School Ostwestfalen-Lippe at Bielefeld University, Germany, in 2023 and evaluated via pretest-posttest surveys. The subjective and objective achievement of superordinate learning objectives and the objective achievement of subordinate learning objectives of the course, course design, and course importance were evaluated using 5-point Likert scales (1=strongly disagree; 5=strongly agree); reasons for absences were assessed using a multiple-choice format, and comments were collected. The superordinate objectives comprised (1) the understanding of factors driving the implementation of digital medical products and processes, (2) the application of this knowledge to a project, and (3) the empowerment to design such solutions in the future. The subordinate objectives comprised competencies related to the first superordinate objective.</p><p><strong>Results: </strong>In total, 10 undergraduate medical students (male: n=4, 40%; female: n=6, 60%; mean age 21.7, SD 2.1 years) evaluated the course. The superordinate objectives were achieved well to very well-the medians for the objective achievement were 4 (IQR 4-5), 4 (IQR 3-5), and 4 (IQR 4-4) scale units for the first, second, and third objectives, respectively, and the medians for the subjective achievement of the first, second, and third objectives were 4 (IQR 3-4), 4.5 (IQR 3-5), and 4 (IQR 3-5) scale units, respectively. Participants mastered the subordinate objectives, on average, better after the course than before (presurvey median 2.5, IQR 2-3 scale units; postsurvey median 4, IQR 3-4 scale units). The course concept was rated as highly suitable for achieving the superordinate objectives (median 5, IQR 4-5 scale units for the first, second, and third objectives). On average, the students strongly liked the course (median 5, IQR 4-5 scale units) and gained a benefit from it (median 4.5, IQR 4-5 scale units). All students fully agreed that the teaching staff was a strength of the course. The category positive feedback on the course or positive personal experience with the course received the most comments.</p><p><strong>Conclusions: </strong>The course framework shows promise in attaining learning objectives within the realm of digital medicine, notwithstanding the constraint of limited interpretability arising from a small sample size and further limitations. The course concept aligns with insights derived from teaching and learning research and the domain of digital medicine, albeit with identifiable areas for enhancement. 
A literature review indicates a dearth of publications pe","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":" ","pages":"e56787"},"PeriodicalIF":3.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474112/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
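Editor's note: the abstract above summarizes paired pretest-posttest Likert ratings with medians and IQRs only. The sketch below illustrates how such summaries (and an exploratory paired test, which the authors do not mention) can be computed; the rating vectors are hypothetical placeholders for the n=10 participants, not the study data.

```python
"""Illustrative sketch only: medians and IQRs for paired pretest-posttest Likert
ratings, plus an exploratory Wilcoxon signed-rank test. Data are hypothetical."""

import numpy as np
from scipy.stats import wilcoxon

pre = np.array([2, 3, 2, 3, 2, 3, 2, 3, 3, 2])    # hypothetical 1-5 ratings before the course
post = np.array([4, 4, 3, 4, 4, 5, 3, 4, 4, 4])   # hypothetical ratings after the course

def med_iqr(x):
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"median {med:g} (IQR {q1:g}-{q3:g})"

print("Pre: ", med_iqr(pre))
print("Post:", med_iqr(post))
print("Wilcoxon signed-rank P =", round(wilcoxon(pre, post).pvalue, 3))
```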
{"title":"Artificial Intelligence in Dental Education: Opportunities and Challenges of Large Language Models and Multimodal Foundation Models.","authors":"Daniel Claman, Emre Sezgin","doi":"10.2196/52346","DOIUrl":"10.2196/52346","url":null,"abstract":"<p><strong>Unlabelled: </strong>Instructional and clinical technologies have been transforming dental education. With the emergence of artificial intelligence (AI), the opportunities of using AI in education has increased. With the recent advancement of generative AI, large language models (LLMs) and foundation models gained attention with their capabilities in natural language understanding and generation as well as combining multiple types of data, such as text, images, and audio. A common example has been ChatGPT, which is based on a powerful LLM-the GPT model. This paper discusses the potential benefits and challenges of incorporating LLMs in dental education, focusing on periodontal charting with a use case to outline capabilities of LLMs. LLMs can provide personalized feedback, generate case scenarios, and create educational content to contribute to the quality of dental education. However, challenges, limitations, and risks exist, including bias and inaccuracy in the content created, privacy and security concerns, and the risk of overreliance. With guidance and oversight, and by effectively and ethically integrating LLMs, dental education can incorporate engaging and personalized learning experiences for students toward readiness for real-life clinical practice.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e52346"},"PeriodicalIF":3.2,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11451510/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Health Informatics Analyst Education on Job Role, Career Transition, and Skill Development: Survey Study.","authors":"Kye Hwa Lee, Jae Ho Lee, Yura Lee, Hyunna Lee, Ji Sung Lee, Hye Jeon Jang, Kun Hee Lee, Jeong Hyun Han, SuJung Jang","doi":"10.2196/54427","DOIUrl":"10.2196/54427","url":null,"abstract":"<p><strong>Background: </strong>Professionals with expertise in health informatics play a crucial role in the digital health sector. Despite efforts to train experts in this field, the specific impact of such training, especially for individuals from diverse academic backgrounds, remains undetermined.</p><p><strong>Objective: </strong>This study therefore aims to evaluate the effectiveness of an intensive health informatics training program on graduates with respect to their job roles, transitions, and competencies and to provide insights for curriculum design and future research.</p><p><strong>Methods: </strong>A survey was conducted among 206 students who completed the Advanced Health Informatics Analyst program between 2018 and 2022. The questionnaire comprised four categories: (1) general information about the respondent, (2) changes before and after program completion, (3) the impact of the program on professional practice, and (4) continuing education requirements.</p><p><strong>Results: </strong>The study received 161 (78.2%) responses from the 206 students. Graduates of the program had diverse academic backgrounds and consequently undertook various informatics tasks after their training. Most graduates (117/161, 72.7%) are now involved in tasks such as data preprocessing, visualizing results for better understanding, and report writing for data processing and analysis. Program participation significantly improved job performance (P=.03), especially for those with a master's degree or higher (odds ratio 2.74, 95% CI 1.08-6.95) and those from regions other than Seoul or Gyeonggi-do (odds ratio 10.95, 95% CI 1.08-6.95). A substantial number of respondents indicated that the training had a substantial influence on their career transitions, primarily by providing a better understanding of job roles and generating intrinsic interest in the field.</p><p><strong>Conclusions: </strong>The integrated practical education program was effective in addressing the diverse needs of trainees from various fields, enhancing their capabilities, and preparing them for the evolving industry demands. This study emphasizes the value of providing specialized training in health informatics for graduates regardless of their discipline.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e54427"},"PeriodicalIF":3.2,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446175/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Medical Interview Skills Through AI-Simulated Patient Interactions: Nonrandomized Controlled Trial.","authors":"Akira Yamamoto, Masahide Koda, Hiroko Ogawa, Tomoko Miyoshi, Yoshinobu Maeda, Fumio Otsuka, Hideo Ino","doi":"10.2196/58753","DOIUrl":"10.2196/58753","url":null,"abstract":"<p><strong>Background: </strong>Medical interviewing is a critical skill in clinical practice, yet opportunities for practical training are limited in Japanese medical schools, necessitating urgent measures. Given advancements in artificial intelligence (AI) technology, its application in the medical field is expanding. However, reports on its application in medical interviews in medical education are scarce.</p><p><strong>Objective: </strong>This study aimed to investigate whether medical students' interview skills could be improved by engaging with AI-simulated patients using large language models, including the provision of feedback.</p><p><strong>Methods: </strong>This nonrandomized controlled trial was conducted with fourth-year medical students in Japan. A simulation program using large language models was provided to 35 students in the intervention group in 2023, while 110 students from 2022 who did not participate in the intervention were selected as the control group. The primary outcome was the score on the Pre-Clinical Clerkship Objective Structured Clinical Examination (pre-CC OSCE), a national standardized clinical skills examination, in medical interviewing. Secondary outcomes included surveys such as the Simulation-Based Training Quality Assurance Tool (SBT-QA10), administered at the start and end of the study.</p><p><strong>Results: </strong>The AI intervention group showed significantly higher scores on medical interviews than the control group (AI group vs control group: mean 28.1, SD 1.6 vs 27.1, SD 2.2; P=.01). There was a trend of inverse correlation between the SBT-QA10 and pre-CC OSCE scores (regression coefficient -2.0 to -2.1). No significant safety concerns were observed.</p><p><strong>Conclusions: </strong>Education through medical interviews using AI-simulated patients has demonstrated safety and a certain level of educational effectiveness. However, at present, the educational effects of this platform on nonverbal communication skills are limited, suggesting that it should be used as a supplementary tool to traditional simulation education.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e58753"},"PeriodicalIF":3.2,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459107/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Digital Health Awareness and mHealth Competencies in Medical Education: Proof-of-Concept Study and Summative Process Evaluation of a Quality Improvement Project.","authors":"Fatma Sahan, Lisa Guthardt, Karin Panitz, Anna Siegel-Kianer, Isabel Eichhof, Björn D Schmitt, Jennifer Apolinario-Hagen","doi":"10.2196/59454","DOIUrl":"10.2196/59454","url":null,"abstract":"<p><strong>Background: </strong>Currently, there is a need to optimize knowledge on digital transformation in mental health care, including digital therapeutics (eg, prescription apps), in medical education. However, in Germany, digital health has not yet been systematically integrated into medical curricula and is taught in a relatively small number of electives. Challenges for lecturers include the dynamic field as well as lacking guidance on how to efficiently apply innovative teaching formats for these new digital competencies. Quality improvement projects provide options to pilot-test novel educational offerings, as little is known about the acceptability of participatory approaches in conventional medical education.</p><p><strong>Objective: </strong>This quality improvement project addressed the gap in medical school electives on digital health literacy by introducing and evaluating an elective scoping study on the systematic development of different health app concepts designed by students to cultivate essential skills for future health care professionals (ie, mobile health [mHealth] competencies).</p><p><strong>Methods: </strong>This proof-of-concept study describes the development, optimization, implementation, and evaluation of a web-based elective on digital (mental) health competencies in medical education. Implemented as part of a quality improvement project, the elective aimed to guide medical students in developing app concepts applying a design thinking approach at a German medical school from January 2021 to January 2024. Topics included defining digital (mental) health, quality criteria for health apps, user perspective, persuasive design, and critical reflection on digitization in medical practice. The elective was offered 6 times within 36 months, with continuous evaluation and iterative optimization using both process and outcome measures, such as web-based questionnaires. We present examples of app concepts designed by students and summarize the quantitative and qualitative evaluation results.</p><p><strong>Results: </strong>In total, 60 students completed the elective and developed 25 health app concepts, most commonly targeting stress management and depression. In addition, disease management and prevention apps were designed for various somatic conditions such as diabetes and chronic pain. The results indicated high overall satisfaction across the 6 courses according to the evaluation questionnaire, with lower scores indicating higher satisfaction on a scale ranging from 1 to 6 (mean 1.70, SD 0.68). Students particularly valued the content, flexibility, support, and structure. 
While improvements in group work, submissions, and information transfer were suggested, the results underscore the usefulness of the web-based elective.</p><p><strong>Conclusions: </strong>This quality improvement project provides insights into relevant features for the successful user-centered and creative integration of mHealth competencies into m","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e59454"},"PeriodicalIF":3.2,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11452754/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study.","authors":"Soo-Hyuk Yoon, Seok Kyeong Oh, Byung Gun Lim, Ho-Jin Lee","doi":"10.2196/56859","DOIUrl":"10.2196/56859","url":null,"abstract":"<p><strong>Background: </strong>ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty exams, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.</p><p><strong>Objective: </strong>This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.</p><p><strong>Methods: </strong>We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using in-training examinations that have been administered to Korean anesthesiology residents over the past 5 years, with an annual composition of 100 questions. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess the performance differences of the GPT across different languages, we conducted a comparative analysis of the GPT-4's problem-solving proficiency using both the original Korean texts and their English translations.</p><p><strong>Results: </strong>A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated a significantly better overall performance than GPT-3.5 (37.2%) and CLOVA-X (36.7%). However, GPT-3.5 and CLOVA X did not show significant differences in their overall performance. Additionally, the GPT-4 showed superior performance on questions translated into English, indicating a language processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).</p><p><strong>Conclusions: </strong>This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"10 ","pages":"e56859"},"PeriodicalIF":3.2,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Critical Analysis of ChatGPT 4 Omni in USMLE Disciplines, Clinical Clerkships, and Clinical Skills.","authors":"Brenton T Bicknell, Danner Butler, Sydney Whalen, James Ricks, Cory J Dixon, Abigail B Clark, Olivia Spaedy, Adam Skelton, Neel Edupuganti, Lance Dzubinski, Hudson Tate, Garrett Dyess, Brenessa Lindeman, Lisa Soleymani Lehmann","doi":"10.2196/63430","DOIUrl":"10.2196/63430","url":null,"abstract":"<p><strong>Background: </strong>Recent studies, including those by the National Board of Medical Examiners (NBME), have highlighted the remarkable capabilities of recent large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, there is a gap in detailed analysis of these models' performance in specific medical content areas, thus limiting an assessment of their potential utility for medical education.</p><p><strong>Objective: </strong>To assess and compare the accuracy of successive ChatGPT versions (GPT-3.5, GPT-4, and GPT-4 Omni) in USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management.</p><p><strong>Methods: </strong>This study used 750 clinical vignette-based multiple-choice questions (MCQs) to characterize the performance of successive ChatGPT versions [ChatGPT 3.5 (GPT-3.5), ChatGPT 4 (GPT-4), and ChatGPT 4 Omni (GPT-4o)] across USMLE disciplines, clinical clerkships, and in clinical skills (diagnostics and management). Accuracy was assessed using a standardized protocol, with statistical analyses conducted to compare the models' performances.</p><p><strong>Results: </strong>GPT-4o achieved the highest accuracy across 750 MCQs at 90.4%, outperforming GPT-4 and GPT-3.5, which scored 81.1% and 60.0% respectively. GPT-4o's highest performances were in social sciences (95.5%), behavioral and neuroscience (94.2%), and pharmacology (93.2%). In clinical skills, GPT-4o's diagnostic accuracy was 92.7% and management accuracy 88.8%, significantly higher than its predecessors. Notably, both GPT-4o and GPT-4 significantly outperformed the medical student average accuracy of 59.3% (95% CI: 58.3-60.3).</p><p><strong>Conclusions: </strong>ChatGPT 4 Omni's performance in USMLE preclinical content areas as well as clinical skills indicates substantial improvements over its predecessors, suggesting significant potential for the use of this technology as an educational aid for medical students. These findings underscore the necessity of careful consideration of LLMs' integration into medical education, emphasizing the importance of structured curricula to guide their appropriate use and the need for ongoing critical analyses to ensure their reliability and effectiveness.</p><p><strong>Clinicaltrial: </strong></p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}