{"title":"Accuracy of ChatGPT in answering cardiology board-style questions.","authors":"Albert Andrew","doi":"10.3352/jeehp.2025.22.9","DOIUrl":"10.3352/jeehp.2025.22.9","url":null,"abstract":"","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"9"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143517011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors.","authors":"Jisoo Lee, Jieun Lee, Jeong-Ju Yoo","doi":"10.3352/jeehp.2025.22.4","DOIUrl":"10.3352/jeehp.2025.22.4","url":null,"abstract":"<p><p>The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"4"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11952698/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143693856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study.","authors":"Prut Saowaprut, Romen Samuel Wabina, Junwei Yang, Lertboon Siriwat","doi":"10.3352/jeehp.2025.22.16","DOIUrl":"10.3352/jeehp.2025.22.16","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand’s National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.</p><p><strong>Methods: </strong>We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.</p><p><strong>Results: </strong>All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7–89.1), surpassing Thailand’s national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed hierarchically. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT4o: 90.0% vs. 82.6%).</p><p><strong>Conclusion: </strong>General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"16"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143986836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nationwide survey on the curriculum and educational resources related to the Clinical Skills Test of the Korean Medical Licensing Examination: a cross-sectional descriptive study.","authors":"Eun-Kyung Chung, Seok Hoon Kang, Do-Hoon Kim, MinJeong Kim, Ji-Hyun Seo, Keunmi Lee, Eui-Ryoung Han","doi":"10.3352/jeehp.2025.22.11","DOIUrl":"10.3352/jeehp.2025.22.11","url":null,"abstract":"<p><strong>Purpose: </strong>The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.</p><p><strong>Methods: </strong>This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.</p><p><strong>Results: </strong>Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including oneon-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.</p><p><strong>Conclusion: </strong>During the CST, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"11"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143617568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Radiotorax.es: a web-based tool for formative self-assessment in chest X-ray interpretation.","authors":"Verónica Illescas-Megías, Jorge Manuel Maqueda-Pérez, Dolores Domínguez-Pinos, Teodoro Rudolphi Solero, Francisco Sendra-Portero","doi":"10.3352/jeehp.2025.22.17","DOIUrl":"10.3352/jeehp.2025.22.17","url":null,"abstract":"<p><p>Radiotorax.es is a free, non-profit web-based tool designed to support formative self-assessment in chest X-ray interpretation. This article presents its structure, educational applications, and usage data from 11 years of continuous operation. Users complete interpretation rounds of 20 clinical cases, compare their reports with expert evaluations, and conduct a structured self-assessment. From 2011 to 2022, 14,389 users registered, and 7,726 completed at least one session. Most were medical students (75.8%), followed by residents (15.2%) and practicing physicians (9.0%). The platform has been integrated into undergraduate medical curricula and used in various educational contexts, including tutorials, peer and expert review, and longitudinal tracking. Its flexible design supports self-directed learning, instructor-guided use, and multicenter research. As a freely accessible resource based on real clinical cases, Radiotorax.es provides a scalable, realistic, and well-received training environment that promotes diagnostic skill development, reflection, and educational innovation in radiology education.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"17"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144250213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feasibility of applying computerized adaptive testing to the Clinical Medical Science Comprehensive Examination in Korea: a psychometric study.","authors":"Jeongwook Choi, Sung-Soo Jung, Eun Kwang Choi, Kyung Sik Kim, Dong Gi Seo","doi":"10.3352/jeehp.2025.22.29","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.29","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to investigate the feasibility of transitioning the Clinical Medical Science Comprehensive Examination (CMSCE) to computerized adaptive testing (CAT) in Korea, thereby providing greater opportunities for medical students to accurately compare their clinical competencies with peers nationwide and to monitor their own progress.</p><p><strong>Methods: </strong>A medical self-assessment using CAT was conducted from March to June 2023, involving 1,541 medical students who volunteered from 40 medical colleges in Korea. An item bank consisting of 1,145 items from previously administered CMSCE examinations (2019-2021) hosted by the Medical Education Assessment Corporation was established. Items were selected through 2-stage filtering, based on classical test theory (discrimination index above 0.15) and item response theory (discrimination parameter estimates above 0.6 and difficulty parameter estimates between -5 and +5). Maximum Fisher information was employed as the item selection method, and maximum likelihood estimation was used for ability estimation.</p><p><strong>Results: </strong>The CAT was successfully administered without significant issues. The stopping rule was set at a standard error of measurement of 0.25, with a maximum of 50 items for ability estimation. The mean ability score was 0.55, with an average of 28 items administered per student. Students at extreme ability levels reached the maximum of 50 items due to the limited availability of items at appropriate difficulty levels.</p><p><strong>Conclusion: </strong>The medical self-assessment CAT, the first of its kind in Korea, was successfully implemented nationwide without significant problems. These results indicate strong potential for expanding the use of CAT in medical education assessments.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"29"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145201748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of ChatGPT-4 on the French Board of Plastic Reconstructive and Aesthetic Surgery written exam: a descriptive study.","authors":"Emma Dejean-Bouyer, Anoujat Kanlagna, François Thuau, Pierre Perrot, Ugo Lancien","doi":"10.3352/jeehp.2025.22.27","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.27","url":null,"abstract":"<p><strong>Purpose: </strong>This study aims to evaluate the performance of Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) on the French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination and to assess its role as a supplementary resource in helping medical students prepare for the qualification examination in plastic surgery.</p><p><strong>Methods: </strong>This descriptive study evaluated ChatGPT-4's performance on 213 items from the October 2024 French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination. Responses were assessed for accuracy, logical reasoning, internal and external information use, and were categorized for fallacies by independent reviewers. Statistical analyses included chi-square tests and Fisher's exact test for significance.</p><p><strong>Results: </strong>ChatGPT-4 answered all questions across the 10 modules, achieving an overall accuracy rate of 77.5%. The model applied logical reasoning in 98.1% of the questions, utilized internal information in 94.4%, and incorporated external information in 91.1%.</p><p><strong>Conclusion: </strong>ChatGPT-4 performs satisfactorily on the French Board of Plastic, Reconstructive, and Aesthetic Surgery written examination. Its accuracy met the minimum passing standards for the exam. While responses generally align with expected knowledge, careful verification remains necessary, particularly for questions involving image interpretation. As artificial intelligence continues to evolve, ChatGPT-4 is expected to become an increasingly reliable tool for medical education. At present, it remains a valuable resource for assisting plastic surgery residents in their training.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"27"},"PeriodicalIF":3.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Halted medical education and medical residents’ training in Korea, journal metrics, and appreciation to reviewers and volunteers","authors":"Sun Huh","doi":"10.3352/jeehp.2025.22.1","DOIUrl":"10.3352/jeehp.2025.22.1","url":null,"abstract":"","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"1"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142980326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlation between a motion analysis method and Global Operative Assessment of Laparoscopic Skills for assessing interns' performance in a simulated peg transfer task in Jordan: a validation study","authors":"Esraa Saleh Abdelall, Shadi Mohammad Hamouri, Abdallah Fawaz Al Dwairi, Omar Mefleh Al-Araidah","doi":"10.3352/jeehp.2025.22.10","DOIUrl":"10.3352/jeehp.2025.22.10","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to validate the use of ProAnalyst (Xcitex Inc.), a program for professional motion analysts to assess the performance of surgical interns while performing the peg transfer task in a simulator box for safe practice in real minimally invasive surgery.</p><p><strong>Methods: </strong>A correlation study was conducted in a multidisciplinary skills simulation lab at the Faculty of Medicine, Jordan University of Science and Technology from October 2019 to February 2020. Forty-one interns (i.e., novices and intermediates) were recruited, and an expert surgeon participated as a reference benchmark. Videos of participants’ performance were analyzed using ProAnalyst and the Global Operative Assessment of Laparoscopic Skills (GOALS). The two sets of results were analyzed to identify correlations.</p><p><strong>Results: </strong>The motion analysis scores from Proanalyst were correlated with those from GOALS for efficiency (r=+0.38, P<0.05), autonomy (r=+0.63, P<0.01), depth perception (r=+0.43, P<0.05), dexterity (r=+0.71, P<0.001), and operation flow (r=+0.88, P<0.001). Both assessment methods differentiated the participants’ performance based on their experience level.</p><p><strong>Conclusion: </strong>The motion analysis scoring method using Proanalyst provides an objective, time-efficient, and reproducible assessment of interns’ performance, with results comparable to those obtained using GOALS. It may require initial training and set-up; however, it eliminates the need for expert surgeon judgment.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"10"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012728/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143568494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation-based teaching versus traditional small group teaching for first-year medical students among high and low scorers in respiratory physiology, India: a randomized controlled trial.","authors":"Nalini Yelahanka Channegowda, Dinker Ramanand Pai, Shivasakthy Manivasakan","doi":"10.3352/jeehp.2025.22.8","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.8","url":null,"abstract":"<p><strong>Purpose: </strong>Although it is widely utilized in clinical subjects for skill training, using simulation-based education (SBE) for teaching basic science concepts to phase I medical students or pre-clinical students is limited. Simulation-based education/teaching is preferred in cardiovascular and respiratory physiology when compared to other systems because it is easy to recreate both the normal physiological component and alterations in the simulated environment, thus a promoting deep understanding of the core concepts.</p><p><strong>Methods: </strong>A block randomized study was conducted among 107 phase 1 (first-year) medical undergraduate students at a Deemed to be University in India. Group A received SBE and Group B traditional small group teaching. The effectiveness of the teaching intervention was assessed using pre- and post-tests. Student feedback was obtained through a self administered structured questionnaire via an anonymous online survey and by in-depth interview.</p><p><strong>Results: </strong>The intervention group showed a statistically significant improvement in post-test scores compared to the control group. A sub-analysis revealed that high scorers performed better than low scorers in both groups, but the knowledge gain among low scorers was more significant in the intervention group.</p><p><strong>Conclusion: </strong>This teaching strategy offers a valuable supplement to traditional methods, fostering a deeper comprehension of clinical concepts from the outset of medical training.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"8"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012709/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144006065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}