{"title":"Accuracy of ChatGPT in answering cardiology board-style questions.","authors":"Albert Andrew","doi":"10.3352/jeehp.2025.22.9","DOIUrl":"10.3352/jeehp.2025.22.9","url":null,"abstract":"","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"9"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143517011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of large language models on Thailand's national medical licensing examination: a cross-sectional study.","authors":"Prut Saowaprut, Romen Samuel Wabina, Junwei Yang, Lertboon Siriwat","doi":"10.3352/jeehp.2025.22.16","DOIUrl":"10.3352/jeehp.2025.22.16","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand's National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.</p><p><strong>Methods: </strong>We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.</p><p><strong>Results: </strong>All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7-89.1), surpassing Thailand's national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed hierarchically. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%).</p><p><strong>Conclusion: </strong>General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. 
However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"16"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143986836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors.","authors":"Jisoo Lee, Jieun Lee, Jeong-Ju Yoo","doi":"10.3352/jeehp.2025.22.4","DOIUrl":"10.3352/jeehp.2025.22.4","url":null,"abstract":"<p>The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. 
We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"4"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11952698/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143693856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nationwide survey on the curriculum and educational resources related to the Clinical Skills Test of the Korean Medical Licensing Examination: a cross-sectional descriptive study.","authors":"Eun-Kyung Chung, Seok Hoon Kang, Do-Hoon Kim, MinJeong Kim, Ji-Hyun Seo, Keunmi Lee, Eui-Ryoung Han","doi":"10.3352/jeehp.2025.22.11","DOIUrl":"10.3352/jeehp.2025.22.11","url":null,"abstract":"<p><strong>Purpose: </strong>The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.</p><p><strong>Methods: </strong>This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.</p><p><strong>Results: </strong>Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including one-on-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. 
They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.</p><p><strong>Conclusion: </strong>During the CST, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"11"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143617568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Halted medical education and medical residents’ training in Korea, journal metrics, and appreciation to reviewers and volunteers","authors":"Sun Huh","doi":"10.3352/jeehp.2025.22.1","DOIUrl":"10.3352/jeehp.2025.22.1","url":null,"abstract":"","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"1"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142980326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Presidential address 2025: expansion of computer-based testing from 12 to 27 health professions by 2027 and adoption of a large language model for item generation","authors":"Hyunjoo Pai","doi":"10.3352/jeehp.2025.22.7","DOIUrl":"10.3352/jeehp.2025.22.7","url":null,"abstract":"","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"7"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934035/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143013023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation-based teaching versus traditional small group teaching for first-year medical students among high and low scorers in respiratory physiology, India: a randomized controlled trial.","authors":"Nalini Yelahanka Channegowda, Dinker Ramanand Pai, Shivasakthy Manivasakan","doi":"10.3352/jeehp.2025.22.8","DOIUrl":"https://doi.org/10.3352/jeehp.2025.22.8","url":null,"abstract":"<p><strong>Purpose: </strong>Although it is widely utilized in clinical subjects for skill training, using simulation-based education (SBE) for teaching basic science concepts to phase I medical students or pre-clinical students is limited. Simulation-based education/teaching is preferred in cardiovascular and respiratory physiology when compared to other systems because it is easy to recreate both the normal physiological component and alterations in the simulated environment, thus promoting a deep understanding of the core concepts.</p><p><strong>Methods: </strong>A block randomized study was conducted among 107 phase 1 (first-year) medical undergraduate students at a Deemed to be University in India. Group A received SBE and Group B traditional small group teaching. The effectiveness of the teaching intervention was assessed using pre- and post-tests. Student feedback was obtained through a self-administered structured questionnaire via an anonymous online survey and by in-depth interview.</p><p><strong>Results: </strong>The intervention group showed a statistically significant improvement in post-test scores compared to the control group. 
A sub-analysis revealed that high scorers performed better than low scorers in both groups, but the knowledge gain among low scorers was more significant in the intervention group.</p><p><strong>Conclusion: </strong>This teaching strategy offers a valuable supplement to traditional methods, fostering a deeper comprehension of clinical concepts from the outset of medical training.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"8"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012709/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144006065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlation between a motion analysis method and Global Operative Assessment of Laparoscopic Skills for assessing interns' performance in a simulated peg transfer task in Jordan: a validation study","authors":"Esraa Saleh Abdelall, Shadi Mohammad Hamouri, Abdallah Fawaz Al Dwairi, Omar Mefleh Al-Araidah","doi":"10.3352/jeehp.2025.22.10","DOIUrl":"10.3352/jeehp.2025.22.10","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to validate the use of ProAnalyst (Xcitex Inc.), a program for professional motion analysts to assess the performance of surgical interns while performing the peg transfer task in a simulator box for safe practice in real minimally invasive surgery.</p><p><strong>Methods: </strong>A correlation study was conducted in a multidisciplinary skills simulation lab at the Faculty of Medicine, Jordan University of Science and Technology from October 2019 to February 2020. Forty-one interns (i.e., novices and intermediates) were recruited, and an expert surgeon participated as a reference benchmark. Videos of participants’ performance were analyzed using ProAnalyst and the Global Operative Assessment of Laparoscopic Skills (GOALS). The two sets of results were analyzed to identify correlations.</p><p><strong>Results: </strong>The motion analysis scores from ProAnalyst were correlated with those from GOALS for efficiency (r=+0.38, P<0.05), autonomy (r=+0.63, P<0.01), depth perception (r=+0.43, P<0.05), dexterity (r=+0.71, P<0.001), and operation flow (r=+0.88, P<0.001). Both assessment methods differentiated the participants’ performance based on their experience level.</p><p><strong>Conclusion: </strong>The motion analysis scoring method using ProAnalyst provides an objective, time-efficient, and reproducible assessment of interns’ performance, with results comparable to those obtained using GOALS. 
It may require initial training and set-up; however, it eliminates the need for expert surgeon judgment.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"10"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012728/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143568494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical effect of the Dr LEE Jong-wook Fellowship Program to empower sustainable change for the health workforce in Tanzania: a mixed-methods study","authors":"Masoud Dauda, Swabaha Aidarus Yusuph, Harouni Yasini, Issa Mmbaga, Perpetua Mwambinngu, Hansol Park, Gyeongbae Seo, Kyoung Kyun Oh","doi":"10.3352/jeehp.2025.22.6","DOIUrl":"10.3352/jeehp.2025.22.6","url":null,"abstract":"<p><strong>Purpose: </strong>This study evaluated the Dr LEE Jong-wook Fellowship Program’s impact on Tanzania’s health workforce, focusing on relevance, effectiveness, efficiency, impact, and sustainability in addressing healthcare gaps.</p><p><strong>Methods: </strong>A mixed-methods research design was employed. Data were collected from 97 out of 140 alumni through an online survey, 35 in-depth interviews, and one focus group discussion. The study was conducted from November to December 2023 and included alumni from 2009 to 2022. Measurement instruments included structured questionnaires for quantitative data and semi-structured guides for qualitative data. Quantitative analysis involved descriptive and inferential statistics (Spearman’s rank correlation, non-parametric tests) using Python ver. 3.11.0 and Stata ver. 14.0. Thematic analysis was employed to analyze qualitative data using NVivo ver. 12.0.</p><p><strong>Results: </strong>Findings indicated high relevance (mean=91.6, standard deviation [SD]=8.6), effectiveness (mean=86.1, SD=11.2), efficiency (mean=82.7, SD=10.2), and impact (mean=87.7, SD=9.9), with improved skills, confidence, and institutional service quality. However, sustainability had a lower score (mean=58.0, SD=11.1), reflecting challenges in follow-up support and resource allocation. Effectiveness strongly correlated with impact (ρ=0.746, P<0.001). The qualitative findings revealed that participants valued tailored training but highlighted barriers, such as language challenges and insufficient practical components. 
Alumni-led initiatives contributed to knowledge sharing, but limited resources constrained sustainability.</p><p><strong>Conclusion: </strong>The Fellowship Program enhanced Tanzania’s health workforce capacity, but it requires localized curricula and strengthened alumni networks for sustainability. These findings provide actionable insights for improving similar programs globally, confirming the hypothesis that tailored training positively influences workforce and institutional outcomes.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"6"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003955/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143013022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empathy and tolerance of ambiguity in medical students and doctors participating in art-based observational training at the Rijksmuseum in Amsterdam, the Netherlands: a before-and-after study","authors":"Stella Anna Bult, Thomas van Gulik","doi":"10.3352/jeehp.2025.22.3","DOIUrl":"10.3352/jeehp.2025.22.3","url":null,"abstract":"<p><strong>Purpose: </strong>This research presents an experimental study using validated questionnaires to quantitatively assess the outcomes of art-based observational training in medical students, residents, and specialists. The study tested the hypothesis that art-based observational training would lead to measurable effects on judgement skills (tolerance of ambiguity) and empathy in medical students and doctors.</p><p><strong>Methods: </strong>An experimental cohort study with pre- and post-intervention assessments was conducted using validated questionnaires and qualitative evaluation forms to examine the outcomes of art-based observational training in medical students and doctors. Between December 2023 and June 2024, 15 art courses were conducted in the Rijksmuseum in Amsterdam. Participants were assessed on empathy using the Jefferson Scale of Empathy (JSE) and tolerance of ambiguity using the Tolerance of Ambiguity in Medical Students and Doctors (TAMSAD) scale.</p><p><strong>Results: </strong>In total, 91 participants were included; 29 participants completed the JSE and 62 completed the TAMSAD scales. The results showed statistically significant post-test increases for mean JSE and TAMSAD scores (3.71 points for the JSE, ranging from 20 to 140, and 1.86 points for the TAMSAD, ranging from 0 to 100). The qualitative findings were predominantly positive.</p><p><strong>Conclusion: </strong>The results suggest that incorporating art-based observational training in medical education improves empathy and tolerance of ambiguity. 
This study highlights the importance of art-based observational training in medical education in the professional development of medical students and doctors.</p>","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"22 ","pages":"3"},"PeriodicalIF":9.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142980319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}