Arum Choi, Hyun Gi Kim, Moon Hyung Choi, Shakthi Kumaran Ramasamy, Youme Kim, Seung Eun Jung
{"title":"Performance of GPT-4 Turbo and GPT-4o in Korean Society of Radiology In-Training Examinations.","authors":"Arum Choi, Hyun Gi Kim, Moon Hyung Choi, Shakthi Kumaran Ramasamy, Youme Kim, Seung Eun Jung","doi":"10.3348/kjr.2024.1096","DOIUrl":"10.3348/kjr.2024.1096","url":null,"abstract":"<p><strong>Objective: </strong>Despite the potential of large language models for radiology training, their ability to handle image-based radiological questions remains poorly understood. This study aimed to evaluate the performance of the GPT-4 Turbo and GPT-4o in radiology resident examinations, to analyze differences across question types, and to compare their results with those of residents at different levels.</p><p><strong>Materials and methods: </strong>A total of 776 multiple-choice questions from the Korean Society of Radiology In-Training Examinations were used, forming two question sets: one originally written in Korean and the other translated into English. We evaluated the performance of GPT-4 Turbo (gpt-4-turbo-2024-04-09) and GPT-4o (gpt-4o-2024-11-20) on these questions with the temperature set to zero, determining the accuracy based on the majority vote from five independent trials. We analyzed their results using the question type (text-only vs. image-based) and benchmarked them against nationwide radiology residents' performance. The impact of the input language (Korean or English) on model performance was examined.</p><p><strong>Results: </strong>GPT-4o outperformed GPT-4 Turbo for both image-based (48.2% vs. 41.8%, <i>P</i> = 0.002) and text-only questions (77.9% vs. 69.0%, <i>P</i> = 0.031). On image-based questions, GPT-4 Turbo and GPT-4o showed comparable performance to that of 1st-year residents (41.8% and 48.2%, respectively, vs. 43.3%, <i>P</i> = 0.608 and 0.079, respectively) but lower performance than that of 2nd- to 4th-year residents (vs. 56.0%-63.9%, all <i>P</i> ≤ 0.005). 
For text-only questions, GPT-4 Turbo and GPT-4o performed better than residents across all years (69.0% and 77.9%, respectively, vs. 44.7%-57.5%, all <i>P</i> ≤ 0.039). Performance on the English- and Korean-version questions showed no significant differences for either model (all <i>P</i> ≥ 0.275).</p><p><strong>Conclusion: </strong>GPT-4o outperformed the GPT-4 Turbo in all question types. On image-based questions, both models' performance matched that of 1st-year residents but was lower than that of higher-year residents. Both models demonstrated superior performance compared to residents for text-only questions. The models showed consistent performances across English and Korean inputs.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"524-531"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144024638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seong Ho Park, Geraldine Dean, Ernest Montañà Ortiz, Joon-Il Choi
{"title":"Overview of South Korean Guidelines for Approval of Large Language or Multimodal Models as Medical Devices: Key Features and Areas for Improvement.","authors":"Seong Ho Park, Geraldine Dean, Ernest Montañà Ortiz, Joon-Il Choi","doi":"10.3348/kjr.2025.0257","DOIUrl":"10.3348/kjr.2025.0257","url":null,"abstract":"","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"519-523"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144032417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bio Joo, Hyung Jun Park, Mina Park, Sang Hyun Suh, Sung Jun Ahn
{"title":"Response to \"MRI Morphometry of the Spinal Cord Depends on Several Factors That Must Be Taken Into Account When Selecting Healthy Volunteers\".","authors":"Bio Joo, Hyung Jun Park, Mina Park, Sang Hyun Suh, Sung Jun Ahn","doi":"10.3348/kjr.2025.0190","DOIUrl":"10.3348/kjr.2025.0190","url":null,"abstract":"","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"622-623"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143764247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leehi Joo, Jung Hwan Baek, Jungbok Lee, Dong Eun Song, Sae Rom Chung, Young Jun Choi, Jeong Hyun Lee
{"title":"Superior Diagnostic Yield of Core Needle Biopsy Over Fine Needle Aspiration in Diagnosing Follicular-Patterned Neoplasms: A Multicenter Study Focusing on Bethesda IV Results.","authors":"Leehi Joo, Jung Hwan Baek, Jungbok Lee, Dong Eun Song, Sae Rom Chung, Young Jun Choi, Jeong Hyun Lee","doi":"10.3348/kjr.2024.1022","DOIUrl":"10.3348/kjr.2024.1022","url":null,"abstract":"<p><strong>Objective: </strong>To compare the diagnostic outcomes of core needle biopsy (CNB) and fine-needle aspiration (FNA) using Bethesda IV as a test-positive criterion for diagnosing follicular-patterned neoplasms in a large multicenter cohort.</p><p><strong>Materials and methods: </strong>This retrospective study included 5463 thyroid nodules ≥1 cm from 4883 patients (4019 females, 864 males; mean age 53.8 years) that underwent FNA or CNB across 26 hospitals in Korea between June and September 2015. The final diagnoses of nodules classified as Bethesda IV (follicular neoplasm) on biopsy were confirmed by surgical pathology. The primary study outcome was the diagnostic yield, defined as the proportion of nodules with follicular-patterned neoplasms confirmed at surgery after receiving Bethesda IV results on biopsy (FNA or CNB), among all nodules that underwent biopsy. Secondary outcomes included false referral rate (FRR) and positive predictive value (PPV). All nodules were analyzed before matching (823 and 4640 nodules for CNB and FNA, respectively) and after nodule matching in a 1:2 ratio (799 and 1571 nodules, respectively) according to age, sex, nodule size, and Korean Thyroid Imaging Reporting and Data System (K-TIRADS) category. Additionally, the diagnostic yields of various histological subtypes of follicular-patterned neoplasms and nodule subgroups were analyzed.</p><p><strong>Results: </strong>CNB demonstrated a significantly higher diagnostic yield than FNA both before (9.0% vs. 0.5%; <i>P</i> < 0.001) and after matching (9.0% vs. 0.6%; <i>P</i> < 0.001). 
CNB consistently had higher diagnostic yields than FNA for most histological subtypes and all subgroups. FRR was not significantly different between the CNB and FNA groups after matching (0.4% vs. 0.1%; <i>P</i> = 0.337). The PPV was consistently greater than 90% for both methods, with no significant difference.</p><p><strong>Conclusion: </strong>CNB had a higher diagnostic yield than FNA for follicular-patterned neoplasms, with no significant difference in FRR using Bethesda IV as the test-positive criterion.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"604-615"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144033586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Imaging of Peripheral Arthritis: Special Focus on Differences in Inflammatory Lesions Between Rheumatoid Arthritis and Psoriatic Arthritis.","authors":"Takeshi Fukuda, Akira Ogihara, Shunsuke Kisaki, Mami Momose, Yoshinori Umezawa, Akihiko Asahina, Hiroya Ojiri","doi":"10.3348/kjr.2025.0036","DOIUrl":"10.3348/kjr.2025.0036","url":null,"abstract":"<p><p>Differentiating rheumatoid arthritis (RA) and psoriatic arthritis (PsA) remains challenging, particularly when clinical and serological markers are inconclusive. Imaging provides critical insights, with MRI and dual-energy CT iodine maps highlighting key distinctions. Both conditions share inflammatory features such as capsular synovitis, tenosynovitis, and bone marrow edema. However, periarticular inflammation is often a strong indicator of PsA. This reflects their differing inflammatory targets: RA primarily involves the synovium, whereas PsA targets the enthesis. This distinction contributes to the broader bone marrow edema seen in PsA and explains inflammatory changes at the distal interphalangeal joint and dactylitis, which are characteristic of PsA but not RA. Recognizing these inflammatory patterns and distributions is essential for accurate diagnosis and treatment guidance.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"569-580"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144033594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Myoung Kyoung Kim, Min Su Park, Min Gyu Go, Jeong Eon Lee, Jong Han Yu, Boo-Kyung Han, Eun Young Ko, Ji Soo Choi, Jeongmin Lee, Haejung Kim, Yeon Hee Park, Eun Sook Ko
{"title":"Surveillance Outcomes by Imaging Methods in the First 5 Years After Breast Cancer Surgery.","authors":"Myoung Kyoung Kim, Min Su Park, Min Gyu Go, Jeong Eon Lee, Jong Han Yu, Boo-Kyung Han, Eun Young Ko, Ji Soo Choi, Jeongmin Lee, Haejung Kim, Yeon Hee Park, Eun Sook Ko","doi":"10.3348/kjr.2024.1101","DOIUrl":"10.3348/kjr.2024.1101","url":null,"abstract":"<p><strong>Objective: </strong>To compare the outcomes of imaging methods (mammography alone, ultrasound [US] alone, mammography combined with US, and magnetic resonance imaging [MRI]-based examination) for surveillance during the first 5 years after breast cancer surgery.</p><p><strong>Materials and methods: </strong>This retrospective cohort study analyzed the medical records of patients who underwent breast cancer surgery at a single institution between January 2011 and December 2015. Imaging surveillance was performed at 6-month or 1-year intervals during the first 5 years.</p><p><strong>Results: </strong>A total of 6371 women (median age, 49 years; age range, 20-90 years) underwent 28199 mammograms, 42759 US, and 2619 MRI examinations. Of 172 second breast cancer diagnoses, 19 (11.0%) were interval cancers. Mammography combined with US demonstrated a higher cancer detection rate (CDR) than mammography alone (odds ratio [OR] = 3.31, 95% confidence interval [CI]: 1.52-8.96, <i>P</i> = 0.009) and US alone (OR = 2.80, 95% CI: 1.71-4.65, <i>P</i> < 0.001), whereas the difference from MRI-based examinations was not statistically significant (OR = 0.89, 95% CI: 0.49-1.74, <i>P</i> > 0.999). A statistically significant interaction was observed between the mammographic breast density (MBD) and CDR of the imaging methods (<i>P</i> for interaction = 0.003).</p><p><strong>Conclusion: </strong>The CDR of surveillance mammography combined with US was comparable with that of MRI-based examinations in an intensive surveillance setting. 
Considering the significant interaction between MBD and the CDR, a tailored approach for surveillance based on breast density is warranted.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"532-545"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144033541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MRI Morphometry of the Spinal Cord Depends on Several Factors That Must Be Taken Into Account When Selecting Healthy Volunteers.","authors":"Josef Finsterer","doi":"10.3348/kjr.2025.0153","DOIUrl":"10.3348/kjr.2025.0153","url":null,"abstract":"","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"620-621"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143764243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hokun Kim, Bohyun Kim, Moon Hyung Choi, Joon-Il Choi, Soon Nam Oh, Sung Eun Rha
{"title":"Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.","authors":"Hokun Kim, Bohyun Kim, Moon Hyung Choi, Joon-Il Choi, Soon Nam Oh, Sung Eun Rha","doi":"10.3348/kjr.2024.1228","DOIUrl":"10.3348/kjr.2024.1228","url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDAC resectability.</p><p><strong>Materials and methods: </strong>This retrospective study included consecutive free-text reports of pancreas-protocol CT for staging PDAC, from two institutions, written in English or Korean from January 2021 to December 2023. Both the GPT-4 Turbo and GPT-4o models were provided prompts along with the free-text reports via an application programming interface and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network guidelines version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performances of the GPT-4 Turbo and GPT-4o models in the two tasks were evaluated using 115 reports from Institution A. Results were compared with a reference standard that was manually derived by an abdominal radiologist. Each report was consecutively processed three times, with the most frequent response selected as the final output. Error analysis was guided by the decision rationale provided by the models.</p><p><strong>Results: </strong>Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; <i>P</i> = 0.923). 
In the resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; <i>P</i> = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), which was primarily attributed to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).</p><p><strong>Conclusion: </strong>Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs on PDACs from mixed-language narrative reports. However, oversight by human radiologists is essential for determining resectability based on CT findings.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"557-568"},"PeriodicalIF":4.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144007794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Response to \"2025 Korean Society of Abdominal Radiology Recommendations on Gallbladder Polyps and Gallbladder Wall Thickening Warrant Further Investigation and Clarification\".","authors":"Jeong Hee Yoon","doi":"10.3348/kjr.2025.0216","DOIUrl":"10.3348/kjr.2025.0216","url":null,"abstract":"","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"516-517"},"PeriodicalIF":4.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055274/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143764245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}