{"title":"Applications and Implications of ChatGPT and GPT-4 in Radiology","authors":"Bo Gao, Weihua Ou","doi":"10.1002/ird3.70031","DOIUrl":null,"url":null,"abstract":"<p>Rapid advancements in artificial intelligence (AI) technology have resulted in the emergence of state-of-the-art large language models (LLMs) such as ChatGPT and GPT-4. Originally designed for natural language processing, these models are now being applied to increasingly broader domains, particularly in medical image processing [<span>1</span>]. Concurrently, the rise of such models has introduced innovative tools into medical image processing and diagnosis, profoundly shaping the future trajectory of this field. These tools not only enhance diagnostic accuracy and efficiency, but also alleviate substantial repetitive workloads for clinicians [<span>2</span>]. To address the critical needs for transparency, reproducibility, and clinical reliability in biomedical AI research, Gallifant et al. [<span>3</span>] proposed Transparent Reporting of a prediction model for Individual Prognosis or Diagnosis-LLM, an extension to the Transparent Reporting of a prediction model for Individual Prognosis or Diagnosis + artificial intelligence statement. In a domain-specific innovation, Liu et al. [<span>4</span>] developed Radiology-GPT through training and fine-tuning on a massive radiology knowledge corpus. In comparison with general-purpose LLMs, this specialized model demonstrated superior performance, validating the feasibility of creating localized generative models for specific medical specialties. Complementing this work, Yuan et al. [<span>5</span>] systematically evaluated the capabilities of the advanced multimodal model ChatGPT-4V for diagnosing brain tumors on 7T magnetic resonance imaging (MRI) data. Their study established a benchmark framework for ultra-high field imaging AI applications, propelling the progress of precision medicine and intelligent diagnostics.</p><p>This special issue on ChatGPT and GPT-4 includes four recent studies that cover applications of different LLMs, such as Meta LLaMA 3.1, ChatGPT, Claude, Gemini, and LLaVA, in various medical scenarios. Yuan et al. [<span>6</span>] deeply explored the application of the Transformer architecture in natural language processing of chest X-ray reports, finding that this architecture holds significant potential in medical text processing. However, computational efficiency and ethical compliance require optimization, and future integration with multimodal data is needed to enhance diagnostic accuracy. Lotfian et al. [<span>7</span>] evaluated the performance of the open-source model LLaMA 3.1 in thoracic imaging diagnostics using 126 multiple-choice questions. The model achieved an overall accuracy of 61.1%, with excellent performance in intensive care (90%) and terminology recognition (83.3%) but weaker results in basic imaging (40%) and lung cancer diagnosis (33.3%). This assessment demonstrates the potential of open-source models like LLaMA 3.1 while highlighting the need for domain-specific fine-tuning to improve stability as well as the need to balance open-source flexibility with proprietary model reliability in clinical applications. Sarangi et al. [<span>8</span>] compared six models, including ChatGPT, Claude, and Gemini, evaluating the accuracy and readability (Flesch–Kincaid metrics) of their generated plain language summaries for 100 radiology abstracts. They found that ChatGPT had the highest accuracy, while Claude performed best in readability. 
However, manual review is necessary to ensure accuracy and avoid misinterpretation or omission of technical terms.</p><p>Beyond concerns about the performance of these LLMs, we also care about the attitudes of healthcare providers toward them. Lecler et al. [<span>9</span>] argue that these LLMs will not replace radiologists but will serve as adjunctive tools to enhance diagnostic efficiency and accuracy, necessitating collaborative work between the two. Through interview-based surveys, He et al. [<span>10</span>] found that 67.4%–94.3% of respondents believe that AI can improve clinical accuracy, and 64% agree that it enhances efficiency. They also found that 71.1% believe AI will not significantly impact job security, with some residents more inclined to choose radiology as a career because of the availability of AI assistance. Only 55.6% of respondents were familiar with AI, whereas over 80% supported incorporating AI into resident training curricula. Meanwhile, Perera et al. [<span>11</span>] advocate integrating ChatGPT into radiology training to cultivate physicians' ability to collaborate with AI while strengthening critical thinking. To promote the adoption of LLMs by radiologists, Kim et al. [<span>12</span>] demonstrated that a 10-min structured LLM tutorial significantly improved radiology residents' performance and confidence in differential diagnosis on brain MRI. Such low-cost educational interventions could serve as a key strategy to facilitate the safe and effective application of LLMs in medicine.</p><p>LLMs, such as ChatGPT and GPT-4, are exerting a profound influence on the development of medical imaging. Particularly in the context of big data, these models and technologies are driving medical imaging's transition from an “experience-driven” to a “data-driven” paradigm [<span>13</span>]. LLMs represented by ChatGPT will usher in a new phase for medical imaging development and elevate it to new heights.</p><p><b>Bo Gao:</b> review and editing (lead). <b>Weihua Ou:</b> writing – original draft (lead).</p><p>The authors have nothing to report.</p><p>The authors have nothing to report.</p><p>This article belongs to a special issue (SI)—Application of ChatGPT/GPT-4 for Radiology. As the journal's Executive Editor-in-Chief and SI's guest editor, to minimize bias, Professor Bo Gao was excluded from all the editorial decisions related to the publication of this article. As the SI's guest editor, to minimize bias, Professor Weihua Ou was also excluded from all editorial decision-making related to the acceptance of this article for publication.</p>","PeriodicalId":73508,"journal":{"name":"iRadiology","volume":"3 4","pages":"259-260"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.70031","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"iRadiology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ird3.70031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Rapid advancements in artificial intelligence (AI) technology have resulted in the emergence of state-of-the-art large language models (LLMs) such as ChatGPT and GPT-4. Originally designed for natural language processing, these models are now being applied to increasingly broad domains, particularly medical image processing [1]. Concurrently, the rise of such models has introduced innovative tools into medical image processing and diagnosis, profoundly shaping the future trajectory of the field. These tools not only enhance diagnostic accuracy and efficiency but also relieve clinicians of substantial repetitive workloads [2]. To address the critical needs for transparency, reproducibility, and clinical reliability in biomedical AI research, Gallifant et al. [3] proposed TRIPOD-LLM (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis, extended to large language models), building on the TRIPOD + AI statement. In a domain-specific innovation, Liu et al. [4] developed Radiology-GPT by training and fine-tuning on a massive radiology knowledge corpus. Compared with general-purpose LLMs, this specialized model demonstrated superior performance, validating the feasibility of building localized generative models for specific medical specialties. Complementing this work, Yuan et al. [5] systematically evaluated the advanced multimodal model ChatGPT-4V for diagnosing brain tumors on 7T magnetic resonance imaging (MRI) data. Their study established a benchmark framework for AI applications in ultra-high-field imaging, propelling progress in precision medicine and intelligent diagnostics.
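Liu et al. [4] do not detail their training pipeline here, but domain adaptation of this kind is commonly performed with parameter-efficient fine-tuning such as LoRA. The sketch below is a hypothetical illustration using Hugging Face Transformers and PEFT; the base model, the corpus file `radiology_reports.jsonl`, and all hyperparameters are placeholders, not the actual Radiology-GPT recipe:

```python
# Hypothetical sketch of domain-specific fine-tuning with LoRA adapters.
# Base model, dataset file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # assumed base model, not from the paper
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters so only a small fraction
# of the parameters is updated during fine-tuning.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

# Placeholder corpus: one JSON object per line with a "text" field
# containing radiology report text.
data = load_dataset("json", data_files="radiology_reports.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="radiology-lora",
                           num_train_epochs=1,
                           per_device_train_batch_size=4,
                           learning_rate=2e-4),
    train_dataset=data,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The appeal of the parameter-efficient route is that only the small adapter weights are trained and stored, which is why specialty-specific models are feasible even without the compute budget of a general-purpose LLM.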
This special issue on ChatGPT and GPT-4 includes four recent studies covering applications of different LLMs, such as Meta LLaMA 3.1, ChatGPT, Claude, Gemini, and LLaVA, in various medical scenarios. Yuan et al. [6] explored in depth the application of the Transformer architecture to natural language processing of chest X-ray reports, finding that the architecture holds significant potential for medical text processing; however, its computational efficiency and ethical compliance still require optimization, and future integration with multimodal data will be needed to enhance diagnostic accuracy. Lotfian et al. [7] evaluated the performance of the open-source model LLaMA 3.1 on thoracic imaging diagnostics using 126 multiple-choice questions. The model achieved an overall accuracy of 61.1%, performing well in intensive care (90%) and terminology recognition (83.3%) but poorly in basic imaging (40%) and lung cancer diagnosis (33.3%). This assessment demonstrates the potential of open-source models like LLaMA 3.1 while highlighting the need for domain-specific fine-tuning to improve stability, and for balancing open-source flexibility against proprietary-model reliability in clinical applications. Sarangi et al. [8] compared six models, including ChatGPT, Claude, and Gemini, evaluating the accuracy and readability (Flesch–Kincaid metrics) of the plain-language summaries they generated for 100 radiology abstracts. ChatGPT achieved the highest accuracy, while Claude performed best in readability. Even so, manual review remains necessary to ensure accuracy and to avoid misinterpretation or omission of technical terms.
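For readers unfamiliar with the Flesch–Kincaid metrics used by Sarangi et al. [8], the two standard formulas depend only on sentence, word, and syllable counts. The Python sketch below is illustrative only: it uses a crude vowel-group heuristic for syllables rather than the exact tooling used in the study.

```python
import re

def _count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels; every word has >= 1 syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(_count_syllables(w) for w in words)
    wps = n_words / sentences      # average words per sentence
    spw = n_syllables / n_words    # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

ease, grade = flesch_kincaid("The MRI shows a small lesion. No mass effect is seen.")
print(f"Reading ease {ease:.1f}, grade level {grade:.1f}")
```

Higher reading-ease scores and lower grade levels indicate more accessible text, which is why these metrics suit evaluating plain-language summaries: a summary that scores at a graduate reading level has arguably failed its purpose.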
Beyond the performance of these LLMs, the attitudes of healthcare providers toward them also warrant attention. Lecler et al. [9] argue that LLMs will not replace radiologists but will serve as adjunctive tools that enhance diagnostic efficiency and accuracy, making collaboration between the two essential. Through interview-based surveys, He et al. [10] found that 67.4%–94.3% of respondents believe AI can improve clinical accuracy and 64% agree that it enhances efficiency. They also found that 71.1% believe AI will not significantly affect job security, with some residents more inclined to choose radiology as a career because of the availability of AI assistance. Only 55.6% of respondents were familiar with AI, whereas over 80% supported incorporating AI into resident training curricula. Meanwhile, Perera et al. [11] advocate integrating ChatGPT into radiology training to cultivate physicians' ability to collaborate with AI while strengthening their critical thinking. To promote the adoption of LLMs by radiologists, Kim et al. [12] demonstrated that a 10-minute structured LLM tutorial significantly improved radiology residents' performance and confidence in differential diagnosis on brain MRI. Such low-cost educational interventions could serve as a key strategy for facilitating the safe and effective application of LLMs in medicine.
LLMs such as ChatGPT and GPT-4 are exerting a profound influence on the development of medical imaging. Particularly in the context of big data, these models and technologies are driving medical imaging's transition from an "experience-driven" to a "data-driven" paradigm [13]. LLMs exemplified by ChatGPT will usher in a new phase of development for medical imaging and elevate the field to new heights.
Bo Gao: writing – review and editing (lead). Weihua Ou: writing – original draft (lead).
The authors have nothing to report.
This article belongs to a special issue (SI), Application of ChatGPT/GPT-4 for Radiology. To minimize bias, Professor Bo Gao, the journal's Executive Editor-in-Chief and a guest editor of the SI, was excluded from all editorial decisions related to the publication of this article. Likewise, Professor Weihua Ou, also a guest editor of the SI, was excluded from all editorial decision-making related to its acceptance for publication.