{"title":"Applications and Implications of ChatGPT and GPT-4 in Radiology","authors":"Bo Gao, Weihua Ou","doi":"10.1002/ird3.70031","DOIUrl":null,"url":null,"abstract":"<p>Rapid advancements in artificial intelligence (AI) technology have resulted in the emergence of state-of-the-art large language models (LLMs) such as ChatGPT and GPT-4. Originally designed for natural language processing, these models are now being applied to increasingly broader domains, particularly in medical image processing [<span>1</span>]. Concurrently, the rise of such models has introduced innovative tools into medical image processing and diagnosis, profoundly shaping the future trajectory of this field. These tools not only enhance diagnostic accuracy and efficiency, but also alleviate substantial repetitive workloads for clinicians [<span>2</span>]. To address the critical needs for transparency, reproducibility, and clinical reliability in biomedical AI research, Gallifant et al. [<span>3</span>] proposed Transparent Reporting of a prediction model for Individual Prognosis or Diagnosis-LLM, an extension to the Transparent Reporting of a prediction model for Individual Prognosis or Diagnosis + artificial intelligence statement. In a domain-specific innovation, Liu et al. [<span>4</span>] developed Radiology-GPT through training and fine-tuning on a massive radiology knowledge corpus. In comparison with general-purpose LLMs, this specialized model demonstrated superior performance, validating the feasibility of creating localized generative models for specific medical specialties. Complementing this work, Yuan et al. [<span>5</span>] systematically evaluated the capabilities of the advanced multimodal model ChatGPT-4V for diagnosing brain tumors on 7T magnetic resonance imaging (MRI) data. Their study established a benchmark framework for ultra-high field imaging AI applications, propelling the progress of precision medicine and intelligent diagnostics.</p><p>This special issue on ChatGPT and GPT-4 includes four recent studies that cover applications of different LLMs, such as Meta LLaMA 3.1, ChatGPT, Claude, Gemini, and LLaVA, in various medical scenarios. Yuan et al. [<span>6</span>] deeply explored the application of the Transformer architecture in natural language processing of chest X-ray reports, finding that this architecture holds significant potential in medical text processing. However, computational efficiency and ethical compliance require optimization, and future integration with multimodal data is needed to enhance diagnostic accuracy. Lotfian et al. [<span>7</span>] evaluated the performance of the open-source model LLaMA 3.1 in thoracic imaging diagnostics using 126 multiple-choice questions. The model achieved an overall accuracy of 61.1%, with excellent performance in intensive care (90%) and terminology recognition (83.3%) but weaker results in basic imaging (40%) and lung cancer diagnosis (33.3%). This assessment demonstrates the potential of open-source models like LLaMA 3.1 while highlighting the need for domain-specific fine-tuning to improve stability as well as the need to balance open-source flexibility with proprietary model reliability in clinical applications. Sarangi et al. [<span>8</span>] compared six models, including ChatGPT, Claude, and Gemini, evaluating the accuracy and readability (Flesch–Kincaid metrics) of their generated plain language summaries for 100 radiology abstracts. They found that ChatGPT had the highest accuracy, while Claude performed best in readability. 
However, manual review is necessary to ensure accuracy and avoid misinterpretation or omission of technical terms.</p><p>Beyond concerns about the performance of these LLMs, we also care about the attitudes of healthcare providers toward them. Lecler et al. [<span>9</span>] argue that these LLMs will not replace radiologists but will serve as adjunctive tools to enhance diagnostic efficiency and accuracy, necessitating collaborative work between the two. Through interview-based surveys, He et al. [<span>10</span>] found that 67.4%–94.3% of respondents believe that AI can improve clinical accuracy, and 64% agree that it enhances efficiency. They also found that 71.1% believe AI will not significantly impact job security, with some residents more inclined to choose radiology as a career because of the availability of AI assistance. Only 55.6% of respondents were familiar with AI, whereas over 80% supported incorporating AI into resident training curricula. Meanwhile, Perera et al. [<span>11</span>] advocate integrating ChatGPT into radiology training to cultivate physicians' ability to collaborate with AI while strengthening critical thinking. To promote the adoption of LLMs by radiologists, Kim et al. [<span>12</span>] demonstrated that a 10-min structured LLM tutorial significantly improved radiology residents' performance and confidence in differential diagnosis on brain MRI. Such low-cost educational interventions could serve as a key strategy to facilitate the safe and effective application of LLMs in medicine.</p><p>LLMs, such as ChatGPT and GPT-4, are exerting a profound influence on the development of medical imaging. Particularly in the context of big data, these models and technologies are driving medical imaging's transition from an “experience-driven” to a “data-driven” paradigm [<span>13</span>]. LLMs represented by ChatGPT will usher in a new phase for medical imaging development and elevate it to new heights.</p><p><b>Bo Gao:</b> review and editing (lead). <b>Weihua Ou:</b> writing – original draft (lead).</p><p>The authors have nothing to report.</p><p>The authors have nothing to report.</p><p>This article belongs to a special issue (SI)—Application of ChatGPT/GPT-4 for Radiology. As the journal's Executive Editor-in-Chief and SI's guest editor, to minimize bias, Professor Bo Gao was excluded from all the editorial decisions related to the publication of this article. As the SI's guest editor, to minimize bias, Professor Weihua Ou was also excluded from all editorial decision-making related to the acceptance of this article for publication.</p>","PeriodicalId":73508,"journal":{"name":"iRadiology","volume":"3 4","pages":"259-260"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.70031","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"iRadiology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ird3.70031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Rapid advancements in artificial intelligence (AI) technology have resulted in the emergence of state-of-the-art large language models (LLMs) such as ChatGPT and GPT-4. Originally designed for natural language processing, these models are now being applied to increasingly broad domains, particularly medical image processing [1]. Concurrently, the rise of such models has introduced innovative tools into medical image processing and diagnosis, profoundly shaping the future trajectory of the field. These tools not only enhance diagnostic accuracy and efficiency but also relieve clinicians of substantial repetitive workloads [2]. To address the critical needs for transparency, reproducibility, and clinical reliability in biomedical AI research, Gallifant et al. [3] proposed TRIPOD-LLM (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis, extended to large language models), building on the TRIPOD + AI statement. In a domain-specific innovation, Liu et al. [4] developed Radiology-GPT by training and fine-tuning on a massive radiology knowledge corpus. Compared with general-purpose LLMs, this specialized model demonstrated superior performance, validating the feasibility of building localized generative models for specific medical specialties. Complementing this work, Yuan et al. [5] systematically evaluated the advanced multimodal model ChatGPT-4V for diagnosing brain tumors on 7T magnetic resonance imaging (MRI) data. Their study established a benchmark framework for AI applications in ultra-high-field imaging, propelling progress in precision medicine and intelligent diagnostics.
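Liu et al. [4] do not detail their training pipeline here, but domain adaptation of this kind is commonly performed with parameter-efficient fine-tuning such as LoRA. The sketch below is a hypothetical illustration using Hugging Face Transformers and PEFT; the base model, the corpus file `radiology_reports.jsonl`, and all hyperparameters are placeholders, not the actual Radiology-GPT recipe:

```python
# Hypothetical sketch of domain-specific fine-tuning with LoRA adapters.
# Base model, dataset file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # assumed base model, not from the paper
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters so only a small fraction
# of the parameters is updated during fine-tuning.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

# Placeholder corpus: one JSON object per line with a "text" field
# containing radiology report text.
data = load_dataset("json", data_files="radiology_reports.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="radiology-lora",
                           num_train_epochs=1,
                           per_device_train_batch_size=4,
                           learning_rate=2e-4),
    train_dataset=data,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The appeal of the parameter-efficient route is that only the small adapter weights are trained and stored, which is why specialty-specific models are feasible even without the compute budget of a general-purpose LLM.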
This special issue on ChatGPT and GPT-4 includes four recent studies covering applications of different LLMs, such as Meta LLaMA 3.1, ChatGPT, Claude, Gemini, and LLaVA, in various medical scenarios. Yuan et al. [6] explored in depth the application of the Transformer architecture to natural language processing of chest X-ray reports, finding that the architecture holds significant potential for medical text processing; however, its computational efficiency and ethical compliance still require optimization, and future integration with multimodal data will be needed to enhance diagnostic accuracy. Lotfian et al. [7] evaluated the performance of the open-source model LLaMA 3.1 on thoracic imaging diagnostics using 126 multiple-choice questions. The model achieved an overall accuracy of 61.1%, performing well in intensive care (90%) and terminology recognition (83.3%) but poorly in basic imaging (40%) and lung cancer diagnosis (33.3%). This assessment demonstrates the potential of open-source models like LLaMA 3.1 while highlighting the need for domain-specific fine-tuning to improve stability, and for balancing open-source flexibility against proprietary-model reliability in clinical applications. Sarangi et al. [8] compared six models, including ChatGPT, Claude, and Gemini, evaluating the accuracy and readability (Flesch–Kincaid metrics) of the plain-language summaries they generated for 100 radiology abstracts. ChatGPT achieved the highest accuracy, while Claude performed best in readability. Even so, manual review remains necessary to ensure accuracy and to avoid misinterpretation or omission of technical terms.
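For readers unfamiliar with the Flesch–Kincaid metrics used by Sarangi et al. [8], the two standard formulas depend only on sentence, word, and syllable counts. The Python sketch below is illustrative only: it uses a crude vowel-group heuristic for syllables rather than the exact tooling used in the study.

```python
import re

def _count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels; every word has >= 1 syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(_count_syllables(w) for w in words)
    wps = n_words / sentences      # average words per sentence
    spw = n_syllables / n_words    # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

ease, grade = flesch_kincaid("The MRI shows a small lesion. No mass effect is seen.")
print(f"Reading ease {ease:.1f}, grade level {grade:.1f}")
```

Higher reading-ease scores and lower grade levels indicate more accessible text, which is why these metrics suit evaluating plain-language summaries: a summary that scores at a graduate reading level has arguably failed its purpose.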
Beyond the performance of these LLMs, the attitudes of healthcare providers toward them also warrant attention. Lecler et al. [9] argue that LLMs will not replace radiologists but will serve as adjunctive tools that enhance diagnostic efficiency and accuracy, making collaboration between the two essential. Through interview-based surveys, He et al. [10] found that 67.4%–94.3% of respondents believe AI can improve clinical accuracy and 64% agree that it enhances efficiency. They also found that 71.1% believe AI will not significantly affect job security, with some residents more inclined to choose radiology as a career because of the availability of AI assistance. Only 55.6% of respondents were familiar with AI, whereas over 80% supported incorporating AI into resident training curricula. Meanwhile, Perera et al. [11] advocate integrating ChatGPT into radiology training to cultivate physicians' ability to collaborate with AI while strengthening their critical thinking. To promote the adoption of LLMs by radiologists, Kim et al. [12] demonstrated that a 10-minute structured LLM tutorial significantly improved radiology residents' performance and confidence in differential diagnosis on brain MRI. Such low-cost educational interventions could serve as a key strategy for facilitating the safe and effective application of LLMs in medicine.
LLMs such as ChatGPT and GPT-4 are exerting a profound influence on the development of medical imaging. Particularly in the context of big data, these models and technologies are driving medical imaging's transition from an "experience-driven" to a "data-driven" paradigm [13]. LLMs exemplified by ChatGPT will usher in a new phase of development for medical imaging and elevate the field to new heights.
Bo Gao: writing – review and editing (lead). Weihua Ou: writing – original draft (lead).
The authors have nothing to report.
This article belongs to a special issue (SI), Application of ChatGPT/GPT-4 for Radiology. To minimize bias, Professor Bo Gao, the journal's Executive Editor-in-Chief and a guest editor of the SI, was excluded from all editorial decisions related to the publication of this article. Likewise, Professor Weihua Ou, also a guest editor of the SI, was excluded from all editorial decision-making related to its acceptance for publication.