评估微软必应与 ChatGPT-4 对腹部计算机断层扫描和磁共振图像的评估。

IF 1.7 4区医学 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Diagnostic and interventional radiology Pub Date : 2025-04-28 Epub Date: 2024-08-19 DOI:10.4274/dir.2024.232680

Alperen Elek, Duygu Doğa Ekizalioğlu, Ezgi Güler

{"title":"评估微软必应与 ChatGPT-4 对腹部计算机断层扫描和磁共振图像的评估。","authors":"Alperen Elek, Duygu Doğa Ekizalioğlu, Ezgi Güler","doi":"10.4274/dir.2024.232680","DOIUrl":null,"url":null,"abstract":"Purpose: To evaluate the performance of Microsoft Bing with ChatGPT-4 technology in analyzing abdominal computed tomography (CT) and magnetic resonance images (MRI).Methods: A comparative and descriptive analysis was conducted using the institutional picture archiving and communication systems. A total of 80 abdominal images (44 CT, 36 MRI) that showed various entities affecting the abdominal structures were included. Microsoft Bing's interpretations were compared with the impressions of radiologists in terms of recognition of the imaging modality, identification of the imaging planes (axial, coronal, and sagittal), sequences (in the case of MRI), contrast media administration, correct identification of the anatomical region depicted in the image, and detection of abnormalities.Results: Microsoft Bing detected that the images were CT scans with 95.4% accuracy (42/44) and that the images were MRI scans with 86.1% accuracy (31/36). However, it failed to detect one CT image (2.3%) and misidentified another CT image as an MRI (2.3%). On the other hand, it also misidentified four MRI as CT images (11.1%) and one as an X-ray (2.7%). Bing achieved an 83.75% success rate in correctly identifying abdominal regions, with 90% accuracy for CT scans (40/44) and 77.7% for MRI scans (28/36). Concerning the identification of imaging planes, Bing achieved a success rate of 95.4% for CT images and 83.3% for MRI. Regarding the identification of MRI sequences (T1-weighted and T2-weighted), the success rate was 68.75%. In the identification of the use of contrast media for CT scans, the success rate was 64.2%. Bing detected abnormalities in 35% of the images but achieved a correct interpretation rate of 10.7% for the definite diagnosis.Conclusion: While Microsoft Bing, leveraging ChatGPT-4 technology, demonstrates proficiency in basic task identification on abdominal CT and MRI, its inability to reliably interpret abnormalities highlights the need for continued refinement to enhance its clinical applicability.Clinical significance: The contribution of large language models (LLMs) to the diagnostic process in radiology is still being explored. However, with a comprehensive understanding of their capabilities and limitations, LLMs can significantly support radiologists during diagnosis and improve the overall efficiency of abdominal radiology practices. Acknowledging the limitations of current studies related to ChatGPT in this field, our work provides a foundation for future clinical research, paving the way for more integrated and effective diagnostic tools.","PeriodicalId":11341,"journal":{"name":"Diagnostic and interventional radiology","volume":" ","pages":"196-205"},"PeriodicalIF":1.7000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057540/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluating Microsoft Bing with ChatGPT-4 for the assessment of abdominal computed tomography and magnetic resonance images.\",\"authors\":\"Alperen Elek, Duygu Doğa Ekizalioğlu, Ezgi Güler\",\"doi\":\"10.4274/dir.2024.232680\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Purpose: To evaluate the performance of Microsoft Bing with ChatGPT-4 technology in analyzing abdominal computed tomography (CT) and magnetic resonance images (MRI).Methods: A comparative and descriptive analysis was conducted using the institutional picture archiving and communication systems. A total of 80 abdominal images (44 CT, 36 MRI) that showed various entities affecting the abdominal structures were included. Microsoft Bing's interpretations were compared with the impressions of radiologists in terms of recognition of the imaging modality, identification of the imaging planes (axial, coronal, and sagittal), sequences (in the case of MRI), contrast media administration, correct identification of the anatomical region depicted in the image, and detection of abnormalities.Results: Microsoft Bing detected that the images were CT scans with 95.4% accuracy (42/44) and that the images were MRI scans with 86.1% accuracy (31/36). However, it failed to detect one CT image (2.3%) and misidentified another CT image as an MRI (2.3%). On the other hand, it also misidentified four MRI as CT images (11.1%) and one as an X-ray (2.7%). Bing achieved an 83.75% success rate in correctly identifying abdominal regions, with 90% accuracy for CT scans (40/44) and 77.7% for MRI scans (28/36). Concerning the identification of imaging planes, Bing achieved a success rate of 95.4% for CT images and 83.3% for MRI. Regarding the identification of MRI sequences (T1-weighted and T2-weighted), the success rate was 68.75%. In the identification of the use of contrast media for CT scans, the success rate was 64.2%. Bing detected abnormalities in 35% of the images but achieved a correct interpretation rate of 10.7% for the definite diagnosis.Conclusion: While Microsoft Bing, leveraging ChatGPT-4 technology, demonstrates proficiency in basic task identification on abdominal CT and MRI, its inability to reliably interpret abnormalities highlights the need for continued refinement to enhance its clinical applicability.Clinical significance: The contribution of large language models (LLMs) to the diagnostic process in radiology is still being explored. However, with a comprehensive understanding of their capabilities and limitations, LLMs can significantly support radiologists during diagnosis and improve the overall efficiency of abdominal radiology practices. Acknowledging the limitations of current studies related to ChatGPT in this field, our work provides a foundation for future clinical research, paving the way for more integrated and effective diagnostic tools.\",\"PeriodicalId\":11341,\"journal\":{\"name\":\"Diagnostic and interventional radiology\",\"volume\":\" \",\"pages\":\"196-205\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057540/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Diagnostic and interventional radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.4274/dir.2024.232680\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/8/19 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diagnostic and interventional radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4274/dir.2024.232680","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/19 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

摘要

目的：评估微软必应与 ChatGPT-4 技术在分析腹部计算机断层扫描（CT）和磁共振成像（MRI）方面的性能：方法：利用机构图片存档和通信系统进行比较和描述性分析。共纳入了 80 张显示影响腹部结构的各种实体的腹部图像（44 张 CT，36 张 MRI）。微软必应的判读结果与放射科医生的判读结果进行了比较，包括成像方式的识别、成像平面（轴位、冠状位和矢状位）的识别、序列（如果是核磁共振成像）、造影剂的使用、图像中描述的解剖区域的正确识别以及异常的检测：微软必应检测到图像是 CT 扫描的准确率为 95.4%（42/44），检测到图像是 MRI 扫描的准确率为 86.1%（31/36）。不过，它未能检测出一张 CT 图像（2.3%），并将另一张 CT 图像误认为核磁共振图像（2.3%）。另一方面，它还将四张核磁共振图像误认为 CT 图像（11.1%），将一张误认为 X 光（2.7%）。Bing 在正确识别腹部区域方面的成功率为 83.75%，其中 CT 扫描的准确率为 90%（40/44），MRI 扫描的准确率为 77.7%（28/36）。在成像平面的识别方面，Bing 对 CT 图像的识别成功率为 95.4%，对 MRI 图像的识别成功率为 83.3%。在核磁共振成像序列（T1 加权和 T2 加权）的识别方面，成功率为 68.75%。在识别 CT 扫描是否使用造影剂方面，成功率为 64.2%。必应在 35% 的图像中发现了异常，但明确诊断的正确解释率仅为 10.7%：结论：虽然微软必应利用 ChatGPT-4 技术在腹部 CT 和 MRI 的基本任务识别方面表现出色，但它无法可靠地解释异常情况，这突出表明需要不断改进以提高其临床适用性：临床意义：大型语言模型（LLM）对放射学诊断过程的贡献仍在探索之中。然而，只要全面了解其能力和局限性，大语言模型就能在诊断过程中为放射科医生提供重要支持，并提高腹部放射学实践的整体效率。我们的工作承认目前与 ChatGPT 相关的研究在这一领域存在局限性，但我们的工作为未来的临床研究奠定了基础，为更综合、更有效的诊断工具铺平了道路。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Evaluating Microsoft Bing with ChatGPT-4 for the assessment of abdominal computed tomography and magnetic resonance images.

查看原文本刊更多论文

Evaluating Microsoft Bing with ChatGPT-4 for the assessment of abdominal computed tomography and magnetic resonance images.

Purpose: To evaluate the performance of Microsoft Bing with ChatGPT-4 technology in analyzing abdominal computed tomography (CT) and magnetic resonance images (MRI).

Methods: A comparative and descriptive analysis was conducted using the institutional picture archiving and communication systems. A total of 80 abdominal images (44 CT, 36 MRI) that showed various entities affecting the abdominal structures were included. Microsoft Bing's interpretations were compared with the impressions of radiologists in terms of recognition of the imaging modality, identification of the imaging planes (axial, coronal, and sagittal), sequences (in the case of MRI), contrast media administration, correct identification of the anatomical region depicted in the image, and detection of abnormalities.

Results: Microsoft Bing detected that the images were CT scans with 95.4% accuracy (42/44) and that the images were MRI scans with 86.1% accuracy (31/36). However, it failed to detect one CT image (2.3%) and misidentified another CT image as an MRI (2.3%). On the other hand, it also misidentified four MRI as CT images (11.1%) and one as an X-ray (2.7%). Bing achieved an 83.75% success rate in correctly identifying abdominal regions, with 90% accuracy for CT scans (40/44) and 77.7% for MRI scans (28/36). Concerning the identification of imaging planes, Bing achieved a success rate of 95.4% for CT images and 83.3% for MRI. Regarding the identification of MRI sequences (T1-weighted and T2-weighted), the success rate was 68.75%. In the identification of the use of contrast media for CT scans, the success rate was 64.2%. Bing detected abnormalities in 35% of the images but achieved a correct interpretation rate of 10.7% for the definite diagnosis.

Conclusion: While Microsoft Bing, leveraging ChatGPT-4 technology, demonstrates proficiency in basic task identification on abdominal CT and MRI, its inability to reliably interpret abnormalities highlights the need for continued refinement to enhance its clinical applicability.

Clinical significance: The contribution of large language models (LLMs) to the diagnostic process in radiology is still being explored. However, with a comprehensive understanding of their capabilities and limitations, LLMs can significantly support radiologists during diagnosis and improve the overall efficiency of abdominal radiology practices. Acknowledging the limitations of current studies related to ChatGPT in this field, our work provides a foundation for future clinical research, paving the way for more integrated and effective diagnostic tools.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Diagnostic and interventional radiology Medicine-Radiology, Nuclear Medicine and Imaging

自引率

4.80%

发文量

期刊介绍： Diagnostic and Interventional Radiology (Diagn Interv Radiol) is the open access, online-only official publication of Turkish Society of Radiology. It is published bimonthly and the journal’s publication language is English. The journal is a medium for original articles, reviews, pictorial essays, technical notes related to all fields of diagnostic and interventional radiology.