Olga Bayar-Kapıcı, Erman Altunışık, Feyza Musabeyoğlu, Şeyda Dev, Ömer Kaya
{"title":"Artificial intelligence in radiology: diagnostic sensitivity of ChatGPT for detecting hemorrhages in cranial computed tomography scans.","authors":"Olga Bayar-Kapıcı, Erman Altunışık, Feyza Musabeyoğlu, Şeyda Dev, Ömer Kaya","doi":"10.4274/dir.2025.253456","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Chat Generative Pre-trained Transformer (ChatGPT)-4V, a large language model developed by OpenAI, has been explored for its potential application in radiology. This study assesses ChatGPT-4V's diagnostic performance in identifying various types of intracranial hemorrhages in non-contrast cranial computed tomography (CT) images.</p><p><strong>Methods: </strong>Intracranial hemorrhages were presented to ChatGPT using the clearest 2D imaging slices. The first question, \"Q1: Which imaging technique is used in this image?\" was asked to determine the imaging modality. ChatGPT was then prompted with the second question, \"Q2: What do you see in this image and what is the final diagnosis?\" to assess whether the CT scan was normal or showed pathology. For CT scans containing hemorrhage that ChatGPT did not interpret correctly, a follow-up question-\"Q3: There is bleeding in this image. Which type of bleeding do you see?\"-was used to evaluate whether this guidance influenced its response.</p><p><strong>Results: </strong>ChatGPT accurately identified the imaging technique (Q1) in all cases but demonstrated difficulty diagnosing epidural hematoma (EDH), subdural hematoma (SDH), and subarachnoid hemorrhage (SAH) when no clues were provided (Q2). When a hemorrhage clue was introduced (Q3), ChatGPT correctly identified EDH in 16.7% of cases, SDH in 60%, and SAH in 15.6%, and achieved 100% diagnostic accuracy for hemorrhagic cerebrovascular disease. Its sensitivity, specificity, and accuracy for Q2 were 23.6%, 92.5%, and 57.4%, respectively. These values improved substantially with the clue in Q3, with sensitivity rising to 50.9% and accuracy to 71.3%. ChatGPT also demonstrated higher diagnostic accuracy in larger hemorrhages in EDH and SDH images.</p><p><strong>Conclusion: </strong>Although the model performs well in recognizing imaging modalities, its diagnostic accuracy substantially improves when guided by additional contextual information.</p><p><strong>Clinical significance: </strong>These findings suggest that ChatGPT's diagnostic performance improves with guided prompts, highlighting its potential as a supportive tool in clinical radiology.</p>","PeriodicalId":11341,"journal":{"name":"Diagnostic and interventional radiology","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diagnostic and interventional radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4274/dir.2025.253456","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose: Chat Generative Pre-trained Transformer (ChatGPT)-4V, a large language model developed by OpenAI, has been explored for its potential application in radiology. This study assesses ChatGPT-4V's diagnostic performance in identifying various types of intracranial hemorrhages in non-contrast cranial computed tomography (CT) images.
Methods: Intracranial hemorrhages were presented to ChatGPT using the clearest 2D imaging slices. The first question, "Q1: Which imaging technique is used in this image?" was asked to determine the imaging modality. ChatGPT was then prompted with the second question, "Q2: What do you see in this image and what is the final diagnosis?" to assess whether the CT scan was normal or showed pathology. For CT scans containing hemorrhage that ChatGPT did not interpret correctly, a follow-up question-"Q3: There is bleeding in this image. Which type of bleeding do you see?"-was used to evaluate whether this guidance influenced its response.
Results: ChatGPT accurately identified the imaging technique (Q1) in all cases but demonstrated difficulty diagnosing epidural hematoma (EDH), subdural hematoma (SDH), and subarachnoid hemorrhage (SAH) when no clues were provided (Q2). When a hemorrhage clue was introduced (Q3), ChatGPT correctly identified EDH in 16.7% of cases, SDH in 60%, and SAH in 15.6%, and achieved 100% diagnostic accuracy for hemorrhagic cerebrovascular disease. Its sensitivity, specificity, and accuracy for Q2 were 23.6%, 92.5%, and 57.4%, respectively. These values improved substantially with the clue in Q3, with sensitivity rising to 50.9% and accuracy to 71.3%. ChatGPT also demonstrated higher diagnostic accuracy in larger hemorrhages in EDH and SDH images.
Conclusion: Although the model performs well in recognizing imaging modalities, its diagnostic accuracy substantially improves when guided by additional contextual information.
Clinical significance: These findings suggest that ChatGPT's diagnostic performance improves with guided prompts, highlighting its potential as a supportive tool in clinical radiology.
期刊介绍:
Diagnostic and Interventional Radiology (Diagn Interv Radiol) is the open access, online-only official publication of Turkish Society of Radiology. It is published bimonthly and the journal’s publication language is English.
The journal is a medium for original articles, reviews, pictorial essays, technical notes related to all fields of diagnostic and interventional radiology.