Alexis Hadjiathanasiou, Leonie Goelz, Florian Muhn, Rebecca Heinz, Lutz Kreißl, Paul Sparenberg, Johannes Lemcke, Ingo Schmehl, Sven Mutze, Patrick Schuss
Neurosurgical Review 48(1):261. Published 2025-02-21. DOI: 10.1007/s10143-025-03341-3 (https://doi.org/10.1007/s10143-025-03341-3)
Artificial intelligence in neurovascular decision-making: a comparative analysis of ChatGPT-4 and multidisciplinary expert recommendations for unruptured intracranial aneurysms.
In the multidisciplinary treatment of cerebrovascular diseases, specialists from different disciplines strive to develop patient-specific treatment recommendations. ChatGPT is a natural language processing chatbot with increasing applicability in medical practice. This study evaluates ChatGPT's ability to provide treatment recommendations for patients with unruptured intracranial aneurysms (UIA). Anonymized patient data and radiological reports of 20 patients with UIAs were provided to GPT-4 in a standardized format and used to generate treatment recommendations for different clinical scenarios. A multidisciplinary panel of specialists rated the GPT-4 responses on a Likert scale, and the responses were then benchmarked against the Unruptured Intracranial Aneurysm Treatment Score (UIATS) and against the actual treatment decisions of the multidisciplinary institutional neurovascular board (INVB). Agreement between expert raters was measured with a linearly weighted Fleiss' kappa coefficient. GPT-4 analyzed the individual pathological features in the radiological reports and formulated a corresponding assessment for each. None of the generated recommendations showed evidence of factual hallucination, although in 25% of cases no specific recommendation could be derived from the GPT-4 response. The expert panel rated the overall quality of the GPT-4 recommendations at a median of 3.4 out of 5 points. The GPT-4 recommendations were congruent with those of the INVB in 65% of cases. Interrater reliability among the experts ranged from moderate to low in the assessment of AI-assisted decision-making. GPT-4 appears able to process clinical information about UIAs and to generate treatment recommendations. However, the recommendations are still too ambiguous, and their use of scientific evidence not yet sufficiently patient- or case-specific, to substitute for the decision-making of a multidisciplinary neurovascular board.
A prospective evaluation of GPT-4's competence as a companion in decision-making panels is deemed necessary.
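The abstract reports interrater agreement as a linearly weighted Fleiss' kappa but does not state the exact formulation. The sketch below assumes Gwet's weighted generalization of Fleiss' kappa, with linear weights w[k][l] = 1 - |k - l| / (q - 1) over q ordered Likert levels, so that ratings one point apart earn partial credit. The function names and the toy rating table are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def linear_weights(q: int) -> list[list[float]]:
    """Linear agreement weights w[k][l] = 1 - |k - l| / (q - 1) for q ordered categories."""
    if q < 2:
        raise ValueError("need at least two categories")
    return [[1.0 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]


def weighted_fleiss_kappa(
    counts: Sequence[Sequence[int]], weights: Sequence[Sequence[float]]
) -> float:
    """Weighted multi-rater kappa in the style of Gwet's generalization of Fleiss' kappa.

    counts[i][k] is the number of raters who assigned subject i to category k.
    Every subject must be rated by the same number r >= 2 of raters.
    Assumes weights[k][k] == 1 (full credit for identical ratings).
    """
    n = len(counts)
    q = len(counts[0])
    r = sum(counts[0])
    if r < 2 or any(sum(row) != r for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of raters")

    # Observed weighted agreement: each rating is compared with the other r - 1
    # ratings of the same subject, with partial credit w[k][l] for nearby levels.
    p_a = 0.0
    for row in counts:
        for k in range(q):
            r_star = sum(weights[k][l] * row[l] for l in range(q))
            p_a += row[k] * (r_star - 1)  # "- 1" drops the self-pair (w[k][k] == 1)
    p_a /= n * r * (r - 1)

    # Chance agreement from the pooled category proportions across all ratings.
    pi = [sum(row[k] for row in counts) / (n * r) for k in range(q)]
    p_e = sum(weights[k][l] * pi[k] * pi[l] for k in range(q) for l in range(q))
    if p_e == 1.0:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_a - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical toy table: 4 cases, 2 raters, 3 ordered Likert levels.
    toy = [[2, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 2]]
    identity = [[1.0 if k == l else 0.0 for l in range(3)] for k in range(3)]
    print(round(weighted_fleiss_kappa(toy, identity), 4))  # unweighted: 0.2381
    print(round(weighted_fleiss_kappa(toy, linear_weights(3)), 4))  # linear: 0.4667
```

With identity weights the function reduces to ordinary Fleiss' kappa. On this toy table every disagreement is between adjacent levels, so linear weighting yields a higher value than the unweighted statistic.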
About the journal:
The goal of Neurosurgical Review is to provide a forum for comprehensive reviews on current issues in neurosurgery. Each issue contains up to three reviews, reflecting all important aspects of one topic (a disease or a surgical approach). Comments by a panel of experts within the same issue complete the topic. By providing comprehensive coverage of one topic per issue, Neurosurgical Review combines the topicality of professional journals with the in-depth treatment of a monograph. Original papers of high quality are also welcome.