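Artificial intelligence in neurovascular decision-making: a comparative analysis of ChatGPT-4 and multidisciplinary expert recommendations for unruptured intracranial aneurysms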

IF 2.5 · Zone 3 (Medicine) · Q2 CLINICAL NEUROLOGY

Neurosurgical Review 48(1):261 · Pub Date: 2025-02-21 · DOI: 10.1007/s10143-025-03341-3

Alexis Hadjiathanasiou, Leonie Goelz, Florian Muhn, Rebecca Heinz, Lutz Kreißl, Paul Sparenberg, Johannes Lemcke, Ingo Schmehl, Sven Mutze, Patrick Schuss


Citations: 0

Abstract


Artificial intelligence in neurovascular decision-making: a comparative analysis of ChatGPT-4 and multidisciplinary expert recommendations for unruptured intracranial aneurysms.

In the multidisciplinary treatment of cerebrovascular diseases, specialists from different disciplines strive to develop patient-specific treatment recommendations. ChatGPT is a natural language processing chatbot with increasing applicability in medical practice. This study evaluates ChatGPT's ability to provide treatment recommendations for patients with unruptured intracranial aneurysms (UIA). Anonymized patient data and radiological reports of 20 patients with UIAs were provided to GPT-4 in a standardized format and used to generate a treatment recommendation for different clinical scenarios. GPT-4 responses were evaluated by a multidisciplinary panel of specialists using a Likert scale and subsequently benchmarked against the Unruptured Intracranial Aneurysm Treatment Score (UIATS) as well as the actual treatment decision made by the multidisciplinary institutional neurovascular board (INVB). Agreement between expert raters was measured using a linearly weighted Fleiss' kappa coefficient. GPT-4 analyzed individual pathological features of the radiological reports and formulated a corresponding assessment for each aspect. None of the generated recommendations showed evidence of factual hallucination, although in 25% of the case studies no specific recommendation could be derived from the GPT-4 responses. The expert panel rated the overall quality of the GPT-4 recommendations with a median of 3.4 out of 5 points. The GPT-4 recommendations were congruent with those of the INVB in 65% of cases. Interrater reliability among experts showed moderate to low agreement in the assessment of AI-assisted decision making. GPT-4 appears to be able to process clinical information about UIAs and generate treatment recommendations. However, the level of ambiguity and the utilization of scientific evidence in the recommendations are not yet patient/case specific enough to substitute for the decision-making of a multidisciplinary neurovascular board. A prospective evaluation of GPT-4 competence as a companion in decision-making panels is deemed necessary.
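The abstract names Fleiss' kappa as its measure of agreement between expert raters. As a rough illustration of what that statistic computes, here is a minimal sketch of the standard unweighted Fleiss' kappa. The study reports a linearly weighted variant whose exact weighting is not described on this page, so this is not the authors' computation. The function name `fleiss_kappa` and the toy rating counts are hypothetical and are not data from the paper.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Unweighted Fleiss' kappa (illustrative sketch, not the study's weighted variant).

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters.
    """
    if not counts:
        raise ValueError("at least one rated case is required")
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per case are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number of ratings")

    n_categories = len(counts[0])

    # Observed agreement: per-case share of rater pairs that agree, averaged over cases.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: sum of squared overall category proportions.
    total = n_cases * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Hypothetical toy data: 4 raters score 5 cases on a 5-point Likert scale.
    toy_counts = [
        [0, 0, 1, 3, 0],
        [0, 1, 2, 1, 0],
        [0, 0, 0, 2, 2],
        [1, 2, 1, 0, 0],
        [0, 0, 2, 2, 0],
    ]
    print(f"kappa = {fleiss_kappa(toy_counts):.3f}")
```

Values near 1 indicate agreement well beyond chance and values near 0 indicate agreement no better than chance. A weighted variant would additionally give partial credit to ratings that fall in adjacent Likert categories.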


Source journal

Neurosurgical Review (Medicine: Clinical Neurology)

CiteScore: 5.60

Self-citation rate: 7.10%

Articles published: 191

Review time: 6-12 weeks

About the journal: The goal of Neurosurgical Review is to provide a forum for comprehensive reviews on current issues in neurosurgery. Each issue contains up to three reviews, reflecting all important aspects of one topic (a disease or a surgical approach). Comments by a panel of experts within the same issue complete the topic. By providing comprehensive coverage of one topic per issue, Neurosurgical Review combines the topicality of professional journals with the in-depth treatment of a monograph. Original papers of high quality are also welcome.