GPT-4 Performance for Neurologic Localization.


Neurology: Clinical Practice. Pub Date: 2024-06-01. Epub Date: 2024-03-27. DOI: 10.1212/CPJ.0000000000200293

Jung-Hyun Lee, Eunhee Choi, Robert McDougal, William W Lytton

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11003355/pdf/


Abstract

Background and objectives: In health care, large language models such as Generative Pretrained Transformers (GPTs), trained on extensive text datasets, have potential applications in reducing health care disparities across regions and populations. Previous software developed for lesion localization has been limited in scope. This study aims to evaluate the capability of GPT-4 for lesion localization based on clinical presentation.

Methods: GPT-4 was prompted with the history and neurologic physical examination (H&P) from published cases of acute stroke, followed by clinical reasoning questions that required answers for "single or multiple lesions," "side," and "brain region," using Zero-Shot Chain-of-Thought and Text Classification prompting. GPT-4 output from 3 separate trials for each of 46 cases was compared with imaging-based localization.
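
As a rough illustration of this kind of prompting, the sketch below assembles a single prompt that combines a zero-shot chain-of-thought cue with fixed answer choices for classification. It is not the authors' prompt: the function name `build_localization_prompt`, the label sets, the exact wording, and the use of the common "Let's think step by step" cue are all assumptions made for illustration.

```python
# Hypothetical label sets; the abstract does not list the actual answer options.
LESION_COUNT_LABELS = ("single", "multiple")
SIDE_LABELS = ("left", "right", "bilateral")
REGION_LABELS = ("cerebral hemisphere", "brainstem", "cerebellum")


def build_localization_prompt(hp_text: str) -> str:
    """Build one prompt from raw H&P text (illustrative wording only)."""
    questions = [
        ("Single or multiple lesions", LESION_COUNT_LABELS),
        ("Side", SIDE_LABELS),
        ("Brain region", REGION_LABELS),
    ]
    lines = [
        "History and neurologic examination:",
        hp_text.strip(),
        "",
        # Zero-shot chain-of-thought cue: ask for reasoning before the answer.
        "Let's think step by step, then answer each question",
        "using exactly one of the listed options.",
        "",
    ]
    # Text classification part: each question has a closed set of answers.
    for name, options in questions:
        lines.append(f"{name} (choose one: {', '.join(options)}):")
    return "\n".join(lines)


# Placeholder text, not a real case.
example_prompt = build_localization_prompt("<H&P text of a published case>")
```

Constraining each answer to a closed set of options makes the model output straightforward to compare against an imaging-based reference label.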

Results: GPT-4 successfully processed raw text from H&P to generate accurate neuroanatomical localization and detailed clinical reasoning. In the trial-based analysis, specificity, sensitivity, precision, and F1-score were 0.87, 0.74, 0.75, and 0.74, respectively, for side, and 0.94, 0.85, 0.84, and 0.85, respectively, for brain region. Metrics for the individual class labels within brain region were similarly high for all regions except the cerebellum, and were also similar when all 3 trials were considered together to examine metrics by case. Errors were due to extrinsic causes (inadequate information in the published cases) and intrinsic causes (failures of logic or an inadequate knowledge base).
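
For readers less familiar with these metrics, the sketch below shows how specificity, sensitivity, precision, and F1-score can be computed for a multi-class label such as side by treating each class one-vs-rest and then averaging. The toy labels are invented for illustration and are not study data, and the macro averaging is an assumption: the abstract does not state how per-class values were combined.

```python
def one_vs_rest_metrics(y_true, y_pred, label):
    """Treat `label` as the positive class and every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class never occurs.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def macro_average(y_true, y_pred):
    """Unweighted mean of the one-vs-rest metrics over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    per_label = [one_vs_rest_metrics(y_true, y_pred, lab) for lab in labels]
    return {
        key: sum(m[key] for m in per_label) / len(per_label)
        for key in per_label[0]
    }


# Toy example (invented): imaging-based side vs. model-predicted side.
y_true = ["left", "left", "right", "right", "bilateral", "left"]
y_pred = ["left", "right", "right", "right", "bilateral", "left"]
left_metrics = one_vs_rest_metrics(y_true, y_pred, "left")
overall = macro_average(y_true, y_pred)
```

In a trial-based analysis, each trial of each case would contribute one (reference, prediction) pair, so 46 cases with 3 trials each would yield 138 pairs per question.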

Discussion: This study reveals capabilities of GPT-4 in the localization of acute stroke lesions, showing a potential future role as a clinical tool in neurology.

