Artificial intelligence as a modality to enhance the readability of neurosurgical literature for patients.

Impact factor 3.5 · CAS Region 2 (Medicine) · JCR Q1 (Clinical Neurology)
Gage A Guerra, Sophie Grove, Jonathan Le, Hayden L Hofmann, Ishan Shah, Sweta Bhagavatula, Benjamin Fixman, David Gomez, Benjamin Hopkins, Jonathan Dallas, Giovanni Cacciamani, Racheal Peterson, Gabriel Zada
DOI: 10.3171/2024.6.JNS24617
Journal of Neurosurgery, pp. 1-7, published online 2024-11-08.
Citations: 0

Abstract

Objective: In this study, the authors assessed the ability of Chat Generative Pretrained Transformer (ChatGPT) 3.5 and ChatGPT4 to generate readable and accurate summaries of published neurosurgical literature.

Methods: Abstracts published in journal issues released between June 2023 and August 2023 (n = 150) were randomly selected from the top 5 ranked neurosurgical journals according to Google Scholar. ChatGPT models were instructed to generate a readable layperson summary of each original abstract from a statistically validated prompt. Readability results and grade-level indicator (RR-GLI) scores were calculated for GPT3.5- and GPT4-generated summaries and for the original abstracts. Two physicians independently rated the accuracy of the ChatGPT-generated layperson summaries to assess scientific validity. One-way ANOVA followed by pairwise t-tests with Bonferroni correction was performed to compare readability scores. Cohen's kappa was used to assess interrater agreement between the two rating physicians.
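Cohen's kappa, used above to quantify agreement between the two physician raters, corrects observed agreement for the agreement expected by chance from each rater's marginal label frequencies. A minimal pure-Python sketch (the ratings shown are illustrative, not the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items with nominal categories."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired, non-empty ratings"
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal frequencies, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical accuracy ratings (1 = acceptable, 0 = not) from two raters:
kappa = cohens_kappa([1, 1, 0, 1], [1, 0, 0, 1])
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance; values between roughly 0.4 and 0.8 are conventionally read as moderate to substantial agreement.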

Results: Analysis of 150 original abstracts showed a statistically significant difference in every RR-GLI between the ChatGPT-generated summaries and the original abstracts. Readability scores are reported as (original abstract mean, GPT3.5 summary mean, GPT4 summary mean, p value): Flesch-Kincaid reading grade (12.55, 7.80, 7.70, p < 0.0001); Gunning fog score (15.46, 10.00, 9.00, p < 0.0001); Simple Measure of Gobbledygook (SMOG) index (11.30, 7.13, 6.60, p < 0.0001); Coleman-Liau index (14.67, 11.32, 10.26, p < 0.0001); automated readability index (10.87, 8.50, 7.75, p < 0.0001); and Flesch-Kincaid reading ease (33.29, 68.45, 69.55, p < 0.0001). GPT4-generated summaries further improved on GPT3.5-generated summaries in the following RR-GLI categories: Gunning fog score (p = 0.0003); SMOG index (p = 0.027); Coleman-Liau index (p < 0.0001); sentences (p < 0.0001); complex words (p < 0.0001); and % complex words (p = 0.0035). A total of 68.4% and 84.2% of GPT3.5- and GPT4-generated summaries, respectively, maintained moderate scientific accuracy according to the two physician reviewers.
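The Flesch-Kincaid metrics reported above are simple closed-form functions of word, sentence, and syllable counts. A minimal pure-Python sketch of the two formulas (the syllable counter here is a crude vowel-run heuristic, not the dictionary-based counting that readability tools typically use):

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count runs of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text):
    """Return (reading ease, grade level) for a non-empty plain-text passage."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw    # higher = easier to read
    grade = 0.39 * wps + 11.8 * spw - 15.59      # approximate US school grade
    return ease, grade
```

The direction of the two scales explains the table above: grade-level indices fall as text gets simpler, while reading ease rises, which is why the GPT-generated summaries score lower on the grade metrics but higher on reading ease.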

Conclusions: The findings demonstrate promising potential for the application of ChatGPT in patient education. GPT4 is an accessible tool that offers an immediate way to enhance the readability of current neurosurgical literature. Layperson summaries generated by GPT4 would be a valuable addition to a neurosurgical journal and would likely improve comprehension for patients using internet resources such as PubMed.

Source journal: Journal of Neurosurgery (Medicine - Clinical Neurology)
CiteScore: 7.20
Self-citation rate: 7.30%
Articles per year: 1003
Review time: 1 month
About the journal: The Journal of Neurosurgery, Journal of Neurosurgery: Spine, Journal of Neurosurgery: Pediatrics, and Neurosurgical Focus are devoted to the publication of original works relating primarily to neurosurgery, including studies in clinical neurophysiology, organic neurology, ophthalmology, radiology, pathology, and molecular biology. The Editors and Editorial Boards encourage submission of clinical and laboratory studies. Other manuscripts accepted for review include technical notes on instruments or equipment that are innovative or useful to clinicians and researchers in the field of neuroscience; papers describing unusual cases; manuscripts on historical persons or events related to neurosurgery; and, in Neurosurgical Focus, occasional reviews. Letters to the Editor commenting on articles recently published in the Journal of Neurosurgery, Journal of Neurosurgery: Spine, and Journal of Neurosurgery: Pediatrics are welcome.