Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee
{"title":"评估ChatGPT在解决成人脊柱畸形手术患者查询中的准确性和可读性。","authors":"Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee","doi":"10.1177/21925682251360655","DOIUrl":null,"url":null,"abstract":"<p><p>Study DesignCross-Sectional.ObjectivesAdult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Language learning models (LLMs), such as ChatGPT, may supplement patient education. This study evaluates ChatGPT-3.5's accuracy and readability in answering common patient questions regarding ASD surgery.MethodsStructured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in separate chat sessions to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.ResultsPatient inquiries fell into four themes: (1) Preoperative preparation, (2) Recovery (pain expectations, physical therapy), (3) Lifestyle modifications, and (4) Postoperative course. Accuracy scores varies: Preoperative responses averaged 1.67, Recovery and lifestyle responses 1.33, and postoperative responses 2.0. 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory, with one response (\"Will my pain return or worsen?\") rated inaccurate by all reviewers. Readability analysis showed all 19 responses exceeded the eight-grade reading level by an average of 5.91 grade levels.ConclusionChatGPT-3.5 demonstrates potential as a supplemental patient education tool but provides varying accuracy and complex readability. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.</p>","PeriodicalId":12680,"journal":{"name":"Global Spine Journal","volume":" ","pages":"21925682251360655"},"PeriodicalIF":2.6000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12254131/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Accuracy and Readability of ChatGPT in Addressing Patient Queries on Adult Spinal Deformity Surgery.\",\"authors\":\"Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee\",\"doi\":\"10.1177/21925682251360655\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Study DesignCross-Sectional.ObjectivesAdult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Language learning models (LLMs), such as ChatGPT, may supplement patient education. 
This study evaluates ChatGPT-3.5's accuracy and readability in answering common patient questions regarding ASD surgery.MethodsStructured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in separate chat sessions to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.ResultsPatient inquiries fell into four themes: (1) Preoperative preparation, (2) Recovery (pain expectations, physical therapy), (3) Lifestyle modifications, and (4) Postoperative course. Accuracy scores varies: Preoperative responses averaged 1.67, Recovery and lifestyle responses 1.33, and postoperative responses 2.0. 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory, with one response (\\\"Will my pain return or worsen?\\\") rated inaccurate by all reviewers. Readability analysis showed all 19 responses exceeded the eight-grade reading level by an average of 5.91 grade levels.ConclusionChatGPT-3.5 demonstrates potential as a supplemental patient education tool but provides varying accuracy and complex readability. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.</p>\",\"PeriodicalId\":12680,\"journal\":{\"name\":\"Global Spine Journal\",\"volume\":\" \",\"pages\":\"21925682251360655\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12254131/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Global Spine Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1177/21925682251360655\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CLINICAL NEUROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Global Spine Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/21925682251360655","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Evaluating the Accuracy and Readability of ChatGPT in Addressing Patient Queries on Adult Spinal Deformity Surgery.
Study Design: Cross-sectional.

Objectives: Adult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Large language models (LLMs), such as ChatGPT, may supplement patient education. This study evaluates the accuracy and readability of ChatGPT-3.5 in answering common patient questions regarding ASD surgery.

Methods: Structured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in a separate chat session to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.

Results: Patient inquiries fell into four themes: (1) preoperative preparation, (2) recovery (pain expectations, physical therapy), (3) lifestyle modifications, and (4) postoperative course. Accuracy scores varied: preoperative responses averaged 1.67, recovery and lifestyle responses 1.33, and postoperative responses 2.0. Overall, 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory; one response ("Will my pain return or worsen?") was rated inaccurate by all reviewers. Readability analysis showed that all 19 responses exceeded the eighth-grade reading level, by an average of 5.91 grade levels.

Conclusion: ChatGPT-3.5 demonstrates potential as a supplemental patient education tool, but its accuracy varies and its responses are difficult to read. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.
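For readers unfamiliar with the readability metric, the sketch below shows how a Flesch-Kincaid Grade Level score can be computed: FKGL = 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter here is a simple vowel-group heuristic chosen for illustration; the study does not specify which implementation it used, and dedicated readability tools count syllables with more elaborate rules, so scores may differ slightly.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups.
    A rough heuristic for illustration only; published tools
    use dictionaries or fuller rules and may differ slightly."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a (likely) silent trailing 'e'
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Hypothetical response text, in the style the study describes:
sample = ("Adult spinal deformity surgery carries a substantial risk of "
          "perioperative complications, necessitating comprehensive "
          "preoperative optimization and multidisciplinary evaluation.")
print(round(flesch_kincaid_grade(sample), 2))  # well above grade 8
```

Under this formula, a response written in long sentences with polysyllabic clinical vocabulary, like the hypothetical sample above, scores far beyond the eighth-grade threshold commonly recommended for patient-facing materials, which is the pattern the study reports.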
Journal Introduction:
Global Spine Journal (GSJ) is the official scientific publication of AOSpine. It is a peer-reviewed, open access journal devoted to the study and treatment of spinal disorders, including diagnosis, operative and non-operative treatment options, surgical techniques, and emerging research and clinical developments. GSJ is indexed in PubMed Central, SCOPUS, and the Emerging Sources Citation Index (ESCI).