The Role of Claude 3.5 Sonet and ChatGPT-4 in Posterior Cervical Fusion Patient Guidance.
Author: Rauf Nasirov
Journal: World Neurosurgery, pages 123889 (impact factor 1.9; JCR Q3, Clinical Neurology)
Publication type: Journal Article
Published: 2025-03-11
DOI: https://doi.org/10.1016/j.wneu.2025.123889
Citations: 0
Abstract
Background: This study evaluates the role of ChatGPT-4 and Claude 3.5 Sonnet in postoperative management for patients undergoing posterior cervical fusion. It focuses on their ability to provide accurate, clear, and relevant responses to patient concerns, highlighting their potential as supplementary tools in surgical aftercare.
Methods: Ten common postoperative questions were selected and posed to ChatGPT-4 and Claude 3.5 Sonnet. Ten independent neurosurgeons evaluated the responses using a structured framework that assessed accuracy, response time, clarity, and relevance. A 5-point Likert scale also measured satisfaction, quality, performance, and importance. Statistical analyses comparing the two AI platforms included sensitivity, specificity, p-values, confidence intervals, and Cohen's d.
Results: Claude 3.5 Sonnet outperformed ChatGPT-4 across all metrics, particularly in accuracy (96.5% vs. 80.6%), response time (92.9% vs. 76.4%), clarity (94.6% vs. 75.4%), and relevance (95.5% vs. 74.0%). Likert scale evaluations showed significant differences (p < 0.001) in satisfaction, quality, and performance, with Claude achieving higher ratings. Statistical analyses confirmed large effect sizes, high inter-rater reliability (kappa: 0.85-0.92 for Claude), and narrower confidence intervals, reinforcing Claude's consistency and superior performance.
Conclusion: Claude 3.5 Sonnet demonstrated exceptional capability in addressing postoperative concerns for posterior cervical fusion patients, surpassing ChatGPT-4 in accuracy, clarity, and practical relevance. These findings underscore its potential as a reliable AI tool for enhancing patient care and satisfaction in surgical aftercare.
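The abstract reports Cohen's d and kappa values without showing how such statistics are computed. The following is a minimal Python sketch of both, not the study's actual analysis. The function names, the pooled-standard-deviation form of Cohen's d, the two-rater form of kappa, and all ratings in the usage lines are assumptions chosen for illustration; the abstract does not state which kappa variant was applied across the ten raters.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Effect size: difference in means divided by the pooled sample SD.

    Assumes two independent groups of scores, each with at least two values.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters' categorical labels.

    Undefined (division by zero) when both raters use one identical category
    for every item; this sketch does not handle that case.
    """
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    categories = set(r1) | set(r2)
    expected = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


# Hypothetical 5-point Likert ratings, invented for illustration only.
model_a_scores = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]
model_b_scores = [4, 3, 4, 3, 4, 4, 3, 4, 3, 4]
print("Cohen's d:", cohens_d(model_a_scores, model_b_scores))

# Hypothetical labels from two raters (1 = acceptable, 0 = not acceptable).
rater_1 = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print("Cohen's kappa:", cohens_kappa(rater_1, rater_2))
```

By the usual rule of thumb, a d of about 0.8 or more is read as a large effect, and kappa values above roughly 0.8 are read as strong agreement, which is the sense in which the abstract describes its results.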
About the journal:
World Neurosurgery has an open access mirror journal World Neurosurgery: X, sharing the same aims and scope, editorial team, submission system and rigorous peer review.
The journal's mission is to:
- Provide a first-class international forum and a 2-way conduit for dialogue that is relevant to neurosurgeons and providers who care for neurosurgery patients. The categories of the exchanged information include clinical and basic science, as well as global information that provides social, political, educational, economic, cultural or societal insights and knowledge that are of significance and relevance to worldwide neurosurgery patient care.
- Act as a primary intellectual catalyst for the stimulation of creativity, the creation of new knowledge, and the enhancement of quality neurosurgical care worldwide.
- Provide a forum for communication that enriches the lives of all neurosurgeons and their colleagues and, in so doing, enriches the lives of their patients.
Topics to be addressed in World Neurosurgery include: EDUCATION, ECONOMICS, RESEARCH, POLITICS, HISTORY, CULTURE, CLINICAL SCIENCE, LABORATORY SCIENCE, TECHNOLOGY, OPERATIVE TECHNIQUES, CLINICAL IMAGES, VIDEOS