Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee
{"title":"评估ChatGPT在解决成人脊柱畸形手术患者查询中的准确性和可读性。","authors":"Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee","doi":"10.1177/21925682251360655","DOIUrl":null,"url":null,"abstract":"<p><p>Study DesignCross-Sectional.ObjectivesAdult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Language learning models (LLMs), such as ChatGPT, may supplement patient education. This study evaluates ChatGPT-3.5's accuracy and readability in answering common patient questions regarding ASD surgery.MethodsStructured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in separate chat sessions to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.ResultsPatient inquiries fell into four themes: (1) Preoperative preparation, (2) Recovery (pain expectations, physical therapy), (3) Lifestyle modifications, and (4) Postoperative course. Accuracy scores varies: Preoperative responses averaged 1.67, Recovery and lifestyle responses 1.33, and postoperative responses 2.0. 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory, with one response (\"Will my pain return or worsen?\") rated inaccurate by all reviewers. Readability analysis showed all 19 responses exceeded the eight-grade reading level by an average of 5.91 grade levels.ConclusionChatGPT-3.5 demonstrates potential as a supplemental patient education tool but provides varying accuracy and complex readability. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.</p>","PeriodicalId":12680,"journal":{"name":"Global Spine Journal","volume":" ","pages":"21925682251360655"},"PeriodicalIF":2.6000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12254131/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Accuracy and Readability of ChatGPT in Addressing Patient Queries on Adult Spinal Deformity Surgery.\",\"authors\":\"Fergui Hernandez, Rafael Guizar, Henry Avetisian, Marc A Abdou, William J Karakash, Andy Ton, Matthew C Gallo, Jacob R Ball, Jeffrey C Wang, Ram K Alluri, Raymond J Hah, Michael Safaee\",\"doi\":\"10.1177/21925682251360655\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Study DesignCross-Sectional.ObjectivesAdult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Language learning models (LLMs), such as ChatGPT, may supplement patient education. 
This study evaluates ChatGPT-3.5's accuracy and readability in answering common patient questions regarding ASD surgery.MethodsStructured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in separate chat sessions to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.ResultsPatient inquiries fell into four themes: (1) Preoperative preparation, (2) Recovery (pain expectations, physical therapy), (3) Lifestyle modifications, and (4) Postoperative course. Accuracy scores varies: Preoperative responses averaged 1.67, Recovery and lifestyle responses 1.33, and postoperative responses 2.0. 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory, with one response (\\\"Will my pain return or worsen?\\\") rated inaccurate by all reviewers. Readability analysis showed all 19 responses exceeded the eight-grade reading level by an average of 5.91 grade levels.ConclusionChatGPT-3.5 demonstrates potential as a supplemental patient education tool but provides varying accuracy and complex readability. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.</p>\",\"PeriodicalId\":12680,\"journal\":{\"name\":\"Global Spine Journal\",\"volume\":\" \",\"pages\":\"21925682251360655\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12254131/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Global Spine Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1177/21925682251360655\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CLINICAL NEUROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Global Spine Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/21925682251360655","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Evaluating the Accuracy and Readability of ChatGPT in Addressing Patient Queries on Adult Spinal Deformity Surgery.
Study Design: Cross-sectional.

Objectives: Adult spinal deformity (ASD) affects 68% of the elderly, with surgical intervention carrying complication rates of up to 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Large language models (LLMs), such as ChatGPT, may supplement patient education. This study evaluates the accuracy and readability of ChatGPT-3.5 in answering common patient questions regarding ASD surgery.

Methods: Structured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in a separate chat session to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.

Results: Patient inquiries fell into four themes: (1) preoperative preparation, (2) recovery (pain expectations, physical therapy), (3) lifestyle modifications, and (4) postoperative course. Accuracy scores varied: preoperative responses averaged 1.67, recovery and lifestyle responses 1.33, and postoperative responses 2.0. Overall, 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory; one response ("Will my pain return or worsen?") was rated inaccurate by all reviewers. Readability analysis showed that all 19 responses exceeded the eighth-grade reading level, by an average of 5.91 grade levels.

Conclusion: ChatGPT-3.5 demonstrates potential as a supplemental patient education tool, but its accuracy varies and its responses are difficult to read. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.
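For readers unfamiliar with the readability metric, the sketch below shows how a Flesch-Kincaid Grade Level score can be computed: FKGL = 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter here is a simple vowel-group heuristic chosen for illustration; the study does not specify which implementation it used, and dedicated readability tools count syllables with more elaborate rules, so scores may differ slightly.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups.
    A rough heuristic for illustration only; published tools
    use dictionaries or fuller rules and may differ slightly."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a (likely) silent trailing 'e'
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Hypothetical response text, in the style the study describes:
sample = ("Adult spinal deformity surgery carries a substantial risk of "
          "perioperative complications, necessitating comprehensive "
          "preoperative optimization and multidisciplinary evaluation.")
print(round(flesch_kincaid_grade(sample), 2))  # well above grade 8
```

Under this formula, a response written in long sentences with polysyllabic clinical vocabulary, like the hypothetical sample above, scores far beyond the eighth-grade threshold commonly recommended for patient-facing materials, which is the pattern the study reports.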
Journal Introduction:
Global Spine Journal (GSJ) is the official scientific publication of AOSpine. It is a peer-reviewed, open access journal devoted to the study and treatment of spinal disorders, including diagnosis, operative and non-operative treatment options, surgical techniques, and emerging research and clinical developments. GSJ is indexed in PubMed Central, SCOPUS, and the Emerging Sources Citation Index (ESCI).