{"title":"Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.","authors":"Ayşe Merve Çıracıoğlu, Suheyla Dal Erdoğan","doi":"10.1007/s00590-025-04198-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.</p><p><strong>Methods: </strong>Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts assessed the responses using a 7-point Likert scale for reliability and usefulness. Additionally, the overall quality was also rated using the Global Quality Scale (GQS). To assess readability, various established metrics were employed, including the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kinkaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).</p><p><strong>Results: </strong>The mean reliability scores were 4.68 ± 0.73 (Median: 5, IQR 4-5), while the mean usefulness scores were 4.84 ± 0.84 (Median: 5, IQR 4-5). Additionally the mean GQS scores were 4.28 ± 0.58 (Median: 4, IQR 4-5). Inter-rater reliability analysis using the Intraclass correlation coefficient showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries required greater depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high school senior to college-level reading ability.</p><p><strong>Conclusion: </strong>ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.</p>","PeriodicalId":50484,"journal":{"name":"European Journal of Orthopaedic Surgery and Traumatology","volume":"35 1","pages":"123"},"PeriodicalIF":1.4000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Orthopaedic Surgery and Traumatology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00590-025-04198-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Abstract
Objective: This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.
Methods: Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT, and its responses were evaluated. Two independent experts assessed the responses for reliability and usefulness using a 7-point Likert scale, and rated overall quality using the Global Quality Scale (GQS). To assess readability, several established metrics were employed: the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kincaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).
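The abstract does not name the analysis tooling. As a minimal sketch, the seven metrics can be computed with the open-source Python textstat package; textstat does not provide FORCAST, so it is derived by hand from the standard formula, and the helper below is illustrative, not the authors' code.

```python
# Minimal sketch of the readability analysis, assuming the open-source
# "textstat" package; illustrative only, not the authors' pipeline.
import textstat

def readability_profile(text: str) -> dict:
    """Compute the seven readability metrics reported in the study."""
    words = text.split()
    # FORCAST is not built into textstat. Standard formula:
    # grade = 20 - N/10, where N is the count of single-syllable
    # words per 150-word sample (scaled here to the text length).
    mono = sum(1 for w in words if textstat.syllable_count(w) == 1)
    forcast = 20 - (mono * 150 / len(words)) / 10
    return {
        "FRE": textstat.flesch_reading_ease(text),
        "SMOG": textstat.smog_index(text),
        "CLI": textstat.coleman_liau_index(text),
        "GFI": textstat.gunning_fog(text),
        "FKGL": textstat.flesch_kincaid_grade(text),
        "FORCAST": round(forcast, 2),
        "ARI": textstat.automated_readability_index(text),
    }

# Example: profile one (placeholder) ChatGPT response.
print(readability_profile(
    "Scoliosis is a lateral curvature of the spine that is most often "
    "diagnosed in adolescents and monitored with standing radiographs."
))
```

Note that grade-level indices (SMOG, FKGL, GFI, ARI, CLI, FORCAST) map roughly onto US school grades, which is how a finding such as "high school senior to college-level reading ability" is interpreted.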
Results: The mean reliability score was 4.68 ± 0.73 (median 5, IQR 4-5), and the mean usefulness score was 4.84 ± 0.84 (median 5, IQR 4-5). Additionally, the mean GQS score was 4.28 ± 0.58 (median 4, IQR 4-5). Inter-rater reliability analysis using the intraclass correlation coefficient (ICC) showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries would have benefited from greater depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high-school-senior to college-level reading ability.
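For the agreement analysis, a hedged sketch using the open-source pingouin package is shown below; the abstract does not specify the statistical software or which ICC model (e.g. two-way random, absolute agreement) was used, and the scores here are illustrative placeholders, not study data.

```python
# Hedged sketch of the inter-rater agreement analysis, assuming the
# open-source "pingouin" package; placeholder data, not study results.
import pandas as pd
import pingouin as pg

# Long format: one row per (question, rater) pair; the study used
# 16 questions rated by two independent experts.
ratings = pd.DataFrame({
    "question":    [1, 1, 2, 2, 3, 3, 4, 4],
    "rater":       ["A", "B"] * 4,
    "reliability": [5, 5, 4, 5, 4, 4, 5, 4],
})

# intraclass_corr returns all six standard ICC forms; the study reports
# a single coefficient per dimension (e.g. 0.942 for reliability).
icc = pg.intraclass_corr(data=ratings, targets="question",
                         raters="rater", ratings="reliability")
print(icc[["Type", "ICC", "CI95%"]])
```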
Conclusion: ChatGPT provides reliable, useful, and moderate-quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.
Journal overview
The European Journal of Orthopaedic Surgery and Traumatology (EJOST) aims to publish high-quality orthopaedic scientific work. The objective of the journal is to disseminate meaningful, impactful, clinically relevant work from every region of the world that has the potential to change and/or inform clinical practice.