Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis

IF 1.4 Q3 ORTHOPEDICS
Ayşe Merve Çıracıoğlu, Suheyla Dal Erdoğan
{"title":"评价ChatGPT对脊柱侧凸反应的可靠性、有用性、质量和可读性。","authors":"Ayşe Merve Çıracıoğlu, Suheyla Dal Erdoğan","doi":"10.1007/s00590-025-04198-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.</p><p><strong>Methods: </strong>Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts assessed the responses using a 7-point Likert scale for reliability and usefulness. Additionally, the overall quality was also rated using the Global Quality Scale (GQS). To assess readability, various established metrics were employed, including the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kinkaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).</p><p><strong>Results: </strong>The mean reliability scores were 4.68 ± 0.73 (Median: 5, IQR 4-5), while the mean usefulness scores were 4.84 ± 0.84 (Median: 5, IQR 4-5). Additionally the mean GQS scores were 4.28 ± 0.58 (Median: 4, IQR 4-5). Inter-rater reliability analysis using the Intraclass correlation coefficient showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries required greater depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high school senior to college-level reading ability.</p><p><strong>Conclusion: </strong>ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. 
Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.</p>","PeriodicalId":50484,"journal":{"name":"European Journal of Orthopaedic Surgery and Traumatology","volume":"35 1","pages":"123"},"PeriodicalIF":1.4000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.\",\"authors\":\"Ayşe Merve Çıracıoğlu, Suheyla Dal Erdoğan\",\"doi\":\"10.1007/s00590-025-04198-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.</p><p><strong>Methods: </strong>Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts assessed the responses using a 7-point Likert scale for reliability and usefulness. Additionally, the overall quality was also rated using the Global Quality Scale (GQS). To assess readability, various established metrics were employed, including the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kinkaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).</p><p><strong>Results: </strong>The mean reliability scores were 4.68 ± 0.73 (Median: 5, IQR 4-5), while the mean usefulness scores were 4.84 ± 0.84 (Median: 5, IQR 4-5). Additionally the mean GQS scores were 4.28 ± 0.58 (Median: 4, IQR 4-5). Inter-rater reliability analysis using the Intraclass correlation coefficient showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. 
While general informational questions received high scores, responses to treatment-specific and personalized inquiries required greater depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high school senior to college-level reading ability.</p><p><strong>Conclusion: </strong>ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.</p>\",\"PeriodicalId\":50484,\"journal\":{\"name\":\"European Journal of Orthopaedic Surgery and Traumatology\",\"volume\":\"35 1\",\"pages\":\"123\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Journal of Orthopaedic Surgery and Traumatology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00590-025-04198-4\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Orthopaedic Surgery and Traumatology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00590-025-04198-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0

Abstract

Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.

Objective: This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.

Methods: Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts assessed the responses for reliability and usefulness using a 7-point Likert scale. Additionally, the overall quality was rated using the Global Quality Scale (GQS). To assess readability, several established metrics were employed: the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kincaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).
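The readability indices named above are closed-form formulas over word, sentence, and syllable counts. As an illustrative sketch (not the study's actual scoring pipeline; the counts are supplied by the caller, since syllable counting is itself heuristic), three of the formulas can be written as:

```python
import math

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores mean easier text (60-70 is roughly plain English)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: maps the same counts onto a US school-grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG: grade level from the count of 3+ syllable words,
    normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
```

For example, a 100-word sample with 5 sentences and 150 syllables scores FRE ≈ 59.6 and FKGL ≈ 9.9, i.e. roughly a US 10th-grade reading level, which is how scores like those reported for ChatGPT's responses translate into a required reading ability.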

Results: The mean reliability score was 4.68 ± 0.73 (median: 5, IQR 4-5) and the mean usefulness score was 4.84 ± 0.84 (median: 5, IQR 4-5). The mean GQS score was 4.28 ± 0.58 (median: 4, IQR 4-5). Inter-rater reliability analysis using the intraclass correlation coefficient showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries lacked depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high school senior to college-level reading ability.
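The agreement figures quoted are intraclass correlation coefficients; the abstract does not state which ICC form was used, but a common choice for a fixed panel of raters is the two-way random-effects, single-measure ICC(2,1). A minimal pure-Python sketch under that assumption:

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1), two-way random effects, single rater (Shrout-Fleiss form).
    `ratings` is an n-subjects x k-raters matrix of scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

Perfect agreement between raters yields 1.0, and values above 0.9, as reported here, are conventionally interpreted as excellent agreement.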

Conclusion: ChatGPT provides reliable, useful, and moderate-quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using artificial intelligence (AI) in patient education and medical decision-making.

Source journal: European Journal of Orthopaedic Surgery and Traumatology
CiteScore: 3.00
Self-citation rate: 5.90%
Articles per year: 265
Review time: 3-8 weeks
Journal description: The European Journal of Orthopaedic Surgery and Traumatology (EJOST) aims to publish high-quality orthopedic scientific work. The objective of our journal is to disseminate meaningful, impactful, clinically relevant work from every region of the world that has the potential to change and/or inform clinical practice.