Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality

Impact Factor 2.7 | JCR Q2 (Orthopedics)
Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme
Journal of Experimental Orthopaedics, Volume 12, Issue 4. Published 2025-09-27. DOI: 10.1002/jeo2.70445. Full text: https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70445

Abstract

Purpose

Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects that requires patients to understand both the treatment and the recovery process, as health literacy impacts clinical outcomes. This study evaluated the quality and readability of AI-generated ACI patient education materials produced with ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model and a fine-tuned version, hypothesising that the fine-tuned model would provide improved quality and readability.

Methods

Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (the ACI Guide), optimised through instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed with the DISCERN instrument and readability with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using the intraclass correlation coefficient (ICC) in a two-way mixed-effects model.
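The abstract does not include the scoring pipeline, but the two Flesch formulas it relies on are standard and can be sketched directly. A minimal sketch follows; the vowel-group syllable counter is a naive assumption (published calculators use dictionary-based syllable counts), so absolute scores will differ slightly from dedicated tools:

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of consecutive vowels (min. 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # mean words per sentence
    spw = syllables / len(words)       # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

On these scales a higher FRES means easier text (60–70 is often cited as plain language), while FKGL maps directly to a US school grade, which is why the study's 13.45 corresponds to university-level reading.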

Results

The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses, with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; p < 0.001), reflecting moderate quality. Regarding readability, the native model reached an FKGL of 13.45 ± 1.30 (university sophomore level), whereas the ACI Guide achieved an FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than for the native model (35.68 ± 5.08; p < 0.001). Interrater reliability was good (ICC = 0.767).
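The abstract specifies a two-way mixed-effects ICC but not its exact form. A minimal sketch, assuming the single-rater consistency form ICC(3,1) that is common for a fixed pair of raters, computed from the standard two-way ANOVA decomposition:

```python
def icc_3_1(ratings: list[list[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single rater.

    `ratings` holds one row per rated item and one column per rater.
    """
    n = len(ratings)       # items (here: answered questions)
    k = len(ratings[0])    # raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_r = ss_rows / (n - 1)                                      # between-items mean square
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
```

Because the consistency form ignores a constant offset between raters, two surgeons who always differ by the same number of DISCERN points would still score ICC = 1; an absolute-agreement form would penalise that offset.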

Conclusions

The native ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding the recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved both the readability and the quality of ACI patient education materials, generating content closer to the recommended 8th–9th-grade level. The fine-tuned model may serve as a useful adjunct to physician-led education, enhancing patient understanding of complex orthopaedic procedures.

Level of Evidence

Level V.


Source journal: Journal of Experimental Orthopaedics (Medicine: Orthopedics and Sports Medicine). CiteScore: 3.20; self-citation rate: 5.60%; articles per year: 114; review time: 13 weeks.