Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality

Impact Factor 2.7 | JCR Q2 (Orthopedics)
Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme
Journal of Experimental Orthopaedics, Volume 12, Issue 4. Published 2025-09-27. DOI: 10.1002/jeo2.70445. Full text: https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70445

Abstract

Purpose

Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects that requires patients to understand both the treatment and the recovery process, as health literacy impacts clinical outcomes. This study evaluated the quality and readability of AI-generated ACI patient education materials produced with ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model and a fine-tuned version, hypothesising that the fine-tuned model would provide improved quality and readability.

Methods

Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (the ACI Guide), optimised through instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed with the DISCERN instrument and readability with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using the intraclass correlation coefficient (ICC) in a two-way mixed-effects model.
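The abstract does not include the scoring pipeline, but the two Flesch formulas it relies on are standard and can be sketched directly. A minimal sketch follows; the vowel-group syllable counter is a naive assumption (published calculators use dictionary-based syllable counts), so absolute scores will differ slightly from dedicated tools:

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of consecutive vowels (min. 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # mean words per sentence
    spw = syllables / len(words)       # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

On these scales a higher FRES means easier text (60–70 is often cited as plain language), while FKGL maps directly to a US school grade, which is why the study's 13.45 corresponds to university-level reading.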

Results

The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses, with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; p < 0.001), reflecting moderate quality. Regarding readability, the native model reached an FKGL of 13.45 ± 1.30 (university sophomore level), whereas the ACI Guide achieved an FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than for the native model (35.68 ± 5.08; p < 0.001). Interrater reliability was good (ICC = 0.767).
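The abstract specifies a two-way mixed-effects ICC but not its exact form. A minimal sketch, assuming the single-rater consistency form ICC(3,1) that is common for a fixed pair of raters, computed from the standard two-way ANOVA decomposition:

```python
def icc_3_1(ratings: list[list[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single rater.

    `ratings` holds one row per rated item and one column per rater.
    """
    n = len(ratings)       # items (here: answered questions)
    k = len(ratings[0])    # raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_r = ss_rows / (n - 1)                                      # between-items mean square
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
```

Because the consistency form ignores a constant offset between raters, two surgeons who always differ by the same number of DISCERN points would still score ICC = 1; an absolute-agreement form would penalise that offset.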

Conclusions

The native ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding the recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved both the readability and the quality of ACI patient education materials, generating content closer to the recommended 8th–9th-grade level. The fine-tuned model may serve as a useful adjunct to physician-led education, enhancing patient understanding of complex orthopaedic procedures.

Level of Evidence

Level V.


Source journal: Journal of Experimental Orthopaedics (Medicine: Orthopedics and Sports Medicine). CiteScore: 3.20; self-citation rate: 5.60%; articles per year: 114; review time: 13 weeks.