Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality
Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme
{"title":"Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality","authors":"Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme","doi":"10.1002/jeo2.70445","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Purpose</h3>\n \n <p>Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects, requiring patient understanding of treatment and recovery, as health literacy impacts outcomes. This study evaluated the quality and readability of AI-generated ACI materials using ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model and a fine-tuned version and hypothesised that the fine-tuned model would provide improved quality and readability.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (ACI guide) optimised by instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed using the DISCERN criteria and readability by the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using intraclass correlation coefficient (ICC) in a two-way mixed-effects model.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; <i>p</i> < 0.001), reflecting moderate quality. 
Regarding readability, the native model reached FKGL of 13.45 ± 1.30 (university sophomore level)<b>.</b> In contrast, the ACI Guide achieved FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than the native model (35.68 ± 5.08; <i>p</i> < 0.001). Interrater reliability was strong (ICC = 0.767), indicating good agreement.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved the readability and quality of ACI patient education materials, generating content closer to the 8th–9th-grade level. It may serve as a useful adjunct to physician-led education in enhancing patient understanding of complex orthopaedic procedures.</p>\n </section>\n \n <section>\n \n <h3> Level of Evidence</h3>\n \n <p>Level V.</p>\n </section>\n </div>","PeriodicalId":36909,"journal":{"name":"Journal of Experimental Orthopaedics","volume":"12 4","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://esskajournals.onlinelibrary.wiley.com/doi/epdf/10.1002/jeo2.70445","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Experimental Orthopaedics","FirstCategoryId":"1085","ListUrlMain":"https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Abstract
Purpose
Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects that requires patients to understand both the treatment and its recovery, as health literacy impacts outcomes. This study evaluated the quality and readability of AI-generated ACI patient education materials produced with ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model with those from a fine-tuned version and hypothesised that the fine-tuned model would provide improved quality and readability.
Methods
Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (ACI Guide) optimised by instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed using the DISCERN criteria, and readability by the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using the intraclass correlation coefficient (ICC) in a two-way mixed-effects model.
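The two readability measures used here are closed-form formulas over word, sentence, and syllable counts. The following is a minimal sketch (not the study's tooling) of the standard Flesch formulas, assuming a crude regex-based syllable heuristic; validated readability software will produce somewhat different absolute scores:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, subtract a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) from the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Both formulas penalise long sentences and polysyllabic words, which is why medical terminology (e.g., "autologous chondrocyte implantation") drives FKGL up and FRES down.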
Results
The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; p < 0.001), reflecting moderate quality. Regarding readability, the native model reached an FKGL of 13.45 ± 1.30 (university sophomore level), whereas the ACI Guide achieved an FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than for the native model (35.68 ± 5.08; p < 0.001). Interrater reliability was good (ICC = 0.767).
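For two fixed raters scoring the same items, the two-way mixed-effects model reduces to the ICC(3,1) consistency form. This is a sketch of that calculation from the two-way ANOVA mean squares, not the authors' analysis code; the `ratings` layout (one row per item, one column per rater) is an assumption for illustration:

```python
def icc3_1(ratings: list[list[float]]) -> float:
    """ICC(3,1): two-way mixed-effects, single rater, consistency.

    `ratings` has one row per rated item and one column per rater.
    """
    n = len(ratings)        # number of rated items
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares from a two-way ANOVA without replication
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

Because this is the consistency form, a rater who scores systematically higher by a constant offset still yields an ICC of 1; only disagreement about the relative ordering of items lowers it.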
Conclusions
ChatGPT-4o's native responses were of poor quality and written at a readability level substantially exceeding recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved the readability and quality of ACI patient education materials, generating content closer to the recommended 8th–9th-grade level. The fine-tuned model may serve as a useful adjunct to physician-led education in enhancing patient understanding of complex orthopaedic procedures.
Level of Evidence
Level V.