Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme
{"title":"改进患者对膝关节软骨病变自体软骨细胞植入的教育:一个微调的chatgpt - 40模型提高了可读性和质量","authors":"Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme","doi":"10.1002/jeo2.70445","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Purpose</h3>\n \n <p>Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects, requiring patient understanding of treatment and recovery, as health literacy impacts outcomes. This study evaluated the quality and readability of AI-generated ACI materials using ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model and a fine-tuned version and hypothesised that the fine-tuned model would provide improved quality and readability.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (ACI guide) optimised by instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed using the DISCERN criteria and readability by the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using intraclass correlation coefficient (ICC) in a two-way mixed-effects model.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; <i>p</i> < 0.001), reflecting moderate quality. Regarding readability, the native model reached FKGL of 13.45 ± 1.30 (university sophomore level)<b>.</b> In contrast, the ACI Guide achieved FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than the native model (35.68 ± 5.08; <i>p</i> < 0.001). Interrater reliability was strong (ICC = 0.767), indicating good agreement.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved the readability and quality of ACI patient education materials, generating content closer to the 8th–9th-grade level. 
It may serve as a useful adjunct to physician-led education in enhancing patient understanding of complex orthopaedic procedures.</p>\n </section>\n \n <section>\n \n <h3> Level of Evidence</h3>\n \n <p>Level V.</p>\n </section>\n </div>","PeriodicalId":36909,"journal":{"name":"Journal of Experimental Orthopaedics","volume":"12 4","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://esskajournals.onlinelibrary.wiley.com/doi/epdf/10.1002/jeo2.70445","citationCount":"0","resultStr":"{\"title\":\"Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality\",\"authors\":\"Maha Alsadaan, Stephen Fahy, Danko Dan Milinkovic, Benjamin Bartek, Tobias Winkler, Tobias Jung, Stephan Oehme\",\"doi\":\"10.1002/jeo2.70445\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects, requiring patient understanding of treatment and recovery, as health literacy impacts outcomes. This study evaluated the quality and readability of AI-generated ACI materials using ChatGPT-4o as adjuncts to physician-led education. We compared responses from the native model and a fine-tuned version and hypothesised that the fine-tuned model would provide improved quality and readability.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>Twenty-two frequently asked questions were identified using Google's ‘People Also Asked’ feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (ACI guide) optimised by instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed using the DISCERN criteria and readability by the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using intraclass correlation coefficient (ICC) in a two-way mixed-effects model.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; <i>p</i> < 0.001), reflecting moderate quality. Regarding readability, the native model reached FKGL of 13.45 ± 1.30 (university sophomore level)<b>.</b> In contrast, the ACI Guide achieved FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than the native model (35.68 ± 5.08; <i>p</i> < 0.001). Interrater reliability was strong (ICC = 0.767), indicating good agreement.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved the readability and quality of ACI patient education materials, generating content closer to the 8th–9th-grade level. 
It may serve as a useful adjunct to physician-led education in enhancing patient understanding of complex orthopaedic procedures.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Level of Evidence</h3>\\n \\n <p>Level V.</p>\\n </section>\\n </div>\",\"PeriodicalId\":36909,\"journal\":{\"name\":\"Journal of Experimental Orthopaedics\",\"volume\":\"12 4\",\"pages\":\"\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://esskajournals.onlinelibrary.wiley.com/doi/epdf/10.1002/jeo2.70445\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Experimental Orthopaedics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70445\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Experimental Orthopaedics","FirstCategoryId":"1085","ListUrlMain":"https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Refining patient education on autologous chondrocyte implantation for chondral lesions of the knee: A fine-tuned ChatGPT-4o model improves readability and quality
Purpose
Autologous chondrocyte implantation (ACI) is a complex procedure for cartilage defects that requires patients to understand both the treatment and the recovery process, as health literacy impacts outcomes. This study evaluated the quality and readability of AI-generated ACI patient education materials produced with ChatGPT-4o, intended as an adjunct to physician-led education. We compared responses from the native model and a fine-tuned version and hypothesised that the fine-tuned model would provide improved quality and readability.
Methods
Twenty-two frequently asked questions were identified using Google's 'People Also Asked' feature. Two ChatGPT-4o configurations were evaluated: the native model and a fine-tuned version (the ACI Guide) optimised through instruction-based fine-tuning and reinforcement learning from human feedback. Two orthopaedic surgeons independently scored the responses. Quality was assessed using the DISCERN criteria, and readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Interrater reliability was determined using the intraclass correlation coefficient (ICC) in a two-way mixed-effects model.
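For readers unfamiliar with the two readability indices: both are simple functions of average sentence length and average syllables per word, so they can be reproduced from the standard Flesch formulas. The study does not describe which scoring tool was used, so the sketch below is purely illustrative; the syllable counter is a deliberately naive vowel-group heuristic, whereas dedicated tools use dictionaries and more careful rules.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of consecutive vowels,
    dropping a trailing silent 'e'. Real readability tools do better."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a non-empty English text,
    using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    # FRES: higher = easier; FKGL: approximate US school grade level.
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

In practice a maintained library such as textstat (which exposes flesch_reading_ease and flesch_kincaid_grade) would typically be preferred over a hand-rolled syllable heuristic, since syllable counting dominates the error in both scores.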
Results
The fine-tuned ACI Guide outperformed the native ChatGPT-4o on all parameters. The native model produced poor-quality responses with a mean DISCERN score of 35.29 ± 5.0 (range: 23–45), while the ACI Guide achieved a significantly higher score of 43.18 ± 3.92 (range: 34–53; p < 0.001), reflecting moderate quality. Regarding readability, the native model reached an FKGL of 13.45 ± 1.30 (university sophomore level), whereas the ACI Guide achieved an FKGL of 9.25 ± 1.64 (9th-grade level). The FRES was also significantly higher for the ACI Guide (49.59 ± 10.44) than for the native model (35.68 ± 5.08; p < 0.001). Interrater reliability was good (ICC = 0.767), indicating consistent agreement between the two raters.
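As a minimal sketch of how such an ICC can be computed, the example below uses the pingouin library; pingouin's ICC3 row corresponds to a two-way mixed-effects, single-rater, consistency model. The abstract does not state which ICC variant was used, and the scores in this toy data frame are invented for illustration only.

```python
import pandas as pd
import pingouin as pg

# Hypothetical layout: one row per (question, rater) pair,
# with each rater's DISCERN score for that question.
df = pd.DataFrame({
    "question": [1, 1, 2, 2, 3, 3],
    "rater":    ["A", "B", "A", "B", "A", "B"],
    "score":    [36, 34, 44, 42, 31, 35],
})

icc = pg.intraclass_corr(data=df, targets="question",
                         raters="rater", ratings="score")
# Select the two-way mixed-effects, single-rater, consistency estimate.
print(icc[icc["Type"] == "ICC3"])
```

By convention (e.g., Koo and Li's guidelines), ICC values between 0.75 and 0.90 are interpreted as good reliability, which is consistent with the reported ICC of 0.767.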
Conclusions
The native ChatGPT-4o's responses were of poor quality and written at a readability level substantially exceeding recommended thresholds for patient education materials, limiting their applicability in clinical communication and patient education. Fine-tuning ChatGPT-4o improved the readability and quality of ACI patient education materials, generating content closer to the 8th–9th-grade level. The fine-tuned model may serve as a useful adjunct to physician-led education, enhancing patient understanding of complex orthopaedic procedures.

Level of Evidence

Level V.