Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini.

Semih Yaş, Dilek Yapar, Aliekber Yapar, Tayfun Özel, Mehmet Ali Tokgöz, Alim Can Baymurat, Alpaslan Şenköylü
{"title":"Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini.","authors":"Semih Yaş, Dilek Yapar, Aliekber Yapar, Tayfun Özel, Mehmet Ali Tokgöz, Alim Can Baymurat, Alpaslan Şenköylü","doi":"10.5152/j.aott.2025.25279","DOIUrl":null,"url":null,"abstract":"<p><p>Objective: To evaluate the accuracy, applicability, comprehensiveness, and communication quality of responses generated by ChatGPT and Google Gemini in adolescent idiopathic scoliosis (AIS)-related scenarios, with the aim of assessing their potential utility as tools in patient management. Methods: Six case-based questions reflecting common patient concerns related to adolescent idiopathic scoliosis were developed by orthopedic specialists. Responses generated by ChatGPT and Google Gemini were independently evaluated by 61 orthopedic surgeons using a standardized rubric assessing accuracy, applicability, comprehensiveness, and communication clarity, each rated on a 1-5 Likert scale. Comparative analyses between platforms were performed using the Mann-Whitney U and Wilcoxon signed-rank tests. Additionally, open-ended feedback was collected to explore participants' perspectives on the potential and limitations of AI-based consultations. Results: ChatGPT outperformed Google Gemini in terms of accuracy (P = .013) in postoperative care scenarios. The results for applicability (P = .119), comprehensiveness (P = .619), and communication (P = .240) were not statistically significant. Orthopedic specialists rated both AI models significantly higher than residents in accuracy, applicability, and comprehensiveness. Most evaluators acknowledged the potential of AI to reduce physician workload and support patient guidance; however, concerns were raised regarding reliability, ethical implications, and the current limitations of AI in ensuring patient safety. Conclusion: ChatGPT and Google Gemini demonstrated moderate accuracy and communication quality in adolescent idiopathic scoliosis-related scenarios, with ChatGPT showing a modest advantage. Although both models show promising results as supportive tools for patient education and preliminary consultations, their current limitations in accuracy and comprehensiveness restrict their clinical reliability. Multidisciplinary collaboration is crucial to ensure e!ective applications of AI in orthopedic practice. Level of Evidence: Level III, Diagnostic Study.</p>","PeriodicalId":93854,"journal":{"name":"Acta orthopaedica et traumatologica turcica","volume":"59 4","pages":"222-229"},"PeriodicalIF":1.0000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362497/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta orthopaedica et traumatologica turcica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5152/j.aott.2025.25279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Objective: To evaluate the accuracy, applicability, comprehensiveness, and communication quality of responses generated by ChatGPT and Google Gemini in adolescent idiopathic scoliosis (AIS)-related scenarios, with the aim of assessing their potential utility as tools in patient management.

Methods: Six case-based questions reflecting common patient concerns related to adolescent idiopathic scoliosis were developed by orthopedic specialists. Responses generated by ChatGPT and Google Gemini were independently evaluated by 61 orthopedic surgeons using a standardized rubric assessing accuracy, applicability, comprehensiveness, and communication clarity, each rated on a 1-5 Likert scale. Comparative analyses between platforms were performed using the Mann-Whitney U and Wilcoxon signed-rank tests. Additionally, open-ended feedback was collected to explore participants' perspectives on the potential and limitations of AI-based consultations.

Results: ChatGPT outperformed Google Gemini in accuracy (P = .013) in postoperative care scenarios. Differences in applicability (P = .119), comprehensiveness (P = .619), and communication (P = .240) were not statistically significant. Orthopedic specialists rated both AI models significantly higher than residents did in accuracy, applicability, and comprehensiveness. Most evaluators acknowledged the potential of AI to reduce physician workload and support patient guidance; however, concerns were raised regarding reliability, ethical implications, and the current limitations of AI in ensuring patient safety.

Conclusion: ChatGPT and Google Gemini demonstrated moderate accuracy and communication quality in adolescent idiopathic scoliosis-related scenarios, with ChatGPT showing a modest advantage. Although both models show promising results as supportive tools for patient education and preliminary consultations, their current limitations in accuracy and comprehensiveness restrict their clinical reliability. Multidisciplinary collaboration is crucial to ensure effective applications of AI in orthopedic practice.

Level of Evidence: Level III, Diagnostic Study.
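For readers unfamiliar with the statistical tests named in the Methods, the sketch below shows how a comparison of 1-5 Likert ratings of this kind could be run with SciPy: a Wilcoxon signed-rank test for the paired ratings of the two models by the same evaluators, and a Mann-Whitney U test for an independent-groups comparison such as specialists versus residents. All scores, group sizes, and the specialist/resident split are hypothetical; this is not the authors' analysis code.

```python
# Illustrative sketch only: hypothetical Likert ratings (1-5), not the study's data.
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)

# One accuracy rating per evaluator for each model (61 evaluators, as in the study design).
chatgpt_scores = rng.integers(3, 6, size=61)  # hypothetical ratings in {3, 4, 5}
gemini_scores = rng.integers(2, 6, size=61)   # hypothetical ratings in {2, 3, 4, 5}

# Wilcoxon signed-rank test: paired, non-parametric comparison of the two models
# as rated by the same evaluators.
stat, p_paired = wilcoxon(chatgpt_scores, gemini_scores)
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p_paired:.3f}")

# Mann-Whitney U test: independent-groups comparison, e.g. specialists vs. residents
# rating the same model (the 30/31 split here is made up for illustration).
specialist_scores = chatgpt_scores[:30]
resident_scores = chatgpt_scores[30:]
u_stat, p_groups = mannwhitneyu(specialist_scores, resident_scores, alternative="two-sided")
print(f"Mann-Whitney U: U={u_stat:.1f}, p={p_groups:.3f}")
```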
