Editorial – Current capacities and future possibilities of large language models in orthopaedic surgery

IF 2.7 Q2 ORTHOPEDICS

Journal of Experimental Orthopaedics Pub Date : 2025-05-26 DOI:10.1002/jeo2.70273

Assil Mahamid, Lior Laver, Sana Zahalka, Felix Oettl, Eyal Behrbalk, Michael Hirschmann, Kristian Samuelsson



Abstract

The accelerated advancement of artificial intelligence (AI) and large language models (LLMs) like GPT-4 has paved the way for revolutionary shifts in almost all medical specialties. Orthopaedic surgery, traditionally characterized by its reliance on physical and radiographic diagnosis as well as surgical expertise, is increasingly integrating these advanced AI technologies into clinical practice. This editorial evaluates the use of LLMs in orthopaedic surgery, the influence that prominent LLMs have had on the field, and the potential these technologies hold to improve patient care and medical research in the future. Figure 1 provides a concise explanation of the operational mechanism of the LLM.

There has been increasing interest in using LLMs in medicine in general and in orthopaedic surgery in particular. Their potential uses in orthopaedic surgery are broad and varied, spanning clinical practice, research, medical education and patient education.

Recent studies show that AI and LLMs can aid in creating clinical letters and care strategies for typical orthopaedic scenarios in clinical practice. These models process large volumes of unstructured data, including imaging reports, surgical notes and patient records, offering a more thorough understanding of patient conditions and generating understandable, efficient and generally accurate texts. However, the generated output may occasionally be inconsistent, lack key details or offer only general recommendations [2, 3].

Interestingly, LLMs can offer treatment recommendations based on clinical data, such as magnetic resonance imaging reports. Still, their utility is limited by a need for further context and specificity, necessitating oversight by healthcare professionals [4].

Furthermore, LLMs can assist clinicians from the earliest stages of their careers, in medical school and residency: several studies have demonstrated that LLMs meet the passing criteria for both the United States Medical Licensing Examination (USMLE) and the orthopaedic board examinations [5-8].

Beyond supporting clinicians, LLMs may deliver clear and concise answers to frequently asked patient questions, providing a dependable source of knowledge on prevalent orthopaedic issues. These models can condense complex facts into comprehensible summaries that help patients understand their conditions and treatment options, and can then direct them to the appropriate specialist, potentially reducing unwarranted appointments and easing the workload of general physicians [3, 9, 10].

Several LLM platforms have emerged as leaders in this domain, each bringing unique strengths to orthopaedic practice. OpenAI's GPT-4 and Anthropic's Claude AI are among the most notable. These platforms are designed to understand and generate human-like text, making them valuable tools for both clinical and research applications.

GPT-4o has shown potential in numerous aspects of orthopaedic practice. In clinical decision-making, it can assist orthopaedic surgeons by providing quick access to relevant medical information, offering diagnostic suggestions and aiding in treatment planning [11]. In orthopaedic research, the model can assist in conducting literature reviews, drafting research proposals and manuscripts, and analyzing research data [4, 12]. Moreover, when given the Orthopaedic In-Service Training Exam (OITE) as input, ChatGPT performed at around the level of a first-year postgraduate trainee, providing a consistently logical rationale for most of its correct answers [13]. Similar results were observed with the American Board of Orthopaedic Surgery exams [5-7]. This underscores the model's potential to provide interactive learning experiences for medical students and residents.

Claude AI is another platform gaining attention for its potential medical applications, particularly in orthopaedic surgery. A cross-sectional study assessed the effectiveness of LLMs in responding to surgery-related patient inquiries, focusing on accuracy, relevance, clarity and emotional sensitivity. The results indicated that LLMs performed well in these areas, with Claude surpassing ChatGPT and Google's Bard [14].

Gemini, developed by Google, is an advanced AI tool engineered to generate human-like responses by processing extensive data sets. In paediatric orthopaedics, the emerging role of LLMs such as Gemini in aiding clinical decision-making and patient education is gaining increasing attention [15, 16]. When compared against evidence-based guidelines on supra-condylar and diaphyseal femur fractures, such as those provided by the American Academy of Orthopaedic Surgeons, Gemini delivered treatment recommendations for common paediatric fractures with an accuracy similar to that of other LLMs, including ChatGPT-4.0. However, notable discrepancies were observed in the citation of studies, and management plans were at times overgeneralized, with many responses requiring substantial modification [15].

Bidirectional Encoder Representations from Transformers (BERT), while not a chatbot platform, is a prominent neural network-based method for language processing. It has proven particularly valuable for natural language processing tasks in orthopaedic surgery. In work related to the recently launched Japanese Orthopaedic Association National Registry (JOANR), BERT significantly outperformed the other methods tested in accurately identifying key surgical details from operative records, such as the surgical approach and fixation technique. These findings suggest that BERT is the most suitable method for automating data extraction for JOANR, thereby streamlining the registration process and reducing the workload on surgeons [17].

A further promising application of BERT was reported in a proof-of-concept study, in which features that BERT extracted from patients' unstructured clinical data improved the accuracy of surgical case duration predictions [18]. This supports the potential of LLMs to optimize daily clinical planning and function.

Conceptualization and methodology: A.M., L.L. and F.O.; Software: J.O. and F.O.; Validation: B.Z., F.O. and F.J.; Writing – Original Draft Preparation: A.M. and S.Z.; Writing – Review and Editing: L.L., E.B. and M.H.; Supervision: L.L., K.S. and E.B.

Kristian Samuelsson is a Member of the Board of Directors of Getinge AB (publ) and medtech advisor to Carl Bennet AB. The other authors declare no conflicts of interest.

This study did not require ethical approval as it did not involve human or animal subjects. Patient consent was not required for this study.

