Editorial – Current capacities and future possibilities of large language models in orthopaedic surgery
Assil Mahamid, Lior Laver, Sana Zahalka, Felix Oettl, Eyal Behrbalk, Michael Hirschmann, Kristian Samuelsson
Journal of Experimental Orthopaedics, 12(2). Published 2025-05-26. DOI: 10.1002/jeo2.70273
https://onlinelibrary.wiley.com/doi/10.1002/jeo2.70273
Abstract
The accelerated advancement of artificial intelligence (AI) and large language models (LLMs) like GPT-4 has paved the way for revolutionary shifts in almost all medical specialties. Orthopaedic surgery, traditionally characterized by its reliance on physical and radiographic diagnosis as well as surgical expertise, is increasingly integrating these advanced AI technologies into clinical practice. This editorial evaluates the use of LLMs in orthopaedic surgery, the influence that prominent LLMs have had on the field, and the potential these technologies hold to improve patient care and medical research in the future. Figure 1 provides a concise explanation of the operational mechanism of the LLM.
Interest in using LLMs in medicine in general, and in orthopaedic surgery in particular, has been increasing. The potential uses of LLMs in orthopaedic surgery are broad and varied, spanning clinical use, the conduct of research, medical education and patient education.
Recent studies show that AI and LLMs can aid in creating clinical letters and care strategies for typical orthopaedic scenarios in clinical practice. These models process large volumes of unstructured data, including imaging reports, surgical notes and patient records, offering a more thorough understanding of patient conditions and generating understandable, efficient and generally accurate texts. Occasionally, however, the generated output may be inconsistent, lack key details or offer only general recommendations [2, 3].
Interestingly, LLMs can offer treatment recommendations based on clinical data, such as magnetic resonance imaging reports. Still, their utility is limited by a need for further context and specificity, necessitating oversight by healthcare professionals [4].
Furthermore, LLMs can support clinicians from as early in their careers as medical school and residency: several noteworthy studies have demonstrated that LLMs meet the passing criteria for both the United States Medical Licensing Examination (USMLE) and orthopaedic board examinations [5-8].
Beyond supporting clinicians, LLMs may deliver clear and concise answers to frequently asked patient inquiries, providing a dependable knowledge source for prevalent orthopaedic issues. These models can condense complex facts into comprehensible summaries, which are useful for patients aiming to understand their conditions and therapy alternatives. They can also refer patients to the appropriate specialist, potentially reducing unwarranted appointments and alleviating the workload of general physicians [3, 9, 10].
Several LLM platforms have emerged as leaders in this domain, each bringing unique strengths to orthopaedic practice. OpenAI's GPT-4 and Anthropic's Claude AI are among the most notable. These platforms are designed to understand and generate human-like text, making them valuable tools for both clinical and research applications.
GPT-4o has shown potential in numerous aspects of orthopaedic practice. In clinical decision-making, it can assist orthopaedic surgeons by providing quick access to relevant medical information, offering diagnostic suggestions and aiding in treatment planning [11]. In orthopaedic research, the AI model can assist in conducting literature reviews, drafting research proposals and manuscripts, and analyzing research data [4, 12]. Moreover, when given the Orthopaedic In-Service Training Exam (OITE) as input, ChatGPT performed at around the level of a first-year postgraduate trainee, providing a consistently logical rationale for most of the correct answers [13]. Similar results were observed with the American Board of Orthopaedic Surgery exams [5-7]. This emphasizes the model's potential in providing interactive learning experiences for medical students and residents.
Claude AI is another platform gaining attention due to its potential medical applications, particularly in orthopaedic surgery. A cross-sectional study assessed the effectiveness of LLMs in responding to surgery-related patient inquiries, focusing on accuracy, relevance, clarity and emotional sensitivity. The results indicated that LLMs perform well in these areas, with Claude surpassing ChatGPT and Google's Bard [14].
Gemini, developed by Google, is an advanced AI tool engineered to generate human-like responses by processing extensive data sets. In paediatric orthopaedics, the emerging role of LLMs, such as Gemini, in aiding clinical decision-making and patient education is gaining increasing attention [15, 16]. When its answers were compared against evidence-based guidelines on supracondylar and diaphyseal femur fractures, such as those provided by the American Academy of Orthopaedic Surgeons, Gemini exhibited an accuracy rate similar to that of other LLMs, including ChatGPT-4.0, in delivering treatment recommendations for common paediatric fractures. However, notable discrepancies were observed in the citation of studies, management plans were sometimes overgeneralized, and many responses required substantial modification [15].
Bidirectional Encoder Representations from Transformers (BERT), while not a chatbot platform, is a prominent neural network-based method for language processing. It has proven particularly valuable for natural language processing tasks in orthopaedic surgery. In work associated with the recently launched Japanese Orthopaedic Association National Registry (JOANR), BERT significantly outperformed the other methods tested in accurately identifying key surgical details from operative records, such as the surgical approach and fixation technique. These findings suggest that BERT is the most suitable of those methods for automating data extraction for JOANR, thereby streamlining the registration process and reducing the workload on surgeons [17].
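To make the extraction task concrete, the sketch below shows its input and output: free-text operative notes go in, and structured fields for surgical approach and fixation technique come out. It is a deliberately simple keyword baseline, not BERT and not the method of the cited study. The label vocabularies, the function names and the example notes are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical label vocabularies (assumption): a real registry defines its own.
APPROACH_TERMS = {
    "anterior": ["anterior approach", "anterior"],
    "posterior": ["posterior approach", "posterior"],
    "lateral": ["lateral approach", "lateral"],
}
FIXATION_TERMS = {
    "plate": ["plate", "plating"],
    "intramedullary nail": ["intramedullary nail", "im nail"],
    "screw": ["screw", "screws"],
}


def extract_labels(note, vocabulary):
    """Return every label whose keywords occur as whole words in the note."""
    text = note.lower()
    found = []
    for label, keywords in vocabulary.items():
        # \b anchors keep "lateral" from matching inside "posterolateral".
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        if re.search(pattern, text):
            found.append(label)
    return found


def extract_record(note):
    """Map one free-text operative note to structured registry-style fields."""
    return {
        "approach": extract_labels(note, APPROACH_TERMS),
        "fixation": extract_labels(note, FIXATION_TERMS),
    }


if __name__ == "__main__":
    # Invented example notes, not taken from any registry.
    print(extract_record(
        "Femoral shaft fracture treated via a lateral approach "
        "with an intramedullary nail."
    ))
    print(extract_record("Posterolateral exposure; fixation with plate and screws."))
```

The second note shows the weakness of keyword matching: "posterolateral" is not in the vocabulary, so the approach field comes back empty. Gaps like this are one reason a contextual model such as BERT may be preferred for this kind of extraction.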
An additional promising feature of BERT was observed in a proof-of-concept study demonstrating improved prediction accuracy for surgical case duration, based on BERT's ability to extract features from unstructured clinical data of patients [18], thereby reinforcing LLMs' ability to optimize daily clinical planning and function.
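The idea of predicting case duration from unstructured text can be sketched as follows. This is a minimal illustration, assuming that learned BERT features are replaced by plain word overlap (Jaccard similarity) and that the prediction is the average duration of the most similar past cases. The function names, the historical notes and the durations in minutes are all invented for the example and do not come from the cited study.

```python
import re


def tokenize(text):
    """Lower-case the note and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_duration(note, history, k=2):
    """Average the durations of the k past cases most similar to the note.

    `history` is a list of (note_text, duration_minutes) pairs.
    """
    query = tokenize(note)
    ranked = sorted(
        history,
        key=lambda case: jaccard(query, tokenize(case[0])),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(duration for _, duration in nearest) / len(nearest)


# Toy history with made-up durations (assumption, for illustration only).
HISTORY = [
    ("revision total hip arthroplasty with bone grafting", 210),
    ("revision total knee arthroplasty", 180),
    ("diagnostic knee arthroscopy", 40),
    ("knee arthroscopy with partial meniscectomy", 55),
]

if __name__ == "__main__":
    print(predict_duration("knee arthroscopy with meniscectomy", HISTORY))
    print(predict_duration("revision total hip arthroplasty", HISTORY))
```

A BERT-based pipeline would replace `tokenize` and `jaccard` with contextual embeddings and a trained regressor. The overall flow, from free-text description to text-derived features to duration estimate, stays the same.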
Conceptualization and methodology: A.M., L.L. and F.O.; Software: J.O. and F.O.; Validation: B.Z., F.O. and F.J.; Writing – Original Draft Preparation: A.M. and S.Z.; Writing – Review and Editing: L.L., E.B. and M.H.; Supervision: L.L., K.S. and E.B.
Kristian Samuelsson is a Member of the Board of Directors of Getinge AB (publ) and medtech advisor to Carl Bennet AB. The other authors declare no conflicts of interest.
This study did not require ethical approval as it did not involve human or animal subjects. Patient consent was not required for this study.