Managing class imbalance in the training of a large language model to predict patient selection for total knee arthroplasty: Results from the Artificial intelligence to Revolutionise the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project
{"title":"Managing class imbalance in the training of a large language model to predict patient selection for total knee arthroplasty: Results from the Artificial intelligence to Revolutionise the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project","authors":"Luke Farrow, Lesley Anderson, Mingjun Zhong","doi":"10.1016/j.knee.2025.02.007","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>This study set out to test the efficacy of different techniques used to manage to class imbalance, a type of data bias, in application of a large language model (LLM) to predict patient selection for total knee arthroplasty (TKA).</div></div><div><h3>Methods</h3><div>This study utilised data from the Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project (ISRCTN18398037). Data included the pre-operative radiology reports of patients referred to secondary care for knee-related complaints from within the North of Scotland. A clinically based LLM (GatorTron) was trained regarding prediction of selection for TKA. Three methods for managing class imbalance were assessed: a standard model, use of class weighting, and majority class undersampling.</div></div><div><h3>Results</h3><div>A total of 7707 individual knee radiology reports were included (dated from 2015 to 2022). The mean text length was 74 words (range 26–275). Only 910/7707 (11.8%) patients underwent TKA surgery (the designated ‘minority class’). Class weighting technique performed better for minority class discrimination and calibration compared with the other two techniques (Recall 0.61/AUROC 0.73 for class weighting compared with 0.54/0.70 and 0.59/0.72 for the standard model and majority class undersampling, respectively. There was also significant data loss for majority class undersampling when compared with class-weighting.</div></div><div><h3>Conclusion</h3><div>Use of class-weighting appears to provide the optimal method of training a an LLM to perform analytical tasks on free-text clinical information in the face of significant data bias (‘class imbalance’). Such knowledge is an important consideration in the development of high-performance clinical AI models within Trauma and Orthopaedics.</div></div>","PeriodicalId":56110,"journal":{"name":"Knee","volume":"54 ","pages":"Pages 1-8"},"PeriodicalIF":1.6000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knee","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968016025000213","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction
This study set out to test the efficacy of different techniques used to manage to class imbalance, a type of data bias, in application of a large language model (LLM) to predict patient selection for total knee arthroplasty (TKA).
Methods
This study utilised data from the Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project (ISRCTN18398037). Data included the pre-operative radiology reports of patients referred to secondary care for knee-related complaints from within the North of Scotland. A clinically based LLM (GatorTron) was trained regarding prediction of selection for TKA. Three methods for managing class imbalance were assessed: a standard model, use of class weighting, and majority class undersampling.
Results
A total of 7707 individual knee radiology reports were included (dated from 2015 to 2022). The mean text length was 74 words (range 26–275). Only 910/7707 (11.8%) patients underwent TKA surgery (the designated ‘minority class’). Class weighting technique performed better for minority class discrimination and calibration compared with the other two techniques (Recall 0.61/AUROC 0.73 for class weighting compared with 0.54/0.70 and 0.59/0.72 for the standard model and majority class undersampling, respectively. There was also significant data loss for majority class undersampling when compared with class-weighting.
Conclusion
Use of class-weighting appears to provide the optimal method of training a an LLM to perform analytical tasks on free-text clinical information in the face of significant data bias (‘class imbalance’). Such knowledge is an important consideration in the development of high-performance clinical AI models within Trauma and Orthopaedics.
期刊介绍:
The Knee is an international journal publishing studies on the clinical treatment and fundamental biomechanical characteristics of this joint. The aim of the journal is to provide a vehicle relevant to surgeons, biomedical engineers, imaging specialists, materials scientists, rehabilitation personnel and all those with an interest in the knee.
The topics covered include, but are not limited to:
• Anatomy, physiology, morphology and biochemistry;
• Biomechanical studies;
• Advances in the development of prosthetic, orthotic and augmentation devices;
• Imaging and diagnostic techniques;
• Pathology;
• Trauma;
• Surgery;
• Rehabilitation.