Managing class imbalance in the training of a large language model to predict patient selection for total knee arthroplasty: Results from the Artificial intelligence to Revolutionise the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project

IF 1.6, Zone 4 (Medicine), JCR Q3 ORTHOPEDICS

Knee Pub Date : 2025-02-28 DOI:10.1016/j.knee.2025.02.007

Luke Farrow, Lesley Anderson, Mingjun Zhong

Knee, volume 54, pages 1-8. Journal Article. https://www.sciencedirect.com/science/article/pii/S0968016025000213

Citations: 0

Abstract

Introduction

This study set out to test the efficacy of different techniques for managing class imbalance, a type of data bias, when applying a large language model (LLM) to predict patient selection for total knee arthroplasty (TKA).

Methods

This study utilised data from the Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project (ISRCTN18398037). Data included the pre-operative radiology reports of patients referred to secondary care for knee-related complaints from within the North of Scotland. A clinically based LLM (GatorTron) was trained to predict selection for TKA. Three methods for managing class imbalance were assessed: a standard model, use of class weighting, and majority class undersampling.
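The two imbalance-handling techniques named above can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not state which weighting formula or undersampling ratio the authors used, so the inverse-frequency ("balanced") weights and the 1:1 undersampling below are assumptions, and the function names `balanced_class_weights` and `undersample_majority` are hypothetical. The label counts are those reported in the Results (910 TKA cases among 7707 reports).

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed heuristic; the abstract does not specify the weighting scheme.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def undersample_majority(labels, seed=0):
    """Keep every minority example plus an equal-sized random majority subset.

    Returns the sorted indices of the retained examples (binary labels assumed).
    """
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    rng = random.Random(seed)
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    return sorted(minority_idx + kept_majority)


# 1 = underwent TKA (minority class), 0 = did not (majority class).
labels = [1] * 910 + [0] * (7707 - 910)

weights = balanced_class_weights(labels)
kept = undersample_majority(labels)

print("class weights:", weights)
print(len(kept), "of", len(labels), "reports kept after 1:1 undersampling")
```

Under these assumptions, class weighting keeps all 7707 reports and simply makes errors on the minority class count for more in the training loss, whereas 1:1 undersampling retains only 1820 reports and discards the rest of the majority class.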

Results

A total of 7707 individual knee radiology reports were included (dated from 2015 to 2022). The mean text length was 74 words (range 26–275). Only 910/7707 (11.8%) patients underwent TKA surgery (the designated ‘minority class’). The class-weighting technique performed better for minority class discrimination and calibration than the other two techniques (Recall 0.61/AUROC 0.73 for class weighting, compared with 0.54/0.70 and 0.59/0.72 for the standard model and majority class undersampling, respectively). There was also significant data loss for majority class undersampling when compared with class weighting.
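For readers less familiar with the two metrics reported above, the following is a minimal sketch of how recall and AUROC are defined. The toy labels and scores are hypothetical and are not the study's data; the 0.5 decision threshold is likewise an assumption. AUROC is computed here through its pairwise interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counting as half.

```python
def recall(y_true, y_pred):
    """Fraction of true positives (here, TKA cases) that the model identifies."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auroc(y_true, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print("recall:", recall(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
```

Recall depends on the chosen threshold, while AUROC summarises ranking quality across all thresholds, which is why the two are usually reported together for imbalanced problems. Plain accuracy is less informative here: with 11.8% prevalence, a model that never predicts TKA would still be correct for 6797 of 7707 reports.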

Conclusion

Use of class weighting appears to provide the optimal method of training an LLM to perform analytical tasks on free-text clinical information in the face of significant data bias (‘class imbalance’). Such knowledge is an important consideration in the development of high-performance clinical AI models within Trauma and Orthopaedics.

Source journal

Knee (Medicine: Surgery)

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 171

Review time: 6 months

About the journal: The Knee is an international journal publishing studies on the clinical treatment and fundamental biomechanical characteristics of this joint. The aim of the journal is to provide a vehicle relevant to surgeons, biomedical engineers, imaging specialists, materials scientists, rehabilitation personnel and all those with an interest in the knee. The topics covered include, but are not limited to: • Anatomy, physiology, morphology and biochemistry; • Biomechanical studies; • Advances in the development of prosthetic, orthotic and augmentation devices; • Imaging and diagnostic techniques; • Pathology; • Trauma; • Surgery; • Rehabilitation.