Use of natural language processing techniques to predict patient selection for total hip and knee arthroplasty from radiology reports.

IF 4.6, Zone 1 (Medicine), Q1 ORTHOPEDICS

Bone & Joint Journal, 106-B(7): 688-695. Pub Date: 2024-07-01. DOI: 10.1302/0301-620X.106B7.BJJ-2024-0136

Luke Farrow, Mingjun Zhong, Lesley Anderson


Citations: 0

Abstract

Use of natural language processing techniques to predict patient selection for total hip and knee arthroplasty from radiology reports.

Aims: To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports.

Methods: Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross-validation), and external validation.
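As an illustration of the cross-validation step described in the Methods, the following is a minimal sketch of how reports might be split into folds. It is not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the choice of k = 3 (inferred from "mean across three folds" in the Results) are assumptions, and the sample count simply reuses the 4,137 THA reports reported for training and testing.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 3, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 4,137 is the number of THA reports used for training and testing.
folds = list(k_fold_indices(4137, k=3))
```

Each report appears in exactly one held-out fold, so every report is used for testing once and for training in the remaining folds. The external validation set (1,421 THA reports) would be kept entirely outside this loop.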

Results: For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. The corresponding accuracy, F1 score, and AUROC values were 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts.
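For readers unfamiliar with the three metrics reported above, here is a minimal sketch of how accuracy, F1 score, and AUROC can be computed for a binary classifier. The labels and scores are toy values invented for illustration, not study data, and the 0.5 decision threshold and function names are assumptions. The abstract does not state how the confidence intervals were derived, so that step is omitted.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 1 = selected for arthroplasty, 0 = not selected.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC is computed from the raw scores and is threshold-independent. This is one reason a model can show moderate accuracy alongside a much lower F1 score when positive cases are harder to identify.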

Conclusion: The use of routinely available preoperative radiology reports shows promise for helping to screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts.

Source journal

Bone & Joint Journal ORTHOPEDICS-SURGERY

CiteScore: 9.40

Self-citation rate: 10.90%

Articles published: 318

About the journal: We welcome original articles from any part of the world. The papers are assessed by members of the Editorial Board and our international panel of expert reviewers, then either accepted for publication or rejected by the Editor. We receive over 2000 submissions each year and accept about 250 for publication, many after revisions recommended by the reviewers, editors or statistical advisers. A decision usually takes between six and eight weeks. Each paper is assessed by two reviewers with a special interest in the subject covered by the paper, and also by members of the editorial team. Controversial papers will be discussed at a full meeting of the Editorial Board. Publication is between four and six months after acceptance.