Classification of Interventional Radiology Reports into Technique Categories with a Fine-Tuned Large Language Model.

Journal of Imaging Informatics in Medicine. Pub Date: 2024-12-13. DOI: 10.1007/s10278-024-01370-w

Koichiro Yasaka, Takuto Nomura, Jun Kamohara, Hiroshi Hirakawa, Takatoshi Kubo, Shigeru Kiryu, Osamu Abe


Abstract

This study aimed to develop a fine-tuned large language model that classifies interventional radiology reports into technique categories and to compare its performance with that of human readers. This retrospective study included 3198 patients (1758 males and 1440 females; age, 62.8 ± 16.8 years) who underwent interventional radiology from January 2018 to July 2024. The training, validation, and test datasets comprised 2292, 250, and 656 patients, respectively. The input data were the texts of the clinical indication, imaging diagnosis, and image-finding sections of the interventional radiology reports. Manually classified technique categories (15 in total) served as the reference data. A Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned using the training and validation datasets. Because the learning process is stochastic, fine-tuning was repeated 15 times. The best-performing model, the one with the highest accuracy among the 15 trials, was selected for further evaluation on the independent test dataset. The reports were also classified by one radiologist (reader 1) and two radiology residents (readers 2 and 3). The accuracy and macrosensitivity (the average of each category's sensitivity) of the best-performing model in the validation dataset were 0.996 and 0.994, respectively. For the test dataset, the accuracy/macrosensitivity were 0.988/0.980, 0.986/0.977, 0.989/0.979, and 0.988/0.980 for the best model, reader 1, reader 2, and reader 3, respectively. The model required 0.178 s per patient for classification, which was 17.5-19.9 times faster than the readers. In conclusion, the fine-tuned large language model classified interventional radiology reports into technique categories with high accuracy similar to that of the readers and in a markedly shorter time.
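
The two metrics reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the category labels, and the toy data are hypothetical, and only the definition of macrosensitivity (the unweighted average of each category's sensitivity) is taken from the abstract.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted category matches the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_sensitivity(y_true, y_pred):
    """Unweighted mean of per-category sensitivity (recall)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)  # reports per reference category
    hits = defaultdict(int)   # correctly classified reports per category
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    per_category = [hits[c] / total[c] for c in total]
    return sum(per_category) / len(per_category)


# Toy data with three hypothetical categories (the study used 15,
# which the abstract does not name).
reference = ["embolization"] * 4 + ["drainage"] * 2 + ["biopsy"] * 2
predicted = ["embolization"] * 4 + ["drainage", "biopsy"] + ["biopsy"] * 2

print(f"accuracy          = {accuracy(reference, predicted):.3f}")
print(f"macro-sensitivity = {macro_sensitivity(reference, predicted):.3f}")
```

In the toy data, one of the two "drainage" reports is misclassified. Accuracy counts that as one error out of eight reports, whereas macrosensitivity averages the per-category sensitivities (1, 0.5, and 1), so an error in a small category lowers it more. This is why the two figures can differ when categories are imbalanced.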
