Automated classification of brain MRI reports using fine-tuned large language models.

IF 2.4 | Zone 3 (Medicine) | JCR Q2 (Clinical Neurology)
Jun Kanzawa, Koichiro Yasaka, Nana Fujita, Shin Fujiwara, Osamu Abe
Neuroradiology. Journal Article. Published 2024-07-12. DOI: 10.1007/s00234-024-03427-7 (https://doi.org/10.1007/s00234-024-03427-7)
Citations: 0

Abstract

Purpose: This study aimed to investigate the efficacy of fine-tuned large language models (LLMs) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases.

Methods: This retrospective study included 759, 284, and 164 brain MRI reports in the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Japanese Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned on the training dataset and evaluated on the validation dataset. The model that achieved the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists classified the reports in the test dataset into the three groups. The model's performance on the test dataset was compared with that of these two radiologists.

Results: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences in accuracy, sensitivity, or specificity were found between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20- to 26-fold faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1 and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2.
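
The per-group sensitivity and specificity above are one-vs-rest quantities: each group in turn is treated as the positive class and the other two as negative. The sketch below shows one way to compute such metrics from a three-class confusion matrix. It is illustrative only: the function name `one_vs_rest_metrics` and the counts in `toy_cm` are hypothetical, the counts are not the study's data, and the abstract does not state which software or confidence-interval method the authors used.

```python
def one_vs_rest_metrics(cm):
    """Compute accuracy plus per-class sensitivity and specificity.

    cm[i][j] is the number of reports whose true group is i and whose
    predicted group is j. Assumes every class has at least one true
    sample and at least one negative sample (no zero denominators).
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))

    sensitivity = []
    specificity = []
    for k in range(n_classes):
        tp = cm[k][k]                                        # class k predicted as k
        fn = sum(cm[k]) - tp                                 # class k predicted as another group
        fp = sum(cm[i][k] for i in range(n_classes)) - tp    # other groups predicted as k
        tn = total - tp - fn - fp                            # everything else
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))

    return {
        "accuracy": correct / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (NOT the study's test set).
# Rows: true group 1, 2, 3; columns: predicted group 1, 2, 3.
toy_cm = [
    [10, 0, 0],
    [0, 8, 2],
    [1, 1, 18],
]

metrics = one_vs_rest_metrics(toy_cm)
print(metrics)
```

With a matrix like this, a group can have perfect sensitivity (no missed cases) while its specificity stays below 1 because a report from another group was assigned to it, which is the same pattern as the reported values for group 1.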

Conclusion: The fine-tuned LLM performed comparably to radiologists in classifying brain MRI reports while requiring substantially less time.

Source journal
Neuroradiology (Medicine: Nuclear Medicine)
CiteScore: 5.30
Self-citation rate: 3.60%
Publication volume: 214
Review time: 4-8 weeks
About the journal: Neuroradiology aims to provide state-of-the-art medical and scientific information in the fields of Neuroradiology, Neurosciences, Neurology, Psychiatry, Neurosurgery, and related medical specialities. Neuroradiology, as the official journal of the European Society of Neuroradiology, receives submissions from all parts of the world and publishes peer-reviewed original research, comprehensive reviews, educational papers, opinion papers, and short reports on exceptional clinical observations and new technical developments in the field of Neuroimaging and Neurointervention. The journal has subsections for Diagnostic and Interventional Neuroradiology, Advanced Neuroimaging, Paediatric Neuroradiology, Head-Neck-ENT Radiology, Spine Neuroradiology, and for submissions from Japan. Neuroradiology aims to provide new knowledge about and insights into the function and pathology of the human nervous system that may help to better diagnose and treat nervous system diseases. Neuroradiology is a member of the Committee on Publication Ethics (COPE) and follows the COPE core practices. Neuroradiology prefers articles that are free of bias, self-critical regarding limitations, transparent and clear in describing study participants, methods, and statistics, and short in presenting results. Before peer review, all submissions are automatically checked by iThenticate to assess potential overlap with prior publications.