Enhancing radiomics features via a large language model for classifying benign and malignant breast tumors in mammography

IF 4.9 · CAS Zone 2 (Medicine) · JCR Q1 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer methods and programs in biomedicine · Pub Date: 2025-04-03 · DOI: 10.1016/j.cmpb.2025.108765

Sinyoung Ra, Jonghun Kim, Inye Na, Eun Sook Ko, Hyunjin Park

Volume 265, Article 108765 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0169260725001828

Citations: 0

Abstract

Background and Objectives

Radiomics is widely used to assist in clinical decision-making, disease diagnosis, and treatment planning for various target organs, including the breast. Recent advances in large language models (LLMs) have helped enhance radiomics analysis.

Materials and Methods

Herein, we sought to improve radiomics analysis by incorporating LLM-learned clinical knowledge, to classify benign and malignant tumors in breast mammography. We extracted radiomics features from the mammograms based on the region of interest and retained the features related to the target task. Using prompt engineering, we devised an input sequence that reflected the selected features and the target task. The input sequence was fed to the chosen LLM (LLaMA variant), which was fine-tuned using low-rank adaptation to enhance radiomics features. This was then evaluated on two mammogram datasets (VinDr-Mammo and INbreast) against conventional baselines.
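The prompt-engineering step can be pictured with a minimal sketch. The abstract does not give the actual prompt template, so the wording, the `build_prompt` helper, and the feature names and values below (written in a PyRadiomics-like naming style) are illustrative assumptions, not the authors' implementation. The sketch only shows the general idea of serializing selected features as name-value pairs together with a task description before the text is passed to an LLM.

```python
from typing import Mapping


def build_prompt(
    features: Mapping[str, float],
    task: str = "Classify the breast tumor as benign or malignant.",
) -> str:
    """Serialize named radiomics features into a text prompt.

    Hypothetical template: the paper's real prompt is not given in the abstract.
    """
    # Sort by feature name so the same features always yield the same text.
    lines = [f"{name}: {value:.4f}" for name, value in sorted(features.items())]
    return (
        f"Task: {task}\n"
        "Radiomics features:\n"
        + "\n".join(lines)
        + "\nAnswer:"
    )


# Hypothetical feature names and values, for illustration only.
example_features = {
    "original_firstorder_Mean": 0.5312,
    "original_glcm_Contrast": 12.8,
    "original_shape2D_Sphericity": 0.74,
}

prompt = build_prompt(example_features)
print(prompt)
```

In a full pipeline, text like this would be tokenized and fed to the LLM being fine-tuned with low-rank adaptation; that part is omitted here because the abstract does not specify the model configuration.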

Results

The enhanced radiomics-based method performed better than baselines using conventional radiomics features tested on two mammogram datasets, achieving accuracies of 0.671 for the VinDr-Mammo dataset and 0.839 for the INbreast dataset. Conventional radiomics models require retraining from scratch for an unseen dataset using a new set of features. In contrast, the model developed in this study effectively reused the common features between the training and unseen datasets by explicitly linking feature names with feature values, leading to extensible learning across datasets. Our method performed better than the baseline method in this retraining setting using an unseen dataset.
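One way to read the cross-dataset claim is sketched below. A fixed-width feature vector ties a model to one feature set and column order, whereas a name-value text serialization accepts any feature set, and features that share a name across datasets map to the same text. The helper names, feature names, and values are hypothetical, and the abstract does not describe how the authors actually handle this.

```python
from typing import Dict, List, Mapping


def shared_feature_names(
    train_features: Mapping[str, float],
    unseen_features: Mapping[str, float],
) -> List[str]:
    """Return the feature names present in both datasets, sorted by name."""
    return sorted(set(train_features) & set(unseen_features))


def serialize(features: Mapping[str, float]) -> str:
    """Write features as 'name = value' pairs; works for any feature set."""
    return "; ".join(
        f"{name} = {value:.3f}" for name, value in sorted(features.items())
    )


# Hypothetical selected features for one case from each dataset.
train_case: Dict[str, float] = {
    "firstorder_Mean": 0.531,
    "glcm_Contrast": 12.8,
    "shape2D_Sphericity": 0.74,
}
unseen_case: Dict[str, float] = {
    "firstorder_Mean": 0.47,
    "glcm_Contrast": 9.1,
    "glrlm_RunEntropy": 4.25,
}

shared = shared_feature_names(train_case, unseen_case)
unseen_text = serialize(unseen_case)
print(shared)
print(unseen_text)
```

Here the unseen case has a feature the training case lacks and is missing another, yet it serializes without any change to the input format, and the two shared names appear in the text exactly as they did during training.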

Conclusions

Our method, one of the first to incorporate an LLM into radiomics, has the potential to improve radiomics analysis.




Source journal

Computer methods and programs in biomedicine (Engineering & Technology: Engineering, Biomedical)

CiteScore

12.30

Self-citation rate

6.60%

Articles published

601

Review time

135 days

About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.