Enhancing radiomics features via a large language model for classifying benign and malignant breast tumors in mammography
Sinyoung Ra, Jonghun Kim, Inye Na, Eun Sook Ko, Hyunjin Park
Computer Methods and Programs in Biomedicine, vol. 265, Article 108765. Published 2025-04-03. Journal Article. Impact factor 4.9 (JCR Q1, Computer Science, Interdisciplinary Applications).
DOI: 10.1016/j.cmpb.2025.108765
https://www.sciencedirect.com/science/article/pii/S0169260725001828
Citations: 0
Abstract
Background and Objectives
Radiomics is widely used to assist in clinical decision-making, disease diagnosis, and treatment planning for various target organs, including the breast. Recent advances in large language models (LLMs) have helped enhance radiomics analysis.
Materials and Methods
We sought to improve radiomics analysis by incorporating LLM-learned clinical knowledge to classify benign and malignant tumors in breast mammography. We extracted radiomics features from the region of interest in each mammogram and retained the features related to the target task. Using prompt engineering, we devised an input sequence that reflected the selected features and the target task. The input sequence was fed to the chosen LLM (a LLaMA variant), which was fine-tuned using low-rank adaptation to enhance the radiomics features. The resulting method was then evaluated on two mammogram datasets (VinDr-Mammo and INbreast) against conventional baselines.
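The abstract does not give the prompt template, so the following is only a minimal sketch of how selected radiomics features and the target task might be serialized into one input sequence. The function name `build_prompt`, the template wording, the feature names, and the numeric values are all illustrative assumptions, not the authors' implementation or data.

```python
from typing import Mapping


def build_prompt(features: Mapping[str, float], task: str, precision: int = 3) -> str:
    """Serialize features as 'name: value' pairs, then append the task.

    Hypothetical template: the paper's actual prompt format is not
    given in the abstract.
    """
    # Sort by feature name so the same feature set always yields the same text.
    pairs = [
        f"{name}: {value:.{precision}f}"
        for name, value in sorted(features.items())
    ]
    return "Radiomics features: " + "; ".join(pairs) + f". Task: {task}"


# Made-up example values for illustration only (not taken from either dataset).
example = {
    "original_shape2D_Sphericity": 0.8124,
    "original_firstorder_Entropy": 4.25,
}
prompt = build_prompt(example, "Classify the breast tumor as benign or malignant.")
print(prompt)
```

A text sequence like this could then be tokenized and passed to the LLM during low-rank adaptation fine-tuning; that step depends on the chosen model and training library and is omitted here.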
Results
The enhanced radiomics-based method performed better than baselines using conventional radiomics features tested on two mammogram datasets, achieving accuracies of 0.671 for the VinDr-Mammo dataset and 0.839 for the INbreast dataset. Conventional radiomics models require retraining from scratch for an unseen dataset using a new set of features. In contrast, the model developed in this study effectively reused the common features between the training and unseen datasets by explicitly linking feature names with feature values, leading to extensible learning across datasets. Our method performed better than the baseline method in this retraining setting using an unseen dataset.
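One way to picture the reuse of common features is sketched below: because each value travels with its feature name, a case from an unseen dataset can be reduced to the features already seen in training and serialized in the same form. The abstract does not describe how common features are identified, so the helpers `keep_shared` and `serialize`, the feature names, and the values are hypothetical.

```python
from typing import Dict, Mapping, Set


def keep_shared(case: Mapping[str, float], known_names: Set[str]) -> Dict[str, float]:
    """Keep only the features whose names also appeared in training."""
    return {name: value for name, value in case.items() if name in known_names}


def serialize(features: Mapping[str, float]) -> str:
    """Write features as name-value text, ordered by name."""
    return "; ".join(f"{name}: {value:.3f}" for name, value in sorted(features.items()))


# Hypothetical feature names retained on the training dataset.
train_names = {"firstorder_Entropy", "firstorder_Mean", "glcm_Contrast"}

# Hypothetical case from an unseen dataset with a different feature set.
unseen_case = {
    "firstorder_Entropy": 3.9,
    "glcm_Contrast": 12.5,
    "shape2D_Elongation": 0.61,
}

shared = keep_shared(unseen_case, train_names)
text = serialize(shared)
print(text)
```

A fixed-length feature vector, by contrast, ties each value to a column position, which is why a conventional model generally has to be retrained when the feature set changes.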
Conclusions
Our method, one of the first to incorporate an LLM into radiomics, has the potential to improve radiomics analysis.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.