Large language model-based multi-source integration pipeline for automated diagnostic classification and zero-shot prognoses for brain tumor
Zhuoqi Ma, Lulu Bi, Paige Collins, Owen Leary, Maliha Imami, Zhusi Zhong, Shaolei Lu, Grayson Baird, Nikos Tapinos, Ugur Cetintemel, Harrison Bai, Jerrold Boxerman, Zhicheng Jiao
Meta-Radiology, vol. 3, no. 2, Article 100150. Published 2025-04-29. DOI: 10.1016/j.metrad.2025.100150
https://www.sciencedirect.com/science/article/pii/S2950162825000189
Abstract
Purpose
In this study, we use large language models (LLMs) to integrate information from multi-source medical reports to enhance the accuracy of automated diagnostic classification and prognosis for brain tumors.
Materials and methods
Brain MRI reports from a cohort of 426 brain tumor patients were manually labeled for tumor presence and stability. Pathology reports from the same cohort were incorporated as an additional information source. A pre-trained LLM was used to extract features from the multi-source reports, and a multi-layer perceptron (MLP) was trained on these features for the classification tasks. Model performance was evaluated on the test set using micro-averaged F1 scores and areas under the receiver operating characteristic curve (AUROCs). The model’s zero-shot prognostic capability was validated on an independent cohort of 33 glioblastoma patients.
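As an illustration of the micro-averaged F1 metric named above, the following minimal sketch pools true positives, false positives, and false negatives over every report and every class before computing F1. It is not the authors' evaluation code: the function name `micro_f1`, the one-hot label encoding, and the toy labels are all assumptions made for this example.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all samples and classes."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical one-hot labels for four reports and three classes (toy data).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.3f}")
```

With exactly one true class and one predicted class per report, as in this toy data, micro F1 coincides with plain accuracy (3 of 4 reports correct here). The two diverge only when a report can carry several labels at once.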
Results
The model reached a micro F1 score of 0.849 (95% CI: 0.814, 0.880) for tumor presence classification and 0.929 (95% CI: 0.904, 0.954) for tumor stability classification. Compared with using radiology reports alone, the multi-source model improved micro F1 by 10.4% for tumor presence and 5.6% for tumor stability classification. A log-rank test showed a significant survival difference between the high- and low-risk patient groups stratified by the model-predicted “Tumor Stability” label (p = 0.017), supporting the prognostic value of the model-generated labels.
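For readers unfamiliar with the log-rank test mentioned above, here is a minimal pure-Python sketch of the standard two-group version. At each event time it compares the observed number of events in one group with the number expected if both groups shared the same hazard. It is not the authors' analysis code: the function name `logrank_test`, the toy survival times, and the group assignments are hypothetical and do not come from the study cohort.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events are 1 if the event (e.g. death) was observed, 0 if censored.
    """
    # Pool all subjects, tagging each with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # events at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a under the null hypothesis
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: follow-up times in months, all events observed.
high_risk_times, high_risk_events = [1, 2, 3, 4], [1, 1, 1, 1]
low_risk_times, low_risk_events = [5, 6, 7, 8], [1, 1, 1, 1]

chi2, p = logrank_test(high_risk_times, high_risk_events,
                       low_risk_times, low_risk_events)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

In a real analysis, the two groups would be formed from the model-predicted label for each patient, and an established survival-analysis library would normally be preferred over a hand-rolled implementation.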
Conclusion
This study developed a multi-source integration model based on LLMs for automated diagnostic classification and zero-shot prognosis of brain tumors. Integrating multi-source reports improved classification accuracy compared with single-source reports. The predicted tumor stability labels showed prognostic value for survival. These findings confirm the potential of LLMs in brain tumor research, supporting precision diagnostics and prognosis.