xr - llava：用于解释胸部x线图像的多模态大语言模型。

IF 4.7 2区医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

European Radiology Pub Date : 2025-07-01 Epub Date: 2025-01-15 DOI:10.1007/s00330-024-11339-6

Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

{"title":"xr - llava：用于解释胸部x线图像的多模态大语言模型。","authors":"Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon","doi":"10.1007/s00330-024-11339-6","DOIUrl":null,"url":null,"abstract":"Objective: This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.Materials and methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.Key points: Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.","PeriodicalId":12076,"journal":{"name":"European Radiology","volume":" ","pages":"4374-4386"},"PeriodicalIF":4.7000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166004/pdf/","citationCount":"0","resultStr":"{\"title\":\"CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.\",\"authors\":\"Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon\",\"doi\":\"10.1007/s00330-024-11339-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.Materials and methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.Key points: Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.\",\"PeriodicalId\":12076,\"journal\":{\"name\":\"European Radiology\",\"volume\":\" \",\"pages\":\"4374-4386\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166004/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00330-024-11339-6\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/15 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00330-024-11339-6","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/15 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

摘要

目的：本研究旨在开发一个开源的多模态大型语言模型（CXR-LLaVA），用于解释胸部x射线图像（cxr），利用大型语言模型（llm）的最新进展，有可能复制人类放射科医生的图像解释技能。材料和方法：对于训练，我们收集了592,580个公开可用的cxr，其中374,881个具有特定放射学异常的标签（数据集1），217,699个提供自由文本放射学报告（数据集2）。在使用数据集1预训练视觉转换器后，我们将其与受LLaVA网络影响的LLM集成。然后，对模型进行微调，主要使用数据集2。评估了该模型对主要病理发现的诊断性能，以及人类放射科医生对放射学报告的可接受性，以衡量其自主报告的潜力。结果：该模型在测试集中表现出色，在MIMIC内部测试集中，6项主要病理结果的F1平均得分为0.81，在外部测试集中，6项主要病理结果的F1平均得分为0.56。在两组测试中，该模型的F1分数都超过了GPT-4-vision和Gemini-Pro-Vision。在人类放射科医生对外部测试集的评估中，该模型在自主报告中取得了72.7%的成功率，略低于地面真实报告的84.0%。结论：本研究强调了多模式llm在CXR解释方面的巨大潜力，同时也承认了其性能局限性。尽管存在这些挑战，但我们相信将我们的模型开源将促进进一步的研究，扩大其在各种临床环境中的有效性和适用性。多模态大语言模型如何适用于解释胸部x光片并生成放射学报告？与通用模型相比，开发的CXR-LLaVA模型可以有效地检测胸部x射线中的主要病理发现，并生成更高准确性的放射学报告。本研究证明了多模态大语言模型的潜力，通过自主生成胸部x射线报告来支持放射科医生，有可能减少诊断工作量并提高放射科医生的效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.

Objective: This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.

Materials and methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.

Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.

Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.

Key points: Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

European Radiology 医学-核医学

CiteScore

11.60

自引率

8.50%

发文量

874

审稿时长

2-4 weeks

期刊介绍： European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004-2008 supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.