{"title":"从图像到报告:使用视觉语言模型自动化肺癌筛查解释和报告。","authors":"Tien-Yu Chang, Qinglin Gou, Leyi Zhao, Tiancheng Zhou, Hongyu Chen, Dong Yang, Huiwen Ju, Kaleb E Smith, Chengkun Sun, Jinqian Pan, Yu Huang, Xing He, Xuhong Zhang, Daguang Xu, Jie Xu, Jiang Bian, Aokun Chen","doi":"10.1016/j.jbi.2025.104931","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Lung cancer is the most prevalent cancer and the leading cause of cancer-related death in the United States. Lung cancer screening with low-dose computed tomography (LDCT) helps identify lung cancer at an early stage and thus improves overall survival. The growing adoption of LDCT screening has increased radiologists' workload and demands specialized training to accurately interpret LDCT images and report findings. Advances in artificial intelligence (AI), including large language models (LLMs) and vision models, could help reduce this burden and improve accuracy.</p><p><strong>Methods: </strong>We devised LUMEN (Lung cancer screening with Unified Multimodal Evaluation and Navigation), a multimodal AI framework that mimics the radiologist's workflow by identifying nodules in LDCT images, generating their characteristics, and drafting corresponding radiology reports in accordance with reporting guidelines. LUMEN integrates computer vision, vision-language models (VLMs), and LLMs. To assess our system, we developed a benchmarking framework to evaluate the lung cancer screening reports generated based on the findings and management criteria outlined in the Lung Imaging Reporting and Data System (Lung-RADS). It extracts them from radiology reports and measures clinical accuracy-focusing on information that is clinically important for lung cancer screening-independently of report format.</p><p><strong>Results: </strong>This complement exists LLM/VLM in semantic accuracy metrics and provides a more comprehensive view of system performance. Our lung cancer screening report generation system achieved unparalleled performance compared to contemporary VLM systems, including M3D, CT2Report and MedM3DVLM. Furthermore, compared to standard LLM metrics, the clinical metrics we designed for lung cancer screening more accurately reflect the clinical utility of the generated reports.</p><p><strong>Conclusion: </strong>LUMEN demonstrates the feasibility of generating clinically accurate lung nodule reports from LDCT images through a nodule-centric VQA approach, highlighting the potential of integrating VLMs and LLMs to support radiologists in lung cancer screening workflows. 
Our findings also underscore the importance of applying clinically meaningful evaluation metrics in developing medical AI systems.</p>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":" ","pages":"104931"},"PeriodicalIF":4.5000,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"From image to report: automating lung cancer screening interpretation and reporting with vision-language models.\",\"authors\":\"Tien-Yu Chang, Qinglin Gou, Leyi Zhao, Tiancheng Zhou, Hongyu Chen, Dong Yang, Huiwen Ju, Kaleb E Smith, Chengkun Sun, Jinqian Pan, Yu Huang, Xing He, Xuhong Zhang, Daguang Xu, Jie Xu, Jiang Bian, Aokun Chen\",\"doi\":\"10.1016/j.jbi.2025.104931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Lung cancer is the most prevalent cancer and the leading cause of cancer-related death in the United States. Lung cancer screening with low-dose computed tomography (LDCT) helps identify lung cancer at an early stage and thus improves overall survival. The growing adoption of LDCT screening has increased radiologists' workload and demands specialized training to accurately interpret LDCT images and report findings. Advances in artificial intelligence (AI), including large language models (LLMs) and vision models, could help reduce this burden and improve accuracy.</p><p><strong>Methods: </strong>We devised LUMEN (Lung cancer screening with Unified Multimodal Evaluation and Navigation), a multimodal AI framework that mimics the radiologist's workflow by identifying nodules in LDCT images, generating their characteristics, and drafting corresponding radiology reports in accordance with reporting guidelines. LUMEN integrates computer vision, vision-language models (VLMs), and LLMs. To assess our system, we developed a benchmarking framework to evaluate the lung cancer screening reports generated based on the findings and management criteria outlined in the Lung Imaging Reporting and Data System (Lung-RADS). It extracts them from radiology reports and measures clinical accuracy-focusing on information that is clinically important for lung cancer screening-independently of report format.</p><p><strong>Results: </strong>This complement exists LLM/VLM in semantic accuracy metrics and provides a more comprehensive view of system performance. Our lung cancer screening report generation system achieved unparalleled performance compared to contemporary VLM systems, including M3D, CT2Report and MedM3DVLM. Furthermore, compared to standard LLM metrics, the clinical metrics we designed for lung cancer screening more accurately reflect the clinical utility of the generated reports.</p><p><strong>Conclusion: </strong>LUMEN demonstrates the feasibility of generating clinically accurate lung nodule reports from LDCT images through a nodule-centric VQA approach, highlighting the potential of integrating VLMs and LLMs to support radiologists in lung cancer screening workflows. 
Our findings also underscore the importance of applying clinically meaningful evaluation metrics in developing medical AI systems.</p>\",\"PeriodicalId\":15263,\"journal\":{\"name\":\"Journal of Biomedical Informatics\",\"volume\":\" \",\"pages\":\"104931\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2025-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Biomedical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jbi.2025.104931\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jbi.2025.104931","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
From image to report: automating lung cancer screening interpretation and reporting with vision-language models.
Objective: Lung cancer is the most prevalent cancer and the leading cause of cancer-related death in the United States. Lung cancer screening with low-dose computed tomography (LDCT) helps identify lung cancer at an early stage and thus improves overall survival. The growing adoption of LDCT screening has increased radiologists' workload and requires specialized training to interpret LDCT images accurately and report findings. Advances in artificial intelligence (AI), including large language models (LLMs) and vision models, could help reduce this burden and improve accuracy.
Methods: We devised LUMEN (Lung cancer screening with Unified Multimodal Evaluation and Navigation), a multimodal AI framework that mimics the radiologist's workflow by identifying nodules in LDCT images, generating their characteristics, and drafting corresponding radiology reports in accordance with reporting guidelines. LUMEN integrates computer vision, vision-language models (VLMs), and LLMs. To assess our system, we developed a benchmarking framework that evaluates generated lung cancer screening reports against the findings and management criteria outlined in the Lung Imaging Reporting and Data System (Lung-RADS). The framework extracts these elements from radiology reports and measures clinical accuracy, focusing on information that is clinically important for lung cancer screening, independently of report format.
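To make the described workflow concrete, the sketch below outlines a nodule-centric pipeline of the kind the abstract describes: a vision model proposes nodules, a VLM answers structured questions about each one, and an LLM drafts the report text. This is a minimal illustration, not LUMEN's actual implementation; the class and function names (Nodule, detect_nodules, characterize_nodule, draft_report) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Nodule:
    location: str        # e.g., "right upper lobe"
    diameter_mm: float
    composition: str     # "solid", "part-solid", or "ground-glass"

def detect_nodules(ldct_volume) -> List[Nodule]:
    """Stage 1 (vision model): locate candidate nodules in the LDCT volume."""
    # Placeholder: a real system would run a 3D detection network here.
    return [Nodule("right upper lobe", 7.5, "solid")]

def characterize_nodule(ldct_volume, nodule: Nodule) -> dict:
    """Stage 2 (nodule-centric VQA): answer structured questions about one nodule."""
    # Placeholder: a real system would query a vision-language model per nodule.
    return {
        "location": nodule.location,
        "size_mm": nodule.diameter_mm,
        "composition": nodule.composition,
    }

def draft_report(findings: List[dict]) -> str:
    """Stage 3 (LLM): turn structured findings into guideline-conformant prose."""
    lines = ["FINDINGS:"]
    for i, f in enumerate(findings, 1):
        lines.append(
            f"{i}. {f['composition'].capitalize()} nodule in the {f['location']}, "
            f"{f['size_mm']:.1f} mm."
        )
    return "\n".join(lines)

if __name__ == "__main__":
    volume = None  # stand-in for a loaded LDCT scan
    nodules = detect_nodules(volume)
    findings = [characterize_nodule(volume, n) for n in nodules]
    print(draft_report(findings))
```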
Results: This framework complements existing LLM/VLM semantic accuracy metrics and provides a more comprehensive view of system performance. Our lung cancer screening report generation system achieved superior performance compared to contemporary VLM systems, including M3D, CT2Report, and MedM3DVLM. Furthermore, compared to standard LLM metrics, the clinical metrics we designed for lung cancer screening more accurately reflect the clinical utility of the generated reports.
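To illustrate why a clinical metric can diverge from a surface-level one, the hedged sketch below contrasts a Lung-RADS category agreement check with a simple token-overlap score: two reports can share most of their wording yet disagree on the clinically decisive category. The regular expression and function names are illustrative assumptions, not the paper's benchmarking code.

```python
import re
from typing import Optional

def extract_lung_rads(report: str) -> Optional[str]:
    """Pull a Lung-RADS category (e.g., '2', '3', '4A') out of a free-text report."""
    match = re.search(r"Lung-?RADS(?:\s*category)?\s*:?\s*([1-4][ABX]?)",
                      report, re.IGNORECASE)
    return match.group(1).upper() if match else None

def clinical_agreement(generated: str, reference: str) -> bool:
    """Format-independent check: do both reports assign the same Lung-RADS category?"""
    return extract_lung_rads(generated) == extract_lung_rads(reference)

def token_f1(generated: str, reference: str) -> float:
    """Surface-overlap metric for contrast: shared tokens can score high
    even when the clinically decisive category differs."""
    gen, ref = set(generated.lower().split()), set(reference.lower().split())
    if not gen or not ref:
        return 0.0
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(gen), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ref = "Solid 7 mm nodule in the right upper lobe. Lung-RADS category 3."
    gen = "Solid 7 mm nodule in the right upper lobe. Lung-RADS category 2."
    print(clinical_agreement(gen, ref))   # False: categories disagree
    print(round(token_f1(gen, ref), 2))   # High overlap despite the clinical error
```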
Conclusion: LUMEN demonstrates the feasibility of generating clinically accurate lung nodule reports from LDCT images through a nodule-centric visual question answering (VQA) approach, highlighting the potential of integrating VLMs and LLMs to support radiologists in lung cancer screening workflows. Our findings also underscore the importance of applying clinically meaningful evaluation metrics in developing medical AI systems.
Journal Introduction:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.