Bilateral network with text guided aggregation architecture for lung infection image segmentation.

IF 1.3 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Xiang Pan, Hanxiao Mei, Jianwei Zheng, Herong Zheng
DOI: 10.1088/2057-1976/adb290 (https://doi.org/10.1088/2057-1976/adb290)
Published: 2025-03-06 · Journal Article · Biomedical Physics & Engineering Express
Citations: 0

Abstract


Objective. Lung image segmentation is crucial for automated understanding of potential illness. However, existing approaches suffer a considerable drop in accuracy on lung infection areas of varied shapes and sizes. Recently, researchers have aimed to improve segmentation accuracy by combining text prompts from diagnostic reports with visual image information; limited by their network structures, however, these methods are inefficient and ineffective.

Method. To address this issue, this paper proposes a Bilateral Network with Text Guided Aggregation Architecture (BNTGAA) to fully fuse local and global information across text and image vision. The proposed architecture involves (i) a global fusion branch that uses a Hadamard product to align text and vision feature representations, (ii) a multi-scale cross-fusion branch with positional encoding and skip connections that performs text-guided segmentation at different resolutions, and (iii) the outputs of both branches combined to feed a Mamba module for efficient segmentation.

Results. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs better in both accuracy and efficiency. It outperforms the current best methods on the QaTa-COVID19 dataset, improving mIoU and Dice scores by 3.08% and 2.35%, respectively, while surpassing the computational speed of existing multimodal networks. The architecture also converges quickly and generalizes well: it can exceed the performance of the current best methods even when trained on only 50% of the dataset.

Conclusion. With Mamba as the backbone, the proposed fusion architecture, which performs text-guided aggregation at different scales, greatly improves segmentation performance in both accuracy and efficiency. Code is available at https://github.com/Meihanxiao/BNTGAA.
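The global fusion branch described above aligns the two modalities and fuses them with a Hadamard (elementwise) product. The following is a minimal numpy sketch of that idea, not the authors' implementation: the function name, projection matrices, and all shapes are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard_fusion(vision, text, w_v, w_t):
    """Project both modalities into a shared dimension, then fuse elementwise.

    All names and shapes here are illustrative, not from the paper.
    """
    v = vision @ w_v   # (HW, d): projected vision tokens
    t = text @ w_t     # (d,):   projected text (report) embedding
    return v * t       # broadcast Hadamard product -> (HW, d)

# Hypothetical shapes: a flattened 16x16 feature map with 64 channels
# and a 768-dimensional diagnostic-report embedding.
vision = rng.standard_normal((16 * 16, 64))
text = rng.standard_normal(768)
w_v = rng.standard_normal((64, 32))
w_t = rng.standard_normal((768, 32))

fused = hadamard_fusion(vision, text, w_v, w_t)
print(fused.shape)  # (256, 32)
```

Because the text projection broadcasts over every spatial token, each vision location is reweighted by the same report embedding, which is one way a Hadamard product can "align" text and vision features.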
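The results are reported in mIoU and Dice, the two standard overlap metrics for segmentation masks. A self-contained sketch of how these are computed on a single binary mask pair (toy data, not from the paper):

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two boolean segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / union
    return dice, iou

# Toy 2x3 masks, purely illustrative.
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)

dice, iou = dice_and_iou(pred, target)
print(dice, iou)  # 0.666... 0.5
```

mIoU is then the IoU averaged over classes (here, infection vs. background) and over the test set; Dice weights the intersection more heavily, so it is always at least as large as IoU for the same masks.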

Source journal
Biomedical Physics & Engineering Express
CiteScore: 2.80
Self-citation rate: 0.00%
Articles per year: 153
Journal description: BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.