Xichuan Zhou, Lingfeng Yan, Rui Ding, Chukwuemeka Clinton Atabansi, Jing Nie, Lihui Chen, Yujie Feng, Haijun Liu
IEEE Journal of Biomedical and Health Informatics (Q1, Computer Science, Information Systems), DOI: 10.1109/JBHI.2025.3561425, published online 2025-04-16.
MIT-SAM: Medical Image-Text SAM with Mutually Enhanced Heterogeneous Features Fusion for Medical Image Segmentation.
Leveraging lesion text as supplementary data to improve medical image segmentation models has recently attracted attention. Previous approaches used only attention mechanisms to integrate image and text features and did not effectively exploit the highly condensed textual semantic information when refining the fused features, resulting in inaccurate lesion segmentation. This paper introduces a novel approach, the Medical Image-Text Segment Anything Model (MIT-SAM), for text-assisted medical image segmentation. Specifically, we introduce a SAM-enhanced image encoder and a BERT-based text encoder to extract heterogeneous features. To better leverage the highly condensed textual semantic information, such as crucial details like position and quantity, for heterogeneous feature fusion, we propose the image-text interactive fusion (ITIF) block and the self-supervised text reconstruction (SSTR) method. The ITIF block facilitates the mutual enhancement of homogeneous information among heterogeneous features, and the SSTR method empowers the model to capture crucial details in the lesion text, including location, quantity, and other key aspects. Experimental results demonstrate that our proposed model achieves state-of-the-art performance on the QaTa-COV19 and MosMedData+ datasets. The code of MIT-SAM is available at https://github.com/jojodan514/MIT-SAM.
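The abstract describes the ITIF block as mutually enhancing image and text features; the paper's exact design is not reproduced here, but the core mechanism such a block typically relies on is bidirectional cross-attention, where each modality queries the other. The following is a minimal single-head NumPy sketch of that idea, with all shapes and names hypothetical (not taken from the MIT-SAM code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product cross-attention with a residual
    connection: `queries` are enhanced with information gathered from
    `keys_values` (the features of the other modality)."""
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))  # (Nq, Nkv) weights
    return queries + attn @ keys_values                   # residual fusion

rng = np.random.default_rng(0)
img_tokens = rng.standard_normal((196, 64))  # hypothetical image patch features
txt_tokens = rng.standard_normal((12, 64))   # hypothetical BERT token features

# Mutual enhancement: each modality attends to the other, so condensed
# textual cues (e.g. position, quantity) flow into the image features and
# visual evidence flows back into the text features.
fused_img = cross_attend(img_tokens, txt_tokens)
fused_txt = cross_attend(txt_tokens, img_tokens)
```

In a real implementation each `cross_attend` call would use learned query/key/value projections, multiple heads, and layer normalization; the sketch keeps only the attention-and-residual skeleton that makes the two feature sets mutually enhancing.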
Journal introduction:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.