Chuan Wang, Mingfeng Jiang, Yang Li, Bo Wei, Yongming Li, Pin Wang, Guang Yang
Computer Methods and Programs in Biomedicine, vol. 260, 108562. Published 2024-12-09. DOI: https://doi.org/10.1016/j.cmpb.2024.108562
MP-FocalUNet: Multiscale parallel focal self-attention U-Net for medical image segmentation.
Background and objective: Medical image segmentation has improved significantly in recent years with the progress of convolutional neural networks (CNNs). However, because of the inherent limitations of convolutional operations, CNNs are poor at learning correlations between global and long-range features. Some existing solutions address this by building deep encoders with down-sampling operations, but such methods tend to produce redundant network structures and lose local details. Medical image segmentation therefore needs better ways to model global context while retaining a strong grasp of low-level details.
Methods: We propose a novel multiscale parallel-branch architecture, MP-FocalUNet. On the encoder side, dual-scale sub-networks extract information at different scales, and a cross-scale "Feature Fusion" (FF) module exploits the dual-branch design to make full use of the feature representations at each scale. On the decoder side, focal self-attention is combined in parallel with a conventional CNN for long-distance modeling, which captures global dependencies and underlying spatial details with a shallower network.
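The decoder idea above, a local convolutional branch running in parallel with a focal self-attention branch, can be illustrated with a toy 1D sketch. This is not the authors' implementation. It assumes the general notion of focal self-attention (fine-grained keys near the query, pooled coarse keys for the rest of the sequence), uses no learned projections, and fuses the two branches by simple addition, which the abstract does not specify. All function names and parameters (`radius`, `pool`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def focal_attention_1d(tokens, radius=1, pool=2):
    """Toy focal attention (assumed form): each query attends to its
    neighbours at full resolution and to the whole sequence through
    average-pooled summaries. Queries, keys and values are the raw tokens."""
    dim = len(tokens[0])
    # Coarse level: average-pool non-overlapping chunks of the sequence.
    coarse = [mean_vec(tokens[i:i + pool]) for i in range(0, len(tokens), pool)]
    out = []
    for i, q in enumerate(tokens):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        keys = tokens[lo:hi] + coarse  # fine local keys + coarse global keys
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(dim)])
    return out


def local_conv_1d(tokens):
    """3-tap mean filter with edge replication, standing in for a CNN branch."""
    n = len(tokens)
    return [mean_vec([tokens[max(0, i - 1)], tokens[i], tokens[min(n - 1, i + 1)]])
            for i in range(n)]


def parallel_block(tokens):
    """Run both branches on the same input and add their outputs
    (additive fusion is an assumption made for this sketch)."""
    attn = focal_attention_1d(tokens)
    conv = local_conv_1d(tokens)
    return [[a + c for a, c in zip(av, cv)] for av, cv in zip(attn, conv)]


if __name__ == "__main__":
    features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    for row in parallel_block(features):
        print(row)
```

The coarse keys give every position a view of the whole sequence at low cost, while the fine keys and the convolutional branch preserve local detail. A real model would add learned query, key and value projections and operate on 2D feature maps.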
Results: We evaluate the proposed method on abdominal organ segmentation datasets and automatic cardiac diagnosis challenge datasets. It consistently outperforms several state-of-the-art segmentation methods, with average Dice scores of 82.45% (2.68% higher than HC-Net) on the abdominal organ datasets and 91.44% (0.35% higher than HC-Net) on the automatic cardiac diagnosis challenge datasets.
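For readers unfamiliar with the metric, the Dice score for one class is 2|A∩B| / (|A| + |B|), where A and B are the predicted and reference pixel sets. The sketch below computes it per class and averages over classes. It only illustrates the standard definition; the abstract does not state the exact evaluation protocol (which classes are averaged, or whether averaging is per case or per dataset), and the function names and the empty-class convention are assumptions.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class over flat lists of integer labels."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    denom = sum(pred_hits) + sum(target_hits)
    if denom == 0:
        return 1.0  # assumed convention: class absent from both masks
    return 2.0 * intersection / denom


def mean_dice(pred, target, labels):
    """Unweighted average of per-class Dice scores."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical flattened label maps: 0 = background, 1 and 2 = organs.
    pred = [0, 1, 1, 2, 2, 0]
    target = [0, 1, 2, 2, 2, 0]
    print(mean_dice(pred, target, labels=[1, 2]))
```

Averaging over foreground classes only, as in the usage example, is a common choice but is itself an assumption here.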
Conclusions: MP-FocalUNet is a novel encoder-decoder, multiscale parallel-branch Transformer network that addresses the insufficient long-distance modeling of CNNs and fuses image information at different scales. Extensive experiments on abdominal and cardiac medical image segmentation tasks show that it outperforms other state-of-the-art methods. Future work will focus on designing more lightweight Transformer-based models and on better learning the pixel-level intrinsic structural features produced by patch division in vision Transformers.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to encourage the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.