Semantic segmentation feature fusion network based on transformer.

IF 3.9 | Zone 2, Multidisciplinary Journals | JCR Q1, Multidisciplinary Sciences

Scientific Reports Pub Date : 2025-02-19 DOI:10.1038/s41598-025-90518-x

Tianping Li, Zhaotong Cui, Hua Zhang

Scientific Reports, volume 15, issue 1, article 6110 (2025). Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11839911/pdf/

Citations: 0

Abstract

Convolutional neural networks (CNNs) are effective at capturing local features and spatial details, but they struggle to capture global information, which can compromise the segmentation of important image regions. Transformers can increase the expressiveness of pixel representations by establishing global relationships between pixels. However, some transformer-based self-attention methods do not exploit the advantages of convolution, which increases the number of parameters the model requires. To address both issues, this work combines Transformer and CNN structures to better relate image-level regions to global information and thereby improve segmentation accuracy. First, we build a Feature Alignment Module (FAM) to enhance spatial details and improve channel representations. Second, we use a Transformer structure to compute the relationships between similar pixels, which enhances the pixel representation. Finally, we design a Pyramid Convolutional Pooling Module (PCPM) that compresses and enriches the feature maps and determines the global correlations among pixels, reducing the computational burden on the transformer. Together, these three components form a transformer-based semantic segmentation feature fusion network (FFTNet). On the Cityscapes test set, our method achieves 82.5% mIoU. We also present visualization experiments on the Pascal VOC 2012 and Cityscapes datasets. The results show that our approach outperforms alternative approaches.
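The headline result in the abstract is reported as mIoU (mean intersection over union). As a reading aid, the sketch below shows how this metric is commonly computed from a confusion matrix. It is not taken from the paper: the function name `mean_iou`, the toy label maps, and the choice of 255 as the ignored label are assumptions made for illustration, and benchmark evaluations normally accumulate the confusion matrix over the whole dataset rather than a single image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened label maps of equal length.
    Predictions are assumed to lie in [0, num_classes).
    ignore_index=255 is an illustrative assumption, not from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a flattened 2x3 label map with one ignored pixel.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/2, 2/3 and 1, so the mean is 13/18. Averaging over classes rather than pixels means a small class counts as much as a large one.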




Source journal: Scientific Reports (Natural Science Disciplines)

CiteScore: 7.50
Self-citation rate: 4.30%
Articles published: 19567
Review time: 3.9 months

About the journal: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor of 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).

• Engineering: Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.

• Physical sciences: Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.

• Earth and environmental sciences: Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.

• Biological sciences: Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms and animals to plants.

• Health sciences: The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.