TriM-Net: Trinityformer-Mamba fusion for road extraction in remote sensing

Impact Factor 4.1 · CAS Region 3 (Earth Science) · JCR Q2 (Environmental Sciences)
Zhenzhong Huang , Hongjuan Shao , Chao Ren , Hongman Li , Haoming Bai , Zhou Lei , Gu Yao , Qinyi Chen
Egyptian Journal of Remote Sensing and Space Sciences, Volume 28, Issue 3, Pages 523-533
DOI: 10.1016/j.ejrs.2025.07.006
Published: 2025-08-05
Citations: 0

Abstract

Precise road information extraction is crucial for transportation and intelligent sensing. Recently, the fusion of CNN and Transformer architectures in remote sensing-based road extraction, along with U-shaped semantic segmentation networks, has gained significant attention. However, existing methods rely heavily on global features while overlooking local details, limiting accuracy in complex road scenarios. To address this, we propose Trinityformer-Mamba Network (TriM-Net) to enhance local feature extraction. TriM-Net adopts Trinityformer, a modified Transformer architecture. This architecture optimizes local feature perception and reduces computational overhead by replacing the traditional softmax with an improved self-attention mechanism and a novel normalization method. The feedforward network employs a Kolmogorov-Arnold network (KAN), reducing neuron count while enhancing local detail capture using edge activation functions and the Arnold transform. Additionally, the normalization layer integrates the benefits of BatchNorm and LayerNorm for better performance. Furthermore, TriM-Net incorporates an MT_block built with stacked Mamba networks. By leveraging their internal CausalConv1D and SSM modules, this block enhances modeling and local perception while effectively merging Transformer and CNN information for improved image reconstruction. Experimental results demonstrate TriM-Net’s significant superiority over existing state-of-the-art models. On the LSRV dataset, it outperformed the second-best model with advantages of 2.17% in Precision, 0.34% in Recall, 1.72% in IoU, and 2.09% in F1-score. Similarly, on the Massachusetts Road Dataset, it achieved superior Recall (0.45%), IoU (1.41%), and F1-score (1.07%) over its closest competitor. These substantial improvements highlight TriM-Net’s outstanding performance in road information extraction.
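The abstract says Trinityformer replaces the traditional softmax with an improved self-attention mechanism but gives no formula. One well-known family of softmax-free attention is kernelized "linear" attention, which reorders the matrix products to cut the quadratic cost; the sketch below illustrates that general idea only (the feature map and all names here are assumptions, not the authors' mechanism):

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Softmax-free attention sketch: replace softmax(QK^T)V with the
    kernelized form phi(Q)(phi(K)^T V), using phi(z) = ELU(z) + 1.

    q, k, v: (length, dim) arrays.
    """
    phi = lambda z: np.where(z > 0, z + 1.0, np.exp(z))  # ELU(z)+1, strictly positive
    qp, kp = phi(q), phi(k)
    num = qp @ (kp.T @ v)                                # (L, d): O(L*d^2), not O(L^2*d)
    den = qp @ kp.sum(axis=0, keepdims=True).T + eps     # (L, 1) per-row normalizer
    return num / den

L, d = 6, 4
rng = np.random.default_rng(2)
y = linear_attention(rng.normal(size=(L, d)),
                     rng.normal(size=(L, d)),
                     rng.normal(size=(L, d)))
print(y.shape)  # (6, 4)
```

Computing `kp.T @ v` first is what removes the explicit L-by-L attention matrix, which is the usual motivation for dropping softmax.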
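The feedforward network is described as a Kolmogorov-Arnold network (KAN) with learnable edge activation functions, without further detail. Conceptually, a KAN layer replaces each scalar weight with a small learnable univariate function on that edge; a toy sketch using per-edge polynomials (purely illustrative, not the paper's parameterization) might look like:

```python
import numpy as np

def kan_layer(x, coeffs):
    """Toy KAN-style layer: each edge (i, j) applies its own univariate
    polynomial to input x[i]; output j sums contributions over all i.

    x: (d_in,) input vector.
    coeffs: (d_out, d_in, degree+1) polynomial coefficients, one set per edge.
    """
    d_out, d_in, _ = coeffs.shape
    y = np.zeros(d_out)
    for j in range(d_out):
        for i in range(d_in):
            y[j] += np.polyval(coeffs[j, i], x[i])  # learnable function on edge (i, j)
    return y

rng = np.random.default_rng(1)
out = kan_layer(rng.normal(size=4), rng.normal(size=(3, 4, 3)))
print(out.shape)  # (3,)
```

Because the nonlinearity lives on the edges rather than the nodes, fewer neurons can express the same function class, which is consistent with the abstract's claim of a reduced neuron count.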
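The exact form of the normalization layer that "integrates the benefits of BatchNorm and LayerNorm" is not given. As an illustration of the general idea, a minimal NumPy sketch could blend BatchNorm-style statistics (per channel, across the batch) with LayerNorm-style statistics (per sample, across channels) via a mixing weight; the blending scheme and the name `hybrid_norm` are assumptions:

```python
import numpy as np

def hybrid_norm(x, alpha=0.5, eps=1e-5):
    """Blend BatchNorm-style and LayerNorm-style normalization.

    x: (batch, channels) array.
    alpha: mixing weight between the two normalizations
           (would be a learnable parameter in a real network).
    """
    # BatchNorm-style: normalize each channel over the batch dimension.
    bn = (x - x.mean(axis=0, keepdims=True)) / np.sqrt(x.var(axis=0, keepdims=True) + eps)
    # LayerNorm-style: normalize each sample over the channel dimension.
    ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return alpha * bn + (1.0 - alpha) * ln

x = np.random.default_rng(0).normal(size=(8, 16))
y = hybrid_norm(x)
print(y.shape)  # (8, 16)
```

At `alpha=1.0` this reduces to pure batch normalization and at `alpha=0.0` to pure layer normalization, so the mixing weight interpolates between the two behaviors.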
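The MT_block is described only as stacked Mamba networks built on CausalConv1D and SSM modules. As a rough sketch of those two primitives in isolation (a causal 1-D convolution followed by a diagonal state-space recurrence; the scalar parameters and function names are hypothetical, not the authors' implementation):

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: output at step t depends only on inputs <= t.

    x: (length,) input sequence; w: (k,) kernel.
    """
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])  # left-pad so no future leakage
    return np.array([xp[t:t + k] @ w[::-1] for t in range(len(x))])

def ssm_scan(u, a=0.9, b=1.0, c=1.0):
    """Minimal diagonal SSM recurrence: h_t = a*h_{t-1} + b*u_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for u_t in u:
        h = a * h + b * u_t
        ys.append(c * h)
    return np.array(ys)

x = np.arange(5, dtype=float)  # toy sequence [0, 1, 2, 3, 4]
y = ssm_scan(causal_conv1d(x, np.array([0.5, 0.5])))
print(y.shape)  # (5,)
```

The convolution gathers short-range local context while the recurrence carries long-range state, which matches the abstract's claim that the block improves both local perception and sequence modeling.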
Source journal
CiteScore: 8.10 · Self-citation rate: 0.00% · Articles per year: 85 · Review time: 48 weeks
Journal description: The Egyptian Journal of Remote Sensing and Space Sciences (EJRS) encompasses a comprehensive range of topics within Remote Sensing, Geographic Information Systems (GIS), planetary geology, and space technology development, including theories, applications, and modeling. EJRS aims to disseminate high-quality, peer-reviewed research focusing on the advancement of remote sensing and GIS technologies and their practical applications for effective planning, sustainable development, and environmental resource conservation. The journal particularly welcomes innovative papers with broad scientific appeal.