Cross-Model Nested Fusion Network for Salient Object Detection in Optical Remote Sensing Images

Mingzhu Xu, Sen Wang, Yupeng Hu, Haoyu Tang, Runmin Cong, Liqiang Nie

IEEE Transactions on Cybernetics, published 2025-09-12. DOI: 10.1109/tcyb.2025.3571913
Citations: 0
Abstract
Recently, salient object detection (SOD) in optical remote sensing images, dubbed ORSI-SOD, has attracted increasing research interest. Although deep learning-based models have achieved impressive performance, several challenges remain: a single image may contain multiple objects of varying scales, complex topological structures, and background interference. These unresolved issues render ORSI-SOD a challenging task. To address these challenges, we introduce a distinctive cross-model nested fusion network (CMNFNet), which leverages heterogeneous features to improve ORSI-SOD performance. Specifically, the proposed model comprises two heterogeneous encoders: a conventional CNN-based encoder that models local features, and a specially designed graph convolutional network (GCN)-based encoder with both local and global receptive fields that models local and global features simultaneously. To effectively differentiate between multiple salient objects of different sizes or complex topological structures within an image, we project the image into two graphs with different receptive fields and conduct message passing through two parallel graph convolutions. Finally, the heterogeneous features extracted from the two encoders are fused in a well-designed attention-enhanced cross-model nested fusion module (AECMNFM). This module is crafted to integrate features progressively, allowing the model to adaptively suppress background interference while simultaneously refining the feature representations. We conducted comprehensive experimental analyses on benchmark datasets. The results demonstrate the superiority of our CMNFNet over 16 state-of-the-art (SOTA) models.
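The abstract does not give implementation details, but the core GCN-encoder idea, projecting the input into two graphs with different receptive fields and running message passing through two parallel graph convolutions before fusing, can be illustrated with a minimal NumPy sketch. All names, graph constructions, and the concatenation-based fusion here are hypothetical stand-ins, not the paper's actual design:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^-1/2 (A + I) D^-1/2, standard in GCNs
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    # One round of message passing: aggregate neighbor features, project, ReLU
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)
n, d_in, d_out = 6, 4, 3
X = rng.standard_normal((n, d_in))  # node features (e.g., from image regions)

# Two hypothetical graphs over the same nodes: a sparse "local" one
# (chain of neighbors) and a dense "global" one (different receptive fields).
A_local = np.diag(np.ones(n - 1), 1)
A_local = A_local + A_local.T
A_global = np.ones((n, n)) - np.eye(n)

W_local = rng.standard_normal((d_in, d_out))
W_global = rng.standard_normal((d_in, d_out))

# Parallel message passing on the two graphs, then a simple fusion
H_local = gcn_layer(normalize_adj(A_local), X, W_local)
H_global = gcn_layer(normalize_adj(A_global), X, W_global)
H_fused = np.concatenate([H_local, H_global], axis=1)
print(H_fused.shape)  # (6, 6)
```

In CMNFNet the fusion is performed by the attention-enhanced AECMNFM module rather than plain concatenation; the sketch only shows why two receptive fields yield complementary node representations.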
Journal Scope
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines, or between machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.