Rahim Khan, Nada Alzaben, Yousef Ibrahim Daradkeh, Xianxun Zhu, Inam Ullah
Image and Vision Computing, vol. 162, Article 105670 (journal article). Published 2025-07-28. Impact factor: 4.2. JCR: Q2 (Computer Science, Artificial Intelligence).
DOI: 10.1016/j.imavis.2025.105670
URL: https://www.sciencedirect.com/science/article/pii/S0262885625002586
Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation
Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.
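The abstract names dilated convolutions and spatial attention but gives no layer-level detail, so the following is only a minimal sketch of the general idea, not the authors' PAM or PBFR implementation. It assumes a single-channel feature map, one fixed 3x3 kernel shared across branches, dilation rates of 1, 2 and 4, and a residual reweighting step; the function names `dilated_conv2d` and `pyramid_attention` are hypothetical.

```python
import math


def dilated_conv2d(x, kernel, dilation):
    """Zero-padded, same-size 2D cross-correlation with a dilated square kernel."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i + (a - r) * dilation
                    jj = j + (b - r) * dilation
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pyramid_attention(x, kernel, dilations=(1, 2, 4)):
    """Average several dilated responses, squash them into a spatial
    attention map in (0, 1), and reweight the input with a residual term.
    The rates and the residual form are illustrative assumptions."""
    h, w = len(x), len(x[0])
    branches = [dilated_conv2d(x, kernel, d) for d in dilations]
    attn = [
        [sigmoid(sum(br[i][j] for br in branches) / len(branches)) for j in range(w)]
        for i in range(h)
    ]
    out = [[x[i][j] * (1.0 + attn[i][j]) for j in range(w)] for i in range(h)]
    return out, attn


if __name__ == "__main__":
    # Toy 5x5 feature map with a single active location at the centre.
    feat = [[0.0] * 5 for _ in range(5)]
    feat[2][2] = 1.0
    mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
    refined, attention = pyramid_attention(feat, mean_kernel)
    for row in attention:
        print(" ".join(f"{v:.3f}" for v in row))
```

Larger dilation rates let the same 3x3 kernel gather context from farther away without adding parameters, which is the usual motivation for using them on high-level features. A real network would use learned multi-channel kernels in a deep learning framework rather than nested Python loops.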
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.