基于逐级多阶段迭代特征细化的金字塔关注显著目标分割

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2025-10-01 Epub Date: 2025-07-28 DOI:10.1016/j.imavis.2025.105670

Rahim Khan , Nada Alzaben , Yousef Ibrahim Daradkeh , Xianxun Zhu , Inam Ullah

{"title":"基于逐级多阶段迭代特征细化的金字塔关注显著目标分割","authors":"Rahim Khan , Nada Alzaben , Yousef Ibrahim Daradkeh , Xianxun Zhu , Inam Ullah","doi":"10.1016/j.imavis.2025.105670","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105670"},"PeriodicalIF":4.2000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation\",\"authors\":\"Rahim Khan , Nada Alzaben , Yousef Ibrahim Daradkeh , Xianxun Zhu , Inam Ullah\",\"doi\":\"10.1016/j.imavis.2025.105670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"162 \",\"pages\":\"Article 105670\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625002586\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/7/28 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002586","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/28 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

在复杂的视觉场景中准确检测显著目标是视觉智能领域的一项基本但具有挑战性的任务，经常受到显著尺度变化、背景杂波和模糊目标边界的阻碍。虽然最近的方法试图利用多层次特征，但它们经常遇到诸如特征层次之间的语义不对齐、空间细节退化和弱跨数据集泛化等限制。为了克服这些挑战，我们提出了一种新的金字塔注意机制（PAM），该机制采用渐进式多阶段迭代特征优化网络（PIFRNet），旨在实现鲁棒和精确的显著目标检测（SOD）。具体来说，我们的方法首先从强大骨干的四个代表性阶段分层聚合特征，确保丰富的多尺度上下文和语义多样性。为了弥合语义差距并恢复精细结构，我们引入了渐进式双边特征细化（PBFR）模块，该模块通过级联卷积和空间注意来增强早期特征。在此基础上，引入了扩展卷积的PAM，改进了高级语义，增强了对象完备性。该网络通过多阶段迭代优化过程集成这些组件，从而逐步提高空间精度和结构保真度。在五个公开的SOD基准测试中进行的大量实验表明，与最先进的方法相比，我们的方法在定量和定性方面都取得了卓越的性能。跨数据集评估进一步验证了其强大的泛化能力，使其高度适用于现实世界的视觉智能场景。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation

Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.