Synergizing triple attention with depth quality for RGB-D salient object detection
Peipei Song, Wenyu Li, Peiyan Zhong, Jing Zhang, Piotr Konuisz, Feng Duan, Nick Barnes
Neurocomputing, Volume 589, Article 127672. Published 2024-04-16. DOI: 10.1016/j.neucom.2024.127672
https://www.sciencedirect.com/science/article/pii/S0925231224004430
Abstract
Salient objects are the conspicuous objects or regions in an image that stand out from their surroundings. Depth maps are commonly used as a supplementary input for salient object detection, a setting referred to as RGB-D SOD. Because acquisition sensors are diverse, such as infrared detectors and stereo cameras, the quality of the acquired depth maps varies considerably, and low-quality depth introduces noise that seriously reduces detection accuracy. To tackle this problem, this paper proposes a triple attention architecture, based on a 3D convolutional neural network and tailored for quality-aware salient object detection, that exploits attention across the modality, channel, and spatial dimensions. The modality attention learns quality factors from the overall modal features. The channel attention highlights features along the channel dimension, and the patch-level spatial attention establishes long-range dependencies. The quality factors, channel differences, and spatial contrast are thus combined to achieve global and local fusion. To enable evaluation on low-quality depth maps, an assessment criterion is further introduced to categorize the RGB-D datasets. Experimental comparisons with state-of-the-art methods at different quality levels demonstrate the proposed method's effectiveness, especially on low-quality depth.
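The three attention dimensions named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it has no 3D CNN and no learned parameters. It assumes features stored as channel-major lists of patch values, uses mean activation as a placeholder for the learned quality score, and applies the attentions in an order chosen here for illustration. All function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(xs):
    return sum(xs) / len(xs)

def modality_attention(rgb, depth):
    """Return (w_rgb, w_depth): a softmax over one global score per modality.
    Assumption: mean activation stands in for a learned quality factor."""
    scores = [mean([mean(ch) for ch in feat]) for feat in (rgb, depth)]
    return softmax(scores)

def channel_attention(feat):
    """Gate each channel by the sigmoid of its global average."""
    return [[sigmoid(mean(ch)) * v for v in ch] for ch in feat]

def patch_attention(feat):
    """Self-attention across patches with Q = K = V = the patch vectors
    (no learned projections), so every patch can attend to every other."""
    n_ch, n_patch = len(feat), len(feat[0])
    patches = [[feat[c][p] for c in range(n_ch)] for p in range(n_patch)]
    scale = math.sqrt(n_ch)
    out = []
    for q in patches:
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in patches])
        out.append([sum(w * v[c] for w, v in zip(attn, patches)) for c in range(n_ch)])
    # Convert back to the channel-major layout.
    return [[out[p][c] for p in range(n_patch)] for c in range(n_ch)]

def triple_attention_fuse(rgb, depth):
    """Weight the modalities, then apply channel and patch attention."""
    w_rgb, w_depth = modality_attention(rgb, depth)
    fused = [
        [w_rgb * r + w_depth * d for r, d in zip(rc, dc)]
        for rc, dc in zip(rgb, depth)
    ]
    return patch_attention(channel_attention(fused))

# Toy inputs: 2 channels x 4 patches per modality; the depth map is weak.
rgb = [[0.9, 0.1, 0.8, 0.2], [0.5, 0.4, 0.6, 0.3]]
depth = [[0.1, 0.1, 0.2, 0.1], [0.0, 0.1, 0.1, 0.0]]
weights = modality_attention(rgb, depth)
fused = triple_attention_fuse(rgb, depth)
```

In this toy setup the weaker depth features receive the smaller modality weight, which mirrors the idea of down-weighting low-quality depth before fusion. A trained model would learn the quality factors rather than derive them from mean activation.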
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.