Spatial Continuity and Nonequal Importance in Salient Object Detection With Image-Category Supervision
Zhihao Wu, Chengliang Liu, Jie Wen, Yong Xu, Jian Yang, Xuelong Li
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 5, pp. 8565-8576, published 2024-09-04. DOI: 10.1109/TNNLS.2024.3436519
Abstract
Because pixel-level annotation is inefficient, weakly supervised salient object detection with image-category labels (WSSOD) has been receiving increasing attention. Previous works usually aim to generate high-quality pseudolabels with which to train detectors in a fully supervised manner. However, we find that detection performance is often limited by two types of noise in pseudolabels: 1) holes inside the object or at its edges, together with outliers in the background; and 2) missing object portions and redundant surrounding regions. To mitigate their adverse effects, we propose local pixel correction (LPC) and key pixel attention (KPA), respectively, based on two key properties of desirable pseudolabels: 1) spatial continuity, meaning that an object region consists of a cluster of adjacent points; and 2) nonequal importance, meaning that pixels differ in their importance for training. Specifically, LPC fills holes and filters out outliers based on summary statistics of the neighborhood and on the neighborhood's size. KPA directs the focus of training toward pixels that are ambiguous across multiple pseudolabels in order to discover more accurate saliency cues. To evaluate the effectiveness of our method, we design a simple yet strong baseline, which we call weakly supervised saliency detector with Transformer (WSSDT), and integrate the proposed modules into it. Extensive experiments on five datasets demonstrate that our method significantly improves on the baseline and outperforms all existing methods of the same kind. Moreover, we establish the first benchmark for evaluating WSSOD robustness, and the results show that our method improves detection robustness as well. The code and robustness benchmark are available at https://github.com/Horatio9702/SCNI.
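The abstract describes LPC and KPA only at a high level. The NumPy sketch below illustrates one plausible reading of each, under explicit assumptions. `local_pixel_correction` treats LPC as a k x k majority vote on a binary pseudolabel, where the neighborhood mean is the summary statistic and `k` is the neighborhood size. `key_pixel_weights` treats KPA as a per-pixel loss weight that grows with disagreement among several pseudolabels, and `weighted_bce` applies those weights. All three function names, the 0.5 threshold, the `alpha` parameter, and the choice of binary cross-entropy are hypothetical illustrations, not the authors' implementation, which is in the SCNI repository.

```python
import numpy as np


def local_pixel_correction(label: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical LPC sketch: k x k majority vote on a binary pseudolabel.

    Assumption: the neighborhood mean is the summary statistic and k is the
    neighborhood size; this is not the authors' exact rule.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    r = k // 2
    # Edge-pad so that border pixels also get a full k x k neighborhood.
    padded = np.pad(label.astype(np.float64), r, mode="edge")
    # View of shape (H, W, k, k): one neighborhood per pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    # Fraction of foreground pixels in each neighborhood.
    ratio = windows.mean(axis=(-2, -1))
    # Majority vote fills isolated holes and drops isolated outliers.
    # With k odd and binary input, the ratio is never exactly 0.5.
    return (ratio > 0.5).astype(np.uint8)


def key_pixel_weights(pseudolabels: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical KPA sketch: weight pixels by disagreement across pseudolabels.

    pseudolabels has shape (N, H, W) with values in [0, 1].
    """
    p = pseudolabels.mean(axis=0)  # average vote per pixel
    # Ambiguity is 0 where all pseudolabels agree and 1 at an even split.
    ambiguity = 1.0 - np.abs(2.0 * p - 1.0)
    return 1.0 + alpha * ambiguity


def weighted_bce(pred: np.ndarray, target: np.ndarray,
                 weights: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged with per-pixel weights (assumed loss)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float((weights * loss).sum() / weights.sum())


if __name__ == "__main__":
    label = np.zeros((10, 10), dtype=np.uint8)
    label[2:7, 2:7] = 1  # a square object
    label[4, 4] = 0      # a hole inside the object
    label[8, 8] = 1      # an isolated background outlier
    cleaned = local_pixel_correction(label, k=3)
    print(cleaned[4, 4], cleaned[8, 8])  # hole filled, outlier removed

    # Three toy pseudolabels that disagree near the object boundary.
    pseudos = np.stack([cleaned, label, np.roll(cleaned, 1, axis=1)]).astype(np.float64)
    w = key_pixel_weights(pseudos, alpha=1.0)
    target = pseudos.mean(axis=0)
    pred = np.full(target.shape, 0.5)
    print(weighted_bce(pred, target, w))
```

A fixed 3 x 3 majority vote also trims sharp object corners (in the example, the square's corner pixels are removed), so the neighborhood size trades hole filling against shape fidelity. In the KPA sketch, unanimous pixels keep weight 1 while contested pixels are up-weighted by at most `alpha`.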
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.