{"title":"用于无监督突出物体检测的对比学习框架","authors":"Huankang Guan;Jiaying Lin;Rynson W. H. Lau","doi":"10.1109/TIP.2025.3558674","DOIUrl":null,"url":null,"abstract":"Existing unsupervised salient object detection (USOD) methods usually rely on low-level saliency priors, such as center and background priors, to detect salient objects, resulting in insufficient high-level semantic understanding. These low-level priors can be fragile and lead to failure when the natural images do not satisfy the prior assumptions, e.g., these methods may fail to detect those off-center salient objects causing fragmented objects in the segmentation. To address these problems, we propose to eliminate the dependency on flimsy low-level priors, and extract high-level saliency from natural images through a contrastive learning framework. To this end, we propose a Contrastive Saliency Network (CSNet), which is a prior-free and label-free saliency detector, with two novel modules: 1) a Contrastive Saliency Extraction (CSE) module to extract high-level saliency cues, by mimicking the human attention mechanism within an instance discriminative task through a contrastive learning framework, and 2) a Feature Re-Coordinate (FRC) module to recover spatial details, by calibrating high-level features with low-level features in an unsupervised fashion. In addition, we introduce a novel local appearance triplet (LAT) loss to assist the training process by encouraging similar saliency scores for regions with homogeneous appearances. Extensive experiments show that our approach is effective and outperforms state-of-the-art methods on popular SOD benchmarks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2487-2498"},"PeriodicalIF":13.7000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Contrastive-Learning Framework for Unsupervised Salient Object Detection\",\"authors\":\"Huankang Guan;Jiaying Lin;Rynson W. H. Lau\",\"doi\":\"10.1109/TIP.2025.3558674\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing unsupervised salient object detection (USOD) methods usually rely on low-level saliency priors, such as center and background priors, to detect salient objects, resulting in insufficient high-level semantic understanding. These low-level priors can be fragile and lead to failure when the natural images do not satisfy the prior assumptions, e.g., these methods may fail to detect those off-center salient objects causing fragmented objects in the segmentation. To address these problems, we propose to eliminate the dependency on flimsy low-level priors, and extract high-level saliency from natural images through a contrastive learning framework. To this end, we propose a Contrastive Saliency Network (CSNet), which is a prior-free and label-free saliency detector, with two novel modules: 1) a Contrastive Saliency Extraction (CSE) module to extract high-level saliency cues, by mimicking the human attention mechanism within an instance discriminative task through a contrastive learning framework, and 2) a Feature Re-Coordinate (FRC) module to recover spatial details, by calibrating high-level features with low-level features in an unsupervised fashion. 
In addition, we introduce a novel local appearance triplet (LAT) loss to assist the training process by encouraging similar saliency scores for regions with homogeneous appearances. Extensive experiments show that our approach is effective and outperforms state-of-the-art methods on popular SOD benchmarks.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"2487-2498\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10964591/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10964591/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Existing unsupervised salient object detection (USOD) methods usually rely on low-level saliency priors, such as center and background priors, to detect salient objects, and therefore lack high-level semantic understanding. These low-level priors can be fragile and lead to failures when natural images do not satisfy the prior assumptions; for example, such methods may miss off-center salient objects or produce fragmented segmentations. To address these problems, we propose to eliminate the dependency on flimsy low-level priors and to extract high-level saliency from natural images through a contrastive learning framework. To this end, we propose a Contrastive Saliency Network (CSNet), a prior-free and label-free saliency detector with two novel modules: 1) a Contrastive Saliency Extraction (CSE) module that extracts high-level saliency cues by mimicking the human attention mechanism within an instance discrimination task under a contrastive learning framework, and 2) a Feature Re-Coordinate (FRC) module that recovers spatial details by calibrating high-level features with low-level features in an unsupervised fashion. In addition, we introduce a novel local appearance triplet (LAT) loss to assist training by encouraging similar saliency scores for regions with homogeneous appearances. Extensive experiments show that our approach is effective and outperforms state-of-the-art methods on popular SOD benchmarks.
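The abstract describes the local appearance triplet (LAT) loss only at a high level. The sketch below is one possible, hedged interpretation in PyTorch, not the authors' implementation: anchor pixels are sampled, a nearby pixel with similar color is treated as the positive and a dissimilar one as the negative, and a triplet margin pushes the predicted saliency of the anchor toward the positive's score. The function name, the sampling scheme, the use of raw RGB color as the appearance cue, and the margin value are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def local_appearance_triplet_loss(saliency, image, margin=0.5, radius=5, n_samples=1024):
    """Illustrative sketch of a local appearance triplet loss (assumption, not the paper's code).

    saliency: (B, 1, H, W) predicted saliency map in [0, 1]
    image:    (B, 3, H, W) RGB image used as a low-level appearance cue
    """
    B, _, H, W = saliency.shape
    device = saliency.device

    # Randomly sample anchor coordinates away from the image border.
    ys = torch.randint(radius, H - radius, (B, n_samples), device=device)
    xs = torch.randint(radius, W - radius, (B, n_samples), device=device)

    # Two random offsets inside a local window give two candidate neighbours.
    dy1 = torch.randint(-radius, radius + 1, (B, n_samples), device=device)
    dx1 = torch.randint(-radius, radius + 1, (B, n_samples), device=device)
    dy2 = torch.randint(-radius, radius + 1, (B, n_samples), device=device)
    dx2 = torch.randint(-radius, radius + 1, (B, n_samples), device=device)

    def gather(t, y, x):
        # t: (B, C, H, W) -> (B, n_samples, C) values sampled at pixel (y, x)
        idx = (y * W + x).unsqueeze(1).expand(-1, t.shape[1], -1)
        return t.flatten(2).gather(2, idx).permute(0, 2, 1)

    col_a = gather(image, ys, xs)
    col_1 = gather(image, ys + dy1, xs + dx1)
    col_2 = gather(image, ys + dy2, xs + dx2)
    sal_a = gather(saliency, ys, xs).squeeze(-1)
    sal_1 = gather(saliency, ys + dy1, xs + dx1).squeeze(-1)
    sal_2 = gather(saliency, ys + dy2, xs + dx2).squeeze(-1)

    # The neighbour whose colour is closer to the anchor is taken as the positive.
    d1 = (col_a - col_1).norm(dim=-1)
    d2 = (col_a - col_2).norm(dim=-1)
    pos_is_1 = (d1 < d2).float()
    sal_pos = pos_is_1 * sal_1 + (1 - pos_is_1) * sal_2
    sal_neg = pos_is_1 * sal_2 + (1 - pos_is_1) * sal_1

    # Triplet margin on saliency scores: homogeneous appearance -> similar saliency.
    return F.relu((sal_a - sal_pos).abs() - (sal_a - sal_neg).abs() + margin).mean()
```

Under these assumptions the loss is fully differentiable with respect to the predicted saliency map and requires no labels, so it could be added directly to an unsupervised training objective; the choice of window radius and margin would need tuning.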