Rethinking erasing strategy on weakly supervised object localization

IF 2.7, Zone 3 (Engineering & Technology), JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)

Signal Processing-Image Communication, Pub Date: 2025-02-18, DOI: 10.1016/j.image.2025.117280

Yuming Fan, Shikui Wei, Chuangchuang Tan, Xiaotong Chen, Dongming Yang, Yao Zhao

Volume 135, Article 117280. Publication type: Journal Article. Open access: no.
Source: https://www.sciencedirect.com/science/article/pii/S092359652500027X

Citations: 0

Abstract



Weakly supervised object localization (WSOL) is a challenging task that aims to locate object regions in images using only image-level labels as supervision. Early research used erasing strategies to expand the localization regions. However, those methods usually adopt a fixed threshold, resulting in over- or under-fitting of the object region. Additionally, the recent pseudo-label paradigm decouples the classification and localization tasks, causing confusion between foreground and background regions. In this paper, we propose the Soft-Erasing (SoE) method for WSOL. It includes two key modules: Adaptive Erasing (AE) and Flip Erasing (FE). The AE module dynamically adjusts the erasing threshold using the object’s structural information, while the noise information module ensures the classifier focuses on the foreground region. The FE module decouples object and background information by using normalization and inversion techniques. Additionally, we introduce an activation loss and a reverse loss to strengthen semantic consistency in foreground regions. Experiments on public datasets demonstrate that our SoE framework significantly improves localization accuracy, achieving a GT-Known Loc of 70.86% on ILSVRC and 95.84% on CUB-200-2011.
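
To make the terms in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It shows (a) the fixed-threshold erasing that the abstract criticizes, (b) one plausible reading of "normalization and inversion" as min-max scaling followed by 1 - x, and (c) the GT-Known Loc metric under its usual convention (a prediction counts as correct when its IoU with the ground-truth box is at least 0.5). All function names, the inclusive box convention, and the 0.5 default are assumptions for illustration; the abstract does not specify how AE chooses its adaptive threshold, so that step is deliberately left out.

```python
from typing import List, Tuple

Map = List[List[float]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels (assumed)


def min_max_normalize(cam: Map) -> Map:
    """Rescale an activation map to [0, 1]; a constant map becomes all zeros."""
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


def hard_erase_mask(norm_cam: Map, threshold: float) -> Map:
    """Fixed-threshold erasing: 0 where activation exceeds the threshold, else 1."""
    return [[0.0 if v > threshold else 1.0 for v in row] for row in norm_cam]


def flip(norm_cam: Map) -> Map:
    """Invert a normalized map so low-activation cells score high (assumed reading)."""
    return [[1.0 - v for v in row] for row in norm_cam]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def gt_known_loc(preds: List[Box], gts: List[Box], iou_threshold: float = 0.5) -> float:
    """Percentage of images whose predicted box reaches the IoU threshold."""
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= iou_threshold)
    return 100.0 * hits / len(gts)


# Toy 2x2 activation map (hypothetical values).
norm = min_max_normalize([[0.0, 2.0], [4.0, 8.0]])
mask = hard_erase_mask(norm, threshold=0.4)  # erases the two strongest cells
background = flip(norm)                      # high where the activation was low
```

The hard mask shows why a single fixed threshold is brittle: moving it from 0.4 to 0.6 on the toy map changes the erased region from two cells to one, which is the sensitivity an adaptive threshold is meant to avoid.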


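Source journal

Signal Processing-Image Communication (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 8.40
Self-citation rate: 2.90%
Articles published: 138
Review time: 5.2 months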

About the journal: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following: To present a forum for the advancement of theory and practice of image communication. To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems. To contribute to a rapid information exchange between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments.

Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.