{"title":"Dual-branch contrastive learning for weakly supervised object localization","authors":"Zebin Guo, Dong Li, Zhengjun Du, Bingfeng Seng","doi":"10.1007/s10489-025-06514-1","DOIUrl":null,"url":null,"abstract":"<div><p>The weakly supervised object localization task uses image-level labels to train object localization models. Traditional convolutional neural network (CNN)-based methods usually localize objects using a class activation map. However, the class activation map usually suffers from the problem of activating a small part of the object that is most discriminative. Meanwhile, the methods based on the Vision Transformer can capture long-range feature dependencies but tend to ignore local feature details. In this paper, we innovatively propose a dual-branch contrastive learning (DBC) method that consists of a Transformer and a CNN branch. The method can effectively separate the background and foreground of an image and fuse the features of Transformer and CNN through contrastive learning. Specifically, the method separates the background and foreground representations of the image using the initially generated class-agnostic activation maps. Then, the representations of the same image from different branches form positive pairs for contrastive learning. The background and foreground representations from the same branch form negative pairs. Finally, the DBC method forces the model to separate the background and foreground representations through negative contrastive loss and makes the model fuse the features of two branches through positive contrastive loss. Experiments on the ILSVRC benchmark show that the proposed method can achieve a Top-1 localization accuracy of 59.9% and a GT-known localization accuracy of 71.7%, which are better metrics than those of the state-of-the-art methods with the same parameter complexity.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 7","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06514-1","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The weakly supervised object localization task uses image-level labels to train object localization models. Traditional convolutional neural network (CNN)-based methods usually localize objects using a class activation map. However, the class activation map usually suffers from the problem of activating a small part of the object that is most discriminative. Meanwhile, the methods based on the Vision Transformer can capture long-range feature dependencies but tend to ignore local feature details. In this paper, we innovatively propose a dual-branch contrastive learning (DBC) method that consists of a Transformer and a CNN branch. The method can effectively separate the background and foreground of an image and fuse the features of Transformer and CNN through contrastive learning. Specifically, the method separates the background and foreground representations of the image using the initially generated class-agnostic activation maps. Then, the representations of the same image from different branches form positive pairs for contrastive learning. The background and foreground representations from the same branch form negative pairs. Finally, the DBC method forces the model to separate the background and foreground representations through negative contrastive loss and makes the model fuse the features of two branches through positive contrastive loss. Experiments on the ILSVRC benchmark show that the proposed method can achieve a Top-1 localization accuracy of 59.9% and a GT-known localization accuracy of 71.7%, which are better metrics than those of the state-of-the-art methods with the same parameter complexity.
期刊介绍:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.