Anh-Khoa Nguyen Vu, Thanh-Toan Do, Vinh-Tiep Nguyen, Tam Le, Minh-Triet Tran, Tam V. Nguyen
Title: Few-shot object detection via synthetic features with optimal transport
DOI: 10.1016/j.cviu.2025.104350
Journal: Computer Vision and Image Understanding, Volume 257, Article 104350
Published: 2025-04-23 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1077314225000736
Citations: 0
Abstract
Few-shot object detection aims to simultaneously localize and classify the objects in an image with limited training samples. Most existing few-shot object detection methods focus on extracting the features of a few samples of novel classes, which can lack diversity. Consequently, they may not sufficiently capture the data distribution. To address this limitation, we propose a novel approach that trains a generator to produce synthetic data for novel classes. Still, directly training a generator on the novel class is ineffective due to the scarcity of novel data. To overcome this issue, we leverage the large-scale dataset of base classes by training a generator that captures the data variations of the dataset. Specifically, we train the generator with an optimal transport loss that minimizes the distance between the real and synthetic data distributions, which encourages the generator to capture data variations in base classes. We then transfer the captured variations to novel classes by generating synthetic data with the trained generator. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms the state of the art.
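The optimal transport loss described above can be illustrated with an entropy-regularized (Sinkhorn) distance between a batch of real features and a batch of synthetic features. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function name, the uniform weights, and the hyperparameters `eps` and `n_iters` are all assumptions for demonstration.

```python
import numpy as np

def sinkhorn_distance(x, y, eps=1.0, n_iters=100):
    """Entropy-regularized OT distance between two empirical feature
    distributions: x (n, d) real features, y (m, d) synthetic features."""
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared-Euclidean cost matrix between the two batches.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / eps)          # Gibbs kernel
    a = np.full(n, 1.0 / n)          # uniform weights on real features
    b = np.full(m, 1.0 / m)          # uniform weights on synthetic features
    u = np.ones(n)
    for _ in range(n_iters):         # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return (P * cost).sum()          # transport cost under that plan
```

In a training loop, a loss of this form would be minimized with respect to the generator's parameters, pulling the synthetic feature distribution toward the real one; a differentiable version (e.g. in PyTorch) would be needed for backpropagation.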
About the Journal
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems