DIM: long-tailed object detection and instance segmentation via dynamic instance memory

IF 4.6 | Zone 2, Physics and Astrophysics | JCR Q1, Computer Science, Artificial Intelligence

Machine Learning Science and Technology Pub Date : 2023-08-23 DOI:10.1088/2632-2153/acf362

Zhao-Min Chen, Xin Jin, Xiaoqin Zhang, C. Xia, Zhiyong Pan, Ruoxi Deng, Jie Hu, Heng Chen


Citations: 0

Abstract

Object detection and instance segmentation have been successful on benchmarks with a relatively balanced category distribution (e.g. MSCOCO). However, state-of-the-art object detection and segmentation methods still struggle to generalize to long-tailed datasets (e.g. LVIS), where a few classes (head classes) dominate the instance samples, while most classes (tailed classes) have only a few samples. To address this challenge, we propose a plug-and-play module within the Mask R-CNN framework called dynamic instance memory (DIM). Specifically, we augment Mask R-CNN with an auxiliary branch for training. This branch maintains a dynamic memory bank storing an instance-level prototype representation for each category, and shares the classifier with the existing instance branch. With a simple metric loss, the representations in DIM can be dynamically updated by the instance proposals in the mini-batch during training. Our DIM introduces a bias toward tailed classes to the classifier learning along with a class-frequency-reversed sampler, which learns generalizable representations from the original data distribution, complementing the existing instance branch. Comprehensive experiments on LVIS demonstrate the effectiveness of DIM and its significant advantages over the baseline Mask R-CNN.
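
The two ingredients named in the abstract, a per-category prototype memory and a class-frequency-reversed sampler, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the update rule, the metric loss, or the sampler weights, so everything concrete here is an assumption made for illustration: the inverse-frequency weighting, the momentum (moving-average) update, the squared Euclidean distance, and the names `reversed_sampling_probs` and `PrototypeMemory`.

```python
from collections import Counter


def reversed_sampling_probs(labels):
    """Return per-class sampling probabilities proportional to 1 / class count.

    Assumption: inverse-frequency weighting is one common way to build a
    class-frequency-reversed sampler; the abstract does not give the formula.
    """
    counts = Counter(labels)
    inverse = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inverse.values())
    return {c: w / total for c, w in inverse.items()}


class PrototypeMemory:
    """Hypothetical memory bank holding one prototype vector per category.

    Assumption: prototypes are refreshed with a momentum (moving-average)
    update. The abstract only says they are updated dynamically by mini-batch
    instance proposals through a simple metric loss.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.bank = {}  # category -> prototype (list of floats)

    def update(self, category, feature):
        """Blend a new instance feature into the stored prototype."""
        if category not in self.bank:
            self.bank[category] = list(feature)
            return
        m = self.momentum
        old = self.bank[category]
        self.bank[category] = [m * p + (1.0 - m) * f for p, f in zip(old, feature)]

    def squared_distance(self, category, feature):
        """Squared Euclidean distance to the prototype (a stand-in metric loss)."""
        return sum((p - f) ** 2 for p, f in zip(self.bank[category], feature))


if __name__ == "__main__":
    # Toy long-tailed label set: one head class, one medium class, one tail class.
    labels = ["car"] * 8 + ["person"] * 4 + ["unicycle"] * 1
    for name, prob in sorted(reversed_sampling_probs(labels).items()):
        print(name, round(prob, 4))

    memory = PrototypeMemory(momentum=0.9)
    memory.update("unicycle", [1.0, 0.0])
    memory.update("unicycle", [0.0, 1.0])
    print(memory.bank["unicycle"])
```

In this toy setup the rarest class receives the largest sampling probability, which is the sense in which a reversed sampler biases training toward the tail. A real training branch would operate on RoI features from Mask R-CNN rather than hand-written lists.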


Source journal

Machine Learning Science and Technology Computer Science-Artificial Intelligence

CiteScore

9.10

Self-citation rate

4.40%

Articles published

Review time

5 weeks

About the journal: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.