DIM: long-tailed object detection and instance segmentation via dynamic instance memory

IF 6.3 · Zone 2 (Physics and Astrophysics) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Zhao-Min Chen, Xin Jin, Xiaoqin Zhang, C. Xia, Zhiyong Pan, Ruoxi Deng, Jie Hu, Heng Chen
Machine Learning Science and Technology, journal article, published 2023-08-23. DOI: 10.1088/2632-2153/acf362
Citations: 0

Abstract

Object detection and instance segmentation have been successful on benchmarks with a relatively balanced category distribution (e.g. MSCOCO). However, state-of-the-art object detection and segmentation methods still struggle to generalize to long-tailed datasets (e.g. LVIS), where a few classes (head classes) account for most of the instance samples while most classes (tail classes) have only a few samples. To address this challenge, we propose a plug-and-play module for the Mask R-CNN framework called dynamic instance memory (DIM). Specifically, we augment Mask R-CNN with an auxiliary branch that is used during training. This branch maintains a dynamic memory bank that stores an instance-level prototype representation for each category, and it shares the classifier with the existing instance branch. With a simple metric loss, the representations in DIM are dynamically updated by the instance proposals in each mini-batch during training. Together with a class-frequency-reversed sampler, DIM introduces a bias toward tail classes into classifier learning while still learning generalizable representations from the original data distribution, thereby complementing the existing instance branch. Comprehensive experiments on LVIS demonstrate the effectiveness of DIM and its significant advantages over the baseline Mask R-CNN.
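The abstract names three ingredients: a per-class prototype memory, a metric loss that ties proposals to those prototypes, and a class-frequency-reversed sampler. The sketch below illustrates them in plain Python, but it is not the authors' implementation. The abstract does not give the update rule, the loss, or the sampling formula, so the following are all assumptions: an exponential-moving-average (EMA) prototype update, a mean squared Euclidean distance loss, inverse-frequency sampling weights p_c ∝ 1/n_c, and feeding sampled prototypes to the shared classifier. Every name here (`InstanceMemory`, `reversed_sampling_probs`, `auxiliary_batch`, `momentum`) is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def reversed_sampling_probs(class_counts: Dict[int, int]) -> Dict[int, float]:
    """Class-frequency-reversed sampling, assumed form: p_c proportional to 1 / n_c."""
    inv = {c: 1.0 / n for c, n in class_counts.items()}
    total = sum(inv.values())
    return {c: w / total for c, w in inv.items()}


def sample_class(probs: Dict[int, float], rng: random.Random) -> int:
    """Draw one class id according to the given probabilities."""
    classes = list(probs)
    weights = [probs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]


class InstanceMemory:
    """Hypothetical memory bank holding one prototype vector per category."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.momentum = momentum
        self.protos: List[Vector] = [[0.0] * dim for _ in range(num_classes)]
        self.seen = [False] * num_classes

    def metric_loss(self, feats: Sequence[Vector], labels: Sequence[int]) -> float:
        """Assumed metric loss: mean squared distance from each proposal to its class prototype."""
        dists = [
            sum((f - p) ** 2 for f, p in zip(feat, self.protos[y]))
            for feat, y in zip(feats, labels)
        ]
        return sum(dists) / len(dists)

    def update(self, feats: Sequence[Vector], labels: Sequence[int]) -> None:
        """Assumed EMA update of class prototypes from the mini-batch proposal features."""
        m = self.momentum
        for feat, y in zip(feats, labels):
            if not self.seen[y]:
                # The first proposal of a class initializes its prototype.
                self.protos[y] = list(feat)
                self.seen[y] = True
            else:
                self.protos[y] = [m * p + (1 - m) * f for p, f in zip(self.protos[y], feat)]


def auxiliary_batch(
    memory: InstanceMemory, probs: Dict[int, float], k: int, rng: random.Random
) -> List[Tuple[Vector, int]]:
    """Draw k (prototype, label) pairs with reversed-frequency sampling.

    In this sketch they would be passed to the classifier shared with the instance
    branch (an assumption). Only classes that already have a prototype are eligible.
    """
    avail = {c: p for c, p in probs.items() if memory.seen[c]}
    if not avail:
        raise ValueError("memory holds no prototypes yet")
    batch = []
    for _ in range(k):
        c = sample_class(avail, rng)
        batch.append((list(memory.protos[c]), c))
    return batch


if __name__ == "__main__":
    probs = reversed_sampling_probs({0: 900, 1: 90, 2: 10})  # head, common, tail
    print({c: round(p, 3) for c, p in probs.items()})
    mem = InstanceMemory(num_classes=3, dim=2, momentum=0.5)
    mem.update([[1.0, 0.0], [0.0, 4.0]], [0, 2])
    print(mem.metric_loss([[2.0, 0.0]], [0]))
    print(auxiliary_batch(mem, probs, 4, random.Random(0)))
```

With the toy counts 900/90/10, the tail class makes up 1% of the raw samples, but the reversed sampler gives it a sampling probability of 90/101 (about 89%). This illustrates how such a sampler could bias classifier learning toward tail classes. The EMA update is one simple way to keep prototypes stable while the backbone features keep changing during training.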

Source journal
Machine Learning Science and Technology (Computer Science - Artificial Intelligence)
CiteScore: 9.10
Self-citation rate: 4.40%
Articles published: 86
Review time: 5 weeks
Journal description: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory motivated by physical insights. Articles must either advance the state of machine-learning-driven applications in the sciences, or make conceptual, methodological, or theoretical advances in machine learning that are applied to, inspired by, or motivated by scientific problems.