Transformer-Based Approach Via Contrastive Learning for Zero-Shot Detection.

IF 6.4 | Zone 2, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Neural Systems Pub Date : 2023-07-01 DOI:10.1142/S0129065723500351

Wei Liu, Hui Chen, Yongqiang Ma, Jianji Wang, Nanning Zheng

International Journal of Neural Systems, vol. 33, no. 7, 2350035. Publication type: Journal Article.

Citations: 0

Abstract

Zero-shot detection (ZSD) aims to locate and classify unseen objects in pictures or videos by semantic auxiliary information without additional training examples. Most of the existing ZSD methods are based on two-stage models, which achieve the detection of unseen classes by aligning object region proposals with semantic embeddings. However, these methods have several limitations, including poor region proposals for unseen classes, lack of consideration of semantic representations of unseen classes or their inter-class correlations, and domain bias towards seen classes, which can degrade overall performance. To address these issues, the Trans-ZSD framework is proposed, which is a transformer-based multi-scale contextual detection framework that explicitly exploits inter-class correlations between seen and unseen classes and optimizes feature distribution to learn discriminative features. Trans-ZSD is a single-stage approach that skips proposal generation and performs detection directly, allowing the encoding of long-term dependencies at multiple scales to learn contextual features while requiring fewer inductive biases. Trans-ZSD also introduces a foreground-background separation branch to alleviate the confusion of unseen classes and backgrounds, contrastive learning to learn inter-class uniqueness and reduce misclassification between similar classes, and explicit inter-class commonality learning to facilitate generalization between related classes. Trans-ZSD addresses the domain bias problem in end-to-end generalized zero-shot detection (GZSD) models by using balance loss to maximize response consistency between seen and unseen predictions, ensuring that the model does not bias towards seen classes. The Trans-ZSD framework is evaluated on the PASCAL VOC and MS COCO datasets, demonstrating significant improvements over existing ZSD models.
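Two ideas named in the abstract, classifying a feature by aligning it with class semantic embeddings and using contrastive learning to separate classes, can be illustrated with a small NumPy sketch. This is not the Trans-ZSD implementation: the function names, the cosine-similarity classifier, the supervised contrastive formulation, and the temperature value are all assumptions chosen for illustration, and the abstract does not specify the paper's actual losses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def classify_by_semantics(features, class_embeddings):
    """Assign each feature to the class with the most similar semantic embedding.

    Hypothetical helper: unseen classes can be handled simply by adding their
    embeddings (e.g. word vectors) as extra rows of `class_embeddings`.
    """
    sims = l2_normalize(features) @ l2_normalize(class_embeddings).T
    return sims.argmax(axis=1), sims


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (an assumed stand-in, not the paper's).

    Pulls together features that share a label and pushes apart the rest.
    """
    labels = np.asarray(labels)
    z = l2_normalize(np.asarray(features, dtype=float))
    logits = z @ z.T / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    # Exclude each sample's similarity with itself from the softmax.
    logits = np.where(self_mask, -np.inf, logits)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy data: two classes whose features cluster along different axes.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_class_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical vectors

predictions, _ = classify_by_semantics(toy_features, toy_class_embeddings)
loss_matching = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the labels agree with the feature clusters than when they cut across them, which is the behavior a contrastive objective is meant to reward.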


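Source journal

International Journal of Neural Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.30
Self-citation rate: 28.80%
Articles published: 116
Review time: 24 months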

About the journal: The International Journal of Neural Systems is a monthly, rigorously peer-reviewed transdisciplinary journal focusing on information processing in both natural and artificial neural systems. Special interests include machine learning, computational neuroscience and neurology. The journal prioritizes innovative, high-impact articles spanning multiple fields, including neurosciences and computer science and engineering. It adopts an open-minded approach to this multidisciplinary field, serving as a platform for novel ideas and enhanced understanding of collective and cooperative phenomena in computationally capable systems.