DIGCN: A Dynamic Interaction Graph Convolutional Network Based on Learnable Proposals for Object Detection

IF 4.5 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Pingping Cao, Yanping Zhu, Yuhao Jin, Benkun Ruan, Qiang Niu
Journal: Journal of Artificial Intelligence Research
DOI: 10.1613/jair.1.15698
Published: 2024-04-04 (Journal Article)
Citations: 0

Abstract

We propose the Dynamic Interaction Graph Convolutional Network (DIGCN), an image object detection method based on learnable proposals and graph convolutional networks (GCNs). Existing object detection methods usually operate on dense candidates, producing redundant and near-duplicate results; non-maximum suppression post-processing is then required to eliminate these negative effects, which increases computational complexity. Although existing sparse detectors avoid cumbersome post-processing, they ignore the potential relationships between objects and proposals, which hinders further improvement in detection accuracy. We therefore propose a dynamic interaction GCN module in DIGCN that performs dynamic interaction and relational modeling on proposal boxes and proposal features to improve object detection accuracy. In addition, we introduce a learnable proposal method with a sparse set of learned object proposals, eliminating the huge number of hand-designed object candidates, avoiding complicated tasks such as candidate design and many-to-one label assignment, and reducing model complexity to a certain extent. DIGCN demonstrates accuracy and run-time performance on par with well-established, highly optimized detector baselines on the challenging COCO dataset; e.g., with ResNet-101 FPN as the backbone, our method attains 46.5 AP while processing 13 frames per second. Our work provides a new method for object detection research.
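The core idea described above — a sparse set of learnable proposal boxes whose features interact through graph convolution driven by their spatial relationships — can be sketched in a few lines. The following is a minimal, hypothetical numpy illustration, not the authors' implementation: the choice of pairwise box IoU as the adjacency, the row normalization, and the single ReLU-activated GCN step are all illustrative assumptions.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between boxes a (N,4) and b (M,4) in (x1,y1,x2,y2) format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def gcn_interact(feats, boxes, W):
    """One illustrative graph-convolution step over proposals.

    feats: (N, D) proposal features; boxes: (N, 4) proposal boxes;
    W: (D, D) learnable weight matrix. The adjacency is built from
    pairwise box IoU (an assumption for this sketch), row-normalized,
    then used to mix features before a linear map and ReLU.
    """
    A = box_iou(boxes, boxes)                 # (N, N); diagonal is 1
    A = A / A.sum(axis=1, keepdims=True)      # row-normalize
    return np.maximum(A @ feats @ W, 0.0)     # aggregate, transform, ReLU

# Toy usage: 3 learnable proposals with 4-d features.
boxes = np.array([[0, 0, 10, 10], [2, 2, 12, 12], [50, 50, 60, 60]], dtype=float)
feats = np.random.default_rng(0).normal(size=(3, 4))
W = np.eye(4)
updated = gcn_interact(feats, boxes, W)       # (3, 4) relation-aware features
```

In a full detector, the boxes and features would be learned parameters refined over several such interaction stages; this sketch only shows the relational-modeling step in isolation.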
Source Journal

Journal of Artificial Intelligence Research (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 9.60
Self-citation rate: 4.00%
Articles per year: 98
Review time: 4 months
About the journal: JAIR (ISSN 1076-9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.