{"title":"GMTNet: Dense Object Detection via Global Dynamically Matching Transformer Network","authors":"Chaojun Dong;Chengxuan Wang;Yikui Zhai;Ye Li;Jianhong Zhou;Pasquale Coscia;Angelo Genovese;Vincenzo Piuri;Fabio Scotti","doi":"10.1109/TCSVT.2024.3522661","DOIUrl":null,"url":null,"abstract":"In recent years, object detection models have been extensively applied across various industries, leveraging learned samples to recognize and locate objects. However, industrial environments present unique challenges, including complex backgrounds, dense object distributions, object stacking, and occlusion. To address these challenges, we propose the Global Dynamic Matching Transformer Network (GMTNet). GMTNet partitions images into blocks and employs a sliding window approach to capture information from each block and their interrelationships, mitigating background interference while acquiring global information for dense object recognition. By reweighting key-value pairs in multi-scale feature maps, GMTNet enhances global information relevance and effectively handles occlusion and overlap between objects. Furthermore, we introduce a dynamic sample matching method to tackle the issue of excessive candidate boxes in dense detection tasks. This method adaptively adjusts the number of matched positive samples according to the specific detection task, enabling the model to reduce the learning of irrelevant features and simplify post-processing. Experimental results demonstrate that GMTNet excels in dense detection tasks and outperforms current mainstream algorithms. The code will be available at <uri>http://github.com/yikuizhai/GMTNet</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4923-4936"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10816179/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
In recent years, object detection models have been extensively applied across various industries, leveraging learned samples to recognize and locate objects. However, industrial environments present unique challenges, including complex backgrounds, dense object distributions, object stacking, and occlusion. To address these challenges, we propose the Global Dynamic Matching Transformer Network (GMTNet). GMTNet partitions images into blocks and employs a sliding window approach to capture information from each block and their interrelationships, mitigating background interference while acquiring global information for dense object recognition. By reweighting key-value pairs in multi-scale feature maps, GMTNet enhances global information relevance and effectively handles occlusion and overlap between objects. Furthermore, we introduce a dynamic sample matching method to tackle the issue of excessive candidate boxes in dense detection tasks. This method adaptively adjusts the number of matched positive samples according to the specific detection task, enabling the model to reduce the learning of irrelevant features and simplify post-processing. Experimental results demonstrate that GMTNet excels in dense detection tasks and outperforms current mainstream algorithms. The code will be available at http://github.com/yikuizhai/GMTNet.
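To make the dynamic sample matching idea summarized above more concrete, the sketch below shows one plausible way to assign a variable number of positive candidates to each ground-truth box based on how well the candidates already overlap it. This is not the authors' implementation (which they state will be released at the linked repository); it is a hedged approximation in the spirit of dynamic-k label assignment, and all names and thresholds (dynamic_sample_matching, cost_matrix, max_k, etc.) are hypothetical.

```python
import torch

def dynamic_sample_matching(cost_matrix: torch.Tensor,
                            ious: torch.Tensor,
                            max_k: int = 10) -> torch.Tensor:
    """Assign a variable number of positive candidates to each ground-truth box.

    cost_matrix: (num_gt, num_candidates) matching cost (lower is better).
    ious:        (num_gt, num_candidates) IoU between GT boxes and candidates.
    Returns a boolean (num_gt, num_candidates) mask of positive assignments.
    """
    num_gt, num_cand = cost_matrix.shape
    pos_mask = torch.zeros_like(cost_matrix, dtype=torch.bool)

    # Estimate how many positives each GT deserves from its top-IoU candidates:
    # objects with many well-overlapping candidates get more positives.
    topk_ious, _ = ious.topk(min(max_k, num_cand), dim=1)
    dynamic_ks = topk_ious.sum(dim=1).int().clamp(min=1)

    # Pick the lowest-cost candidates for each GT, up to its dynamic k.
    for gt_idx in range(num_gt):
        k = int(dynamic_ks[gt_idx])
        _, cand_idx = cost_matrix[gt_idx].topk(k, largest=False)
        pos_mask[gt_idx, cand_idx] = True

    # Resolve conflicts: a candidate matched to several GTs keeps only
    # the assignment with the lowest cost.
    multi = pos_mask.sum(dim=0) > 1
    if multi.any():
        best_gt = cost_matrix[:, multi].argmin(dim=0)
        pos_mask[:, multi] = False
        pos_mask[best_gt, multi.nonzero().squeeze(1)] = True

    return pos_mask
```

In this sketch, densely surrounded objects with many well-aligned candidates receive more positives, while heavily occluded objects with few good candidates receive fewer, which is one way an adaptive positive-sample count can reduce redundant candidate boxes and simplify post-processing.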
About the Journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.