Video Object Detection Considering Dynamic Neighborhood Feature Multiplexing
Jiahui Yu, Yifan Chen, Xuna Wang, Long Chen, Hang Chen, Dalin Zhou, Yingke Xu, Zhaojie Ju
IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 8, pp. 5451-5463, published 2025-06-05
DOI: 10.1109/TSMC.2025.3572123
URL: https://ieeexplore.ieee.org/document/11025159/
Impact factor: 8.6 (JCR Q1, Automation & Control Systems)
Citation count: 0
Abstract
Video object detection is essential for human-interaction applications, including bimanual manipulation sensing (BMS). Its performance in practical applications still needs improvement, because it is limited by the difficulty of analyzing long-range spatiotemporal dependencies. How do humans sense bimanual manipulation in videos, especially in deteriorated clips? We argue that humans analyze the current clip based on earlier memory, namely long-term spatial and temporal dependencies (LTSTD). However, most existing methods explore these dependencies only to a limited extent and have yet to report significant results. For future applications, an easy-to-integrate module is generally preferable to a complex end-to-end framework. Therefore, in this article we propose a dynamic neighborhood feature multiplexing mechanism, called DNFM, for online video object detection, which learns LTSTD in a more flexible and robust way and boosts the results of existing detectors. Specifically, we develop dynamic memory enhancement neural networks for better long-term feature aggregation at negligible additional computational cost. We multiplex each frame feature to aggregate key enhanced representations under the guidance of dynamic memory recall. DNFM improves a variety of well-known detectors on BMS and other challenging detection tasks, with particular attention devoted to the detection of "low-quality" frames. Experimental results show that DNFM achieves state-of-the-art detection performance while remaining easy to integrate into existing video object detectors.
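The abstract does not describe the DNFM architecture in detail, so the following is only a generic sketch of the broad idea it mentions: enhancing the current frame feature with a recall from a memory of earlier frames. It is not the authors' method. The class name `FrameMemory`, the parameters `capacity` and `blend`, and the use of softmax-weighted cosine similarity are all assumptions made for illustration.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrameMemory:
    """Hypothetical bounded memory of past frame features (online: past frames only)."""

    def __init__(self, capacity=8, blend=0.5):
        self.buffer = deque(maxlen=capacity)  # oldest features are evicted first
        self.blend = blend  # assumed weight given to the recalled memory feature

    def enhance(self, feature):
        """Return the current feature blended with a similarity-weighted recall."""
        if not self.buffer:
            enhanced = list(feature)  # nothing to recall for the first frame
        else:
            # Weight each stored feature by its similarity to the current one.
            weights = softmax([cosine(feature, m) for m in self.buffer])
            recalled = [
                sum(w * m[i] for w, m in zip(weights, self.buffer))
                for i in range(len(feature))
            ]
            enhanced = [
                (1 - self.blend) * f + self.blend * r
                for f, r in zip(feature, recalled)
            ]
        self.buffer.append(list(feature))  # store the raw feature for later frames
        return enhanced
```

In this sketch a detector head would consume the output of `enhance` in place of the raw per-frame feature, which is one way a module can stay separate from the detector it augments. A real system would operate on feature maps rather than flat vectors and would learn the weighting instead of fixing it.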
About the journal:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.