Jie Hua;Zhongyuan Wang;Xin Tian;Qin Zou;Jinsheng Xiao;Jiayi Ma
IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 7, pp. 1391-1406. Published 2025-03-14. DOI: 10.1109/JAS.2025.125333. IEEE Xplore: https://ieeexplore.ieee.org/document/11004449/
Full Perception Head: Bridging the Gap Between Local and Global Features
Object detection is a fundamental task in computer vision that involves identifying and localizing objects within an image. Local features, extracted by operations such as convolution, capture fine-grained details such as edges and textures, while global features, extracted by operations such as fully connected layers, represent the overall structure and long-range relationships within the image. Both kinds of features are crucial for accurate object detection, yet most existing methods focus on aggregating local and global features and often overlook medium-range dependencies. To address this gap, we propose a novel full perception module (FP-Module), a simple yet effective feature extraction module designed to capture local details, medium-range dependencies, and long-range dependencies simultaneously. Building on this, we construct a full perception head (FP-Head) by cascading multiple FP-Modules, enabling the prediction layer to leverage the most informative features. Experimental results on the MS COCO dataset demonstrate that our approach significantly enhances object recognition and localization, achieving gains of 2.7-5.7 AP_val when integrated into standard object detectors. Notably, the FP-Module is a universal solution that can be seamlessly incorporated into existing detectors to boost performance. The code will be released at https://github.com/Idcogroup/FP-Head.
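The abstract does not describe how the FP-Module is built, so the following is only a toy illustration of what "local", "medium-range", and "long-range" context can mean, not the authors' architecture. It works on a 1-D list of numbers and assumes, hypothetically, that local context is a small contiguous window, medium-range context is a few samples spaced apart (in the spirit of a dilated kernel), and long-range context is the mean over the whole signal. All function names, the window sizes, and the equal mixing weights are assumptions made for this sketch.

```python
def local_context(x, i, radius=1):
    """Mean over a small contiguous window around position i."""
    lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
    window = x[lo:hi]
    return sum(window) / len(window)


def medium_context(x, i, dilation=3, taps=1):
    """Mean over samples spaced `dilation` apart around position i.

    Positions that fall outside the signal are skipped.
    """
    indices = [i + k * dilation for k in range(-taps, taps + 1)]
    values = [x[j] for j in indices if 0 <= j < len(x)]
    return sum(values) / len(values)


def global_context(x):
    """Mean over the entire signal (the same value for every position)."""
    return sum(x) / len(x)


def mix_three_ranges(x, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of local, medium-range, and global context per position.

    The equal weights are an arbitrary choice for illustration.
    """
    w_local, w_medium, w_global = weights
    g = global_context(x)
    return [
        w_local * local_context(x, i)
        + w_medium * medium_context(x, i)
        + w_global * g
        for i in range(len(x))
    ]


if __name__ == "__main__":
    signal = [0.0] * 9
    signal[4] = 9.0  # a single spike in the middle
    print(mix_three_ranges(signal))
```

With a single spike in the signal, positions adjacent to the spike respond through the local term, positions three steps away respond only through the medium-range term, and every position receives the same global term. That separation is the intuition behind treating medium-range dependencies as distinct from both local and global ones.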
About the journal:
The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control.
Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.