DGeC: Dynamically and Globally Enhanced Convolution
Zihang Zhang, Yuling Liu, Zhili Zhou, Gaobo Yang, Xin Liao, Q. M. Jonathan Wu
IEEE Transactions on Artificial Intelligence, vol. 6, no. 4, pp. 921-933, 2024. DOI: 10.1109/TAI.2024.3502577. https://ieeexplore.ieee.org/document/10758802/
Abstract
We explore why vanilla convolution has limited feature extraction ability and identify three key factors that restrict its representation capability: regular sampling, static aggregation, and a limited receptive field. At the cost of extra parameters and computations, existing approaches alleviate only some of these limitations, which motivates us to seek a more lightweight operator that further improves the extracted image features. A closer examination of the convolution process shows that it is composed of two distinct interactions: a spatial-wise interaction and a channel-wise interaction. Based on this observation, we decouple convolutional blocks into these two interactions, which not only reduces parameters and computations but also enables a richer ensemble of interactions. We then propose the dynamically and globally enhanced convolution (DGeC), which comprises the following components: a dynamic area perceptor (DAP) block that dynamically samples spatial cues, an adaptive global context (AGC) block that introduces location-aware global image information, and a channel attention perceptor (CAP) block that merges different channel-wise features. Experiments on ImageNet for image classification and on COCO-2017 for object detection validate the effectiveness of DGeC: the proposed method consistently improves performance with fewer parameters and computations. In particular, DGeC achieves a 3.1% improvement in top-1 accuracy on the ImageNet dataset compared to ResNet50. Moreover, with Faster R-CNN and RetinaNet, our DGeC-ResNet50 also consistently outperforms ResNet and ResNeXt.
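To make the decoupling concrete, below is a minimal PyTorch sketch of the described structure: a spatial-wise interaction followed by channel-wise interactions. The abstract names only the three blocks (DAP, AGC, CAP), so the internals here are hypothetical stand-ins drawn from common designs (a depthwise convolution for dynamic spatial sampling, a softmax-pooled global context, and SE-style channel gating), not the authors' implementation.

```python
# Hypothetical sketch of a DGeC-style decomposition. Block names mirror the
# paper's DAP / AGC / CAP, but their internals are simplified stand-ins.
import torch
import torch.nn as nn


class DynamicAreaPerceptor(nn.Module):
    """Spatial-wise interaction: per-channel (depthwise) spatial aggregation.
    Stand-in for DAP, which samples spatial locations dynamically."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)

    def forward(self, x):
        return self.dw(x)


class AdaptiveGlobalContext(nn.Module):
    """Adds a location-aware global summary: softmax-weighted pooling over all
    positions, projected and broadcast back (stand-in for AGC)."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, 1)   # per-position pooling weight
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        weights = self.attn(x).flatten(2).softmax(dim=-1)  # (B, 1, H*W)
        ctx = (x.flatten(2) * weights).sum(dim=-1)         # (B, C) global summary
        return x + self.proj(ctx.view(b, c, 1, 1))         # broadcast over H, W


class ChannelAttentionPerceptor(nn.Module):
    """Channel-wise interaction: SE-style gating plus a 1x1 channel mix
    (stand-in for CAP)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.mix(x * self.gate(x))


class DGeCSketch(nn.Module):
    """Spatial and channel interactions composed sequentially."""
    def __init__(self, channels):
        super().__init__()
        self.dap = DynamicAreaPerceptor(channels)
        self.agc = AdaptiveGlobalContext(channels)
        self.cap = ChannelAttentionPerceptor(channels)

    def forward(self, x):
        return self.cap(self.agc(self.dap(x)))


x = torch.randn(2, 64, 32, 32)
print(DGeCSketch(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

This sketch also illustrates the parameter saving the abstract claims: a dense k×k convolution over C channels costs roughly C²k² weights, whereas a depthwise spatial step plus a 1×1 channel mix costs about Ck² + C², which is substantially smaller for typical channel counts.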