Enhancing geometric modeling in convolutional neural networks: limit deformable convolution

IF 5 | Zone 2: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems | Pub Date: 2025-03-04 | DOI: 10.1007/s40747-025-01799-8

Wei Wang, Yuanze Meng, Han Li, Guiyong Chang, Shun Li, Chenghong Zhang


Citations: 0

Abstract



Enhancing geometric modeling in convolutional neural networks: limit deformable convolution

Convolutional neural networks (CNNs) are limited in their capacity to model geometric transformations because their sampling structure is fixed. To overcome this problem, researchers introduced deformable convolution, which lets the convolution kernel deform over the feature map. However, deformable convolution may pull in irrelevant contextual information during learning and thereby degrade model performance. DCNv2 introduces a modulation mechanism that limits the spread of the sampling points by weighting how much each offset contributes, but we find that the problem persists in practical use. We therefore propose a new limit deformable convolution, which adds adaptive limiting units to constrain the offsets and adjusts the weight constraints on the offsets to improve the model's ability to focus on the relevant image regions. We then make the limit deformable convolution lightweight and design three kinds of LDBottleneck to suit different scenarios. On the VOC2012+2007 dataset, the limit deformable network equipped with the best-performing LDBottleneck improved mAP75 by 1.4% over DCNv1 and by 1.1% over DCNv2. Furthermore, on the COCO2017 dataset, different backbones equipped with our limit deformable module achieved satisfactory results. The source code for this work is publicly available at https://github.com/1977245719/LDCN.
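The idea of constraining offsets can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the form of the limiting unit, so the `limit * tanh(raw)` bound, the fixed `limit` parameter (the paper describes the unit as adaptive), and all function names below are assumptions made only for illustration. The sketch computes one output value of a 3x3 modulated deformable convolution on a single-channel feature map, with a DCNv2-style sigmoid weight per sampling point.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly sample a 2D list at fractional (y, x); outside the map counts as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * feat[yy][xx]
    return val

def limit_offset(raw, limit):
    """Hypothetical limiting unit: squash a raw offset so its magnitude never exceeds `limit`."""
    return limit * math.tanh(raw)

def limited_deform_conv_at(feat, kernel, cy, cx, raw_offsets, raw_masks, limit):
    """One output value of a 3x3 modulated deformable convolution centred at (cy, cx).

    raw_offsets: nine (dy, dx) pairs, one per kernel tap (unbounded predictions).
    raw_masks:   nine raw scores, turned into weights in (0, 1) by a sigmoid.
    limit:       assumed bound on how far a sampling point may move, in pixels.
    """
    out = 0.0
    k = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = raw_offsets[k]
            # Regular grid position plus the bounded offset.
            sy = cy + dy + limit_offset(oy, limit)
            sx = cx + dx + limit_offset(ox, limit)
            mask = 1.0 / (1.0 + math.exp(-raw_masks[k]))
            out += kernel[dy + 1][dx + 1] * mask * bilinear(feat, sy, sx)
            k += 1
    return out

if __name__ == "__main__":
    feat = [[float(5 * i + j) for j in range(5)] for i in range(5)]
    kernel = [[1.0] * 3 for _ in range(3)]
    offsets = [(3.0, -3.0)] * 9   # large raw offsets, bounded by the limiting unit
    masks = [0.0] * 9             # sigmoid(0) = 0.5 for every tap
    print(limited_deform_conv_at(feat, kernel, 2, 2, offsets, masks, limit=0.5))
```

With `limit=0.5`, even a very large raw offset moves a sampling point by at most half a pixel, so every tap stays close to its regular grid position; raising `limit` recovers behaviour closer to an unconstrained deformable convolution.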


Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

About the journal: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.