Zhiming Wang , Sheng Xu , Li’an Zhuo , Baochang Zhang , Yanjing Li , Zhenqian Wang , Guodong Guo
Image and Vision Computing, Volume 160, Article 105568. Published 2025-05-08. DOI: 10.1016/j.imavis.2025.105568
Calibrated gradient descent of convolutional neural networks for embodied visual recognition
Embodied visual computing seeks to learn from the real world, which requires efficient machine learning methods. In conventional stochastic gradient descent (SGD) and its variants, the gradient estimators are expensive to compute in many scenarios. This paper introduces a calibrated gradient descent (CGD) algorithm for efficient deep neural network optimization. A theorem is developed to prove that an unbiased estimator of the network parameters can be obtained in a probabilistic way under the Lipschitz hypothesis. We implement our CGD algorithm on top of the widely used SGD and ADAM optimizers, and build a generic gradient calibration layer (GClayer) that can be used to improve the performance of convolutional neural networks (C-CNNs). The GClayer introduces extra parameters only during training and does not affect the efficiency of inference. Our method is generic and effective for optimizing both CNNs and quantized neural networks (C-QNNs). Extensive experimental results demonstrate that our method achieves state-of-the-art performance on a variety of tasks; for example, our 1-bit Faster-RCNN obtained with C-QNN reaches 20.5% mAP on COCO, a new state of the art. This work brings new insights for developing more efficient optimizers and for analyzing the back-propagation algorithm.
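The abstract does not spell out how the gradient calibration layer works, so the following is only an illustrative sketch of the general idea it describes: a training-only, per-parameter calibration factor rescales the gradient in an SGD-style update, while inference uses the plain weights and incurs no extra cost. The function name `calibrated_sgd_step`, the toy quadratic loss, and the calibration values are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a calibrated SGD update. The calibration
# factors exist only during training; at inference time the model
# uses the weights directly, so no overhead is added there.

def calibrated_sgd_step(weights, grads, calib, lr=0.1):
    """One SGD update w <- w - lr * c * g, element-wise,
    where c is a per-parameter calibration factor."""
    return [w - lr * c * g for w, c, g in zip(weights, calib, grads)]

# Toy loss L(w) = 0.5 * sum(w_i^2), whose gradient is simply grad_i = w_i.
w = [1.0, -2.0]
calib = [1.0, 0.5]  # hypothetical calibration factors
for _ in range(200):
    w = calibrated_sgd_step(w, w, calib)

print(max(abs(x) for x in w))  # converges toward the minimum at w = 0
```

Under this reading, the calibration factors play a role similar to a learned per-parameter step size: they shape the optimization trajectory during training but leave the deployed network unchanged.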
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.