Zhiming Wang , Sheng Xu , Li’an Zhuo , Baochang Zhang , Yanjing Li , Zhenqian Wang , Guodong Guo
Image and Vision Computing, Volume 160, Article 105568. Published 2025-05-08. DOI: 10.1016/j.imavis.2025.105568
Calibrated gradient descent of convolutional neural networks for embodied visual recognition
Embodied visual computing seeks to learn from the real world, which requires efficient machine learning methods. In conventional stochastic gradient descent (SGD) and its variants, the gradient estimators are expensive to compute in many scenarios. This paper introduces a calibrated gradient descent (CGD) algorithm for efficient deep neural network optimization. A theorem is developed to prove that an unbiased estimator of the network parameters can be obtained in a probabilistic way under the Lipschitz hypothesis. We implement our CGD algorithm on top of the widely used SGD and ADAM optimizers, and build a generic gradient calibration layer (GClayer) that can be used to improve the performance of convolutional neural networks (C-CNNs). The GClayer introduces extra parameters only during training and does not affect the efficiency of inference. Our method is generic and effective for optimizing both CNNs and quantized neural networks (C-QNNs). Extensive experimental results demonstrate that our method achieves state-of-the-art performance on a variety of tasks; for example, our 1-bit Faster-RCNN obtained with C-QNN reaches 20.5% mAP on COCO, a new state of the art. This work brings new insights for developing more efficient optimizers and for analyzing the back-propagation algorithm.
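The abstract does not spell out how the gradient calibration layer works, so the following is only an illustrative sketch of the general idea it describes: a training-only, per-parameter calibration factor rescales the gradient in an SGD-style update, while inference uses the plain weights and incurs no extra cost. The function name `calibrated_sgd_step`, the toy quadratic loss, and the calibration values are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a calibrated SGD update. The calibration
# factors exist only during training; at inference time the model
# uses the weights directly, so no overhead is added there.

def calibrated_sgd_step(weights, grads, calib, lr=0.1):
    """One SGD update w <- w - lr * c * g, element-wise,
    where c is a per-parameter calibration factor."""
    return [w - lr * c * g for w, c, g in zip(weights, calib, grads)]

# Toy loss L(w) = 0.5 * sum(w_i^2), whose gradient is simply grad_i = w_i.
w = [1.0, -2.0]
calib = [1.0, 0.5]  # hypothetical calibration factors
for _ in range(200):
    w = calibrated_sgd_step(w, w, calib)

print(max(abs(x) for x in w))  # converges toward the minimum at w = 0
```

Under this reading, the calibration factors play a role similar to a learned per-parameter step size: they shape the optimization trajectory during training but leave the deployed network unchanged.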
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.