Convolutional Residual Network for Grasp Localization

2017 14th Conference on Computer and Robot Vision (CRV). Pub Date: 2017-05-01. DOI: 10.1109/CRV.2017.14

Ludovic Trottier, P. Giguère, B. Chaib-draa


Citations: 4

Abstract



Object grasping is an important ability for carrying out complex manipulation tasks with autonomous robotic systems. The grasp localization module plays an essential role in the success of the grasp maneuver. Grasp localization is generally viewed as a visual perception problem whose goal is to determine regions of high graspability by interpreting light and depth information. Over the past few years, several works in Deep Learning (DL) have shown the high potential of Convolutional Neural Networks (CNNs) for solving vision-related problems. Advances in residual networks have further facilitated neural network training: identity skip connections and residual mappings improve convergence time and generalization performance. In this paper, we investigate the use of residual networks for grasp localization. A standard residual CNN for object recognition uses a global average pooling layer prior to the fully-connected layers. Our experiments have shown that this pooling layer removes the spatial correlation in the back-propagated error signal, which prevents the network from correctly localizing good grasp regions. We propose an architecture modification that removes this limitation. Our experiments on the Cornell task have shown that our network obtained state-of-the-art performance, with rectangle metric errors of 10.85% and 11.86% on the image-wise and object-wise splits respectively. We did not use pre-training but rather opted for on-line data augmentation to manage overfitting. In comparison to a previous approach that employed off-line data augmentation, our network used 15 times fewer observations, which significantly reduced training time.
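The claim about global average pooling can be illustrated with a small, self-contained sketch. It is not the authors' network or their proposed modification. It only shows the general property that the gradient flowing back through a global average pooling layer is identical at every spatial position of a channel, whereas a fully-connected layer applied directly to the feature map (an alternative assumed here purely for contrast) sends back a position-dependent gradient. All function names, shapes, and values below are hypothetical.

```python
import random


def gap_backward(grad_pooled, height, width):
    """Gradient of global average pooling w.r.t. its input feature map.

    grad_pooled[c] is dLoss/d(pooled[c]). Because pooled[c] is the mean of
    height * width activations, every position receives the same share.
    """
    scale = 1.0 / (height * width)
    return [[[g * scale for _ in range(width)] for _ in range(height)]
            for g in grad_pooled]


def flatten_fc_backward(grad_out, weights):
    """Gradient of a single-output fully-connected layer on the raw feature map.

    weights[c][i][j] multiplies x[c][i][j], so the gradient at each position
    is grad_out times that position's own weight.
    """
    return [[[grad_out * w for w in row] for row in channel]
            for channel in weights]


def spatial_spread(grad_channel):
    """Difference between the largest and smallest gradient in one channel."""
    values = [v for row in grad_channel for v in row]
    return max(values) - min(values)


random.seed(0)
C, H, W = 2, 3, 3  # hypothetical feature-map shape

# Hypothetical upstream gradients arriving at the pooled vector.
grad_pooled = [0.5, -1.2]
gap_grad = gap_backward(grad_pooled, H, W)

# Hypothetical fully-connected weights, one per feature-map position.
fc_weights = [[[random.uniform(-1.0, 1.0) for _ in range(W)]
               for _ in range(H)] for _ in range(C)]
fc_grad = flatten_fc_backward(1.0, fc_weights)

gap_spreads = [spatial_spread(ch) for ch in gap_grad]
fc_spreads = [spatial_spread(ch) for ch in fc_grad]

print("GAP gradient spread per channel:", gap_spreads)
print("FC gradient spread per channel: ", fc_spreads)
```

With pooling, the spread within each channel is zero, so the error signal carries no information about where in the image a good grasp lies. Without pooling, each position receives its own gradient, which is the kind of spatial signal a localization task depends on.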


Source journal

2017 14th Conference on Computer and Robot Vision (CRV)

Self-citation rate

0.00%

Publication count