Convolutional Neural Network for 3D object recognition using volumetric representation

2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE) Pub Date : 2016-07-06 DOI:10.1109/SPLIM.2016.7528403

Xiaofang Xu, A. Dehghani, D. Corrigan, Sam Caulfield, D. Moloney


Citations: 9

Abstract


Following the success of Convolutional Neural Networks (CNNs) in object recognition on 2D images, this paper extends them to process 3D data. Most current systems require a huge amount of computation to handle large volumes of data. We present an efficient 3D volumetric object representation, Volumetric Accelerator (VOLA), which requires much less memory than conventional volumetric representations. On this basis, we introduce several 3D digit datasets generated from 2D MNIST and 2D digit fonts with different rotations about the x, y, and z axes. Finally, we introduce a combination of multiple CNN models based on the well-known LeNet model. CNN models trained on the generated datasets achieve average accuracies of 90.30% and 81.85% on the 3D-MNIST and 3D-Fonts datasets, respectively. Experimental results show that VOLA-based CNNs run 1.5x faster than the original LeNet.
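The abstract does not describe VOLA's internal layout or how the 3D digits were built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a 2D digit image is extruded into a binary occupancy grid, rotated by right angles about a chosen axis, and stored at one bit per voxel. The function names (`extrude_to_voxels`, `pack_voxels`, `unpack_voxels`), the extrusion depth, and the threshold are all hypothetical.

```python
import numpy as np


def extrude_to_voxels(image, depth=4, threshold=0.5):
    """Extrude a square 2D image into a cubic boolean occupancy grid.

    Assumption: the digit is thickened along the third axis by `depth`
    voxels, centred in the cube. The paper's actual procedure may differ.
    """
    mask = np.asarray(image) > threshold
    side = mask.shape[0]
    grid = np.zeros((side, side, side), dtype=bool)
    start = (side - depth) // 2
    grid[:, :, start:start + depth] = mask[:, :, None]
    return grid


def pack_voxels(grid):
    """Store the grid at one bit per voxel (8 voxels per byte)."""
    return np.packbits(grid.ravel())


def unpack_voxels(packed, shape):
    """Recover the boolean grid from its bit-packed form."""
    count = int(np.prod(shape))
    return np.unpackbits(packed, count=count).astype(bool).reshape(shape)


# A stand-in for a 28x28 MNIST digit: a simple vertical bar.
image = np.zeros((28, 28), dtype=np.float32)
image[6:22, 12:16] = 1.0

grid = extrude_to_voxels(image)
# Right-angle rotation in the plane of axes 0 and 2 (arbitrary angles
# would need interpolation, which this sketch leaves out).
rotated = np.rot90(grid, k=1, axes=(0, 2))

packed = pack_voxels(rotated)
dense_bytes = rotated.size * np.dtype(np.float32).itemsize
print(f"dense float32: {dense_bytes} bytes, bit-packed: {packed.nbytes} bytes")
```

Bit packing alone gives a 32x reduction relative to a dense float32 grid; whether VOLA achieves its savings this way is not stated in the abstract.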
