{"title":"用于3D体素压缩的变分自编码器","authors":"Juncheng Liu, S. Mills, B. McCane","doi":"10.1109/IVCNZ51579.2020.9290656","DOIUrl":null,"url":null,"abstract":"3D scene sensing and understanding is a fundamental task in the field of computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels always requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid –in our application with collaborating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method by two applications: scene reconstruction and path planing.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Variational Autoencoder for 3D Voxel Compression\",\"authors\":\"Juncheng Liu, S. Mills, B. McCane\",\"doi\":\"10.1109/IVCNZ51579.2020.9290656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"3D scene sensing and understanding is a fundamental task in the field of computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels always requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid –in our application with collaborating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method by two applications: scene reconstruction and path planing.\",\"PeriodicalId\":164317,\"journal\":{\"name\":\"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IVCNZ51579.2020.9290656\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IVCNZ51579.2020.9290656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
3D scene sensing and understanding is a fundamental task in the field of computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels always requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid –in our application with collaborating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method by two applications: scene reconstruction and path planing.