Image Classification Performance Evaluation for 3D Model Reconstruction
A. Yuniarti, N. Suciati, A. Arifin
2020 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET)
Publication date: 2020-11-18
DOI: https://doi.org/10.1109/ICRAMET51080.2020.9298643
Citation count: 0
Abstract
3D reconstruction from 2D images is a classical problem in computer vision. Conventional methods rely on registration of multiple images, estimation of intrinsic and extrinsic camera parameters, and optimization. Recently, the availability of publicly shared 3D datasets has encouraged deep-learning-based methods for single-view reconstruction. One approach encodes the image directly and decodes it into 3D points, as in PointNet and AtlasNet. Other research directions attempt to retrieve 3D data using deep-learning-based methods built on a convolutional neural network (CNN). However, the use of CNNs for image classification specifically within the 3D reconstruction task still needs to be investigated, because a CNN is usually used as an image encoder in an auto-encoder setting rather than as a classification module in a point generation network. Moreover, there are few reports on the performance of deep-learning-based methods on images rendered from 3D data, such as the ShapeNet rendering images. In this paper, we implemented several deep-learning models to classify the ShapeNet rendering images, which contain 13 model categories, and examined the impact of various hyper-parameters on each 3D model category. Our experiments showed that a learning rate of either 0.001 or 0.0001 combined with 60-80 training epochs significantly outperformed other settings. Moreover, we observed that, regardless of network configuration, some categories (plane, watercraft, car) performed better throughout the study. Therefore, a 3D reconstruction method based on image classification can be designed around the best-performing categories.
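The per-category comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `per_category_accuracy` and `rank_categories` are hypothetical, and the label lists are made-up placeholders rather than results from the paper. It only shows one plausible way to compute classification accuracy per category and rank categories from best to worst.

```python
from collections import defaultdict


def per_category_accuracy(true_labels, predicted_labels):
    """Return {category: fraction of that category's samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {category: correct[category] / total[category] for category in total}


def rank_categories(accuracy_by_category):
    """Return category names sorted from highest to lowest accuracy."""
    return sorted(accuracy_by_category, key=accuracy_by_category.get, reverse=True)


# Placeholder labels for illustration only (assumption, not data from the paper).
true_labels = ["plane", "plane", "car", "car", "chair", "chair"]
predicted_labels = ["plane", "plane", "car", "chair", "chair", "table"]

accuracy = per_category_accuracy(true_labels, predicted_labels)
ranking = rank_categories(accuracy)
```

Repeating this computation for each hyper-parameter setting (for example, each learning rate and epoch count) would show whether the same categories stay at the top of the ranking across configurations.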