A High-Performance Learning-Based Framework for Monocular 3-D Point Cloud Reconstruction

IF 3.4 | JCR Q2 | Engineering, Electrical & Electronic

IEEE Journal of Radio Frequency Identification | Pub Date: 2024-07-29 | DOI: 10.1109/JRFID.2024.3435875

AmirHossein Zamani;Kamran Ghaffari;Amir G. Aghdam

IEEE Journal of Radio Frequency Identification, vol. 8, pp. 695-712. https://ieeexplore.ieee.org/document/10614399/

Citations: 0

Abstract

An essential yet challenging step in 3D reconstruction is training a machine or robot to model 3D objects. Many 3D reconstruction applications depend on real-time data processing, so computational efficiency is a fundamental requirement in such systems. Despite considerable progress in 3D reconstruction techniques in recent years, developing algorithms efficient enough for real-time implementation remains an open problem. The present study addresses the reconstruction of objects shown in a single-view image with both high accuracy and high computational efficiency. To this end, we propose two neural frameworks: a CNN-based autoencoder architecture called Fast-Image2Point (FI2P) and a transformer-based network called TransCNN3D. Both frameworks consist of two stages: perception and construction. The perception stage extracts the underlying context and features of the image. The construction stage then recovers the 3D geometry of the object from the context and features extracted in the perception stage. FI2P is a simple yet powerful architecture that reconstructs 3D objects from images in real time without losing accuracy. TransCNN3D provides a more accurate 3D reconstruction without losing computational efficiency. The output of both frameworks is a point cloud. The ShapeNet dataset is used to compare the proposed methods with existing ones in terms of computation time and accuracy. Simulations demonstrate the superior performance of the proposed strategy. Our dataset and code are available on the IEEE DataPort website and the first author's GitHub repository, respectively.
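The abstract says reconstructions are output as point clouds and compared for accuracy, but it does not name the accuracy metric. As an illustration only, the sketch below computes the symmetric Chamfer distance, a measure commonly used to compare a predicted point cloud with a reference one. The choice of metric, the function names, and the toy data are assumptions for this example and are not taken from the paper.

```python
from typing import Sequence, Tuple

# A 3-D point as (x, y, z). This type alias is illustrative, not from the paper.
Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Squared Euclidean distance from p to its nearest neighbour in cloud."""
    return min(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for q in cloud
    )


def chamfer_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point clouds.

    Assumed definition: the mean squared nearest-neighbour distance from
    pred to target plus the same quantity from target to pred. Other
    variants (unsquared distances, sums instead of means) also exist.
    """
    if not pred or not target:
        raise ValueError("both point clouds must be non-empty")
    pred_to_target = sum(_nearest_sq_dist(p, target) for p in pred) / len(pred)
    target_to_pred = sum(_nearest_sq_dist(q, pred) for q in target) / len(target)
    return pred_to_target + target_to_pred


# Toy data: a unit square in the z = 0 plane and a copy shifted by 0.5 along z.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
shifted = [(x, y, z + 0.5) for x, y, z in square]

print(chamfer_distance(square, square))   # 0.0
print(chamfer_distance(square, shifted))  # 0.5
```

This brute-force version is quadratic in the number of points, which is fine for a toy example. Evaluations on large point clouds typically use a spatial index or batched GPU computation instead.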




Source journal: IEEE Journal of Radio Frequency Identification

CiteScore: 5.70

Self-citation rate: 0.00%

Number of published articles