Authors: Abubakar Sulaiman Gezawa, Chibiao Liu, Naveed Ur Rehman Junejo, Haruna Chiroma
Journal: Archives of Computational Methods in Engineering, vol. 31, no. 7, pp. 4129-4147
Published: 2024-04-23
Publication type: Journal Article
DOI: 10.1007/s11831-024-10108-4
URL: https://link.springer.com/article/10.1007/s11831-024-10108-4
Impact factor: 9.7 (JCR Q1, Computer Science, Interdisciplinary Applications)
Citations: 0
The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review
Abstract
The outstanding effectiveness of transformers in visual tasks has led to their rapid growth and adoption in three-dimensional (3D) vision tasks. Vision transformers have shown numerous advantages over earlier convolutional neural network (CNN) architectures, including broad modelling abilities, more substantial modelling capabilities, convolution complementarity, scalability to model data size, and better connection for enhancing the performance records of many visual tasks. We present a thorough review that classifies and summarizes popular transformer-based approaches by key features of transformer integration, such as the input data, the scalability element that enables transformer processing, the architectural design, and the context level at which the transformer operates, and that highlights the primary contributions of each approach. Furthermore, we compare the results of these techniques with commonly employed non-transformer techniques in 3D object classification, segmentation, and object detection using standard 3D datasets including ModelNet, SUN RGB-D, ScanNet, nuScenes, Waymo, ShapeNet, S3DIS, and KITTI. This study also discusses numerous potential future directions and limitations of 3D vision transformers.
About the journal:
Archives of Computational Methods in Engineering
Aim and Scope:
Archives of Computational Methods in Engineering serves as an active forum for disseminating research and advanced practices in computational engineering, particularly focusing on mechanics and related fields. The journal emphasizes extended state-of-the-art reviews in selected areas, a unique feature of its publication.
Review Format:
Reviews published in the journal offer:
A survey of current literature
Critical exposition of topics in their full complexity