PVSTrans: Patch-view-shape progressive interaction transformer for 3D shape recognition
Authors: Xiangyu Ma, Jing Bai, Zenghui Su, Yubin Wang
Journal: Information Processing & Management, Volume 63, Issue 1, Article 104279 (JCR Q1, Computer Science, Information Systems; impact factor 7.4)
Publication date: 2025-07-19
DOI: 10.1016/j.ipm.2025.104279
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002201
Citation count: 0
Abstract
3D shape recognition has made substantial progress, driven by its wide-ranging applications and increasing research interest. Existing studies typically build 3D shape descriptors by aggregating independently extracted view features. However, this stepwise approach does not fully exploit the intrinsic correlations between local regions of varying granularity and the global shape. To address this gap, we propose the Patch-View-Shape Progressive Interaction Transformer (PVSTrans), which enhances shape-patch interactions through progressive view-patch and shape-view interactions, capturing essential dependencies among intra-view features, inter-view features, and global 3D shape features. Furthermore, using the byproducts of the progressive interaction process, specifically the attention weights of views and intra-view patches, we introduce a Shape-Guided Patch Selection strategy that dynamically identifies significant patches in each view. These patches, in conjunction with the multi-view features, form a more informative 3D shape descriptor for final classification. Experimental results on ModelNet40, ScanObjectNN, FG3D, and ShapeNet Core55 demonstrate the effectiveness and generalizability of PVSTrans in 3D shape recognition tasks. Additionally, comprehensive experiments that vary the number of views and their spatial relations highlight the robustness of PVSTrans to incomplete views and irregular spatial configurations, indicating its potential for application in complex real-world scenarios. The code is available at https://github.com/Oli-lab-nun/PVSTrans.
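The abstract says that patches are selected using the attention weights of views and of intra-view patches, but it does not give the selection rule. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `select_patches`, the joint score (view weight times patch weight), and the global `budget` parameter are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def select_patches(
    view_attn: List[float],
    patch_attn: List[List[float]],
    budget: int,
) -> List[List[int]]:
    """Pick `budget` patches across all views by a joint attention score.

    view_attn[v]     -- assumed attention weight of view v w.r.t. the shape
    patch_attn[v][p] -- assumed attention weight of patch p within view v
    The product score is an illustrative assumption, not the paper's rule.
    """
    scored: List[Tuple[float, int, int]] = []
    for v, (v_weight, patches) in enumerate(zip(view_attn, patch_attn)):
        for p, p_weight in enumerate(patches):
            scored.append((v_weight * p_weight, v, p))

    # Rank every (view, patch) pair by its joint score, highest first.
    scored.sort(key=lambda item: item[0], reverse=True)

    # Keep the top `budget` pairs and group the patch indices by view.
    selected: List[List[int]] = [[] for _ in view_attn]
    for _, v, p in scored[:budget]:
        selected[v].append(p)
    return [sorted(indices) for indices in selected]


if __name__ == "__main__":
    views = [0.7, 0.3]                              # two views
    patches = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]    # three patches each
    print(select_patches(views, patches, budget=3))  # [[0, 1], [0]]
```

Under this assumed scoring, a view with a higher attention weight contributes more patches to the final descriptor, which is one way the selection could vary per view.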
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.