3DCOMPAT⁺⁺: An Improved Large-scale 3D Vision Dataset for Compositional Recognition.

IF 18.6

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-08-11 DOI:10.1109/TPAMI.2025.3597476

Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny



Abstract

In this work, we present 3DCOMPAT⁺⁺, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCOMPAT⁺⁺ covers 42 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at the CVPR conference, showcasing the winning method's use of a modified PointNet⁺⁺ model trained on 6D inputs, and exploring alternative techniques for improving GCR. We hope our work will help ease future research on compositional 3D vision. The dataset and code have been made publicly available at https://3dcompat-dataset.org/v2/.

Keywords: 3D vision, dataset, 3D modeling, multimodal learning, compositional learning.
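
The view sampling mentioned in the abstract (four equally spaced views plus four randomized views per shape) can be pictured with a minimal sketch. This is an illustration only, not the dataset's rendering code: the function name `sample_view_azimuths`, the choice to describe views by azimuth angle in degrees, and the uniform sampling of the randomized views are all assumptions that do not come from the abstract.

```python
import random


def sample_view_azimuths(n_canonical=4, n_random=4, seed=0):
    """Return camera azimuths in degrees for one shape.

    Hypothetical helper: the abstract only says "four equally spaced
    views as well as four randomized views"; describing views by azimuth
    and drawing the random ones uniformly are assumptions.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Equally spaced views: split the full circle into n_canonical steps.
    canonical = [i * 360.0 / n_canonical for i in range(n_canonical)]
    # Randomized views: uniform draws over the circle (assumed).
    randomized = [rng.uniform(0.0, 360.0) for _ in range(n_random)]
    return canonical + randomized


if __name__ == "__main__":
    views = sample_view_azimuths()
    print(len(views), "views; canonical azimuths:", views[:4])
```

With the defaults, each shape gets eight azimuths: the first four are fixed and evenly spaced, and the last four depend on the seed.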

