{"title":"3dcompat++:一种改进的大规模3D视觉数据集,用于成分识别。","authors":"Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny","doi":"10.1109/TPAMI.2025.3597476","DOIUrl":null,"url":null,"abstract":"<p><p>In this work, we present 3DCOMPAT<sup>++</sup>, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the partinstance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCOMPAT ++ covers 42 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at the CVPR conference, showcasing the winning method's utilization of a modified PointNet<sup>++</sup> model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision. The dataset and code have been made publicly available at https://3dcompat-dataset.org/v2/. 3D vision, dataset, 3D modeling, multimodal learning, compositional learning.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"3DCOMPAT<sup>++</sup>: An Improved Large-scale 3D Vision Dataset for Compositional Recognition.\",\"authors\":\"Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny\",\"doi\":\"10.1109/TPAMI.2025.3597476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In this work, we present 3DCOMPAT<sup>++</sup>, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the partinstance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCOMPAT ++ covers 42 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at the CVPR conference, showcasing the winning method's utilization of a modified PointNet<sup>++</sup> model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision. The dataset and code have been made publicly available at https://3dcompat-dataset.org/v2/. 3D vision, dataset, 3D modeling, multimodal learning, compositional learning.</p>\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TPAMI.2025.3597476\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2025.3597476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
3DCOMPAT++: An Improved Large-scale 3D Vision Dataset for Compositional Recognition.
In this work, we present 3DCOMPAT++, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the partinstance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCOMPAT ++ covers 42 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at the CVPR conference, showcasing the winning method's utilization of a modified PointNet++ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision. The dataset and code have been made publicly available at https://3dcompat-dataset.org/v2/. 3D vision, dataset, 3D modeling, multimodal learning, compositional learning.