MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph Masked Autoencoders

Zhangsihao Yang, Kaize Ding, Huan Liu, Yalin Wang
{"title":"MGM-AE:使用网格图掩码自编码器的3D形状自监督学习。","authors":"Zhangsihao Yang, Kaize Ding, Huan Liu, Yalin Wang","doi":"10.1109/wacv57701.2024.00327","DOIUrl":null,"url":null,"abstract":"<p><p>The challenges of applying self-supervised learning to 3D mesh data include difficulties in explicitly modeling and leveraging geometric topology information and designing appropriate pretext tasks and augmentation methods for irregular mesh topology. In this paper, we propose a novel approach for pre-training models on large-scale, unlabeled datasets using graph masking on a mesh graph composed of faces. Our method, Mesh Graph Masked Autoencoders (MGM-AE), utilizes masked autoencoding to pre-train the model and extract important features from the data. Our pre-trained model outperforms prior state-of-the-art mesh encoders in shape classification and segmentation benchmarks, achieving 90.8% accuracy on ModelNet40 and 78.5 mIoU on ShapeNet. The best performance is obtained when the model is trained and evaluated under different masking ratios. Our approach demonstrates effectiveness in pretraining models on large-scale, unlabeled datasets and its potential for improving performance on downstream tasks.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2024 ","pages":"3291-3301"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435090/pdf/","citationCount":"0","resultStr":"{\"title\":\"MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph Masked Autoencoders.\",\"authors\":\"Zhangsihao Yang, Kaize Ding, Huan Liu, Yalin Wang\",\"doi\":\"10.1109/wacv57701.2024.00327\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The challenges of applying self-supervised learning to 3D mesh data include difficulties in explicitly modeling and leveraging geometric topology information and designing appropriate pretext tasks and augmentation methods for irregular mesh topology. In this paper, we propose a novel approach for pre-training models on large-scale, unlabeled datasets using graph masking on a mesh graph composed of faces. Our method, Mesh Graph Masked Autoencoders (MGM-AE), utilizes masked autoencoding to pre-train the model and extract important features from the data. Our pre-trained model outperforms prior state-of-the-art mesh encoders in shape classification and segmentation benchmarks, achieving 90.8% accuracy on ModelNet40 and 78.5 mIoU on ShapeNet. The best performance is obtained when the model is trained and evaluated under different masking ratios. Our approach demonstrates effectiveness in pretraining models on large-scale, unlabeled datasets and its potential for improving performance on downstream tasks.</p>\",\"PeriodicalId\":73325,\"journal\":{\"name\":\"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision\",\"volume\":\"2024 \",\"pages\":\"3291-3301\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435090/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Winter Conference on Applications of Computer Vision. 
IEEE Winter Conference on Applications of Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/wacv57701.2024.00327\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/4/9 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/wacv57701.2024.00327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/4/9 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The challenges of applying self-supervised learning to 3D mesh data include difficulties in explicitly modeling and leveraging geometric topology information and designing appropriate pretext tasks and augmentation methods for irregular mesh topology. In this paper, we propose a novel approach for pre-training models on large-scale, unlabeled datasets using graph masking on a mesh graph composed of faces. Our method, Mesh Graph Masked Autoencoders (MGM-AE), utilizes masked autoencoding to pre-train the model and extract important features from the data. Our pre-trained model outperforms prior state-of-the-art mesh encoders in shape classification and segmentation benchmarks, achieving 90.8% accuracy on ModelNet40 and 78.5 mIoU on ShapeNet. The best performance is obtained when the model is trained and evaluated under different masking ratios. Our approach demonstrates effectiveness in pretraining models on large-scale, unlabeled datasets and its potential for improving performance on downstream tasks.
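To make the pre-training idea concrete, below is a minimal, self-contained sketch of masked autoencoding on a mesh face graph. It is illustrative only and not the authors' implementation: it assumes per-face features are simply face-center coordinates, uses a toy mean-aggregation message-passing encoder with an MLP decoder, and reconstructs only the masked faces. The helper names (face_adjacency, MeshGraphMAE), the mask-token mechanism, and all hyperparameters here are hypothetical choices, not taken from the paper.

```python
import torch
import torch.nn as nn


def face_adjacency(faces):
    """Edge list connecting triangle faces that share a mesh edge."""
    edge_to_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(tuple(sorted(e)), []).append(fi)
    pairs = [fs for fs in edge_to_faces.values() if len(fs) == 2]
    return torch.tensor(pairs, dtype=torch.long)  # shape (E, 2)


class MeshGraphMAE(nn.Module):
    """Toy masked autoencoder on a face graph: mean-aggregation encoder + MLP decoder."""

    def __init__(self, in_dim=3, hidden=64):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(in_dim))  # learned placeholder for masked faces
        self.enc1 = nn.Linear(in_dim, hidden)
        self.enc2 = nn.Linear(hidden, hidden)
        self.dec = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, in_dim))

    @staticmethod
    def propagate(x, edges):
        """One round of symmetric neighbor averaging over the face graph."""
        agg = torch.zeros_like(x)
        deg = torch.zeros(x.size(0), 1)
        src, dst = edges[:, 0], edges[:, 1]
        agg.index_add_(0, dst, x[src])
        agg.index_add_(0, src, x[dst])
        deg.index_add_(0, dst, torch.ones(len(src), 1))
        deg.index_add_(0, src, torch.ones(len(dst), 1))
        return agg / deg.clamp(min=1.0)

    def forward(self, feats, edges, mask_ratio=0.5):
        masked = torch.rand(feats.size(0)) < mask_ratio
        masked[0] = True  # guarantee at least one masked face in this toy setting
        x = feats.clone()
        x[masked] = self.mask_token  # hide the masked faces from the encoder
        h = torch.relu(self.enc1(self.propagate(x, edges)))
        h = torch.relu(self.enc2(self.propagate(h, edges)))
        recon = self.dec(h)
        # Reconstruction loss is computed only on the masked faces.
        return ((recon[masked] - feats[masked]) ** 2).mean()


# Usage on a toy tetrahedron: four faces, each adjacent to the other three.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
verts = torch.tensor([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
centers = torch.stack([verts[list(f)].mean(dim=0) for f in faces])  # per-face features
model = MeshGraphMAE()
loss = model(centers, face_adjacency(faces), mask_ratio=0.5)
loss.backward()
print(float(loss))
```

As in masked autoencoding generally, the loss is taken only over the masked faces, so the encoder must infer the hidden geometry from the visible neighborhood structure of the face graph; the specific encoder, decoder, features, and masking ratio used by MGM-AE are described in the paper itself.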
