Denoising Self-Distillation Masked Autoencoder for Self-Supervised Learning

International Journal of Image, Graphics and Signal Processing. Pub Date: 2023-10-08. DOI: 10.5815/ijigsp.2023.05.03

Jiashu Xu, Sergii Stirenko


Citations: 0

Abstract


Self-supervised learning has emerged as an effective paradigm for learning universal feature representations from vast amounts of unlabeled data. Its remarkable success in recent years has been demonstrated in both natural language processing and computer vision. As a cornerstone of the development of large-scale models, self-supervised learning has propelled the advancement of machine intelligence to new heights. In this paper, we draw inspiration from Siamese networks and Masked Autoencoders (MAE) to propose a denoising self-distilling Masked Autoencoder model for self-supervised learning. The model is composed of a Masked Autoencoder and a teacher network, which work together to restore input image blocks corrupted by random Gaussian noise. Our objective function combines a pixel-level loss with a high-level feature loss, allowing the model to extract complex semantic features. We evaluated the proposed method on three benchmark datasets, namely CIFAR-10, CIFAR-100, and STL-10, and compared it with classical self-supervised learning techniques. The experimental results show that our pre-trained model achieves slightly better fine-tuning performance on the STL-10 dataset, surpassing MAE by 0.1%. Overall, our method yields results comparable to those of other masked image modeling methods. The rationale behind the architecture is validated through ablation experiments. The proposed method can serve as a complementary technique within the existing family of self-supervised learning approaches for masked image modeling, with the potential to be applied to larger datasets.
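The objective described in the abstract, a pixel-level reconstruction loss plus a high-level feature loss computed against a teacher network, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the `feat_weight` balance term, restricting the pixel loss to masked patches (as in the original MAE), and updating the teacher as an exponential moving average of the student, which is a common choice in self-distillation but is not stated in the abstract.

```python
import random


def add_gaussian_noise(patches, sigma, rng):
    """Corrupt every pixel value with zero-mean Gaussian noise."""
    return [[v + rng.gauss(0.0, sigma) for v in patch] for patch in patches]


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(clean, recon, masked_idx, student_feat, teacher_feat,
                  feat_weight=1.0):
    """Pixel-level loss on masked patches plus a weighted feature-level loss.

    The reconstruction target is the clean (noise-free) patch, so the model
    has to denoise as well as inpaint. Assumption: hypothetical weighting.
    """
    pixel = sum(mse(recon[i], clean[i]) for i in masked_idx) / len(masked_idx)
    feature = mse(student_feat, teacher_feat)
    return pixel + feat_weight * feature


def ema_update(teacher, student, momentum=0.99):
    """Assumed teacher update: exponential moving average of student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]
```

In a training loop built on this sketch, one would corrupt the patches with `add_gaussian_noise`, hide a subset chosen by `random_mask`, run the student on the visible noisy patches, evaluate `combined_loss` against the clean patches and the teacher's features, and then call `ema_update` on the teacher parameters.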

