{"title":"Denoising Self-Distillation Masked Autoencoder for Self-Supervised Learning","authors":"Jiashu Xu, Sergii Stirenko","doi":"10.5815/ijigsp.2023.05.03","journal":{"name":"International Journal of Image, Graphics and Signal Processing","volume":"60 1","pages":"0"},"publicationDate":"2023-10-08","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.5815/ijigsp.2023.05.03"}
Denoising Self-Distillation Masked Autoencoder for Self-Supervised Learning
Self-supervised learning has emerged as an effective paradigm for learning universal feature representations from vast amounts of unlabeled data. Its remarkable success in recent years has been demonstrated in both natural language processing and computer vision. As a cornerstone of the development of large-scale models, self-supervised learning has propelled the advancement of machine intelligence to new heights. In this paper, we draw inspiration from Siamese networks and Masked Autoencoders (MAE) to propose a denoising self-distillation Masked Autoencoder for self-supervised learning. The model consists of a Masked Autoencoder and a teacher network, which work together to restore input image blocks corrupted by random Gaussian noise. Our objective function combines a pixel-level loss with a high-level feature loss, allowing the model to extract complex semantic features. We evaluated the proposed method on three benchmark datasets, CIFAR-10, CIFAR-100, and STL-10, and compared it with classical self-supervised learning techniques. The experimental results show that our pre-trained model achieves slightly better fine-tuning performance on STL-10, surpassing MAE by 0.1%. Overall, our method performs comparably to other masked image modeling methods. Ablation experiments validate the rationale behind the architecture design. The proposed method can serve as a complementary technique within the existing family of masked image modeling approaches to self-supervised learning, with the potential to be applied to larger datasets.
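The abstract describes the training signal only in words, so the following is a minimal sketch of how such an objective might be assembled, not the authors' implementation. It assumes images are already split into flat patches, and every name and value in it (`corrupt_patches`, `combined_loss`, `ema_update`, `noise_ratio`, `sigma`, `lam`, `momentum`) is hypothetical. In particular, updating the teacher as an exponential moving average of the student is a common self-distillation choice assumed here; the abstract does not state how the teacher is obtained.

```python
import random


def corrupt_patches(patches, noise_ratio=0.75, sigma=0.1, rng=None):
    """Add Gaussian noise to a random subset of patches.

    patches: list of patches, each a list of float pixel values.
    noise_ratio and sigma are illustrative values, not taken from the paper.
    Returns (corrupted_patches, sorted indices of the corrupted patches).
    """
    if rng is None:
        rng = random.Random()
    n_corrupt = int(len(patches) * noise_ratio)
    idx = set(rng.sample(range(len(patches)), n_corrupt))
    corrupted = [
        [p + rng.gauss(0.0, sigma) for p in patch] if i in idx else list(patch)
        for i, patch in enumerate(patches)
    ]
    return corrupted, sorted(idx)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(recon, clean, student_feat, teacher_feat, idx, lam=1.0):
    """Pixel-level loss on the corrupted patches plus a feature-level loss.

    lam is a hypothetical weight balancing the two terms.
    """
    # Pixel term: how well the corrupted patches were restored.
    pixel = sum(mse(recon[i], clean[i]) for i in idx) / len(idx) if idx else 0.0
    # Feature term: distance between student and teacher representations.
    feature = mse(student_feat, teacher_feat)
    return pixel + lam * feature


def ema_update(teacher_params, student_params, momentum=0.99):
    """Assumed teacher update: exponential moving average of student weights."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_params, student_params)
    ]
```

In a real training loop the reconstruction and the two feature vectors would come from the student autoencoder and the teacher network; here they are plain lists so that the structure of the loss is visible without a deep learning framework.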