ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

2021 3rd International Conference on Computer Communication and the Internet (ICCCI) Pub Date: 2021-06-25 DOI: 10.1109/ICCCI51764.2021.9486805

Shota Hirose, Naoki Wada, J. Katto, Heming Sun


Citations: 3

Abstract


Attention is now widely regarded as an effective mechanism for image recognition. Vision Transformer (ViT) applies a Transformer to images and achieves very high recognition performance with fewer parameters than Big Transfer (BiT) and Noisy Student. We therefore consider self-attention-based networks to be slimmer than convolution-based networks. We use a ViT as the discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and we call the result ViT-GAN. We also find that parameter sharing is very useful for making the ViT parameter-efficient. However, the performance of ViT depends heavily on the number of data samples, so we propose a new data augmentation method in which the augmentation strength varies adaptively. This helps the ViT converge faster and perform better. With our data augmentation, we show that a ViT-based discriminator achieves almost the same FID as the original discriminator while using 35% fewer parameters.
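The abstract names two ideas without giving details: parameter sharing to shrink the ViT, and a data augmentation strength that adapts during training. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The layer sizes (`d_model=256`, `d_ff=1024`, `depth=6`) are hypothetical, the parameter formula is that of a generic Transformer encoder layer with biases, and sharing is assumed to mean reusing one layer's weights at every depth. The `update_strength` rule is loosely modeled on the overfitting-signal feedback used in adaptive discriminator augmentation (ADA), because the abstract does not say how the strength is adjusted.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one generic Transformer encoder layer (with biases)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two linear layers
    layer_norms = 2 * (2 * d_model)  # scale and shift for two layer norms
    return attention + mlp + layer_norms


def encoder_params(d_model: int, d_ff: int, depth: int, shared: bool) -> int:
    """Total encoder parameters.

    Assumption: with sharing, a single layer's weights are reused at every depth,
    so the count no longer grows with depth.
    """
    per_layer = encoder_layer_params(d_model, d_ff)
    return per_layer if shared else per_layer * depth


def update_strength(p: float, overfit_signal: float,
                    target: float = 0.6, step: float = 0.01) -> float:
    """Assumed feedback rule, not taken from the paper.

    Raise the augmentation strength p when the overfitting signal is above the
    target, lower it otherwise, and clamp the result to [0, 1].
    """
    p += step if overfit_signal > target else -step
    return min(1.0, max(0.0, p))


# Hypothetical sizes, chosen only for illustration.
unshared = encoder_params(256, 1024, depth=6, shared=False)
shared = encoder_params(256, 1024, depth=6, shared=True)
print(unshared, shared)  # 4738560 789760
```

These numbers cover only the encoder stack of a toy configuration, so they are not expected to reproduce the 35% reduction reported for the full discriminator.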


Source journal

2021 3rd International Conference on Computer Communication and the Internet (ICCCI)

Self-citation rate

0.00%

Number of publications