The Application of Vision Transformer in Image Classification

Zhixuan He
Published in: Proceedings of the 6th International Conference on Virtual and Augmented Reality Simulations
Publication date: 2022-03-25
DOI: 10.1145/3546607.3546616 (https://doi.org/10.1145/3546607.3546616)

Abstract

This project compares the performance of a Vision Transformer with that of a Convolutional Neural Network (CNN). Google Colab serves as the experimental environment. Both models are built with Keras and TensorFlow in Python and trained separately on the CIFAR-100 image dataset, and their performance is compared through the training results. The experiment finds that, at the scale of 60,000 images, the CNN performs slightly better than the Vision Transformer overall. Evaluated on the test set, the CNN reaches a top-5 accuracy of 82.38%, while the Vision Transformer reaches 82.24%.
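The comparison above rests on top-5 accuracy: a prediction counts as correct when the true class is among the five classes the model scores highest. The following is a minimal sketch of that metric in plain Python, not the paper's own evaluation code. The function name `top_k_accuracy` and the toy scores are hypothetical, and the abstract does not say which implementation was used (Keras offers `TopKCategoricalAccuracy` for this purpose).

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 3 samples over 6 classes (CIFAR-100 has 100 classes).
toy_scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 is ranked last
    [0.12, 0.30, 0.05, 0.25, 0.20, 0.08],  # true class 0 is ranked fourth
]
toy_labels = [2, 5, 0]

top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
print(f"top-5: {top5:.4f}, top-1: {top1:.4f}")
```

With 100 classes, top-5 is a far more forgiving measure than top-1, so the 0.14-percentage-point gap between 82.38% and 82.24% is best read as a near tie rather than a decisive advantage.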