The Application of Vision Transformer in Image Classification

Proceedings of the 6th International Conference on Virtual and Augmented Reality Simulations. Pub Date: 2022-03-25. DOI: 10.1145/3546607.3546616

Zhixuan He

Citations: 0

Abstract

This project studies the difference in performance between a Vision Transformer (ViT) and a Convolutional Neural Network (CNN) on image classification. Both models are built with Keras and TensorFlow in Python, trained separately on the CIFAR-100 image dataset in a Google Colab environment, and compared on their training and evaluation results. The experiments show that, at the scale of CIFAR-100's 60,000 images, the CNN performs slightly better than the Vision Transformer overall: evaluated on the test set, the CNN reaches a top-5 accuracy of 82.38%, while the Vision Transformer reaches 82.24%, a gap of only 0.14 percentage points.
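The comparison rests on top-5 accuracy: a test image counts as correct if its true label is among the five classes the model scores highest (out of CIFAR-100's 100 classes). The sketch below shows how that metric is computed, in plain Python so it runs without TensorFlow; in Keras the built-in equivalent is `tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)`. It also converts the reported gap into a number of test images, assuming (not confirmed by the abstract) that evaluation used CIFAR-100's standard 10,000-image test split. The function names `top_k_accuracy` and `accuracy_gap_in_images` are hypothetical helpers for illustration, not the author's code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; Python's sort is stable,
        # so equal scores keep their original (lower-index-first) order.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


def accuracy_gap_in_images(acc_a_percent: float,
                           acc_b_percent: float,
                           n_test: int = 10_000) -> int:
    """Convert an accuracy gap in percentage points into a count of test images.

    Assumes a hypothetical test-set size of n_test (10,000 for CIFAR-100's
    standard test split).
    """
    return round((acc_a_percent - acc_b_percent) / 100 * n_test)


if __name__ == "__main__":
    # Toy example: 3 samples, 6 classes, true label 0 for every sample.
    scores = [
        [0.10, 0.30, 0.05, 0.20, 0.15, 0.20],  # class 0 ranks 5th: top-5 hit
        [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # class 0 ranks 1st: hit
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.85],  # class 0 ranks 6th: miss
    ]
    labels = [0, 0, 0]
    print(top_k_accuracy(scores, labels, k=5))  # 2 of 3 samples are hits
    print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 samples is a hit
    # Reported CNN vs. ViT top-5 accuracy from the abstract.
    print(accuracy_gap_in_images(82.38, 82.24))
```

Under that 10,000-image assumption, the 0.14-point gap corresponds to about 14 test images whose label falls in the CNN's top five but not the ViT's, which is why the abstract describes the CNN's advantage as only slight.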


