Implementation of the Swin Transformer and Its Application in Image Classification

Journal Port Science Research. Pub Date: 2023-11-18. DOI: 10.36371/port.2023.4.2

Rasha. A. Dihin, Ebtesam N. Al Shemmary, W. Al-Jawher


Citations: 0

Abstract


Computer vision and natural language processing differ substantially. In vision, objects vary widely in scale and images contain far more pixels than a passage of text contains words, which makes adapting transformers to vision difficult. The Swin Transformer, a vision transformer recently introduced by Microsoft Research Asia, was designed to address this and achieves state-of-the-art results on vision tasks. Its computational complexity is linear in the size of the input image because self-attention is computed separately within each local window. This yields hierarchical feature maps, built up in deeper layers, so the model can serve as a computer vision backbone for image classification and dense recognition tasks. This work applies the Swin Transformer to a worked numerical example with step-by-step analysis. In addition, experiments were carried out on the standard CIFAR-10, CIFAR-100, and MNIST datasets. The results show that the Swin Transformer can achieve flexible memory savings. Test accuracy was 71.54% on CIFAR-10 and 46.1% on CIFAR-100. Similarly, on MNIST the Swin Transformer achieved higher accuracy than other vision transformers.
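The linear-complexity claim in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the 8x8 and 16x16 feature-map sizes, and the window size of 4 are all assumptions chosen for illustration. It splits a feature map into non-overlapping windows and counts query-key pairs for global attention versus attention restricted to each window.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    Assumes H and W are divisible by window_size (an illustrative restriction).
    """
    h, w, c = x.shape
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # bring the two window-grid axes together
    return x.reshape(-1, window_size * window_size, c)


def attention_pairs_global(h, w):
    """Query-key pairs when every token attends to every other token."""
    return (h * w) ** 2


def attention_pairs_windowed(h, w, window_size):
    """Query-key pairs when attention is confined to each local window."""
    num_windows = (h // window_size) * (w // window_size)
    tokens_per_window = window_size ** 2
    return num_windows * tokens_per_window ** 2


# Hypothetical sizes: an 8x8 map and a 16x16 map, both with window size 4.
feature_map = np.arange(8 * 8).reshape(8, 8, 1)
windows = window_partition(feature_map, 4)
print(windows.shape)                       # (4, 16, 1)
print(attention_pairs_global(8, 8))        # 4096
print(attention_pairs_windowed(8, 8, 4))   # 1024
print(attention_pairs_global(16, 16))      # 65536
print(attention_pairs_windowed(16, 16, 4)) # 4096
```

Going from 8x8 to 16x16 multiplies the token count by 4. The windowed pair count also grows by 4 (linear in image size), while the global count grows by 16 (quadratic). The sketch omits shifted windows, projections, and multiple heads, which a full Swin block would also need.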

