{"title":"基于匹配网络方法的印尼语手语图像分类研究","authors":"Irma Permata Sari","doi":"10.30630/joiv.7.3.1320","DOIUrl":null,"url":null,"abstract":"Huge datasets are important to build powerful pipelines and ground well to new images. In Computer Vision, the most basic problem is image classification. The classification of images may be a tedious job, especially when there are a lot of amounts. But CNN is known to be data-hungry while gathering. How can we build some models without much data? For example, in the case of Sign Language Recognition (SLR). One type of Sign Language Recognition system is vision-based. In Indonesian Sign Language dataset has a relatively small sample image. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language. Our dataset is listed in 26 classes of alphabet, A-Z. It has loaded 12 images for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning is Mnasnet1_0 (85.75%) convolutional network model for Matching Networks, and loss estimation is about 0,43. And the experiment indicates that the accuracy will be increased by increasing the number of shots. We can inform you that this model's matching network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. We can choose the best algorithm based on this research for the Indonesian Sign Language application we will develop further.","PeriodicalId":32468,"journal":{"name":"JOIV International Journal on Informatics Visualization","volume":"91 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Closer Look at Image Classification for Indonesian Sign Language with Few-Shot Learning Using Matching Network Approach\",\"authors\":\"Irma Permata Sari\",\"doi\":\"10.30630/joiv.7.3.1320\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Huge datasets are important to build powerful pipelines and ground well to new images. In Computer Vision, the most basic problem is image classification. The classification of images may be a tedious job, especially when there are a lot of amounts. But CNN is known to be data-hungry while gathering. How can we build some models without much data? For example, in the case of Sign Language Recognition (SLR). One type of Sign Language Recognition system is vision-based. In Indonesian Sign Language dataset has a relatively small sample image. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language. Our dataset is listed in 26 classes of alphabet, A-Z. It has loaded 12 images for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning is Mnasnet1_0 (85.75%) convolutional network model for Matching Networks, and loss estimation is about 0,43. And the experiment indicates that the accuracy will be increased by increasing the number of shots. We can inform you that this model's matching network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. 
We can choose the best algorithm based on this research for the Indonesian Sign Language application we will develop further.\",\"PeriodicalId\":32468,\"journal\":{\"name\":\"JOIV International Journal on Informatics Visualization\",\"volume\":\"91 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JOIV International Journal on Informatics Visualization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.30630/joiv.7.3.1320\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Decision Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JOIV International Journal on Informatics Visualization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30630/joiv.7.3.1320","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Decision Sciences","Score":null,"Total":0}
Closer Look at Image Classification for Indonesian Sign Language with Few-Shot Learning Using Matching Network Approach
Large datasets are important for building powerful pipelines that generalize well to new images. Image classification is the most basic problem in Computer Vision, and classifying images can be a tedious job, especially when there are many of them. CNNs, however, are known to be data-hungry, and gathering data is costly. How can we build models without much data? Sign Language Recognition (SLR) is one such case. One type of Sign Language Recognition system is vision-based, and the Indonesian Sign Language dataset contains relatively few sample images. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language, organized into 26 alphabet classes, A-Z, with 12 images loaded for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning with Matching Networks is achieved by the Mnasnet1_0 convolutional network model (85.75%), with an estimated loss of about 0.43. The experiment also indicates that accuracy increases as the number of shots increases. The Matching Network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. Based on this research, we can choose the best algorithm for the Indonesian Sign Language application we will develop further.
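As a rough illustration of the approach described above, the following PyTorch sketch shows one Matching Network episode that uses torchvision's mnasnet1_0 as the embedding backbone and cosine-similarity attention over the support set. The class, tensor shapes, and the toy 26-way episode are illustrative assumptions, not the authors' actual implementation or dataset.

    # Minimal sketch of a Matching Network episode with a MnasNet backbone,
    # assuming a 26-way (A-Z) few-shot setup as described in the abstract.
    # Dataset loading, episode sampling, and training are omitted.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models


    class MatchingNetwork(nn.Module):
        def __init__(self, n_way: int = 26):
            super().__init__()
            self.n_way = n_way
            # Use mnasnet1_0 as the embedding function; replace its classifier
            # head with Identity so the network outputs feature vectors.
            backbone = models.mnasnet1_0(weights=None)
            backbone.classifier = nn.Identity()
            self.encoder = backbone

        def forward(self, support_images, support_labels, query_images):
            # Embed support and query images, then L2-normalize the features.
            support_emb = F.normalize(self.encoder(support_images), dim=-1)
            query_emb = F.normalize(self.encoder(query_images), dim=-1)

            # Cosine similarity between each query and every support example.
            similarities = query_emb @ support_emb.t()   # [n_query, n_support]

            # Attention over the support set; aggregate one-hot support labels
            # to obtain per-class probabilities for each query.
            attention = similarities.softmax(dim=-1)
            one_hot = F.one_hot(support_labels, self.n_way).float()
            return attention @ one_hot                   # [n_query, n_way]


    if __name__ == "__main__":
        # Toy 26-way 1-shot episode with random tensors, used only as a shape check.
        model = MatchingNetwork(n_way=26)
        support = torch.randn(26, 3, 224, 224)
        labels = torch.arange(26)
        queries = torch.randn(5, 3, 224, 224)
        probs = model(support, labels, queries)
        print(probs.shape)  # torch.Size([5, 26])

Swapping in a different backbone (for example, Inception V3) would require input images at least as large as that network's minimum input size, which is consistent with the kernel-size constraint noted in the abstract.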