{"title":"基于匹配网络方法的印尼语手语图像分类研究","authors":"Irma Permata Sari","doi":"10.30630/joiv.7.3.1320","DOIUrl":null,"url":null,"abstract":"Huge datasets are important to build powerful pipelines and ground well to new images. In Computer Vision, the most basic problem is image classification. The classification of images may be a tedious job, especially when there are a lot of amounts. But CNN is known to be data-hungry while gathering. How can we build some models without much data? For example, in the case of Sign Language Recognition (SLR). One type of Sign Language Recognition system is vision-based. In Indonesian Sign Language dataset has a relatively small sample image. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language. Our dataset is listed in 26 classes of alphabet, A-Z. It has loaded 12 images for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning is Mnasnet1_0 (85.75%) convolutional network model for Matching Networks, and loss estimation is about 0,43. And the experiment indicates that the accuracy will be increased by increasing the number of shots. We can inform you that this model's matching network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. We can choose the best algorithm based on this research for the Indonesian Sign Language application we will develop further.","PeriodicalId":32468,"journal":{"name":"JOIV International Journal on Informatics Visualization","volume":"91 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Closer Look at Image Classification for Indonesian Sign Language with Few-Shot Learning Using Matching Network Approach\",\"authors\":\"Irma Permata Sari\",\"doi\":\"10.30630/joiv.7.3.1320\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Huge datasets are important to build powerful pipelines and ground well to new images. In Computer Vision, the most basic problem is image classification. The classification of images may be a tedious job, especially when there are a lot of amounts. But CNN is known to be data-hungry while gathering. How can we build some models without much data? For example, in the case of Sign Language Recognition (SLR). One type of Sign Language Recognition system is vision-based. In Indonesian Sign Language dataset has a relatively small sample image. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language. Our dataset is listed in 26 classes of alphabet, A-Z. It has loaded 12 images for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning is Mnasnet1_0 (85.75%) convolutional network model for Matching Networks, and loss estimation is about 0,43. And the experiment indicates that the accuracy will be increased by increasing the number of shots. We can inform you that this model's matching network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. 
We can choose the best algorithm based on this research for the Indonesian Sign Language application we will develop further.\",\"PeriodicalId\":32468,\"journal\":{\"name\":\"JOIV International Journal on Informatics Visualization\",\"volume\":\"91 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JOIV International Journal on Informatics Visualization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.30630/joiv.7.3.1320\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Decision Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JOIV International Journal on Informatics Visualization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30630/joiv.7.3.1320","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Decision Sciences","Score":null,"Total":0}
Closer Look at Image Classification for Indonesian Sign Language with Few-Shot Learning Using Matching Network Approach
Large datasets are important for building powerful pipelines that generalize well to new images. Image classification is the most basic problem in Computer Vision, and classifying images can be a tedious job, especially when there are many of them. CNNs, however, are known to be data-hungry, and gathering data is costly. How can we build models without much data? Sign Language Recognition (SLR) is one such case. One type of Sign Language Recognition system is vision-based, and the Indonesian Sign Language dataset contains relatively few sample images. This research aims to classify sign language images using Computer Vision for Sign Language Recognition systems. We used a small dataset, Indonesian Sign Language, organized into 26 alphabet classes, A-Z, with 12 images loaded for each class. The methodology in this research is few-shot learning. Based on our experiment, the best accuracy for few-shot learning with Matching Networks is achieved by the Mnasnet1_0 convolutional network model (85.75%), with an estimated loss of about 0.43. The experiment also indicates that accuracy increases as the number of shots increases. The Matching Network framework is unsuitable for the Inception V3 model because the kernel size cannot be greater than the actual input size. Based on this research, we can choose the best algorithm for the Indonesian Sign Language application we will develop further.
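As a rough illustration of the approach described above, the following PyTorch sketch shows one Matching Network episode that uses torchvision's mnasnet1_0 as the embedding backbone and cosine-similarity attention over the support set. The class, tensor shapes, and the toy 26-way episode are illustrative assumptions, not the authors' actual implementation or dataset.

    # Minimal sketch of a Matching Network episode with a MnasNet backbone,
    # assuming a 26-way (A-Z) few-shot setup as described in the abstract.
    # Dataset loading, episode sampling, and training are omitted.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models


    class MatchingNetwork(nn.Module):
        def __init__(self, n_way: int = 26):
            super().__init__()
            self.n_way = n_way
            # Use mnasnet1_0 as the embedding function; replace its classifier
            # head with Identity so the network outputs feature vectors.
            backbone = models.mnasnet1_0(weights=None)
            backbone.classifier = nn.Identity()
            self.encoder = backbone

        def forward(self, support_images, support_labels, query_images):
            # Embed support and query images, then L2-normalize the features.
            support_emb = F.normalize(self.encoder(support_images), dim=-1)
            query_emb = F.normalize(self.encoder(query_images), dim=-1)

            # Cosine similarity between each query and every support example.
            similarities = query_emb @ support_emb.t()   # [n_query, n_support]

            # Attention over the support set; aggregate one-hot support labels
            # to obtain per-class probabilities for each query.
            attention = similarities.softmax(dim=-1)
            one_hot = F.one_hot(support_labels, self.n_way).float()
            return attention @ one_hot                   # [n_query, n_way]


    if __name__ == "__main__":
        # Toy 26-way 1-shot episode with random tensors, used only as a shape check.
        model = MatchingNetwork(n_way=26)
        support = torch.randn(26, 3, 224, 224)
        labels = torch.arange(26)
        queries = torch.randn(5, 3, 224, 224)
        probs = model(support, labels, queries)
        print(probs.shape)  # torch.Size([5, 26])

Swapping in a different backbone (for example, Inception V3) would require input images at least as large as that network's minimum input size, which is consistent with the kernel-size constraint noted in the abstract.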