{"title":"Video Recognition of American Sign Language Using Two-Stream Convolution Neural Networks","authors":"Fikri Nugraha, E. C. Djamal","doi":"10.1109/ICEEI47359.2019.8988872","DOIUrl":null,"url":null,"abstract":"Sign language uses manual-visual to convey meaning. The style is expressed through manual sign flow in combination with non-manual elements. Sign gestures interpreted in the meaning of words, letters, and numbers. This study proposed Two-stream Convolutional Neural Networks (CNN) to recognize and classify words in hand motion images of video form. Two-stream CNN works with two processes, namely spatial and temporal stream. Spatial flow detects edges and overall global features. While temporal flow identifies local action features in stacked optical flow images of 10 frames, each stream passed Softmax function. Average Fusion function combines both of streams. Two-stream separated training reduced computing time and overcome resource limitations. In building a CNN two-stream model, a specific configuration is needed to update the weight during training such as VGG – SGD, Resnet – Adam, Resnet – SGD, Xceptionnet – Adam, and Xceptionnet – SGD. The result gave the best precision used Xceptionnet SGD of spatial flow and Xceptionnet Adam of temporal flow configuration. The architecture gave precision 89.4% of a combination of one choice or Top1 is 89.4% and 99.4% of the five choices or Top5.","PeriodicalId":236517,"journal":{"name":"2019 International Conference on Electrical Engineering and Informatics (ICEEI)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Electrical Engineering and Informatics (ICEEI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEEI47359.2019.8988872","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12
Abstract
Sign language uses manual-visual channels to convey meaning: manual sign flow combined with non-manual elements, with gestures interpreted as words, letters, and numbers. This study proposes a two-stream Convolutional Neural Network (CNN) to recognize and classify sign words from hand-motion video. The two-stream CNN runs two processes, a spatial stream and a temporal stream. The spatial stream detects edges and overall global features, while the temporal stream identifies local action features in stacked optical flow images of 10 frames; each stream ends with a Softmax function, and an average-fusion function combines the two streams. Training the two streams separately reduced computing time and overcame resource limitations. Building the two-stream CNN model requires a specific backbone and weight-update configuration, such as VGG-SGD, Resnet-Adam, Resnet-SGD, Xceptionnet-Adam, or Xceptionnet-SGD. The best precision was obtained with Xceptionnet-SGD for the spatial stream and Xceptionnet-Adam for the temporal stream: 89.4% for the single best prediction (Top-1) and 99.4% within the five best predictions (Top-5).
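The description above maps onto a late-fusion design: two independent CNN classifiers whose Softmax outputs are averaged. The sketch below illustrates that structure in Keras under several assumptions not stated in the abstract (a 100-word vocabulary, 299x299 inputs, a 20-channel flow stack of 10 frames with two flow components each, and all optimizer hyperparameters). The Xception backbone with SGD for the spatial stream and Adam for the temporal stream follows the abstract's best-performing configuration, but the code is an illustrative sketch, not the authors' implementation.

```python
# Minimal two-stream CNN sketch with average (late) fusion, loosely following the
# abstract. NUM_CLASSES, IMG_SIZE, the 20-channel flow stack, and the optimizer
# hyperparameters are assumptions for illustration only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import Xception

NUM_CLASSES = 100        # assumed number of sign words in the vocabulary
IMG_SIZE = (299, 299)    # Xception's default input resolution

def build_stream(channels: int, name: str) -> Model:
    """One stream: an Xception backbone followed by a Softmax classifier."""
    inputs = layers.Input(shape=(*IMG_SIZE, channels), name=f"{name}_input")
    backbone = Xception(include_top=False, weights=None,
                        input_tensor=inputs, pooling="avg")
    outputs = layers.Dense(NUM_CLASSES, activation="softmax",
                           name=f"{name}_softmax")(backbone.output)
    return Model(inputs, outputs, name=name)

# Spatial stream: a single RGB frame, capturing global appearance features.
spatial = build_stream(channels=3, name="spatial")
# Temporal stream: optical flow stacked over 10 frames (2 flow components each).
temporal = build_stream(channels=20, name="temporal")

# The streams are trained separately; the abstract's best configuration uses
# SGD for the spatial stream and Adam for the temporal stream.
spatial.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
                loss="categorical_crossentropy", metrics=["accuracy"])
temporal.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                 loss="categorical_crossentropy", metrics=["accuracy"])

def predict_fused(rgb_frames: np.ndarray, flow_stacks: np.ndarray) -> np.ndarray:
    """Average fusion of the two Softmax outputs at inference time."""
    p_spatial = spatial.predict(rgb_frames, verbose=0)
    p_temporal = temporal.predict(flow_stacks, verbose=0)
    return (p_spatial + p_temporal) / 2.0
```

Top-1 and Top-5 precision as reported in the abstract would then be computed from the fused scores, for example with tf.keras.metrics.top_k_categorical_accuracy on held-out video clips.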