Lip Motion Recognition for Indonesian Vowel Phonemes Using 3D Convolutional Neural Networks
Maxalmina, Satria Kahfi, Kurniawan Nur Ramadhani, A. Arifianto
2020 3rd International Conference on Computer and Informatics Engineering (IC2IE), published 2020-09-15
DOI: 10.1109/IC2IE50715.2020.9274562
Abstract
Lip motion recognition is a technique for interpreting visual data that focuses on the mouth area and aims to recognize lip movements. It is expected to support communication tools for deaf people and to automate speech-to-text conversion from visual input. In Indonesian, vowel phonemes are essential for producing the sounds from which words and sentences are formed. This paper proposes a model based on 3D Convolutional Neural Networks (3D CNNs) that recognizes the Indonesian vowel phonemes /a/, /i/, /u/, /e/, and /o/ from lip movements. The data were resized to a resolution of 112x56 pixels and then augmented by flipping them horizontally and adding blur. In testing, the vowel phoneme recognition model reached a highest accuracy of 84%.
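The preprocessing described above has three steps: resize, horizontal flip, and blur. They can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. A clip is assumed to be a `(frames, height, width)` array of grayscale mouth crops. "112x56" is read as width 112 and height 56. The resize uses nearest-neighbour sampling. The blur is a 3x3 box (mean) filter, because the abstract does not name the blur type. The helper names `resize_nearest`, `hflip`, `box_blur` and `preprocess_and_augment` are hypothetical.

```python
import numpy as np

# Assumptions (not from the paper): a clip is a (frames, height, width) uint8
# array of grayscale mouth crops, and "112x56" is read as width=112, height=56.
TARGET_W, TARGET_H = 112, 56


def resize_nearest(clip: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize applied to every frame of the clip."""
    _, in_h, in_w = clip.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return clip[:, rows[:, None], cols[None, :]]


def hflip(clip: np.ndarray) -> np.ndarray:
    """Mirror every frame left-to-right (reverse the width axis)."""
    return clip[:, :, ::-1]


def box_blur(clip: np.ndarray, k: int = 3) -> np.ndarray:
    """Blur every frame with a k x k mean filter (k odd, edges replicated).

    The box filter is an assumed stand-in; the paper's blur type is not stated.
    """
    pad = k // 2
    padded = np.pad(clip.astype(np.float32), ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = clip.shape[1:]
    out = np.zeros(clip.shape, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def preprocess_and_augment(clip: np.ndarray) -> list:
    """Resize the clip, then return it together with a flipped and a blurred copy."""
    base = resize_nearest(clip, TARGET_H, TARGET_W).astype(np.float32)
    return [base, hflip(base), box_blur(base)]


# Usage: one synthetic 10-frame clip of 120x160 frames.
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(10, 120, 160), dtype=np.uint8)
samples = preprocess_and_augment(clip)
print([s.shape for s in samples])  # three arrays of shape (10, 56, 112)
```

Each original clip yields three training samples here. Whether the paper keeps all three, or combines flip and blur, is not stated in the abstract.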
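The abstract does not describe the network layout. The sketch below only illustrates what separates a 3D convolution from a 2D one. Its kernel spans several consecutive frames, so each output value responds to lip motion over time rather than to a single still image. Everything in it is a hypothetical assumption and not the authors' architecture. That includes the single input channel, the one convolutional layer, the random untrained weights, and the helper names `conv3d_valid` and `tiny_3dcnn_forward`.

```python
import numpy as np

VOWELS = ["a", "i", "u", "e", "o"]  # the five classes named in the abstract


def conv3d_valid(clip: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 3D cross-correlation, stride 1, no padding.

    clip has shape (T, H, W); kernel has shape (kt, kh, kw) and therefore
    covers kt consecutive frames at once.
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    ot, oh, ow = T - kt + 1, H - kh + 1, W - kw + 1
    out = np.zeros((ot, oh, ow), dtype=np.float64)
    # Accumulate one shifted copy of the clip per kernel tap.
    for dt in range(kt):
        for dy in range(kh):
            for dx in range(kw):
                out += kernel[dt, dy, dx] * clip[dt:dt + ot, dy:dy + oh, dx:dx + ow]
    return out


def tiny_3dcnn_forward(clip, kernels, weights, bias):
    """Hypothetical forward pass: conv3d -> ReLU -> global average pool -> linear -> softmax."""
    feats = np.array([np.maximum(conv3d_valid(clip, k), 0.0).mean() for k in kernels])
    logits = weights @ feats + bias          # weights: (5, n_kernels), bias: (5,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()


# Usage with random (untrained) parameters on a 10-frame 56x112 clip.
rng = np.random.default_rng(1)
clip = rng.random((10, 56, 112))
kernels = [rng.standard_normal((3, 3, 3)) for _ in range(4)]
weights = rng.standard_normal((5, 4))
bias = np.zeros(5)
probs = tiny_3dcnn_forward(clip, kernels, weights, bias)
print(VOWELS[int(np.argmax(probs))], probs.round(3))
```

With untrained weights the predicted vowel is meaningless. The point is the data flow from a stack of frames to five class probabilities. A practical model would use a framework layer such as PyTorch's `torch.nn.Conv3d`, with many channels, pooling, and weights learned from labelled clips.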