Improved Viseme Recognition using Generative Adversarial Networks
Jayanth Shreekumar, Ganesh K Shet, Vijay P N, Preethi S J, Niranjana Krupa
2020 IEEE REGION 10 CONFERENCE (TENCON), published 2020-11-16
DOI: 10.1109/TENCON50793.2020.9293784 (https://doi.org/10.1109/TENCON50793.2020.9293784)
Citation count: 1
Abstract
The proliferation of convolutional neural networks (CNNs) has increased interest in visual speech recognition (VSR). However, while word-level and sentence-level VSR classification has received much of this attention, recognition of visemes has remained relatively unexplored. This paper focuses on the visemic approach to VSR because it can be used to build language-independent models. Our method employs generative adversarial networks (GANs) to create synthetic images that are used for data augmentation. VGG16 is used for classification both before and after augmentation. The results show that data augmentation using GANs is a viable technique for improving the performance of VSR models. Augmenting the dataset with images generated by the Progressive Growing Generative Adversarial Network (PGGAN) model led to an average increase in test accuracy of 3.695% across speakers. Augmenting the dataset with images generated by the conditional Deep Convolutional Generative Adversarial Network (DCGAN) model led to an average increase in test accuracy of 2.59%.
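The augmentation and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `augment` and `mean_accuracy_gain`, the `per_class` cap, and the dictionary layout are all hypothetical, and the sketch assumes the reported gains are per-speaker differences in test accuracy averaged over speakers. The accuracy values in the demo are placeholders, not results from the paper.

```python
from statistics import mean


def augment(real, synthetic, per_class):
    """Return a copy of `real` with up to `per_class` synthetic samples
    appended to each viseme class (hypothetical helper)."""
    out = {label: list(samples) for label, samples in real.items()}
    for label, generated in synthetic.items():
        out.setdefault(label, []).extend(generated[:per_class])
    return out


def mean_accuracy_gain(before, after):
    """Average per-speaker change in test accuracy, in percentage points.
    Assumes both dicts map the same speaker ids to accuracies in percent."""
    return mean(after[speaker] - before[speaker] for speaker in before)


if __name__ == "__main__":
    # Placeholder file names standing in for mouth-region images.
    real = {"viseme_a": ["r1.png", "r2.png"], "viseme_b": ["r3.png"]}
    synthetic = {"viseme_a": ["g1.png"], "viseme_b": ["g2.png", "g3.png"]}
    train_set = augment(real, synthetic, per_class=1)
    print({label: len(samples) for label, samples in train_set.items()})

    # Placeholder accuracies (NOT from the paper) for two speakers.
    before = {"speaker_1": 70.0, "speaker_2": 80.0}
    after = {"speaker_1": 73.0, "speaker_2": 81.0}
    print(mean_accuracy_gain(before, after))
```

In a real pipeline the synthetic samples would come from a trained generator and the accuracies from a classifier such as VGG16 trained once on the original set and once on the augmented set; the sketch only shows the bookkeeping around those two steps.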