Lip Reading in Greek words at unconstrained driving scenario
Dimitris Kastaniotis, Dimitrios Tsourounis, Aristotelis Koureleas, Bojidar Peev, C. Theoharatos, S. Fotopoulos
2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), 15 July 2019
DOI: 10.1109/IISA.2019.8900757
Citations: 5
Abstract
This work focuses on the problem of Lip Reading of Greek words in an unconstrained driving scenario. The goal of Lip Reading (LR) is to understand the spoken word using only visual information, a process also known as Visual Speech Recognition (VSR). LR has several advantages over audio-based Speech Recognition: it works at a distance and is unaffected by environmental noise. LR can therefore serve as a complementary method for speech decoding that can be combined with state-of-the-art speech recognition technologies. The contribution of this work is two-fold. Firstly, a novel dataset of image sequences of spoken Greek words is presented. In total, 10 speakers uttered 50 words each while either driving or sitting in the passenger's seat of a car. The image sequences were recorded with a mobile phone mounted on the windshield of the car. Secondly, the recognition pipeline consists of a Convolutional Neural Network followed by a Long Short-Term Memory (LSTM) network with a plain attention mechanism. This architecture maps the image sequences to words following an end-to-end learning scheme. Experimental results under various protocols indicate that speaker-independent Lip Reading is an extremely challenging problem.
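To make the pipeline concrete, the sketch below shows one way a CNN + LSTM + attention word classifier of the kind described above could be assembled in PyTorch. The backbone layers, feature dimensions, the specific attention scoring, and the 50-word output size are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a CNN -> LSTM -> attention word classifier for lip reading.
# All layer sizes and the attention form are assumptions for illustration only.
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    def __init__(self, num_words=50, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Frame-level CNN: maps each mouth-region frame to a feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Sequence model over the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Simple attention: one learned score per time step (an assumed form
        # of the "plain attention mechanism" mentioned in the abstract).
        self.attn = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_words)

    def forward(self, frames):
        # frames: (batch, time, 1, H, W) grayscale mouth crops
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)  # (b, t, feat_dim)
        outputs, _ = self.lstm(feats)                           # (b, t, hidden)
        weights = torch.softmax(self.attn(outputs), dim=1)      # (b, t, 1)
        context = (weights * outputs).sum(dim=1)                # (b, hidden)
        return self.classifier(context)                         # word logits

# Example usage: a batch of 2 clips, 25 frames each, 64x64 grayscale crops.
model = LipReadingNet()
logits = model(torch.randn(2, 25, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 50])
```

Trained end-to-end with cross-entropy over the 50 word classes, such a model maps a variable-length frame sequence to a single word prediction, which matches the sequence-to-word formulation the abstract describes.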