Integrating lip-reading and Thai speech to control electronic devices in a vehicle
Isamail Masamae, P. Chaikan
2015 5th IEEE International Conference on System Engineering and Technology (ICSET), 2015-08-01
DOI: 10.1109/ICSENGT.2015.7412440 (https://doi.org/10.1109/ICSENGT.2015.7412440)
Citation count: 0
Abstract
This paper presents a system that combines lip-reading with Thai speech recognition to control electronic devices in a vehicle. The Viola-Jones algorithm detects the driver's face, and a constrained local model locates the mouth region, from which three lip features are extracted. Hidden Markov models recognize both speech and lip movement; in a noisy environment, the lip-movement recognizer is more accurate than the speech recognizer. Three fusion methods are used to combine lip movement and speech. We propose using vehicle speed to select the appropriate recognizer for different speech signal-to-noise ratios.
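The speed-based selection idea can be illustrated with a minimal sketch. The abstract gives no thresholds or interfaces, so everything here is an assumption made for illustration: the 60 km/h cut-off, the names `select_recognizer` and `recognize`, the `Hypothesis` type, and the premise that higher speed means more cabin noise and therefore a lower speech signal-to-noise ratio. It is not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the abstract does not state any numeric threshold.
SPEED_THRESHOLD_KMH = 60.0


@dataclass
class Hypothesis:
    """One recognizer's best guess (illustrative type, not from the paper)."""
    command: str
    log_likelihood: float


def select_recognizer(speed_kmh: float,
                      threshold_kmh: float = SPEED_THRESHOLD_KMH) -> str:
    """Pick a recognizer name from the vehicle speed.

    Assumption: low speed implies a quiet cabin (high speech SNR), so the
    speech recognizer is used; at or above the threshold the lip-movement
    recognizer is used, since it is the more accurate one in noise.
    """
    if speed_kmh < 0:
        raise ValueError("speed must be non-negative")
    return "speech" if speed_kmh < threshold_kmh else "lip"


def recognize(speed_kmh: float,
              speech_hyp: Hypothesis,
              lip_hyp: Hypothesis) -> Hypothesis:
    """Return the hypothesis of whichever recognizer the speed selects."""
    chosen = select_recognizer(speed_kmh)
    return speech_hyp if chosen == "speech" else lip_hyp


if __name__ == "__main__":
    # Made-up hypotheses standing in for the two HMM recognizers' outputs.
    speech = Hypothesis(command="radio_on", log_likelihood=-42.0)
    lip = Hypothesis(command="window_up", log_likelihood=-37.5)
    print(recognize(30.0, speech, lip).command)
    print(recognize(110.0, speech, lip).command)
```

A hard switch like this is the simplest possible reading of "selecting the appropriate recognizer"; a real system might instead map speed to an estimated signal-to-noise ratio and use that to weight one of the fusion methods.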