Combining acoustic and visual modalities in vowel recognition system for laryngectomees
Rafal Pietruch, A. Grzanka
10th Symposium on Neural Network Applications in Electrical Engineering, 2010-11-29
DOI: 10.1109/NEUREL.2010.5644075
This paper addresses vowel recognition in patients after total laryngectomy using combined visual and acoustic features. Linear prediction coefficients were estimated from the speech signal with a weighted recursive least squares algorithm, and ten cross-sectional areas of a vocal tract model were then calculated. Facial expression parameters related to the spoken vowel were extracted from video recordings: lip width, lip height, and jaw opening were measured from captured video frames. Principal component analysis was applied to show correlations between the acoustic and visual features. Vowel recognition was based on neural networks with a single hidden layer, and recognition performance was compared for the visual, acoustic, and fused modalities. The results show that recognition of sustained vowels from the 10 estimated cross-sectional areas is very low. Facial expression analysis is needed when the standard acoustic parameters of pathological speech are difficult to estimate.
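The abstract names the estimator but not its weighting scheme. A minimal sketch, assuming the common exponentially weighted RLS with a forgetting factor `lam` and an initial inverse-correlation scale `delta` (both assumptions, not parameters reported by the authors), shows how predictor coefficients with s[n] ≈ Σ a_k s[n−k] can be updated sample by sample. The function name `wrls_lpc` is hypothetical.

```python
def wrls_lpc(signal, order=10, lam=0.99, delta=100.0):
    """Hypothetical helper: exponentially weighted RLS estimate of LPC coefficients.

    Models s[n] ~ sum_{k=1..order} a[k-1] * s[n-k]. Older prediction errors are
    down-weighted by lam**age (lam = 1.0 gives ordinary, unweighted RLS).
    Returns the coefficient vector after the last sample.
    """
    p = order
    a = [0.0] * p
    # Inverse correlation matrix P, initialised to delta * I (weak prior).
    P = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]
    for n in range(p, len(signal)):
        # Regressor: the p previous samples, most recent first.
        x = [signal[n - k] for k in range(1, p + 1)]
        # A priori prediction error using the current coefficients.
        e = signal[n] - sum(ai * xi for ai, xi in zip(a, x))
        # P x (P stays symmetric, so x^T P equals (P x)^T).
        Px = [sum(P[i][j] * x[j] for j in range(p)) for i in range(p)]
        denom = lam + sum(x[i] * Px[i] for i in range(p))
        gain = [v / denom for v in Px]
        # Coefficient update driven by the prediction error.
        a = [ai + gi * e for ai, gi in zip(a, gain)]
        # Riccati update: P <- (P - gain * (P x)^T) / lam.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(p)] for i in range(p)]
    return a
```

A forgetting factor close to 1 averages over a long window (roughly 1 / (1 − lam) samples), which suits sustained vowels; smaller values track faster changes at the cost of noisier estimates.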
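The abstract does not say how the ten areas were obtained from the predictor. One standard route, sketched here under stated assumptions, is a lossless acoustic tube model: convert the predictor coefficients to reflection coefficients with the step-down (backward Levinson) recursion, then chain the area ratios across the tube junctions. The area relation A_m = A_{m−1} (1 − k_m) / (1 + k_m), the numbering of sections away from a unit-area reference section, and the helper names `lpc_to_reflection` and `area_function` are all assumptions; textbooks differ in sign and direction conventions, so the resulting profile may appear flipped relative to other implementations.

```python
def lpc_to_reflection(a):
    """Hypothetical helper: step-down recursion from predictor to reflection coefficients.

    `a` follows s[n] ~ sum a[k-1] * s[n-k], so the inverse filter is
    A(z) = 1 + sum alpha_k z^-k with alpha_k = -a_k. Returns k_1..k_p,
    where k_m is the last coefficient of the order-m inverse filter.
    """
    alpha = [-c for c in a]  # alpha[i-1] holds alpha_i of the current order
    refl = [0.0] * len(alpha)
    for m in range(len(alpha), 0, -1):
        km = alpha[m - 1]
        if abs(km) >= 1.0:
            raise ValueError("unstable predictor: |k_m| >= 1 gives non-positive areas")
        refl[m - 1] = km
        denom = 1.0 - km * km
        # Order-(m-1) coefficients: alpha_i = (alpha_i - k_m * alpha_{m-i}) / (1 - k_m^2).
        alpha = [(alpha[i - 1] - km * alpha[m - i - 1]) / denom for i in range(1, m)]
    return refl


def area_function(a, reference_area=1.0):
    """Hypothetical helper: p tube-section areas relative to a reference section."""
    areas = []
    current = reference_area
    for km in lpc_to_reflection(a):
        # Assumed junction relation between neighbouring sections.
        current *= (1.0 - km) / (1.0 + km)
        areas.append(current)
    return areas
```

For a tenth-order predictor this yields ten relative areas per frame. Because only ratios are recovered, the absolute scale is arbitrary, which is one reason such estimates can be fragile for pathological voice sources.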
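The paper's exact PCA setup is not described in the abstract. As an illustration only, one might build a fused feature row per vowel token (for example the ten areas followed by lip width, lip height, and jaw opening), z-score each column because the modalities have different units, and inspect the correlation matrix and its leading principal component. The helper names below are hypothetical, and power iteration is used instead of a library eigensolver only to keep the sketch dependency-free.

```python
import math


def standardize(rows):
    """Z-score each column (feature) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        sd = math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        stds.append(sd if sd > 0 else 1.0)  # constant feature: avoid division by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def correlation_matrix(z):
    """Covariance of standardized data, i.e. the Pearson correlation matrix."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def leading_component(c, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(c)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary start, unlikely to be orthogonal to the answer
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the corresponding eigenvalue.
    eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v
```

Large loadings of both acoustic and visual features on the same component would indicate that the two modalities carry correlated information about the vowel; the eigenvalue divided by the number of features gives the share of total variance that component explains.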