Comparing audio and visual information for speech processing
David Dean, P. Lucey, S. Sridharan, T. Wark
Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005 (published 2005-08-28)
DOI: https://doi.org/10.1109/ISSPA.2005.1580195
This paper examines the utility of audio-visual speech for
the two related tasks of speech and speaker recognition.
A study of the confusion between speaker and speech elements
shows that visual speech features based on principal component
analysis (PCA) are considerably better suited to speaker
recognition than to speech recognition.
Decision-fusion speech and speaker recognition engines
were also tested under various levels of acoustic degradation,
showing that the optimal fusion configuration for speaker
recognition differs substantially from that for speech recognition.
These results highlight the problem of employing similar
visual features for both speech and speaker recognition.
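
As a rough illustration of what decision fusion means here, the sketch below combines per-class scores from an audio classifier and a visual classifier with a single weight and then picks the best class. The abstract does not state which fusion rule or weights the authors used, so the weighted sum, the names `fuse_scores`, `decide`, and `alpha`, and all score values are assumptions made for this example only.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float], visual: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Weighted-sum decision fusion of per-class log-likelihood scores.

    alpha is a hypothetical audio weight in [0, 1]; the visual
    classifier receives the remaining weight 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if audio.keys() != visual.keys():
        raise ValueError("both modalities must score the same classes")
    return {c: alpha * audio[c] + (1.0 - alpha) * visual[c] for c in audio}


def decide(audio: Dict[str, float], visual: Dict[str, float],
           alpha: float) -> str:
    """Return the class with the highest fused score."""
    fused = fuse_scores(audio, visual, alpha)
    return max(fused, key=fused.get)


# Made-up scores, for illustration only (not results from the paper).
audio_scores = {"spk_a": -12.0, "spk_b": -10.0}
visual_scores = {"spk_a": -4.0, "spk_b": -9.0}

print(decide(audio_scores, visual_scores, alpha=0.9))  # audio-dominated: spk_b
print(decide(audio_scores, visual_scores, alpha=0.3))  # video-dominated: spk_a
```

In a setup like this, the "fusion configuration" would correspond to the choice of `alpha`, which could be tuned separately for the speech task and the speaker task and for each level of acoustic noise.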