Neural network lipreading system for improved speech recognition
Authors: David G. Stork, G. Wolff, Earl Levine
Published in: [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, 1992-06-07
DOI: 10.1109/IJCNN.1992.226994 (https://doi.org/10.1109/IJCNN.1992.226994)
A modified time-delay neural network (TDNN) has been designed to perform automatic lipreading (speech reading) in conjunction with acoustic speech recognition, in order to improve recognition both in silent environments and in the presence of acoustic noise. The system is far more robust to acoustic noise and verbal distractors than a system that does not incorporate visual information. Specifically, in the presence of high-amplitude pink noise, the low recognition rate of the acoustic-only system (43%) is raised to 75% by the incorporation of visual information. The system responds to (artificial) conflicting cross-modal patterns in a way closely analogous to the McGurk effect in humans. The power of neural techniques is demonstrated in several difficult domains: pattern recognition; sensory integration; and distributed approaches toward 'rule-based' (linguistic-phonological) processing.
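To make the idea of a time-delay network over combined acoustic and visual input more concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: the abstract does not say how the two modalities are combined, so the per-frame concatenation in `fuse` (early fusion), the single layer, the tanh activation, the feature dimensions, and the random weights are all assumptions chosen for illustration. The only part taken from the general TDNN idea is that each unit sees a short sliding window of consecutive frames and shares its weights across time.

```python
import math
import random


def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer (a 1-D convolution over time).

    frames:  list of T feature vectors, each of length D
    weights: list of H units; each unit is a list of `delays` weight
             vectors of length D (one per frame in the window)
    biases:  list of H floats
    Returns T - delays + 1 activation vectors, each of length H.
    """
    delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - delays + 1):
        window = frames[t:t + delays]  # consecutive frames seen by every unit
        activations = []
        for unit_weights, bias in zip(weights, biases):
            total = bias
            for w_vec, x_vec in zip(unit_weights, window):
                total += sum(w * x for w, x in zip(w_vec, x_vec))
            activations.append(math.tanh(total))
        outputs.append(activations)
    return outputs


def fuse(acoustic, visual):
    """Assumed early fusion: concatenate acoustic and visual features per frame."""
    return [a + v for a, v in zip(acoustic, visual)]


def classify(frames, weights, biases):
    """Sum each output unit's activation over time and pick the largest."""
    activations = time_delay_layer(frames, weights, biases)
    scores = [sum(column) for column in zip(*activations)]
    return scores.index(max(scores)), scores


# Hypothetical sizes: 10 frames, 4 acoustic and 3 visual features per frame,
# a window of 3 frames, and 2 output classes. Weights are random, not trained.
random.seed(0)
T, DA, DV, DELAYS, CLASSES = 10, 4, 3, 3, 2
acoustic = [[random.gauss(0, 1) for _ in range(DA)] for _ in range(T)]
visual = [[random.gauss(0, 1) for _ in range(DV)] for _ in range(T)]
weights = [[[random.uniform(-0.5, 0.5) for _ in range(DA + DV)]
            for _ in range(DELAYS)] for _ in range(CLASSES)]
biases = [0.0] * CLASSES

fused = fuse(acoustic, visual)
label, scores = classify(fused, weights, biases)
print("predicted class:", label)
```

In this sketch, dropping the visual features (or replacing the acoustic ones with noise) only changes the input vectors, so the same layer can be used to compare acoustic-only and audio-visual inputs. A trained system would of course learn the weights from labeled utterances rather than draw them at random.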