Decoder-Encoder LSTM for Lip Reading
Souheil Fenghour, Daqing Chen, Perry Xiao
Proceedings of the 8th International Conference on Software and Information Engineering, published 2019-04-09
DOI: https://doi.org/10.1145/3328833.3328845
The success of automated lip reading has been constrained by the inability to distinguish between homopheme words: words that are spelled differently and are intrinsically different, yet produce the same lip movements (e.g. "time" and "some"). Different phonemes (units of sound) can often produce exactly the same viseme, the visual equivalent of a phoneme, so distinct words can look identical on the lips. Through the use of a Long Short-Term Memory (LSTM) network with word embeddings, we can distinguish between homopheme words. The neural network architecture achieved a character accuracy rate of 77.1% and a word accuracy rate of 72.2%.
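The many-to-one mapping from phonemes to visemes described above can be illustrated with a minimal Python sketch. The viseme classes, the phoneme symbols, and the toy lexicon are all hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows only why homopheme ambiguity arises, not the LSTM and word-embedding model used to resolve it.

```python
from collections import defaultdict

# Hypothetical many-to-one phoneme-to-viseme table (illustrative assumption,
# not the mapping used in the paper).
PHONEME_TO_VISEME = {
    "p": "P", "b": "P", "m": "P",   # lips pressed together
    "f": "F", "v": "F",             # lower lip against upper teeth
    "t": "T", "d": "T", "n": "T",   # tongue behind the teeth
    "ae": "A",
    "ay": "AY",
}

# Hypothetical toy lexicon: word -> phoneme sequence.
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fan": ["f", "ae", "n"],
    "van": ["v", "ae", "n"],
    "tie": ["t", "ay"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def homopheme_groups(lexicon):
    """Group words that share a viseme sequence; keep groups of two or more."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return {vis: sorted(words) for vis, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    for visemes, words in homopheme_groups(LEXICON).items():
        print(visemes, words)
```

In this toy setup, every word in a group is indistinguishable from lip shape alone, so a recogniser working purely on visemes has to rely on surrounding context to choose among them.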