Lip tracking using particle filter and geometric model for visual speech recognition
Islem Jarraya, S. Werda, W. Mahdi
2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP), 2014-08-28
DOI: 10.5220/0005045601720179 (https://doi.org/10.5220/0005045601720179)
Automatic lip-reading is a technology that helps people understand spoken messages in noisy environments or in cases of hearing impairment among the elderly. Building such a system requires three subsystems: lip localization and tracking, labial descriptor extraction, and classification and speech recognition. In this work, we present a spatio-temporal approach to track and characterize lip movements for the automatic recognition of visemes of the French language. First, we segment the lips using color information and a geometric model of the lips. Then, we apply a particle filter to track lip movements. Finally, we extract and classify the visual information to recognize the pronounced viseme. The approach is applied to multiple speakers in natural conditions.
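
To make the tracking step more concrete, here is a minimal sketch of a generic bootstrap particle filter that follows a single 2D point, such as a lip corner, from frame to frame. It is not the authors' implementation: the abstract does not specify the state vector, motion model, or observation model, so the random-walk motion, the Gaussian likelihood, and every name and parameter below (`particle_filter_step`, `track`, `motion_std`, `obs_std`, `init_std`) are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std=2.0, obs_std=3.0, rng=random):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles: list of (x, y) hypotheses for the tracked point (hypothetical state).
    observation: noisy (x, y) measurement for the current frame.
    """
    # Predict: diffuse each particle with an assumed random-walk motion model.
    predicted = [
        (x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
        for x, y in particles
    ]

    # Update: weight each particle by an assumed Gaussian likelihood.
    ox, oy = observation
    weights = [
        math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * obs_std ** 2))
        for x, y in predicted
    ]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the observation; fall back to uniform weights.
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = (
        sum(w * x for w, (x, _) in zip(weights, predicted)),
        sum(w * y for w, (_, y) in zip(weights, predicted)),
    )

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(observations, n_particles=500, init_std=5.0, seed=0):
    """Track one point through a sequence of (x, y) observations."""
    rng = random.Random(seed)
    x0, y0 = observations[0]
    # Initialise particles around the first observation.
    particles = [
        (x0 + rng.gauss(0.0, init_std), y0 + rng.gauss(0.0, init_std))
        for _ in range(n_particles)
    ]
    estimates = []
    for obs in observations:
        particles, est = particle_filter_step(particles, obs, rng=rng)
        estimates.append(est)
    return estimates
```

Calling `track` on a list of per-frame `(x, y)` measurements returns one position estimate per frame. In a lip-tracking setting the observation would plausibly come from the color-based segmentation step, and the state could be extended to the parameters of the geometric lip model, but both of those choices are outside what the abstract states.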