{"title":"微软Kinect摄像头上的唇读应用","authors":"Alper Yargic, M. Dogan","doi":"10.1109/INISTA.2013.6577656","DOIUrl":null,"url":null,"abstract":"Hearing-impaired people can read lips and lip reading applications may help them to improve their lip imitation skills. Speech of normal people can be recognized by even cellular phones but lip reading systems using only visual features remain important for hearing-impaired people. This paper aims to develop an application using MS Kinect camera to recognize Turkish color names to be used in the education of hearing-impaired children. Predefined lip points are located with depth information by the MS Kinect Face Tracking SDK. Words are segmented from the speech and the angles between the lip points are used as features to classify the words. Angles are computed using the 3D coordinates of the lip points. The KNN classifier is used to classify the words with Manhattan and Euclidean distances and the best feature vectors are tried to be found. As a result, the isolated words are classified with the success rate of 78.22%.","PeriodicalId":301458,"journal":{"name":"2013 IEEE INISTA","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"A lip reading application on MS Kinect camera\",\"authors\":\"Alper Yargic, M. Dogan\",\"doi\":\"10.1109/INISTA.2013.6577656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hearing-impaired people can read lips and lip reading applications may help them to improve their lip imitation skills. Speech of normal people can be recognized by even cellular phones but lip reading systems using only visual features remain important for hearing-impaired people. This paper aims to develop an application using MS Kinect camera to recognize Turkish color names to be used in the education of hearing-impaired children. Predefined lip points are located with depth information by the MS Kinect Face Tracking SDK. Words are segmented from the speech and the angles between the lip points are used as features to classify the words. Angles are computed using the 3D coordinates of the lip points. The KNN classifier is used to classify the words with Manhattan and Euclidean distances and the best feature vectors are tried to be found. As a result, the isolated words are classified with the success rate of 78.22%.\",\"PeriodicalId\":301458,\"journal\":{\"name\":\"2013 IEEE INISTA\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE INISTA\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INISTA.2013.6577656\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE INISTA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INISTA.2013.6577656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24
摘要
听力受损的人可以读唇语,唇读应用程序可以帮助他们提高嘴唇模仿技能。正常人的语言甚至可以被手机识别,但仅使用视觉特征的唇读系统对听障人士来说仍然很重要。本文旨在开发一款使用MS Kinect摄像头识别土耳其语颜色名称的应用程序,用于听障儿童的教育。预定义的唇点由MS Kinect Face Tracking SDK定位深度信息。从语音中分割单词,并使用唇点之间的角度作为特征对单词进行分类。角度是使用唇点的三维坐标计算的。利用KNN分类器对具有曼哈顿距离和欧几里得距离的词进行分类,并试图找到最佳特征向量。结果表明,孤立词的分类成功率为78.22%。
Hearing-impaired people can read lips and lip reading applications may help them to improve their lip imitation skills. Speech of normal people can be recognized by even cellular phones but lip reading systems using only visual features remain important for hearing-impaired people. This paper aims to develop an application using MS Kinect camera to recognize Turkish color names to be used in the education of hearing-impaired children. Predefined lip points are located with depth information by the MS Kinect Face Tracking SDK. Words are segmented from the speech and the angles between the lip points are used as features to classify the words. Angles are computed using the 3D coordinates of the lip points. The KNN classifier is used to classify the words with Manhattan and Euclidean distances and the best feature vectors are tried to be found. As a result, the isolated words are classified with the success rate of 78.22%.