Audio-video people recognition system for an intelligent environment

2011 4th International Conference on Human System Interactions, HSI 2011
Pub Date: 2011-05-19
DOI: 10.1109/HSI.2011.5937372

S. Anzalone, E. Menegatti, E. Pagello, Y. Yoshikawa, H. Ishiguro, A. Chella


Citation count: 6

Abstract




This paper presents an audio-video system that recognizes people in an intelligent environment. Users are tracked inside the environment, and their positions and activities can be logged. Users' identities are assessed through a multimodal approach, by detecting and recognizing voices and faces with the different cameras and microphones installed in the environment. This approach was chosen to build a flexible and cheap, yet reliable, system from consumer electronics. Voice features are extracted by short-time cepstrum analysis, and face features are extracted with the eigenfaces technique. The recognition task is solved with the same Support Vector Machine (SVM) for both voice and face features. In a set-up phase the system learns the features of each person with the SVM; during this phase the two modalities are also bound together through a cross-anchoring learning rule based on the mutual exclusivity selection principle. In the running phase the system can recognize a person's identity from voice features, face features, or both. The system is scalable in the number of cameras and microphones thanks to NMM, a middleware that manages the processing of the individual sensors and the communication among the software nodes.
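The abstract states only that voice features come from a short-time cepstrum analysis. The Python sketch below shows one common form of that analysis, the real cepstrum (the inverse DFT of the log magnitude spectrum) computed frame by frame. The frame length, hop, Hamming window, number of coefficients and the choice to drop c[0] are illustrative assumptions, not parameters reported by the authors, and all function names are hypothetical.

```python
import cmath
import math
import random


def dft(x: list[complex], inverse: bool = False) -> list[complex]:
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def hamming(n: int) -> list[float]:
    """Hamming window used to taper each frame before the DFT."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def real_cepstrum(frame: list[float], eps: float = 1e-10) -> list[float]:
    """Real cepstrum of one frame: inverse DFT of log|DFT(frame)|."""
    spectrum = dft([complex(s) for s in frame])
    # eps avoids log(0) for empty spectral bins.
    log_mag = [complex(math.log(abs(c) + eps)) for c in spectrum]
    # log|X| is real and even for a real frame, so only the real part matters.
    return [c.real for c in dft(log_mag, inverse=True)]


def short_time_cepstrum(
    signal: list[float],
    frame_len: int = 256,  # assumed: 32 ms at a hypothetical 8 kHz rate
    hop: int = 128,        # assumed: 50% frame overlap
    n_coeffs: int = 12,    # assumed: number of low-quefrency coefficients kept
) -> list[list[float]]:
    """Return one feature vector per frame: cepstral coefficients 1..n_coeffs.

    c[0] (the overall log gain) is dropped, so a constant change in loudness
    does not change the features.
    """
    window = hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        features.append(real_cepstrum(frame)[1:n_coeffs + 1])
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 8000  # hypothetical sampling rate in Hz
    # Synthetic test signal: two harmonics plus a little noise.
    sig = [
        math.sin(2 * math.pi * 200 * t / sr)
        + 0.5 * math.sin(2 * math.pi * 400 * t / sr)
        + 0.05 * rng.gauss(0.0, 1.0)
        for t in range(1024)
    ]
    feats = short_time_cepstrum(sig)
    print(len(feats), len(feats[0]))  # 7 12
```

Because a constant gain only shifts log|X| by a constant, it changes only c[0]; dropping that coefficient makes the vectors insensitive to loudness, which could help when the same speaker is heard through microphones at different distances. The naive DFT keeps the example dependency-free; a practical version would use an FFT. How the per-frame vectors are pooled before being passed to the SVM is not specified in the abstract.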
